
TOPIC THREE

SIMPLE REGRESSION ANALYSIS

3.1 INTRODUCTION

Just like correlation, regression analysis is concerned with the study of the dependence of one
variable (the explained/dependent variable) on one or more other variables (the
explanatory/independent variable(s) with a view to estimating and/ or predicting the average value
of the dependent variable. The main advantage of regression is that it can easily be extended to
more than two variables.

To illustrate, consider the scatter graph below which shows the relationship between two variables
x and y and a line fitted among the scatter points.

[Scatter plot: Y plotted against X with a fitted regression line; points above the line correspond to positive u, points below the line to negative u.]
From the scatter graph, we notice that despite the variability between X and Y, there is a general "tendency" for the variables to move together, i.e., as X increases, so does Y, as shown by the line of best fit. The fitted line among the scatter points is the regression line. Thus regression analysis aims at finding the line of best fit among variables. The regression line is written as:

Y = α + βX, such that:

Y ≡ dependent variable;  α = intercept
X ≡ explanatory variable;  β = slope or gradient

The relationship Y = α + βX is called a deterministic or functional relationship since it implies that X and Y are related exactly. However, in real life few variables have such an exact relationship. Indeed, not all points lie on the regression line as shown: some points are above the line, others are below it, and only a few lie exactly on it.

Thus a more realistic relationship between X and Y is given as Y = α + βX + u, where u is called the error term or disturbance term. The function Y = α + βX + u is called a stochastic or statistical function. For points above the regression line, u is positive, while for points below the regression line, u is negative. A variable such as u, which can take on any set of values, positive or negative, each with a given probability, is called a random variable or a stochastic variable. The error term thus implies that not all points will lie on the line; it represents the variation in Y that cannot be explained by X.

We include an error term in a regression model because of:

i) Measurement errors; i.e. errors in measuring any of the variables.


ii) Omitted variable bias; i.e. it could be likely that we have omitted some very important
variables in the regression model
iii) Human behaviour, which is random and unpredictable, e.g. due to differences in tastes and preferences, or random shocks.
iv) Specification errors i.e. assuming a linear relationship when it should actually be non-
linear and so on.

A point to note is that any regression must be guided a priori by economic theory. Thus, although the function Y = α + βX + u assumes causation, i.e. that X causes Y, this causation must be informed by economic theory. Thus, if we say:

- Consumption = α + β·Income + u, this is a valid regression since it is derived from the economic theory of consumption;
- Income = α + β·Consumption + u, this is not a valid regression since no theory provides for such a relationship.

A distinction is always made between correlation and regression. Correlation analysis aims at measuring the strength of the linear association between variables, whereas regression analysis aims at estimating the direction and magnitude of the dependence of one variable on another. There are thus two (2) primary differences between correlation and regression analysis, as outlined below:

CORRELATION vs. REGRESSION

1) Correlation assumes symmetry between X and Y, i.e. there is no distinction as to which variable is dependent (causality is not important). Regression assumes asymmetry between X and Y, i.e. it distinguishes which variable is dependent and which is explanatory (causality is important).

2) In correlation, both X and Y are assumed to be statistical, random or stochastic. In regression, only Y is assumed to be statistical, while X is assumed to be fixed.

Thus, correlation does not imply causality; in regression, by contrast, a causal direction (suggested by theory) is presupposed.

There are basically two types of regression analysis, i.e.,

i) Simple regression analysis


ii) Multiple regression analysis

In simple regression analysis, we study the effect of only one explanatory variable on the dependent variable; for example, how X affects Y in Y = α + βX + u. Thus, Y = f(X).

For this reason, simple regression analysis is also known as two-variable or bivariate regression analysis.

In multiple regression analysis, we study the effect of more than one explanatory variable on the dependent variable; for example, how X1, X2 and X3 affect Y in Y = α + β1X1 + β2X2 + β3X3 + u. Thus Y = f(X1, X2, …, Xn).

3.2 THE SIMPLE LINEAR REGRESSION MODEL AND ITS ASSUMPTIONS

The simple linear regression model is:

𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜀𝑖 ………………………………………………………………………….. (3.1)

Where β0 and β1 are unknown but fixed parameters known as the regression coefficients. They are also known as the intercept and slope coefficients, respectively. In regression analysis our interest is in estimating the values of the unknowns β0 and β1 on the basis of observations on Y and X. εi is an unobservable random variable known as the stochastic disturbance or stochastic error term. The stochastic term εi captures other variables besides X that affect Y but are not included in the model.

As indicated, the disturbance term ɛi is a surrogate for all those variables that are omitted from the
model but that collectively affect Y. The obvious question is: Why not introduce these variables
into the model explicitly? I.e. why not develop a multiple regression model with as many variables
as possible? The reasons are many.

1. Vagueness of theory: The theory, if any, determining the behaviour of Y may be, and often is, incomplete. We might know for certain that X influences Y, but we might be ignorant or unsure about the other variables affecting Y. Therefore, εi may be used as a substitute for all the variables excluded or omitted from the model.

2. Unavailability of data: Even if we know what some of the excluded variables are and
therefore consider a multiple regression rather than a simple regression, we may not have
quantitative information about these variables. It is a common experience in empirical analysis,
that the data we would ideally like to have often are not available. For example, in principle
we could introduce family wealth as an explanatory variable in addition to the income variable
to explain family consumption expenditure. But unfortunately, information on family wealth
generally is not available. Therefore, we may be forced to omit the wealth variable from our
model despite its great theoretical relevance in explaining consumption expenditure.

3. Core Variables versus Peripheral Variables: Assume in our consumption income example
that besides income X1, the number of children per family X2, sex X3, religion X4, education
X5, and geographical region X6 also affect consumption expenditure. But it is quite possible
that the joint influence of all or some of these variables may be so small and at best
nonsystematic or random that as a practical matter and for cost considerations it does not pay
to introduce them into the model explicitly. One hopes that their combined effect can be treated
as a random variable ɛi.

4. Intrinsic Randomness in Human Behaviour: Even if we succeed in introducing all the relevant variables into the model, there is bound to be some "intrinsic" randomness in individual Y's that cannot be explained no matter how hard we try. The disturbances, the ε's, may very well reflect this intrinsic randomness.

5. Poor Proxy Variables: Although the classical regression model assumes that the variables Y and X are measured accurately, in practice the data may be plagued by errors of measurement. Consider, for example, Milton Friedman's well-known theory of the consumption function. He regards permanent consumption (Yp) as a function of permanent income (Xp). But since data on these variables are not directly observable, in practice we use proxy variables, such as current consumption (Y) and current income (X), which are observable. Since the observed Y and X may not equal Yp and Xp, there is the problem of errors of measurement. The disturbance term ε may in this case then also represent the errors of measurement. As we will see in a later chapter, such errors of measurement can have serious implications for estimating the regression coefficients, the β's.

6. Principle of Parsimony: Following Occam’s razor, (that descriptions be kept as simple as


possible until proved inadequate), we would like to keep our regression model as simple as
possible. If we can explain the behaviour of Y “substantially” with two or three explanatory
variables and if our theory is not strong enough to suggest what other variables might be
included, why introduce more variables? Let ɛi represent all other variables. Of course, we
should not exclude relevant and important variables just to keep the regression model simple.

7. Wrong Functional Form: Even if we have theoretically correct variables explaining a phenomenon and even if we can obtain data on these variables, very often we do not know the form of the functional relationship between the regressand and the regressors. Is consumption expenditure a linear (in the variables) function of income or a nonlinear function? If it is the former, Yi = β0 + β1Xi + εi is the proper functional relationship between Y and X, but if it is the latter, Yi = β0 + β1Xi + β2Xi² + εi may be the correct functional form. In two-variable models the functional form of the relationship can often be judged from the scattergram. But in a multiple regression model, it is not easy to determine the appropriate functional form, for we cannot visualize scattergrams in multiple dimensions.

For all these reasons, the stochastic disturbances ɛi assume an extremely critical role in regression
analysis, which we will see as we progress.

Equation 3.1 describes a Population Regression Function (PRF). However, in most practical
situations what we have is but a sample of Y values corresponding to some fixed X’s. Therefore,
our task now is to estimate the PRF on the basis of the sample information. That is, our primary
objective in regression analysis is to estimate the PRF (equation 3.1) on the basis of the Sample
Regression Function (SRF) because more often than not our analysis is based upon a single sample
from some population. The SRF is given as follows:

Yi = β̂0 + β̂1Xi + êi ………………………………………………………………..…………. (3.2)

Where β̂0 and β̂1 are estimators of β0 and β1 respectively. Because of sampling fluctuations our
estimate of the PRF based on the SRF is at best an approximate one. The critical question now is:
given that the SRF is but an approximation of the PRF, can we devise a rule or a method that will
make this approximation as "close" as possible? In other words, how should the SRF be constructed so that β̂0 is as "close" as possible to the true β0 and β̂1 is as "close" as possible to the true β1, even though we will never know the true β0 and β1? To that end, we must not only specify
the functional form of the model, but also make certain assumptions about the manner in which the Yi are generated. To see why this requirement is needed, look at the PRF: Yi = β0 + β1Xi + εi. It shows
that Yi depends on both Xi and ɛi. Therefore, unless we are specific about how Xi and ɛi are created
or generated, there is no way we can make any statistical inference about the Y i and also about β0
and β1. Thus, the assumptions made about the Xi variable(s) and the error term are extremely
critical to the valid interpretation of the regression estimates.

The Classical Linear Regression Model (CLRM), which is the cornerstone of most econometric theory, makes the following assumptions. We first discuss these assumptions in the context of the two-variable regression model; in Chapter 4 we extend them to multiple regression models.

1. Linear Regression Model: The regression model is linear in the parameters, correctly specified, and has an additive error term (the regressand Y and the regressor X themselves may be nonlinear). Linearity in the parameters means the regression coefficients do not enter the function being estimated as exponents (although the variables may have exponents).

2. X values are fixed in repeated sampling: Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be non-stochastic. "Fixed values in repeated sampling" can be explained using an example. Let Y be weekly consumption expenditure and X weekly income. Keeping the value of income X fixed, say at the level $80, we draw at random a family and observe its weekly family consumption expenditure Y as, say, $60. Still keeping X at $80, we draw at random another family and observe its Y value as, say, $7. In each of these drawings (i.e., repeated sampling), the value of X is fixed at $80. We can repeat this process for all the X values we want. All this means that our regression analysis is conditional regression analysis, that is, conditional on the given values of the regressor(s) X.
3. Zero mean value of disturbance ɛi: Given the value of X, the mean, or expected, value of the
random disturbance term ɛi is zero. Technically, the conditional mean value of ɛi is zero.
Symbolically, we have
𝐸 (𝜀𝑖|𝑋𝑖 ) = 0

This assumption implies that the average or mean value of these deviations corresponding to
any given X should be zero. That is, the factors not explicitly included in the model, and
therefore subsumed in ɛi, do not systematically affect the mean value of Y.

4. Homoscedasticity or equal variance of εi: Given the value of X, the variance of εi is the same for all observations. That is, the conditional variances of εi are identical. Symbolically, we have

var(εi|Xi) = E[εi − E(εi|Xi)]²
           = E(εi²|Xi)   (because of assumption 3)
           = σ²

5. The error terms are uncorrelated with each other (no autocorrelation or serial correlation): Given any two X values, Xi and Xj (i ≠ j), the correlation between any two εi and εj (i ≠ j) is zero.
6. Disturbance ε and explanatory variable X are uncorrelated: X and ε (which represents the influence of all the omitted variables) have separate (and additive) influences on Y. But if X and ε are correlated, it is not possible to assess their individual effects on Y. Thus, if X and ε are positively correlated, X increases when ε increases and decreases when ε decreases. Similarly, if X and ε are negatively correlated, X increases when ε decreases and decreases when ε increases. In either case, it is difficult to isolate the influence of X and ε on Y.
7. The number of observations n must be greater than the number of parameters to be
estimated: Alternatively, the number of observations n must be greater than the number of
explanatory variables.
8. The regression model is correctly specified: Alternatively, there is no specification bias or
error in the model used in empirical analysis.
9. There is no perfect multicollinearity: That is, there are no perfect linear relationships among
the explanatory variables.
10. The error term is normally distributed (an optional assumption, needed for hypothesis testing) with zero mean and constant variance. Symbolically, ε ~ N(0, σ²).
11. The values for the independent variables are derived from a random sample of the population,
and they contain variability.

3.3 THE ORDINARY LEAST SQUARES ESTIMATORS

Ordinary least squares (OLS) is the main technique used to estimate regression models. The name OLS is derived from the fact that OLS minimizes the sum of squared residuals. In so doing, OLS finds the values of the model parameters (β̂0 and β̂1) that define the line of best fit.

The ordinary least squares estimators (β̂0 and β̂1) are derived using the following eight steps:

STEP 1: Begin with the population and sample regression lines and obtain the residual as the difference of the two:

- Population regression line: Yi = β0 + β1Xi + ui
- Sample regression line: Ŷi = β̂0 + β̂1Xi
- Residual: ei = Yi − Ŷi

ei = Yi − β̂0 − β̂1Xi

STEP 2: Square both sides of the equation and take summations:

ei² = (Yi − β̂0 − β̂1Xi)²

Σei² = Σ(Yi − β̂0 − β̂1Xi)²

The result, Σei², is called the SUM OF SQUARED RESIDUALS (RSS).

 
STEP3: Obtain the partial derivatives of the sum of squared residuals with respect to (  0 and  1
) as follows:

  ei
2

 2  Yi   0   1 X 1 .  1
 

0

 

 2  Yi   0   1 X 1 
 

 
  
 2 Yi  2  0  2  1 X 1 , , , , but,2  0
 
 2 Yi  2n  0  2  1  X 1..............................(1)

  ei
2

 2  Yi   0   1 X 1 .  X 1
 

1

 

 
 2 X 1 . (Yi   0   1 X 1 )
 
 2 Yi X i  2  0  X i  2  1  X i .................................(2)
2

STEP 4: The first-order necessary condition for a maximum/minimum requires that each partial derivative equals zero. Thus we set equations 1 and 2 equal to zero:

From Equation 1:

−2ΣYi + 2nβ̂0 + 2β̂1ΣXi = 0
2nβ̂0 + 2β̂1ΣXi = 2ΣYi
nβ̂0 + β̂1ΣXi = ΣYi ...............................................(3)

From Equation 2:

−2ΣYiXi + 2β̂0ΣXi + 2β̂1ΣXi² = 0
2β̂0ΣXi + 2β̂1ΣXi² = 2ΣYiXi
β̂0ΣXi + β̂1ΣXi² = ΣYiXi ..........................................(4)

The two resulting equations, (3) and (4), are the famous normal equations.

STEP 5: Check whether the normal equations minimize or maximize the residual sum of squares Σei². To do so, we use the second-order conditions on equations 1 and 2 as follows:

∂²Σei²/∂β̂0² = 2n > 0 (minimum)
∂²Σei²/∂β̂1² = 2ΣXi² > 0 (minimum)

Since the second-order derivatives are positive, it means that β̂0 and β̂1 will indeed minimize the residual sum of squares.

STEP 6: Express the two normal equations (3) and (4) in matrix form as follows:

nβ̂0 + β̂1ΣXi = ΣYi
β̂0ΣXi + β̂1ΣXi² = ΣYiXi

In matrix form:

[ n     ΣXi  ] [ β̂0 ]   [ ΣYi   ]
[ ΣXi   ΣXi² ] [ β̂1 ] = [ ΣYiXi ]


STEP 7: Solve for β̂1 using Cramer's Rule (replace the column of the coefficient matrix corresponding to β̂1 with the right-hand-side vector):

        | n    ΣYi   |
        | ΣXi  ΣYiXi |      nΣYiXi − ΣYi·ΣXi
β̂1 =   --------------  =  ----------------------
        | n    ΣXi   |      nΣXi² − (ΣXi)²
        | ΣXi  ΣXi²  |

Thus, in a simpler way, the OLS estimator β̂1 is given by the formula:

β̂1 = (nΣYiXi − ΣYi·ΣXi) / (nΣXi² − (ΣXi)²)

STEP 8: Obtain the intercept parameter β̂0 from the formula: β̂0 = Ȳ − β̂1X̄

where Ȳ = ΣYi/n and X̄ = ΣXi/n, i.e. the sample mean values.
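To make the closed-form formulas concrete, here is a minimal Python sketch (an illustration, not part of the original notes); the function name ols_simple and the variable names are illustrative, and the example data are the sales/profit figures used later in this topic.

```python
def ols_simple(x, y):
    """Compute OLS estimates for Y = b0 + b1*X + e using the closed-form formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    # Slope: b1 = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b0 = mean(Y) - b1*mean(X)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Example with the sales (X) and profit (Y) data used in these notes
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y = [2, 3, 5, 7, 8, 9, 11, 12, 14, 19]
print(ols_simple(x, y))   # approximately (-0.2667, 0.1685)
```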

Example: Estimating a Regression Model Using Ordinary least squares

Recall the example (on the sales and profit of ABC Company Limited) provided under Correlation Analysis. The summary values from that data are reproduced here for convenience:

ΣX = 550,  ΣY = 90,  ΣXY = 6,340,  ΣX² = 38,500,  ΣY² = 1,054
Using this information, we can now obtain the OLS estimators β̂1 and β̂0 as follows:

β̂1 = (nΣYiXi − ΣYi·ΣXi) / (nΣXi² − (ΣXi)²)
   = (10 × 6,340 − 550 × 90) / (10 × 38,500 − (550)²)
   = (63,400 − 49,500) / (385,000 − 302,500)
   = 13,900 / 82,500 = 0.1685

Equivalently, in deviation form, β̂1 = Σxy / Σx² = 1,390 / 8,250 = 0.1685

β̂1 = 0.1685

For β̂0 = Ȳ − β̂1X̄, where Ȳ = ΣY/n = 90/10 = 9 and X̄ = ΣX/n = 550/10 = 55:

β̂0 = 9 − 0.1685 × 55 = 9 − 9.2667 = −0.2667

Thus the OLS regression equation is: Ŷ = −0.2667 + 0.1685X

Interpretation of the results:

β̂0 = −0.2667: When sales (X) are zero, the expected or mean profit (Y) is Ksh −0.2667 (a loss).

β̂1 = 0.1685: An increase in sales by one unit will lead to an increase in profit by 0.1685 units, ceteris paribus.

From the OLS regression equation, we can also predict or forecast the value of Y for any given
value of X.
For example, given that X = 150, we can predict Y as follows:

Ŷ = −0.2667 + 0.1685X
Ŷ = −0.2667 + 0.1685(150)
Ŷ = 25.0083 ≈ 25
Apart from prediction or forecasting, we can equally calculate the elasticity of profit with respect
to sales using either point elasticity or arc elasticity as follows:

For point elasticity, the elasticity of profit with respect to sales is:

e(p,s) = (∂p/∂s) · (s/p)

Since Profit = −0.2667 + 0.1685(Sales), then at the mean values of profit and sales we obtain:

e(p,s) = (∂p/∂s) · (S̄/P̄)

Where: ∂p/∂s = 0.1685,  S̄ = ΣS/n = 550/10 = 55,  P̄ = ΣP/n = 90/10 = 9

Thus, e(p,s) = 0.1685 × (55/9) = 1.0297

Interpretation: A 1% increase in the value of sales will lead to a 1.0297% increase in profit, ceteris paribus. Thus, profit is relatively elastic with respect to sales.

For arc elasticity, the elasticity of profit with respect to sales is:

e(p,s) = (Δp/Δs) · (S1 + S2)/(P1 + P2)

For example, if S1 = 40 and S2 = 60, then:

At S1 = 40: profit = −0.2667 + 0.1685(40) = 6.4733
At S2 = 60: profit = −0.2667 + 0.1685(60) = 9.8433

Arc elasticity: e(p,s) = 0.1685 × (40 + 60)/(6.4733 + 9.8433) = 1.0327
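As a quick numerical check, the following Python sketch (illustrative only; the rounded coefficients are those estimated above) reproduces both elasticity calculations.

```python
b0, b1 = -0.2667, 0.1685          # estimated intercept and slope

# Point elasticity at the mean values of sales and profit
s_bar, p_bar = 55, 9
point_elasticity = b1 * s_bar / p_bar           # ~1.0297

# Arc elasticity between sales levels 40 and 60
s1, s2 = 40, 60
p1 = b0 + b1 * s1                               # ~6.4733
p2 = b0 + b1 * s2                               # ~9.8433
arc_elasticity = b1 * (s1 + s2) / (p1 + p2)     # ~1.0327

print(round(point_elasticity, 4), round(arc_elasticity, 4))
```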


Finally, we can also obtain Ŷ (the estimated value of profit), the residuals (ei) and the squared residuals (ei²), whose sum is the RSS, as follows:

Time   X     Y    Ŷ = −0.2667 + 0.1685X   ei = Y − Ŷ    ei²
1      10    2    1.4183                   0.5817       0.3384
2      20    3    3.1033                  −0.1033       0.0107
3      30    5    4.7883                   0.2117       0.0448
4      40    7    6.4733                   0.5267       0.2774
5      50    8    8.1583                  −0.1583       0.0251
6      60    9    9.8433                  −0.8433       0.7111
7      70    11   11.5283                 −0.5283       0.2791
8      80    12   13.2133                 −1.2133       1.4721
9      90    14   14.8983                 −0.8983       0.8069
10     100   19   16.5833                  2.4167       5.8404
                                    Σei = −0.008   Σei² = 9.806

Thus, the expected value or mean of the residuals ei is:

ē = Σei/n = −0.008/10 = −0.0008

Actually, the expected value or mean of the residual or error term should be zero; in this case it is not exactly zero due to rounding off. Thus E(ei) = 0.

The value 9.806 is the sum of squared residuals, i.e. Σei² = 9.806 (RSS).
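The table above can be reproduced programmatically. Below is a small illustrative Python sketch (the data and rounded coefficients are those used in the notes); with more decimal places in the coefficients, the residuals sum even closer to zero.

```python
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]    # sales
y = [2, 3, 5, 7, 8, 9, 11, 12, 14, 19]           # profit
b0, b1 = -0.2667, 0.1685

y_hat = [b0 + b1 * xi for xi in x]               # fitted values
resid = [yi - yhi for yi, yhi in zip(y, y_hat)]  # residuals e_i = Y_i - Yhat_i
rss = sum(e ** 2 for e in resid)                 # residual sum of squares

print(round(sum(resid), 4))   # ~ -0.008 (not exactly zero, only because of rounding)
print(round(rss, 3))          # ~ 9.806
```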

ASSUMPTIONS OF THE ORDINARY LEAST SQUARES

The OLS methodology is based on the following assumptions.

i. The expected value or mean of the error term is zero: E(ei) = Σei/n = 0.
   This concept was illustrated in the table provided above.

ii. The variance of the error term is constant, i.e.
   Var(ei) = E[ei − E(ei)]² = E(ei²) = σ² (sigma squared)
   When the variance of the error term is constant, this is the assumption of homoscedasticity; otherwise, if the variance is not constant, we have heteroscedasticity, which is a violation of this OLS assumption. Therefore, the error term should be homoscedastic. The problem of heteroscedasticity is common in cross-sectional data.

iii. The assumption of normality:
   The error term is assumed to follow a normal distribution with a mean of zero and a variance of σ²: ei ~ N(0, σ²).

iv. There is a linear relationship between the dependent variable and the independent variable(s): Y = α + βX + e.
   Thus, the relationship between X and Y is linear in the OLS parameters α and β.

v. Assumption of no multicollinearity:
   Multicollinearity is a situation in which the independent variables (X1, X2, X3, …) are correlated with one another. Perfect multicollinearity is a violation of the OLS assumptions: if it is present, we cannot obtain the values of the OLS parameters α̂, β̂1, β̂2, etc. Thus, there should be no (perfect) multicollinearity: corr(X1, X2) = 0.

vi. Assumption of zero correlation between the independent variable and the error term:
   The error term and the independent variable should not be correlated: Cov(Xi, ei) = 0.

vii. The assumption of zero autocorrelation:
   The error term in period (i) and the error term in period (j) should not be correlated. Thus, there should be no autocorrelation, otherwise known as serial correlation: Cov(ei, ej) = E(ei·ej) = 0 for all i ≠ j.
   The problem of autocorrelation is therefore a violation of the OLS assumptions, and is common in time series data.

viii. No outliers in the data

An outlier is a value that is very large or very small, in relation to the rest of the other
observations

THE VARIANCE OF THE ERROR TERM

The variance of the error term, σ̂² = Var(ui), is given by:

σ̂² = Var(ui) = Σui² / (n − 2)

N/B: n − 2 is called the degrees of freedom (df); we subtract 2 because the regression model we estimated has 2 OLS estimators, α̂ and β̂.

Σui² is the sum of squared residuals (RSS), which we found earlier: Σei² = 9.806.

Thus σ̂² = Var(ui) = Σui²/(n − 2) = 9.806/(10 − 2) = 9.806/8 = 1.22575

THE STANDARD ERROR OF THE REGRESSION MODEL

The standard error of the regression model (se) is obtained by taking the square root of the variance of the error term, i.e.

se = √Var(ui) = √(Σui²/(n − 2))

Hence se = √1.22575 = 1.10714

N/B: The standard error of the regression model is the standard deviation of the Y values about the estimated regression line.
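A small illustrative Python check of these two quantities, assuming the RSS of 9.806 computed in the table above:

```python
import math

rss = 9.806          # residual sum of squares from the table above
n, k = 10, 2         # observations and estimated parameters

sigma2_hat = rss / (n - k)               # variance of the error term, ~1.22575
se_regression = math.sqrt(sigma2_hat)    # standard error of the regression, ~1.10714
print(round(sigma2_hat, 5), round(se_regression, 5))
```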

THE STANDARD ERROR OF THE OLS COEFFICIENT

a) The standard error of the slope coefficient

The standard error of the slope coefficient β̂, denoted se(β̂), is given by:

se(β̂) = σ̂ / √Σx²

From our example, σ̂ = 1.10714 and Σx² = 8,250. Thus,

se(β̂) = 1.10714 / √8,250 = 1.10714 / 90.8295 = 0.01219

se(β̂) = 0.01219

b) The standard error of the intercept parameter

The standard error of the intercept parameter α̂, denoted se(α̂), is given by:

se(α̂) = √(ΣXi² / (n·Σx²)) × σ̂

From our example, σ̂ = 1.10714, ΣXi² = 38,500, n = 10 and Σx² = 8,250. Thus,

se(α̂) = √(38,500 / (10 × 8,250)) × 1.10714
       = 0.68313 × 1.10714
       = 0.75632
c) The t-value ratio

The t-value ratio is an important test statistic that we use to determine whether a particular variable or parameter is significant. This process is referred to as hypothesis testing. The t ratio is given by the formula:

t = OLS estimator / standard error of the OLS estimator

Therefore, the t-value for the slope coefficient is:

t(β̂) = β̂ / se(β̂) = 0.1685 / 0.01219 = 13.8228

This is the calculated t-statistic for β̂. The t-value for the intercept parameter is obtained in a similar way:

t(α̂) = α̂ / se(α̂) = −0.2667 / 0.75632 = −0.3526
THE COMPLETE REGRESSION MODEL

By a complete regression model, we mean one that shows, at a glance:

- the OLS estimators,
- the standard errors of the OLS estimators,
- the t-values for the OLS estimators, and
- the goodness of fit or coefficient of determination.

Hence we can now present the complete regression model for ABC Company, where we regressed profit (Y) on sales (X), as follows:

Profit = −0.2667 + 0.1685(Sales)
se        (0.75632)   (0.01219)
t-values  −0.3526      13.8228        R² = 0.9598

More formally, these results can be presented in a table of regression results as follows:

Profit      Coefficient   Std Error   t-value
Constant    −0.2667       0.75632     −0.3526
Sales        0.1685       0.01219      13.8228
R² = 0.9598
THE ADJUSTED R²

Although the coefficient of determination or goodness of fit was found to be r² = 0.9598, this statistic has some problems, i.e.:

i. We cannot compare r² values computed from models which have different dependent variables. Thus, any re-arrangement of the model will yield a different value of r².

ii. The value of r² tends to increase as the number of independent variables in the model increases. With this, r² loses its usefulness, since we cannot tell whether it is measuring the goodness of fit or simply the number of independent variables.

iii. r² also cannot discriminate among models, i.e. it cannot tell us which particular model to choose among 2 or more models.

Due to the above limitations of r², an alternative measure of goodness of fit, known as the adjusted r² (commonly written R̄²), has been developed to help overcome these limitations of the simple r².

The adjusted r² is modified or adjusted so as to accommodate the changes in degrees of freedom that result from the addition or removal of independent variables in a regression model.

The formula for the adjusted R² is:

R̄² = 1 − [(n − 1)/(n − k)](1 − R²)

Thus, from our example, n = 10, k = 2 and R² = 0.9598:

R̄² = 1 − [(10 − 1)/(10 − 2)](1 − 0.9598)
   = 1 − (9/8)(0.0402)
   = 1 − 0.045225
   = 0.9548
Interpretation:
Holding all other factors constant, sales (X) explains or accounts for 95.48% of the variation in profit (Y), when adjusted for degrees of freedom. Note that R̄² < R² always.
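An illustrative one-line check of the adjustment in Python:

```python
n, k, r2 = 10, 2, 0.9598
r2_adj = 1 - (n - 1) / (n - k) * (1 - r2)   # adjusted R-squared
print(round(r2_adj, 4))                     # ~0.9548
```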

PROPERTIES OF THE ORDINARY LEAST SQUARES ESTIMATORS

An OLS estimator, such as β̂, is said to be BLUE, i.e. the best linear unbiased estimator, if it has the following properties:

i. LINEAR

The dependent variable Y should be linear in the parameters (α and β), as shown below:

Y = α + βX + ei

Notice that Y is linear in α and β. If indeed Y is linear in α and β, then the OLS estimators can themselves be written as linear functions (weighted sums) of the Yi:

β̂ = ΣwiYi = w1Y1 + w2Y2 + w3Y3 + ...
α̂ = ΣhiYi = h1Y1 + h2Y2 + h3Y3 + ...

where wi and hi are simply weights.

ii. UNBIASEDNESS

The average or expected value of β̂, denoted E(β̂), is equal to its true value β. Thus E(β̂) = β, or equivalently E(β̂) − β = 0. In such a case, we say β̂ is an unbiased estimator of β. Similarly E(α̂) = α, i.e. α̂ is an unbiased estimator of α. To demonstrate that β̂1 is an unbiased estimator of β1, we proceed as follows:

o Start from the population relationship: Yi = β0 + β1Xi + ei
o Multiplying throughout this equation by the weights ai yields: aiYi = aiβ0 + aiβ1Xi + aiei
o Taking summations on both sides and simplifying, we get:

ΣaiYi = β0Σai + β1ΣaiXi + Σaiei

Assumptions (properties of the weights):

β̂1 = ΣaiYi,   Σai = 0,   and   ΣaiXi = 1

Thus, taking expectations and using E(ei) = 0:

E(β̂1) = β0(0) + β1(1) + 0 = β1

Thus E(β̂1) = β1, so β̂1 is an unbiased estimator of β1.

Next, we want to demonstrate that E(α̂) = α.

We start from the formula α̂ = Ȳ − β̂X̄.

However, recall that Ȳ = ΣYi/n and β̂ = ΣwiYi.

If we substitute these into the formula α̂ = Ȳ − β̂X̄, we get:

α̂ = ΣYi/n − X̄ΣwiYi = Σ(1/n − wiX̄)Yi

Let hi = 1/n − wiX̄, a constant. Therefore α̂ = ΣhiYi, i.e. α̂ is also a linear function of the Yi.

Since α̂ = Σ(1/n − wiX̄)Yi and Yi = α + βXi + ei, we substitute Yi into α̂ as follows:

α̂ = Σ(1/n − wiX̄)(α + βXi + ei)

Expanding this expression:

α̂ = α + βΣXi/n + Σei/n − αX̄Σwi − βX̄ΣwiXi − X̄Σwiei
   = α + βX̄ − αX̄Σwi − βX̄ΣwiXi + Σ(1/n − wiX̄)ei

Using the properties of the weights, Σwi = 0 and ΣwiXi = 1, the terms in β cancel (βX̄ − βX̄ = 0), so:

α̂ = α + Σ(1/n − wiX̄)ei

Taking expectations and using E(ei) = 0 gives E(α̂) = α. Thus α̂ is an unbiased estimator of α.

In general, therefore, the OLS estimators are unbiased estimators of their true or population values.
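Unbiasedness can also be illustrated numerically. The following sketch (an illustration, not part of the notes) simulates many samples from a known model and shows that the average of the OLS slope estimates is close to the true slope; the true values 2.0 and 0.5 are arbitrary choices.

```python
import random

def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

random.seed(1)
alpha, beta = 2.0, 0.5          # true (population) parameters, chosen arbitrarily
x = list(range(1, 21))          # X fixed in repeated sampling
estimates = []
for _ in range(5000):           # repeated samples
    y = [alpha + beta * xi + random.gauss(0, 1) for xi in x]
    estimates.append(ols_slope(x, y))

# Average of the slope estimates is close to 0.5, illustrating E(beta_hat) = beta
print(round(sum(estimates) / len(estimates), 3))
```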

iii. EFFICIENT ESTIMATOR

- By an efficient estimator, we mean that the OLS estimators have minimum variance in the class of linear unbiased estimators.
- To demonstrate this, we work out two variances:
  i. the variance of the OLS estimator β̂; and
  ii. the variance of another linear unbiased estimator β* obtained by some other econometric method.

First, we obtain the variance of the OLS estimator β̂:

Var(β̂) = [se(β̂)]² = (σ/√Σx²)² = σ²/Σx²

Next, we obtain the variance of the alternative estimator β*. Define β* as a linear estimator whose weights differ from the OLS weights wi by amounts ci, i.e. β* = Σ(wi + ci)Yi. For β* to remain unbiased, the ci must satisfy Σci = 0 and ΣciXi = 0. Its variance is then:

Var(β*) = σ²Σ(wi + ci)² = σ²Σwi² + σ²Σci² + 2σ²Σwici

Recalling that Σwi² = 1/Σxi² and that Σwici = 0 under the unbiasedness conditions, we get:

Var(β*) = σ²/Σxi² + σ²Σci² = Var(β̂) + σ²Σci²

From the above equation we note that Var(β̂) ≤ Var(β*), with equality only when every ci = 0, i.e. when β* coincides with β̂.

Thus the OLS estimator β̂ has minimum variance when compared with the variance of any other linear unbiased estimator β*. Hence β̂ is an efficient estimator.
In summary, the Gauss–Markov theorem states: "Given the assumptions of the classical linear regression model, the ordinary least squares (OLS) estimators, in the class of linear unbiased estimators, have minimum variance; i.e. they are BLUE."

GOODNESS OF FIT

By goodness of fit, we mean: "How well does the sample regression line fit the data?" The goodness of fit, otherwise known as the coefficient of determination, is denoted by r².

The value of r² ranges from 0 to 1, i.e. from no goodness of fit to a perfect goodness of fit. Therefore 0 ≤ r² ≤ 1.

The following steps illustrate the derivation of r².

STEP 1: Begin with an OLS regression model: Yi = α̂ + β̂Xi + ei.
Recall that the sample regression line is Ŷi = α̂ + β̂Xi. Thus: Yi = Ŷi + ei.

STEP 2: Subtract the mean value of Y from both sides:

Yi − Ȳ = Ŷi − Ȳ + ei

STEP 3: Square both sides and take summations (the cross-product term 2Σ(Ŷi − Ȳ)ei drops out because of the normal equations):

Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σei²

From the above equation, we obtain 3 important sums of squares:

- Σ(Yi − Ȳ)² = Σy² = Total Sum of Squares (TSS)
- Σ(Ŷi − Ȳ)² = Σŷ² = Explained Sum of Squares (ESS)
- Σei² = Residual Sum of Squares (RSS)
Therefore: TSS=ESS+RSS

STEP 4: Divide both sides of the equation by TSS:

TSS/TSS = ESS/TSS + RSS/TSS   ⇒   1 = ESS/TSS + RSS/TSS,   or   ESS/TSS = 1 − RSS/TSS

STEP 5: The goodness of fit.

The ratio ESS/TSS is called the goodness of fit (r²). Therefore:

r² = ESS/TSS = Σŷ²/Σy²,   or equivalently   r² = 1 − RSS/TSS = 1 − Σei²/Σy²

Other equivalent formulas for r² are:

r² = (Σxy)² / (Σx²·Σy²),   or   r² = β̂²·(Σx²/Σy²)

Recall:

r² = 1 − Σei²/Σy² = 1 − 9.806/244 = 0.9598

Or, r² = (Σxy)²/(Σx²·Σy²) = (1,390)²/(8,250 × 244) = 0.9598, or 95.98%

Or, r² = β̂²(Σx²/Σy²) = (0.1685)² × (8,250/244) = 0.9598, or 95.98%

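An illustrative Python check of the three equivalent formulas, using the sums from the example:

```python
sum_xy_dev = 1390      # sum of x*y in deviation form
sum_x2_dev = 8250      # sum of x^2 in deviation form
sum_y2_dev = 244       # sum of y^2 in deviation form (TSS)
rss = 9.806
b1 = 0.1685

r2_a = 1 - rss / sum_y2_dev                          # 1 - RSS/TSS
r2_b = sum_xy_dev ** 2 / (sum_x2_dev * sum_y2_dev)   # (Sxy)^2 / (Sxx * Syy)
r2_c = b1 ** 2 * sum_x2_dev / sum_y2_dev             # b1^2 * Sxx / Syy
print(round(r2_a, 4), round(r2_b, 4), round(r2_c, 4))  # all ~0.96
```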
CONFIDENCE INTERVAL ESTIMATION

Confidence interval estimation aims at constructing an interval around the OLS estimators. The confidence interval for the slope coefficient β is given as follows:

P[ β̂ − t(α/2)·se(β̂) ≤ β ≤ β̂ + t(α/2)·se(β̂) ] = 1 − α

Where:

- β̂ is the estimated OLS estimator of β;
- t(α/2) is the critical t value for a two-tailed test at n − k degrees of freedom;
- se(β̂) is the standard error of the slope coefficient β̂;
- α is the level of significance, e.g. 1%, 5% or 10%;
- 1 − α is the confidence level, e.g. 99%, 95% or 90%.

[Figure: the interval from β̂ − t(α/2)·se(β̂) to β̂ + t(α/2)·se(β̂) forms the acceptance region; the two shaded tails, each with area α/2, form the rejection regions.]

The following table shows the appropriate critical t values at various levels of significance for one-tailed and two-tailed tests:

Level of significance   One-tail (α)   Two-tail (α/2)   t-critical, one-tail   t-critical, two-tail   1 − α
α = 1%                  0.01           0.005            2.326                  2.576                  99%
α = 5%                  0.05           0.025            1.645                  1.960                  95%
α = 10%                 0.10           0.05             1.282                  1.645                  90%

For example, a 95% confidence interval for β using a two-tailed test is obtained as follows:

β̂ = 0.1685; n = 10; k = 2; n − k = 10 − 2 = 8 degrees of freedom; se(β̂) = 0.01219; 1 − α = 0.95, hence α = 5% = 0.05.

Thus:

P[ β̂ − t(α/2, 8df)·se(β̂) ≤ β ≤ β̂ + t(α/2, 8df)·se(β̂) ] = 1 − α
P[ β̂ − t(0.025, 8df)·se(β̂) ≤ β ≤ β̂ + t(0.025, 8df)·se(β̂) ] = 95%

0.1685 − 2.306 × 0.01219 ≤ β ≤ 0.1685 + 2.306 × 0.01219
0.1685 − 0.02811 ≤ β ≤ 0.1685 + 0.02811
0.1404 ≤ β ≤ 0.1966

Hence, the 95% confidence interval for β is 0.1404 ≤ β ≤ 0.1966.
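A brief illustrative sketch of the same interval, assuming SciPy is available to supply the critical value (any t-table gives the same 2.306):

```python
from scipy import stats

b1, se_b1 = 0.1685, 0.01219
df = 8                                    # n - k = 10 - 2
t_crit = stats.t.ppf(0.975, df)           # two-tailed 5% critical value, ~2.306

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(round(lower, 4), round(upper, 4))   # ~0.1404, ~0.1966
```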

HYPOTHESIS TESTING

A hypothesis is a guess or a hunch about something.

By hypothesis testing, we mean: "Can our regression results be trusted?" or, "Do our regression estimates matter?"

There are 2 types of hypotheses:

- The null hypothesis
- The alternative hypothesis
The null hypothesis is the hypothesis of interest. It is usually denoted by Ho. For example, to test whether the slope coefficient is significant, we state: Ho: β = 0.

The alternative hypothesis is the hypothesis that is tested against the hypothesis of interest, i.e. the null hypothesis. The alternative hypothesis is denoted by H1 or HA. For example, to test whether the slope coefficient is significant, we state the alternative hypothesis as follows:

- H1: β ≠ 0, for the case of a two-tailed test
- H1: β > 0 or H1: β < 0, for the case of a one-tailed test.
Points to note

The hypothesis Ho: β = 0 means:
- the slope coefficient is equal to zero, or
- the slope coefficient is not statistically significant, or
- X does not influence Y.

The hypothesis H1: β ≠ 0 means:
- the slope coefficient is different from zero,
- the slope coefficient is statistically significant,
- X does influence Y.
In hypothesis testing, there are 2 possible types of errors that can be committed, i.e. Type I error
and type II error.
Type I error occurs when we reject the null hypothesis when, in actual fact, it should not have been rejected, i.e. "convicting an innocent man".

Type II error occurs when we do not reject (accept) the null hypothesis when, in actual fact, it should have been rejected, i.e. "letting a guilty man go scot-free".

The aim of hypothesis testing is to reduce the chances of committing both Type I and Type II errors. This is the reason why, in hypothesis testing, we specify the level of significance (α = 1%, 5% or 10%).
There are 3 common approaches used in hypothesis testing:

1. The confidence interval approach
2. The test of significance approach
3. The probability-value (P-value) approach

1. HYPOTHESIS TESTING USING CONFIDENCE INTERVAL APPROACH

The decision rule for hypothesis testing using the confidence interval approach states as follows:
“If the OLS parameter of interest under the Null hypothesis falls within the constructed confidence
interval, we do not reject the Null hypothesis. However, if it falls outside the confidence interval,
then we reject the Null hypothesis.”
This decision rule is demonstrated as under:

[Figure: the two shaded tails are the rejection regions; the interval between them is the acceptance region.]

Earlier on, we constructed a 95% confidence interval for β and obtained: 0.1404 ≤ β ≤ 0.1966 (95%).

From this confidence interval, we can test the following hypotheses:

i)  Ho: β = 0     vs   HA: β ≠ 0
ii) Ho: β = 0.16  vs   HA: β ≠ 0.16

For the first set of hypotheses, we notice that the value β = 0 does not lie within the confidence interval, i.e. it lies in the REJECTION REGION. Thus, we reject the null hypothesis (or accept the alternative hypothesis).

In conclusion, β is not equal to zero; or we could say β is statistically different from zero.

For the second set of hypotheses, we notice that the value β = 0.16 lies within the confidence interval, i.e. it lies in the acceptance region. Thus, we accept (do not reject) the null hypothesis.

In conclusion, β is statistically equal to 0.16, or β is not statistically different from 0.16.

2. HYPOTHESIS TESTING USING TEST OF SIGNIFICANCE APPROACH

The test of significance (t-test) approach is the most commonly used approach to hypothesis testing in econometrics. In this approach, which is similar in spirit to the confidence interval approach, the null and alternative hypotheses are stated respectively as:

Ho: β = β*
HA: β ≠ β*

where β is the true (unknown) value of the OLS coefficient and β* is a hypothesized or guessed value of β.

The general formula for the t-test is:

t-calculated = (β̂ − β*) / se(β̂)

where se(β̂) is the standard error of the OLS parameter β̂.

If β̂ > β*, t-calculated will be positive; if β̂ < β*, t-calculated will be negative. Irrespective of its sign, we always take the absolute value of t-calculated.

Having obtained t-calculated, we then obtain the critical value of the t-statistic, i.e. t-critical, from the t-tables. The critical t is obtained as follows:

t-critical = t(α/2, n − k df), for a two-tailed test
t-critical = t(α, n − k df), for a one-tailed test

The decision Rule for hypothesis testing using the test of significance approach states as follows:
“If t-calculated is greater than t-critical, reject the Null hypothesis, but if t-calculated is less than
t-critical, do not reject (accept) the null hypothesis.”

For example, we can now test the following hypotheses using the t-test approach, assuming a level of significance α = 5%:

i)  Ho: β = 0     vs   HA: β ≠ 0
ii) Ho: β = 0.16  vs   HA: β ≠ 0.16

For the first set of hypotheses, we obtain t-calculated as follows:

t-calculated = (β̂ − β*) / se(β̂)

where β̂ = 0.1685, β* = 0 and se(β̂) = 0.01219. Thus,

t-calculated = (0.1685 − 0) / 0.01219 = 13.8228

Then t-critical = t(α/2, n − k df), where α = 5%, α/2 = 2.5% = 0.025, n = 10, k = 2 and n − k = 8 df. Thus, t-critical = t(0.025, 8df) = 2.306.

Upon comparing t-calculated and t-critical, we notice that t-calculated > t-critical.

Thus, according to our decision rule, we reject the null hypothesis and do not reject (accept) the alternative hypothesis.

In conclusion, we can therefore say that β is not equal to zero; or we could say β is statistically different from zero.
For the second set of hypotheses, we obtain t-calculated as follows:

t-calculated = (β̂ − β*) / se(β̂) = (0.1685 − 0.16) / 0.01219 = 0.6973

The value of t-critical remains the same, t-critical = 2.306. Upon comparing t-calculated and t-critical, we notice that t-calculated < t-critical. Thus, following the decision rule, we do not reject (accept) the null hypothesis. In conclusion, we can therefore say that β is statistically equal to 0.16.
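A brief illustrative sketch of both tests; SciPy is assumed only to fetch the critical value.

```python
from scipy import stats

b1, se_b1, df = 0.1685, 0.01219, 8
t_crit = stats.t.ppf(0.975, df)        # two-tailed 5% critical value, ~2.306

for b_star in (0.0, 0.16):             # hypothesized values under Ho
    t_calc = abs((b1 - b_star) / se_b1)
    decision = "reject Ho" if t_calc > t_crit else "do not reject Ho"
    print(b_star, round(t_calc, 4), decision)
# beta* = 0    -> t ~ 13.82, reject Ho
# beta* = 0.16 -> t ~ 0.697, do not reject Ho
```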

NOTE: The conclusions from the confidence interval approach actually resemble the conclusions
from the test of significance approach and this must always be so. Indeed, the confidence interval
approach is simply a mirror image of the test of significance approach.

3. HYPOTHESIS TESTING USING THE PROBABILITY (P) VALUE APPROACH

The probability (P) value approach is also an ideal way of testing hypotheses. The P-value is the smallest level of significance (α) at which the null hypothesis can be rejected.

The beauty of the P-value approach is that most computer software packages (Excel, SPSS, Stata, EViews, SHAZAM, RATS, etc.) automatically report the P-value whenever you run a regression.

For example, if the software reports a P-value of 0.07, it means the null hypothesis can be rejected at any level of significance of 7% or above. Thus, we can reject the null hypothesis at α = 10%, but we cannot reject it at α = 5% or α = 1%.

The table below summarizes some P-values and significance levels:

P-value      Details                            α = 1%   α = 5%   α = 10%
P = 0.0000   β̂ is significant at all levels     Yes      Yes      Yes
P = 0.035    β̂ is significant at 3.5%           No       Yes      Yes
P = 0.074    β̂ is significant at 7.4%           No       No       Yes
P = 0.1025   β̂ is significant at 10.25%         No       No       No

In summary, the smaller the P-value, the more significant is β̂.
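As an illustration, the two-tailed P-value for the slope's t-statistic can be computed from the t distribution (SciPy assumed):

```python
from scipy import stats

t_calc, df = 13.8228, 8
p_value = 2 * (1 - stats.t.cdf(abs(t_calc), df))   # two-tailed P-value
print(p_value)   # far below 0.01, so the slope is significant at all conventional levels
```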

REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE


Analysis of variance (ANOVA) is a study of the Total Sum of Squares (TSS) and its components, i.e., the Explained Sum of Squares (ESS) and the Residual Sum of Squares (RSS). The concept here is that ESS + RSS = TSS:

Σŷ² + Σu² = Σy²

By dividing the sums of squares (SS) by their associated degrees of freedom (df), we get the mean sums of squares (MSS). The ANOVA table therefore shows the source of variation, the sum of squares (SS), the degrees of freedom (df) and the mean sum of squares (MSS).

Source of variation        Sum of squares (SS)     df      Mean sum of squares (MSS)
Due to regression (ESS)    Σŷ² = β̂²Σx²             k − 1   ESS/(k − 1) = MSSreg
Due to residuals (RSS)     Σu²                     n − k   RSS/(n − k) = MSSres
Total (TSS)                Σy²                     n − 1

From the ANOVA table, the F statistic is computed as follows:

F = MSSreg / MSSres = [ESS/(k − 1)] / [RSS/(n − k)]
  = (mean sum of squares due to regression) / (mean sum of squares due to residuals)

The F statistic follows the F distribution with (k-1) degrees of freedom on the numerator and (n-
k) degrees of freedom on the denominator. The F statistic is used to test for overall significance of
the model.
If F-calculated >F-critical, the model is statistically significant
If F-calculated <F-critical, the model is not statistically significant.
Example
Recall the example of the sales (X) and profit (Y) of ABC Company limited for a period of 10
years. The following values were obtained:

 x 2  8,250 ,  y 2  244,   0.1685, e  9.806, and n=10, k=2


2
i

Thus, Total Sum of Squares (TSS) =  y 2  244



Explained Sum of Squares (ESS) = 
2
x 2
 (0.1685) 2  8,250  234.36

Sum of Squared residuals (RSS) =  u 2 =9.806

35
Notice that 234.236+9.806 = (244)

Hence, the ANOVA table is as follows:

Source of variation        Sum of squares (SS)   df            Mean sum of squares (MSS)
Due to regression (ESS)    234.236               2 − 1 = 1     234.236 / 1 = 234.236
Due to residuals (RSS)     9.806                 10 − 2 = 8    9.806 / 8 = 1.2258
Total (TSS)                244.000               10 − 1 = 9

F = 234.236 / 1.2258 = 191.09

The critical value of F at 5% is given as follows:

Critical F = F(k − 1, n − k) = F(1, 8) at 5% = 5.32

N.B: The F ratio is always a one-tailed test. We notice that calculated-F is greater than critical-F,
i.e. (Fcal>Fcrit)

Conclusion: The overall model is statistically significant.
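An illustrative numerical check of the ANOVA decomposition and F test (SciPy assumed only for the critical value; with the rounded coefficients the F statistic comes out close to, not exactly, 191.09):

```python
from scipy import stats

tss, rss = 244.0, 9.806
ess = tss - rss                      # explained sum of squares, ~234.19
n, k = 10, 2

mss_reg = ess / (k - 1)
mss_res = rss / (n - k)
f_calc = mss_reg / mss_res           # ~191
f_crit = stats.f.ppf(0.95, k - 1, n - k)   # ~5.32

print(round(f_calc, 2), round(f_crit, 2), f_calc > f_crit)   # model is significant overall
```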

Summary

If a model contains only one explanatory variable, it is called a simple regression model; when there is more than one explanatory (independent) variable, it is called a multiple regression model. When there is only one study (dependent) variable, the regression is termed univariate regression; when there is more than one study variable, it is termed multivariate regression. Note that simple and multiple regression are not the same as univariate and multivariate regression: simple and multiple regression are distinguished by the number of explanatory variables, whereas univariate and multivariate regression are distinguished by the number of study variables.

Regression uses the historical relationship between an independent and a dependent variable to
predict the future values of the dependent variable. Businesses use regression to predict such things
as future sales, stock prices, currency exchange rates and productivity gains resulting from a
training program, etc.

