
Multiple Regression Analysis

Võ Đức Hoàng Vũ
University of Economics HCMC

June 2015


Estimation

The model is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
$\beta_0$ is still the intercept
$\beta_1$ to $\beta_k$ are all called slope parameters
$u$ is still the error term (or disturbance)
We still need to make a zero conditional mean assumption, so now assume that $E(u|x_1, x_2, \dots, x_k) = 0$
We are still minimizing the sum of squared residuals, so we have $k + 1$ first order conditions
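
As a concrete illustration, here is a minimal numpy sketch with simulated data (the variable names and data-generating values are illustrative choices, not from the slides); solving the $k + 1$ first order conditions is equivalent to ordinary least squares:

```python
# A minimal sketch of OLS estimation on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 2
X = rng.normal(size=(n, k))                  # two regressors x1, x2
u = rng.normal(size=n)                       # error term, Var(u) = 1
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 1] + u  # beta0=1, beta1=0.5, beta2=-2

Xmat = np.column_stack([np.ones(n), X])      # add intercept column
# Solving the k + 1 first order conditions = least squares:
beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
print(beta_hat)  # approximately [1.0, 0.5, -2.0]
```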


Interpreting Multiple Regression

$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_k x_k$, so
$\Delta\hat{y} = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_k \Delta x_k$,
so holding $x_2, \dots, x_k$ fixed implies that
$\Delta\hat{y} = \hat\beta_1 \Delta x_1$, that is,
each $\hat\beta_j$ has a ceteris paribus interpretation.


"Partialling Out" Interpretation


Consider the case where $k = 2$, i.e.
$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$, then
$\hat\beta_1 = \left(\sum \hat{r}_{i1} y_i\right) / \sum \hat{r}_{i1}^2$, where $\hat{r}_{i1}$ are the residuals from the estimated regression $\hat{x}_1 = \hat\gamma_0 + \hat\gamma_2 x_2$
The previous equation implies that regressing $y$ on $x_1$ and $x_2$ gives the same effect of $x_1$ as regressing $y$ on the residuals from a regression of $x_1$ on $x_2$
This means only the part of $x_{i1}$ that is uncorrelated with $x_{i2}$ is being related to $y_i$, so we are estimating the effect of $x_1$ on $y$ after $x_2$ has been "partialled out"
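
A short sketch of this two-step construction (the Frisch-Waugh-Lovell result), continuing from the simulated y, X, n, and beta_hat of the estimation snippet above:

```python
# Partialling out: regress x1 on x2, keep residuals, regress y on them.
x1, x2 = X[:, 0], X[:, 1]

Z = np.column_stack([np.ones(n), x2])        # constant and x2
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ g                              # residuals r_i1

# The simple slope of y on r1 reproduces beta1_hat from the full regression.
beta1_partial = (r1 @ y) / (r1 @ r1)
print(beta1_partial, beta_hat[1])            # the two agree
```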


Simple vs Multiple Regression Estimate

Compare the simple regression $\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$ with the multiple regression $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$
Generally, $\tilde\beta_1 \neq \hat\beta_1$ unless: $\hat\beta_2 = 0$ (i.e. no partial effect of $x_2$) OR $x_1$ and $x_2$ are uncorrelated in the sample
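
A quick numerical check, again continuing from the simulated data, where $x_1$ and $x_2$ were drawn independently, so the two estimates should nearly coincide:

```python
# Simple regression of y on x1 alone versus the multiple-regression slope.
S = np.column_stack([np.ones(n), x1])
b_simple, *_ = np.linalg.lstsq(S, y, rcond=None)
print(b_simple[1], beta_hat[1])  # close, since x1 and x2 are nearly uncorrelated here
```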


Goodness-of-fit
We can think of each observation as being made up of an explained part and an unexplained part, $y_i = \hat{y}_i + \hat{u}_i$. We then define the following:
$\sum (y_i - \bar{y})^2$ is the total sum of squares (SST)
$\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)
$\sum \hat{u}_i^2$ is the residual sum of squares (SSR)
Then SST = SSE + SSR
How do we think about how well our sample regression line fits our sample data?
We can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression:
$R^2 = SSE/SST = 1 - SSR/SST$
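
These sums are easy to compute directly; a sketch, continuing from the estimation snippet:

```python
# Goodness-of-fit decomposition on the simulated data.
y_hat = Xmat @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
SSR = np.sum(u_hat ** 2)               # residual sum of squares

print(np.isclose(SST, SSE + SSR))      # SST = SSE + SSR
print(1 - SSR / SST)                   # the R-squared of the regression
```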

Goodness-of-fit (cont)
We can also think of $R^2$ as being equal to the squared correlation coefficient between the actual $y_i$ and the fitted values $\hat{y}_i$:
$$R^2 = \frac{\left(\sum (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right)^2}{\left(\sum (y_i - \bar{y})^2\right)\left(\sum (\hat{y}_i - \bar{\hat{y}})^2\right)}$$
$R^2$ can never decrease when another independent variable is added to a regression, and usually will increase
Because $R^2$ will usually increase with the number of independent variables, it is not a good way to compare models


Assumptions for Unbiasedness


Population model is linear in parameters: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
We can use a random sample of size $n$, $\{(x_{i1}, x_{i2}, \dots, x_{ik}, y_i) : i = 1, 2, \dots, n\}$, from the population model, so that the sample model is $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$
$E(u|x_1, x_2, \dots, x_k) = 0$, implying that all of the explanatory variables are exogenous
None of the x's is constant, and there are no exact linear relationships among them


Too Many or Too Few Variables

What happens if we include variables in our specification that don't belong?
There is no effect on our parameter estimates, and OLS remains unbiased
What if we exclude a variable from our specification that does belong?
OLS will usually be biased


Omitted Variable Bias

Suppose the true model is given as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, but we estimate $\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$, then
$$\tilde\beta_1 = \frac{\sum (x_{i1} - \bar{x}_1) y_i}{\sum (x_{i1} - \bar{x}_1)^2}$$
Recall the true model, so that $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i$; the numerator then becomes
$$\sum (x_{i1} - \bar{x}_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i) = \beta_1 \sum (x_{i1} - \bar{x}_1)^2 + \beta_2 \sum (x_{i1} - \bar{x}_1) x_{i2} + \sum (x_{i1} - \bar{x}_1) u_i$$


Omitted Variable Bias (cont)


$$\tilde\beta_1 = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2} + \frac{\sum (x_{i1} - \bar{x}_1) u_i}{\sum (x_{i1} - \bar{x}_1)^2}$$
Since $E(u_i) = 0$, taking expectations we have
$$E(\tilde\beta_1) = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2}$$
Consider the regression of $x_2$ on $x_1$:
$\tilde{x}_2 = \tilde\delta_0 + \tilde\delta_1 x_1$, then $\tilde\delta_1 = \frac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2}$
so $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$
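
A standalone simulation sketch of this result (the sample size, coefficients, and degree of correlation between the regressors are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 100_000
z1 = rng.normal(size=m)
z2 = 0.8 * z1 + rng.normal(size=m)                    # Corr(z1, z2) > 0
yz = 1.0 + 0.5 * z1 + 2.0 * z2 + rng.normal(size=m)   # beta1 = 0.5, beta2 = 2

# Short (misspecified) regression of y on z1 alone:
b1_tilde = ((z1 - z1.mean()) @ yz) / np.sum((z1 - z1.mean()) ** 2)
# delta1 from regressing the omitted z2 on z1:
d1 = ((z1 - z1.mean()) @ z2) / np.sum((z1 - z1.mean()) ** 2)

print(b1_tilde)        # about 2.1, far from the true 0.5
print(0.5 + 2.0 * d1)  # matches beta1 + beta2 * delta1
```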


Summary of Direction of Bias

                Corr(x1, x2) > 0    Corr(x1, x2) < 0
$\beta_2 > 0$   Positive bias       Negative bias
$\beta_2 < 0$   Negative bias       Positive bias


Omitted Variable Bias Summary


Two cases where the bias is equal to zero:
$\beta_2 = 0$, that is, $x_2$ doesn't really belong in the model
$x_1$ and $x_2$ are uncorrelated in the sample
If the correlation between $x_2$ and $x_1$ and the correlation between $x_2$ and $y$ are in the same direction, the bias will be positive
If the correlation between $x_2$ and $x_1$ and the correlation between $x_2$ and $y$ are in opposite directions, the bias will be negative
Technically, we can only sign the bias for the more general case if all of the included x's are uncorrelated
Typically, then, we work through the bias assuming the x's are uncorrelated, as a useful guide even if this assumption is not strictly true

Variance of the OLS Estimators


Now we know that the sampling distribution of our estimate is centered around the true parameter
We want to think about how spread out this distribution is
It is much easier to think about this variance under an additional assumption, so
Assume $Var(u|x_1, x_2, \dots, x_k) = \sigma^2$ (homoskedasticity)
Let $x$ stand for $(x_1, x_2, \dots, x_k)$
Assuming that $Var(u|x) = \sigma^2$ also implies that $Var(y|x) = \sigma^2$
The 4 assumptions for unbiasedness, plus this homoskedasticity assumption, are known as the Gauss-Markov assumptions


Variance of OLS (cont.)

Given the Gauss-Markov assumptions,
$$Var(\hat\beta_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}$$
where $SST_j = \sum (x_{ij} - \bar{x}_j)^2$ and $R_j^2$ is the $R^2$ from regressing $x_j$ on all other x's
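
A sketch of this formula on the first simulated dataset, where the true $\sigma^2$ is 1 by construction (continuing from the estimation snippet):

```python
# Variance of beta1_hat via sigma^2 / (SST_1 * (1 - R_1^2)).
x1v = X[:, 0]
SST_1 = np.sum((x1v - x1v.mean()) ** 2)

# R_1^2 from regressing x1 on the remaining regressors (constant and x2):
W = np.column_stack([np.ones(n), X[:, 1]])
coef, *_ = np.linalg.lstsq(W, x1v, rcond=None)
R2_1 = 1 - np.sum((x1v - W @ coef) ** 2) / SST_1

print(1.0 / (SST_1 * (1 - R2_1)))  # Var(beta1_hat), using the true sigma^2 = 1
```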


Components of OLS Variances

The error variance: a larger $\sigma^2$ implies a larger variance for the OLS estimators
The total sample variation: a larger $SST_j$ implies a smaller variance for the estimators
Linear relationships among the independent variables: a larger $R_j^2$ implies a larger variance for the estimators


Misspecified Models

Consider again the misspecified model
$\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1$, so that
$$Var(\tilde\beta_1) = \frac{\sigma^2}{SST_1}$$
Thus, $Var(\tilde\beta_1) < Var(\hat\beta_1)$ unless $x_1$ and $x_2$ are uncorrelated, in which case the two variances are the same


Misspecified Models (cont.)

While the variance of the estimator is smaller for the misspecified model, unless $\beta_2 = 0$ the misspecified model is biased
As the sample size grows, the variance of each estimator shrinks to zero, making the variance difference less important


Estimating the Error Variance

We don't know what the error variance, $\sigma^2$, is, because we don't observe the errors, $u_i$
What we observe are the residuals, $\hat{u}_i$
We can use the residuals to form an estimate of the error variance:
$$\hat\sigma^2 = \left(\sum \hat{u}_i^2\right) / (n - k - 1) = SSR/df$$
Thus, $se(\hat\beta_j) = \hat\sigma / \left[SST_j (1 - R_j^2)\right]^{1/2}$
Here $df = n - (k + 1)$, the number of observations minus the number of estimated parameters
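
A sketch of these estimates, reusing SST_1 and R2_1 from the variance snippet and the fitted regression from the estimation snippet:

```python
# Estimate sigma^2 from the residuals, then form se(beta1_hat).
u_res = y - Xmat @ beta_hat
sigma2_hat = (u_res @ u_res) / (n - k - 1)   # SSR / df

se_beta1 = np.sqrt(sigma2_hat) / np.sqrt(SST_1 * (1 - R2_1))
print(sigma2_hat, se_beta1)
```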


The Gauss-Markov Theorem

Given our 5 Gauss-Markov assumptions, it can be shown that OLS is "BLUE":
Best
Linear
Unbiased
Estimator
Thus, if the assumptions hold, use OLS

