ch9_Model Specification and Data Problems
APPLIED ECONOMETRICS
BITS Pilani Specification and Data Problems
NVM Rao
Pilani Campus
Chapter 9
Specification and Data Problems
Multiple Regression Analysis
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
Functional Form
• We’ve seen that a linear regression (linear in the parameters) can fit nonlinear relationships
• Can use logs on RHS, LHS or both
• Can use quadratic forms of x’s
• Can use interactions of x’s
• How do we know if we’ve gotten the right
functional form for our model?
Functional Form (continued)
• First, use economic theory to guide you
• Think about the interpretation
• Does it make more sense for x to affect y in
percentage (use logs) or absolute terms?
• Does it make more sense for the derivative of y with respect to x1 to vary with x1 (quadratic), to vary with x2 (interaction), or to be constant?
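A minimal Python/NumPy sketch of the quadratic and interaction cases, on simulated data. All coefficient values and the helper names `marginal_effect_quadratic` and `marginal_effect_interaction` are hypothetical, chosen only for illustration. With a quadratic in x1 the effect of x1 is b1 + 2*b2*x1; with an interaction x1*x2 it is b1 + b3*x2.

```python
import numpy as np

def marginal_effect_quadratic(b1, b2, x1):
    """dy/dx1 when y = b0 + b1*x1 + b2*x1**2 + ...; the effect varies with x1."""
    return b1 + 2.0 * b2 * x1

def marginal_effect_interaction(b1, b3, x2):
    """dy/dx1 when y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + ...; the effect varies with x2."""
    return b1 + b3 * x2

# Hypothetical simulated data with a diminishing effect of x1
rng = np.random.default_rng(3)
n = 5_000
x1 = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x1 - 0.05 * x1**2 + rng.normal(0, 0.5, n)

# Still a linear regression: linear in the parameters, nonlinear in x1
X = np.column_stack([np.ones(n), x1, x1**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

effects = marginal_effect_quadratic(b[1], b[2], np.array([0.0, 4.0, 8.0]))
turning_point = -b[1] / (2.0 * b[2])   # x1 value where the fitted effect is zero
print(effects)         # roughly [0.8, 0.4, 0.0]
print(turning_point)   # roughly 8.0
print(marginal_effect_interaction(0.3, 0.1, x2=2.0))   # 0.3 + 0.1 * 2.0
```

The model is still estimated by OLS; only the columns of X change, which is why logs, squares, and interactions all fit inside the linear regression framework.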
Specification: Functional Form (Review)
Functional Form (continued)
• We already know how to test joint exclusion
restrictions to see if higher order terms or
interactions belong in the model
• Adding and testing extra terms can be tedious, and we may find that a squared term matters when a log specification would fit even better
• A general test of functional form is Ramsey’s regression specification error test (RESET)
RESET
Ramsey’s RESET
• RESET relies on a trick similar to the special
form of the White test
• Instead of adding functions of the x’s directly,
we add and test functions of ŷ
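• So, estimate
$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + \text{error}$
and test
• $H_0: \delta_1 = 0,\ \delta_2 = 0$ using $F \sim F_{2,\,n-k-3}$ or $LM \sim \chi^2_2$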
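The RESET steps above can be sketched in Python/NumPy. This is a minimal sketch, assuming `X` already contains a column of ones plus the k regressors; the helper names `ols_ssr` and `reset_test` are hypothetical, not a library API. It reports the F statistic and its degrees of freedom only; a p-value would need an F distribution function (for example from SciPy), which is left out here.

```python
import numpy as np

def ols_ssr(X, y):
    """OLS via least squares; returns fitted values and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return fitted, float(resid @ resid)

def reset_test(X, y):
    """RESET F statistic for adding yhat**2 and yhat**3.

    X must contain a column of ones plus the k regressors (k + 1 columns).
    Returns (F, df_num, df_den) with df_num = 2 and df_den = n - k - 3.
    """
    n, p = X.shape                      # p = k + 1
    yhat, ssr_r = ols_ssr(X, y)         # restricted model: original specification
    X_aug = np.column_stack([X, yhat**2, yhat**3])
    _, ssr_ur = ols_ssr(X_aug, y)       # unrestricted model
    q = 2                               # restrictions: delta1 = delta2 = 0
    df_den = n - p - q                  # = n - k - 3
    F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_den)
    return F, q, df_den

# Hypothetical data: a true quadratic fitted with a linear model is misspecified
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 3, n)
u = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

F_bad, q, df_den = reset_test(X, 1 + x + 2 * x**2 + u)   # expect a large F
F_ok, _, _ = reset_test(X, 1 + 2 * x + u)                # correctly specified
print(F_bad, F_ok, q, df_den)
```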
RESET Example
Suppose that the correct specification of the wage equation is
Consider the house price data (Exercise 3.1) and estimate
Nonnested Alternative Tests
• If the models have the same dependent variable but nonnested x’s, we could still just form one large model with the x’s from both and test the joint exclusion restrictions that lead to one model or the other
• An alternative, the Davidson-MacKinnon test, uses the fitted values ŷ from one model as a regressor in the second model and tests their significance
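A sketch of the Davidson-MacKinnon idea in Python/NumPy, under assumed simulated data in which the true model is y on log(x). To test model A, the fitted values from model B are added to model A and we look at their conventional t statistic; then the roles are swapped. The helpers `ols` and `davidson_mackinnon_t` are hypothetical names.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, conventional standard errors, and fitted values."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    sigma2 = resid @ resid / (n - p)                     # error variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, fitted

def davidson_mackinnon_t(X_test, X_alt, y):
    """t statistic on the alternative model's fitted values when they are
    added as an extra regressor to the model under test."""
    _, _, yhat_alt = ols(X_alt, y)
    beta, se, _ = ols(np.column_stack([X_test, yhat_alt]), y)
    return beta[-1] / se[-1]

# Hypothetical data: the true model is y on log(x)
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1.0, 20.0, n)
y = 2.0 + 3.0 * np.log(x) + rng.normal(0, 0.5, n)
ones = np.ones(n)
X_level = np.column_stack([ones, x])          # model A: y on x
X_log = np.column_stack([ones, np.log(x)])    # model B: y on log(x)

t_test_level = davidson_mackinnon_t(X_level, X_log, y)   # tests model A
t_test_log = davidson_mackinnon_t(X_log, X_level, y)     # tests model B
print(t_test_level, t_test_log)
```

A large |t| rejects the model under test. In this design the level model should be rejected and the log model not; with real data the test can reject both or neither.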
Nonnested Alternatives (cont)
• More difficult if one model uses y and the
other uses ln(y)
• Can follow same basic logic and transform
predicted ln(y) to get ŷ for the second step
• In any case, Davidson-MacKinnon test may
reject neither or both models rather than
clearly preferring one specification
Why the reverse?
Points to remember
Proxy Variables
• What if the model is misspecified because no data are available on an important x variable?
• Suppose the population model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^* + u$, where * denotes an unobserved variable
• It may be possible to avoid omitted variable bias by using a proxy variable
• A proxy variable must be related to the unobservable variable, for example:
$x_3^* = \delta_0 + \delta_3 x_3 + v_3$
• Now suppose we simply use $x_3$ in place of $x_3^*$ in the regression
Proxy Variables (continued)
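• What do we need for this solution to give us consistent estimates of $\beta_1$ and $\beta_2$?
• $u$ is uncorrelated with $x_1$, $x_2$ and $x_3^*$, and also with $x_3$ (the proxy is irrelevant once $x_3^*$ is controlled for)
• $E(x_3^* \mid x_1, x_2, x_3) = E(x_3^* \mid x_3) = \delta_0 + \delta_3 x_3$, that is, $v_3$ is uncorrelated with $x_1$, $x_2$ and $x_3$
• So we are really running
$y = (\beta_0 + \beta_3\delta_0) + \beta_1 x_1 + \beta_2 x_2 + \beta_3\delta_3 x_3 + (u + \beta_3 v_3)$
and have just redefined the intercept, the error term, and the $x_3$ coefficient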
Proxy Variables (continued)
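• Without these assumptions, we can end up with biased estimates
• Say $x_3^* = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + v_3$
• Then we are really running
$y = (\beta_0 + \beta_3\delta_0) + (\beta_1 + \beta_3\delta_1) x_1 + (\beta_2 + \beta_3\delta_2) x_2 + \beta_3\delta_3 x_3 + (u + \beta_3 v_3)$
• The biases in the $x_1$ and $x_2$ coefficients are $\beta_3\delta_1$ and $\beta_3\delta_2$, so their signs depend on the signs of $\beta_3$ and $\delta_j$
• This bias may still be smaller than the omitted variable bias, though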
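A simulation sketch of both proxy cases, assuming a hypothetical data-generating process with x2 dropped for brevity and arbitrary coefficient values (b1 = 0.5, b3 = 2.0). When x3* depends on x1 only through the proxy (delta1 = 0), the x1 coefficient is close to b1; when delta1 is not zero, it converges to b1 + b3*delta1 instead. The function name `proxy_slope_x1` is hypothetical.

```python
import numpy as np

def proxy_slope_x1(delta1, n=200_000, seed=0):
    """Estimated x1 coefficient when the proxy x3 replaces the unobserved x3*.

    Hypothetical DGP (x2 omitted for brevity):
        y   = b0 + b1*x1 + b3*x3* + u
        x3* = d0 + delta1*x1 + d3*x3 + v3
    """
    rng = np.random.default_rng(seed)
    b0, b1, b3 = 1.0, 0.5, 2.0
    d0, d3 = 0.0, 1.0
    x1 = rng.normal(size=n)
    x3 = rng.normal(size=n)                         # observed proxy
    v3 = rng.normal(size=n)
    x3_star = d0 + delta1 * x1 + d3 * x3 + v3       # unobserved variable
    y = b0 + b1 * x1 + b3 * x3_star + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x3])       # regress y on x1 and the proxy
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

good = proxy_slope_x1(delta1=0.0)   # assumptions hold: close to b1 = 0.5
bad = proxy_slope_x1(delta1=0.3)    # x3* depends on x1: close to b1 + b3*delta1 = 1.1
print(good, bad)
```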
In simple –
Proxy Variables
example
Lagged Dependent Variables
• What if there are unobserved variables, and
you can’t find reasonable proxy variables?
• It may be possible to include a lagged dependent variable to account for omitted variables that contribute to both past and current levels of y
• This makes sense only if past and current values of y are plausibly related
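A simulated illustration under a hypothetical data-generating process: an unobserved persistent factor `a` drives both an earlier outcome `y_lag` and the current `y`, and is correlated with `x`. Adding `y_lag` as a regressor removes most, though not all, of the omitted variable bias, because `y_lag` is only a noisy stand-in for `a`. All names and parameter values are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(4)
n = 100_000
a = rng.normal(0, 1, n)                        # unobserved persistent factor
x = 0.8 * a + rng.normal(0, 1, n)              # regressor correlated with a
y_lag = a + rng.normal(0, 0.3, n)              # past outcome, also driven by a
y = 1.0 + 0.5 * x + a + rng.normal(0, 1, n)    # current outcome; true effect of x is 0.5

ones = np.ones(n)
b_omit = ols(np.column_stack([ones, x]), y)[1]          # a omitted: biased upward
b_lag = ols(np.column_stack([ones, x, y_lag]), y)[1]    # y_lag absorbs most of a
print(b_omit, b_lag)   # roughly 0.99 versus 0.56 in this design
```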
Measurement Error
• Sometimes we have the variable we want, but
we think it is measured with error
• Examples: a survey asks how many hours you worked over the last year, or how many weeks you used child care when your child was young
• Measurement error in y has different consequences from measurement error in x
Measurement Error in a
Dependent Variable
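• Define measurement error as $e_0 = y - y^*$, where $y^*$ is the true value
• Thus, we are really estimating
$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u + e_0$
• When will OLS produce unbiased results?
• If $e_0$ is uncorrelated with each $x_j$, the OLS slope estimators are unbiased
• If $E(e_0) \neq 0$, the intercept estimator $\hat{\beta}_0$ will be biased, though
• Even when unbiased, the estimators have larger variances than with no measurement error: if $e_0$ and $u$ are uncorrelated, $\mathrm{Var}(u + e_0) = \sigma_u^2 + \sigma_0^2 > \sigma_u^2$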
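A small Monte Carlo sketch of these points, with hypothetical values: b0 = 1, b1 = 0.5, and an error e0 with mean 0.5 and standard deviation 2 that is unrelated to x. The slope stays centred on b1 but becomes more variable, while the intercept absorbs E(e0). The helper `simple_ols` is a hypothetical name.

```python
import numpy as np

def simple_ols(x, y):
    """Row-wise simple-regression intercepts and slopes for arrays of shape (reps, n)."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    slope = (xc * yc).sum(axis=1) / (xc**2).sum(axis=1)
    intercept = y.mean(axis=1) - slope * x.mean(axis=1)
    return intercept, slope

rng = np.random.default_rng(42)
reps, n = 2000, 100
b0, b1 = 1.0, 0.5
x = rng.normal(size=(reps, n))
y_star = b0 + b1 * x + rng.normal(size=(reps, n))   # true outcome, sd(u) = 1
e0 = 0.5 + 2.0 * rng.normal(size=(reps, n))         # E(e0) = 0.5, unrelated to x
y_obs = y_star + e0                                 # mismeasured outcome we observe

a_clean, s_clean = simple_ols(x, y_star)
a_noisy, s_noisy = simple_ols(x, y_obs)
print(s_clean.mean(), s_noisy.mean())   # both close to b1 = 0.5
print(s_clean.std(), s_noisy.std())     # the noisy slope is more variable
print(a_noisy.mean())                   # close to b0 + E(e0) = 1.5
```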
Measurement Error in an
Explanatory Variable
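• Define measurement error as $e_1 = x_1 - x_1^*$, where $x_1^*$ is the true value
• Assume $E(e_1) = 0$ and $E(y \mid x_1^*, x_1) = E(y \mid x_1^*)$
• Then we are really estimating $y = \beta_0 + \beta_1 x_1 + (u - \beta_1 e_1)$
• The effect of measurement error on OLS estimates depends on our assumption about the correlation between $e_1$ and $x_1$
• Suppose $\mathrm{Cov}(x_1, e_1) = 0$
• Then OLS remains unbiased, but the variances are larger, since $\mathrm{Var}(u - \beta_1 e_1) = \sigma_u^2 + \beta_1^2 \sigma_{e_1}^2$ when $u$ and $e_1$ are uncorrelated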
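A simulation sketch of the case Cov(x1, e1) = 0, assuming hypothetical values b1 = 2, Var(x1) = 4, Var(e1) = 1 and Var(u) = 1. Here the error is independent of the observed x1, and the true value is recovered as x1* = x1 - e1. The slope is close to b1, and the residual variance is close to Var(u) + b1^2 Var(e1).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
b0, b1 = 1.0, 2.0
x1 = rng.normal(0, 2, n)          # observed (reported) regressor
e1 = rng.normal(0, 1, n)          # measurement error, independent of the observed x1
x1_star = x1 - e1                 # true value, since e1 = x1 - x1*
y = b0 + b1 * x1_star + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_var = np.var(y - X @ beta)
print(beta[1])     # close to b1 = 2.0: no bias
print(resid_var)   # close to Var(u) + b1**2 * Var(e1) = 1 + 4 = 5
```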
Measurement Error in an
Explanatory Variable (cont)
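• Suppose $\mathrm{Cov}(x_1^*, e_1) = 0$, known as the classical errors-in-variables (CEV) assumption; then
• $\mathrm{Cov}(x_1, e_1) = E(x_1 e_1) = E(x_1^* e_1) + E(e_1^2) = 0 + \sigma_{e_1}^2 = \sigma_{e_1}^2$
• So $x_1$ is correlated with the error $u - \beta_1 e_1$, and the OLS estimate is biased:
$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\mathrm{Cov}(x_1, u - \beta_1 e_1)}{\mathrm{Var}(x_1)} = \beta_1 - \frac{\beta_1 \sigma_{e_1}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2} = \beta_1 \left(1 - \frac{\sigma_{e_1}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2}\right) = \beta_1 \left(\frac{\sigma_{x_1^*}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2}\right)$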
Measurement Error in an
Explanatory Variable (cont)
• Notice that the multiplicative factor is just $\mathrm{Var}(x_1^*)/\mathrm{Var}(x_1)$
• Since $\mathrm{Var}(x_1^*)/\mathrm{Var}(x_1) < 1$ whenever $\sigma_{e_1}^2 > 0$, the estimate is biased toward zero; this is called attenuation bias
• With multiple regression the analysis is more complicated, but we can still expect attenuation bias under classical errors in variables
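A simulation sketch of attenuation bias under the CEV assumption, with hypothetical values b1 = 2, Var(x1*) = 4 and Var(e1) = 1, so the shrinkage factor is 4/(4 + 1) = 0.8 and the OLS slope should settle near 1.6. The helper `attenuation_factor` is a hypothetical name for the ratio in the plim formula.

```python
import numpy as np

def attenuation_factor(var_x_star, var_e):
    """Var(x1*) / (Var(x1*) + Var(e1)): the CEV shrinkage factor."""
    return var_x_star / (var_x_star + var_e)

rng = np.random.default_rng(9)
n = 200_000
b0, b1 = 1.0, 2.0
var_x_star, var_e = 4.0, 1.0
x1_star = rng.normal(0, np.sqrt(var_x_star), n)   # true regressor (unobserved)
e1 = rng.normal(0, np.sqrt(var_e), n)             # CEV: independent of x1_star
x1 = x1_star + e1                                 # observed, mismeasured regressor
y = b0 + b1 * x1_star + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = b1 * attenuation_factor(var_x_star, var_e)   # 2.0 * 0.8
print(beta[1], predicted)   # both near 1.6, below b1 = 2.0
```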
Missing Data – Is it a Problem?
• If any observation is missing data on one of
the variables in the model, it can’t be used
• If data are missing at random, using a sample restricted to observations with no missing values causes no bias, although the smaller sample reduces precision
• A problem arises if data are missing systematically, for example if high-income individuals refuse to provide income data
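A sketch of listwise deletion (complete-case analysis) in Python/NumPy on hypothetical simulated data, with a true slope of 1. Missing values are coded as NaN and dropped before estimation. When income is missing completely at random the slope is fine; when high earners do not report, the complete-case slope is biased toward zero. The helper `complete_case_slope` and all values are assumptions for illustration.

```python
import numpy as np

def complete_case_slope(x, y):
    """Simple-regression slope using only rows with no missing (NaN) values."""
    keep = ~np.isnan(x) & ~np.isnan(y)          # listwise deletion
    xk, yk = x[keep], y[keep]
    xc = xk - xk.mean()
    return float((xc * (yk - yk.mean())).sum() / (xc**2).sum())

rng = np.random.default_rng(11)
n = 100_000
educ = rng.normal(12, 2, n)                     # hypothetical regressor
income = 10 + 1.0 * educ + rng.normal(0, 2, n)  # hypothetical outcome, true slope 1

# Case 1: income missing completely at random (about 30% of rows)
income_mcar = income.copy()
income_mcar[rng.random(n) < 0.3] = np.nan

# Case 2: high earners refuse to report (missing above the 70th percentile)
income_sys = income.copy()
income_sys[income > np.quantile(income, 0.7)] = np.nan

slope_mcar = complete_case_slope(educ, income_mcar)
slope_sys = complete_case_slope(educ, income_sys)
print(slope_mcar, slope_sys)   # close to 1 versus clearly below 1
```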
Nonrandom Samples
• If the sample is selected on the basis of an explanatory variable x, the OLS estimates are still unbiased
• If the sample is selected on the basis of the dependent variable y, we have sample selection bias
• Sample selection can be more subtle
• Say we look at wages for workers: since people choose whether to work, observed wages are not the same as wage offers
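A simulation sketch contrasting the two cases, with a hypothetical wage-offer model in which log offers rise by 0.1 per year of schooling and an offer is observed only if it exceeds a random reservation wage. Selecting on schooling leaves the slope intact; observing only workers makes selection depend on the offer's error term, which flattens the estimated slope. All names and values are assumptions for illustration.

```python
import numpy as np

def slope(x, y):
    """Simple-regression OLS slope."""
    xc = x - x.mean()
    return float((xc * (y - y.mean())).sum() / (xc**2).sum())

rng = np.random.default_rng(7)
n = 200_000
b0, b1 = 1.0, 0.1
educ = rng.uniform(8, 20, n)                         # hypothetical years of schooling
log_offer = b0 + b1 * educ + rng.normal(0, 0.5, n)   # log wage offer for everyone
reservation = 2.0 + rng.normal(0, 0.5, n)            # hypothetical reservation log wage
works = log_offer > reservation                      # offers observed only for workers

full = slope(educ, log_offer)                           # infeasible benchmark
on_x = slope(educ[educ >= 12], log_offer[educ >= 12])   # selection on x
workers = slope(educ[works], log_offer[works])          # selection tied to the error
print(full, on_x, workers)   # first two near 0.1; the last is noticeably smaller
```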
Outliers
• Sometimes an individual observation can be very different from the others and can have a large effect on the OLS estimates
• Sometimes this outlier is simply due to errors in data entry, which is one reason why looking at summary statistics is important
• Sometimes the observation is genuinely very different from the others
Outliers (continued)
• Not unreasonable to fix observations where
it’s clear there was just an extra zero entered
or left off, etc.
• Not unreasonable to drop observations that
appear to be extreme outliers, although
readers may prefer to see estimates with and
without the outliers
• Stata, EViews, or any other statistical software can be used to investigate outliers
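A small sketch of the "extra zero" case with hypothetical, noise-free data (y = 2 + 0.5x for x = 1, ..., 10, with the last value entered as 70 instead of 7). It flags the observation with the largest absolute residual, then reports the slope with all data, without that point, and after fixing the entry, in the spirit of showing estimates with and without the outlier. The helper `ols_line` is a hypothetical name.

```python
import numpy as np

def ols_line(x, y):
    """Intercept and slope of a simple OLS regression."""
    slope, intercept = np.polyfit(x, y, 1)   # polyfit returns the highest degree first
    return intercept, slope

x = np.arange(1, 11, dtype=float)
y_typo = 2.0 + 0.5 * x
y_typo[9] = 70.0                             # data-entry error: the true value is 7.0

a_all, s_all = ols_line(x, y_typo)
resid = y_typo - (a_all + s_all * x)
suspect = int(np.argmax(np.abs(resid)))      # observation with the largest residual

keep = np.arange(len(x)) != suspect
a_drop, s_drop = ols_line(x[keep], y_typo[keep])   # estimate without the outlier

y_fixed = y_typo.copy()
y_fixed[suspect] = y_fixed[suspect] / 10     # remove the extra zero
a_fix, s_fix = ols_line(x, y_fixed)
print(s_all, s_drop, s_fix)   # one bad entry moves the slope far from 0.5
```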
Outliers
Example (continued)
Least Absolute Deviations (LAD) Estimation
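LAD chooses the coefficients that minimize the sum of absolute residuals, so a single extreme observation has much less pull than under OLS. Exact LAD is usually computed by linear programming; the sketch below swaps in an approximate iteratively reweighted least squares (IRLS) scheme with weights 1/|residual|, on the same kind of hypothetical "extra zero" data. The function `lad_irls` and its parameters are hypothetical names.

```python
import numpy as np

def lad_irls(X, y, n_iter=200, eps=1e-8):
    """Approximate LAD coefficients by iteratively reweighted least squares.

    Each step solves weighted least squares with weights 1/|residual|, so that
    sum(w * r**2) equals sum(|r|) at the current residuals.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)          # cap weights for near-zero residuals
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

x = np.arange(1, 11, dtype=float)
y = 2.0 + 0.5 * x
y[9] = 70.0                                           # data-entry error (true value 7.0)
X = np.column_stack([np.ones_like(x), x])

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
b_lad = lad_irls(X, y)
print(b_ols)   # slope pulled far above 0.5 by one observation
print(b_lad)   # close to intercept 2.0 and slope 0.5
```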
Thanks