Unobserved heterogeneity
Yutec Sun
ENSAI
Unobserved heterogeneity in panel data
In panel data analysis, one of the most widely used models is
yit = xit β + ci + uit,
where ci is the effect of unobserved heterogeneous factors that persist over time.
For example, the minimum-wage experiment of Card and Krueger () and the water source experiment of Snow ().
Data structure in Difference-in-differences analysis
Table: Average deaths per , before/after Lambeth moved the water source (rows: water suppliers and their difference)
• Panel of sub-districts for treated & control groups over time periods
Difference-in-differences estimator
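As a rough illustration of the estimator named in this slide title, the sketch below computes a difference-in-differences estimate from four group-period means. The function name `did_estimate` and all numbers are hypothetical placeholders; they are not taken from Snow's table.

```python
def did_estimate(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences from four group-period means."""
    change_treated = treated_after - treated_before   # before/after change, treated group
    change_control = control_after - control_before   # before/after change, control group
    return change_treated - change_control            # difference of the two differences

# Hypothetical death rates (NOT from the original table).
effect = did_estimate(treated_before=150.0, treated_after=90.0,
                      control_before=140.0, control_after=130.0)
print(effect)
```

The control group's change stands in for what would have happened to the treated group without treatment, so only the extra change in the treated group is attributed to the treatment.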
Why model unobserved heterogeneity?
• Inefficient estimator
• Omitted variables bias
• Poor prediction of individual behavior
These problems motivate the use of random effects, fixed effects, and hierarchical Bayesian methods.
Illustration of omitted variables bias in panel data
Cov(xit, ci) ≠ 0,
which arises when at least one element of xit is correlated with ci: Cov(xikt, ci) ≠ 0 for some k.
Two strategies are available to solve this endogeneity problem: control function and IV methods.
But in panel data, we have one more option available.
We will see panel data methods based on the textbook of Wooldridge ().
Illustration of the difference method
yt = β0 + xt β + c + ut, t = 1, 2.
∆y = ∆xβ + ∆u.
E(∆x′ ∆u) = 0.
Does this approach always work? Let’s look more closely at the two consistency conditions: exogeneity and full rank.
This is often called the mean independence condition, since the conditional mean of u does not depend on x and c.
Exogeneity condition
E(x1′u1 + x2′u2 − x2′u1 − x1′u2) = 0.
This shows
• the mean independence condition alone is not enough because it guarantees only
E(x1′u1) = E(x2′u2) = 0;
• consistency additionally requires
E(xt′us) = 0 ∀t ≠ s.
Rank conditions
• If x includes a constant, its difference in ∆x is 0, and the full rank condition fails.
• Hence, the constant term in x cannot be separately identified from the unobserved heterogeneity c.
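The rank failure can be checked numerically. The sketch below assumes the rank condition in question is rank E(∆x′∆x) = K (the standard condition for OLS on differenced data) and uses simulated data with a hypothetical regressor z.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200

# Two periods; x_t = (1, z_t) includes a constant as its first column, so K = 2.
z1 = rng.normal(size=N)
z2 = rng.normal(size=N)
x1 = np.column_stack([np.ones(N), z1])
x2 = np.column_stack([np.ones(N), z2])

dx = x2 - x1                        # first column is identically zero
moment = dx.T @ dx / N              # sample analogue of E(dx' dx), 2 x 2
rank_with_const = np.linalg.matrix_rank(moment)

dz = dx[:, 1:]                      # drop the differenced constant, K = 1
rank_without_const = np.linalg.matrix_rank(dz.T @ dz / N)

print(rank_with_const, rank_without_const)
```

With the constant included the moment matrix has rank 1 < K = 2; once the differenced constant is dropped, the remaining 1 × 1 matrix has full rank.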
Taxonomy of unobserved effects
yit = xit β + ci + uit,
where xit is a vector of observable variables, ci is the effect of unobserved heterogeneity, and uit is the idiosyncratic error.
There exist two views of the unobserved heterogeneity ci .
1. Random effect: ci is assumed mean-independent of xi1, ..., xiT, which implies that the random effect ci is uncorrelated with xit for all t.
2. Fixed effect: no assumption is imposed on the distribution of ci. Arbitrary correlation with xit is allowed.
The name idiosyncratic implies that uit is independently distributed.
Historically, ci is treated as a random variable in the random effects model and as a parameter in the fixed effects model.
Distinguishing the two views in this way has little implication for estimation.
Random effects model
Motivation
Heteroscedastic error model
The RE model puts the unobserved heterogeneity into the error term by writing the panel data model as
yit = xit β + vit, where vit = ci + uit.
RE estimation requires exogeneity of xit, which can be decomposed into two parts:
Assumption .
The error term vit is serially correlated due to ci .
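A short derivation may help show where the serial correlation comes from. It assumes, beyond what is stated on the slide, that ci and uit are uncorrelated, that uit is serially uncorrelated with constant variance, and it introduces the symbols σ_c² = Var(ci) and σ_u² = Var(uit).

```latex
\begin{aligned}
\operatorname{Cov}(v_{it}, v_{is})
  &= \operatorname{Cov}(c_i + u_{it},\; c_i + u_{is}) \\
  &= \operatorname{Var}(c_i) + \operatorname{Cov}(u_{it}, u_{is})
   = \sigma_c^2 \qquad (t \neq s), \\
\operatorname{Var}(v_{it}) &= \sigma_c^2 + \sigma_u^2, \\
\operatorname{Corr}(v_{it}, v_{is}) &= \frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2}
  \qquad (t \neq s).
\end{aligned}
```

Under these assumptions the correlation between any two periods is the same, and it is positive whenever ci has nonzero variance.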
Strict exogeneity
Assumption . (repeated)
This implies:
It assumes that no unobservable creates endogeneity bias in the coefficients of xit.
Strict exogeneity
• Given (xit, ci), E(yit | xi1, ..., xiT, ci) does not depend on xis for s ≠ t, since
E(yit | xi1, ..., xiT, ci) = xit β + ci.
• yt must not receive feedback from past {xτ}τ≤t−1 or feed back into future {xτ}τ≥t+1.
• It is stronger than the standard exogeneity condition E(uit | xit) = 0, since it implies
E(uit | xi1, ..., xiT) = 0.
Example : Program evaluation
either by individual or administrator
This can happen, for example, when a low uit prompts individuals to participate in a future job training program.
Example : Distributed lag model
where RDit is firm i’s R&D spending at time t, zit contains firm size, and ci is unobserved firm heterogeneity that may be correlated with current, past, and future R&D spending.
Validity of Assumption
• Will today’s shocks uit to patents affect future R&D spending?
• Is R&D spending allowed to depend on ci?
Firm size is often measured by sales revenue or number of employees.
Example : Lagged dependent variable
E[uit | xis, ci] = 0 for s ≠ t?
E[ci |xit ] = 0?
The answer to this question boils down to nature vs nurture in a causal relationship.
How to estimate the random effects model?
Assumption .
rank E(Xi′ Ω⁻¹ Xi) = K, where Ω = E(Vi Vi′) is T × T.
Random effects estimator
as N → ∞.
In this way, the RE estimator can improve estimation efficiency if the covariates xit do not depend on the unobserved heterogeneity. But when they do, endogeneity bias will arise.
What if we cannot rule out such a possibility?
Ω̂ can be obtained from the residuals of an OLS regression in the first stage.
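The two-stage procedure can be sketched as follows. This is a minimal feasible GLS illustration, not necessarily the exact estimator on the slide: it assumes an unrestricted Ω̂ built from pooled OLS residuals (imposing the random effects structure on Ω̂ is a common alternative), and the function name `re_fgls`, the array layout, and the simulated data are all hypothetical.

```python
import numpy as np

def re_fgls(X, Y):
    """Feasible GLS with an unrestricted T x T error covariance.

    X has shape (N, T, K); Y has shape (N, T).
    """
    N, T, K = X.shape
    # Stage 1: pooled OLS, used only to obtain residuals.
    b_ols = np.linalg.lstsq(X.reshape(N * T, K), Y.reshape(N * T), rcond=None)[0]
    V = Y - X @ b_ols                      # (N, T) residuals v_hat
    omega = V.T @ V / N                    # sample analogue of E(V_i V_i')
    omega_inv = np.linalg.inv(omega)
    # Stage 2: GLS, summing X_i' Omega^{-1} X_i and X_i' Omega^{-1} Y_i over i.
    A = np.einsum('itk,ts,isl->kl', X, omega_inv, X)
    b = np.einsum('itk,ts,is->k', X, omega_inv, Y)
    return np.linalg.solve(A, b), omega

# Simulated panel in which c_i is independent of x_it (the RE assumption holds).
rng = np.random.default_rng(0)
N, T = 2000, 3
beta = np.array([1.0, 2.0])                # hypothetical true coefficients
x = rng.normal(size=(N, T))
X = np.stack([np.ones((N, T)), x], axis=2)
c = rng.normal(size=(N, 1))
u = rng.normal(size=(N, T))
Y = X @ beta + c + u

b_re, omega_hat = re_fgls(X, Y)
print(b_re)
print(omega_hat)
```

In this simulation the off-diagonal entries of `omega_hat` estimate Var(ci), which is the serial correlation that GLS exploits.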
Fixed effects model
Remarks
Now xit is allowed to depend on ci .
For example, gender, race, industry, and city-specific attributes. See next slide.
Differencing out the fixed effects
Differencing cancels out the unobserved heterogeneity ci as well as all time-constant terms in xit.
Therefore, the source of endogeneity bias is eliminated, and we can apply simple OLS for estimation.
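Fixed effects estimator
Subtracting each unit's time averages from the model gives
ỹit = x̃it β + ũit,
where ỹit = yit − ȳi, x̃it = xit − x̄i, and ũit = uit − ūi. Then the FE estimator is
\[
\hat\beta_{FE} = \Bigl( \sum_{i=1}^{N} \tilde X_i' \hat\Omega^{-1} \tilde X_i \Bigr)^{-1} \sum_{i=1}^{N} \tilde X_i' \hat\Omega^{-1} \tilde Y_i ,
\]
where
\[
\hat\Omega = \frac{1}{N} \sum_{i=1}^{N} \hat{\tilde U}_i \hat{\tilde U}_i' , \qquad \hat{\tilde U}_i = \tilde Y_i - \tilde X_i \hat\beta_{FEOLS} .
\]
This FE estimator is called the within estimator since it uses the time variation within each panel unit i.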
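A minimal sketch of the within transformation follows. It implements only the OLS version on time-demeaned data (the first-step estimator, without the Ω̂ weighting), and the function name `within_ols` and the simulated data are hypothetical. The regressor is deliberately built to be correlated with ci so that pooled OLS can be compared with the within estimate.

```python
import numpy as np

def within_ols(X, Y):
    """OLS on time-demeaned data. X has shape (N, T, K); Y has shape (N, T)."""
    X_til = X - X.mean(axis=1, keepdims=True)   # x_it minus the unit average
    Y_til = Y - Y.mean(axis=1, keepdims=True)   # y_it minus the unit average
    K = X.shape[2]
    return np.linalg.lstsq(X_til.reshape(-1, K), Y_til.reshape(-1), rcond=None)[0]

rng = np.random.default_rng(1)
N, T = 2000, 4
c = rng.normal(size=(N, 1))
x = c + rng.normal(size=(N, T))          # x_it is correlated with c_i
Y = 2.0 * x + c + rng.normal(size=(N, T))  # hypothetical true slope is 2.0
X = x[:, :, None]

b_fe = within_ols(X, Y)
b_pooled = np.linalg.lstsq(x.reshape(-1, 1), Y.reshape(-1), rcond=None)[0]
print(b_fe, b_pooled)
```

In this design pooled OLS picks up the correlation between x and c and overstates the slope, while the demeaned regression does not, because ci drops out of ỹit.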
Consistency of FE estimator
Assumption .
E(Ũi Ũi′ | Xi, ci) = Ω.
Under Assumptions -, β̂FE is consistent since strict exogeneity implies
This shows why time-constant variables are not allowed in xit: the corresponding columns of X̃i are zero for all i.
Alternative FE approach
The formula for the FE estimator β̂FE remains unchanged when ỹit, x̃it, and ũit are redefined appropriately.
Under an unrestricted covariance of Ũi, the two differencing methods are asymptotically equivalent (Wooldridge, ).
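A narrower, finite-sample illustration of how close the two transformations are: with T = 2 and plain OLS (no intercept, a single regressor), first differencing and the within transformation give numerically identical estimates. This special case is not the asymptotic statement above, and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500
c = rng.normal(size=N)
x = c[:, None] + rng.normal(size=(N, 2))              # two periods, x correlated with c
y = 1.5 * x + c[:, None] + rng.normal(size=(N, 2))    # hypothetical true slope 1.5

# First differences: period 2 minus period 1, OLS without intercept.
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# Within transformation: subtract each unit's time average, then pooled OLS.
xt = x - x.mean(axis=1, keepdims=True)
yt = y - y.mean(axis=1, keepdims=True)
b_within = (xt * yt).sum() / (xt * xt).sum()

print(b_fd, b_within)
```

With two periods the demeaned values are plus and minus half the first difference, so both sums in the within formula are exactly half of their first-difference counterparts and the ratio is unchanged.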
Which to choose between RE and FE?
. xit is endogenous to ci .
Hausman statistic
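\[
H = (\hat\beta_{FE} - \hat\beta_{RE})' \bigl[ \widehat{\mathrm{Avar}}(\hat\beta_{FE}) - \widehat{\mathrm{Avar}}(\hat\beta_{RE}) \bigr]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \sim \chi^2_K .
\]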
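The statistic is a quadratic form and can be computed in a few lines. In the sketch below the function name `hausman` and all input numbers are hypothetical, and the variance matrices are assumed to be the estimated variances of the estimators themselves (already scaled by the sample size).

```python
import numpy as np

def hausman(b_fe, b_re, avar_fe, avar_re):
    """Quadratic form (b_fe - b_re)' [Avar_fe - Avar_re]^{-1} (b_fe - b_re)."""
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v = np.asarray(avar_fe, dtype=float) - np.asarray(avar_re, dtype=float)
    return float(d @ np.linalg.solve(v, d))

# Hypothetical estimates with K = 2 coefficients.
b_fe = [1.0, 2.0]
b_re = [0.9, 2.1]
avar_fe = np.diag([0.02, 0.03])
avar_re = np.diag([0.01, 0.01])

H = hausman(b_fe, b_re, avar_fe, avar_re)

# For K = 2 the chi-square upper 5% point has the closed form -2 * ln(0.05).
critical_5pct = -2.0 * np.log(0.05)
reject = H > critical_5pct
print(H, critical_5pct, reject)
```

In finite samples the difference of the two variance estimates need not be positive definite, in which case the statistic computed this way is not reliable.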
Idea
Hierarchical Bayesian
Hierarchical Bayesian model
1. Likelihood
yit = xit βi + uit, uit ∼ F
2. Prior
βi ∼ N(µi, σi)
3. Hyperprior
µi ∼ N(µ0, σ0), σi ∼ IG
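To make the three layers concrete, the sketch below draws one synthetic data set from the hierarchy, top down. Several choices are assumptions not stated on the slide: xit is a scalar, the second argument of N(·, ·) is read as a variance, F is taken to be N(0, 1), and the hyperparameter values and inverse-gamma shape and scale are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 50, 10

# Hypothetical hyperparameters (not from the slides).
mu0, sigma0 = 1.0, 0.5           # mean and variance of the hyperprior on mu_i
ig_shape, ig_scale = 3.0, 1.0    # inverse-gamma shape and scale for sigma_i

# 3. Hyperprior: mu_i ~ N(mu0, sigma0), sigma_i ~ IG(shape, scale).
mu = rng.normal(mu0, np.sqrt(sigma0), size=N)
# If G ~ Gamma(shape, 1), then scale / G ~ IG(shape, scale).
sigma = ig_scale / rng.gamma(ig_shape, 1.0, size=N)

# 2. Prior: beta_i ~ N(mu_i, sigma_i), with sigma_i treated as a variance.
beta = rng.normal(mu, np.sqrt(sigma))

# 1. Likelihood: y_it = x_it * beta_i + u_it, with F assumed to be N(0, 1).
x = rng.normal(size=(N, T))
y = x * beta[:, None] + rng.normal(size=(N, T))

print(y.shape, beta[:3])
```

This only simulates from the model; estimation would go in the opposite direction, from y and x to the posterior of βi, µi, and σi.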
Illustration: Demand estimation
Rossi et al. () estimate consumer utility for canned tuna:
uij = xj βi + ui
βi = ∆zi + vi, vi ∼ N(0, Vβ).
Consumer preference heterogeneity by demographics
Consumer preference heterogeneity estimated across datasets
Distribution of choice probabilities
Application: Domestic violence
A panel data model
Aizer () estimates the effect of the wage gap within married couples on domestic violence using the model
Impact of wage gaps on domestic violence
Application: Airbnb
Airbnb’s impact on housing prices
Panel data model
Barron et al. () analyze panel data of U.S. zip codes for –.
For zip code i in CBSA c in year-month t:
where
Estimating the effect of Airbnb on rental rates ln(ZRI)
Questions?
References
Aizer, Anna, “The Gender Wage Gap and Domestic Violence,” American Economic Review, September
, (), –.
Barron, Kyle, Edward Kung, and Davide Proserpio, “The Effect of Home-Sharing on House Prices and
Rents: Evidence from Airbnb,” Marketing Science, , (), –.
Card, David and Alan B. Krueger, “Minimum Wages and Employment: A Case Study of the Fast-Food
Industry in New Jersey and Pennsylvania,” American Economic Review, , (), –.
Rossi, Peter E., Robert E. McCulloch, and Greg M. Allenby, “The Value of Purchase History Data in
Target Marketing,” Marketing Science, , (), –.
Wooldridge, Jeffrey M., Econometric Analysis of Cross Section and Panel Data, MIT Press, .