Panel 101

This document provides an overview of panel data analysis using Stata. It discusses fixed effects models, which control for time-invariant characteristics of observational units. The document explains that panel data is structured with entities observed over time, and discusses transforming wide format data to long format for analysis in Stata. It also covers setting up the data as a panel in Stata once in long format.

Panel Data Analysis
Fixed and Random Effects using Stata
(v. 6.0)

Oscar Torres-Reyna
otorres@princeton.edu

December 2007 http://www.princeton.edu/~otorres/


What panel data looks like…
Panel data (also known as longitudinal or cross-sectional time-series data)
is a dataset in which the behavior of entities (i) is observed across time (t).

$(X_{it}, Y_{it})$, i = 1,…,n; t = 1,…,T

These entities could be states, companies, families, individuals, countries, etc.

Entity  Year  Y   X1  X2  X3  …
1       1     #   #   #   #   …
1       2     #   #   #   #   …
1       3     #   #   #   #   …
:       :     :   :   :   :   :
2       1     #   #   #   #   …
2       2     #   #   #   #   …
2       3     #   #   #   #   …
:       :     :   :   :   :   :
3       1     #   #   #   #   …
3       2     #   #   #   #   …
3       3     #   #   #   #   …

See Stock and Watson, Introduction to Econometrics, chapter 10 “Regression with Panel Data”.
Usage
Panel data helps deal with omitted variable bias due to heterogeneity in
the data. It does this by controlling for variables that we cannot
observe, that are not available, and/or that cannot be measured but are
correlated with the predictors. Two types:
1. Variables that do not change over time but vary across entities
(cultural factors, differences in business practices across
companies, etc.) → Entity fixed effects.
2. Variables that change over time but not across entities (e.g.
national policies, federal regulations, international
agreements, etc.) → Time fixed effects.
Some drawbacks when working with panel data are data collection
issues (e.g. sampling design, coverage), non-response in the case of
micro panels, or cross-country dependency in the case of macro
panels (e.g. correlation between countries).
For a comprehensive list of advantages and disadvantages of panel data see Baltagi, Econometric
Analysis of Panel Data (chapter 1).
FIXED-EFFECTS MODEL
(Covariance Model, Within
Estimator, Individual Dummy
Variable Model, Least Squares
Dummy Variable Model)

The fixed effects idea
Entities have individual characteristics that may
or may not influence the outcome and/or
predictor variables. For example, the business
practices of a company may influence its stock
price or level of spending; attitudes or policies
towards guns in a particular state may affect its
levels of gun violence. Business practices,
cultural, or political variables are, most of the
time, unavailable or hard to measure.
The fixed effects idea
Since individual characteristics are not random
and may impact the predictor or outcome
variables, we need to control for them. In this
way, the effect of the predictors will not be
influenced by those fixed characteristics.*
The entity fixed effects model assumes that the entity’s
error term may be correlated with the predictor
variables. However, one entity’s fixed effects cannot
be correlated with another entity’s.
* See Stock and Watson, 2003, p.289-290
The model (1)
The entity fixed effects regression model is
$Y_{it} = \alpha_i + \beta X_{it} + u_i + e_{it}$
i = 1…n ; t = 1…T
Where:
$Y_{it}$ is the outcome variable (for entity i at time t).
$\alpha_i$ is the unknown intercept for each entity (n entity-specific intercepts).
$X_{it}$ is a vector of predictors (for entity i at time t).
$u_i$ is the within-entity error term; $e_{it}$ is the overall error term.

Interpretation of the $\beta$ coefficient: for a given entity, when a
predictor changes one unit over time, the outcome will
increase/decrease by $\beta$ units (assuming no transformation is
applied).* Here, $\beta$ represents a common effect across entities
controlling for individual heterogeneity.
* See Bartels, Brandon, “Beyond “Fixed Versus Random Effects”: A framework for improving substantive and statistical
analysis of panel, time-series cross-sectional, and multilevel data”, Stony Brook University, working paper, 2008
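The name "within estimator" can be motivated by a short derivation. The sketch below is the standard textbook within transformation written in the notation of this slide; the bar notation for entity means is an addition for illustration and does not appear in the original slides.

```latex
\begin{align*}
Y_{it} &= \alpha_i + \beta X_{it} + u_i + e_{it} \\
\bar{Y}_i &= \alpha_i + \beta \bar{X}_i + u_i + \bar{e}_i
  \qquad \text{where } \bar{Y}_i = \tfrac{1}{T}\sum_{t=1}^{T} Y_{it}
  \text{ is the mean of entity } i \text{ over time} \\
Y_{it} - \bar{Y}_i &= \beta\,(X_{it} - \bar{X}_i) + (e_{it} - \bar{e}_i)
\end{align*}
```

Because $\alpha_i$ and $u_i$ do not vary over time, they cancel in the last line, so $\beta$ is estimated only from variation within each entity.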
The model (2)
The entity and time fixed effects regression model is
$Y_{it} = \alpha_i + \beta X_{it} + \delta_t + u_i + e_{it}$
i = 1…n ; t = 1…T
Where:
$Y_{it}$ is the outcome variable (for entity i at time t).
$\alpha_i$ is the unknown intercept for each entity (n entity-specific intercepts).
$X_{it}$ is a vector of predictors (for entity i at time t).
$\delta_t$ is the unknown coefficient for the time regressors (t).
$u_i$ is the within-entity error term; $e_{it}$ is the overall error term.

Interpretation of a $\beta$ coefficient: for a given entity, when a
predictor changes one unit over time, the outcome will
increase/decrease by $\beta$ units (assuming no transformation is
applied).* Here, $\beta$ represents a common effect across entities
controlling for individual and time heterogeneity.
* See Bartels, Brandon, “Beyond “Fixed Versus Random Effects”: A framework for improving substantive and statistical
analysis of panel, time-series cross-sectional, and multilevel data”, Stony Brook University, working paper, 2008
The data: the long form
To analyze panel data:
• Variables should be in columns.
• Entity and time in rows.
This format is known as long form.

Entity  Year  Y   X1  X2  X3  …
1       1     #   #   #   #   …
1       2     #   #   #   #   …
1       3     #   #   #   #   …
:       :     :   :   :   :   :
2       1     #   #   #   #   …
2       2     #   #   #   #   …
2       3     #   #   #   #   …
:       :     :   :   :   :   :
3       1     #   #   #   #   …
3       2     #   #   #   #   …
3       3     #   #   #   #   …
Wide form data (time in columns)
If your dataset is in wide format (either entity or time
in columns), you need to reshape it to long format
(you can do this in Stata).
Beware that Stata does not accept numbers as column
(variable) names. You need to add a letter to the numbers
before importing into Stata. If you have something
like the following:
Wide form data (time in columns)
Add a letter to the numeric column names, for example,
an ‘x’ before the year:

Import into Stata

Reshaping from wide to long
Once in Stata, you can reshape it using the command reshape:

gen id = _n
order id
reshape long x , i(id) j(year)
rename x gdp

Type help reshape for more details.
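The wide-to-long step for time in columns can be tried on a small made-up dataset. This is a minimal sketch: the countries "A" and "B", the three years, and the values are hypothetical and only stand in for the imported spreadsheet, whose columns are assumed to already carry the 'x' prefix.

```stata
* Hypothetical wide data: one row per country, one column per year
clear
input str10 country x2000 x2001 x2002
"A" 1.5 1.7 1.9
"B" 2.1 2.0 2.4
end

* One id per row, then stack the x* columns into a single variable
gen id = _n
order id
reshape long x , i(id) j(year)
rename x gdp

* Result: one row per country-year
list, sepby(id)
```

After the reshape, the number that followed the 'x' prefix in each column name is stored in the new variable year.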
Wide form data (entity in columns)
If the wide format data has the entities in columns
and time in rows, like this example:
Wide form data (entity in columns)
Import it into Stata:

Reshape wide to long format
Once in Stata, you can reshape it using the command reshape:

* Adding the prefix ‘gdp’ to column names. The command ‘renvars’ is
* user-written and needs to be installed first (see note below).
renvars A-G, pref(gdp)
gen id = _n
order id
reshape long gdp , i(id) j(country) str

Type help reshape for more details.

To install renvars, type:
search renvars
Click on the link for dm88_* then install.
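The entity-in-columns case can be sketched the same way on made-up data. Here the three entity columns A, B, C and their values are hypothetical, and the built-in rename command with explicit name lists is swapped in for renvars so that nothing needs to be installed; it is not the method used on the slide.

```stata
* Hypothetical wide data: one row per year, one column per entity
clear
input year A B C
2000 1.5 2.1 0.9
2001 1.7 2.0 1.1
end

* Add the 'gdp' prefix with built-in rename (stand-in for renvars)
rename (A B C) (gdpA gdpB gdpC)

gen id = _n
order id

* The suffix after 'gdp' is text, so j() needs the string option
reshape long gdp , i(id) j(country) string

list, sepby(id)
```

The new variable country is a string, which is why it has to be encoded before it can be used with xtset.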
Setting data as panel
Once the data is in long form, we need to set it as panel so we can
use Stata’s panel data xt commands and the time series operators.
Using the example from the previous page type:

xtset country year

string variables not allowed in varlist;
Country is a string variable

Given the error, we need to have ‘country’ in numeric format. Type

encode country, gen(country1)

Then using ‘country1’ type

xtset country1 year

Panel variable: country1 (strongly balanced)
Time variable: year, 1995 to 2005
Delta: 1 unit

Balanced panel: all entities are observed across all times.
Unbalanced panel: some entities are not observed in some years. Stata algorithms automatically account for this.
Assign numbers to strings

The encode command used in the previous slide assigns a number to
each value of the string variable, in alphabetical order.
The new variable is a labeled variable where the labels are the
original strings, each assigned to a specific number.
Notice that string variables have the color red, while labeled
variables have the color blue.
Type help encode for more info.
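A small made-up example shows what encode produces. The two country names and years below are hypothetical; the commands themselves are the ones used on the slides.

```stata
* Hypothetical long-form data with a string panel identifier
clear
input str10 country year
"Peru"  2000
"Chile" 2000
"Peru"  2001
"Chile" 2001
end

* Numeric, labeled copy of the string variable
encode country, gen(country1)

* Show which number was assigned to which string
* (alphabetical order: Chile = 1, Peru = 2)
label list country1

* xtset now accepts the numeric version
xtset country1 year
```

By default encode names the value label after the new variable, which is why label list country1 works here.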
Visualizing panel data
Once the data is set as panel, you can use a series of xt commands
to analyze it. For more information type:
help xt
A useful visualization command is xtline, type:
xtline gdp

Visualizing panel data
* All in one, type:
xtline gdp, overlay

Data example
The data used in the following slides was extracted from the World
Development Indicators database:
https://databank.worldbank.org/source/world-development-indicators

Selected variables since 2000, all countries only:

• GDP per capita (constant 2015 US$)
• Exports of goods and services (constant 2015 US$)
• Imports of goods and services (constant 2015 US$)
• Labor force, total

The data was further cleaned to remove regions, subregions, and missing values
across years and variables, resulting in 126 countries.
The variable ‘trade’ was created by adding imports and exports.
Data example – histograms

hist gdppc
hist labor
hist trade

Data example – transformations

To log-transform a variable use the function ln():

gen ln_gdppc = ln(gdppc)
gen ln_labor = ln(labor)
gen ln_trade = ln(trade)

If the variable has zero or negative values, you need to add a value high
enough so the minimum value is over zero (preferably 1). For
example, if the lowest value in ‘varX’ is -1, then type:

gen ln_varX = ln(varX + 2)

The lowest value then becomes 1, and the natural log of 1 is zero.
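The shift can also be computed from the data instead of being typed by hand. This is a minimal sketch on made-up values; ‘varX’ is the placeholder name from the slide, and using r(min) from summarize is one possible way to do it, not the method shown in the original.

```stata
* Hypothetical variable whose lowest value is -1
clear
input varX
-1
0
3.5
10
end

* summarize stores the minimum in r(min)
summarize varX, meanonly

* Shift so the lowest value becomes 1, then take logs: ln(1) = 0
gen ln_varX = ln(varX - r(min) + 1)

list
```

With a minimum of -1 this is the same as ln(varX + 2) on the slide.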
Data example – histograms

hist ln_gdppc
hist ln_labor
hist ln_trade

Setting data as panel
The panel variable (country) is in string format (red color, type
browse country to see it), so we need to convert it to labeled
format (numbers with labels, blue color):

encode country, gen(country1)

Then using ‘country1’ type

xtset country1 year

Panel variable: country1 (strongly balanced)
Time variable: year, 2000 to 2021
Delta: 1 unit
Descriptive statistics
. sum gdppc trade labor // Pooled data

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       gdppc |      2,772    14925.78       19561   261.0194   112417.9
       trade |      2,772    2.39e+11    5.33e+11   1.28e+08   5.58e+12
       labor |      2,772    1.70e+07    4.54e+07      85987   4.89e+08

. xtsum gdppc trade labor // Heterogeneity by panel and time

Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
gdppc    overall |  14925.78      19561   261.0194   112417.9 |     N =    2772
         between |             19404.61   293.4895   104003.7 |     n =     126
         within  |             2991.204  -14918.74   52165.38 |     T =      22
                 |                                            |
trade    overall |  2.39e+11   5.33e+11   1.28e+08   5.58e+12 |     N =    2772
         between |             5.20e+11   3.14e+08   4.33e+12 |     n =     126
         within  |             1.27e+11  -1.14e+12   1.49e+12 |     T =      22
                 |                                            |
labor    overall |  1.70e+07   4.54e+07      85987   4.89e+08 |     N =    2772
         between |             4.54e+07     132657   4.53e+08 |     n =     126
         within  |              3154440  -4.24e+07   5.27e+07 |     T =      22

See https://www.stata.com/manuals/xtxtsum.pdf
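The three rows that xtsum reports per variable correspond to three different quantities. The summary below follows the definitions in the Stata xtsum manual entry linked above; the symbols are introduced here for illustration.

```latex
\begin{align*}
\text{overall:} &\quad x_{it} \\
\text{between:} &\quad \bar{x}_i = \tfrac{1}{T_i}\sum_{t} x_{it}
   \quad \text{(one mean per entity)} \\
\text{within:}  &\quad x_{it} - \bar{x}_i + \bar{\bar{x}}
   \quad \text{(deviation from the entity mean, plus the global mean } \bar{\bar{x}}\text{)}
\end{align*}
```

This is why the within minimum of gdppc can be negative (-14918.74) even though GDP per capita itself is never negative: it is a deviation from an entity mean, re-centered on the global mean. The counts are also consistent: N = n × T = 126 × 22 = 2772.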


Fixed effects regression using xtreg, fe
$Y_{it} = \alpha_i + \beta X_{it} + u_i + e_{it}$

. xtreg ln_gdppc ln_trade ln_labor, fe robust

Fixed-effects (within) regression               Number of obs     =      2,772
Group variable: country1                        Number of groups  =        126

R-squared:                                      Obs per group:
     Within  = 0.6267                                         min =         22
     Between = 0.3872                                         avg =       22.0
     Overall = 0.3906                                         max =         22

                                                F(2,125)          =      87.57
corr(u_i, Xb) = 0.1067                          Prob > F          =     0.0000

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
    ln_gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ln_trade |   .3603947   .0737076     4.89   0.000     .2145182    .5062712
    ln_labor |    .053167   .1608747     0.33   0.742     -.265224     .371558
       _cons |  -.9384681   1.075791    -0.87   0.385    -3.067592    1.190656
-------------+----------------------------------------------------------------
     sigma_u |  1.1155513
     sigma_e |  .10989953
         rho |  .99038791   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Reading the command and the output:
• Command: ln_gdppc is the outcome; ln_trade and ln_labor are the predictor(s); fe is the fixed effects option; robust controls for heteroskedasticity.
• Number of obs: total number of cases (rows). Number of groups: total number of entities (i).
• Prob > F: if this number is < 0.05 then your model is ok. This is an F-test to see whether all the coefficients in the model are jointly different from zero.
• corr(u_i, Xb): the within-entity errors $u_i$ are correlated with the regressors in the fixed effects model.
• Coefficients: beta coefficients indicate the change in the outcome (y) when the predictors change one unit over time. In this example, all the variables are log-transformed, so the interpretation is: when the predictor increases 1% over time, the outcome (y) changes $\beta$% (elasticity).
• P>|t|: two-tail p-values test the hypothesis that each coefficient is different from 0 (according to its t-value). A value lower than 0.05 rejects the null and leads to the conclusion that the predictor has a significant effect on the outcome (at the 5% significance level).
• rho: the intraclass correlation shows how much of the variance in the outcome is explained by the difference across entities. In this example it is 99%.
  $\text{rho} = \dfrac{(\text{sigma\_u})^2}{(\text{sigma\_u})^2 + (\text{sigma\_e})^2}$
• sigma_u = sd of residuals within groups $u_i$; sigma_e = sd of residuals (overall error term) $e_{it}$.
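The rho formula and the "dummy variable model" name can both be checked in Stata. The sketch below is not run on the slides' World Bank extract, which is not distributed with this document; it assumes internet access and uses the nlswork example dataset from Stata's web repository as a stand-in, with ln_wage, tenure and ttl_exp chosen only for illustration.

```stata
* Stand-in panel data (assumption: webuse can reach Stata's example datasets)
webuse nlswork, clear
xtset idcode year

* Entity fixed effects with cluster-robust standard errors
xtreg ln_wage tenure ttl_exp, fe vce(cluster idcode)

* rho recomputed by hand from the stored results; it should match e(rho)
display "rho by hand = " e(sigma_u)^2 / (e(sigma_u)^2 + e(sigma_e)^2)
display "rho stored  = " e(rho)

* Same formula with the sigma_u and sigma_e printed on this slide (about 0.99)
display 1.1155513^2 / (1.1155513^2 + .10989953^2)

* Dummy-variable formulation: areg absorbs one dummy per entity and
* returns the same slope coefficients as xtreg, fe
areg ln_wage tenure ttl_exp, absorb(idcode) vce(cluster idcode)
```

The slope coefficients from xtreg, fe and areg agree; the standard errors can differ slightly because the two commands count degrees of freedom differently.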
Entity and time fixed effects regression using xtreg, fe
$Y_{it} = \alpha_i + \beta X_{it} + \delta_t + u_i + e_{it}$

. xtreg ln_gdppc ln_trade ln_labor i.year, fe robust

Fixed-effects (within) regression               Number of obs     =      2,772
Group variable: country1                        Number of groups  =        126

R-squared:                                      Obs per group:
     Within  = 0.7083                                         min =         22
     Between = 0.7977                                         avg =       22.0
     Overall = 0.7581                                         max =         22

                                                F(23,125)         =      34.28
corr(u_i, Xb) = 0.7525                          Prob > F          =     0.0000

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
    ln_gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ln_trade |   .2401329   .0695213     3.45   0.001     .1025416    .3777242
    ln_labor |  -.2958837    .081081    -3.65   0.000     -.456353   -.1354145
             |
        year |
       2001  |   .0119809   .0042779     2.80   0.006     .0035144    .0204475
        ...  |        ...        ...      ...     ...          ...         ...
        ...  |        ...        ...      ...     ...          ...         ...
       2021  |   .2878247   .0705454     4.08   0.000     .1482065    .4274428
             |
       _cons |   7.213881   1.961627     3.68   0.000     3.331578    11.09619
-------------+----------------------------------------------------------------
     sigma_u |  1.0561892
     sigma_e |  .09753735
         rho |  .99154389   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Reading the command and the output:
• Command: ln_gdppc is the outcome; ln_trade and ln_labor are the predictor(s); i.year adds the time fixed effects; fe is the fixed effects option; robust controls for heteroskedasticity.
• Number of obs: total number of cases (rows). Number of groups: total number of entities (i).
• Prob > F: if this number is < 0.05 then your model is ok. This is an F-test to see whether all the coefficients in the model are jointly different from zero.
• corr(u_i, Xb): the within-entity errors $u_i$ are correlated with the regressors in the fixed effects model.
• Coefficients: beta coefficients indicate the change in the outcome (y) when the predictors change one unit over time. In this example, all the variables are log-transformed, so the interpretation is: when the predictor increases 1% over time, the outcome (y) changes $\beta$% (elasticity).
• P>|t|: two-tail p-values test the hypothesis that each coefficient is different from 0 (according to its t-value). A value lower than 0.05 rejects the null and leads to the conclusion that the predictor has a significant effect on the outcome (at the 5% significance level).
• rho: the intraclass correlation shows how much of the variance in the outcome is explained by the difference across entities. In this example it is 99%.
  $\text{rho} = \dfrac{(\text{sigma\_u})^2}{(\text{sigma\_u})^2 + (\text{sigma\_e})^2}$
• sigma_u = sd of residuals within groups $u_i$; sigma_e = sd of residuals (overall error term) $e_{it}$.
Fixed effects regression using xtreg, fe (with lags on predictors)
$Y_{it} = \alpha_i + \beta X_{it-1} + u_i + e_{it}$

. xtreg ln_gdppc l1.ln_trade l1.ln_labor, fe robust

Fixed-effects (within) regression               Number of obs     =      2,646
Group variable: country1                        Number of groups  =        126

R-squared:                                      Obs per group:
     Within  = 0.6054                                         min =         21
     Between = 0.3771                                         avg =       21.0
     Overall = 0.3799                                         max =         21

                                                F(2,125)          =      81.17
corr(u_i, Xb) = 0.1265                          Prob > F          =     0.0000

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
    ln_gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ln_trade |
         L1. |   .3385586   .0703993     4.81   0.000     .1992297    .4778875
             |
    ln_labor |
         L1. |   .0581167   .1566956     0.37   0.711    -.2520033    .3682367
             |
       _cons |  -.4600892   1.082489    -0.43   0.672     -2.60247    1.682291
-------------+----------------------------------------------------------------
     sigma_u |  1.1260807
     sigma_e |  .10685653
         rho |  .99107579   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Reading the command and the output:
• Command: ln_gdppc is the outcome; l1.ln_trade and l1.ln_labor are the predictor(s), lagged one year; fe is the fixed effects option; robust controls for heteroskedasticity.
• Number of obs: total number of cases (rows). Number of groups: total number of entities (i).
• Prob > F: if this number is < 0.05 then your model is ok. This is an F-test to see whether all the coefficients in the model are jointly different from zero.
• corr(u_i, Xb): the within-entity errors $u_i$ are correlated with the regressors in the fixed effects model.
• Coefficients: beta coefficients indicate the change in the outcome (y) when the predictors change one unit over time (a year before, “L1.”). In this example, all the variables are log-transformed, so the interpretation is: when the predictor increases 1% over time (a year before, “L1.”), the outcome (y) changes $\beta$% (elasticity).
• P>|t|: two-tail p-values test the hypothesis that each coefficient is different from 0 (according to its t-value). A value lower than 0.05 rejects the null and leads to the conclusion that the predictor has a significant effect on the outcome (at the 5% significance level).
• rho: the intraclass correlation shows how much of the variance in the outcome is explained by the difference across entities. In this example it is about 99%.
  $\text{rho} = \dfrac{(\text{sigma\_u})^2}{(\text{sigma\_u})^2 + (\text{sigma\_e})^2}$
• sigma_u = sd of residuals within groups $u_i$; sigma_e = sd of residuals (overall error term) $e_{it}$.
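The lag operator explains why the number of observations falls from 2,772 to 2,646: each of the 126 entities has no lagged value in its first year, and 126 × 21 = 2,646. A minimal sketch on made-up data (two hypothetical entities, three years) shows the mechanism.

```stata
* Hypothetical panel: 2 entities observed over 3 years
clear
input id year x
1 2000 10
1 2001 12
1 2002 15
2 2000 7
2 2001 8
2 2002 9
end

* Time-series operators such as L1. require the data to be xtset
xtset id year

* The lag is taken within each entity, never across entities
gen x_lag = L1.x
list, sepby(id)

* One missing lag per entity (the first year of each)
count if missing(x_lag)
```

Rows with a missing lag are dropped from the estimation sample automatically.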
Entity fixed effects regression using reghdfe
$Y_{it} = \alpha_i + \beta X_{it} + u_i + e_{it}$

. reghdfe ln_gdppc ln_trade ln_labor , absorb(country1) vce(cluster country1)

(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =      2,772
Absorbing 1 HDFE group                            F(   2,    125) =      87.57
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9943
                                                  Adj R-squared   =     0.9940
                                                  Within R-sq.    =     0.6267
Number of clusters (country1) =        126        Root MSE        =     0.1099

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
    ln_gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ln_trade |   .3603947   .0737076     4.89   0.000     .2145182    .5062712
    ln_labor |    .053167   .1608747     0.33   0.742     -.265224     .371558
       _cons |  -.9384681   1.075791    -0.87   0.385    -3.067592    1.190656
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    country1 |       126         126           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Reading the command and the output:
• Command: ln_gdppc is the outcome; ln_trade and ln_labor are the predictor(s); absorb(country1) is the fixed effects option; vce(cluster country1) controls for correlation within panels.
• Number of obs: total number of cases (rows). Number of clusters: total number of entities (i).
• Prob > F: if this number is < 0.05 then your model is ok. This is an F-test to see whether all the coefficients in the model are jointly different from zero.
• R-squared shows the percent of the variance in the outcome explained by the model. The Adj R-squared accounts for the number of variables and their contribution to explaining the variation in the outcome variable.
• Coefficients: beta coefficients indicate the change in the outcome (y) when the predictors change one unit over time. In this example, all the variables are log-transformed, so the interpretation is: when the predictor increases 1% over time, the outcome (y) changes $\beta$% (elasticity).
• P>|t|: two-tail p-values test the hypothesis that each coefficient is different from 0 (according to its t-value). A value lower than 0.05 rejects the null and leads to the conclusion that the predictor has a significant effect on the outcome (at the 5% significance level).

NOTE: Use reghdfe when controlling for multiple fixed effects or when xtreg, fe cannot run due to the number
of panels.
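reghdfe is a user-written command, so it has to be installed before first use. The sketch below assumes internet access and installs it from SSC together with ftools, which it depends on; the nlswork example dataset and the variables ln_wage, tenure and ttl_exp are stand-ins chosen for illustration, not the data used on the slides.

```stata
* One-time installation from SSC (reghdfe depends on ftools)
ssc install ftools, replace
ssc install reghdfe, replace

* Stand-in panel data from Stata's example datasets
webuse nlswork, clear

* Entity and year fixed effects absorbed, errors clustered by entity
reghdfe ln_wage tenure ttl_exp, absorb(idcode year) vce(cluster idcode)
```

Listing several variables inside absorb() is how multiple fixed effects are requested.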
Entity and time fixed effects regression using reghdfe
$Y_{it} = \alpha_i + \beta X_{it} + \delta_t + u_i + e_{it}$

. reghdfe ln_gdppc ln_trade ln_labor , absorb(country1 year) vce(cluster country1)

(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      2,772
Absorbing 2 HDFE groups                           F(   2,    125) =      11.37
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9955
                                                  Adj R-squared   =     0.9953
                                                  Within R-sq.    =     0.3050
Number of clusters (country1) =        126        Root MSE        =     0.0976

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
    ln_gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ln_trade |   .2401329   .0695213     3.45   0.001     .1025416    .3777242
    ln_labor |  -.2958837    .081081    -3.65   0.000     -.456353   -.1354145
       _cons |   7.381277   1.999695     3.69   0.000     3.423632    11.33892
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    country1 |       126         126           0    *|
        year |        22           0          22     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Reading the command and the output:
• Command: ln_gdppc is the outcome; ln_trade and ln_labor are the predictor(s); absorb(country1 year) is the fixed effects option (entity and time); vce(cluster country1) controls for correlation within panels.
• Number of obs: total number of cases (rows). Number of clusters: total number of entities (i).
• Prob > F: if this number is < 0.05 then your model is ok. This is an F-test to see whether all the coefficients in the model are jointly different from zero.
• R-squared shows the percent of the variance in the outcome explained by the model. The Adj R-squared accounts for the number of variables and their contribution to explaining the variation in the outcome variable.
• Coefficients: beta coefficients indicate the change in the outcome (y) when the predictors change one unit over time. In this example, all the variables are log-transformed, so the interpretation is: when the predictor increases 1% over time, the outcome (y) changes $\beta$% (elasticity).
• P>|t|: two-tail p-values test the hypothesis that each coefficient is different from 0 (according to its t-value). A value lower than 0.05 rejects the null and leads to the conclusion that the predictor has a significant effect on the outcome (at the 5% significance level).
Entity fixed effects regression with lags using reghdfe
$Y_{it} = \alpha_i + \beta X_{it-1} + u_i + e_{it}$

. reghdfe ln_gdppc l1.ln_trade l1.ln_labor , absorb(country1 ) vce(cluster country1)

(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =      2,646
Absorbing 1 HDFE group                            F(   2,    125) =      81.17
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.9946
                                                  Adj R-squared   =     0.9943
                                                  Within R-sq.    =     0.6054
Number of clusters (country1) =        126        Root MSE        =     0.1069

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
    ln_gdppc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ln_trade |
         L1. |   .3385586   .0703993     4.81   0.000     .1992297    .4778875
             |
    ln_labor |
         L1. |   .0581167   .1566956     0.37   0.711    -.2520033    .3682367
             |
       _cons |  -.4600892   1.082489    -0.43   0.672     -2.60247    1.682291
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    country1 |       126         126           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Reading the command and the output:
• Command: ln_gdppc is the outcome; l1.ln_trade and l1.ln_labor are the predictor(s), lagged one year; absorb(country1) is the fixed effects option; vce(cluster country1) controls for correlation within panels.
• Number of obs: total number of cases (rows). Number of clusters: total number of entities (i).
• Prob > F: if this number is < 0.05 then your model is ok. This is an F-test to see whether all the coefficients in the model are jointly different from zero.
• R-squared shows the percent of the variance in the outcome explained by the model. The Adj R-squared accounts for the number of variables and their contribution to explaining the variation in the outcome variable.
• Coefficients: beta coefficients indicate the change in the outcome (y) when the predictors change one unit over time (a year before, “L1.”). In this example, all the variables are log-transformed, so the interpretation is: when the predictor increases 1% over time, the outcome (y) changes $\beta$% (elasticity).
• P>|t|: two-tail p-values test the hypothesis that each coefficient is different from 0 (according to its t-value). A value lower than 0.05 rejects the null and leads to the conclusion that the predictor has a significant effect on the outcome (at the 5% significance level).

NOTE: you must type xtset country1 year before using lags in reghdfe.
A note on fixed effects
“...The fixed-effects model controls for all time-invariant
differences between the individuals, so the estimated coefficients
of the fixed-effects models cannot be biased because of omitted
time-invariant characteristics...[like culture, religion, gender, race,
etc].
One side effect of the features of fixed-effects models is that they
cannot be used to investigate time-invariant causes of the
dependent variables. Technically, time-invariant characteristics of
the individuals are perfectly collinear with the person [or entity]
dummies. Substantively, fixed-effects models are designed to
study the causes of changes within a person [or entity]. A time-
invariant characteristic cannot cause such a change, because it is
constant for each person.” [(Underline is mine) Kohler, Ulrich,
Frauke Kreuter, Data Analysis Using Stata, 2nd ed., p.245]

RANDOM-EFFECTS MODEL
(Random Intercept, Partial
Pooling Model)

The random effects idea
The rationale behind the random effects model is that, unlike the
fixed effects model, the variation across entities is assumed
to be random and uncorrelated with the predictor or
independent variables included in the model:
“...the crucial distinction between fixed and random effects is
whether the unobserved individual effect embodies elements that
are correlated with the regressors in the model, not whether these
effects are stochastic or not” [Greene, 2008, p.183]
If you have reason to believe that differences across entities
have some influence on your dependent variable then you
should use random effects. An advantage of random effects is
that you can include time-invariant variables (e.g. gender). In
the fixed effects model these variables are absorbed by the
intercept.
The random effects idea
Random effects assume that the entity’s error term is not
correlated with the predictors, which allows
time-invariant variables to play a role as explanatory variables.
In random effects you need to specify those individual
characteristics that may or may not influence the
predictor variables. The problem with this is that some
variables may not be available, therefore leading to
omitted variable bias in the model.
RE allows you to generalize the inferences beyond the sample
used in the model.
Random effects regression using xtreg, re
$Y_{it} = \alpha + \beta X_{it} + u_{it} + e_{it}$

. xtreg ln_gdppc ln_trade ln_labor, re robust

Random-effects GLS regression                   Number of obs     =      2,772
Group variable: country1                        Number of groups  =        126

R-squared:                                      Obs per group:
     Within  = 0.6110                                         min =         22
     Between = 0.7295                                         avg =       22.0
     Overall = 0.7212                                         max =         22

                                                Wald chi2(2)      =     192.71
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

                             (Std. err. adjusted for 126 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
    ln_gdppc | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
    ln_trade |   .4175909   .0760404     5.49   0.000     .2685543    .5666274
    ln_labor |  -.1597685   .1312262    -1.22   0.223    -.4169671    .0974302
       _cons |   .9295612   .6361615     1.46   0.144    -.3172923    2.176415
-------------+----------------------------------------------------------------
     sigma_u |  .41594682
     sigma_e |  .10989953
         rho |  .93474564   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Reading the command and the output:
• Command: ln_gdppc is the outcome; ln_trade and ln_labor are the predictor(s); re is the random effects option; robust controls for heteroskedasticity.
• Number of obs: total number of cases (rows). Number of groups: total number of entities (i).
• Prob > chi2: if this number is < 0.05 then your model is ok. This is a Wald chi-squared test to see whether all the coefficients in the model are jointly different from zero.
• corr(u_i, X) = 0 (assumed): the between-entity errors $u_{it}$ are uncorrelated with the regressors in the random effects model.
• Coefficients: beta coefficients indicate the change in the outcome (y) when the predictors change one unit over time and across entities (average effect). In this example, all the variables are log-transformed, so the interpretation is: when the predictor increases, on average, 1%, the outcome (y) changes $\beta$% (elasticity).
• P>|z|: two-tail p-values test the hypothesis that each coefficient is different from 0 (according to its z-value). A value lower than 0.05 rejects the null and leads to the conclusion that the predictor has a significant effect on the outcome (at the 5% significance level).
• rho: the intraclass correlation shows how much of the variance in the outcome is explained by the difference across entities. In this example it is about 93%.
  $\text{rho} = \dfrac{(\text{sigma\_u})^2}{(\text{sigma\_u})^2 + (\text{sigma\_e})^2}$
• sigma_u = sd of residuals within groups $u_i$; sigma_e = sd of residuals (overall error term) $e_{it}$.
FIXED OR RANDOM?

Which to choose?
Whenever there is a clear idea that individual characteristics of
each entity or group affect the regressors, use fixed effects. An
example is macroeconomic data collected for most countries
over time: there might be a good reason to believe that
countries’ economic performance may be affected by their
own internal characteristics (type of government, political
environment, cultural characteristics, type of public policies,
etc.).
Random effects is used whenever there is reason to believe
that individual characteristics have no effect on the regressors
(uncorrelated).
Which to choose?
The Hausman-test tests whether the individual characteristics are correlated with the regressors
(see Green, 2008, chapter 9). The null hypothesis is that they are not (random effects).

xtreg ln_gdppc ln_trade ln_labor, fe


estimates store fixed
xtreg ln_gdppc ln_trade ln_labor, re
estimates store random
hausman fixed random, sigmamore
. hausman fixed random, sigmamore
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| fixed random Difference Std. err.
-------------+----------------------------------------------------------------
ln_trade | .3603947 .4175909 -.0571962 .0026039
ln_labor | .053167 -.1597685 .2129354 .012825
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from xtreg.
B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
Test of H0: Difference in coefficients not systematic
chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 484.43
Prob > chi2 = 0.0000

If Prob > chi2 is < 0.05, use fixed effects.
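
The decision rule can also be read from the stored results instead of the screen. The following is a minimal sketch, assuming the data have already been declared as a panel with xtset and using the variable names from these slides. hausman leaves the statistic in r(chi2) and its p-value in r(p); the 0.05 cutoff is the rule of thumb stated on this slide, not a property of the command.

```stata
* Assumes the panel was declared with xtset beforehand
xtreg ln_gdppc ln_trade ln_labor, fe
estimates store fixed
xtreg ln_gdppc ln_trade ln_labor, re
estimates store random
hausman fixed random, sigmamore

* hausman stores the test statistic in r(chi2) and the p-value in r(p)
display "chi2 = " r(chi2) "   p-value = " r(p)
if r(p) < 0.05 {
    display "Reject H0: use fixed effects"
}
else {
    display "Do not reject H0: random effects is admissible"
}
```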

TESTS / DIAGNOSTICS

Do we need time fixed effects?
To see if time fixed effects are needed when running a FE model, use
the command testparm. It is a joint F-test of whether the coefficients on
all year dummies are jointly equal to 0 (type help testparm for more details).

xtreg ln_gdppc ln_trade ln_labor i.year, fe robust


testparm i.year
. testparm i.year

( 1) 2001.year = 0
( 2) 2002.year = 0
( 3) 2003.year = 0
( 4) 2004.year = 0
( 5) 2005.year = 0
( 6) 2006.year = 0
( 7) 2007.year = 0
( 8) 2008.year = 0
( 9) 2009.year = 0
(10) 2010.year = 0
(11) 2011.year = 0
(12) 2012.year = 0 The Prob > F is < 0.05, we fail to
(13) 2013.year = 0
(14) 2014.year = 0 accept the null that the coefficients for
(15) 2015.year = 0
(16) 2016.year = 0 the years are jointly equal to zero. In this
(17) 2017.year = 0
(18) 2018.year = 0
case, time fixed effects are needed.
(19) 2019.year = 0
(20) 2020.year = 0
(21) 2021.year = 0

F( 21, 125) = 4.44


OTR Prob > F = 0.0000 41
Do we need random effects?
The LM test helps you decide between a random effects regression
and a simple OLS regression. The null hypothesis in the LM test is
that the variance across entities is zero. That is, there is no significant
difference across units (i.e. no panel effect). The command in Stata
is xttest0; type it right after running the random effects model.

xtreg ln_gdppc ln_trade ln_labor, re robust
xttest0

. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        ln_gdppc[country1,t] = Xb + u[country1] + e[country1,t]

        Estimated results:
                         |       Var     SD = sqrt(Var)
                ---------+-----------------------------
                ln_gdppc |   2.022383       1.422105
                       e |   .0120779       .1098995
                       u |   .1730118       .4159468

        Test: Var(u) = 0
                             chibar2(01) = 19981.51
                          Prob > chibar2 =   0.0000

Prob > chibar2 is < 0.05, so we reject the null hypothesis and conclude
that random effects are needed.
Are the panels correlated? [B-P/LM test]
According to Baltagi, cross-sectional dependence is a problem in macro panels
with long time series (over 20-30 years). The null hypothesis in the B-P/LM test of
independence is that residuals across entities are not correlated. The
user-written command to run this test is xttest2 (run it after xtreg, fe):

ssc install xttest2

xtreg ln_gdppc ln_trade ln_labor, fe robust

xttest2

. xttest2

Correlation matrix of residuals:

[OMITTED]

Breusch-Pagan LM test of independence: chi2(7875) = 73886.228, Pr = 0.0000


Based on 22 complete observations over panel units

Pr < 0.05, so we reject the null hypothesis and conclude that the panels are
correlated (cross-sectional dependence).

Are the panels correlated? [Pesaran CD test]
As mentioned in the previous slide, cross-sectional dependence is more of an issue in macro panels
with long time series (over 20-30 years) than in micro panels.
The Pesaran CD (cross-sectional dependence) test is used to test whether the residuals are
correlated across entities*. Cross-sectional dependence (also called contemporaneous correlation)
can lead to bias in test results. The null hypothesis is that residuals are not correlated. The command
for the test is xtcsd; you have to install it by typing:

ssc install xtcsd
xtreg ln_gdppc ln_trade ln_labor, fe robust
xtcsd, pesaran abs

. xtcsd, pesaran abs

Pesaran's test of cross sectional independence = 9.266, Pr = 0.0000

Average absolute value of the off-diagonal elements = 0.588

Pr < 0.05, so we reject the null hypothesis and conclude that the panels
are correlated (cross-sectional dependence).

If cross-sectional dependence is present, Hoechle suggests using Driscoll and Kraay standard errors
with the command xtscc (install it by typing ssc install xtscc). Type help xtscc for more
details.
*Source: Hoechle, Daniel, “Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence”,
http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf
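
A minimal sketch of what that re-estimation might look like, using the same variables as the rest of these slides. The fe option asks xtscc for the fixed-effects (within) estimator; the lag length for the Driscoll-Kraay correction is left at the command's default here, which is an assumption you may want to revisit with help xtscc.

```stata
* Install once, then re-estimate the fixed-effects model with
* Driscoll-Kraay standard errors (robust to cross-sectional dependence)
ssc install xtscc
xtscc ln_gdppc ln_trade ln_labor, fe
```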

Testing for heteroskedasticity
A test for heteroskedasticity is available for the fixed-effects model using the
command xttest3. The null hypothesis is homoskedasticity (or constant
variance). This is a user-written program; to install it type:
ssc install xttest3
xtreg ln_gdppc ln_trade ln_labor, fe robust
xttest3
. xttest3
Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (126)  =   3.3e+05
Prob>chi2 =      0.0000

We reject the null and conclude heteroskedasticity.

NOTE: Use the option ‘robust’ to obtain heteroskedasticity-robust standard errors (also known
as Huber/White or sandwich estimators).
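
For illustration, the two spellings below should request the same robust standard errors, since robust is shorthand for vce(robust). As far as I am aware, in recent Stata releases xtreg, fe with vce(robust) clusters on the panel variable; treat that as an assumption and check help xtreg for your version.

```stata
* robust is shorthand for vce(robust); both lines request the same standard errors
xtreg ln_gdppc ln_trade ln_labor, fe robust
xtreg ln_gdppc ln_trade ln_labor, fe vce(robust)
```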
Testing for serial correlation
Serial correlation tests apply to macro panels with long time series (over 20-30 years).
It is not a problem in micro panels (with very few years). Serial correlation causes the
standard errors of the coefficients to be smaller than they actually are and the
R-squared to be higher. A Lagrange multiplier test for serial correlation is available using the command
xtserial. This is a user-written program; to install it type:
ssc install xtserial
xtreg ln_gdppc ln_trade ln_labor, fe robust
xtserial ln_gdppc ln_trade ln_labor
. xtserial ln_gdppc ln_trade ln_labor

Wooldridge test for autocorrelation in panel data


H0: no first order autocorrelation
    F(  1,     125) =    289.854
           Prob > F =      0.0000

The null is no serial correlation. Above we reject the null and conclude the data does have
first-order autocorrelation. Type help xtserial for more details.
Source: Hoechle, Daniel, “Robust Standard Errors for Panel Regressions with Cross-Sectional
Dependence”, page 4, http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf
Suggested books / references
• Introduction to econometrics / James H. Stock, Mark W. Watson. 2nd ed., Boston: Pearson
Addison Wesley, 2007.
• Econometric Analysis of Panel Data, Badi H. Baltagi, Wiley, 2008.
• Econometric Analysis / William H. Greene. 6th ed., Upper Saddle River, N.J. : Prentice Hall, 2008.
• An Introduction to Modern Econometrics Using Stata/ Christopher F. Baum, Stata Press, 2006.
• Data analysis using regression and multilevel/hierarchical models / Andrew Gelman, Jennifer Hill.
Cambridge ; New York : Cambridge University Press, 2007.
• Data Analysis Using Stata / Ulrich Kohler, Frauke Kreuter, 2nd ed., Stata Press, 2009.
• Statistics with Stata / Lawrence Hamilton, Thomson Brooks/Cole, 2006.
• Statistical Analysis: an interdisciplinary introduction to univariate & multivariate methods / Sam
Kachigan, New York : Radius Press, c1986
• “Beyond “Fixed Versus Random Effects”: A framework for improving substantive and statistical
analysis of panel, time-series cross-sectional, and multilevel data” / Brandon Bartels
http://polmeth.wustl.edu/retrieve.php?id=838
• “Robust Standard Errors for Panel Regressions with Cross-Sectional Dependence” / Daniel
Hoechle, http://fmwww.bc.edu/repec/bocode/x/xtscc_paper.pdf
• Designing Social Inquiry: Scientific Inference in Qualitative Research / Gary King, Robert
O. Keohane, Sidney Verba, Princeton University Press, 1994.
• Unifying Political Methodology: The Likelihood Theory of Statistical Inference / Gary King,
Cambridge University Press, 1989.
