Ch10 Slides: Econometrics - MBA
Panel Data
‘Introductory Econometrics for Finance’ © Chris Brooks 2008
The Nature of Panel Data
• Panel data, also known as longitudinal data, have both time series and cross-
sectional dimensions.
• They arise when we measure the same collection of people or objects over a
period of time.
• Econometrically, the setup is
y_it = α + β x_it + u_it
where y_it is the dependent variable, α is the intercept term, β is a k×1 vector of parameters to be estimated on the explanatory variables, and x_it is a 1×k vector of observations on the explanatory variables, with i = 1, …, N cross-sectional units and t = 1, …, T time periods.
• There are a number of advantages from using a full panel technique when a panel of data is available.
• We can address a broader range of issues and tackle more complex problems
with panel data than would be possible with pure time series or pure cross-
sectional data alone.
• One approach to making fuller use of the structure of the data would be to
use the SUR framework initially proposed by Zellner (1962). This has been used
widely in finance where the requirement is to model several closely related
variables over time.
• A SUR is so-called because the dependent variables may seem unrelated across
the equations at first sight, but a more careful consideration would allow us to
conclude that they are in fact related after all.
• Under the SUR approach, one would allow for the contemporaneous
relationships between the error terms in the equations by using a generalised
least squares (GLS) technique.
• The idea behind SUR is essentially to transform the model so that the error terms
become uncorrelated. If the correlations between the error terms in the individual
equations had been zero in the first place, then SUR on the system of equations
would have been equivalent to running separate OLS regressions on each
equation.
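• To make the mechanics concrete, the sketch below is a minimal two-step feasible GLS estimator for a SUR system in Python/numpy. It is purely illustrative: the function name, the data layout (equal-length equations passed in lists) and the use of the contemporaneous residual covariance are assumptions made for this example, not code from the text.
```python
import numpy as np

def sur_fgls(ys, Xs):
    """Two-step feasible GLS for a SUR system (illustrative sketch).

    ys : list of length-T dependent-variable arrays, one per equation
    Xs : list of (T, k_j) regressor matrices (each including a constant)
    Returns the stacked GLS coefficient vector.
    """
    T, M = ys[0].shape[0], len(ys)
    # Step 1: equation-by-equation OLS, kept only to estimate the
    # contemporaneous error covariance across equations
    resids = []
    for y, X in zip(ys, Xs):
        b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
        resids.append(y - X @ b_ols)
    Sigma = np.cov(np.column_stack(resids), rowvar=False, bias=True)
    # Step 2: GLS on the stacked system with Omega = Sigma (kron) I_T,
    # which transforms away the cross-equation error correlation
    X_big = np.zeros((M * T, sum(X.shape[1] for X in Xs)))
    col = 0
    for j, X in enumerate(Xs):
        X_big[j * T:(j + 1) * T, col:col + X.shape[1]] = X
        col += X.shape[1]
    y_big = np.concatenate(ys)
    Om_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
    return np.linalg.solve(X_big.T @ Om_inv @ X_big, X_big.T @ Om_inv @ y_big)
```
• If the estimated cross-equation correlations were zero, Σ would be diagonal and the second step would collapse back to separate OLS regressions on each equation, which is exactly the equivalence noted above.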
• There are two main classes of panel techniques: the fixed effects estimator
and the random effects estimator.
• The fixed effects model for some variable yit may be written
y_it = α + β x_it + μ_i + v_it
• We can think of μ_i as encapsulating all of the variables that affect y_it cross-sectionally but do not vary over time – for example, the sector that a firm operates in, a person's gender, or the country where a bank has its headquarters, etc. Thus we would capture the heterogeneity that is encapsulated in μ_i by a method that allows for different intercepts for each cross-sectional unit.
• This could be achieved with a least squares dummy variable (LSDV) model,
y_it = β x_it + μ_1 D1_i + μ_2 D2_i + … + μ_N DN_i + v_it
where D1_i is a dummy variable that takes the value 1 for all observations on the first entity (e.g., the first firm) in the sample and zero otherwise, D2_i is a dummy variable that takes the value 1 for all observations on the second entity (e.g., the second firm) and zero otherwise, and so on.
• The LSDV can be seen as just a standard regression model and therefore it can
be estimated using OLS.
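• As a simple illustration of the LSDV approach, the sketch below builds a tiny synthetic panel and estimates one intercept per firm by OLS. All names and numbers (the firms, the true slope of 0.7, the firm effects) are made up for the example and are not from the chapter.
```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small synthetic panel purely for illustration: 4 firms, 10 years each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "firm": np.repeat(["A", "B", "C", "D"], 10),
    "x": rng.normal(size=40),
})
firm_effect = df["firm"].map({"A": 1.0, "B": 2.0, "C": -1.0, "D": 0.5})
df["y"] = 0.7 * df["x"] + firm_effect + rng.normal(scale=0.1, size=40)

# LSDV: C(firm) expands into one dummy per firm; '- 1' drops the common
# intercept so each firm gets its own intercept mu_i, all estimated by OLS.
lsdv = smf.ols("y ~ x + C(firm) - 1", data=df).fit()
print(lsdv.params)   # the slope on x plus one estimated intercept per firm
```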
• Now the model given by the equation above has N+k parameters to estimate. In
order to avoid the necessity to estimate so many dummy variable parameters, a
transformation, known as the within transformation, is used to simplify matters.
• The within transformation involves subtracting the time-mean of each entity away
from the values of the variable.
• So define ȳ_i = (1/T) Σ_{t=1}^{T} y_it as the time-mean of the observations for cross-sectional unit i,
and similarly calculate the means of all of the explanatory variables.
• Then we can subtract the time-means from each variable to obtain a regression
containing demeaned variables only.
• Note that such a regression does not require an intercept term since now the
dependent variable will have zero mean by construction.
• The model containing the demeaned variables is
y_it − ȳ_i = β(x_it − x̄_i) + u_it − ū_i
• We could write this as
ÿ_it = β ẍ_it + ü_it
where the double dots above the variables denote the demeaned values.
• This model can be estimated using OLS, but we need to make a degrees of
freedom correction.
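• Equivalently, the within transformation can be applied by hand and the demeaned regression run without an intercept. A minimal sketch, reusing the hypothetical df from the LSDV example above, is:
```python
import statsmodels.api as sm

# Subtract each firm's time-mean from y and x (the within transformation)
demeaned = df[["y", "x"]] - df.groupby("firm")[["y", "x"]].transform("mean")

# No intercept is needed: the demeaned variables have mean zero by construction
within = sm.OLS(demeaned["y"], demeaned[["x"]]).fit()

# Caveat: the reported standard errors use NT - k degrees of freedom, whereas
# the within estimator has implicitly used up N entity means, so they should
# be rescaled by sqrt((NT - k) / (NT - N - k)) -- the correction noted above.
print(within.params["x"])
```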
• Time-variation in the intercept terms can be allowed for in exactly the same way as
with entity fixed effects. That is, a least squares dummy variable model could be
estimated
y_it = β x_it + λ_1 D1_t + λ_2 D2_t + … + λ_T DT_t + v_it
where D1t, for example, denotes a dummy variable that takes the value 1 for the first
time period and zero elsewhere, and so on.
• The only difference is that now, the dummy variables capture time variation rather
than cross-sectional variation. Similarly, in order to avoid estimating a model
containing all T dummies, a within transformation can be conducted to subtract
away the cross-sectional averages from each observation
• Finally, it is possible to allow for both entity fixed effects and time fixed effects
within the same model. Such a model would be termed a two-way error component
model, and the LSDV equivalent model would contain both cross-sectional and
time dummies
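• In practice such models are usually estimated with a dedicated panel routine rather than explicit dummies. As a hedged sketch (the linearmodels package and the synthetic data below are assumptions for illustration, not part of the slides), both sets of effects can be absorbed at once:
```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

# Tiny synthetic panel for illustration: 4 entities observed over 10 periods,
# indexed by the (entity, time) MultiIndex that linearmodels expects
rng = np.random.default_rng(1)
idx = pd.MultiIndex.from_product([["A", "B", "C", "D"], range(10)],
                                 names=["entity", "time"])
panel = pd.DataFrame({"x": rng.normal(size=40)}, index=idx)
panel["y"] = 0.7 * panel["x"] + rng.normal(scale=0.1, size=40)

# entity_effects and time_effects together give the two-way error
# components (LSDV-equivalent) specification described above
twoway = PanelOLS(panel["y"], panel[["x"]],
                  entity_effects=True, time_effects=True).fit()
print(twoway.params)
```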
• The model also includes several variables that capture time-varying bank-specific effects on revenues and costs. These are: RISKASS, the ratio of provisions to total assets; ASSET, bank size as measured by total assets; and BR, the ratio of the bank's number of branches to the total number of branches for all banks.
• Finally, GROWTH_t is the rate of growth of GDP, which obviously varies over time but is constant across banks at a given point in time; μ_i is a bank-specific fixed effect and v_it is an idiosyncratic disturbance term. The contestability parameter, H, is given by H = α_1 + α_2 + α_3.
• Unfortunately, the Panzar-Rosse approach is only valid when applied to a banking market in long-run equilibrium. Hence the authors also conduct a test for this, which centres on the regression
ln ROA_it = α′_0 + α′_1 ln PL_it + α′_2 ln PK_it + α′_3 ln PF_it + β′_1 ln RISKASS_it + …
• The explanatory variables for the equilibrium test regression are identical to
those of the contestability regression but the dependent variable is now the
log of the return on assets (lnROA).
• Equilibrium is argued to exist in the market if α′_1 + α′_2 + α′_3 = 0.
• Matthews et al. employ a fixed effects panel data model which allows for
differing intercepts across the banks, but assumes that these effects are fixed
over time.
• The fixed effects approach is a sensible one given the data analysed here
since there is an unusually large number of years (25) compared with the
number of banks (12), resulting in a total of 219 bank-years (observations).
• The data employed in the study are obtained from banks' annual reports and
the Annual Abstract of Banking Statistics from the British Bankers
Association. The analysis is conducted for the whole sample period, 1980-
2004, and for two sub-samples, 1980-1991 and 1992-2004.
• The null hypothesis that the bank fixed effects are jointly zero (H0: μ_i = 0 for all i) is
rejected at the 1% significance level for the full sample and for the second
sub-sample but not at all for the first sub-sample.
• Overall, however, this indicates the usefulness of the fixed effects panel
model that allows for bank heterogeneity.
• The main focus of interest in the table on the previous slide is the
equilibrium test, and this shows slight evidence of disequilibrium (E is
significantly different from zero at the 10% level) for the whole sample, but
not for either of the individual sub-samples.
• Thus the conclusion is that the market appears to be sufficiently in a state of
equilibrium that it is valid to continue to investigate the extent of competition
using the Panzar-Rosse methodology. The results of this are presented on the
following slide.
• The value of the contestability parameter, H, which is the sum of the input
elasticities, falls in value from 0.78 in the first sub-sample to 0.46 in the second,
suggesting that the degree of competition in UK retail banking weakened over
the period.
• However, the results in the two rows above show that the null hypotheses that H = 0 and H = 1 can both be rejected at the 1% significance level for both sub-samples, indicating that the market is best characterised by monopolistic competition.
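• Both restrictions are linear in the estimated input-price elasticities, so they can be examined with standard F/Wald tests on the fitted revenue equation. The sketch below is purely illustrative: bank_df and every variable name in it are placeholders standing in for the authors' (unavailable) data, not the actual study code.
```python
import statsmodels.formula.api as smf

# Hypothetical LSDV version of the revenue equation; 'bank_df' and the
# column names are placeholders, not the dataset used by Matthews et al.
rev = smf.ols("lnREV ~ lnPL + lnPK + lnPF + lnRISKASS + lnASSET + lnBR "
              "+ GROWTH + C(bank)", data=bank_df).fit()

H = rev.params[["lnPL", "lnPK", "lnPF"]].sum()      # contestability parameter
test_H0 = rev.f_test("lnPL + lnPK + lnPF = 0")      # monopoly / collusion
test_H1 = rev.f_test("lnPL + lnPK + lnPF = 1")      # perfect competition
print(H, test_H0.pvalue, test_H1.pvalue)
```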
• As for the equilibrium regressions, the null hypothesis that the fixed effects
dummies (μ_i) are jointly zero is strongly rejected, vindicating the use of the fixed
effects panel approach and suggesting that the base levels of the dependent
variables differ.
• Finally, the additional bank control variables all appear to have intuitively
appealing signs. For example, the risk assets variable has a positive sign, so that
higher risks lead to higher revenue per unit of total assets; the asset variable has
a negative sign, and is statistically significant at the 5% level or below in all
three periods, suggesting that smaller banks are more profitable.
• The random effects model for y_it may be written
y_it = α + β x_it + ω_it,   where ω_it = ε_i + v_it
• Unlike the fixed effects model, there are no dummy variables to capture the heterogeneity (variation) in the cross-sectional dimension. Instead, this occurs via the ε_i terms.
• Note that this framework requires the assumptions that the new cross-sectional error term, ε_i, has zero mean, is independent of the individual observation error term v_it, has constant variance, and is independent of the explanatory variables.
• The parameters (α and the β vector) are estimated consistently but inefficiently by OLS, and the conventional formulae would have to be modified as a result of the cross-correlations between error terms for a given cross-sectional unit at different points in time.
• Instead, a generalised least squares (GLS) procedure is usually used. The
transformation involved in this GLS procedure is to subtract a weighted
mean of the yit over time (i.e. part of the mean rather than the whole mean, as
was the case for fixed effects estimation).
• This transformation will be precisely that required to ensure that there are no cross-
correlations in the error terms, but fortunately it should automatically be
implemented by standard software packages.
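• For reference, the quasi-demeaning involved takes the standard textbook form below; the weight θ and the variance components σ_ε² and σ_v² are notation introduced here to make the transformation explicit, and are not defined in these slides.
```latex
% Random effects GLS: quasi-demeaned regression (standard textbook form;
% theta and the variance components are notation added for this aside).
\[
  y_{it} - \theta \bar{y}_i
    = \alpha(1 - \theta) + \beta\,(x_{it} - \theta \bar{x}_i)
      + (\omega_{it} - \theta \bar{\omega}_i),
  \qquad
  \theta = 1 - \sqrt{\frac{\sigma_v^{2}}{T\,\sigma_\varepsilon^{2} + \sigma_v^{2}}}.
\]
```
• When σ_ε² = 0 the weight is zero and the estimator reduces to pooled OLS, while as Tσ_ε² grows large θ approaches one and the transformation approaches the full within (fixed effects) demeaning.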
• Just as for the fixed effects model, with random effects, it is also conceptually no
more difficult to allow for time variation than it is to allow for cross-sectional
variation.
• In the case of time-variation, a time period-specific error term is included and again,
a two-way model could be envisaged to allow the intercepts to vary both cross-
sectionally and over time.
• It is often said that the random effects model is more appropriate when the entities in
the sample can be thought of as having been randomly selected from the population,
but a fixed effect model is more plausible when the entities in the sample effectively
constitute the entire population.
• More technically, the transformation involved in the GLS procedure under the random
effects approach will not remove the explanatory variables that do not vary over time,
and hence their impact can be enumerated.
• Also, since there are fewer parameters to be estimated with the random effects model
(no dummy variables or within transform to perform), and therefore degrees of
freedom are saved, the random effects model should produce more efficient
estimation than the fixed effects approach.
• However, the random effects approach has a major drawback which arises from the fact that it is valid only when the composite error term ω_it is uncorrelated with all of the explanatory variables.
• This assumption is more stringent than the corresponding one in the fixed effects case, because with random effects we thus require both ε_i and v_it to be independent of all of the x_it.
• This can also be viewed as a consideration of whether any unobserved
omitted variables (that were allowed for by having different intercepts for
each entity) are uncorrelated with the included explanatory variables. If they
are uncorrelated, a random effects approach can be used; otherwise the fixed
effects model is preferable.
• A test for whether this assumption is valid for the random effects estimator is
based on a slightly more complex version of the Hausman test.
• If the assumption does not hold, the parameter estimates will be biased and
inconsistent.
• To see how this arises, suppose that we have only one explanatory variable, x_2it, that varies positively with y_it and also with the error term, ω_it. The estimator will ascribe all of any increase in y to x when in reality some of it arises from the error term, resulting in biased coefficients.
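• A hedged sketch of the Hausman-type comparison is given below, using the linearmodels package on a tiny synthetic panel. It implements the classic textbook form of the statistic rather than the "slightly more complex version" referred to above, and every name and number in it is illustrative.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from linearmodels.panel import PanelOLS, RandomEffects

# Synthetic (entity, time) panel purely for illustration
rng = np.random.default_rng(2)
idx = pd.MultiIndex.from_product([["A", "B", "C", "D"], range(10)],
                                 names=["entity", "time"])
data = pd.DataFrame({"x": rng.normal(size=40)}, index=idx)
data["y"] = 0.7 * data["x"] + rng.normal(scale=0.1, size=40)

fe = PanelOLS(data["y"], data[["x"]], entity_effects=True).fit()
re = RandomEffects(data["y"], sm.add_constant(data[["x"]])).fit()

# Classic Hausman statistic on the slope(s) common to both estimators:
# a large value suggests the random effects orthogonality assumption fails.
b = fe.params[["x"]] - re.params[["x"]]
V = fe.cov.loc[["x"], ["x"]] - re.cov.loc[["x"], ["x"]]
hausman = float(b @ np.linalg.inv(V) @ b)   # ~ chi2(1) under H0
# Note: in small samples V need not be positive definite, a known quirk.
print(hausman)
```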
• The data cover the period 1993-2000 and are obtained from BankScope.
• These are: weakness parent bank, defined as loan loss provisions made by
the parent bank; solvency is the ratio of equity to total assets; liquidity is the
ratio of liquid assets / total assets; size is the ratio of total bank assets to total
banking assets in the given country; profitability is return on assets and
efficiency is net interest margin.
• In this model the β's are parameters (or vectors of parameters in the cases of β_4 and β_5), ε_i is the unobserved random effect that varies across banks but not over time, and v_it is an idiosyncratic error term.
• de Haas and van Lelyveld discuss the various techniques that could be
employed to estimate such a model.
• OLS is considered to be inappropriate since it does not allow for differences
in average credit market growth rates at the bank level.
• A model allowing for entity-specific effects (i.e. a fixed effects model that
effectively allowed for a different intercept for each bank) is ruled out on the
grounds that there are many more banks than time periods and thus too many
parameters would be required to be estimated.
• They also argue that these bank-specific effects are not of interest to the
problem at hand, which leads them to select the random effects panel model.
• This essentially allows for a different error structure for each bank. A
Hausman test is conducted, and shows that the random effects model is valid
since the bank-specific effects i are found “in most cases not to be
significantly correlated with the explanatory variables.”
Source: de Haas and van Lelyveld (2006)
Analysis of Results
• The main result is that during times of banking disasters, domestic banks
significantly reduce their credit growth rates (i.e. the parameter estimate on the
crisis variable is negative for domestic banks), while the parameter is close to
zero and not significant for foreign banks.
• This indicates that, as the authors expected, when foreign banks have fewer viable
lending opportunities in their own countries and hence a lower opportunity cost
for the loanable funds, they may switch their resources to the host country.
• Lending rates, both at home and in the host country, have little impact on credit
market share growth.