Within- and between-cluster effects in generalized linear mixed models: A discussion of approaches and the xthybrid command
Francisco Perales
Institute for Social Science Research
University of Queensland
Brisbane, Australia
f.perales@uq.edu.au
1 Introduction
Researchers undertaking multilevel and panel analysis of hierarchically clustered data
often face a difficult decision between random- and fixed-effects models. Random-ef-
fects models allow researchers to estimate the effect of cluster-invariant variables (that
is, level-two variables) on the outcome variable but impose the assumption that the
random effects (for example, the level-two error) are uncorrelated with the observed
covariates. If this assumption is violated, the model coefficients are biased. Fixed-
effects models, on the other hand, do not require this assumption—and can provide
unbiased estimates of the level-one variables, even if there is unobserved heterogeneity
c 2017 StataCorp LLC st0468
90 Within- and between-cluster effects in GLMM
at the cluster level. However, fixed-effects model estimation relies only on within-cluster
variation in the explanatory and outcome variables; thus these models cannot provide
effect estimates for the level-two variables.1
More flexible modeling specifications provide fixed-effects estimates (or estimates
that are close to these) for level-one variables and allow inclusion of level-two vari-
ables, most notably the hybrid (Allison 2009) and correlated random-effects models
(Wooldridge 2010). The latter is also known as the Mundlak model (Baltagi 2006;
Mundlak 1978). These estimation strategies differentiate within- and between-cluster
effects and combine the strengths of random- and fixed-effects models. In the linear case,
they yield estimates of the level-one covariates that are unbiased by cluster-level un-
observed heterogeneity, while allowing for level-two cluster-invariant covariates (Allison
2009; Mundlak 1978; Neuhaus and Kalbfleisch 1998; Rabe-Hesketh and Skrondal 2012;
Raudenbush 1989; Schunck 2013; Snijders and Berkhof 2008).
Schunck (2013) described using these models for continuous outcome variables, pro-
viding a theoretical overview and a practical application in Stata. This article discusses
the applicability of hybrid and correlated random-effects models within the umbrella
of generalized linear mixed models (GLMM) (Brumback et al. 2010). In doing so, we
show how the decomposition of within- and between-cluster effects can be extended
to GLMM, which comprise popular models for binary, ordered, and count outcomes
(Neuhaus and Kalbfleisch 1998; Neuhaus and McCulloch 2006; Brumback et al. 2010).
Importantly, such decomposition can approximate fixed-effects estimates for specifica-
tions in which a fixed-effects estimator is not available or implemented (Neuhaus and
McCulloch 2006).
In the remainder of the article, we first elaborate on the separation of within- and
between-cluster effects in a GLMM framework, then present a user-written command,
xthybrid, that builds on Stata’s meglm command and can fit hybrid and correlated
random-effects models.
1. The terms “fixed effects” and “random effects” are not used consistently across disciplines and
literature. In the multilevel model literature, the term “fixed effects” denotes a model’s regression
coefficients, whereas the term “random effects” refers to a model’s random intercepts and slopes.
In this article, random-effects models refer to models for clustered data that have both random
effects and fixed effects (also known as multilevel models, hierarchical models, and mixed models).
In this context, a fixed-effects model refers to a model that includes only fixed effects, which is
typically a pooled or cross-sectional model that does not consider that the data may be clustered. In
econometric literature, however, the term “fixed effects model” refers to a model for clustered data
that allows for arbitrary dependence between the unobserved effects and the covariates (Wooldridge
2010, 286). The name “fixed-effects model” emerged because these models treat the unobserved
cluster-level effects as fixed rather than random (McCulloch, Searle, and Neuhaus 2008). Whether
these effects are random or nonrandom is not of concern to us. Modern econometrics assumes
they are random (Wooldridge 2010, 286). In this article, we adopt the econometric terminology for
fixed-effects models.
R. Schunck and F. Perales 91
2 GLMM
Generalized linear models (GLM) constitute a unifying framework for an entire class
of models whose response variables follow a distribution from the exponential family.
This includes many popular models such as the standard linear model, models for bi-
nary responses (for example, logit and probit models), models for ordinal responses (for
example, ordered probit and logit models), and models for count responses (for exam-
ple, Poisson and negative binomial models). In this section, we will provide a short
overview of GLM and GLMM (for details, see McCulloch, Searle, and Neuhaus [2008],
Skrondal and Rabe-Hesketh [2003], and Rabe-Hesketh and Skrondal [2012]).
Let μj be the expected response of yj given the covariates {μj = E(yj |xj )}. Then,
a GLM with yj as the response variable and xj as a covariate is defined as
g(·) is the so-called link function, which transforms the mean μj so that it can be
linearly related to the predictors. The link function therefore defines the functional
relationship between the predictors and the response variable (McCullagh and Nelder
1989; McCulloch, Searle, and Neuhaus 2008; Rabe-Hesketh, Skrondal, and Pickles 2004;
Skrondal and Rabe-Hesketh 2003). Specifying a GLM also requires choosing a condi-
tional distribution for the response variable from the exponential family of distributions.
Different combinations of link functions and distributions result in different models (see
table 1).
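As a sketch in the notation above, and assuming a single covariate with no separate intercept (the exact form of the display is an assumption here), the GLM relates the transformed mean to the linear predictor. Three common link functions then give the following mean structures:

```latex
\begin{align*}
g(\mu_j) &= \beta x_j, \qquad \mu_j = E(y_j \mid x_j) \\
\text{identity link:}\quad & \mu_j = \beta x_j \\
\text{logit link:}\quad & \ln\left(\frac{\mu_j}{1-\mu_j}\right) = \beta x_j \\
\text{log link:}\quad & \ln(\mu_j) = \beta x_j
\end{align*}
```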
GLMs can be extended to include random effects and are thus suited for analyzing
clustered data, such as multilevel and panel data. These models are known as gen-
eralized linear mixed models (GLMM). Consider a situation where we have data with
two hierarchical levels. Let i denote level two (for example, schools) and j denote level
one (for example, students). yij is the response (dependent) variable, xij is a level-one
variable that varies within and between clusters, ci is a level-two variable that varies
only between clusters, and ui is the random intercept. A GLMM is specified as
The “mixing” outlined above becomes obvious: this model “mixes” a fixed part (the
fixed coefficients β and γ) and a random part (the random intercept ui ). To relax the
assumption that the effects of level-one covariates are the same across all clusters, we
can include random slopes as follows:
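In the notation just defined, and treating the exact displays as an assumption, the random-intercept GLMM and its random-slope extension can be sketched as below. The first line corresponds to the model that the text later cites as (1); the symbol $u_{1i}$ for the cluster-specific deviation of the slope is introduced here purely for illustration.

```latex
\begin{align*}
g(\mu_{ij}) &= \beta x_{ij} + \gamma c_i + u_i
  && \text{random intercept} \\
g(\mu_{ij}) &= (\beta + u_{1i})\, x_{ij} + \gamma c_i + u_i
  && \text{random intercept and random slope}
\end{align*}
```

Here $\mu_{ij}$ is the expected response conditional on the covariates and the random effects.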
In Stata, the available link functions for GLMM comprise identity, logit, probit, log,
and complementary log-log. The available distributions from the exponential family
of distributions comprise the normal (Gaussian), Bernoulli, binomial, gamma, negative
binomial, ordinal, and Poisson distributions (see table 1). Other link functions and
distributions are theoretically possible.
                     Link function
Distribution         Identity   Log   Logit   Probit   Cloglog
Gaussian             x          x
Bernoulli                             x       x        x
Binomial                              x       x        x
Gamma                           x
Negative binomial               x
Ordinal                               x       x        x
Poisson                         x
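To make the pairing of distribution and link concrete, the following minimal sketch fits a two-level random-intercept logit model with meglm. It assumes the nlswork data used later in this article; the binary indicator ft_ind is a hypothetical variable created only for this illustration.

```stata
* Sketch only: ft_ind is a hypothetical indicator created for illustration
webuse nlswork, clear
generate byte ft_ind = hours > 30 if !missing(hours)

* Bernoulli distribution with logit link and a random intercept for idcode
meglm ft_ind age msp || idcode:, family(bernoulli) link(logit)
```

Replacing link(logit) with link(probit) or link(cloglog) keeps the same distribution but changes the functional relationship between the predictors and the response.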
Using the identity link g(μij ) = μij and the Gaussian distribution for yij yields a
linear random-intercept model,
where the conditional distribution of yij is yij | xij, ci, ui ∼ N(μij, σ²). If the outcome
variable is binary, the expected value for yij is the probability that yij = 1; that is,
μij = Pr(yij = 1|xij , ci , ui ). Combining a Bernoulli or a binomial distribution for the
response variable with a probit link results in the random-intercept probit model
where Φ⁻¹(·) is the inverse of the standard normal cumulative distribution function.
Here the conditional distribution of yij is yij |xij , ci , ui ∼ B(1, πij ). We specify a random-
effects logit model by choosing the logit link:
This rationale can extend to other, more complex models, for example, models for or-
dered and count outcomes (for an overview, see McCulloch, Searle, and Neuhaus [2008];
Skrondal and Rabe-Hesketh [2003, 2004]).
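Under the same assumed notation, the three special cases discussed in this section can be written out as follows. This is a sketch rather than a quotation of the original displays; $\epsilon_{ij}$ denotes the level-one error and $\pi_{ij} = \Pr(y_{ij} = 1 \mid x_{ij}, c_i, u_i)$.

```latex
\begin{align*}
\text{linear:}\quad & y_{ij} = \beta x_{ij} + \gamma c_i + u_i + \epsilon_{ij} \\
\text{probit:}\quad & \Phi^{-1}(\pi_{ij}) = \beta x_{ij} + \gamma c_i + u_i \\
\text{logit:}\quad & \ln\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = \beta x_{ij} + \gamma c_i + u_i
\end{align*}
```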
$$ y_{ij} = \beta_{LSDV}\, x_{ij} + \sum_{i=1}^{n} \beta_i k_i + \epsilon_{ij} \tag{3} $$
This model is the least-squares dummy variable (LSDV) estimator (Wooldridge 2010). It
fits n intercepts, one for each of the n clusters, represented by the dummy variables ki. This approach
provides consistent estimation of the level-one covariates without the assumptions of
the random-effects model that the cluster-specific intercepts are random variables and
uncorrelated with the covariates. Instead, they are explicitly included in the model
and estimated as fixed effects. Thus the estimated level-one effects are not biased by
level-two unobserved heterogeneity, because the cluster dummies absorb it entirely. Note that estimates
may still be biased because of unobserved heterogeneity at level one. The models still
assume that E(εij | xij, ki) = 0, where εij is the level-one error, which we will treat as white
noise for the remainder of this article.
A disadvantage of the LSDV model is that we cannot retrieve the effect of level-
two variables (ci). Because the dummy variables capture the overall cluster effects,
any cluster-level covariate is perfectly collinear with them, so its effect cannot be
identified. Additionally, estimating each cluster effect is impractical when there
is a large number of clusters. Furthermore, using maximum likelihood to fit these
models leads to inconsistent parameter estimates. This is because of the incidental
parameters problem; the number of fixed effects parameters (or “nuisance parameters”)
increases with sample size, which leads to inconsistent estimates when using maximum
likelihood estimation (Andersen 1970; Chamberlain 1980; Wooldridge 2010). This makes
it infeasible to use this approach for models estimated using maximum likelihood—
including GLMM.
An alternative approach to fitting a fixed-effects (FE) model in the linear case is
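demeaning the explanatory and outcome variables. We do this by subtracting the
between model
$$ \bar{y}_i = \beta \bar{x}_i + \gamma c_i + u_i + \bar{\epsilon}_i $$
from the random-effects model
$$ y_{ij} = \beta x_{ij} + \gamma c_i + u_i + \epsilon_{ij} \tag{4} $$
Note that (4) is equivalent to (2.2). Because c̄i = ci and ūi = ui, the subtraction leads
to
$$ (y_{ij} - \bar{y}_i) = \beta_{FE} (x_{ij} - \bar{x}_i) + (\epsilon_{ij} - \bar{\epsilon}_i) \tag{5} $$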
This technique, also called the “within transformation”, removes all elements of
(4) that do not vary within clusters, including the level-two error term, ui. Thus this
FE model does not require any assumptions on the distribution of ui or its correlation
with the covariates. The estimates of β from the LSDV model in (3) and the
FE model in (5) are identical (βLSDV = βFE).2 Thus the fixed-effects model in its
demeaned form (5) also provides estimates of level-one covariates that are unbiased by
unobserved heterogeneity at level two. Because any level-two characteristic, observed
or unobserved, is removed from the equation, we cannot retrieve the effects of level-two
covariates, just as in the LSDV model (3).
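The equivalence of the demeaned and dummy-variable estimators can be checked numerically. The following is a minimal sketch, assuming the nlswork data used later in this article; the helper variables m_* and d_* and the scalars are hypothetical names chosen for illustration.

```stata
* Sketch only: m_* and d_* are hypothetical helper variables
webuse nlswork, clear
keep if !missing(hours, age, msp)      // same sample for all three fits

foreach v of varlist hours age msp {
    bysort idcode: egen double m_`v' = mean(`v')   // cluster mean
    generate double d_`v' = `v' - m_`v'            // within deviation
}

* OLS on demeaned data: same slopes, but the SEs ignore the absorbed means
regress d_hours d_age d_msp, noconstant
scalar b_demeaned = _b[d_age]

* Built-in within estimator
xtreg hours age msp, i(idcode) fe
scalar b_xtreg = _b[age]

* Dummy-variable regression with the cluster dummies absorbed
areg hours age msp, absorb(idcode)
scalar b_areg = _b[age]

display b_demeaned " " b_xtreg " " b_areg
```

The three slope estimates for age should agree up to numerical precision. Only the standard errors from the hand-demeaned regression differ, because regress does not account for the degrees of freedom used by the cluster means.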
The above refers only to the linear case. For some GLMM that are estimated
with maximum likelihood (for example, logit models), we can compute a similar fixed-
effects estimator using a conditional likelihood approach (Chamberlain 1980; McCulloch,
Searle, and Neuhaus 2008; Wooldridge 2010). The conditional likelihood approach uses
a sufficient statistic to remove the level-two error from the equation (Chamberlain 1980;
McCulloch, Searle, and Neuhaus 2008; Wooldridge 2010). The conditional likelihood
approach can thus forgo any assumptions on the level-two error. Therefore, conditional
likelihood models are also referred to as fixed-effects models. However, conditional like-
lihood approaches are unavailable for most GLMM, so this approach is not always a
viable alternative (McCulloch, Searle, and Neuhaus 2008; Wooldridge 2010).
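For the logit case, the conditional likelihood estimator is available in Stata through clogit and through the fe option of xtlogit. The sketch below assumes the nlswork data used later in this article; ft_ind is a hypothetical binary indicator created only for this illustration.

```stata
* Sketch only: ft_ind is a hypothetical indicator created for illustration
webuse nlswork, clear
generate byte ft_ind = hours > 30 if !missing(hours)

* Conditional (fixed-effects) logit, conditioning within each woman
clogit ft_ind age msp, group(idcode)

* The same estimator through the panel-data interface
xtlogit ft_ind age msp, i(idcode) fe
```

Clusters in which the outcome never varies do not contribute to the conditional likelihood, so both commands report a smaller estimation sample than a random-effects logit would use.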
Altogether, fixed-effects models provide less biased estimates of the level-one covari-
ates than random-effects models, but unlike random-effects models, they fail to retrieve
the effect estimates of level-two variables. Depending on the nature of the research
question, disregarding between-cluster variation can even be seen as an advantage. For
example, this is sometimes regarded as focusing on the informative cases (Halaby 2004;
Wooldridge 2010, 621f.). This holds in particular for longitudinal research investigating
how change in an explanatory variable, xij , is associated with change in an outcome
variable yij . However, failure to retrieve the effect estimates of level-two variables is a
major problem in multilevel analysis, where the interest often lies in these effects, for
example, how the characteristics of neighborhoods, schools, workplaces, or geographical
areas influence individuals’ outcomes (Sampson 2003; Sampson, Raudenbush, and Earls
1997). Because the fixed-effects approach discards all contextual (level-two) informa-
tion, some argue that it is generally less preferable than the random-effects approach
for multilevel analysis (Bell and Jones 2015).
2. To be precise, both (3) and (5) are fixed-effects models. Thus we could label both estimators of β
with the subscript FE.
This is accomplished by including both the deviation from the cluster-specific mean
(xij − x̄i) and the cluster-specific mean x̄i among the model covariates. βW gives the
within-cluster effect, and βB gives the between-cluster effect.
The correlated random-effects model (Wooldridge 2010), sometimes called the Mund-
lak (1978) model, is mathematically equivalent to the hybrid model. However, in con-
trast to (6), it includes the level-one variable (xij ) in its undemeaned form:
for x̄i, then the estimated effect of ci is not adjusted for between-cluster differences
in xij . Note that consistent estimation of the effects of level-two variables still rests
on the assumption that there is no correlation between these and the level-two error
{E(ui |ci ) = 0}.
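The relationship between the two parameterizations can be made explicit. As a sketch in the notation of this section (the displays themselves are assumed, with $\tau$ denoting the coefficient on the cluster mean in the correlated random-effects model and $u_i$ the remaining level-two error), expanding the hybrid specification gives:

```latex
\begin{align*}
g(\mu_{ij}) &= \beta_W (x_{ij} - \bar{x}_i) + \beta_B \bar{x}_i + \gamma c_i + u_i
  && \text{hybrid} \\
&= \beta_W x_{ij} + (\beta_B - \beta_W)\, \bar{x}_i + \gamma c_i + u_i \\
&= \beta_W x_{ij} + \tau \bar{x}_i + \gamma c_i + u_i,
  \qquad \tau = \beta_B - \beta_W
  && \text{correlated random effects}
\end{align*}
```

Both forms therefore share the same within-cluster coefficient $\beta_W$ and differ only in how the coefficient on the cluster mean is expressed.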
In both multilevel and panel-data models, it is helpful to compare between- and
within-cluster effects for pragmatic reasons (Allison 2009; Schunck 2013). The random-
effects model (1) assumes that both effects are the same and uses a weighted average
of within- and between-cluster variation in estimation. However, if this assumption
does not hold, the estimates of the random-effects model are biased. The comparison
of within- and between-cluster effects is in fact a regression-based alternative to the
Hausman specification test (Baltagi 2013, 76–77). In the hybrid model (6), one can test
whether βW = βB using a Wald test. In the correlated random-effects model (7), one
can test whether τ = 0. Note that both tests are mathematically equivalent and yield
the same test statistics. This is because τ = βB − βW . If the between-cluster effect βB
and the within-cluster effect βW are not statistically significantly different from each
other (which implies that τ = 0), this suggests that βW = βB = β. In this case, (6) and
(7) simplify to (1), the standard random-intercept model. Substantively, this means
that the random-effects model’s assumption of a zero correlation between the level-two
error and the level-one covariates holds. In contrast to the Hausman test, this test can
also be used when we estimate (cluster) robust standard errors (SEs). Furthermore, it
also works if the difference of the covariance matrices in the Hausman test is not positive
definite. An additional advantage of this test is that it works at the level of individual
variables. Thus one can use the more efficient random-effects estimate for those variables
for which within and between effects do not differ significantly and retain both within
and between effects for those variables for which they are significantly different.
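The tests described above can also be carried out by hand in the linear case. The following minimal sketch builds the within and between components manually and fits both specifications with mixed. It assumes the nlswork data used later in this article; wi_* and bw_* are hypothetical variable names, chosen so that they do not clash with names that xthybrid creates.

```stata
* Sketch only: wi_* (within) and bw_* (between) are hypothetical names
webuse nlswork, clear
keep if !missing(hours, age, msp, race)

foreach v of varlist age msp {
    bysort idcode: egen double bw_`v' = mean(`v')   // cluster mean
    generate double wi_`v' = `v' - bw_`v'           // deviation from mean
}

* Hybrid specification: within and between terms plus a level-two covariate
mixed hours wi_age wi_msp bw_age bw_msp i.race || idcode:
test wi_age = bw_age          // Wald test of equal within and between effects
test wi_msp = bw_msp

* Correlated random-effects specification: raw covariates plus cluster means
mixed hours age msp bw_age bw_msp i.race || idcode:
test bw_age                   // Wald test that the cluster-mean coefficient is 0
```

Because the coefficient on the cluster mean in the second model equals the difference between the between and within coefficients in the first, the two tests for age address the same hypothesis.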
assumption, that is, if the level-two error and the independent variables at level one
are not completely but only linearly uncorrelated. Failure to meet this assumption
may result in biased estimates (Brumback et al. 2010; Brumback, Dailey, and Zheng
2012). However, note that this assumption is still less restrictive than the assumption
of complete independence between the level-two error and the level-one variables of (1).
If we implement a conditional likelihood approach for a model belonging to the
family of GLMM, we can easily compare the estimates from (6) and (7) against actual
fixed-effects estimates. For example, this is possible for logit models. Unfortunately,
conditional likelihood approaches are not available for many GLMM. Differences between
fixed-effects estimates and within-cluster estimates from hybrid and correlated random-
effects models may suggest that ui depends on xij through functional forms other than
the cluster means of the cluster-varying covariates (Brumback et al. 2010, 1652).3 Given
enough observations within clusters, one can explicitly model other dependencies. For
instance, one can do so by adding polynomial functions of the cluster means of the
level-one covariates to the model (Allison 2014). A correlated random-effects model
with additional quadratic and cubic terms is given by
Here we assume that $u_i = \tau \bar{x}_i + \delta \bar{x}_i^2 + \eta \bar{x}_i^3 + v_i$ and $v_i \sim N(0, \sigma_v^2)$. Just as in (7), a
test of τ = 0, δ = 0, and η = 0 can serve as inference regarding dependencies between
ui and xij . If the estimates of δ and η are not statistically significant, we can take this
as evidence that the assumption of the correlated random-effects model is not violated.
The equivalent hybrid model is
Note that the estimated effects of the nonlinear dependencies are the same for both
(8) and (9). In the hybrid model, however, inference regarding dependencies between
ui and xij requires a test of βW = βB , δ = 0, and η = 0. Thus the only difference
between a correlated random-effects model with nonlinear dependencies and a hybrid
model with nonlinear dependencies lies in τ .
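Written out under the assumption stated above for $u_i$ (a sketch rather than a quotation of the displays), the two specifications with quadratic and cubic cluster-mean terms differ only in how $x_{ij}$ enters:

```latex
\begin{align*}
\text{correlated random effects:}\quad
g(\mu_{ij}) &= \beta_W x_{ij} + \tau \bar{x}_i + \delta \bar{x}_i^2
  + \eta \bar{x}_i^3 + \gamma c_i + v_i \\
\text{hybrid:}\quad
g(\mu_{ij}) &= \beta_W (x_{ij} - \bar{x}_i) + \beta_B \bar{x}_i
  + \delta \bar{x}_i^2 + \eta \bar{x}_i^3 + \gamma c_i + v_i
\end{align*}
```

Substituting $\beta_B = \beta_W + \tau$ into the second line reproduces the first, which is why $\delta$ and $\eta$ are the same in both.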
One may be tempted to compare the estimated within-cluster effects (βW ) from
(6) or (7) with the effect estimates obtained by analogous models, including nonlin-
ear dependencies. If βW does not differ substantially when we account for additional
nonlinear dependencies, we could take this as evidence that the assumption of the
hybrid and correlated random-effects models holds. However, such a comparison is
complicated by the fact that estimates in nonlinear models with fixed error variance
at level one are not directly comparable because of the “rescaling problem” (Allison
1999; Kohler, Karlson, and Holm 2011). This extends to comparisons of βW from (6)
and βW from (9) and of βW from (7) and βW from (8). Including additional nonlinear
dependencies will affect the estimate of βW , even if these are orthogonal to xij because
of the rescaling problem. Consequently, βW from (6) cannot be identical to βW from
(9), and the same applies to the βW estimates from (7) and (8). Thus we should ex-
ert caution when comparing estimates from nonlinear models with different functional
forms of the dependency between ui and xij . A strict comparison of coefficients is not
possible (Allison 1999; Kohler, Karlson, and Holm 2011). Instead, one should carefully
inspect if the additional nonlinear dependencies are statistically significant or use a
likelihood-ratio (LR) test to decide whether to remove or retain them.
What is the advantage of using a hybrid model with random slopes over a standard
random-slope model? Again, in the standard random-effects model, we assume the
random effects (including the random slope) to be uncorrelated with any unobserved
characteristics at level two. Using a hybrid model relaxes this assumption. One can also
specify correlated random-effects models with random slopes. However, while correlated
random-effects and hybrid models without random slopes produce equivalent results
(both in terms of fixed coefficients and variance components), this is not the case if
random slopes are present (Kreft, de Leeuw, and Aiken 1995).
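For the linear case, a hybrid model with a random slope on the within component can be sketched by hand with mixed. The example assumes the nlswork data used later in this article; wi_* and bw_* are hypothetical variable names introduced for illustration.

```stata
* Sketch only: wi_* (within) and bw_* (between) are hypothetical names
webuse nlswork, clear
keep if !missing(hours, age, msp)

foreach v of varlist age msp {
    bysort idcode: egen double bw_`v' = mean(`v')   // cluster mean
    generate double wi_`v' = `v' - bw_`v'           // deviation from mean
}

* Random intercept plus a random slope on the within-cluster age deviation
mixed hours wi_age wi_msp bw_age bw_msp || idcode: wi_age
```

Placing the random slope on the within deviation rather than on the raw covariate is the design choice that distinguishes this from a correlated random-effects model with a random slope.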
5.1 Syntax
xthybrid relies on Stata’s meglm command to estimate hybrid and correlated random-
effects versions of any two-level specification that can be fit with meglm. The syntax for
xthybrid is
xthybrid depvar indepvars [if] [in], clusterid(varname) [family(type)
link(type) cre nonlinearities(type) randomslope(varlist) use(varlist)
percentage(#) test full stats(list) se t p star vce(vcetype) iterations
meglmoptions(list)]
5.2 Options
clusterid(varname) specifies the cluster or grouping variable. clusterid() is re-
quired.
family(type) specifies the distribution of the outcome variable. type may be gaussian,
bernoulli, binomial, gamma, nbinomial, ordinal, or poisson. The default is
family(gaussian).
link(type) specifies the link function. type may be identity, log, logit, probit, or
cloglog. The default is link(identity).
cre requests a correlated random-effects model instead of a hybrid model.
nonlinearities(type) adds polynomial functions of the cluster means to the model.
type may be quadratic, cubic, or quartic.
randomslope(varlist) requests random slopes on the random-effect and within-group
coefficients of selected variables.
use(varlist) splits between- and within-cluster effects only for selected explanatory vari-
ables.
percentage(#) sets the minimum percent within-cluster variance for explanatory vari-
ables to be considered cluster varying.
test presents test results of the random-effects assumption for separate model variables.
full prints the full model output (meglm).
stats(list) allows users to select which model summary statistics are reported.
se requests SEs for the parameters on model variables.
t requests t-values for the parameters on model variables.
p requests p-values for the parameters on model variables.
star requests stars to denote statistically significant parameters on model variables.
vce(vcetype) specifies the type of SE to be reported. vcetype may be oim, robust, or
cluster clustervar.
iterations requests that the command be executed noisily.
meglmoptions(list) enables the user to request options from the meglm command.
5.4 Applications
We now illustrate the xthybrid command through practical examples. We use Stata’s
nlswork.dta, which contains unbalanced panel data on a sample of 4,711 young, work-
ing, American women, observed up to 15 times between 1968 and 1988. The data are
hence nested so that the level-one units are person-year observations and the level-two
units are individuals. The cluster variable is idcode. Suppose that our interest is in
the relationships between several socioeconomic factors (age, msp, and race) and the
number of weekly hours worked (hours). Two of these factors are level-one variables
that vary both within and between clusters (age and msp), whereas one is a level-two
variable that varies only between clusters (race).
We first open the dataset, describe its contents, and create dummy variables out of
the race variable:
. webuse nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. describe idcode hours age msp race
storage display value
variable name type format label variable label
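The dummy-variable step mentioned above could be carried out as in the following sketch. It assumes that race in nlswork.dta is coded 1 = white, 2 = black, and 3 = other; the two tabulate calls are included so that this coding can be checked before relying on it.

```stata
* Sketch only: assumes race is coded 1 = white, 2 = black, 3 = other
webuse nlswork, clear
tabulate race                // labeled categories
tabulate race, nolabel       // underlying numeric codes

* Indicator variables, with white (assumed code 1) as the reference category
generate byte black = race == 2 if !missing(race)
generate byte other = race == 3 if !missing(race)
```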
We can estimate the relationships between the variables of interest as a linear hybrid
model using the xthybrid command. The se option requests SEs to be displayed below
model coefficients, while the test option requests separate tests of the random-effects
assumption (τ = 0 or βW = βB depending on the specification) for individual regressors.
. xthybrid hours age msp black other, clusterid(idcode) se test
The variable 'black' does not vary sufficiently within clusters
and will not be used to create additional regressors.
[0% of the total variance in 'black' is within clusters]
The variable 'other' does not vary sufficiently within clusters
and will not be used to create additional regressors.
[0% of the total variance in 'other' is within clusters]
Variable                       model
hours
  R_black                     0.5470
                              0.2278
  R_other                    -0.0404
                              0.9396
  W_age                      -0.0236
                              0.0096
  W_msp                      -1.1661
                              0.1529
  B_age                       0.0552
                              0.0200
  B_msp                      -3.3647
                              0.2685
  _cons                      36.5595
                              0.5957
var(_cons[idcode])
  _cons                      30.2310
                              0.9900
var(e.hours)
  _cons                      68.6073
                              0.6334
Statistics
  ll                      -1.032e+05
  chi2                      259.5143
  p                           0.0000
  aic                      2.065e+05
  bic                      2.066e+05
legend: b/se
Level 1: 28428 units. Level 2: 4709 units.
Tests of the random effects assumption:
_b[B_age] = _b[W_age]; p-value: 0.0004
_b[B_msp] = _b[W_msp]; p-value: 0.0000
In the xthybrid output, variables with the W prefix denote within-cluster effects,
variables with the B prefix denote between-cluster effects, and variables with the R
prefix are those for which their effects are estimated the same as those in a standard
random-effects model.4
As expected, xthybrid estimates two separate effects for the level-one variables age
and msp. The coefficients W_age (−0.024) and W_msp (−1.166) give the within-cluster
effects. Within-cluster increases in age are associated with a within-cluster decrease in
hours, as are within-cluster increases in msp. That is, women in this sample work fewer
hours in those years in which they are older and married relative to those years in
which they are younger and unmarried, all else being equal.
4. xthybrid generates these variables in the background during operation. Users should be aware—if
variables with the same name already exist in the active dataset, xthybrid will issue an error.
The coefficients B_age and B_msp give the between-cluster effects of the level-one
variables. For B_age, the estimated coefficient (0.055) indicates that a between-individual
one-year increase in age is associated with a small increase in work hours, suggesting
that women who are older on average in the data (that is, women from older cohorts)
work more hours. For B_msp, the estimated coefficient
(−3.365) indicates that on average, women who are always married in the data work about
three hours less than women who are never married, all other things being equal.
The within-cluster effects are statistically different from the between-cluster effects,
as can be seen from the small p-values in the formal tests of the random-effects
assumption of orthogonality between the observables and the unobservables
(_b[B_age] = _b[W_age], p-value: 0.0004; _b[B_msp] = _b[W_msp], p-value: 0.0000).
This constitutes evidence against that assumption and, therefore, against using a standard
random-effects model.
An analogous correlated random-effects model can be estimated using xthybrid by
adding the cre option (the results are presented in table 2):
. xthybrid hours age msp black other, clusterid(idcode) se test cre
(output omitted )
For the sake of comparison, estimates from analogous standard random-effects and
fixed-effects models (estimated using xtreg) are also presented in table 2. Such models
are fit as follows:5
. xtreg hours age msp black other, i(idcode) re
(output omitted )
. xtreg hours age msp black other, i(idcode) fe
(output omitted )
5. Note that we include the level-two variables black and other in the fixed-effects model although
they are omitted in the estimation. This ensures that the fixed-effects model uses the same sample
as the random-effects model. There are, obviously, more elegant ways to define the analysis sample
(see, for example, Schunck [2013]).
Table 2. Coefficients from linear models (identity link and Gaussian distribution)
In the correlated random-effects model, the coefficients W_age (−0.024) and W_msp
(−1.166) give the within-cluster effects of age and msp and are identical to those
estimated in the hybrid model. In these linear models, the within-cluster effects
estimated by both the hybrid and correlated random-effects models are the same as
those from a standard fixed-effects model.
The coefficients D_age (0.079) and D_msp (−2.199) give the difference between the
between- and within-cluster effects. For example, using the estimated between- and
within-cluster effects for the variable age from the hybrid model (that is, B_age and
W_age) and the D_age coefficient in the correlated random-effects model, we see that
0.055 − (−0.024) = 0.079.
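The same check can be repeated for msp. Using the rounded hybrid-model estimates reported in the output above, and writing $\tau$ for the coefficient on the cluster mean (a symbol borrowed from the model discussion), the arithmetic is:

```latex
\begin{align*}
\tau_{\text{msp}} &= \beta_{B,\text{msp}} - \beta_{W,\text{msp}} \\
&= -3.365 - (-1.166) \\
&= -2.199
\end{align*}
```

This agrees with the reported D_msp coefficient.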
In the correlated random-effects model, the coefficients on the cluster-invariant
variables R_black (0.547) and R_other (−0.040) are estimated like those in a standard
random-effects regression model and are identical to those in the hybrid model. For
these to be unbiased, the random-effects assumption of orthogonality between observ-
ables and unobservables at level two must still hold. Note that these coefficients are not
identical to those in the standard random-effects model. The inclusion of the cluster-
mean variables accounts for additional sources of between-cluster variation, which affects
the estimated effects of these level-two variables (Schunck 2013, 71).
We can also use xthybrid to fit a model with random slopes for level-one variables.
For instance, if we wanted to allow the slope of age to vary across clusters, we would
specify
. xthybrid hours age msp black other, clusterid(idcode) se randomslope(age)
> iterations
The variable 'black' does not vary sufficiently within clusters
and will not be used to create additional regressors.
[0% of the total variance in 'black' is within clusters]
The variable 'other' does not vary sufficiently within clusters
and will not be used to create additional regressors.
[0% of the total variance in 'other' is within clusters]
Fitting fixed-effects model:
Iteration 0: log likelihood = -105142.73
Iteration 1: log likelihood = -105142.73
Refining starting values:
Grid node 0: log likelihood = -104177.16
Fitting full model:
Iteration 0: log likelihood = -104177.16 (not concave)
Iteration 1: log likelihood = -104034.22
Iteration 2: log likelihood = -102461.4
Iteration 3: log likelihood = -102325.48
Iteration 4: log likelihood = -102298.87
Iteration 5: log likelihood = -102298.83
Iteration 6: log likelihood = -102298.83
idcode
var(W_age) .4968284 .0233867 .4530423 .5448465
var(_cons) 34.09215 1.030189 32.13164 36.17227
LR test vs. linear model: chi2(2) = 5687.80 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
Note that we have additionally specified the iterations option, which requests
xthybrid to display the underlying meglm output. This is a sensible choice to detect
possible convergence problems when models become more complicated.
An LR test, which compares the restricted model (the model without the random
slope) with the unrestricted model (the model with the random slope), indicates that
the model with a random slope for age fits the data better (LR χ2 (1) = 1861.97).
. xthybrid hours age msp black other, clusterid(idcode) se
(output omitted )
. estimates store model1
. xthybrid hours age msp black other, clusterid(idcode) se randomslope(age)
(output omitted )
. estimates store model2
. lrtest model1 model2
Likelihood-ratio test LR chi2(1) = 1861.97
(Assumption: model1 nested in model2) Prob > chi2 = 0.0000
Note: The reported degrees of freedom assumes the null hypothesis is not on the
boundary of the parameter space. If this is not true, then the reported
test is conservative.
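The LR statistics in this output can be checked by hand from the log likelihoods. As a sketch using only figures printed in the logs above (with $\ell$ denoting a maximized log likelihood, a symbol introduced here for illustration):

```latex
\begin{align*}
LR &= 2\,(\ell_{\text{unrestricted}} - \ell_{\text{restricted}}) \\
\text{versus linear model:}\quad
LR &= 2\,\{-102298.83 - (-105142.73)\} = 2 \times 2843.90 = 5687.80 \\
\text{random slope:}\quad
\ell_{\text{model1}} &= -102298.83 - \frac{1861.97}{2} \approx -103229.8
\end{align*}
```

The second line reproduces the chi2(2) statistic reported in the meglm output. The log likelihood implied for the random-intercept model in the third line is consistent with the rounded value of -1.032e+05 shown in the first xthybrid table.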
Because xthybrid relies on the Stata meglm command, it can easily handle nonlinear
models. To illustrate this, we transform our continuous outcome variable (hours) into
a dummy variable (full_time) where 1 indicates that the respondent works full-time
(> 30 hours) and 0 indicates that the respondent works part-time (≤ 30 hours). We
then fit a model using the logit link and the binomial distribution for the outcome
variable, which yields a logistic regression (table 3):
. generate full_time = hours > 30 if hours!=.
. xthybrid full_time age msp black other, clusterid(idcode) family(binomial)
> link(logit) se
(output omitted )
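To make the decomposition behind the hybrid logit model concrete, the following do-file sketch builds the between-cluster and within-cluster components by hand and passes them to meglm. The variable names m_* and d_* are hypothetical (xthybrid labels its own terms with the prefixes B_, W_, and R_), and restricting the data to complete cases before computing the cluster means is an assumption about how the estimation sample should be defined. The sketch is meant to show the logic of the model, not the internal code of xthybrid.

```stata
* Sketch of a hybrid logit model built by hand (hypothetical names m_*, d_*)
preserve

* Assumption: compute cluster means on the complete-case estimation sample
keep if !missing(full_time, age, msp, black, other)

foreach v of varlist age msp {
    * between component: cluster-specific mean
    bysort idcode: egen double m_`v' = mean(`v')
    * within component: deviation from the cluster-specific mean
    generate double d_`v' = `v' - m_`v'
}

* Random-intercept logit with within, between, and cluster-invariant terms
meglm full_time d_age d_msp m_age m_msp black other || idcode:, ///
    family(binomial) link(logit)

restore
```

Replacing d_age and d_msp with the original variables age and msp, while keeping the cluster means, would give the corresponding correlated random-effects specification.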
The results show that, in nonlinear specifications too, the estimated within-cluster
effects of the level-one variables and the estimated effects of the level-two variables
are the same in the hybrid model (column 1) and the correlated random-effects model
(column 2). The corresponding standard random-intercept and fixed-effects logit models
are specified as (for results, see table 3)
. xtlogit full_time age msp black other, i(idcode) re
(output omitted )
. xtlogit full_time age msp black other, i(idcode) fe
(output omitted )
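To view the two standard models side by side, their results can be stored and tabulated. The following do-file sketch uses only official Stata commands; the names logit_re and logit_fe are hypothetical labels chosen for illustration.

```stata
* Sketch: store the random-intercept and fixed-effects logit results
quietly xtlogit full_time age msp black other, i(idcode) re
estimates store logit_re

quietly xtlogit full_time age msp black other, i(idcode) fe
estimates store logit_fe

* Coefficients with standard errors beneath, plus sample size and log likelihood
estimates table logit_re logit_fe, b(%9.4f) se(%9.4f) stats(N ll)
```

The fixed-effects column contains no coefficients for black and other, because these variables do not vary within clusters.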
Table 3. Coefficients from logit models (logit link and binomial distribution)
show the results of hybrid models fit using xthybrid for three specifications for which
a fixed-effects approach is not readily available:6
6. Note that a conditional likelihood approach for the negative binomial model is implemented
in Stata, but this method does not control for unobserved heterogeneity at the cluster level
(Allison and Waterman 2002; Greene 2007; Guimarães 2008).
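----------------------------------
           Variable |        model
--------------------+-------------
full_time           |
            R_black |       0.2474
                    |       0.0436
            R_other |       0.1282
                    |       0.1773
              W_age |      -0.0093
                    |       0.0019
              W_msp |      -0.2390
                    |       0.0325
              B_age |       0.0761
                    |       0.0335
            B_age_2 |      -0.0013
                    |       0.0006
              B_msp |       0.2753
                    |       0.2024
            B_msp_2 |      -0.8287
                    |       0.1879
              _cons |       0.2231
                    |       0.4845
--------------------+-------------
var(_cons[idcode])  |
              _cons |       0.8631
                    |       0.0418
--------------------+-------------
Statistics          |
                 ll |   -1.203e+04
               chi2 |     316.8241
                  p |       0.0000
                aic |   24082.5895
                bic |   24165.1408
----------------------------------
                      legend: b/se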
Level 1: 28428 units. Level 2: 4709 units.
Tests of the random effects assumption:
_b[B_age] = _b[W_age]; p-value: 0.0108
_b[B_msp] = _b[W_msp]; p-value: 0.0125
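The tests listed beneath the table compare each between-cluster coefficient with its within-cluster counterpart. Assuming that the xthybrid results are still active in memory and that the coefficients carry the names shown in the table, comparable Wald tests could be requested manually as in the sketch below. This is an illustration under those assumptions and not necessarily how xthybrid computes the reported p-values.

```stata
* Sketch: Wald tests of equal between- and within-cluster coefficients
* (assumes the hybrid model results are the active estimation results)
test _b[B_age] = _b[W_age]
test _b[B_msp] = _b[W_msp]

* Joint test of both equalities
test (_b[B_age] = _b[W_age]) (_b[B_msp] = _b[W_msp])
```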
The quadratic terms of the cluster means for the variables msp and age are
statistically significant (B_age_2 = −0.001, SE = 0.001; B_msp_2 = −0.829, SE = 0.188),
but their inclusion has negligible effects (from a substantive significance standpoint) on
the estimated within-cluster effects. However, an LR test suggests that the model with
the quadratic terms of the cluster means fits the data better than a model without.
Including a cubic function does not further improve the model fit:7
. quietly xthybrid full_time age msp black other, clusterid(idcode)
> family(binomial) link(probit) se test
. estimates store model3
7. Note that a Wald test of joint insignificance of the quadratic and the cubic terms comes to the same
conclusion. This is not surprising, considering that the LR test and the Wald test are asymptotically
equivalent (Johnston and DiNardo 1997, 150).
In this case, we would choose the model that includes quadratic terms of the cluster
means. Finally, as with the linear model, xthybrid can fit nonlinear models with
random slopes. For instance, a hybrid GLMM with a logit link and binomial distribution
(a hybrid logit model) with a random slope on age is specified by adding the
randomslope(age) option to the hybrid logit specification shown earlier.
Note that we fit GLMMs with maximum likelihood estimation (Skrondal and Rabe-
Hesketh 2004; McCulloch, Searle, and Neuhaus 2008; Wooldridge 2010). This applies
also to models fit using Stata’s meglm command, on which the xthybrid command
is based. Stata uses numerical integration to calculate and maximize the likelihood.
Because these methods are computationally intensive, the models may take some time
to converge. It is therefore sensible to start by fitting simple models, with few random
terms.
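The advice to build up model complexity gradually can be sketched as the do-file below. It combines only options that appear in the examples above (family(), link(), se, iterations, and randomslope()); that these options can be combined in exactly this way is an assumption, and the names logit_ri and logit_rs are hypothetical.

```stata
* Step 1: random-intercept hybrid logit; iterations displays the meglm log
xthybrid full_time age msp black other, clusterid(idcode) ///
    family(binomial) link(logit) se iterations
estimates store logit_ri

* Step 2: add a random slope on age only once step 1 has converged
xthybrid full_time age msp black other, clusterid(idcode) ///
    family(binomial) link(logit) se iterations randomslope(age)
estimates store logit_rs

* Compare the two specifications (the boundary caveat noted earlier applies)
lrtest logit_ri logit_rs
```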
Maximum likelihood estimates are consistent, asymptotically normally distributed,
and asymptotically efficient (McCulloch and Neuhaus 2013; Wooldridge 2010). However,
maximum likelihood estimation does not perform well with small samples, often
providing biased estimates (Neuhaus and McCulloch 2006, 79). The minimum number
of observations required for robust maximum likelihood estimation of GLMMs depends
on the model specification (for discussions on sample sizes in mixed models, see
Maas and Hox [2005]; Moineddin, Matheson, and Glazier [2007]; Schunck [2016]).
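Before fitting a model, it can help to check how many clusters are available and how many observations each contributes. The following do-file sketch does this for the example data; the variable names n_i and pickone are hypothetical, and the restriction to complete cases is an assumption about the estimation sample.

```stata
* Sketch: number of clusters and distribution of cluster sizes
preserve
keep if !missing(full_time, age, msp, black, other)

bysort idcode: generate long n_i = _N     // observations per cluster
egen byte pickone = tag(idcode)           // flags one row per cluster

count if pickone                          // number of level-two units
summarize n_i if pickone, detail          // distribution of cluster sizes
restore
```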
6 Conclusion
We have discussed the rationale behind hybrid and correlated random-effects models
(Allison 2009; Schunck 2013; Wooldridge 2010) for clustered data, focusing on how we
can adapt these to specifications that fall under the umbrella of GLMM (McCullagh and
Nelder 1989; McCulloch, Searle, and Neuhaus 2008; Rabe-Hesketh, Skrondal, and Pick-
les 2004; Skrondal and Rabe-Hesketh 2003). We introduced the user-written xthybrid
command as an accessible and flexible tool to fit these models using Stata.
7 Acknowledgments
The authors thank Marco Giesselmann, Philipp Lersch, and an anonymous reviewer for
their constructive and helpful feedback and Yangtao Huang for assistance testing the
xthybrid command. This research was supported by the Australian Research Council
Centre of Excellence for Children and Families over the Life Course (project number
CE140100027).
8 References
Allison, P. D. 1999. Comparing logit and probit coefficients across groups. Sociological
Methods and Research 28: 186–208.
Baltagi, B. H., ed. 2006. Panel Data Econometrics: Theoretical Contributions and
Empirical Applications. Amsterdam: Elsevier.
Baltagi, B. H. 2013. Econometric Analysis of Panel Data. 5th ed. New York: Wiley.
Bell, A., and K. Jones. 2015. Explaining fixed effects: Random effects modeling of
time-series cross-sectional and panel data. Political Science Research and Methods 3:
133–153.
Greene, W. 2007. Functional form and heterogeneity in models for count data. Founda-
tions and Trends in Econometrics 1: 113–218.
Guimarães, P. 2008. The fixed effects negative binomial model revisited. Economics
Letters 99: 63–66.
Halaby, C. N. 2004. Panel models in sociological research: Theory into practice. Annual
Review of Sociology 30: 507–544.
Hsiao, C. 2003. Analysis of Panel Data. 2nd ed. Cambridge: Cambridge University
Press.
Johnston, J., and J. DiNardo. 1997. Econometric Methods. 4th ed. New York: McGraw–
Hill.
Kohler, U., K. B. Karlson, and A. Holm. 2011. Comparing coefficients of nested non-
linear probability models. Stata Journal 11: 420–438.
Kreft, I. G. G., J. de Leeuw, and L. S. Aiken. 1995. The effect of different forms of
centering in hierarchical linear models. Multivariate Behavioral Research 30: 1–21.
Maas, C. J. M., and J. J. Hox. 2005. Sufficient sample sizes for multilevel modeling.
Methodology 1: 86–92.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London:
Chapman & Hall/CRC.
McCulloch, C. E., and J. M. Neuhaus. 2013. Generalized linear mixed models: Estima-
tion and inference. In The SAGE Handbook of Multilevel Modeling, ed. M. A. Scott,
J. S. Simonoff, and B. D. Marx, 271–286. London: Sage.
Mundlak, Y. 1978. On the pooling of time series and cross section data. Econometrica
46: 69–85.
Rabe-Hesketh, S., and A. Skrondal. 2012. Multilevel and Longitudinal Modeling Using
Stata. 3rd ed. College Station, TX: Stata Press.
Skrondal, A., and S. Rabe-Hesketh. 2003. Some applications of generalized linear la-
tent and mixed models in epidemiology: Repeated measures, measurement error and
multilevel modeling. Norwegian Journal of Epidemiology 13: 265–278.
Skrondal, A., and S. Rabe-Hesketh. 2004. Generalized Latent Variable Modeling:
Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: Chapman &
Hall/CRC.
Snijders, T. A. B., and J. Berkhof. 2008. Diagnostic checks for multilevel models. In
Handbook of Multilevel Analysis, ed. J. de Leeuw and E. Meijer, 141–175. Berlin:
Springer.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd
ed. Cambridge, MA: MIT Press.