Generalized Linear Models Using Trajectories Estimated From a Linear Mixed Model
Division of Biostatistics
Kitasato University Graduate School of Pharmaceutical Sciences
1 Introduction
Longitudinal studies are designed to measure intra-individual change over time. In recent
years, longitudinal data have received much attention in many fields such as biomedical
research and economics. One reason for the interest stems from the new opportunities
provided by longitudinal data to develop predictive models of subsequent outcomes given
the current data for an individual. In other words, trajectories, or longitudinally changing
patterns of repeated measurements of variables up to a given time $t$, may afford predictive
ability for subsequent observations that are measured after time $t$, the termination of the
trajectory.
A growth curve model based on a linear mixed model helps investigate the overall
pattern of change in repeated measurements over time, that is, the trajectories, which
can be used to predict subsequent observations. Several works have addressed the prediction
of a separate outcome variable, given the current data for an individual, by using trajectories
from a growth curve model. Dang et al. (2007) developed a method to use
the estimated trajectories of each subject by a bivariate growth curve from longitudinal
measures in a Cox proportional hazards model to predict a separate outcome. Other works
explored the effects of a measurement error in a time-varying covariate for a mixed model
applied to a longitudinal study: Tosteson et al. (1998) used a likelihood-based method
of estimation to fit mixed models with measurement errors in covariates. Regression
calibration is simple and potentially applicable to any regression model, and it tends to
be most useful for estimating parameters in GLMs with covariate measurement errors.
One challenge in predicting a subsequent outcome is that the features of the longitudinal
profiles are observed only through the longitudinal measurements, which are subject
2 Statistical Models
Several modeling approaches have been described in the literature to deal with longitudinal
profiles of the data, among them linear mixed models and generalized linear models.
This section briefly reviews these models.
2.1
In mixed effects models, random effects are used to describe the correlation structure
in the data and the responses are usually assumed to be independent conditional on the
random effects (Verbeke and Molenberghs, 2000; Molenberghs and Verbeke, 2005). The
key feature of mixed models is the presence of parameters that vary randomly with the
subunits (e.g., persons in longitudinal data, studies in meta-analysis). For example,
linear mixed models are models that incorporate both fixed effects, which are parameters
associated with an entire population or with certain repeatable levels of the experimental
factors, and random effects, which are associated with individual experimental units drawn
at random from a population. Linear mixed models are primarily used to describe
relationships between a response variable and some covariates in data that are grouped
according to one or more classification factors. By associating common random effects to
observations sharing the same level of a classification factor, linear mixed models flexibly
represent the covariance structure induced by the data. In some situations linear mixed
models are the most plausible models for a particular data structure. Potential advantages
are:
- When the data are hierarchical, their structure can be more directly reflected, leading
  to a more natural inference.
- The model can provide estimates of the random effects, which can be used to
  distinguish or classify the sub-units.
- A linear mixed model offers flexibility in fitting different variance-covariance structures.
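As a small illustration of how shared random effects induce a covariance structure, the following sketch builds the marginal covariance matrix of one subject's repeated measurements under a random intercept and slope model. The function name `marginal_covariance`, the time points, and all numerical values are hypothetical choices made only for this example; they are not taken from the paper.

```python
def marginal_covariance(times, Sigma, sigma2):
    """Return T Sigma T^t + sigma2 * I for design rows T_j = (1, t_j).

    times  : measurement times of one subject
    Sigma  : 2x2 covariance of (random intercept, random slope)
    sigma2 : residual (measurement error) variance
    """
    T = [(1.0, float(t)) for t in times]
    m = len(T)
    V = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            # covariance contributed by the random effects shared within the subject
            V[j][k] = sum(T[j][a] * Sigma[a][b] * T[k][b]
                          for a in range(2) for b in range(2))
        V[j][j] += sigma2  # independent measurement error on the diagonal
    return V


# Illustrative (assumed) values
Sigma = [[1.0, 0.2],
         [0.2, 0.5]]
V = marginal_covariance([0, 1, 2], Sigma, sigma2=0.25)
for row in V:
    print([round(v, 3) for v in row])
```

The off-diagonal entries are nonzero only because the two observations share the same random intercept and slope; changing `Sigma` changes the implied correlation pattern without changing the fixed-effects part of the model.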
Traditionally, the exponential family model adopted for the study of GLMs deals with
a linear function of the response variable involving the unknown parameters of interest.
This covers most of the experimental situations arising in practice. However, some special
members, such as the curved exponential family of distributions, are not covered. Thus,
GLMs should be further generalized to include such members.
All known response surface techniques were developed within the framework of linear
models under the strong assumptions of normality and equal variances concerning the
error distribution. One important area that needs further investigation under the less rigid
structure of GLMs is the choice of design. In this paper, we focus on logistic regression
to predict a subsequent outcome in the proposed approaches.
3.1
Let $W_{ij}$ denote the observation of subject $i$ $(i=1, \ldots , n)$ at the time point $t_{ij}$ $(j=1, \ldots , m_{i})$.
The observation vector of trajectories, $W_{i}=(W_{i1}, \ldots , W_{im_{i}})^{t}$, is assumed to
follow a linear mixed model (Laird and Ware, 1982):
$W_{i}=T_{i}U_{i}+1\cdot(\beta^{t}X_{i})+e_{i}$
(3.1)
where $T_{i}$ is an $m_{i}\times q$ matrix whose rows consist of the $T_{ij}^{t}$s; the $T_{ij}$s are defined as $q\times 1$
observed covariate vectors which are functions of time; $U_{i}$ is the $q\times 1$ unobserved vector of
subject-specific random effects following a normal distribution $N(\xi, \Sigma)$; $1$ is an $m_{i}\times 1$ vector
whose elements are 1; $X_{i}$ is a $p_{1}\times 1$ observed covariate vector with mean $\eta_{x}$ and
covariance matrix $\Sigma_{x}$; $\beta$ is a $p_{1}\times 1$ vector containing fixed effects other than times; and
$e_{i}$ is an $m_{i}\times 1$ vector of errors, whose elements $\epsilon_{ij}$ follow $N(0, \sigma_{\epsilon}^{2})$.
We assume that $U_{i}$ and $X_{i}$ are independent. A typical example is $T_{ij}^{t}=(1, t_{ij})$ with
$q=2$, in which the components $U_{i1}$ and $U_{i2}$ of $U_{i}=(U_{i1}, U_{i2})^{t}$ can be interpreted as the baseline
value and the rate of change of subject $i$, respectively.
From equation (3.1), the conditional distribution of $U_{i}$ given $W_{i}$ and $X_{i}$ is multivariate
normal with mean and variance given by:
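$E(U_{i}|W_{i}, X_{i})=\hat{U}_{i}=\xi+\begin{pmatrix}\Sigma T_{i}^{t} & 0\end{pmatrix}\begin{pmatrix}V_{i} & 1\cdot\beta^{t}\Sigma_{x}\\ (1\cdot\beta^{t}\Sigma_{x})^{t} & \Sigma_{x}\end{pmatrix}^{-1}\begin{pmatrix}W_{i}-(T_{i}\xi+1\cdot\beta^{t}\eta_{x})\\ X_{i}-\eta_{x}\end{pmatrix}$
$V(U_{i}|W_{i}, X_{i})=A_{i}=\Sigma-\begin{pmatrix}\Sigma T_{i}^{t} & 0\end{pmatrix}\begin{pmatrix}V_{i} & 1\cdot\beta^{t}\Sigma_{x}\\ (1\cdot\beta^{t}\Sigma_{x})^{t} & \Sigma_{x}\end{pmatrix}^{-1}\begin{pmatrix}T_{i}\Sigma\\ 0\end{pmatrix}$
where $V_{i}=T_{i}\Sigma T_{i}^{t}+(\beta^{t}\Sigma_{x}\beta)11^{t}+\sigma_{\epsilon}^{2}I$ is the marginal covariance matrix of $W_{i}$ implied by (3.1).
3.2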
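The conditional mean $\hat{U}_{i}$ and variance $A_{i}$ of Section 3.1 are the quantities carried into the outcome model, so a numerical sketch may help. The code below treats the random intercept and slope example with $T_{ij}^{t}=(1, t_{ij})$ and assumes all model parameters are known. It uses the precision form $A_{i}=(\Sigma^{-1}+T_{i}^{t}T_{i}/\sigma_{\epsilon}^{2})^{-1}$, which should agree with the block-matrix expressions under the additional assumption that $e_{i}$ is independent of both $U_{i}$ and $X_{i}$. The function names and all numerical values are hypothetical.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def conditional_moments(W, times, x_effect, xi, Sigma, sigma2):
    """Conditional mean U_hat and variance A of U_i given W_i and X_i.

    W         : observed trajectory of one subject
    times     : measurement times, so that T_ij^t = (1, t_ij)
    x_effect  : the scalar beta^t X_i
    xi, Sigma : mean (length 2) and 2x2 covariance of U_i
    sigma2    : measurement error variance
    """
    T = [(1.0, float(t)) for t in times]
    Sinv = inv2(Sigma)
    # A = (Sigma^{-1} + T^t T / sigma2)^{-1}
    P = [[Sinv[a][b] + sum(r[a] * r[b] for r in T) / sigma2
          for b in range(2)] for a in range(2)]
    A = inv2(P)
    # deviation of W from its marginal mean: W - T xi - 1 * beta^t X
    resid = [w - (r[0] * xi[0] + r[1] * xi[1]) - x_effect
             for w, r in zip(W, T)]
    score = [sum(r[a] * e for r, e in zip(T, resid)) / sigma2
             for a in range(2)]
    U_hat = [xi[a] + A[a][0] * score[0] + A[a][1] * score[1]
             for a in range(2)]
    return U_hat, A


# Illustrative (assumed) values: a noise-free line 2 + 0.5 t shifted by beta^t X = 0.3
U_hat, A = conditional_moments(
    W=[2.3, 2.8, 3.3, 3.8], times=[0, 1, 2, 3], x_effect=0.3,
    xi=[1.0, 0.0], Sigma=[[1.0, 0.0], [0.0, 0.25]], sigma2=0.5)
print("U_hat =", U_hat)
print("A     =", A)
```

In this example the estimate lies between the population mean $\xi$ and the subject's own least-squares line, and the diagonal of $A_{i}$ is smaller than that of $\Sigma$, reflecting the information gained from the observed trajectory.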
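Let $Y_{i}$ denote the subsequent outcome of subject $i$ and $\mu_{i}$ its mean given $(U_{i}, Z_{i})$. With link
function $g(\cdot)$, the generalized linear model is
$g(\mu_{i})=\phi_{0}+Z_{i}^{t}\phi_{1}+U_{i}^{t}\phi_{2}$
where $U_{i}$ is the unobserved vector of random effects in (3.1). Let $\theta=(\phi, \gamma)$, where $\phi$ contains
the regression parameters of the generalized linear model and $\gamma$ the parameters of the linear
mixed model. The likelihood for subject $i$ factorizes as
$L(Y_{i}, W_{i};\theta)=L(Y_{i}|W_{i};\theta)L(W_{i};\gamma)$
(3.2)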
The regression parameters $\phi$ are estimated by maximizing the conditional log likelihood function
$\sum_{i=1}^{n}\log L(Y_{i}|W_{i};\theta)$
(3.3)
with $\gamma$ replaced by its estimate from the linear mixed model. Replacing the unobserved $U_{i}$
by its conditional mean $\hat{U}_{i}$ gives
$g(\mu_{i})=\phi_{0}+Z_{i}^{t}\phi_{1}+\hat{U}_{i}^{t}\phi_{2}+b_{i}$
where $b_{i}\sim N(0, \phi_{2}^{t}A_{i}\phi_{2})$.
Suppose that the subsequent outcome of interest is a binary response $Y$ which follows
the logistic regression model $p_{i}=P(Y_{i}=1|Z_{i}, U_{i})=H(\phi_{0}+Z_{i}^{t}\phi_{1}+U_{i}^{t}\phi_{2})$, where
$H(v)=\{1+\exp(-v)\}^{-1}$ is the logistic function. Using the normal
approximation to the standard logistic distribution (Johnson et al., 2004; Lin and Breslow,
1999; Wang et al., 2000; Carroll et al., 2006), it can be shown that when some of the predictors,
$U_{i}$ in this case, are measured with error,
$\mathrm{logit}(p_{i})= \frac{\phi_{0}+Z_{i}^{t}\phi_{1}+\hat{U}_{i}^{t}\phi_{2}}{\{1+(\phi_{2}^{t}A_{i}\phi_{2})/c^{2}\}^{1/2}}$
(3.4)
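To make the attenuation in (3.4) concrete, the following sketch evaluates the adjusted probability and compares it with a direct numerical average of the logistic function over $b_{i}\sim N(0, \phi_{2}^{t}A_{i}\phi_{2})$. The value of $c$ is not stated in the text above; the sketch assumes the commonly used constant $c=15\pi/(16\sqrt{3})\approx 1.70$ from the normal approximation to the logistic distribution. The function names and the numerical inputs are hypothetical.

```python
import math

# Assumed scaling constant of the normal approximation to the logistic distribution
C = 15.0 * math.pi / (16.0 * math.sqrt(3.0))


def logistic(v):
    """H(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def adjusted_probability(lin_pred, phi2, A):
    """Probability implied by equation (3.4).

    lin_pred : phi0 + Z^t phi1 + U_hat^t phi2
    phi2     : coefficients of the random effects
    A        : conditional covariance matrix of U_i (nested lists)
    """
    q = len(phi2)
    var_b = sum(phi2[a] * A[a][b] * phi2[b]
                for a in range(q) for b in range(q))
    return logistic(lin_pred / math.sqrt(1.0 + var_b / C ** 2))


def expected_probability(lin_pred, var_b, n=4001, width=8.0):
    """E[H(lin_pred + b)] for b ~ N(0, var_b), by the trapezoidal rule."""
    sd = math.sqrt(var_b)
    h = 2.0 * width * sd / (n - 1)
    total = 0.0
    for k in range(n):
        b = -width * sd + k * h
        dens = math.exp(-0.5 * (b / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * logistic(lin_pred + b) * dens * h
    return total


# Illustrative (assumed) inputs
A = [[32 / 144, -12 / 144], [-12 / 144, 9 / 144]]
phi2 = [1.0, 2.0]
var_b = sum(phi2[a] * A[a][b] * phi2[b] for a in range(2) for b in range(2))
print("naive    :", logistic(1.0))
print("adjusted :", adjusted_probability(1.0, phi2, A))
print("numerical:", expected_probability(1.0, var_b))
```

The adjusted probability is pulled toward 0.5 relative to the naive plug-in value, and the pull grows with the conditional variance $\phi_{2}^{t}A_{i}\phi_{2}$; when $A_{i}$ is zero the two coincide.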
4 Discussion
Although generalized linear models with one-time covariates are well understood, little
has been done on generalized linear models with longitudinal covariates. We develop
in this paper a framework for generalized linear models with longitudinal covariates.
Specifically, we assume that the longitudinal covariates follow a growth curve model
based on a linear mixed model and the primary outcome variable depends on the
longitudinal covariates through latent subject-specific random effects.
The two-step approach fits the generalized linear model by replacing the unobserved
random effects with their estimators, obtained in the first step by fitting individual linear
regression models to each subject's data. The conditional likelihood approach
estimates the regression coefficients by maximizing the conditional likelihood of the
outcome variable given the observed longitudinal covariates.
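The two-step approach described above can be sketched as follows: a least-squares line is fitted to each subject's trajectory, and the estimated intercept and slope are then used as predictors in a logistic regression. This is a minimal sketch of the uncorrected two-step idea only. The function names, the use of Newton-Raphson for the second step, and the small data set are assumptions made for illustration, and no bias correction is applied.

```python
import math


def ols_line(times, values):
    """Step 1: least-squares intercept and slope for one subject."""
    m = len(times)
    t_bar = sum(times) / m
    w_bar = sum(values) / m
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (w - w_bar) for t, w in zip(times, values))
    slope = sxy / sxx
    return w_bar - slope * t_bar, slope


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_logistic(X, y, iterations=25):
    """Step 2: logistic regression by Newton-Raphson; rows of X start with 1."""
    p = len(X[0])
    phi = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            eta = sum(a * b for a, b in zip(row, phi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for a in range(p):
                grad[a] += (yi - mu) * row[a]
                for b in range(p):
                    hess[a][b] += mu * (1.0 - mu) * row[a] * row[b]
        step = solve(hess, grad)
        phi = [f + s for f, s in zip(phi, step)]
    return phi


def two_step(trajectories, outcomes):
    """trajectories: list of (times, values) per subject; outcomes: 0/1 per subject."""
    design = [[1.0, *ols_line(t, w)] for t, w in trajectories]
    return fit_logistic(design, outcomes)


# Illustrative (assumed) data: eight subjects observed at three time points
times = [0.0, 1.0, 2.0]
lines = [(0.0, -1.0), (1.0, -1.0), (0.0, 1.0), (1.0, 1.0),
         (0.0, 0.0), (1.0, 0.0), (0.5, 2.0), (0.5, -2.0)]
outcomes = [0, 1, 1, 0, 0, 1, 1, 0]
trajectories = [(times, [a + b * t for t in times]) for a, b in lines]
print(two_step(trajectories, outcomes))
```

Because the per-subject estimates are treated as if they were the true random effects, the second-step coefficients inherit the estimation error of the first step, which is the source of the bias discussed in this section.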
To relate trajectories and subsequent outcomes, a two-step approach can be taken
as well as a single-step joint distribution model. The two-step approach is a simple
way to assess a causal relationship between outcomes of different stages. However, one
disadvantage of a two-step model is its bias when covariates are subject-specific estimators
obtained from the first step. Several works investigated the bias of estimates in GLMs
obtained in the second step and proposed methods to reduce the bias (Dang et al., 2007;
Wang et al., 2000). When trajectories from a growth curve model are assessed, covariates
affecting trajectories should be taken into account; however, most of the existing works
do not take into account the effects of covariates on the trajectories. Trajectories may be
affected by other covariates, such as age, gender, or some prognostic factors; hence we
propose a conditional likelihood approach that accounts for the estimation errors, corrects
the bias, and incorporates the effects of covariates.
Although a full likelihood approach could also be used to reduce bias due to estimation
errors, the conditional likelihood approach is numerically more practical because it
utilizes estimates of trajectories from a linear mixed model in the first step.
There are several directions for further research on this topic. First, further research
should be conducted to assess the validity of the proposed models through simulation.
Also, implications of misspecification of the models should be studied. An important
aspect in this regard would be the development of procedures for model diagnostics.
As Dang et al. (2007) pointed out, when we use the estimated random effects as
predictors for another model, the normality assumption is very important but difficult to
test. Further research on formal methods for assessing possible departures from normal
random effects would be useful. Li et al. (2004) proposed estimators for the generalized
linear model parameters that require no assumptions on the random effects. We could
investigate semiparametric estimators such as those proposed by Li et al. (2004) while also
incorporating covariates which affect trajectories. The two-step approach could easily be
extended to handle various parametric models in the first step, such as non-linear mixed
models or models for multivariate longitudinal measurements; the bias caused by a
two-step approach which uses non-linear mixed models should therefore also be investigated.
References
Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear
mixed models. Journal of the American Statistical Association, 88(421):9-25.
Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement
error in nonlinear models. Chapman & Hall, New York, second edition.
Dang, Q., Mazumdar, S., Anderson, S. J., Houck, P. R., and Reynolds, C. F. (2007).
Using trajectories from a bivariate growth curve as predictors in a Cox regression model.
Statistics in Medicine, 26(4):800-811.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (2004). Continuous univariate
distributions, volume 1. John Wiley & Sons, New York, second edition.
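Lin, X. and Breslow, N. E. (1999). Bias correction in generalized linear mixed models
Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. Journal of
the Royal Statistical Society, Series A, 135:370-384.
Tosteson, T. D., Buonaccorsi, J. P., and Demidenko, E. (1998). Covariate measurement
error and the estimation of random effect parameters in a mixed model for longitudinal
data. Statistics in Medicine, 17:1959-1971.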
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data.
Wang, C. Y., Wang, N., and Wang, S. (2000). Regression analysis when covariates
are regression parameters of a random effects model for observed longitudinal
measurements. Biometrics, 56:487-495.
Zeger, S. L., Liang, K.-Y., and Albert, P. S. (1988). Models for longitudinal data: a
generalized estimating equation approach. Biometrics, 44:1049-1060.