1603 2008 102-110

Generalized Linear Models Using Trajectories Estimated from a Linear Mixed Model
Nami Maruyama

Division of Biostatistics
Kitasato University Graduate School of Pharmaceutical Sciences

1 Introduction

Longitudinal studies are designed to measure intra-individual change over time. In recent
years, longitudinal data have received much attention in many fields such as biomedical
research and economics. One reason for the interest stems from the new opportunities
provided by longitudinal data to develop predictive models of subsequent outcomes given
the current data for an individual. In other words, trajectories, or longitudinally changing
patterns of repeated measurements of variables up to a given time $t$, may afford predictive
ability for subsequent observations that are measured after time $t$, the termination of the
trajectory.
A growth curve model based on a linear mixed model helps investigate an overall
pattern of change in repeated measurements over time, in other words, trajectories, which
can be used to predict subsequent observations. Several works have reported on predicting a
separate outcome variable, given current data for an individual, by using trajectories from a
growth curve model. Dang et al. (2007) developed a method to use
the estimated trajectories of each subject from a bivariate growth curve of longitudinal
measures in a Cox proportional hazards model to predict a separate outcome. Other works
explored the effects of measurement error in a time-varying covariate for a mixed model
applied to a longitudinal study: Tosteson et al. (1998) used a likelihood-based method
of estimation to fit mixed models with measurement errors in covariates. Regression
calibration is simple and potentially applicable to any regression model, and it tends to
be most useful for estimating parameters in generalized linear models (GLMs) with
covariate measurement errors (Carroll et al., 2006).

One challenge in predicting a subsequent outcome is that the features of the longitudinal
profiles are observed only through the longitudinal measurements, which are subject
to measurement error and other variation. Naive implementation by imputing the
longitudinal profiles with measurement error yields biased inference, and several methods
for reducing this bias have been proposed. For example, Wang et al. (2000) proposed
to correct the bias from longitudinal covariates with errors in GLMs for binary outcomes
and in Cox proportional hazards models for censored outcome variables. Although many
works have been reported on correcting bias in GLMs, a method of prediction based on
GLMs, specifically a logistic regression model, to correct bias from a growth curve model
adjusted for other covariates has not been well investigated. In this paper, we investigate a
method of prediction by generalized linear models, especially a logistic regression model
using trajectories, which are conditional expectations of predictors. In the first step, we
obtain estimates of the trajectories, which are random effects from a linear
mixed model, while adjusting for other covariates. To predict the subsequent outcome, in
the second step, a conditional likelihood approach is applied to correct for the
estimation errors of the trajectories from the first step.

2 Statistical Models

Several modeling approaches have been described in the literature to deal with longitudinal
profiles of the data. Linear mixed models and generalized linear models can
be used for this purpose. This section briefly reviews these models.

2.1 Linear Mixed Model

In mixed effects models, random effects are used to describe the correlation structure
in the data and the responses are usually assumed to be independent conditional on the
random effects (Verbeke and Molenberghs, 2000; Molenberghs and Verbeke, 2005). The
key feature of mixed models is the presence of parameters that vary randomly with the
subunits (e.g., persons in longitudinal data; studies in meta-analysis, etc.). For example,
linear mixed models are models that incorporate both fixed effects, which are parameters
associated with an entire population or with certain repeatable levels of the experimental
factors, and random effects, which are associated with individual experimental units drawn
at random from a population. Linear mixed models are primarily used to describe
relationships between a response variable and some covariates in the data that are grouped
according to one or more classification factors. By associating common random effects to
observations sharing the same level of a classification factor, linear mixed models flexibly
represent the covariance structure induced by the data. In some situations linear mixed
models are the most plausible models for a particular data structure. Potential advantages
are:

• an appropriate covariance pattern model (which directly models a pattern of
correlations between observations) leads to more efficient and precise fixed effect
estimates and standard errors.
• when the data are hierarchical, their structure can be more directly reflected, leading
to a more natural inference.
• the model can provide estimates of the random effects which can be used to
distinguish or classify the sub-units.
• the linear mixed model offers flexibility in fitting different variance-covariance structures.

A potential disadvantage of linear mixed models is that more distributional assumptions
need to be made. Moreover, approximations usually have to be used to estimate certain
parameters of the model. As a consequence, conclusions depend on more assumptions,
increasing the risk of misspecifying the model and hence of biased parameter estimates.
Nevertheless, linear mixed models offer a powerful and flexible tool for the analysis of
longitudinal data.
In longitudinal studies, a growth curve model based on a linear mixed model
including two normally distributed random effects (intercept and slope) and
an independent Gaussian error is probably the model most routinely used to study change over
time in a quantitative outcome. Therefore, the growth curve model is applied to predict a
subsequent outcome in this paper.
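
The growth curve model just described can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name `simulate_growth_curves`, the argument names `xi` and `Sigma` (mean and covariance of the random intercept and slope), and all numerical values are hypothetical choices made for illustration and are not taken from this paper.

```python
import numpy as np

def simulate_growth_curves(n_subjects, times, xi, Sigma, sigma_eps, rng):
    """Simulate W_ij = U_i1 + U_i2 * t_j + eps_ij for every subject."""
    times = np.asarray(times, dtype=float)
    # Design matrix shared by all subjects: one row (1, t_j) per time point.
    T = np.column_stack([np.ones_like(times), times])
    # Subject-specific (intercept, slope) pairs drawn from N(xi, Sigma).
    U = rng.multivariate_normal(xi, Sigma, size=n_subjects)
    # Independent Gaussian error for every single observation.
    eps = rng.normal(0.0, sigma_eps, size=(n_subjects, times.size))
    W = U @ T.T + eps
    return W, U, T

# Illustrative values only (assumptions, not estimates from real data).
rng = np.random.default_rng(0)
W, U, T = simulate_growth_curves(
    n_subjects=200,
    times=[0, 1, 2, 3, 4],
    xi=np.array([10.0, 0.5]),
    Sigma=np.array([[4.0, 0.3], [0.3, 0.25]]),
    sigma_eps=1.0,
    rng=rng,
)
```

Each row of `W` is one observed trajectory, and the matching row of `U` holds the latent intercept and slope that would serve as predictors of a subsequent outcome.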

2.2 Generalized Linear Models


As a paradigm for a large class of problems in applied statistics, generalized linear models
(GLMs) have proved very effective since their introduction by Nelder and Wedderburn
(1972). GLMs are a unified class of regression models for discrete and continuous
response variables, and have been used routinely in dealing with observational studies.
GLMs have several areas of application ranging from medicine to economics, quality
control and sample surveys. Applications of the logistic regression model, expanded
with the popularity of case-control designs in epidemiology, now provide a basic tool
for epidemiologic investigation of chronic diseases. Probit and logistic models play a key
role in all forms of assay experiments. The log-linear model is the cornerstone of modern
approaches to the analysis of contingency table data, and has been found particularly
useful for medical and social sciences. Poisson regression models are widely employed
to study rates of events such as disease outcomes. The complementary log-log model
arises in the study of infectious diseases, and more generally, in the analysis of survival
data associated with clinical and longitudinal follow-up studies.

Traditionally, the exponential family model adopted for the study of GLMs deals with
a linear function of the response variable involving the unknown parameters of interest.
This covers most of the experimental situations arising in practice. However, some special
members, such as the curved exponential family of distributions, are not covered. Thus,
GLMs should be further generalized to include such members.
All known response surface techniques were developed within the framework of linear
models under the strong assumptions of normality and equal variances concerning the
error distribution. One important area that needs further investigation under the less rigid
structure of GLMs is the choice of design. In this paper, we focus on a logistic regression
to predict a subsequent outcome in the proposed approaches.

3 Generalized Linear Model with Covariate Measurement Error

In this section, we propose a method of prediction for an outcome variable based on a
generalized linear model, specifically a logistic regression model, whose covariates are
variables that characterize an individual trajectory. As an individual trajectory contains
estimation error, this, in fact, constitutes a measurement error model. The model is fitted
in two steps. First, a linear mixed model is fitted to the longitudinal data to estimate the
random effect that characterizes the trajectory for each individual while adjusting for other
covariates. In the second step, a conditional likelihood approach is applied to account for
the estimation error in the trajectory. Prediction of an outcome variable is based on the
logistic regression model in the second step.

3.1 The First Step: Trajectories of Longitudinal Data

Let $W_{ij}$ denote observations of subject $i$ $(i=1, \ldots , n)$ at the time point $t_{j}$ $(j=1, \ldots , m_{i})$.
The observation vector of trajectories, $W_{i}=(W_{i1}, \ldots , W_{im_{i}})^{t}$, is assumed to
follow a linear mixed model (Laird and Ware, 1982):

$W_{i}=T_{i}U_{i}+1\cdot(\beta^{t}X_{i})+e_{i}$ (3.1)

where $T_{i}$ is an $m_{i}\times q$ matrix whose rows consist of the $T_{ij}^{t}$s; each $T_{ij}$ is a $q\times 1$ observed
covariate vector which is a function of time; $U_{i}$ is the $q\times 1$ unobserved vector of subject
specific random effects following a normal distribution $N(\xi, \Sigma)$; $1$ is an $m_{i}\times 1$ vector
whose elements are 1; $X_{i}$ is a $p_{1}\times 1$ observed covariate vector with mean $\eta_{x}$ and
variance-covariance matrix $\Sigma_{x}$; $\beta$ is a $p_{1}\times 1$ vector containing fixed effects other than times;
and $e_{i}=(\epsilon_{i1}, \ldots , \epsilon_{im_{i}})^{t}$ is an $m_{i}\times 1$ vector of errors, whose elements $\epsilon_{ij}$
follow $N(0, \sigma_{\epsilon}^{2})$.


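We assume that $U_{i}$ and $X_{i}$ are independent. A typical example is that $T_{ij}^{t}=(1, t_{ij})$ with
$q=2$, in which case the components $U_{i1}$ and $U_{i2}$
of $U_{i}=(U_{i1}, U_{i2})^{t}$ can be interpreted as the baseline
value and the rate of change of subject $i$, respectively.
From equation (3.1), the conditional distribution of $U_{i}$ given $W_{i}$ and $X_{i}$ is multivariate
normal with mean and variance given by:

$E(U_{i}|W_{i}, X_{i})=\hat{U}_{i}=\xi+(\Sigma T_{i}^{t}\ \ 0^{t})\left(\begin{array}{cc}T_{i}\Sigma T_{i}^{t}+1\cdot\beta^{t}\Sigma_{x}\beta\cdot 1^{t}+\sigma_{\epsilon}^{2}I_{m_{i}} & 1\cdot\beta^{t}\Sigma_{x}\\ (1\cdot\beta^{t}\Sigma_{x})^{t} & \Sigma_{x}\end{array}\right)^{-1}\left(\begin{array}{c}W_{i}-(T_{i}\xi+1\cdot\beta^{t}\eta_{x})\\ X_{i}-\eta_{x}\end{array}\right)$

$V(U_{i}|W_{i}, X_{i})=A_{i}=\Sigma-(\Sigma T_{i}^{t}\ \ 0^{t})\left(\begin{array}{cc}T_{i}\Sigma T_{i}^{t}+1\cdot\beta^{t}\Sigma_{x}\beta\cdot 1^{t}+\sigma_{\epsilon}^{2}I_{m_{i}} & 1\cdot\beta^{t}\Sigma_{x}\\ (1\cdot\beta^{t}\Sigma_{x})^{t} & \Sigma_{x}\end{array}\right)^{-1}\left(\begin{array}{c}T_{i}\Sigma\\ 0\end{array}\right)$

Here the upper-left block of the inverted matrix is the marginal variance of $W_{i}$ implied by equation (3.1).

3.2 The Second Step: Trajectory Modeling to Predict Subsequent Outcome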

Suppose that the subsequent outcomes of interest are responses $Y_{i}$ which have a
distribution in the canonical exponential family (McCullagh and Nelder, 1998), whose
mean is $\mu_{i}$ and whose link function $g(\cdot)$ is monotonic and differentiable. The effects of
$(W_{i}, X_{i})$ on $Y_{i}$ are modeled through $(U_{i}, Z_{i})$ using the generalized linear model

$g(\mu_{i})=\phi_{0}+Z_{i}^{t}\phi_{1}+U_{i}^{t}\phi_{2}$

where $Z_{i}$ is a $p_{2}\times 1$ covariate vector and $U_{i}$ is the vector of subject specific random effects
from equation (3.1). Wang et al. (2000) proposed a generalized linear model which
included longitudinal covariates with a measurement error. We propose a method to use
the summary characteristics of trajectories from a growth curve model as predictors in a
generalized linear model by extending the method of Wang et al. (2000).
The likelihood function of $\theta=(\phi, \gamma)$, where $\phi=(\phi_{0}, \phi_{1}, \phi_{2})$
and $\gamma=(\xi, \Sigma, \beta, \Sigma_{x}, \eta_{x}, \sigma_{\epsilon}^{2})$, contributed by the $i$th subject is

$L(Y_{i}, W_{i};\theta)=L(Y_{i}|W_{i};\theta)L(W_{i};\gamma)$ (3.2)

Therefore, the parameters specifying the assumed models, $\phi$ and $\gamma$, are estimated by
maximizing the following log likelihood function.

$\log L(Y, W;\theta)=\log L(Y|W;\theta)+\log L(W;\gamma)=\sum_{i=1}^{n}\{\log L(Y_{i}|W_{i};\theta)+\log L(W_{i};\gamma)\}$ (3.3)

We consider maximizing the conditional log likelihood $\sum_{i=1}^{n}\log L(Y_{i}|W_{i};\theta)$ with
respect to $\phi$ given $\gamma$, and we estimate $\gamma$ by maximizing $\sum_{i=1}^{n}\log L(W_{i};\gamma)$ through fitting
the linear mixed model (3.1).
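
Once $\gamma$ is available from the first step, the conditional mean $\hat{U}_{i}$ and variance $A_{i}$ of Section 3.1 can be evaluated subject by subject. The sketch below is one possible way to do this with NumPy, assuming the joint normality of $(U_{i}, W_{i}, X_{i})$ implied by model (3.1). The function name `conditional_random_effects` and every numerical value are hypothetical, and the parameters are simply plugged in as if they were first-step estimates; in practice they would come from mixed-model software.

```python
import numpy as np

def conditional_random_effects(W_i, X_i, T_i, xi, Sigma, beta,
                               eta_x, Sigma_x, sigma2_eps):
    """Return (U_hat_i, A_i): mean and covariance of U_i given (W_i, X_i)."""
    m_i = T_i.shape[0]
    one = np.ones((m_i, 1))
    beta = beta.reshape(-1, 1)
    # Covariance blocks of the stacked vector (W_i, X_i) implied by (3.1).
    V_ww = (T_i @ Sigma @ T_i.T
            + (beta.T @ Sigma_x @ beta).item() * (one @ one.T)
            + sigma2_eps * np.eye(m_i))
    V_wx = one @ (beta.T @ Sigma_x)                  # m_i x p1 block
    V = np.block([[V_ww, V_wx], [V_wx.T, Sigma_x]])
    # Cov(U_i, (W_i, X_i)); U_i is independent of X_i, hence the zero block.
    C = np.hstack([Sigma @ T_i.T,
                   np.zeros((Sigma.shape[0], Sigma_x.shape[0]))])
    # Centered observations (W_i, X_i) minus their marginal means.
    resid = np.concatenate([
        W_i - (T_i @ xi + (beta.T @ eta_x).item()),
        X_i - eta_x,
    ])
    U_hat = xi + C @ np.linalg.solve(V, resid)
    A = Sigma - C @ np.linalg.solve(V, C.T)
    return U_hat, A

# Illustrative values only (assumptions, not estimates from real data).
T_i = np.column_stack([np.ones(4), np.arange(4.0)])  # rows (1, t_j)
xi = np.array([10.0, 0.5])
Sigma = np.array([[4.0, 0.3], [0.3, 0.25]])
beta = np.array([0.08, -0.5])
eta_x = np.array([50.0, 0.5])
Sigma_x = np.array([[100.0, 0.0], [0.0, 0.25]])
sigma2_eps = 1.0
X_i = np.array([60.0, 1.0])
W_i = np.array([13.1, 14.2, 14.0, 15.9])

U_hat_i, A_i = conditional_random_effects(
    W_i, X_i, T_i, xi, Sigma, beta, eta_x, Sigma_x, sigma2_eps)
```

Solving the linear system with `np.linalg.solve` rather than forming an explicit inverse is a numerical convenience only; the result is the same conditional mean and variance.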


The true marginal mean for the hierarchical model with normally distributed random
effects could be expressed with adjusted values for the regression variables or regression
coefficients (Zeger et al., 1988; Breslow and Clayton, 1993). Therefore, $Y_{i}|W_{i}$ follows a
generalized linear mixed model:

$g(\mu_{i})=\phi_{0}+Z_{i}^{t}\phi_{1}+\hat{U}_{i}^{t}\phi_{2}+b_{i}$

where $b_{i}\sim N(0, \phi_{2}^{t}A_{i}\phi_{2})$.
Suppose that the subsequent outcome of interest is a binary response $Y_{i}$ which follows
the logistic regression model $p_{i}=P(Y_{i}=1|Z_{i}, U_{i})=H(\phi_{0}+Z_{i}^{t}\phi_{1}+U_{i}^{t}\phi_{2})$, where
$H(v)=\{1+\exp(-v)\}^{-1}$ is the logistic function. Using the normal
approximation to the standard logistic distribution (Johnson et al., 2004; Lin and Breslow,
1999; Wang et al., 2000; Carroll et al., 2006), it can be shown that when some of the predictors,
$U_{i}$ in this case, are measured with error,

logit$(p_{i})= \frac{\phi_{0}+Z_{i}^{t}\phi_{1}+\hat{U}_{i}^{t}\phi_{2}}{\{1+(\phi_{2}^{t}A_{i}\phi_{2})/c^{2}\}^{1/2}}$ (3.4)

where $c=15\pi/(16\sqrt{3})$. In general, as the denominator in equation (3.4) is greater than 1,
the parameter estimates obtained when this denominator is ignored are slightly attenuated.
By maximizing the log likelihood function (3.3) using the conditional likelihood
having covariates with errors, parameters in the second step are estimated. The variance
of parameters in equation (3.4) can be estimated by inverting the observed information
matrix. Prediction of the outcome variable is based on the logistic regression model in the
second step.
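
The correction in equation (3.4) can be made concrete with a short numerical sketch. The code below assumes that $\hat{U}_{i}$ and $A_{i}$ have already been obtained from the first step; the function names `attenuated_probability` and `conditional_loglik` and all numbers are hypothetical and serve only as an illustration, not as the implementation used in this paper.

```python
import numpy as np

C_LOGISTIC = 15.0 * np.pi / (16.0 * np.sqrt(3.0))  # the constant c in (3.4)

def attenuated_probability(phi0, phi1, phi2, Z_i, U_hat_i, A_i):
    """P(Y_i = 1) under (3.4), with U_hat_i replacing the unobserved U_i."""
    linear = phi0 + Z_i @ phi1 + U_hat_i @ phi2
    # Variance of b_i, the part of U_i^t phi2 not explained by U_hat_i.
    var_b = phi2 @ A_i @ phi2
    shrink = np.sqrt(1.0 + var_b / C_LOGISTIC**2)
    return 1.0 / (1.0 + np.exp(-linear / shrink))

def conditional_loglik(phi0, phi1, phi2, Y, Z, U_hat, A):
    """Bernoulli log likelihood summed over subjects, using (3.4) for p_i."""
    p = np.array([attenuated_probability(phi0, phi1, phi2, z, u, a)
                  for z, u, a in zip(Z, U_hat, A)])
    return float(np.sum(Y * np.log(p) + (1 - Y) * np.log1p(-p)))

# Illustrative values only (assumptions, not estimates from real data).
phi0, phi1, phi2 = -1.0, np.array([0.4]), np.array([0.1, 1.5])
Z_i = np.array([1.0])
U_hat_i = np.array([11.0, 0.6])
A_i = np.array([[0.5, -0.1], [-0.1, 0.08]])

p_adjusted = attenuated_probability(phi0, phi1, phi2, Z_i, U_hat_i, A_i)
# Setting A_i to zero ignores the estimation error and gives the naive fit.
p_naive = attenuated_probability(phi0, phi1, phi2, Z_i, U_hat_i,
                                 np.zeros((2, 2)))
```

Maximizing `conditional_loglik` over $(\phi_{0}, \phi_{1}, \phi_{2})$ with a general-purpose optimizer would correspond to the second step; the choice of optimizer is left open here.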

4 Discussion
Although generalized linear models with one-time covariates are well understood, little
has been done on generalized linear models with longitudinal covariates. We develop
in this paper a framework for generalized linear models with longitudinal covariates.
Specifically, we assume that the longitudinal covariates follow a growth curve model
based on a linear mixed model and the primary outcome variable depends on the
longitudinal covariates through latent subject-specific random effects.
The two-step approach fits the generalized linear model by replacing the unobserved
random effects with their estimators obtained by fitting individual linear regression
models using each subject's data from the first step. The conditional likelihood approach
estimates the regression coefficients by maximizing the conditional distribution of the
outcome variable given the observed longitudinal covariates.
To relate trajectories and subsequent outcomes, a two-step approach can be taken
as well as a single-step joint distribution model. The two-step approach is a simple
way to assess the causal relationship between outcomes of different stages. However, one
disadvantage of a two-step model is its bias when covariates are subject specific estimators
obtained from the first step. Several works investigated the bias of estimates in GLMs
obtained in the second step and proposed methods to reduce the bias (Dang et al., 2007;
Wang et al., 2000). When trajectories from a growth curve model are assessed, covariates
affecting trajectories should be taken into account; however, most of the existing works
do not take into account the effects of covariates on the trajectories. Trajectories may be
affected by other covariates, such as age, gender, or some prognostic factors; hence we
propose a conditional likelihood approach to account for these errors and correct the bias
to incorporate the effects of covariates.
Although a full likelihood approach could also be used to reduce bias due to estimation
errors, the conditional likelihood approach is numerically more practical because it
utilizes estimates of trajectories from a linear mixed model in the first step.
There are several directions for further research on this topic. First, further research
should be conducted to assess the validity of the proposed models through simulation.
Also, implications of misspecification of the models should be studied. An important
aspect in this regard would be the development of procedures for model diagnostics.
As Dang et al. (2007) pointed out, when we use the estimated random effects as
predictors for another model, the normality assumption is very important but difficult to
test. Further research on formal methods for assessing possible departures from normal
random effects would be useful. Li et al. (2004) proposed estimators for the generalized
linear model parameters that require no assumptions on the random effects. We could
investigate semiparametric estimators as proposed by Li et al. (2004), also incorporating
covariates which affect trajectories. The two-step approach could be easily extended to
handle various parametric models in the first step, such as non-linear mixed models or
models for multivariate longitudinal measurements; therefore, bias caused by the
two-step approach which uses non-linear mixed models should also be investigated.

References
Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear
mixed models. Journal of the American Statistical Association, 88(421):9-25.

Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement
error in nonlinear models. Chapman & Hall, New York, second edition.

Dang, Q., Mazumdar, S., Anderson, S. J., Houck, P. R., and Reynolds, C. F. (2007).
Using trajectories from a bivariate growth curve as predictors in a Cox regression model.
Statistics in Medicine, 26(4):800-811.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (2004). Continuous univariate
distributions, volume 1. John Wiley & Sons, New York, second edition.

Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data.
Biometrics, 38:936-937.

Li, E., Zhang, D., and Davidian, M. (2004). Conditional estimation for generalized
linear models when covariates are subject-specific parameters in a mixed model for
longitudinal measurements. Biometrics, 60:1-7.

Lin, X. and Breslow, N. E. (1999). Bias correction in generalized linear mixed models
with multiple components of dispersion. Journal of the American Statistical Association,
91:1007-1016.

McCullagh, P. and Nelder, J. A. (1998). Generalized linear models. Chapman & Hall,
New York, second edition.

Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data.
Springer, New York.

Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. Journal of
the Royal Statistical Society, Series A, 135:370-384.

Tosteson, T. D., Buonaccorsi, J. P., and Demidenko, E. (1998). Covariate measurement
error and the estimation of random effect parameters in a mixed model for longitudinal
data. Statistics in Medicine, 17:1959-1971.

Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data.
Springer, New York.

Wang, C. Y., Wang, N., and Wang, S. (2000). Regression analysis when covariates
are regression parameters of a random effects model for observed longitudinal
measurements. Biometrics, 56:487-495.

Zeger, S. L., Liang, K.-Y., and Albert, P. S. (1988). Models for longitudinal data: a
generalized estimating equation approach. Biometrics, 44:1049-1060.
