
THE JOURNAL OF EXPERIMENTAL EDUCATION, 2018

VOL. 86, NO. 4, 690–701


https://doi.org/10.1080/00220973.2017.1369384

Brief Research Report: Growth Models With Small Samples


and Missing Data
Daniel McNeish
Arizona State University, Tempe, AZ, USA

ABSTRACT
Small samples are common in growth models due to financial and logistical difficulties of following people longitudinally. For similar reasons, longitudinal studies often contain missing data. Though full information maximum likelihood (FIML) is popular to accommodate missing data, the limited number of studies in this area have found that FIML tends to perform poorly with small-sample growth models. This report demonstrates that the fault lies not with how FIML accommodates missingness but rather with maximum likelihood estimation itself. We discuss how the less popular restricted likelihood form of FIML, along with small-sample-appropriate methods, yields trustworthy estimates for growth models with small samples and missing data. That is, previously reported small-sample issues with FIML are attributable to finite sample bias of maximum likelihood estimation, not direct likelihood. Estimation issues pertinent to joint multiple imputation and predictive mean matching are also included and discussed.

KEYWORDS
Growth model; missing data; mixed effect model; multilevel model; small sample

IN BEHAVIORAL SCIENCES, small samples are extremely common due to financial limitations, due
to difficulty attracting enough participants who meet criteria for participation, or because the research
is experimental. These rationales tend to be related to missing data as well, and small samples and
missing data often appear simultaneously in empirical studies. Though the total number remains small,
a handful of investigations have begun to appear in the methodological literature that address the
performance of missing data methods when sample size is small (Barnes, Lindborg, & Seaman, 2006;
Delucchi & Bostrom, 1999; Graham & Schafer, 1999; McNeish, 2017a; Shin, Davison, & Long, 2016;
von Hippel, 2016). Given the nascent state of small-sample, missing data research, much of the avail-
able research on small-sample methods has focused on less complex statistical models such as mean or
group comparisons (Barnes et al., 2006; Delucchi & Bostrom, 1999) or single-level regression
(McNeish, 2017a; Graham & Schafer, 1999). Studies are just beginning to extend small-sample, missing
data investigations to multivariate models (McNeish & Harring, 2017; Shin, Davison, & Long, 2009,
Shin et al., 2016).
These studies tend to find that the popular full information maximum likelihood (FIML) method to
accommodate missing data (aka direct likelihood or raw likelihood) performs relatively poorly in small
samples despite its broad popularity. In behavioral sciences, FIML is most often implemented in the
structural equation modeling (SEM) framework (Enders & Bandalos, 2001; for growth modeling, this is
sometimes referred to as latent curve modeling, Bollen & Curran, 2006). Though justifiable, previous
studies have discussed how the SEM framework for growth is handicapped with small sample data
(roughly 100 or less; McNeish, 2016). In empirical studies, smaller samples are common and appear in
about 33% of studies based on a meta-analysis of growth models in personality psychology by Roberts,
Walton, and Viechtbauer (2006), for example. As we describe in more detail shortly, maximum likelihood

CONTACT Daniel McNeish dmcneish@asu.edu PO Box 871104, Tempe, AZ 85287, USA.


© 2017 Taylor & Francis Group, LLC
THE JOURNAL OF EXPERIMENTAL EDUCATION 691

(ML) estimation in the SEM framework is known to underestimate growth-factor variances and standard
errors with small samples, even with complete data (Browne & Draper, 2006). As a result, the conclusions
about FIML with small-sample growth models are difficult to interpret. Is the performance of FIML for
handling missingness poor, is poor performance due to small sample bias that exists with ML, or do the
two factors interact such that small-sample bias is exacerbated with missing data?
In this report, we discuss the shortcomings of the SEM framework with small-sample longitudinal
data by comparing it to the multilevel model (MLM) framework. The literature on the latter has devel-
oped many small-sample specific methods and corrections that make growth models less susceptible to
small-sample bias than the SEM framework, even though the models are often mathematically equiva-
lent (Curran, 2003). We then perform a simulation in which we include models estimated with the
SEM framework and with small-sample specific corrections only available within the MLM framework.
The main goal of this report is to disentangle the bias in estimates that is attributable to small samples
from bias that may stem from poor performance of missing data methods. Though this introduction
has focused on FIML, we also extend the investigation to multiple imputation methods such as joint
multiple imputation and predictive mean matching.

Different types of growth models


Multilevel models
The multilevel model (MLM) approach accounts for the fact that individuals are measured repeatedly
over time by modeling the intercept and/or the time coefficients randomly (Laird & Ware, 1982). This
allows researchers to estimate a mean trajectory for the sample (called fixed effects) and subject-specific
deviations from the mean for each person in the data. Random effects capture how much the subject-
specific estimates for each person differ from the fixed effect estimate, which allows each person to
have his or her own unique growth trajectory.
The linear MLM can be written in matrix notation as

y_i = X_i β + Z_i u_i + e_i,    (1)

where y_i is an m_i × 1 vector of responses for person i, m_i is the number of observations for person i, X_i is an m_i × p design matrix for the predictors, p is the number of predictors, β is a p × 1 vector of fixed regression coefficients, Z_i is an m_i × r design matrix for the random effects, r is the number of random effects, u_i is an r × 1 vector of random effects with u_i ~ MVN(0, G), and e_i is a vector of residuals for the observations with e_i ~ MVN(0, R_i) and Cov(u_i, e_i) = 0.
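As a small illustration of Equation 1 (all names and parameter values here are hypothetical, not taken from the article), the pieces for a single person with four equally spaced measurements and a linear time trend can be assembled as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

m_i = 4                                    # four repeated measures
t = np.arange(m_i)                         # time codes 0, 1, 2, 3
X_i = np.column_stack([np.ones(m_i), t])   # fixed-effect design: intercept + slope
Z_i = X_i.copy()                           # random intercept and random slope
beta = np.array([10.0, 2.0])               # fixed effects (hypothetical values)
G = np.array([[4.0, 0.5], [0.5, 1.0]])     # random-effect covariance matrix G
R_i = np.eye(m_i)                          # residual covariance R_i (identity here)

u_i = rng.multivariate_normal(np.zeros(2), G)      # person-specific deviations
e_i = rng.multivariate_normal(np.zeros(m_i), R_i)  # within-person residuals
y_i = X_i @ beta + Z_i @ u_i + e_i                 # Equation 1

# Marginal covariance of y_i implied by the model: V_i = Z_i G Z_i' + R_i
V_i = Z_i @ G @ Z_i.T + R_i
print(y_i.shape, V_i.shape)
```

The marginal covariance V_i constructed at the end is the quantity that reappears in the estimation formulas later in this section.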

Structural equation models


The SEM approach for growth follows the same premise as MLMs except that growth is formulated in
a general SEM framework rather than as an extension of the regression framework. Specifically, growth
in SEM is a confirmatory factor analysis (CFA) model with an imposed factor mean structure and con-
straints to yield estimates of growth. The main conceptual difference in SEM is that the random effects
are specified as latent variables in a CFA rather than randomly varying regression coefficients; the
mean of the latent variable represents the average growth trajectory and the factor score represents the
subject-specific growth trajectory. The MLM and SEM frameworks can be shown to be mathematically
equivalent in most cases (e.g., Curran, 2003; Mehta & West, 2000; Wu, West, & Taylor, 2009).
The SEM growth model is

y_i = Λ η_i + e_i
η_i = α + Γ X_i + ζ_i    (2)

where Λ is a matrix of factor loadings, η_i is a vector of subject-specific growth factor (intercept and slope) values, e_i is a vector of residuals that is distributed MVN(0, Θ), α is a vector of growth factor means, Γ is a matrix of coefficients for the predicted effect of time-invariant covariates on the latent growth trajectory factors, X_i is a matrix of time-invariant covariates, and ζ_i is a vector of random effects that is distributed MVN(0, Ψ).
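To make the CFA formulation concrete, here is a short sketch (values are hypothetical, not from the article) showing that for a linear growth model with four waves the SEM model-implied moments coincide with the MLM marginal moments ZGZᵀ + R, the equivalence noted above:

```python
import numpy as np

# Linear growth, four waves: Lambda's first column loads the intercept factor,
# the second column codes time (0, 1, 2, 3) for the slope factor.
Lambda = np.column_stack([np.ones(4), np.arange(4.0)])
alpha = np.array([10.0, 2.0])              # growth-factor means (hypothetical)
Psi = np.array([[4.0, 0.5],
                [0.5, 1.0]])               # growth-factor covariance (Psi)
Theta = np.eye(4)                          # residual covariance (Theta)

# SEM model-implied mean vector and covariance matrix of the repeated measures
mu = Lambda @ alpha
Sigma = Lambda @ Psi @ Lambda.T + Theta

# Same moments from the MLM side of the equivalence (X = Z = Lambda here)
V = Lambda @ Psi @ Lambda.T + np.eye(4)
print(mu)                  # mean trajectory rises by the slope mean each wave
print(np.allclose(Sigma, V))
```

The factor loadings play exactly the role of the MLM design matrices, which is why the two frameworks can be shown to be mathematically equivalent in most cases.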

Small-sample issues in growth models


A major difference in the implementation of MLMs and SEM lies in modeling data with smaller sam-
ple sizes (McNeish, 2016). Although the smaller sample size does not alter the mathematical equiva-
lence of the models, it affects the implementation, such as the available estimation options. In SEM,
parameters are estimated as if the model is a CFA with a mean structure where the log-likelihood function is maximized when the discrepancy function F_ML is minimized such that

F_ML = ln|Σ̂| − ln|S| + tr[(S − Σ̂)Σ̂⁻¹] + (m − m̂)ᵀ Σ̂⁻¹ (m − m̂)    (3)

where S is the observed covariance matrix of the observed variables, Σ̂ is the model-implied covariance matrix, m is the mean vector of the observed variables, and m̂ is the model-implied mean vector
(Preacher, Wichman, MacCallum, & Briggs, 2008). These estimates correspond to an MLM estimated
with full ML where the objective log-likelihood function for the variance components is

l_FIML(G, R) = −(1/2) log|V̂| − (1/2) (y − Xβ̂_GLS)ᵀ V̂⁻¹ (y − Xβ̂_GLS)    (4)

where V = Var(y) = ZGZᵀ + R and β̂_GLS = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y, the generalized least squares estimator for the fixed effects. However, no adjustment is made to account for β being estimated in the model in Equation 4, which is used to estimate G and R.¹ That is, G and R capture variance around a fixed effect, but the fixed effect is not known a priori. Without an anchor point, estimating variance is difficult because it is not known about which reference point the data are dispersed. Even though the fixed effects are estimated in the same model, when calculating G and R, the estimates of the fixed effects are treated as known values, despite these effects consuming degrees of freedom and having sampling variability. As a result, the elements of G are underestimated with small sample sizes because these aspects are disregarded,² which then leads to Var(β̂) being underestimated as well (standard error estimates are taken from the square root of the diagonal elements of Var(β̂)) because Var(β̂) = {Σⱼ₌₁ᴶ XⱼᵀV̂ⱼ⁻¹Xⱼ}⁻¹.
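Returning to Equation 3 for a moment, a quick numeric check (with made-up moments, purely illustrative) confirms that the discrepancy is zero when the model-implied moments reproduce the observed moments exactly, and positive otherwise:

```python
import numpy as np

def f_ml(S, m, Sigma_hat, mu_hat):
    """ML discrepancy of Equation 3 for observed (S, m) vs model-implied moments."""
    Sinv = np.linalg.inv(Sigma_hat)
    d = m - mu_hat
    return (np.log(np.linalg.det(Sigma_hat)) - np.log(np.linalg.det(S))
            + np.trace((S - Sigma_hat) @ Sinv) + d @ Sinv @ d)

S = np.array([[2.0, 0.3], [0.3, 1.5]])      # observed covariance (made up)
m = np.array([1.0, 2.0])                    # observed means (made up)

print(f_ml(S, m, S, m))                     # perfect fit: zero discrepancy
print(f_ml(S, m, np.eye(2), np.zeros(2)))   # misfitting moments: positive value
```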

The remedy for this situation is referred to as restricted maximum likelihood (REML). REML takes the loss of degrees of freedom from estimating β into account and also separates the estimation of the variance components from the fixed effects so that the fixed effects need not be treated as known. REML uses the residuals from an ordinary least squares fit, r = y − Xβ̂_OLS = y − X(XᵀX)⁻¹Xᵀy, and ML is then applied to these OLS residuals, which are independent of the fixed effects by definition but still retain the same conditional variance, as the outcome. Using the OLS residuals allows the variance components to be estimated around a truly fixed regression line (the fixed effects are necessarily 0 because β̂_OLS ⊥ r_OLS). The objective log-likelihood function for the variance components with REML is

l_REML(G, R) = −(1/2) log|V̂| − (1/2) log|XᵀV̂⁻¹X| − (1/2) (y − Xβ̂_GLS)ᵀ V̂⁻¹ (y − Xβ̂_GLS)    (5)

The (1/2) log|XᵀV̂⁻¹X| term in Equation 5 (the generalized least squares covariance matrix for the fixed effects) accounts for the uncertainty of estimating β and reduces the bias in estimates of the variance components with small samples. Instead of estimating β and the variance components simultaneously, β is estimated separately from the variance components with REML. Therefore, V̂ does not underestimate V, so Var(β̂) is not necessarily underestimated. Conceptually, the difference between

ML and REML is akin to dividing a sample variance estimate by n or n − 1. With large samples, the difference is hardly noticeable and both converge asymptotically; however, for decreasing sample sizes, the difference becomes increasingly discernible.³
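The n versus n − 1 analogy can be made concrete with a quick Monte Carlo check (illustrative only; the sample size, variance, and replication count are arbitrary): the ML-style estimator divides the squared deviations from the estimated mean by n and is biased downward, while the REML-style estimator divides by n − 1 and is approximately unbiased.

```python
import numpy as np

rng = np.random.default_rng(42)
n, true_var, reps = 10, 4.0, 20000

ml_est, reml_est = [], []
for _ in range(reps):
    y = rng.normal(0.0, np.sqrt(true_var), size=n)
    dev = y - y.mean()                    # mean estimated from the same data
    ml_est.append(dev @ dev / n)          # ML: ignores the estimated mean
    reml_est.append(dev @ dev / (n - 1))  # REML-style: accounts for it

print(np.mean(ml_est), np.mean(reml_est))  # ML ≈ 3.6 (biased), REML ≈ 4.0
```

The downward ML bias here, a factor of (n − 1)/n, mirrors the underestimation of the elements of G described above, and it shrinks as n grows, matching the asymptotic convergence of the two estimators.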
To address downward bias of standard error estimates, the Kenward-Roger correction (Kenward &
Roger, 1997) in the MLM framework has been shown to yield unbiased estimates that produce Type I
error rates at the desired level. This correction includes an inflation factor to remove the downward
bias and a Satterthwaite-type degree of freedom method to more closely approximate the correct null
sampling distribution with the appropriate degrees of freedom. Therefore, though the models remain
mathematically equivalent in most contexts, use of different estimation methods results in disparate
small-sample properties.
As noted in McNeish (2016), REML and Kenward-Roger methods do not have analogs in SEM and
small-sample issues in the SEM framework continue to be underresearched compared to the MLM
framework. Deriving a broad REML-type estimator for SEM has also been noted to be a challenging
endeavor. Without delving into details, the general issue is that the restricted likelihood function involves additional computations that are manipulations of the fixed-effect design matrix (X in Equation 1). This matrix does not exist in the SEM framework because its elements are allocated to the α, Λ, and X matrices in Equation 2.⁴

Overview of common missing data methods


Direct likelihood
Direct likelihood⁵ methods accommodate missing data by allowing the log-likelihood to be built by
summing information from each individual in the data (Enders, 2001). The most well-known direct
likelihood method is FIML (Arbuckle, 1996), although FIML is not the only direct likelihood method.
Though much less common, there is a version of FIML for restricted maximum likelihood (FIREML;
Beunckens, Molenberghs, & Kenward, 2005; Foulley, Jaffrezic, & Robert-Granie, 2000; Shah, Laird, &
Schoenfeld, 1997). The idea behind FIREML is identical to FIML with the exception that each individ-
ual contributes what they have to the restricted log-likelihood. REML uses conditional likelihood rather
than the joint likelihood, so its application is limited to data with no missing covariates (Allison, 2012).
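The idea of building the log-likelihood from each person's contribution can be sketched as follows (a hypothetical model, not the article's code, with numpy standing in for an SEM or MLM package): each person contributes the multivariate normal log-density of only the repeated measures they actually observed.

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Multivariate normal log-density, written out with numpy primitives."""
    k = len(x)
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(Sigma, d))

def fiml_loglik(Y, mu, Sigma):
    """Sum each person's MVN log-density over their observed entries only."""
    total = 0.0
    for row in Y:
        obs = ~np.isnan(row)                   # which waves this person provided
        total += mvn_logpdf(row[obs], mu[obs], Sigma[np.ix_(obs, obs)])
    return total

mu = np.array([10.0, 12.0, 14.0, 16.0])        # hypothetical implied means
Sigma = np.eye(4) + 1.0                        # hypothetical implied covariance
Y = np.array([[9.8, 12.1, 13.5, 16.2],
              [10.5, np.nan, np.nan, np.nan],  # dropout after Time 1
              [9.9, 11.7, np.nan, np.nan]])    # dropout after Time 2
print(fiml_loglik(Y, mu, Sigma))
```

No values are imputed: the second person simply contributes a univariate density for Time 1, which is exactly the "working around" of missing values described later in this report.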

Joint imputation
Rather than working around missing values as in direct likelihood, joint multiple imputation (JMI)
imputes data for missing values. Using Markov chain Monte Carlo, values are imputed using two itera-
tive steps: an imputation step (I-Step) and a posterior step (P-Step). The first I-Step simulates values for
the missing data using the mean and covariance matrix from the observed data as a prior distribution.
After a set of plausible values are imputed in the first I-Step, the P-Step then simulates the posterior
mean vector and covariance matrix based on the now-complete imputed data from the I-Step. The pos-
teriors from the P-Step are used as the prior distribution in the next I-Step where new values are
imputed for the missing data and the process continues to iterate between I-Steps and P-Steps. The
method is referred to as “joint” imputation because the simulated values are drawn from a single multi-
variate normal distribution based on a single mean vector and covariance matrix (Schafer & Graham,
2002).
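The I-Step/P-Step alternation can be sketched for a toy bivariate case (all values hypothetical; for brevity the P-Step here plugs in the complete-data moment estimates, whereas a full P-Step would draw the mean vector and covariance matrix from their posterior):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy bivariate data with missingness on the second variable (hypothetical).
n = 200
x = rng.normal(0, 1, n)
y = 0.8 * x + rng.normal(0, 0.6, n)
y_obs = y.copy()
y_obs[rng.random(n) < 0.3] = np.nan        # roughly 30% missing
miss = np.isnan(y_obs)

mu, Sigma = np.zeros(2), np.eye(2)         # starting values for the moments
y_imp = np.where(miss, 0.0, y_obs)
for _ in range(50):
    # I-Step: draw missing y from its conditional normal given x
    b = Sigma[0, 1] / Sigma[0, 0]
    cond_var = Sigma[1, 1] - b * Sigma[0, 1]
    cond_mean = mu[1] + b * (x[miss] - mu[0])
    y_imp[miss] = cond_mean + rng.normal(0, np.sqrt(cond_var), miss.sum())
    # P-Step (simplified): update the mean vector and covariance matrix
    # from the now-complete data
    data = np.column_stack([x, y_imp])
    mu, Sigma = data.mean(axis=0), np.cov(data, rowvar=False)

print(Sigma[0, 1] / Sigma[0, 0])   # recovered slope, close to the true 0.8
```

Both steps use a single mean vector and covariance matrix, which is what makes the imputation "joint."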

Fully conditional specification with predictive mean matching


Whereas JMI attempts to solve one large multidimensional problem (i.e., all missing values are
imputed at once from a multivariate normal distribution), the idea of the fully conditional specification
(FCS, aka multiple imputation with chained equations [MICE]) is to solve multiple one-dimensional
problems (van Buuren, Boshuizen, & Knook, 1999). Rather than impute for all variables simulta-
neously, FCS imputes values for each variable separately. In the semiparametric predictive mean
694 D. MCNEISH

matching approach to FCS imputation, a regression model is built for each variable that has missing
data, using the observations with complete data. The goal of this regression is not to predict the
hypothetical value of the missing observation, however. Instead, the goal is to identify donor cases
whose values on other variables most closely match the case with missing data. It is common to identify
five such donor cases and then choose one at random for imputation (Enders, 2010). The value of this
selected donor case is directly imputed for the missing value (Schafer & Graham, 2002).
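The donor-matching mechanics described above can be sketched as follows (a toy setup with hypothetical data; real implementations add proper between-imputation variability):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: impute missing y by predictive mean matching.
x = rng.normal(0, 1, 40)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, 40)
miss = np.zeros(40, dtype=bool)
miss[:8] = True                      # first eight y values treated as missing

# 1) Build the regression on the complete cases only
Xc = np.column_stack([np.ones((~miss).sum()), x[~miss]])
coef, *_ = np.linalg.lstsq(Xc, y[~miss], rcond=None)

# 2) Compute predicted means for everyone
pred = coef[0] + coef[1] * x

# 3) For each missing case, find the five complete cases with the closest
#    predicted means, pick one at random, and copy its observed y
donors = np.flatnonzero(~miss)
y_imp = y.copy()
for i in np.flatnonzero(miss):
    nearest = donors[np.argsort(np.abs(pred[donors] - pred[i]))[:5]]
    y_imp[i] = y[rng.choice(nearest)]

# Every imputed value is an actually observed value, so it is in-range
print(all(y_imp[i] in y[~miss] for i in np.flatnonzero(miss)))  # True
```

Step 3 is where the small-sample concern discussed later arises: with few complete cases, the five nearest donors may not be very near at all.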
Predictive mean matching (PMM) has been noted to be particularly useful when the distribution of
the variable of interest is not easily modeled with straightforward methods such as when data are trun-
cated or highly skewed (Azur, Stuart, Frangakis, & Leaf, 2011). PMM ensures that imputed values are
plausible and in-range because the imputed values are taken from observed data (Horton & Lipsitz,
2001). As a further advantage, because regression is used only to find potential donor cases and not to
impute values, PMM is much less sensitive to potential misspecifications in the imputation model (Sea-
man & Hughes, 2016) and to distributional assumptions because the method is semiparametric in
nature (Vink, Frank, Pannekoek, & van Buuren, 2014).

Small-sample issues with missing data methods


Direct likelihood
An issue with treating missing data with FIML in SEM software is that test statistics and confidence
intervals are almost always produced with methods making asymptotic assumptions (asymptotic standard errors, Z or χ² statistics; von Hippel, 2016). With smaller samples, it is appropriate to use
finite sample tests that include the relevant degrees of freedom (e.g., t tests or F tests, as is done in an MLM framework); otherwise, the Type-I error rate will not follow the specified nominal rate (Schaalje,
McBride, & Fellingham, 2002). This concern is not present in FIREML because FIREML is only
available in the MLM framework and, therefore, finite sample test statistics are employed.
Furthermore, small sample issues may persist at larger sample sizes because the estimation works
around missing values and does not impute. For example, a data set with 80 people and 40% missing values contains roughly the amount of information that 50 people would provide were there no missing observations (depending on the growth trajectory and the allocation of the missing values). That is, direct likelihood
methods do not take any direct action to accommodate the missing data (i.e., no values are imputed),
so data with 80 people and missing values may be as susceptible to small-sample bias as a sample of 50
people with complete data (e.g., McNeish & Harring, 2017).

Joint imputation
Imputations made by JMI are taken from a multivariate normal distribution, which may not be appro-
priate for all variables in the model since some may be discrete and others may be nonnormal.
Although the reliance on normality has been shown to be fairly innocuous in general (Schafer & Olsen,
1998), normality assumptions become more tenuous and their effects more pronounced as sample sizes
decrease (e.g., differences in t tests vs. Z-tests for small samples). The reliance on multivariate normal-
ity when imputing small-sample data has not been investigated thoroughly in the literature.

Predictive mean matching


Small samples may be problematic when using predictive mean matching because imputations are
taken directly from another individual in the data. With small samples, there may be limited overlap
between cases since the universe of donors is necessarily small. Potential donors, in effect, may not be
very similar to the observation containing the missing value, especially if using the recommended
strategy of randomly selecting the donor from five potential donors (i.e., the fifth-most-similar obser-
vation with a sample of 25 could be quite different from the observation whose value is missing). Alter-
natively, one person may repeatedly be selected as a donor if few other individuals have satisfactory

overlap. The result may be many observations with identical imputed values, which artificially restricts
the variability of the data (White, Royston, & Wood, 2011). With small samples, it is also important to
consider whether the individuals with observed data cover the full range of the variables with missing
values. Imputed values can possibly be taken only from these people with observed data, so poor cover-
age can lead to nonrepresentative data that is artificially truncated (White & Carlin, 2010).

Simulation study
Simulation design
We conducted a simulation study to investigate the differences between missing-data methods with
small samples with both full and restricted ML estimation. Reminiscent of Shin et al. (2016), we gener-
ate data from a linear growth model with four repeated measures. We also include two time-invariant
covariates, both of which are binary. The first covariate has 50:50 prevalence (representative of biologi-
cal sex or a treatment status indicator) and the second covariate has 60:40 prevalence (representative
of a binary minority status indicator based on U.S. demographics). This generation model is based on
the data generation model in Muthen and Muthen (2002). Each covariate predicts both the intercept
and the linear slope. A path diagram for the data generation model with population values is provided
as a supplemental file. We generated four conditions for sample size: 15, 30, 50, and 100. These values
were selected because 100 is generally the cutoff for a “small sample” designation in growth models
(Curran, Obeidat, & Losardo, 2010), while ML generally starts to break down in growth models with
about 50 people in less complex models (McNeish, 2016).
To generate missing values, we followed the procedure outlined in Shin et al. (2009). To generate
missing-at-random (MAR) data, the following process was carried out using SAS data steps.
1. Scores at each time point are divided into two groups based on a median split.
2. A percentage of scores for the group comprising the lower 50th percentile of the previous time
point are randomly selected to be missing. As an example, to generate missing values at Time 2,
data at Time 1 is median split to form two groups. The group in the lower 50% at Time 1 is then
randomly selected to yield missing values at Time 2.
3. The same procedure is repeated for subsequent time points where the preceding time point is
median split and random values are selected for the present time point (the split at Time 2 affects
Time 3 and so on).
This means that the missing values are related to another variable in the model (Time 1 scores for
Time 2 missing values) but not the variables themselves (Time 2 scores at Time 2) or a variable not
included in the model, thus satisfying the conditions necessary for data to be MAR. Data were always
generated to be complete at Time 1.
We included two conditions for the amount of missing data. Data in both conditions were generated
such that once a value was generated to be missing, the individual was generated to be missing at all
subsequent time points. This is typically referred to as attrition, dropout, or monotone missingness
(Schafer & Graham, 2002). In the first condition, the amount of missing data generated to be missing
at each time point was: Time 1, 0%; Time 2, 15%; Time 3, 27%; Time 4, 37%. This condition represents
a constant dropout rate where 15% of the data are missing at each time point (the percentage of missingness deviates slightly from 15% at each point because missing values were generated randomly and
some values were already missing because of earlier dropout). In the second condition, the percentage
of missing values was increased and the pattern of missingness was altered so that missingness
increased over time. Again, the first time point was complete but 10% of the data was set to be missing
at Time 2, 25% at Time 3, and 60% at Time 4.⁶
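The median-split MAR generation with monotone dropout described above can be sketched as follows (a toy version; the sample size, dropout rate, and data-generating model are hypothetical, not the article's SAS code):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy complete data: n people x 4 waves of "growth" scores
n, waves, rate = 200, 4, 0.30
Y = rng.normal(0.0, 1.0, (n, waves)).cumsum(axis=1)

for t in range(1, waves):
    in_study = ~np.isnan(Y[:, t - 1])             # not yet dropped out
    med = np.median(Y[in_study, t - 1])
    low = in_study & (Y[:, t - 1] <= med)         # lower half of the prior wave
    drop = rng.choice(np.flatnonzero(low),
                      size=int(rate * low.sum()), replace=False)
    Y[drop, t:] = np.nan                          # monotone dropout from wave t on

miss_rate = np.isnan(Y).mean(axis=0)
print(miss_rate)   # 0 at Time 1, then increasing across waves
```

Because deletion depends only on the previous wave's observed scores and not on the deleted values themselves, the mechanism satisfies the MAR conditions described above, and setting all later waves to missing produces the monotone attrition pattern.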
To each replicated data set, we fit the model using six separate missing-data methods:
1. FIML
2. FIREML with a Kenward-Roger correction
3. JMI with ML after imputation
4. JMI with REML and a Kenward-Roger correction after imputation

5. Predictive mean matching with ML after imputation


6. Predictive mean matching with REML and a Kenward-Roger correction after imputation
Data were generated in SAS Proc IML and models were fit in SAS using Proc Mixed and Proc Calis.
For the JMI conditions, a Jeffreys prior was used based on recommendations in Schafer (1997).
Because of the small sample and the principle that multiple imputation efficiency is based on the num-
ber of imputations, we used 50 imputations for the JMI and predictive mean matching conditions
rather than the standard five imputations (Bodner, 2008; Graham, Olchowski, & Gilreath, 2007). For
each cell of the simulation design, we ran 1,000 replications.

Outcome measures
Relative bias
The first outcome is the relative bias of the slope mean/coefficient and time-invariant predictors. Relative bias is calculated by taking the difference between the estimated parameter and the population value and then dividing by the population value, (Estimated − Population)/Population. Based on recommendations in Hoogland and Boomsma (1998), relative biases with absolute values greater than 10% are meaningfully biased.
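In code, the metric is a one-liner (the numbers below are made up for illustration):

```python
def relative_bias(estimate, population):
    """(Estimated - Population) / Population, as used in this report."""
    return (estimate - population) / population

# A parameter estimated at 1.08 against a population value of 1.2
print(relative_bias(1.08, 1.2))   # about -0.10, right at the 10% threshold
```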

Ninety-five percent interval coverage rate


The coverage rate of the 95% confidence interval is a metric used to investigate the performance of
sampling variability estimates. For each replication of the simulation, a confidence interval is estimated
for each parameter. If the variability is accurately estimated, 95% of intervals should contain the popu-
lation parameter value. If intervals are too short, that indicates the variability is being underestimated
(which inflates the Type-I error rate). Intervals that are too wide indicate the variability is being overes-
timated (which inflates Type-II error rates). Based on criteria in Bradley (1978), 95% interval coverage
rates below 92.5% or above 97.5% are indicative of poor variability estimates.
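A small Monte Carlo sketch (hypothetical numbers, unrelated to the simulation conditions above) shows how a coverage rate is computed and why asymptotic intervals tend to fall short of 95% in small samples:

```python
import numpy as np

rng = np.random.default_rng(4)

# Coverage of a normal-theory 95% interval for a mean, n = 30
true_mu, n, reps = 5.0, 30, 5000
hits = 0
for _ in range(reps):
    y = rng.normal(true_mu, 2.0, n)
    half = 1.96 * y.std(ddof=1) / np.sqrt(n)    # z-based, ignores the t df
    hits += (y.mean() - half <= true_mu <= y.mean() + half)

print(hits / reps)   # a bit below 0.95, since 1.96 ignores the finite-sample df
```

Replacing 1.96 with the appropriate t critical value restores nominal coverage, which parallels the report's argument for finite sample test statistics with small n.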

Results
Table 1 shows the relative bias and 95% interval coverage rates for the direct likelihood conditions and
Table 2 shows the relative bias and confidence interval coverage for the JMI and predictive mean
matching conditions. The results for the complete data (before missing values were induced) are
shown, for comparison, at the right of each table.
In Table 1, neither FIML nor FIREML estimates display meaningful bias based on the Hoogland and
Boomsma (1998) criteria; however, coverage intervals were too short for all parameters with 30 or
fewer individuals for FIML. This pattern is observed whether or not missing data are present. This
demonstrates that, even though FIML is viable as a missing-data method asymptotically, its finite sample properties render it problematic for small-sample missing-data problems.
However, the FIREML condition showed no such issues with interval coverage, even with as few as 15
individuals. FIREML operates based on the same principles as FIML, maintaining its desirable properties; however, using REML, finite sample test statistics, and the Kenward-Roger correction yields better results.
In the JMI conditions in Table 2, the general pattern of Table 1 is upheld although the bias and
the coverage intervals were less well-behaved, especially for the variable missing-data condition with
more missing data. In the 15% missing-data condition, the results were quite close to the direct likeli-
hood conditions—estimating the model from imputed data with ML led to short coverage intervals at
samples of 30 or fewer while the coverage intervals from REML showed no issues. Relative bias of
the parameters was problematic in the 15-sample-size condition but not with larger samples. In the
variable missing-data condition, both estimation methods had difficulty in the smallest-sample-size
condition with both the coverage intervals and relative bias being outside the acceptable range. At a
sample size of 30, ML coverage intervals were noticeably worse than in the 15% missing-data
condition.

Table 1. Relative bias and confidence interval coverage for direct likelihood methods.

                          15% Missing Condition   Variable Missing Cond.  Complete Data
                          FIML       FIREML       FIML       FIREML       ML         REML
Sample size  Predictor    CI   RB    CI   RB      CI   RB    CI   RB      CI   RB    CI   RB

15           Slope        88    4    95    4      88   −7    95   −7      90    3    96    3
             X1           89    5    96    5      91    4    96    4      91    4    95    4
             X1 × Slope   88   −6    95   −6      91   −3    94   −3      90   −9    96   −9
             X2           90    7    95    7      91    5    95    5      91   −1    96   −1
             X2 × Slope   86    1    94    1      90    9    94    9      90    9    95    9

30           Slope        91    5    95    5      91   −8    95   −8      92    2    95    2
             X1           91    3    96    3      90    4    96    4      92    4    95    4
             X1 × Slope   92   −5    95   −5      91   −3    94   −3      91   −4    95   −4
             X2           91    5    96    4      91    0    95    0      92    0    95    0
             X2 × Slope   91    4    95    5      90    7    95    7      91    4    95    4

50           Slope        93    4    95    3      93   −6    95   −6      94    2    96    2
             X1           92    2    96    2      93    2    96    2      93    4    95    4
             X1 × Slope   94   −6    95   −6      94    0    95    0      94   −4    95   −4
             X2           93    7    95    7      93    7    95    7      94   −4    95   −4
             X2 × Slope   93   −5    95   −4      93    1    95    1      93    8    95    8

100          Slope        94    4    95    3      94   −4    95   −4      94    2    95    2
             X1           95    1    95    1      94    2    95    2      95    2    95    2
             X1 × Slope   95  −10    95  −10      95   −6    95   −6      95   −3    95   −3
             X2           94    3    95    3      94    3    95    3      95   −6    95   −6
             X2 × Slope   93   −4    95   −4      95    0    96    0      94    4    96    4

Note. CI = 95% interval coverage, RB = relative bias. Bold CI values exceed Bradley's range; bold RB values exceed a magnitude of 10%.

Table 2 shows that predictive mean matching is generally not desirable with small-sample growth
data. Relative bias exceeded the acceptable range regardless of estimation method, even in the 100-sam-
ple-size condition. In smaller-sample-size conditions, the magnitude of the bias exceeded 100% for
some parameters. Coverage intervals were also rather poor and tended to be untrustworthy across the
spectrum of conditions.

Discussion and conclusions


Based on the results of our simulation, several patterns are noteworthy. First, although FIML has been
noted to perform poorly with small samples, this poor performance did not extend to all direct likelihood methods in this study; the idea behind direct likelihood does not appear to be flawed. Though FIML tends to perform somewhat poorly with
small samples in the conditions included in this study, the issue lies in the small-sample bias of the esti-
mator rather than the inability of the method to handle missing data. This can be seen by the fact that
FIML has issues with smaller sample sizes, but FIREML is essentially unblemished. Thus, small-sample
bias of the selected estimator, rather than the principles of direct likelihood, is at the heart of the issue.
In the context of growth models, our results show that imputation is a less serviceable choice com-
pared to direct likelihood in the conditions we studied. In growth models, there tend to be relatively
few variables (e.g., participants are measured a few times) and the presence of such scant information
seems to make the imputation difficult because there are not many sources of information from which
to produce strong imputations. This is especially prevalent in the results for the predictive mean
matching conditions, which were quite poor in absolute terms and in relation to the other methods.
One potential drawback of FIREML is missing values on covariates. In SAS Proc Mixed, for
instance, FIREML is limited to situations where missingness is constrained to the outcome variable
and observations with missing covariates are listwise deleted. This is not a software limitation but
rather a difficulty with the direct likelihood methods for univariate models in general (Ibrahim, 1990).

Table 2. Relative bias and confidence interval coverage for JMI and predictive mean matching methods.

                          ----------- 15% Missing Condition -----------   --------- Variable Missing Condition ---------   --- Complete Data ---
                          JMI-ML     JMI-REML    PMM-ML      PMM-REML     JMI-ML     JMI-REML    PMM-ML       PMM-REML     ML         REML
Sample size  Predictor    CI   RB    CI   RB     CI   RB     CI   RB      CI   RB    CI   RB     CI   RB      CI   RB      CI   RB    CI   RB

15           Slope        89   12    93   12     91   39     94   39      79   76    82   76     80   104     88   104     90    3    90    3
             X1           91    6    97    6     95    9     97    9      90   19    98   19     96    24     98    24     91    4    95    4
             X1 × Slope   87   −9    95   −9     95  −67     98  −67      88  −54    92  −54     96  −146     99  −146     90   −9    96   −9
             X2           87   16    96   16     92   15     96   15      90   24    98   24     95    28     97    28     91   −1    96   −1
             X2 × Slope   88  −33    93  −33     92  −46     97  −46      89  −76    93  −76     97  −116     99  −116     90    9    95    9

30           Slope        92    5    96    5     92   34     95   34      87   −5    90   −5     76    97     82    97     92    2    95    2
             X1           92    2    96    2     95    9     96    9      91    4    96    4     96    21     97    21     92    4    95    4
             X1 × Slope   92    1    95    1     97  −50     98  −50      91   −8    94   −8     97  −131     99  −131     91   −4    95   −4
             X2           92    7    96    7     95   10     97   11      91   −3    97   −3     96    20     98    20     92    0    95    0
             X2 × Slope   91   −4    96   −4     96  −35     98  −35      92   47    94   47     99   −99     99   −99     91    4    95    4

50           Slope        94    3    96    3     93   26     94   26      92    3    94    3     73    88     77    88     94    2    96    2
             X1           95    1    96    1     96    6     97    6      98    6    97    6     97    20     98    20     93    4    95    4
             X1 × Slope   95   −3    96   −3     98  −44     99  −44      95  −17    97  −17     98  −121     98  −121     94   −4    95   −4
             X2           94    9    95    9     94   11     95   11      96    9    97    9     95    24     96    24     94   −4    95   −4
             X2 × Slope   95   −8    96   −8     97  −39     98  −39      95   −6    96   −6     98  −110     99  −110     93    8    95    8

100          Slope        94    3    95    3     93   23     94   23      95    2    96    2     72    71     73    71     94    2    95    2
             X1           94    0    95    0     95    6     96    6      95    2    95    2     95    17     96    17     95    2    95    2
             X1 × Slope   96   −5    96   −5     97  −46     98  −46      95   −5    95   −5     96  −113     97  −113     95   −3    95   −3
             X2           94    4    95    4     95    7     95    7      95    6    96    6     96    18     96    18     95   −6    95   −6
             X2 × Slope   94   −6    94   −6     97   36     97   36      96  −12    96  −12     97   −97     97   −97     94    4    96    4

Note. JMI = joint multiple imputation, PMM = predictive mean matching, CI = 95% interval coverage, RB = relative bias. Bold CI values exceed Bradley's range; bold RB values exceed a magnitude of 10%.

The common remedy for direct likelihood with missing covariates is to resort to the SEM framework
such that the covariates can be specified to be endogenous—meaning that the joint likelihood can be
used rather than the conditional likelihood (Horton & Kleinman, 2007). This option is not viable with
small samples, however, because REML itself is based on the conditional likelihood. However, in
growth models, covariates are often time invariant—meaning that if the covariate is observed at one
time point, it can be carried forward to all other observations because such covariates do not change
over time, by definition.
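The carry-forward step for a time-invariant covariate is mechanical in long-format data. A minimal pandas sketch (column names are hypothetical):

```python
# Sketch: propagating a time-invariant covariate, observed at only one
# occasion per person, to all of that person's rows in long format.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "sex":  [0, np.nan, np.nan, np.nan, 1, np.nan],  # observed once each
})
# "first" skips NaN, so each person's single observed value is broadcast
# to every occasion for that person.
df["sex"] = df.groupby("id")["sex"].transform("first")
print(df["sex"].tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

After this step, the only remaining missingness is on the outcome, which FIREML handles directly.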
Nonetheless, there may be cases where time-varying covariates are missing, which makes the imple-
mentation of FIREML more difficult. In such cases, we recommend conducting the minimal amount
of data preprocessing necessary to get the data in a “FIREML-able” form. For example, we would rec-
ommend that only the time-varying covariates be treated with multiple imputation. Then during the
analysis phase of multiple imputation, the missing outcome values would be left missing such that
each model is fit using FIREML (with a Kenward-Roger correction) to deal with the missing outcomes.
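The hybrid strategy just described (impute only the covariates, leave missing outcomes to REML, pool across imputations) can be illustrated as follows. This is a rough sketch, not the authors' procedure: the imputation step is a crude stochastic stand-in for a real imputation model, statsmodels is used in place of SAS Proc Mixed and does not implement the Kenward-Roger correction, and all names and simulated values are illustrative.

```python
# Sketch: impute the time-varying covariate only, fit each completed
# data set by REML with the outcome's missing values left missing,
# then pool the point estimates (Rubin's rule for the mean).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n, waves = 40, 4
ids = np.repeat(np.arange(n), waves)
time = np.tile(np.arange(waves), n)
x = rng.normal(0, 1, n * waves)                     # time-varying covariate
y = 1 + 0.5 * time + 0.8 * x + rng.normal(0, 1, n * waves)
df = pd.DataFrame({"id": ids, "time": time, "x": x, "y": y})
df.loc[rng.random(len(df)) < 0.10, "x"] = np.nan    # missing covariate
df.loc[rng.random(len(df)) < 0.10, "y"] = np.nan    # missing outcome

estimates = []
for m in range(5):                                  # 5 imputations
    imp = df.copy()
    miss = imp["x"].isna()
    # Crude stochastic stand-in for a real imputation model:
    imp.loc[miss, "x"] = imp["x"].mean() + rng.normal(
        0, imp["x"].std(), miss.sum())
    # Outcome stays missing: REML drops those occasions (direct likelihood).
    fit = smf.mixedlm("y ~ time + x", data=imp.dropna(subset=["y"]),
                      groups="id").fit(reml=True)
    estimates.append(fit.params["x"])

print(np.mean(estimates))   # pooled estimate near the generating value 0.8
```

In practice, a principled imputation model (and small-sample degrees-of-freedom corrections where available) would replace the crude stand-in used here.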

Limitations
Because the scope of the research question was quite narrow (small samples, missing data, and growth models), the performance of the methods in question could reasonably be compared within a small simulation study. Nonetheless, no single simulation study can completely address a question, and it is important to note the limitations of the current one. We included only one model type and
one missing-data mechanism. We selected four repeated measures because this number is commonly
seen in the literature, especially with smaller samples, because it is cost effective while still allowing
inspection of possible nonlinear trajectories (Curran et al., 2010; Vickers, 2003). Different behavior
could be observed for studies with more repeated measures. To keep the number of conditions reason-
able, we also generated data from multivariate normal distributions and generated missing data via a
MAR mechanism. We also only explored models that could be fit interchangeably in the MLM or SEM
frameworks. There are some types of models that are commonly used but that can only be fit in the
more general SEM framework (e.g., latent basis models or second-order growth models; Hancock,
Kuo, & Lawrence, 2001; Wu & Lang, 2016). The recommendations provided here are not applicable to
these types of growth models because these models cannot be presently estimated with REML.

Concluding remarks
Though the SEM and the MLM growth model frameworks are often considered to be interchangeable,
there are differences in the estimation routines with small samples, broadly defined as fewer than 100
people. Previous studies have confounded biases attributable to small sample sizes and to missing data
by not accounting for differential performance of different estimators in such contexts. Despite theoret-
ical considerations that direct likelihood should be more efficient than multiple imputation with a finite
number of imputations in small samples, evidence to the contrary has been reported. Here, we found
that much of the bias previously observed is attributable to small-sample issues in the estimation,
not in the accommodation of missing data. With the lesser-known FIREML, growth model data with
small samples and missing values are not as problematic as previously reported: when appropriate
small-sample methods are used, direct likelihood yields desirable properties, and its efficiency
advantages over multiple imputation with small samples are retained. Additionally, if multiple
imputation is used with smaller samples, applying more appropriate small-sample estimation methods
after imputation yields results with more desirable statistical properties.

Notes
1. This corresponds to the iterative generalized least squares algorithm (IGLS) for obtaining maximum likelihood esti-
mates. There are other algorithms to obtain the maximum likelihood estimates; another common method is the EM
algorithm.

2. Asymptotically, treating the fixed effects as known in the calculation of the variance components is negligible
because sampling variance approaches zero as sample size approaches infinity and degrees of freedom are less
impactful at larger sample sizes.
3. For a full paper outlining the differences between REML and ML that does not rely on equations, readers are
referred to McNeish (2017b).
4. Cheung (2013) discusses an REML estimator for SEM in some conditions. This version of REML is based on equiva-
lent models and does not derive REML for SEM but instead transforms an SEM to an MLM and applies the MLM
REML equations. Though effective for some models, it is not fully generalizable.
5. Traditionally, direct likelihood and FIML are considered interchangeable terms for the same method (Allison, 2012).
However, FIML is technically one type of direct likelihood method. In this paper, we use “FIML” to specifically refer
to full information maximum likelihood and “direct likelihood” to refer to the broader class of such methods, which
includes but is not limited to FIML.
6. It seems contradictory that we set missing values to the lower 50% of the distribution but achieved 60% missingness.
This occurred by setting the entire lower 50% at Time 3 to be missing at Time 4 in addition to dropouts at Time 2
and Time 3 that were not in the lower 50% at Time 3.

References
Allison, P. D. (2012, April 23). Handling missing data by maximum likelihood (Keynote presentation at the SAS Global
Forum, Orlando, Florida). Retrieved from http://www.statisticalhorizons.com/wp-content/uploads/MissingData
ByML.pdf
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schu-
macker (Eds.), Advanced structural equation modeling (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum.
Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how
does it work? International Journal of Methods in Psychiatric Research, 20, 40–49.
Barnes, S. A., Lindborg, S. R., & Seaman, J. W. (2006). Multiple imputation techniques in small sample clinical trials. Sta-
tistics in Medicine, 25, 233–245.
Beunckens, C., Molenberghs, G., & Kenward, M. G. (2005). Direct likelihood analysis versus simple forms of imputation
for missing data in randomized clinical trials. Clinical Trials, 2, 379–386.
Bodner, T. E. (2008). What improves with increased missing data imputations? Structural Equation Modeling, 15, 651–675.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
Browne, W. J., & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel mod-
els. Bayesian Analysis, 1, 473–514.
Cheung, M. W. L. (2013). Implementing restricted maximum likelihood estimation in structural equation models. Struc-
tural Equation Modeling, 20, 157–167.
Curran, P. J. (2003). Have multilevel models been structural equation models all along? Multivariate Behavioral Research,
38, 529–569.
Curran, P. J., Obeidat, K., & Losardo, D. (2010). Twelve frequently asked questions about growth curve modeling. Journal
of Cognition and Development, 11, 121–136.
Delucchi, K., & Bostrom, A. (1999). Small sample longitudinal clinical trial with missing data: A comparison of analytic
methods. Psychological Methods, 4, 158–172.
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford Press.
Enders, C. K. (2001). A primer on maximum likelihood algorithms available for use with missing data. Structural Equa-
tion Modeling, 8, 128–141.
Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for
missing data in structural equation models. Structural Equation Modeling, 8, 430–457.
Foulley, J. L., Jaffrézic, F., & Robert-Granié, C. (2000). EM-REML estimation of covariance parameters in Gaussian mixed
models for longitudinal data analysis. Genetics Selection Evolution, 32, 1.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clar-
ifications of multiple imputation theory. Prevention Science, 8, 206–213.
Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with
small sample size. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1–29). Thousand Oaks,
CA: Sage.
Hancock, G. R., Kuo, W. L., & Lawrence, F. R. (2001). An illustration of second-order latent growth models. Structural
Equation Modeling, 8, 470–489.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling: An overview and a meta-
analysis. Sociological Methods and Research, 26, 329–367.
Horton, N. J., & Kleinman, K. P. (2007). Much ado about nothing. The American Statistician, 61, 79–90.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression
models with missing variables. American Statistician, 55, 244–254.

Ibrahim, J. G. (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association, 85,
765–769.
Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Bio-
metrics, 53, 983–997.
Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.
McNeish, D. (2017a). Missing data methods for arbitrary missingness with small samples. Journal of Applied Statistics, 44,
24–39.
McNeish, D. (2017b). Small sample methods for multilevel modeling: A colloquial elucidation of REML and the Ken-
ward-Roger correction. Multivariate Behavioral Research. Advance online publication. Retrieved from https://doi.org/
10.1080/00273171.2017.1344538
McNeish, D. (2016). Using data-dependent priors to mitigate small sample size bias in latent growth models: A discussion
and illustration using Mplus. Journal of Educational and Behavioral Statistics, 41, 27–56.
McNeish, D., & Harring, J. R. (2017). Correcting model fit criteria for small sample latent growth models with incomplete
data. Educational and Psychological Measurement. Advance online publication. doi:10.1177/0013164416661824.
Mehta, P. D., & West, S. G. (2000). Putting the individual back into individual growth curves. Psychological Methods, 5,
23–43.
Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power.
Structural Equation Modeling, 9, 599–620.
Preacher, K. J., Wichman, A. L., MacCallum, R. C., & Briggs, N. E. (2008). Latent growth curve modeling. Thousand Oaks,
CA: Sage.
Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life
course: A meta-analysis of longitudinal studies. Psychological Bulletin, 132, 1–25.
Schaalje, G. B., McBride, J. B., & Fellingham, G. W. (2002). Adequacy of approximations to distributions of test statistics
in complex mixed linear models. Journal of Agricultural, Biological, and Environmental Statistics, 7, 512–524.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s per-
spective. Multivariate Behavioral Research, 33, 545–571.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London, UK: Chapman & Hall.
Seaman, S. R., & Hughes, R. A. (2016). Relative efficiency of joint-model and full-conditional-specification multiple
imputation when conditional models are compatible: The general location model. Statistical Methods in Medical
Research. Advance online publication. doi:10.1177/0962280216665872
Shah, A., Laird, N., & Schoenfeld, D. (1997). A random-effects model for multiple characteristics with possibly missing
data. Journal of the American Statistical Association, 92, 775–779.
Shin, T., Davison, M. L., & Long, J. D. (2016). Maximum likelihood versus multiple imputation for missing data in small
longitudinal samples with nonnormality. Psychological Methods. Advance online publication. doi:10.1037/met0000094
Shin, T., Davison, M. L., & Long, J. D. (2009). Effects of missing data methods in structural equation modeling with non-
normal longitudinal data. Structural Equation Modeling, 16, 70–98.
Van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in
survival analysis. Statistics in Medicine, 18, 681–694.
Vickers, A. J. (2003). How many repeated measures in repeated measures designs? Statistical issues for comparative trials.
BMC Medical Research Methodology, 3, 1.
Vink, G., Frank, L. E., Pannekoek, J., & van Buuren, S. (2014). Predictive mean matching imputation of semicontinuous
variables. Statistica Neerlandica, 68, 61–90.
Von Hippel, P. T. (2016). New confidence intervals and bias comparisons show that maximum likelihood can beat multi-
ple imputation in small samples. Structural Equation Modeling, 23, 422–437.
White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for
practice. Statistics in Medicine, 30, 377–399.
White, I. R., & Carlin, J. B. (2010). Bias and efficiency of multiple imputation compared with complete-case analysis for
missing covariate values. Statistics in Medicine, 29, 2920–2931.
Wu, W., & Lang, K. M. (2016). Proportionality assumption in latent basis curve models: A cautionary note. Structural
Equation Modeling, 23, 140–154.
Wu, W., West, S. G., & Taylor, A. B. (2009). Evaluating model fit for growth curve models: Integration of fit indices from
SEM and MLM frameworks. Psychological Methods, 14, 183–201.
