ABSTRACT
Small samples are common in growth models due to financial and logistical difficulties of following people longitudinally. For similar reasons, longitudinal studies often contain missing data. Though full information maximum likelihood (FIML) is popular to accommodate missing data, the limited number of studies in this area have found that FIML tends to perform poorly with small-sample growth models. This report demonstrates that the fault lies not with how FIML accommodates missingness but rather with maximum likelihood estimation itself. We discuss how the less popular restricted likelihood form of FIML, along with small-sample-appropriate methods, yields trustworthy estimates for growth models with small samples and missing data. That is, previously reported small sample issues with FIML are attributable to finite sample bias of maximum likelihood estimation, not direct likelihood. Estimation issues pertinent to joint multiple imputation and predictive mean matching are also discussed.

KEYWORDS
Growth model; missing data; mixed effect model; multilevel model; small sample
IN BEHAVIORAL SCIENCES, small samples are extremely common due to financial limitations, due
to difficulty attracting enough participants who meet criteria for participation, or because the research
is experimental. These rationales tend to be related to missing data as well, and small samples and
missing data often appear simultaneously in empirical studies. Though the total number remains small,
a handful of investigations have begun to appear in the methodological literature that address the
performance of missing data methods when sample size is small (Barnes, Lindborg, & Seaman, 2006;
Delucchi & Bostrom, 1999; Graham & Schafer, 1999; McNeish, 2017a; Shin, Davison, & Long, 2016;
von Hippel, 2016). Given the nascent state of small-sample, missing data research, much of the avail-
able research on small-sample methods has focused on less complex statistical models such as mean or
group comparisons (Barnes et al., 2006; Delucchi & Bostrom, 1999) or single-level regression
(McNeish, 2017a; Graham & Schafer, 1999). Studies are just beginning to extend small-sample, missing
data investigations to multivariate models (McNeish & Harring, 2017; Shin, Davison, & Long, 2009; Shin et al., 2016).
These studies tend to find that the popular full information maximum likelihood (FIML) method to accommodate missing data (aka direct likelihood or raw likelihood) performs relatively poorly in small samples. In behavioral sciences, FIML is most often implemented in the
structural equation modeling (SEM) framework (Enders & Bandalos, 2001; for growth modeling, this is
sometimes referred to as latent curve modeling, Bollen & Curran, 2006). Though this implementation is justifiable, previous studies have discussed how the SEM framework for growth is handicapped with small-sample data (roughly 100 people or fewer; McNeish, 2016). In empirical studies, smaller samples are common and appear in
about 33% of studies based on a meta-analysis of growth models in personality psychology by Roberts,
Walton, and Viechtbauer (2006), for example. As we describe in more detail shortly, maximum likelihood
(ML) estimation in the SEM framework is known to underestimate growth-factor variances and standard
errors with small samples, even with complete data (Browne & Draper, 2006). As a result, the conclusions
about FIML with small-sample growth models are difficult to interpret. Is the performance of FIML for
handling missingness poor, is poor performance due to small sample bias that exists with ML, or do the
two factors interact such that small-sample bias is exacerbated with missing data?
In this report, we discuss the shortcomings of the SEM framework with small-sample longitudinal
data by comparing it to the multilevel model (MLM) framework. The literature on the latter has devel-
oped many small-sample specific methods and corrections that make growth models less susceptible to
small-sample bias than the SEM framework, even though the models are often mathematically equiva-
lent (Curran, 2003). We then perform a simulation in which we include models estimated with the
SEM framework and with small-sample specific corrections only available within the MLM framework.
The main goal of this report is to disentangle the bias in estimates that is attributable to small samples
from bias that may stem from poor performance of missing data methods. Though this introduction
has focused on FIML, we also extend the investigation to multiple imputation methods such as joint
multiple imputation and predictive mean matching.
$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{u}_i + \mathbf{e}_i, \qquad (1)$$

where $\mathbf{y}_i$ is an $m_i \times 1$ vector of responses for person $i$, $m_i$ is the number of observations for person $i$, $\mathbf{X}_i$ is an $m_i \times p$ design matrix for the predictors, $p$ is the number of predictors, $\boldsymbol{\beta}$ is a $p \times 1$ vector of fixed regression coefficients, $\mathbf{Z}_i$ is an $m_i \times r$ design matrix for the random effects, $r$ is the number of random effects, $\mathbf{u}_i$ is an $r \times 1$ vector of random effects with $\mathbf{u}_i \sim \mathrm{MVN}(\mathbf{0}, \mathbf{G})$, and $\mathbf{e}_i$ is an $m_i \times 1$ vector of residuals for the observations with $\mathbf{e}_i \sim \mathrm{MVN}(\mathbf{0}, \mathbf{R}_i)$ and $\mathrm{Cov}(\mathbf{u}_i, \mathbf{e}_i) = \mathbf{0}$.
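As a concrete illustration (a constructed example, not the simulation model used later in this article), consider a linear growth model with four equally spaced occasions, a random intercept and slope, and one time-invariant covariate $x_i$ that predicts both growth factors. The matrices in Equation 1 would then be

$$\mathbf{Z}_i = \begin{bmatrix} 1 & 0\\ 1 & 1\\ 1 & 2\\ 1 & 3 \end{bmatrix}, \qquad \mathbf{X}_i = \begin{bmatrix} 1 & 0 & x_i & 0\\ 1 & 1 & x_i & x_i\\ 1 & 2 & x_i & 2x_i\\ 1 & 3 & x_i & 3x_i \end{bmatrix},$$

with $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_3)^{T}$ holding the intercept mean, slope mean, covariate effect on the intercept, and covariate effect on the slope, and $\mathbf{u}_i = (u_{0i}, u_{1i})^{T}$ holding the person-specific intercept and slope deviations.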
means, $\boldsymbol{\Gamma}$ is a matrix of coefficients for the predicted effect of time-invariant covariates on the latent growth trajectory factors, $\mathbf{X}_i$ is a matrix of time-invariant covariates, and $\boldsymbol{\zeta}_i$ is a vector of random effects that is distributed $\mathrm{MVN}(\mathbf{0}, \mathbf{C})$.
where $\mathbf{S}$ is the covariance matrix of the observed variables, $\hat{\boldsymbol{\Sigma}}$ is the model-implied covariance matrix, $\boldsymbol{\mu}$ is the mean vector of the observed variables, and $\hat{\boldsymbol{\mu}}$ is the model-implied mean vector (Preacher, Wichman, MacCallum, & Briggs, 2008). These estimates correspond to an MLM estimated
with full ML where the objective log-likelihood function for the variance components is
$$l_{\mathrm{FIML}}(\mathbf{G},\mathbf{R}) = -\tfrac{1}{2}\log\lvert\hat{\mathbf{V}}\rvert - \tfrac{1}{2}\,(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{\mathrm{GLS}})^{T}\hat{\mathbf{V}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{\mathrm{GLS}}), \qquad (4)$$

where $\mathbf{V} = \mathrm{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}^{T} + \mathbf{R}$ and $\hat{\boldsymbol{\beta}}_{\mathrm{GLS}} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}$, the generalized least squares estimator for the fixed effects. However, no adjustment is made to account for $\boldsymbol{\beta}$ being estimated in the model in Equation 3,¹ which is used to estimate $\mathbf{G}$ and $\mathbf{R}$. That is, $\mathbf{G}$ and $\mathbf{R}$ capture variance around a fixed effect, but the fixed effect is not known a priori. Without an anchor point, estimating variance is difficult because it is not known about which reference point the data are dispersed. Even though the fixed effects are estimated in the same model, when calculating $\mathbf{G}$ and $\mathbf{R}$, the estimates of the fixed effects are treated as known values, despite these effects consuming degrees of freedom and having sampling variability. As a result, the elements of $\mathbf{G}$ are underestimated with small sample sizes because these aspects are disregarded,² which then leads to $\mathrm{Var}(\hat{\boldsymbol{\beta}})$ being underestimated as well (standard error estimates are taken from the square roots of the diagonal elements of $\mathrm{Var}(\hat{\boldsymbol{\beta}})$) because $\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \{\sum_{j=1}^{J}\mathbf{X}_{j}^{T}\hat{\mathbf{V}}_{j}^{-1}\mathbf{X}_{j}\}^{-1}$.
The remedy for this situation is referred to as restricted maximum likelihood (REML). REML takes the loss of degrees of freedom from estimating $\boldsymbol{\beta}$ into account and also separates the estimation of the variance components from the fixed effects so that the fixed effects need not be treated as known. REML uses the residuals from an ordinary least squares fit, $\mathbf{r} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{\mathrm{OLS}} = \mathbf{y} - \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$, and ML is then applied with these OLS residuals, which are independent of the fixed effects by definition but still retain the same conditional variance, as the outcome. Using the OLS residuals allows the variance components to be estimated around a truly fixed regression line (the fixed effects are necessarily 0 because $\hat{\boldsymbol{\beta}}_{\mathrm{OLS}} \perp \mathbf{r}_{\mathrm{OLS}}$). The objective log-likelihood function for the variance components with REML is
$$l_{\mathrm{REML}}(\mathbf{G},\mathbf{R}) = -\tfrac{1}{2}\log\lvert\hat{\mathbf{V}}\rvert - \tfrac{1}{2}\log\lvert\mathbf{X}^{T}\hat{\mathbf{V}}^{-1}\mathbf{X}\rvert - \tfrac{1}{2}\,(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{\mathrm{GLS}})^{T}\hat{\mathbf{V}}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{\mathrm{GLS}}). \qquad (5)$$

The additional $\tfrac{1}{2}\log\lvert\mathbf{X}^{T}\hat{\mathbf{V}}^{-1}\mathbf{X}\rvert$ term in Equation 5 (the inverse of the generalized least squares covariance matrix for the fixed effects) accounts for the uncertainty of estimating $\boldsymbol{\beta}$ and reduces the bias in estimates of the variance components with small samples. Instead of estimating $\boldsymbol{\beta}$ and the variance components simultaneously, $\boldsymbol{\beta}$ is estimated separately from the variance components with REML. Therefore, $\hat{\mathbf{V}}$ does not underestimate $\mathbf{V}$, so $\mathrm{Var}(\hat{\boldsymbol{\beta}})$ is not necessarily underestimated. Conceptually, the difference between
ML and REML is akin to dividing a sample variance estimate by n or n − 1. With large samples, the difference is hardly noticeable and both converge asymptotically; however, for decreasing sample sizes, the difference becomes increasingly discernible.³
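To make the distinction concrete, the sketch below (written in Python with statsmodels as an assumed toolset; it is not the author's code, and MixedLM is only one of several mixed-model implementations) fits the same small-sample linear growth model with full ML and with REML and prints the estimated $\mathbf{G}$ matrix from each. Across repeated simulated data sets, the ML estimates of the random-effect variances tend to be smaller than the REML estimates.

```python
# Minimal sketch contrasting ML and REML variance-component estimates for a small-sample
# linear growth model. Library choice (statsmodels) and all numeric values are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_people, n_waves = 30, 4                  # a "small sample" with four repeated measures
true_slope_sd = 0.45                       # population SD of the random slopes

rows = []
for i in range(n_people):
    u0 = rng.normal(0, 1.0)                # random intercept deviation
    u1 = rng.normal(0, true_slope_sd)      # random slope deviation
    for t in range(n_waves):
        y = (10 + u0) + (0.5 + u1) * t + rng.normal(0, 1.0)
        rows.append({"id": i, "time": t, "y": y})
df = pd.DataFrame(rows)

model = smf.mixedlm("y ~ time", df, groups=df["id"], re_formula="~time")
fit_ml = model.fit(reml=False)             # full maximum likelihood
fit_reml = model.fit(reml=True)            # restricted maximum likelihood

print("ML estimate of G:\n", fit_ml.cov_re)      # tends to be too small with small N
print("REML estimate of G:\n", fit_reml.cov_re)  # degrees-of-freedom loss accounted for
```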
To address downward bias of standard error estimates, the Kenward-Roger correction (Kenward & Roger, 1997) in the MLM framework has been shown to yield standard error estimates and test statistics that produce Type I error rates at the desired level. This correction includes an inflation factor to remove the downward bias and a Satterthwaite-type degrees-of-freedom method to more closely approximate the correct null sampling distribution with the appropriate degrees of freedom. Therefore, though the models remain
mathematically equivalent in most contexts, use of different estimation methods results in disparate
small-sample properties.
As noted in McNeish (2016), REML and Kenward-Roger methods do not have analogs in SEM and
small-sample issues in the SEM framework continue to be underresearched compared to the MLM
framework. Deriving a broad REML-type estimator for SEM has also been noted to be a challenging
endeavor. Without delving into details, the general issue is that the restricted likelihood function
involves additional computations that are manipulations of the fixed effect design matrix (X in Equa-
tion 1). This matrix does not exist in the SEM framework because its elements are allocated to the α, Λ, and X matrices in Equation 2.⁴
Joint imputation
Rather than working around missing values as in direct likelihood, joint multiple imputation (JMI)
imputes data for missing values. Using Markov chain Monte Carlo, values are imputed using two itera-
tive steps: an imputation step (I-Step) and a posterior step (P-Step). The first I-Step simulates values for
the missing data using the mean and covariance matrix from the observed data as a prior distribution.
After a set of plausible values are imputed in the first I-Step, the P-Step then simulates the posterior
mean vector and covariance matrix based on the now-complete imputed data from the I-Step. The pos-
teriors from the P-Step are used as the prior distribution in the next I-Step where new values are
imputed for the missing data and the process continues to iterate between I-Steps and P-Steps. The
method is referred to as “joint” imputation because the simulated values are drawn from a single multi-
variate normal distribution based on a single mean vector and covariance matrix (Schafer & Graham,
2002).
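The following toy sketch (a deliberately simplified data-augmentation loop for a bivariate normal model with missingness in one variable only, not production imputation code) illustrates the alternating I-step and P-step described above.

```python
# Toy I-step/P-step cycle for joint imputation under a bivariate normal model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated data: x fully observed, y treated as missing for roughly 30% of cases
n = 40
x = rng.normal(0, 1, n)
y = 0.6 * x + rng.normal(0, 0.8, n)
miss = rng.random(n) < 0.3

data = np.column_stack([x, y])
data[miss, 1] = np.nanmean(np.where(miss, np.nan, y))   # crude starting fill for missing y

for it in range(200):
    # P-step: draw a mean vector and covariance matrix given the currently completed data
    # (an approximate posterior draw under a diffuse prior)
    xbar = data.mean(axis=0)
    S = np.cov(data, rowvar=False)
    Sigma = stats.invwishart.rvs(df=n - 1, scale=S * (n - 1))
    mu = rng.multivariate_normal(xbar, Sigma / n)

    # I-step: redraw the missing y values from their conditional normal distribution
    # given the observed x and the current (mu, Sigma)
    cond_mean = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (data[miss, 0] - mu[0])
    cond_var = Sigma[1, 1] - Sigma[1, 0] ** 2 / Sigma[0, 0]
    data[miss, 1] = cond_mean + rng.normal(0, np.sqrt(cond_var), miss.sum())

# 'data' now holds one imputed data set; running several chains (or saving states far apart
# in one chain) yields the multiple imputations that are analyzed and pooled.
```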
Predictive mean matching
In the predictive mean matching approach to FCS imputation, a regression model is built for each variable that has missing data, using the observations with complete data. The goal of this regression is not to predict the hypothetical value of the missing observation, however. Instead, the goal is to identify donor cases whose predicted values most closely match the predicted value for the case with missing data. It is common to identify five such donor cases and then choose one at random for imputation (Enders, 2010). The value of this selected donor case is directly imputed for the missing value (Schafer & Graham, 2002).
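A brief sketch of this donor-matching logic is given below (plain NumPy with toy variable names; full implementations typically also perturb the regression coefficients between imputations, which is omitted here for clarity).

```python
# Minimal predictive mean matching sketch: use a regression only to rank donors, then
# impute an actually observed value from one of the five closest donors chosen at random.
import numpy as np

rng = np.random.default_rng(11)

# Toy data: z fully observed, w treated as missing for roughly 25% of cases
n = 60
z = rng.normal(0, 1, n)
w = 2.0 + 0.7 * z + rng.normal(0, 1, n)
miss = rng.random(n) < 0.25

# Step 1: regression of w on z using only the complete cases
X_obs = np.column_stack([np.ones((~miss).sum()), z[~miss]])
coef, *_ = np.linalg.lstsq(X_obs, w[~miss], rcond=None)

# Step 2: predicted values for every case, complete and incomplete alike
pred = coef[0] + coef[1] * z

# Steps 3-4: for each incomplete case, find the 5 donors with the closest predicted
# values, pick one at random, and impute that donor's observed value
w_imputed = w.copy()
donors = np.where(~miss)[0]
for i in np.where(miss)[0]:
    nearest = donors[np.argsort(np.abs(pred[donors] - pred[i]))[:5]]
    w_imputed[i] = w[rng.choice(nearest)]
```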
Predictive mean matching (PMM) has been noted to be particularly useful when the distribution of
the variable of interest is not easily modeled with straightforward methods such as when data are trun-
cated or highly skewed (Azur, Stuart, Frangakis, & Leaf, 2011). PMM ensures that imputed values are
plausible and in-range because the imputed values are taken from observed data (Horton & Lipsitz,
2001). As a further advantage, because regression is used only to find potential donor cases and not to
impute values, PMM is much less sensitive to potential misspecifications in the imputation model (Sea-
man & Hughes, 2016) and to distributional assumptions because the method is semiparametric in
nature (Vink, Frank, Pannekoek, & van Buuren, 2014).
Joint imputation
Imputations made by JMI are taken from a multivariate normal distribution, which may not be appro-
priate for all variables in the model since some may be discrete and others may be nonnormal.
Although the reliance on normality has been shown to be fairly innocuous in general (Schafer & Olsen,
1998), normality assumptions become more tenuous and their effects more pronounced as sample sizes
decrease (e.g., differences in t tests vs. Z-tests for small samples). The reliance on multivariate normal-
ity when imputing small-sample data has not been investigated thoroughly in the literature.
Predictive mean matching
With small samples, the pool of potential donor cases is limited and the predicted values of many cases may overlap. The result may be many observations with identical imputed values, which artificially restricts the variability of the data (White, Royston, & Wood, 2011). With small samples, it is also important to consider whether the individuals with observed data cover the full range of the variables with missing values. Imputed values can only be taken from these people with observed data, so poor coverage can lead to nonrepresentative data that are artificially truncated (White & Carlin, 2010).
Simulation study
Simulation design
We conducted a simulation study to investigate the differences between missing-data methods with
small samples with both full and restricted ML estimation. Reminiscent of Shin et al. (2016), we gener-
ate data from a linear growth model with four repeated measures. We also include two time-invariant
covariates, both of which are binary. The first covariate has 50:50 prevalence (representative of biologi-
cal sex or a treatment status indicator) and the second covariate has 60:40 prevalence (representative
of a binary minority status indicator based on U.S. demographics). This generation model is based on
the data generation model in Muthen and Muthen (2002). Each covariate predicts both the intercept
and the linear slope. A path diagram for the data generation model with population values is provided
as a supplemental file. We generated four conditions for sample size: 15, 30, 50, and 100. These values
were selected because 100 is generally the cutoff for a “small sample” designation in growth models
(Curran, Obeidat, & Losardo, 2010) while ML generally starts to break down in growth models with
about 50 people in less complex models (McNeish, 2016).
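A sketch of this data-generation setup appears below. The library choice (Python with NumPy and pandas), the growth-factor covariance matrix, and the regression coefficients are placeholders for illustration only; the article's population values are given in its supplemental path diagram and are not reproduced here.

```python
# Illustrative generator for a linear growth model with four repeated measures and two
# binary time-invariant covariates (50:50 and 60:40 prevalence). All numeric population
# values below are placeholders, not the article's values.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2024)

def generate_sample(n):
    x1 = rng.binomial(1, 0.50, n)                  # e.g., sex or treatment indicator
    x2 = rng.binomial(1, 0.40, n)                  # e.g., minority status indicator (60:40)
    G = np.array([[1.00, 0.10],                    # placeholder intercept/slope (co)variances
                  [0.10, 0.25]])
    u = rng.multivariate_normal([0.0, 0.0], G, n)  # person-specific growth-factor deviations
    records = []
    for i in range(n):
        intercept = 10.0 + 0.5 * x1[i] + 0.5 * x2[i] + u[i, 0]
        slope = 1.0 + 0.3 * x1[i] + 0.3 * x2[i] + u[i, 1]
        for t in range(4):
            y = intercept + slope * t + rng.normal(0, 1.0)   # placeholder residual SD
            records.append({"id": i, "time": t, "x1": x1[i], "x2": x2[i], "y": y})
    return pd.DataFrame(records)

df = generate_sample(30)    # sample-size conditions in the study: 15, 30, 50, 100
```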
To generate missing values, we followed the procedure outlined in Shin et al. (2009). To generate
missing-at-random (MAR) data, the following process was carried out using SAS data steps.
1. Scores at each time point are divided into two groups based on a median split.
2. A percentage of scores for the group comprising the lower 50th percentile of the previous time
point are randomly selected to be missing. As an example, to generate missing values at Time 2,
data at Time 1 is median split to form two groups. The group in the lower 50% at Time 1 is then
randomly selected to yield missing values at Time 2.
3. The same procedure is repeated for subsequent time points where the preceding time point is
median split and random values are selected for the present time point (the split at Time 2 affects
Time 3 and so on).
This means that the missing values are related to another variable in the model (Time 1 scores for
Time 2 missing values) but not the variables themselves (Time 2 scores at Time 2) or a variable not
included in the model, thus satisfying the conditions necessary for data to be MAR. Data were always
generated to be complete at Time 1.
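A compact sketch of this dropout scheme is below, written in Python rather than the SAS data steps used in the study; the per-wave selection rates are illustrative arguments, not the exact rates used to reach the article's missingness percentages.

```python
# Median-split MAR dropout: at each wave, select a fraction of the people who fell in the
# lower half of the *previous* wave to drop out, and keep them missing thereafter.
import numpy as np

def impose_mar_dropout(Y, rates=(0.30, 0.30, 0.30), rng=None):
    """Y: n x T array of complete scores; returns a copy with NaN for dropped-out waves."""
    if rng is None:
        rng = np.random.default_rng()
    Y = Y.astype(float).copy()
    dropped = np.zeros(Y.shape[0], dtype=bool)
    for t in range(1, Y.shape[1]):                     # Time 1 (index 0) stays complete
        prev = Y[:, t - 1]
        lower_half = prev <= np.nanmedian(prev)        # median split on the preceding wave
        candidates = np.where(lower_half & ~dropped)[0]
        n_new = int(round(rates[t - 1] * len(candidates)))
        new_dropouts = rng.choice(candidates, size=n_new, replace=False)
        dropped[new_dropouts] = True
        Y[dropped, t:] = np.nan                        # monotone missingness: once out, always out
    return Y
```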
We included two conditions for the amount of missing data. Data in both conditions were generated
such that once a value was generated to be missing, the individual was generated to be missing at all
subsequent time points. This is typically referred to as attrition, dropout, or monotone missingness
(Schafer & Graham, 2002). In the first condition, the percentage of data generated to be missing at each time point was: Time 1, 0%; Time 2, 15%; Time 3, 27%; Time 4, 37%. This condition represents a constant dropout rate where 15% of the data are missing at each time point (the percentage of missingness deviates slightly from 15% at each point because missing values were generated randomly and some values were already missing because of earlier dropout). In the second condition, the percentage of missing values was increased and the pattern of missingness was altered so that missingness increased over time. Again, the first time point was complete, but 10% of the data were set to be missing at Time 2, 25% at Time 3, and 60% at Time 4.⁶
To each replicated data set, we fit the model using six separate missing-data methods:
1. FIML
2. FIREML with a Kenward-Roger correction
3. JMI with ML after imputation
4. JMI with REML and a Kenward-Roger correction after imputation
5. Predictive mean matching with ML after imputation
6. Predictive mean matching with REML and a Kenward-Roger correction after imputation
Outcome measures
Relative bias
The first outcome is the relative bias of the slope mean/coefficient and the time-invariant predictors. Relative bias is calculated by taking the difference between the estimated parameter and the population value and then dividing by the population value, $(\text{Estimated} - \text{Population})/\text{Population}$. Based on recommendations in Hoogland and Boomsma (1998), parameter estimates with relative bias greater than 10% in absolute value are considered meaningfully biased.
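As a hypothetical numerical illustration (the numbers are invented for arithmetic only), a population slope of 0.50 with an average estimate of 0.55 across replications gives

$$\mathrm{RB} = \frac{0.55 - 0.50}{0.50} = 0.10,$$

which sits exactly at the 10% threshold.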
Results
Table 1 shows the relative bias and 95% interval coverage rates for the direct likelihood conditions and
Table 2 shows the relative bias and confidence interval coverage for the JMI and predictive mean
matching conditions. The results for the complete data (before missing values were induced) are
shown, for comparison, at the right of each table.
In Table 1, neither FIML nor FIREML estimates display meaningful bias based on the Hoogland and
Boomsma (1998) criteria; however, coverage intervals were too short for all parameters with 30 or
fewer individuals for FIML. This pattern is observed whether or not missing data are present. This
demonstrates that, even though FIML is viable as a missing-data method asymptotically, its finite-sample properties render it unsuitable for small-sample missing-data problems. However, the FIREML condition showed no such issues with interval coverage, even with as few as 15 individuals. FIREML operates on the same principles as FIML, maintaining its desirable properties; however, using REML, finite-sample test statistics, and the Kenward-Roger correction yields better results.
In the JMI conditions in Table 2, the general pattern of Table 1 is upheld although the bias and
the coverage intervals were less well-behaved, especially for the variable missing-data condition with
more missing data. In the 15% missing-data condition, the results were quite close to the direct likeli-
hood conditions—estimating the model from imputed data with ML led to short coverage intervals at
samples of 30 or fewer while the coverage intervals from REML showed no issues. Relative bias of
the parameters was problematic in the 15-sample-size condition but not with larger samples. In the
variable missing-data condition, both estimation methods had difficulty in the smallest-sample-size
condition with both the coverage intervals and relative bias being outside the acceptable range. At a
sample size of 30, ML coverage intervals were noticeably worse than in the 15% missing-data
condition.
Table 1. Relative bias and confidence interval coverage for direct likelihood methods.

                     15% Missing Condition     Variable Missing Condition   Complete Data
                     FIML        FIREML        FIML        FIREML           FIML        FIREML
N    Parameter       CI    RB    CI    RB      CI    RB    CI    RB         CI    RB    CI    RB
15   Slope           88     4    95     4      88    −7    95    −7         90     3    96     3
     X1              89     5    96     5      91     4    96     4         91     4    95     4
     X1 × Slope      88    −6    95    −6      91    −3    94    −3         90    −9    96    −9
     X2              90     7    95     7      91     5    95     5         91    −1    96    −1
     X2 × Slope      86     1    94     1      90     9    94     9         90     9    95     9
30   Slope           91     5    95     5      91    −8    95    −8         92     2    95     2
     X1              91     3    96     3      90     4    96     4         92     4    95     4
     X1 × Slope      92    −5    95    −5      91    −3    94    −3         91    −4    95    −4
     X2              91     5    96     4      91     0    95     0         92     0    95     0
     X2 × Slope      91     4    95     5      90     7    95     7         91     4    95     4
50   Slope           93     4    95     3      93    −6    95    −6         94     2    96     2
     X1              92     2    96     2      93     2    96     2         93     4    95     4
     X1 × Slope      94    −6    95    −6      94     0    95     0         94    −4    95    −4
     X2              93     7    95     7      93     7    95     7         94    −4    95    −4
     X2 × Slope      93    −5    95    −4      93     1    95     1         93     8    95     8
100  Slope           94     4    95     3      94    −4    95    −4         94     2    95     2
     X1              95     1    95     1      94     2    95     2         95     2    95     2
     X1 × Slope      95   −10    95   −10      95    −6    95    −6         95    −3    95    −3
     X2              94     3    95     3      94     3    95     3         95    −6    95    −6
     X2 × Slope      93    −4    95    −4      95     0    96     0         94     4    96     4

Note. CI = 95% interval coverage, RB = relative bias. Bold CI values exceed Bradley's range; bold RB values exceed a magnitude of 10%.
Table 2 shows that predictive mean matching is generally not desirable with small-sample growth
data. Relative bias exceeded the acceptable range regardless of estimation method, even in the 100-sam-
ple-size condition. In smaller-sample-size conditions, the magnitude of the bias exceeded 100% for
some parameters. Coverage intervals were also rather poor and tended to be untrustworthy across the
spectrum of conditions.
Table 2. Relative bias and confidence interval coverage for JMI and predictive mean matching methods.

                     15% Missing Condition                    Variable Missing Condition                    Complete Data
                     JMI                  PMM                 JMI                  PMM
                     ML        REML       ML        REML      ML        REML       ML         REML          ML        REML
N    Parameter       CI   RB   CI   RB    CI   RB   CI   RB   CI   RB   CI   RB    CI    RB   CI    RB      CI   RB   CI   RB
30   Slope           92    5   96    5    92   34   95   34   87   −5   90   −5    76    97   82    97      92    2   95    2
     X1              92    2   96    2    95    9   96    9   91    4   96    4    96    21   97    21      92    4   95    4
     X1 × Slope      92    1   95    1    97  −50   98  −50   91   −8   94   −8    97  −131   99  −131      91   −4   95   −4
     X2              92    7   96    7    95   10   97   11   91   −3   97   −3    96    20   98    20      92    0   95    0
     X2 × Slope      91   −4   96   −4    96  −35   98  −35   92   47   94   47    99   −99   99   −99      91    4   95    4
50   Slope           94    3   96    3    93   26   94   26   92    3   94    3    73    88   77    88      94    2   96    2
     X1              95    1   96    1    96    6   97    6   98    6   97    6    97    20   98    20      93    4   95    4
     X1 × Slope      95   −3   96   −3    98  −44   99  −44   95  −17   97  −17    98  −121   98  −121      94   −4   95   −4
     X2              94    9   95    9    94   11   95   11   96    9   97    9    95    24   96    24      94   −4   95   −4
     X2 × Slope      95   −8   96   −8    97  −39   98  −39   95   −6   96   −6    98  −110   99  −110      93    8   95    8
100  Slope           94    3   95    3    93   23   94   23   95    2   96    2    72    71   73    71      94    2   95    2
     X1              94    0   95    0    95    6   96    6   95    2   95    2    95    17   96    17      95    2   95    2
     X1 × Slope      96   −5   96   −5    97  −46   98  −46   95   −5   95   −5    96  −113   97  −113      95   −3   95   −3
     X2              94    4   95    4    95    7   95    7   95    6   96    6    96    18   96    18      95   −6   95   −6
     X2 × Slope      94   −6   94   −6    97   36   97   36   96  −12   96  −12    97   −97   97   −97      94    4   96    4

Note. JMI = Joint multiple imputation, PMM = Predictive mean matching, CI = 95% interval coverage, RB = relative bias. Bold CI values exceed Bradley's range; bold RB values exceed a magnitude of 10%.
The common remedy for direct likelihood with missing covariates is to resort to the SEM framework
such that the covariates can be specified to be endogenous—meaning that the joint likelihood can be
used rather than the conditional likelihood (Horton & Kleinman, 2007). This option is not viable with
small samples, however, because REML itself is based on the conditional likelihood. However, in
growth models, covariates are often time invariant—meaning that if the covariate is observed at one
time point, it can be carried forward to all other observations because such covariates do not change
over time, by definition.
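In long-format data, this carry-forward amounts to propagating the single observed value within each person, as in the brief sketch below (a hypothetical data frame with columns id, time, and x1; not code from the article).

```python
# Carry a time-invariant covariate observed at any one wave to a person's other records.
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "x1":   [None, 1, None, 0, None, None],   # observed at one wave per person
})
df["x1"] = df.groupby("id")["x1"].transform(lambda s: s.ffill().bfill())
```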
Nonetheless, there may be cases where time-varying covariates are missing, which makes the imple-
mentation of FIREML more difficult. In such cases, we recommend conducting the minimal amount
of data preprocessing necessary to get the data in a “FIREML-able” form. For example, we would rec-
ommend that only the time-varying covariates be treated with multiple imputation. Then during the
analysis phase of multiple imputation, the missing outcome values would be left missing such that
each model is fit using FIREML (with a Kenward-Roger correction) to deal with the missing outcomes.
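A sketch of that workflow is below. The column names, the imputation model, and the use of scikit-learn's IterativeImputer plus statsmodels' MixedLM are assumptions for illustration; in particular, the Kenward-Roger correction recommended above is not available in these Python libraries and would be applied in software that supports it.

```python
# Impute only the time-varying covariate, leave missing outcomes alone, and fit each
# completed data set with REML; rows with missing outcomes are simply dropped, which in
# the MLM framework is the direct-likelihood treatment of missing outcomes.
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def fit_with_imputed_covariate(df, m=20, seed=0):
    """df: long-format data with columns id, time, y (outcome), w (time-varying covariate)."""
    estimates = []
    for k in range(m):
        imputer = IterativeImputer(sample_posterior=True, random_state=seed + k)
        df_k = df.copy()
        # Impute w only; y is excluded so its missing values remain missing
        df_k["w"] = imputer.fit_transform(df_k[["time", "w"]])[:, 1]
        analysis = df_k.dropna(subset=["y"])            # keep each person's observed occasions
        fit = smf.mixedlm("y ~ time + w", analysis, groups=analysis["id"],
                          re_formula="~time").fit(reml=True)
        estimates.append(fit.params)
    # Average the point estimates across imputations (full pooling would use Rubin's rules)
    return pd.concat(estimates, axis=1).mean(axis=1)
```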
Limitations
Because the scope of the research question was quite narrow (small samples, missing data, and growth models), the performance of the methods under question could be reasonably compared within a small simulation study. Nonetheless, no single simulation study can completely address a research question, and it is important to note the limitations of the current study. We included only one model type and
one missing-data mechanism. We selected four repeated measures because this number is commonly
seen in the literature, especially with smaller samples, because it is cost effective while still allowing
inspection of possible nonlinear trajectories (Curran et al., 2010; Vickers, 2003). Different behavior
could be observed for studies with more repeated measures. To keep the number of conditions reason-
able, we also generated data from multivariate normal distributions and generated missing data via a
MAR mechanism. We also only explored models that could be fit interchangeably in the MLM or SEM
frameworks. There are some types of models that are commonly used but that can only be fit in the
more general SEM framework (e.g., latent basis models or second-order growth models; Hancock,
Kuo, & Lawrence, 2001; Wu & Lang, 2016). The recommendations provided here are not applicable to
these types of growth models because these models cannot be presently estimated with REML.
Concluding remarks
Though the SEM and the MLM growth model frameworks are often considered to be interchangeable,
there are differences in the estimation routines with small samples, broadly defined as fewer than 100
people. Previous studies have confounded biases attributable to small sample sizes and to missing data
by not accounting for differential performance of different estimators in such contexts. Despite theoret-
ical considerations that direct likelihood should be more efficient than multiple imputation with a finite
number of imputations in small samples, evidence to the contrary has been reported. Here, we found that much of the bias previously observed is attributable to small-sample issues in the estimation, not to the accommodation of missing data. Using the lesser-known FIREML, growth model data with small samples and missing data are not as problematic as previously reported; when appropriate small-sample methods are used, direct likelihood yields desirable properties, so its efficiency advantages over multiple imputation with small samples can be retained. Additionally, if multiple imputation methods are used with smaller samples, applying more appropriate small-sample estimation methods after imputation yields results with more desirable statistical properties.
Notes
1. This corresponds to the iterative generalized least squares algorithm (IGLS) for obtaining maximum likelihood esti-
mates. There are other algorithms to obtain the maximum likelihood estimates; another common method is the EM
algorithm.
2. Asymptotically, treating the fixed effects as known in the calculation of the variance components is negligible
because sampling variance approaches zero as sample size approaches infinity and degrees of freedom are less
impactful at larger sample sizes.
3. For a full paper outlining the differences between REML and ML that does not rely on equations, readers are
referred to McNeish (2017b).
4. Cheung (2013) discusses an REML estimator for SEM in some conditions. This version of REML is based on equiva-
lent models and does not derive REML for SEM but instead transforms an SEM to an MLM and applies the MLM
REML equations. Though effective for some models, it is not fully generalizable.
5. Traditionally, direct likelihood and FIML are considered interchangeable terms for the same method (Allison, 2012).
However, FIML is technically one type of direct likelihood method. In this paper, we use “FIML” to specifically refer
to full information maximum likelihood and “direct likelihood” to refer to the broader class of such methods, which
includes but is not limited to FIML.
6. It seems contradictory that we set missing values to the lower 50% of the distribution but achieved 60% missingness.
This occurred by setting the entire lower 50% at Time 3 to be missing at Time 4 in addition to dropouts at Time 2
and Time 3 that were not in the lower 50% at Time 3.
References
Allison, P. D. (2012, April 23). Handling missing data by maximum likelihood (Keynote presentation at the SAS Global
Forum, Orlando, Florida). Retrieved from http://www.statisticalhorizons.com/wp-content/uploads/MissingData
ByML.pdf
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schu-
macker (Eds.), Advanced structural equation modeling (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum.
Azur, M. J., Stuart, E. A., Frangakis, C., & Leaf, P. J. (2011). Multiple imputation by chained equations: What is it and how
does it work? International Journal of Methods in Psychiatric Research, 20, 40–49.
Barnes, S. A., Lindborg, S. R., & Seaman, J. W. (2006). Multiple imputation techniques in small sample clinical trials. Sta-
tistics in Medicine, 25, 233–245.
Beunckens, C., Molenberghs, G., & Kenward, M. G. (2005). Direct likelihood analysis versus simple forms of imputation
for missing data in randomized clinical trials. Clinical Trials, 2, 379–386.
Bodner, T. E. (2008). What improves with increased missing data imputations? Structural Equation Modeling, 15, 651–675.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
Browne, W. J., & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel mod-
els. Bayesian Analysis, 1, 473–514.
Cheung, M. W. L. (2013). Implementing restricted maximum likelihood estimation in structural equation models. Struc-
tural Equation Modeling, 20, 157–167.
Curran, P. J. (2003). Have multilevel models been structural equation models all along? Multivariate Behavioral Research,
38, 529–569.
Curran, P. J., Obeidat, K., & Losardo, D. (2010). Twelve frequently asked questions about growth curve modeling. Journal
of Cognition and Development, 11, 121–136.
Delucchi, K., & Bostrom, A. (1999). Small sample longitudinal clinical trial with missing data: A comparison of analytic
methods. Psychological Methods, 4, 158–172.
Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford Press.
Enders, C. K. (2001). A primer on maximum likelihood algorithms available for use with missing data. Structural Equa-
tion Modeling, 8, 128–141.
Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for
missing data in structural equation models. Structural Equation Modeling, 8, 430–457.
Foulley, J. L., Jaffrezic, F., & Robert-Granie, C. (2000). EM-REML estimation of covariance parameters in Gaussian mixed
models for longitudinal data analysis. Genetics Selection Evolution, 32, 1.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clar-
ifications of multiple imputation theory. Prevention Science, 8, 206–213.
Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with
small sample size. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1–29). Thousand Oaks,
CA: Sage.
Hancock, G. R., Kuo, W. L., & Lawrence, F. R. (2001). An illustration of second-order latent growth models. Structural
Equation Modeling, 8, 470–489.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling: An overview and a meta-
analysis. Sociological Methods and Research, 26, 329–367.
Horton, N. J., & Kleinman, K. P. (2007). Much ado about nothing. The American Statistician, 61, 79–90.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression
models with missing variables. American Statistician, 55, 244–254.
Ibrahim, J. G. (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association, 85,
765–769.
Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Bio-
metrics, 53, 983–997.
Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.
McNeish, D. (2017a). Missing data methods for arbitrary missingness with small samples. Journal of Applied Statistics, 44,
24–39.
McNeish, D. (2017b). Small sample methods for multilevel modeling: A colloquial elucidation of REML and the Ken-
ward-Roger correction. Multivariate Behavioral Research. Advance online publication. Retrieved from https://doi.org/
10.1080/00273171.2017.1344538
McNeish, D. (2016). Using data-dependent priors to mitigate small sample size bias in latent growth models: A discussion
and illustration using Mplus. Journal of Educational and Behavioral Statistics, 41, 27–56.
McNeish, D., & Harring, J. R. (2017). Correcting model fit criteria for small sample latent growth models with incomplete
data. Educational and Psychological Measurement. Advance online publication. doi:10.1177/0013164416661824.
Mehta, P. D., & West, S. G. (2000). Putting the individual back into individual growth curves. Psychological Methods, 5,
23–43.
Muthen, L. K., & Muthen, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power.
Structural Equation Modeling, 9, 599–620.
Preacher, K. J., Wichman, A. L., MacCallum, R. C., & Briggs, N. E. (2008). Latent growth curve modeling. Thousand Oaks,
CA: Sage.
Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life
course: A meta-analysis of longitudinal studies. Psychological Bulletin, 132, 1–25.
Schaalje, G. B., McBride, J. B., & Fellingham, G. W. (2002). Adequacy of approximations to distributions of test statistics
in complex mixed linear models. Journal of Agricultural, Biological, and Environmental Statistics, 7, 512–524.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.
Schafer, J. L., & Olsen, M. K. (1998). Multiple imputation for multivariate missing-data problems: A data analyst’s per-
spective. Multivariate Behavioral Research, 33, 545–571.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London, UK: Chapman & Hall.
Seaman, S. R., & Hughes, R. A. (2016). Relative efficiency of joint-model and full-conditional-specification multiple
imputation when conditional models are compatible: The general location model. Statistical Methods in Medical
Research. Advance online publication. doi:10.1177/0962280216665872
Shah, A., Laird, N., & Schoenfeld, D. (1997). A random-effects model for multiple characteristics with possibly missing
data. Journal of the American Statistical Association, 92, 775–779.
Shin, T., Davison, M. L., & Long, J. D. (2016). Maximum likelihood versus multiple imputation for missing data in small
longitudinal samples with nonnormality. Psychological Methods. Advance online publication. doi:10.1037/met0000094
Shin, T., Davison, M. L., & Long, J. D. (2009). Effects of missing data methods in structural equation modeling with non-
normal longitudinal data. Structural Equation Modeling, 16, 70–98.
Van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in
survival analysis. Statistics in Medicine, 18, 681–694.
Vickers, A. J. (2003). How many repeated measures in repeated measures designs? Statistical issues for comparative trials.
BMC Medical Research Methodology, 3, 1.
Vink, G., Frank, L. E., Pannekoek, J., & van Buuren, S. (2014). Predictive mean matching imputation of semicontinuous variables. Statistica Neerlandica, 68, 61–90.
Von Hippel, P. T. (2016). New confidence intervals and bias comparisons show that maximum likelihood can beat multi-
ple imputation in small samples. Structural Equation Modeling, 23, 422–437.
White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for
practice. Statistics in Medicine, 30, 377–399.
White, I. R., & Carlin, J. B. (2010). Bias and efficiency of multiple imputation compared with complete-case analysis for
missing covariate values. Statistics in Medicine, 29, 2920–2931.
Wu, W., & Lang, K. M. (2016). Proportionality assumption in latent basis curve models: A cautionary note. Structural
Equation Modeling, 23, 140–154.
Wu, W., West, S. G., & Taylor, A. B. (2009). Evaluating model fit for growth curve models: Integration of fit indices from
SEM and MLM frameworks. Psychological Methods, 14, 183–201.