Fit Indices in SEM
Overview
In structural equation modeling, fit indices establish whether, overall, the model is acceptable.
If the model is acceptable, researchers then establish whether specific paths are significant.
Acceptable fit indices do not imply the relationships are strong. Indeed, high fit indices are often
easier to obtain when the relationships between variables are low rather than high--because, when
the relationships are strong, the power to detect discrepancies from the predictions is amplified.
Many of the fit indices are derived from the chi-square value. Conceptually, the chi-square value,
in this context, represents the difference between the observed covariance matrix and the
predicted or model covariance matrix.
The fit indices can be classified into several classes. These classes include:
Discrepancy functions, such as the chi-square test, relative chi-square, and RMS
Tests that compare the target model with the null model, such as the CFI, NFI, TLI, and IFI
Information theory goodness of fit measures, such as the AIC, BCC, BIC, and CAIC
Many researchers, such as Marsh, Balla, and Hau (1996), recommend that individuals utilize a
range of fit indices. Indeed, Jaccard and Wan (1996) recommend using indices from different
classes as well; this strategy overcomes the limitations of each index. Commonly, fit is regarded
as acceptable when:
The Normed Fit Index (NFI) exceeds .90 (Byrne, 1994) or .95 (Schumacker & Lomax, 2004)
The RMS is less than .08 (Browne & Cudeck, 1993)--and ideally less than .05 (Steiger, 1990).
Alternatively, the upper confidence interval of the RMS should not exceed .08 (Hu &
Bentler, 1998)
The relative chi-square is less than 2 or 3 (Kline, 1998; Ullman, 2001).
These criteria are merely guidelines. To illustrate, in a field in which previous models have
generated CFI values of only .70, a CFI value of .85 represents progress and thus should be
regarded as acceptable (Bollen, 1989).
Discrepancy functions
Chi-square
The chi-square for the model is also called the discrepancy function, likelihood ratio chi-square, or
chi-square goodness of fit. In AMOS, the chi-square value is called CMIN.
If the chi-square is not significant, the model is regarded as acceptable: the observed
covariance matrix is similar to the covariance matrix predicted by the model. However, this
index has several limitations:
Complex models, with many parameters, will tend to generate an acceptable fit
If the sample size is large, the model will usually be rejected, sometimes unfairly
When the assumption of multivariate normality is violated, the chi-square fit index is
inaccurate. The Satorra-Bentler scaled chi-square, which is available in EQS, is often
preferred, because this index penalizes the chi-square for kurtosis.
Relative chi-square
The relative chi-square is also called the normed chi-square. This value equals the chi-square index
divided by the degrees of freedom. This index might be less sensitive to sample size. The criterion
for acceptance varies across researchers, ranging from less than 2 (Ullman, 2001) to less than 5
(Schumacker & Lomax, 2004).
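As a minimal sketch of this computation--the chi-square and df values below are purely illustrative, not drawn from any particular study--the relative chi-square is simply the ratio of the model chi-square to its degrees of freedom:

```python
def relative_chi_square(chi_square, df):
    """Normed chi-square: the model chi-square divided by its degrees of freedom."""
    return chi_square / df

# Illustrative values: chi-square = 45.6 on 24 degrees of freedom.
ratio = relative_chi_square(45.6, 24)

# Compare against the criteria cited above: < 2 (Ullman, 2001)
# or < 5 (Schumacker & Lomax, 2004).
acceptable = ratio < 2
```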
RMS
The RMS, also called the RMR or RMSE, represents the square root of the average or mean of the
squared covariance residuals--the differences between corresponding elements of the observed
and predicted covariance matrices. Zero represents a perfect fit, but the maximum is unlimited.
Because the maximum is unbounded, the RMS is difficult to interpret, and consensus has not been
reached on the levels that represent acceptable models. Some researchers utilize the
standardized version of the RMS instead to overcome this problem.
According to some researchers, the RMS should be less than .08 (Browne & Cudeck, 1993)--and
ideally less than .05 (Steiger, 1990). Alternatively, the upper confidence interval of the RMS
should not exceed .08 (Hu & Bentler, 1998).
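To make the verbal definition concrete, the sketch below computes the root mean square residual from two small covariance matrices. The matrices and the function name are illustrative only; software packages differ in whether and how they weight diagonal elements.

```python
import math

def rms_residual(observed, predicted):
    """Square root of the mean of the squared differences between
    corresponding elements of the observed and predicted covariance matrices."""
    residuals = [o - p
                 for obs_row, pred_row in zip(observed, predicted)
                 for o, p in zip(obs_row, pred_row)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Illustrative 2 x 2 matrices: the model reproduces the variances exactly
# but slightly underestimates the covariance.
observed = [[1.00, 0.42],
            [0.42, 1.00]]
predicted = [[1.00, 0.40],
             [0.40, 1.00]]

rms = rms_residual(observed, predicted)  # close to zero: a near-perfect fit
```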
CFI
The comparative fit index, like the IFI, NFI, BBI, TLI, and RFI, compares the model of interest with
some alternative, such as the null or independence model. The CFI is also known as the Bentler
Comparative Fit Index.
Specifically, the CFI compares the fit of a target model to the fit of an independent model--a model
in which the variables are assumed to be uncorrelated. In this context, fit refers to the difference
between the observed and predicted covariance matrices, as represented by the chi-square index.
In short, the CFI represents one minus the ratio of the discrepancy of the target model to the
discrepancy of the independence model. Roughly, the CFI thus represents the extent to which the
model of interest is better than the independence model. Values that approach 1 indicate
acceptable fit.
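Under the noncentrality-based formula--one minus the ratio of the target model's discrepancy (chi-square minus df, floored at zero) to the independence model's discrepancy--the computation can be sketched as follows. The chi-square and df values are illustrative only:

```python
def cfi(chi_sq_target, df_target, chi_sq_null, df_null):
    """Comparative fit index: 1 minus the ratio of the target model's
    discrepancy to the independence (null) model's discrepancy, where each
    discrepancy is max(chi-square - df, 0)."""
    d_target = max(chi_sq_target - df_target, 0.0)
    d_null = max(chi_sq_null - df_null, d_target)
    return 1.0 - d_target / d_null if d_null > 0 else 1.0

# Illustrative values: target model chi-square = 45.6 on 24 df;
# independence model chi-square = 480.0 on 28 df.
value = cfi(45.6, 24, 480.0, 28)  # approaches 1, indicating acceptable fit
```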
CFI is not too sensitive to sample size (Fan, Thompson, and Wang, 1999). However, CFI is not
effective if most of the correlations between variables approach 0--because there is, therefore,
less covariance to explain. Furthermore, Raykov (2000, 2005) argues that CFI is a biased measure,
based on non-centrality.
IFI
The incremental fit index, also known as Bollen's IFI, is also relatively insensitive to sample size.
Values that exceed .90 are regarded as acceptable, although this index can exceed 1.
To compute the IFI, first the difference between the chi-square of the independence model--in
which variables are uncorrelated--and the chi-square of the target model is calculated. Next, the
difference between the chi-square of the independence model and the df of the target model is
calculated. The ratio of these two values represents the IFI.
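The two differences just described can be sketched directly; the chi-square values below are illustrative only:

```python
def ifi(chi_sq_target, df_target, chi_sq_null):
    """Bollen's incremental fit index: the improvement of the target model
    over the independence model, divided by the independence chi-square
    minus the target model's degrees of freedom."""
    improvement = chi_sq_null - chi_sq_target
    return improvement / (chi_sq_null - df_target)

# Illustrative values: target chi-square = 45.6 on 24 df;
# independence chi-square = 480.0.
value = ifi(45.6, 24, 480.0)  # exceeds .90, so regarded as acceptable
```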
NFI
The NFI is also known as the Bentler-Bonett normed fit index. The fit index varies from 0 to 1--
where 1 is ideal. The NFI equals the difference between the chi-square of the null model and the
chi-square of the target model, divided by the chi-square of the null model. In other words, an
NFI of .90, for example, indicates the model of interest improves the fit by 90% relative to the
null or independence model.
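This proportional-improvement interpretation can be sketched with illustrative chi-square values:

```python
def nfi(chi_sq_target, chi_sq_null):
    """Bentler-Bonett normed fit index: the proportional reduction in
    chi-square relative to the null (independence) model."""
    return (chi_sq_null - chi_sq_target) / chi_sq_null

# Illustrative values: target chi-square = 45.6, null chi-square = 480.0.
value = nfi(45.6, 480.0)  # the model improves fit by about 90%
```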
When the samples are small, the fit is often underestimated (Ullman, 2001). Furthermore, the fit
can be overestimated if the number of parameters is increased; the TLI, or NNFI, overcomes this
problem.
TLI
The TLI, sometimes called the NNFI, is similar to the NFI. However, the index is lower, and hence
the model is regarded as less acceptable, if the model is complex. To compute the TLI:
First, divide the chi-square for the target model and the null model by their corresponding
df values--which generates relative chi-squares for each model.
Next, subtract the relative chi-square of the target model from the relative chi-square of the
null model.
Finally, divide this difference by the relative chi-square for the null model minus 1.
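These steps can be sketched as follows; the chi-square and df values are illustrative only:

```python
def tli(chi_sq_target, df_target, chi_sq_null, df_null):
    """Tucker-Lewis index (NNFI), computed from the relative chi-squares
    of the target and null models."""
    rel_target = chi_sq_target / df_target   # relative chi-square, target model
    rel_null = chi_sq_null / df_null         # relative chi-square, null model
    return (rel_null - rel_target) / (rel_null - 1.0)

# Illustrative values: target chi-square = 45.6 on 24 df;
# null chi-square = 480.0 on 28 df.
value = tli(45.6, 24, 480.0, 28)
```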
According to Marsh, Balla, and McDonald (1988), the TLI is relatively independent of sample size.
The TLI is usually lower than the GFI--but values over .90 or over .95 are considered acceptable
(e.g., Hu & Bentler, 1999).
AIC
The AIC, like the BIC, BCC, and CAIC, is regarded as an information theory goodness of fit
measure--applicable when maximum likelihood estimation is used (Burnham & Anderson, 1998).
These indices are used to compare different models. The models that generate the lowest values
are optimal. The absolute AIC value is irrelevant--although values closer to 0 are ideal; only the
AIC value of one model relative to the AIC value of another model is meaningful.
Like the chi-square index, the AIC also reflects the extent to which the observed and predicted
covariance matrices differ from each other. However, unlike the chi-square index, the AIC
penalizes models that are too complex. In particular, the AIC equals the chi-square divided by n,
plus 2k / (n - 1). In this formula, k = v(v + 1)/2 - df, where v is the number of variables and n is
the sample size.
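A sketch of this formula follows, with illustrative inputs that are not drawn from any study. Note that other texts define the AIC on a different scale (for example, chi-square plus 2k); the version below follows the formula stated above:

```python
def free_parameters(v, df):
    """k = v(v + 1) / 2 - df, where v is the number of observed variables
    and df is the model's degrees of freedom."""
    return v * (v + 1) / 2 - df

def aic(chi_square, df, v, n):
    """AIC as defined above: chi-square / n plus 2k / (n - 1)."""
    k = free_parameters(v, df)
    return chi_square / n + 2 * k / (n - 1)

# Illustrative values: chi-square = 45.6 on 24 df, v = 8 observed
# variables, n = 200 cases; k = 8 * 9 / 2 - 24 = 12 free parameters.
value = aic(45.6, 24, 8, 200)
```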
BCC
The BCC is similar to the AIC. That is, the BCC and AIC both represent the extent to which the
observed covariance matrix differs from the predicted covariance matrix--like the chi-square
statistic--but include a penalty if the model is complex, with many parameters. The BCC bestows
an even harsher penalty than does the AIC.
The BCC equals the chi-square divided by n, plus 2k / (n - v - 2). In this formula,
k = v(v + 1)/2 - df, where v is the number of variables and n is the sample size.
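A matching sketch for the BCC, again with illustrative inputs, makes the harsher penalty visible: the denominator n - v - 2 is smaller than the n - 1 used by the AIC, so the penalty term is larger.

```python
def bcc(chi_square, df, v, n):
    """BCC as defined above: chi-square / n plus 2k / (n - v - 2),
    with k = v(v + 1) / 2 - df."""
    k = v * (v + 1) / 2 - df
    return chi_square / n + 2 * k / (n - v - 2)

# Illustrative values: chi-square = 45.6 on 24 df, v = 8 variables,
# n = 200 cases. Because n - v - 2 < n - 1, the penalty term exceeds
# the corresponding AIC penalty for the same model and sample.
value = bcc(45.6, 24, 8, 200)
```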
CAIC
The CAIC is similar to the AIC as well. However, the CAIC also confers a penalty if the sample size is
small.
BIC
The Bayesian Information Criterion is also known as Akaike's Bayesian Information Criterion (ABIC)
and the Schwarz Bayesian Criterion (SBC). This index is similar to the AIC, but the penalty against
complex models is especially pronounced--even more pronounced than the penalties imposed by
the BCC and CAIC indices. Furthermore, like the CAIC, a penalty against small samples is included.
The BIC was derived by Raftery (1995). Roughly, the BIC is the log of a Bayes factor of the target
model compared to the saturated model.
Other indices
Many other indices have also been developed. These indices include the GFI, AGFI, FMIN,
noncentrality parameter, and centrality index. The GFI and, to a lesser extent, the FMIN used to be
very popular, but their use has dwindled recently.
Some indices are especially sensitive to sample size. Many fit indices, for instance, overestimate
the fit when the sample size is small--below 200 or so. Nevertheless, the RMSEA and CFI seem to
be less sensitive to sample size (Fan, Thompson, and Wang, 1999).
References
Anderson, J. C., & Gerbing, D. W. (1984). The effect of sampling error on convergence, improper
solutions and goodness-of-fit indices for maximum likelihood confirmatory factor analysis.
Psychometrika, 49, 155-173.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107,
238-246.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of
covariance structures. Psychological Bulletin, 88, 588-606.
Bentler, P. M., & Mooijaart, A. (1989). Choice of structural model via parsimony: A rationale based
on precision. Psychological Bulletin, 106, 315-317.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance
structures. Multivariate Behavioral Research, 24, 445-455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S.
Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park, CA: Sage.
Burnham, K. P., & Anderson, D. R. (1998). Model selection and inference: A practical information-
theoretic approach. New York: Springer-Verlag.
Byrne, B. M. (1994). Structural equation modeling with EQS and EQS/Windows. Thousand Oaks, CA:
Sage Publications.
Cheung, G. W. & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement
invariance. Structural Equation Modeling, 9, 233-255.
Fan, X., Thompson, B., & Wang, L. (1999). Effects of sample size, estimation method, and model
specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56-83.
Hipp, J. R., & Bollen, K. A. (2003). Model fit in structural equation models with censored, ordinal,
and dichotomous variables: Testing vanishing tetrads. Sociological Methodology, 33, 267-305.
Hu, L. T., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation
modeling: Concepts, issues, and applications (pp. 76-99). Thousand Oaks, CA: Sage.
Jaccard, J., & Wan, C. K. (1996). LISREL approaches to interaction effects in multiple regression.
Thousand Oaks, CA: Sage Publications.
Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford
Press.
Joreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing
structural equation models (pp. 294-316). Newbury, CA: Sage.
Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of incremental fit indexes: A clarification
of mathematical and empirical properties. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced
structural equation modeling techniques (pp. 315-353). Mahwah, NJ: Lawrence Erlbaum.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor
analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.
Marsh, H. W., & Hau, K. T. (1996). Assessing goodness of fit: Is parsimony always desirable?, 64,
364-390.
Raftery, A. E. (1995). Bayesian model selection in social research. In Adrian E. Raftery (Ed.) (pp. 111-
164). Oxford: Blackwell.
Raykov, T. (2000). On the large-sample bias, variance, and mean squared error of the conventional
noncentrality parameter estimator of covariance structure models. Structural Equation Modeling, 7,
431-441.
Raykov, T. (2005). Bias-corrected estimation of noncentrality parameters of covariance structure
models. Structural Equation Modeling, 12, 120-129.
Schumacker, R. E., & Lomax, R. G. (2004). A beginner's guide to structural equation modeling,
Second edition. Mahwah, NJ: Lawrence Erlbaum Associates.
Steiger, J. H. (2000). Point estimation, hypothesis testing and interval estimation using the RMSEA:
Some comments and a reply to Hayduk and Glaser. Structural Equation Modeling, 7, 149-162.
Tucker, L. R., & Lewis, C. (1973). The reliability coefficient for maximum likelihood factor
analysis. Psychometrika, 38, 1-10.
Ullman, J. B. (2001). Structural equation modeling. In B. G. Tabachnick & L. S. Fidell, Using
multivariate statistics (4th ed., pp. 653-771). Needham Heights, MA: Allyn & Bacon.