Latent Class Cluster Analysis - Vermunt, Magidson

1. Latent class cluster analysis is a statistical technique for clustering objects into groups based on their characteristics. It assumes the data comes from a mixture of underlying probability distributions. 2. The document discusses latent class cluster analysis models for continuous variables, where variables within each cluster are assumed to have a multivariate normal distribution. More restrictive models constrain the variance-covariance matrices to be diagonal or equal across clusters. 3. Latent class cluster analysis yields probabilistic clustering, assigning objects posterior probabilities of membership in each cluster rather than definite assignments, unlike other clustering methods. It allows classifying new objects based on estimated model parameters.

Uploaded by

Nico Prokop

LATENT CLASS CLUSTER ANALYSIS

Jeroen K. Vermunt, Tilburg University
Jay Magidson, Statistical Innovations Inc.

INTRODUCTION
Kaufman and Rousseeuw (1990) define cluster analysis as the classification of similar objects into groups, where the number of groups, as well as their forms, are unknown. The form of a group refers to the parameters of the cluster; that is, to its cluster-specific means, variances, and covariances, which also have a geometrical interpretation. A similar definition is given by Everitt (1993), who speaks about deriving a useful division into a number of classes, where both the number of classes and the properties of the classes are to be determined. These could also be definitions of exploratory latent class (LC) analysis, in which objects are assumed to belong to one of a set of K latent classes, with the number of classes and their sizes not known a priori. In addition, objects belonging to the same class are similar with respect to the observed variables in the sense that their observed scores are assumed to come from the same probability distributions, whose parameters are, however, unknown quantities to be estimated. Because of the similarity between cluster and exploratory LC analysis, it is not surprising that the latter method is becoming an increasingly popular clustering tool.

In this paper, we describe the state of the art in the field of LC cluster analysis. Most of the work in this field involves continuous indicators assuming (restricted) multivariate normal distributions within classes. Although authors seldom refer to the work of Gibson (1959) and Lazarsfeld and Henry (1968), they are actually using what these authors called latent profile analysis: that is, latent structure models with a single categorical latent variable and a set of continuous indicators. Wolfe (1970) was the first to make an explicit connection between LC and cluster analysis. The last decade has seen renewed interest in the application of LC analysis as a cluster analysis method.
Labels that are used to describe such a use of LC analysis are: mixture likelihood approach to clustering (McLachlan and Basford 1988; Everitt 1993), model-based clustering (Banfield and Raftery 1993; Bensmail et al. 1997; Fraley and Raftery 1998a, 1998b), mixture-model clustering (Jorgensen and Hunt 1996; McLachlan et al. 1999), Bayesian classification (Cheeseman and Stutz 1995), unsupervised learning (McLachlan and Peel 1996), and latent class cluster analysis (Vermunt and Magidson 2000). Probably the most important reason for the increased popularity of LC analysis as a statistical tool for cluster analysis is that high-speed computers now make these computationally intensive methods practically applicable. Several software packages are available for the estimation of LC cluster models.

An important difference between standard cluster analysis techniques and LC clustering is that the latter is a model-based clustering approach. This means that a statistical model is postulated for the population from which the sample under study is drawn. More precisely, it is assumed that the data are generated by a mixture of underlying probability distributions. When using the maximum likelihood method for parameter estimation, the clustering problem involves maximizing a log-likelihood function. This is similar to standard non-hierarchical cluster techniques, in which the allocation of objects to clusters should be optimal according to some criterion. These criteria typically involve minimizing the within-cluster variation and/or maximizing the between-cluster variation. An advantage of using a statistical model is, however, that the choice of the cluster criterion is less arbitrary. Nevertheless, the log-likelihood functions corresponding to LC cluster models may be similar to the criteria used by certain non-hierarchical cluster techniques like k-means.

LC clustering is very flexible in the sense that both simple and complicated distributional forms can be used for the observed variables within clusters. As in any statistical model, restrictions can be imposed on the parameters to obtain more parsimony, and formal tests can be used to check their validity. Another advantage of the model-based clustering approach is that no decisions have to be made about the scaling of the observed variables: for instance, when working with normal distributions with unknown variances, the results will be the same irrespective of whether the variables are normalized or not. This is very different from standard non-hierarchical cluster methods, where scaling is always an issue. Other advantages are that it is relatively easy to deal with variables of mixed measurement levels (different scale types) and that there are more formal criteria to make decisions about the number of clusters and other model features.

LC analysis yields a probabilistic clustering approach. This means that although each object is assumed to belong to one class or cluster, it is taken into account that there is uncertainty about an object's class membership.
This makes LC clustering conceptually similar to fuzzy clustering techniques. An important difference between these two approaches is, however, that in fuzzy clustering an object's grades of membership are the parameters to be estimated (Kaufman and Rousseeuw 1990), while in LC clustering an individual's posterior class-membership probabilities are computed from the estimated model parameters and the individual's observed scores. This makes it possible to classify other objects belonging to the population from which the sample is taken, which is not possible with standard fuzzy cluster techniques.

The remainder of this paper is organized as follows. The next section discusses the LC cluster model for continuous variables. Subsequently, attention is paid to models for sets of indicators of different measurement levels, also known as mixed-mode data. Then we explain how to include covariates in an LC cluster model. After discussing estimation and testing, two empirical examples are presented. The paper ends with a short discussion. An appendix describes computer programs that implement the various kinds of LC clustering methods presented in this paper.

CONTINUOUS INDICATOR VARIABLES

The basic LC cluster model has the form
$$ f(y_i \mid \theta) = \sum_{k=1}^{K} \pi_k \, f_k(y_i \mid \theta_k). $$

Here, $y_i$ denotes an object's scores on a set of observed variables, K is the number of clusters, and $\pi_k$ denotes the prior probability of belonging to latent class or cluster k or, equivalently, the size of cluster k. Alternative labels for the y's are indicators, dependent variables, outcome variables, outputs, endogenous variables, or items. As can be seen, the distribution of $y_i$ given the model parameters $\theta$, $f(y_i \mid \theta)$, is assumed to be a mixture of class-specific densities, $f_k(y_i \mid \theta_k)$.

Most of the work on LC cluster analysis has been done for continuous variables. Generally, these continuous variables are assumed to be normally distributed within latent classes, possibly after applying an appropriate non-linear transformation (Lazarsfeld and Henry 1968; Banfield and Raftery 1993; McLachlan 1988; McLachlan et al. 1999; Cheeseman and Stutz 1995). Alternatives to the normal distribution are Student, Gompertz, or gamma distributions (see, for instance, McLachlan et al. 1999).

The most general Gaussian distribution, of which all restricted versions discussed below are special cases, is the multivariate normal model with parameters $\mu_k$ and $\Sigma_k$. If no further restrictions are imposed, the LC clustering problem involves estimating a separate set of means, variances, and covariances for each latent class. In most applications, the main objective is finding classes that differ with respect to their means or locations. The fact that the model allows classes to have different variances implies that classes may also differ with respect to the homogeneity of the responses to the observed variables. In standard LC models with categorical variables, it is generally assumed that the observed variables are mutually independent within clusters. This is, however, not necessary here. The fact that each class has its own set of covariances means that the y variables may be correlated within clusters, and that these correlations may be cluster specific. So, the clusters differ not only with respect to their means and variances, but also with respect to the correlations between the observed variables.
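As a rough illustration of the basic model with unrestricted multivariate normal class-specific densities, the mixture density can be evaluated as in the sketch below. The function names and all numerical values are assumptions made for this example; they do not come from the paper or from any particular program.

```python
import numpy as np

def mvn_pdf(y, mean, cov):
    """Density of a multivariate normal distribution evaluated at one point y."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = np.asarray(y, dtype=float) - mean
    d = len(mean)
    # Solve a linear system instead of inverting the covariance matrix.
    maha = diff @ np.linalg.solve(cov, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm_const)

def mixture_pdf(y, weights, means, covs):
    """f(y | theta) = sum_k pi_k * f_k(y | theta_k) with normal class densities."""
    return sum(w * mvn_pdf(y, m, c) for w, m, c in zip(weights, means, covs))

# Hypothetical two-class, two-indicator model (illustrative values only).
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 2.0]])]
density = mixture_pdf([0.0, 0.0], weights, means, covs)
```

In this sketch each class carries its own mean vector and full covariance matrix, which corresponds to the unrestricted specification; the restricted variants only change how the `covs` entries are constructed.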
It will be clear that as the number of indicators and/or the number of latent classes increases, the number of parameters to be estimated increases rapidly, especially the number of free parameters in the variance-covariance matrices, $\Sigma_k$. Therefore, it is not surprising that restrictions imposed to obtain more parsimony and stability typically involve constraining the class-specific variance-covariance matrices.

An important constrained model is the local independence model, obtained by assuming that all within-cluster covariances are equal to zero or, equivalently, by assuming that the variance-covariance matrices, $\Sigma_k$, are diagonal matrices. Models that are less restrictive than the local independence model can be obtained by fixing some but not all covariances to zero or, equivalently, by assuming certain pairs of y's to be mutually dependent within latent classes.

Another interesting type of constraint is the equality or homogeneity of variance-covariance matrices across latent classes, i.e., $\Sigma_k = \Sigma$. Such a homogeneous or class-independent error structure yields clusters having the same forms but different locations. Note that these kinds of equality constraints can be applied in combination with any structure for $\Sigma$.

Banfield and Raftery (1993) proposed reparameterizing the class-specific variance-covariance matrices by an eigenvalue decomposition:
$$ \Sigma_k = \lambda_k D_k A_k D_k^{T}. $$

The parameter $\lambda_k$ is a scalar, $D_k$ is a matrix of eigenvectors, and $A_k$ is a diagonal matrix whose elements are proportional to the eigenvalues of $\Sigma_k$. More precisely, $\lambda_k = |\Sigma_k|^{1/d}$, where d is the number of observed variables, and $A_k$ is scaled such that $|A_k| = 1$. A nice feature of the above decomposition is that each of the three sets of parameters has a geometrical interpretation: $\lambda_k$ indicates what can be called the volume of cluster k, $D_k$ its orientation, and $A_k$ its shape. If we think of a cluster as a clutter of points in a multidimensional space, the volume is the size of the clutter, while the orientation and shape parameters indicate whether the clutter is spherical or ellipsoidal. Thus, restrictions imposed on these matrices can directly be interpreted in terms of the geometrical form of the clusters. Typically, matrices are assumed to be class-independent and/or simpler structures (diagonal or identity) are used for certain matrices. See Bensmail et al. (1997) and Fraley and Raftery (1998b) for overviews of the many possible specifications.

Rather than by a restricted eigenvalue decomposition, the structure of the $\Sigma_k$ matrices can also be simplified by means of a covariance-structure model. Several authors have proposed using latent class models for dealing with unobserved heterogeneity in covariance-structure analysis (Arminger and Stein 1997; Dolan and Van der Maas 1997; Jedidi et al. 1997). The same methodology can be used to restrict the error structure in LC cluster analysis with continuous indicators. An interesting structure for $\Sigma_k$ that is related to the eigenvalue decomposition described above is a factor analytic model (Yung 1997; McLachlan and Peel 1998); that is,

$$ \Sigma_k = \Lambda_k \Phi_k \Lambda_k^{T} + U_k. \qquad (1) $$

Here, $\Lambda_k$ is a matrix with factor loadings, $\Phi_k$ is the variance-covariance matrix of the factors, and $U_k$ is a diagonal matrix with unique variances. Restricted versions can be obtained by limiting the number of factors (for instance, to one) and/or fixing some factor loadings to zero. Such specifications make it possible to describe the correlations between the y variables within clusters or, equivalently, the structure of local dependencies, by means of a small number of parameters.
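The eigenvalue decomposition introduced earlier in this section can be checked numerically. The sketch below splits a covariance matrix into volume, orientation, and shape and then rebuilds it; the function name and the example matrix are assumptions made for illustration only.

```python
import numpy as np

def decompose_covariance(sigma):
    """Split sigma into volume (lambda), orientation (D) and shape (A)
    so that sigma = lambda * D @ A @ D.T with det(A) = 1."""
    sigma = np.asarray(sigma, dtype=float)
    d = sigma.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma)     # eigen-decomposition of a symmetric matrix
    volume = np.linalg.det(sigma) ** (1.0 / d)   # lambda = |sigma|^(1/d)
    shape = np.diag(eigvals / volume)            # rescaled eigenvalues, determinant 1
    return volume, eigvecs, shape

sigma = np.array([[4.0, 1.2], [1.2, 1.0]])       # illustrative covariance matrix
lam, D, A = decompose_covariance(sigma)
rebuilt = lam * D @ A @ D.T                      # should reproduce sigma
```

Constraining `lam`, `D`, or `A` to be equal across classes, or fixing `D` or `A` to the identity, gives the kinds of restricted cluster forms described in the text.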

MIXED INDICATOR VARIABLES

In the previous section, we concentrated on LC cluster models for continuous indicators assuming a (restricted) multivariate normal distribution for $y_i$ within each of the classes. Often, however, we are confronted with other types of indicators, like nominal or ordinal variables or counts. LC cluster models for nominal and ordinal variables assuming (restricted) multinomial distributions for the items are equivalent to standard exploratory LC models (Goodman 1974; Clogg 1981, 1995). Böckenholt (1993) and Wedel et al. (1993) proposed LC models for Poisson counts. Using the general structure of the LC model, it is straightforward to specify cluster models for sets of indicators of different scale types or, as Everitt (1988, 1993) called it, for mixed-mode data (see also Lawrence and Krzanowski 1996; Jorgensen and Hunt 1996; and Vermunt and Magidson 2000: 147-152). Assuming local independence, the LC cluster model for mixed y's is of the form
$$ f(y_i \mid \theta) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} f_k(y_{ij} \mid \theta_{jk}), \qquad (2) $$

where J denotes the total number of indicators and j a particular indicator. Rather than specifying the joint distribution of $y_i$ given class membership using a single multivariate distribution, we now have to specify the appropriate univariate distribution function for each element $y_{ij}$ of $y_i$. Possible choices for continuous $y_{ij}$ are univariate normal, Student, gamma, and log-normal distributions. A natural choice for discrete nominal or ordinal variables is the (restricted) multinomial distribution. Suitable distributions for counts are, for instance, Poisson, binomial, or negative binomial.
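To make equation (2) concrete, the sketch below evaluates the mixture density for one object with a continuous, a nominal, and a count indicator under local independence. The choice of one normal, one multinomial, and one Poisson indicator, as well as every name and parameter value, is an assumption made for illustration.

```python
import math

def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def poisson_pmf(y, rate):
    return math.exp(-rate) * rate ** y / math.factorial(y)

def class_density(obs, params):
    """Product over indicators of f_k(y_ij | theta_jk) under local independence."""
    return (normal_pdf(obs["score"], params["mean"], params["var"])
            * params["probs"][obs["category"]]      # multinomial (nominal) indicator
            * poisson_pmf(obs["count"], params["rate"]))

def mixture_density(obs, class_sizes, class_params):
    """f(y_i | theta) = sum_k pi_k * prod_j f_k(y_ij | theta_jk), as in equation (2)."""
    return sum(pi_k * class_density(obs, p)
               for pi_k, p in zip(class_sizes, class_params))

# Hypothetical two-class model and one observed object (illustrative values only).
class_sizes = [0.7, 0.3]
class_params = [
    {"mean": 0.0, "var": 1.0, "probs": [0.8, 0.1, 0.1], "rate": 1.0},
    {"mean": 2.0, "var": 4.0, "probs": [0.2, 0.3, 0.5], "rate": 4.0},
]
obs = {"score": 0.5, "category": 0, "count": 2}
density = mixture_density(obs, class_sizes, class_params)
```

Swapping in another univariate distribution for an indicator only changes the corresponding factor in `class_density`; the mixture structure stays the same.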

In the above specification, we assumed that the y's are conditionally independent within latent classes. This assumption can easily be relaxed by using the appropriate multivariate rather than univariate distributions for sets of locally dependent y variables. It is not necessary to present a separate formula for this situation. We can just think of the index j in equation (2) as denoting a set of indicators rather than a single indicator. For sets of continuous variables, we can again work with a multivariate normal distribution. A set of nominal/ordinal variables can be combined into a (restricted) joint multinomial distribution. Correlated counts could be modeled with a multivariate Poisson model. More difficult is the specification of the mixed multivariate distributions. Krzanowski (1983) described two possible ways of modeling the relationship between a nominal/ordinal and a continuous y: via a conditional Gaussian or via a conditional multinomial distribution, which means either using the categorical variable as a covariate in the normal model or the continuous one as a covariate in the multinomial model. Lawrence and Krzanowski (1996) and Hunt and Jorgensen (1999) used the conditional Gaussian distribution in LC clustering with combinations of categorical and continuous variables. Local dependencies with a Poisson variable could be dealt with in the same way, i.e., by allowing its mean to depend on the relevant continuous or categorical variable(s).

The possibility to include local dependencies between indicators is very important when using LC analysis as a clustering tool. First, it prevents ending up with a solution that contains too many clusters. Often, a simpler solution with fewer clusters is obtained by including a few direct effects between y variables. It should be stressed that allowing for within-cluster associations also carries a risk: direct effects may hide relevant clusters.

A second reason for relaxing the local independence assumption is that it may yield a better classification of objects into clusters. Saying that two variables are locally dependent is conceptually the same as saying that they contain some overlapping information that should not be used when determining to which class an object belongs. Consequently, if we omit a significant bivariate dependency from an LC cluster model, the corresponding locally dependent indicators receive too high a weight in the classification formula (see equation (3)) compared to the other indicators.
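The overweighting argument can be illustrated with an extreme case: an indicator that is entered twice, so the two copies are perfectly dependent, while the model still treats them as locally independent. The two-class setup, equal class sizes, unit variances, and all values below are assumptions chosen for this sketch.

```python
import math

def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_class1(y_values, means1, means2, prior1=0.5, var=1.0):
    """Posterior probability of class 1 when every entry of y_values is
    treated as a locally independent indicator."""
    like1 = prior1 * math.prod(normal_pdf(y, m, var) for y, m in zip(y_values, means1))
    like2 = (1.0 - prior1) * math.prod(normal_pdf(y, m, var) for y, m in zip(y_values, means2))
    return like1 / (like1 + like2)

y = 0.8  # one observed score; the class means are 0 and 2 (illustrative values)
p_single = posterior_class1([y], [0.0], [2.0])
# The same information entered twice while still assuming local independence.
p_double = posterior_class1([y, y], [0.0, 0.0], [2.0, 2.0])
```

Here `p_double` is further from 0.5 than `p_single` although no new information was added, which is the sense in which an ignored local dependency makes the classification overconfident.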

COVARIATES
The LC cluster modeling approach described above is quite general: it deals with mixed-mode data and it allows for many different specifications of the (correlated) error structure. An important extension of this model is the inclusion of covariates to predict class membership. Conceptually, it makes sense to distinguish (endogenous) variables that serve as indicators of the latent variable from (exogenous) variables that are used to predict to which cluster an object belongs. This idea is, in fact, the same as in Clogg's (1981) LC model with external variables. Note that in certain situations we may want to use the latent cluster variable as a predictor of an observed response variable rather than as a dependent variable. For such situations, we do not need special arrangements like the ones needed with covariates. A model in which the cluster variable serves as predictor can be obtained by using the response variable as one of the y variables.

Using the same basic structure as in equation (2), the inclusion of covariates yields the following LC cluster model:
$$ f(y_i \mid z_i, \theta) = \sum_{k=1}^{K} \pi_{k \mid z_i} \prod_{j=1}^{J} f_k(y_{ij} \mid \theta_{jk}). $$

Here, $z_i$ denotes object i's covariate values. Alternative terms for the z's are concomitant variables, grouping variables, external variables, exogenous variables, and inputs. To reduce the number of parameters, the probability of belonging to class k given covariate values $z_i$, $\pi_{k \mid z_i}$, will generally be restricted by a multinomial logit model; that is, a logit model with linear effects and no higher-order interactions. An even more general specification is obtained by allowing covariates to have direct effects on the indicators, which yields
$$ f(y_i \mid z_i, \theta) = \sum_{k=1}^{K} \pi_{k \mid z_i} \prod_{j=1}^{J} f_k(y_{ij} \mid z_i, \theta_{jk}). $$

The conditional mean of the y variables can now be directly related to the covariates. This makes it possible to relax the implicit assumption in the previous specification that the influence of the z's on the y's runs entirely through the latent variable. For an example, see Vermunt and Magidson (2000: 155). The possibility to have direct effects of z's on y's can also be used to specify direct effects between indicators of different scale types by means of a simple trick: one of the two variables involved should be used both as a covariate (not influencing class membership) and as an indicator. We will use this trick below in our second example.
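A multinomial logit restriction on the class-membership probabilities can be sketched as follows. The parameter names (`intercepts`, `slopes`), the reference-class coding, and all coefficient values are assumptions for this example; the paper does not specify a parameterization.

```python
import numpy as np

def class_probabilities(z, intercepts, slopes):
    """pi_{k|z}: multinomial logit with linear covariate effects.
    intercepts has shape (K,), slopes has shape (K, number of covariates)."""
    eta = intercepts + slopes @ np.asarray(z, dtype=float)
    eta = eta - eta.max()          # guard against overflow in exp
    expo = np.exp(eta)
    return expo / expo.sum()

# Hypothetical coefficients; class 1 is the reference class (all zeros).
intercepts = np.array([0.0, -0.5, 1.0])
slopes = np.array([[0.0, 0.0], [0.8, -0.2], [-0.4, 0.3]])
probs = class_probabilities([1.0, 2.0], intercepts, slopes)
```

With all covariates equal to zero the probabilities reduce to a function of the intercepts alone, which plays the role of the covariate-free class sizes.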

ESTIMATION
The two main methods to estimate the parameters of the various types of LC cluster models are maximum likelihood (ML) and maximum a posteriori (MAP). Wallace and Dowe (forthcoming) proposed a minimum message length (MML) estimator, which in most situations is similar to MAP. The log-likelihood function required in ML and MAP approaches can be derived from the probability density function defining the model. Bayesian MAP estimation involves maximizing the log-posterior distribution, which is the sum of the log-likelihood function and the logs of the priors for the parameters. Although generally there is not much difference between ML and MAP estimates, an important advantage of the latter method is that it prevents the occurrence of boundary or terminal solutions: probabilities and variances cannot become zero. With a very small amount of prior information, the parameter estimates are forced to stay within the interior of the parameter space. Typical priors are Dirichlet priors for multinomial probabilities and inverted-Wishart priors for the variance-covariance matrices in multivariate normal models. For more details on these priors, see Vermunt and Magidson (2000: 164-165).

Most software packages use the EM algorithm or some modification of it to find the ML or MAP estimates. In our opinion, the ideal algorithm starts with a number of EM iterations and, when close enough to the final solution, switches to Newton-Raphson. This combines the advantages of both algorithms, that is, the stability of EM even when far away from the optimum and the speed of Newton-Raphson when close to the optimum.
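A minimal sketch of ML estimation by EM is given below for the simplest case, a univariate normal mixture with class-specific means and variances. It is not the algorithm of any particular program: the simulated data, the single deterministic set of starting values based on quantiles, the fixed number of iterations, and the absence of priors or a Newton-Raphson phase are all simplifying assumptions.

```python
import numpy as np

def em_normal_mixture(y, n_classes, n_iter=200):
    """EM for a univariate normal mixture with class-specific means and variances."""
    n = len(y)
    weights = np.full(n_classes, 1.0 / n_classes)
    # Simple deterministic starting values: evenly spaced quantiles of the data.
    means = np.quantile(y, (np.arange(n_classes) + 0.5) / n_classes)
    variances = np.full(n_classes, y.var())
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities for every object.
        dens = (np.exp(-0.5 * (y[:, None] - means) ** 2 / variances)
                / np.sqrt(2.0 * np.pi * variances))
        joint = weights * dens
        total = joint.sum(axis=1)
        post = joint / total[:, None]
        log_liks.append(np.log(total).sum())   # log-likelihood at current parameters
        # M-step: posterior-weighted updates of sizes, means and variances.
        nk = post.sum(axis=0)
        weights = nk / n
        means = (post * y[:, None]).sum(axis=0) / nk
        variances = (post * (y[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances, log_liks

# Simulated data from two well-separated classes (illustrative only).
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 200)])
weights, means, variances, log_liks = em_normal_mixture(y, n_classes=2)
```

The recorded log-likelihood values should never decrease from one iteration to the next, which is the stability property of EM mentioned in the text. A MAP variant would add the log-prior terms to the M-step updates.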

A well-known problem in LC analysis is the occurrence of local solutions. The best way to avoid ending up with a local solution is to use multiple sets of starting values. Some computer programs for LC clustering have automated the search for good starting values using several sets of random starting values, as well as solutions obtained with other cluster methods.

In the application of LC analysis to clustering, we are not only interested in the estimation of the model parameters. Another important estimation problem is the classification of objects into clusters. This can be based on the posterior class-membership probabilities

$$ \pi_{k \mid y_i, z_i, \theta} = \frac{\pi_{k \mid z_i} \prod_{j} f_k(y_{ij} \mid z_i, \theta_{jk})}{\sum_{k'} \pi_{k' \mid z_i} \prod_{j} f_{k'}(y_{ij} \mid z_i, \theta_{jk'})}. \qquad (3) $$

The standard classification method is modal allocation, which amounts to assigning each object to the class with the highest posterior probability.
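The posterior probabilities of equation (3) and modal allocation can be sketched as follows. The class sizes and class-specific densities are hypothetical numbers, assumed to have been computed beforehand from an estimated model.

```python
import numpy as np

def posterior_probabilities(class_sizes, class_densities):
    """Equation (3): rows are objects, columns are classes.
    class_densities[i, k] holds prod_j f_k(y_ij | ...) for object i in class k."""
    joint = np.asarray(class_sizes) * np.asarray(class_densities)
    return joint / joint.sum(axis=1, keepdims=True)

def modal_allocation(posteriors):
    """Assign each object to the class with the highest posterior probability."""
    return posteriors.argmax(axis=1)

# Hypothetical class sizes and class-specific densities for three objects.
class_sizes = np.array([0.5, 0.3, 0.2])
class_densities = np.array([[0.20, 0.05, 0.01],
                            [0.02, 0.30, 0.10],
                            [0.01, 0.02, 0.40]])
posteriors = posterior_probabilities(class_sizes, class_densities)
assigned = modal_allocation(posteriors)   # zero-based class labels
```

Because the function only needs the model parameters and an object's scores, the same call can classify objects that were not part of the estimation sample. With covariates, `class_sizes` would become a matrix with one row of probabilities per object.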

MODEL SELECTION
The model selection issue is one of the main research topics in LC clustering. Actually, there are two issues: the first concerns the decision about the number of clusters, the second concerns the form of the model given the number of clusters. For an overview of this topic, see Celeux et al. (1997).

Assumptions with respect to the forms of the clusters given their number can be tested using standard likelihood-ratio tests between nested models, for instance, between a model with an unrestricted covariance matrix and a model with a restricted covariance matrix. Wald tests and Lagrange multiplier tests can be used to assess the significance of certain included or excluded terms, respectively. It is well known that these kinds of chi-squared tests cannot be used to determine the number of clusters.

The most popular set of model selection tools in LC cluster analysis are information criteria like AIC, BIC, and CAIC (Fraley and Raftery 1998b). The most recent development is the use of computationally intensive techniques like parametric bootstrapping (McLachlan et al. 1999) and Markov Chain Monte Carlo methods (Bensmail et al. 1997) to determine the number of clusters and their forms. Cheeseman and Stutz (1995) proposed a fully automated model selection method using approximate Bayes factors (different from BIC).

Another set of methods for evaluating LC cluster models is based on the uncertainty of classification or, equivalently, the separation of the clusters. Besides the estimated total number of misclassifications, Goodman-Kruskal lambda, Goodman-Kruskal tau, or entropy-based measures can be used to indicate how well the indicators predict class membership. Celeux et al. (1997) described various indices that combine information on model fit and information on classification errors; two of them are the classification likelihood (C) and the approximate weight of evidence (AWE).
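The information criteria mentioned above can be computed from the log-likelihood, the number of free parameters, and the sample size using their standard definitions. In the sketch below the log-likelihood values are hypothetical and are not results from the paper; the parameter count assumes unrestricted class-specific means and covariance matrices.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """AIC, BIC and CAIC computed from the log-likelihood (lower is better)."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
    }

def n_params_normal_mixture(n_classes, n_indicators):
    """Free parameters with unrestricted class-specific means and covariances."""
    per_class = n_indicators + n_indicators * (n_indicators + 1) // 2
    return (n_classes - 1) + n_classes * per_class

# Hypothetical log-likelihoods for two competing models fitted to 145 observations.
three_class = information_criteria(-2300.0, n_params_normal_mixture(3, 3), 145)
four_class = information_criteria(-2290.0, n_params_normal_mixture(4, 3), 145)
```

In this made-up comparison the four-class model has the higher log-likelihood but the three-class model has the lower BIC, because the ten extra parameters are penalized more heavily than the gain in fit.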

TWO EMPIRICAL EXAMPLES

Below, LC cluster modeling is illustrated by means of two empirical examples. The analyses are performed with the LC analysis program Latent GOLD (Vermunt and Magidson 2000), which implements both ML and MAP estimation with Dirichlet and inverted-Wishart priors for multinomial probabilities and error variance-covariance matrices, respectively. A feature of the program that was used extensively in the analyses described below is the possibility to add local dependencies using information on bivariate residuals. Model selection was based on BIC, where it should be noted that the BIC we use is computed using the log-likelihood value and the number of parameters rather than the $L^2$ value and the number of degrees of freedom.

Diabetes data
The first empirical example concerns a three-dimensional data set involving 145 observations used for diabetes diagnosis (Reaven and Miller 1979). The three continuous variables are labeled glucose ($y_1$), insulin ($y_2$), and sspg ($y_3$). The data set also contains information on the clinical classification in three groups (normal, chemical diabetes, and overt diabetes), which makes it possible to compare the clinical classification with the classification obtained from the cluster model. The substantive question of interest is whether the three indirect diagnostic measures yield a reliable diagnosis; that is, whether they yield a classification that is close to the clinical classification.

This data set comes with the MCLUST program and is also used by Fraley and Raftery (1998a, 1998b) to illustrate their model-based cluster analysis based on the eigenvalue decomposition described above. The final model they selected on the basis of the BIC criterion was the unrestricted three-class model, which means that none of the restrictions that can be specified with their approach holds for this data set.

We used six different specifications for the variance-covariance matrices: class-dependent and class-independent unrestricted, class-dependent and class-independent diagonal, as well as class-dependent and class-independent with only the $y_1$-$y_2$ error covariance free. With unrestricted we mean that all covariances are free, and with diagonal that all covariances are assumed to be zero. The models with only the $y_1$-$y_2$ error covariance free were used because the bivariate residuals of both diagonal models indicated that there was only a local dependency between these two variables. Moreover, the results from the unrestricted models indicated that the $y_1$-$y_3$ and $y_2$-$y_3$ covariances did not differ significantly from zero.

[INSERT TABLE 1 ABOUT HERE]

Table 1 reports the BIC values for the estimated one- to five-class models.
The three-class model that only includes the error covariance between $y_1$ and $y_2$ and has class-dependent variances and covariances has the lowest BIC value. Its BIC value is slightly lower than that of the class-dependent unrestricted three-class model, Fraley and Raftery's final model for this data set. The BIC values in Table 1 show clearly that models with error structures that are too restrictive for a particular data set overestimate the number of clusters. Here, this applies to the models with class-independent error variances and the class-dependent diagonal model. Therefore, it is important to be able to work with different types of error structures. Note that the most restrictive model that we used, the model with class-independent diagonal error structure, can be seen as a probabilistic variant of k-means cluster analysis (McLachlan and Basford 1988).

[INSERT TABLE 2 ABOUT HERE]

Table 2 reports the parameter estimates for the three-class model with class-dependent variance-covariance matrices and with only a local dependence between $y_1$ and $y_2$. These parameters are the cluster sizes ($\pi_k$), the cluster-specific means ($\mu_{jk}$), the cluster-specific variances ($\sigma^2_{jk}$), as well as the cluster-specific covariance between $y_1$ and $y_2$ ($\sigma_{12k}$). The overt diabetes group (cluster 3) has much higher means on glucose and insulin and a much lower mean on sspg than the normal group (cluster 1). The chemical diabetes group (cluster 2) has somewhat lower means on glucose and insulin and a much lower mean on sspg than the normal group. The reported error variances show that the overt diabetes cluster is much more heterogeneous with respect to glucose and insulin and much more homogeneous with respect to sspg than the normal cluster. The chemical diabetes group is the most homogeneous cluster on all three measures.

The error covariances are somewhat easier to interpret if we transform them to correlations. Their values are .69, .21, and .93 for clusters 1, 2, and 3, respectively. This indicates that in the overt diabetes group there is a very strong association between glucose and insulin, while in the chemical diabetes group this association is very low, and not even significantly different from zero ($\sigma_{12k} / \mathrm{SE}_{\sigma_{12k}} = 1.60$). Note that the within-cluster correlation of .93 is very high, which indicates that, in fact, the two measures are equivalent in cluster 3.
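The transformation from an error covariance to a within-cluster correlation, and the ratio used to judge its significance, can be written out as follows. The symbols $\rho_{12k}$ and $z_{12k}$ and the hat notation for estimates are assumptions introduced for this sketch; they do not appear in the original text.

```latex
\rho_{12k} = \frac{\sigma_{12k}}{\sqrt{\sigma^{2}_{1k}\,\sigma^{2}_{2k}}},
\qquad
z_{12k} = \frac{\hat{\sigma}_{12k}}{\mathrm{SE}(\hat{\sigma}_{12k})}
```

A ratio of 1.60 falls below 1.96, the two-sided 5 percent critical value of the standard normal distribution, which is consistent with the statement that the association in the chemical diabetes group does not differ significantly from zero.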

[INSERT TABLE 3 ABOUT HERE]

Not only is the BIC of our final model somewhat better than that of Fraley and Raftery's model, our classification is also more in agreement with the clinical classification: our model misclassifies 13.1 percent of the patients, while the unrestricted model misclassifies 14.5 percent. Table 3 reports the cross-tabulation of the clinical and the LC cluster classification based on the posterior class-membership probabilities. As can be seen, some normal patients are classified as cases with chemical diabetes and vice versa. The other type of error is that some overt diabetes cases are classified as normal.

Prostate cancer data

Our second example concerns the analysis of a mixed-mode data set with pre-trial covariates from a prostate cancer clinical trial. Jorgensen and Hunt (1996) and Hunt and Jorgensen (1999) used this data set, containing information on 506 patients, to illustrate the use of the LC cluster model implemented in their MULTIMIX program. The eight continuous indicators are age ($y_1$), weight index ($y_2$), systolic blood pressure ($y_5$), diastolic blood pressure ($y_6$), serum haemoglobin ($y_8$), size of primary tumor ($y_9$), index of tumor stage and histologic grade ($y_{10}$), and serum prostatic acid phosphatase ($y_{11}$). The four categorical observed variables are performance rating ($y_3$; 4 levels), cardiovascular disease history ($y_4$; 2 levels), electrocardiogram code ($y_7$; 7 levels), and bone metastases ($y_{12}$; 2 levels). The research question of interest is whether, on the basis of these pre-trial covariates, it is possible to identify subgroups that differ with respect to the likelihood of success of the medical treatment of prostate cancer.

The categorical variables are treated as nominal, and for the continuous variables we assumed normal distributions with class-specific variances. We estimated models with one to four latent classes. The first model for each number of classes assumes local independence. The other four specifications are obtained by successively adding the direct relationships between $y_5$ and $y_6$, $y_2$ and $y_8$, $y_8$ and $y_{12}$, and $y_{11}$ and $y_{12}$. This exploratory improvement of the model fit was guided by Latent GOLD's bivariate residuals information, as well as the results reported by Hunt and Jorgensen (1999).

To give an indication of the computation time needed for these kinds of models: all two-class models took less than 5 seconds to converge and all four-class models less than 20 seconds on a Pentium II 350 MHz. Note that here we have a data set with almost 500 cases and 12 indicators. The estimation time increases linearly with the number of cases and, as long as we do not include too many local dependencies, also almost linearly with the number of indicators.

[INSERT TABLE 4 ABOUT HERE]

Table 4 presents the BIC values for the estimated models. As can be seen, the two-class model that includes all four direct relationships has the lowest BIC. Comparison of the various models given a certain number of classes shows that inclusion of the direct relationship between $y_5$ and $y_6$ (the two blood pressure measures) improves the fit in all situations. The other bivariate terms improve the fit in the one-, two-, and three-class models, but not in the four-class model. If we compare the models with different numbers of classes for a given error structure, the four-class model performs best when assuming local independence, the three-class model when including the $y_5$-$y_6$ covariance, and the two-class model when including additional bivariate terms. Thus, if we are willing to include the $y_5$-$y_6$ effect, a model with no more than three classes should be selected. If we are willing to include more direct effects, the two-class model is the preferred one. This shows again that the possibility to work with more local dependencies may yield a simpler final model.

[INSERT TABLE 5 ABOUT HERE]

Table 5 reports the parameter estimates for the two-class model containing all four direct effects. Wald tests for the difference of the means and probabilities between classes indicate that only the mean ages ($\mu_{1k}$) are not significantly different between classes. Cluster 2 turns out to have somewhat higher means on weight ($\mu_{2k}$), blood pressure ($\mu_{5k}$ and $\mu_{6k}$), and serum haemoglobin ($\mu_{8k}$), and lower means on size of tumor ($\mu_{9k}$), index of tumor stage ($\mu_{10k}$), and serum prostatic acid phosphatase ($\mu_{11k}$).
If we look at the nominal indicators, we see a large difference between the two classes in the distribution of bone metastases ($y_{12}$), somewhat smaller differences in performance rating ($y_3$) and cardiovascular disease history ($y_4$), and a very small difference in electrocardiogram code ($y_7$). The direct effects between the indicators are quite strong. They all have a positive sign except for the effect of $y_{12}$ on $y_{11}$.

To investigate the usefulness of the applied technique, Jorgensen and Hunt (1996) and Hunt and Jorgensen (1999) examined the strength of the relationship between the obtained classification and the outcome of the medical trial. They showed that their two-class solution, which is similar to the two-class model with local dependencies obtained here, predicted the success of the medical treatment very well.
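The BIC comparison described for Table 4 can be retraced with a short script. The following is only a sketch: the BIC values are transcribed from Table 4, while the dictionary `bic`, the row labels, and the helper `best_n_classes` are illustrative names that do not come from the paper or from any of the programs discussed.

```python
# BIC values transcribed from Table 4.
# Rows: error structures (models 1-5); columns: 1, 2, 3 and 4 latent classes.
bic = {
    "1. Local independence":           [23762, 23112, 23089, 23088],
    "2. Model 1 + y5-y6 covariance":   [23529, 22889, 22883, 22887],
    "3. Model 2 + y2-y8 covariance":   [23502, 22872, 22875, 22893],
    "4. Model 3 + y12 -> y8 effect":   [23473, 22861, 22866, 22895],
    "5. Model 4 + y12 -> y11 effect":  [23322, 22845, 22855, 22888],
}


def best_n_classes(values):
    """Return the (1-based) number of classes with the lowest BIC."""
    return min(range(len(values)), key=values.__getitem__) + 1


# Preferred number of classes for each error structure.
per_model = {name: best_n_classes(values) for name, values in bic.items()}

# Overall best (BIC, model label, number of classes).
overall = min(
    (value, name, k + 1)
    for name, values in bic.items()
    for k, value in enumerate(values)
)

for name, k in per_model.items():
    print(f"{name}: {k} classes")
print("Lowest BIC overall:", overall)
```

Under local independence the minimum falls at four classes, with only the blood pressure covariance at three, and with further direct effects at two, which is the pattern described above.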

CONCLUSIONS
This paper described the state of the art in the field of cluster analysis using LC models. Two important recent developments are the possibility to use various kinds of meaningful restrictions on the covariance structure in mixtures of multivariate normal distributions and the possibility to work with mixed-mode data. The first example demonstrated the use of different types of specifications for the covariance structure. It showed that models that are too restrictive may yield too many latent classes. The second example illustrated LC clustering with mixed-mode data using models with and without local dependencies.

REFERENCES
Arminger, G., and Stein, P. 1997. Finite mixture of covariance structure models with regressors: Loglikelihood function, distance estimation, fit indices, and a complex example. Sociological Methods and Research 26: 148-182.
Banfield, J.D., and Raftery, A.E. 1993. Model-based Gaussian and non-Gaussian clustering. Biometrics 49: 803-821.
Bensmail, H., Celeux, G., Raftery, A.E., and Robert, C.P. 1997. Inference in model based clustering. Statistics and Computing 7: 1-10.
Byar, D.P., and Green, S.B. 1980. The choice of treatment for cancer patients based on covariate information: Application to prostate cancer. Bulletin of Cancer 67: 477-490.
Böckenholt, U. 1993. A latent class regression approach for the analysis of recurrent choices. British Journal of Mathematical and Statistical Psychology 46: 95-118.
Celeux, G., Biernacki, C., and Govaert, G. 1997. Choosing models in model-based clustering and discriminant analysis. Technical Report. Rhone-Alpes: INRIA.
Cheeseman, P., and Stutz, J. 1995. Bayesian classification (Autoclass): Theory and results. In Advances in knowledge discovery and data mining, edited by U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Menlo Park: The AAAI Press.
Clogg, C.C. 1981. New developments in latent structure analysis. Pp. 215-246 in Factor analysis and measurement in sociological research, edited by D.J. Jackson and E.F. Borgatta. Beverly Hills: Sage Publications.
Clogg, C.C. 1995. Latent class models. Pp. 311-359 in Handbook of statistical modeling for the social and behavioral sciences, edited by G. Arminger, C.C. Clogg, and M.E. Sobel. New York: Plenum Press.
Dolan, C.V., and Van der Maas, H.L.J. 1997. Fitting multivariate normal finite mixtures subject to structural equation modeling. Psychometrika 63: 227-253.
Everitt, B.S. 1988. A finite mixture model for the clustering of mixed-mode data. Statistics and Probability Letters 6: 305-309.
Everitt, B.S. 1993. Cluster analysis. London: Edward Arnold.
Fraley, C., and Raftery, A.E. 1998a. MCLUST: Software for model-based cluster and discriminant analysis. Department of Statistics, University of Washington: Technical Report No. 342.
Fraley, C., and Raftery, A.E. 1998b. How many clusters? Which clustering method? - Answers via model-based cluster analysis. Department of Statistics, University of Washington: Technical Report No. 329.
Gibson, W.A. 1959. Three multivariate models: Factor analysis, latent structure analysis, and latent profile analysis. Psychometrika 24: 229-252.
Goodman, L.A. 1974. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61: 215-231.
Hunt, L., and Jorgensen, M. 1999. Mixture model clustering using the MULTIMIX program. Australian and New Zealand Journal of Statistics 41: 153-172.
Jedidi, K., Jagpal, H.S., and DeSarbo, W.S. 1997. Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Marketing Science 16: 39-59.
Jorgensen, M., and Hunt, L. 1996. Mixture model clustering of data sets with categorical and continuous variables. Pp. 375-384 in Proceedings of the Conference ISIS 96, Australia 1996.
Kaufman, L., and Rousseeuw, P.J. 1990. Finding groups in data: An introduction to cluster analysis. New York: John Wiley and Sons, Inc.
Krzanowski, W.J. 1983. Distance between populations using mixed continuous and categorical variables. Biometrika 70: 235-243.
Lawrence, C.J., and Krzanowski, W.J. 1996. Mixture separation for mixed-mode data. Statistics and Computing 6: 85-92.
Lazarsfeld, P.F., and Henry, N.W. 1968. Latent structure analysis. Boston: Houghton Mifflin.
McLachlan, G.J., and Basford, K.E. 1988. Mixture models: Inference and application to clustering. New York: Marcel Dekker.
McLachlan, G.J., and Peel, D. 1996. An algorithm for unsupervised learning via normal mixture models. Pp. 354-363 in Information, statistics and induction in science, edited by D.L. Dowe, K.B. Korb, and J.J. Oliver. Singapore: World Scientific Publishing.
McLachlan, G.J., and Peel, D. 1999. Modelling nonlinearity by mixtures of factor analysers via extension of the EM algorithm. Technical Report. Australia: Center for Statistics, University of Queensland.
McLachlan, G.J., Peel, D., Basford, K.E., and Adams, P. 1999. The EMMIX software for the fitting of mixtures of normal and t-components. Journal of Statistical Software 4, No. 2.
Moustaki, I. 1996. A latent trait and a latent class model for mixed observed variables. The British Journal of Mathematical and Statistical Psychology 49: 313-334.
Muthen, B., and Muthen, L. 1998. Mplus: User's manual. Los Angeles: Muthen and Muthen.
Reaven, G.M., and Miller, R.G. 1979. An attempt to define the nature of chemical diabetes using multidimensional analysis. Diabetologia 16: 17-24.
Vermunt, J.K. 1997. LEM: A general program for the analysis of categorical data. User's manual. Tilburg University, The Netherlands.
Vermunt, J.K., and Magidson, J. 2000. Latent GOLD's User's Guide. Boston: Statistical Innovations Inc.
Wallace, C.S., and Dowe, D.L. Forthcoming. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing.
Wedel, M., DeSarbo, W.S., Bult, J.R., and Ramaswamy, V. 1993. A latent class Poisson regression model for heterogeneous count data with an application to direct mail. Journal of Applied Econometrics 8: 397-411.
Wolfe, J.H. 1970. Pattern clustering by multivariate cluster analysis. Multivariate Behavioral Research 5: 329-350.
Yung, Y.F. 1997. Finite mixtures in confirmatory factor-analysis models. Psychometrika 62: 297-330.

Table 1: BIC values for diabetes example

                                                              Number of clusters
Model                                                         1      2      3      4      5
1. Class-dependent unrestricted $\Sigma_k$                 5138   4819   4762   4788   4818
2. Class-independent unrestricted $\Sigma_k$               5138   5014   4923   4869   4858
3. Class-dependent diagonal $\Sigma_k$                     5530   4957   4833   4805   4815
4. Class-independent diagonal $\Sigma_k$                   5530   5170   4999   4938   4895
5. Class-dependent $\Sigma_k$ with only $\sigma_{12k}$ free   5156   4835   4756   4761   4784
6. Class-independent $\Sigma_k$ with only $\sigma_{12k}$ free 5156   5008   4920   4862   4859

Table 2: Parameter estimates for diabetes example

                    Cluster 1 = Normal      Cluster 2 = Chemical    Cluster 3 = Overt
Parameter           Estimate       S.E.     Estimate       S.E.     Estimate       S.E.
$\pi_k$                 0.27       0.05         0.54       0.05         0.19       0.03
$\mu_{1k}$            104.00       2.85        91.23       1.06       234.76      14.87
$\mu_{2k}$            495.06      22.74       359.22       6.63      1121.09      58.70
$\mu_{3k}$            309.43      28.06       163.13       6.37        76.98       9.47
$\sigma^2_{1k}$       230.09      62.96        76.48      12.93      5005.91    1414.43
$\sigma^2_{2k}$     14844.55    3708.65      2669.75     506.55     73551.09   22176.29
$\sigma^2_{3k}$     22966.52    5395.90      2421.45     476.65      2224.50     616.43
$\sigma_{12k}$       1279.92     420.93        96.46      60.30     17910.71    5423.37
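To illustrate how estimates such as those in Table 2 translate into probabilistic cluster membership, the sketch below computes posterior membership probabilities for a single observation. It assumes that Table 2 describes a three-cluster model in which only the covariance between the first two indicators is free within each cluster, as the listed parameters suggest, so that each cluster density factors into a bivariate normal for ($y_1$, $y_2$) and a univariate normal for $y_3$. The observation used, and the names `clusters`, `density`, and `posterior`, are hypothetical.

```python
import math

# Per cluster: (pi_k, means, variances, covariance of y1 and y2), from Table 2.
clusters = {
    "normal":   (0.27, (104.00, 495.06, 309.43), (230.09, 14844.55, 22966.52), 1279.92),
    "chemical": (0.54, (91.23, 359.22, 163.13), (76.48, 2669.75, 2421.45), 96.46),
    "overt":    (0.19, (234.76, 1121.09, 76.98), (5005.91, 73551.09, 2224.50), 17910.71),
}


def density(y, mu, var, cov12):
    """Cluster-specific density: bivariate normal (y1, y2) times normal y3."""
    d1, d2, d3 = (y[j] - mu[j] for j in range(3))
    det = var[0] * var[1] - cov12 ** 2  # determinant of the 2x2 covariance block
    quad = (var[1] * d1 ** 2 - 2 * cov12 * d1 * d2 + var[0] * d2 ** 2) / det
    bivariate = math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))
    univariate = math.exp(-0.5 * d3 ** 2 / var[2]) / math.sqrt(2 * math.pi * var[2])
    return bivariate * univariate


def posterior(y):
    """Posterior membership probabilities via Bayes' rule."""
    joint = {
        name: pi_k * density(y, mu, var, cov12)
        for name, (pi_k, mu, var, cov12) in clusters.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Hypothetical observation, chosen close to the cluster 2 means.
probs = posterior((91.0, 360.0, 160.0))
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

The observation receives a probability for every cluster rather than a hard assignment. Modal classification would place it in the cluster with the largest posterior, here the second one with a probability well above 0.9.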

Table 3: Clinical versus LC cluster classification in diabetes example

Clinical             LC cluster classification
classification       normal   chemical   overt    total
normal                   26         10       0       36
chemical                  4         72       0       76
overt                     5          0      28       33
total                    35         82      28      145
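The agreement between the two classifications in Table 3 can be summarized with a small calculation. This is a sketch only: the counts are transcribed from Table 3, and the variable names are illustrative.

```python
# Rows: clinical classification; columns: LC cluster classification (Table 3).
table = {
    "normal":   {"normal": 26, "chemical": 10, "overt": 0},
    "chemical": {"normal": 4,  "chemical": 72, "overt": 0},
    "overt":    {"normal": 5,  "chemical": 0,  "overt": 28},
}

# Cases on the diagonal are classified identically by both methods.
agree = sum(table[label][label] for label in table)
total = sum(sum(row.values()) for row in table.values())
rate = agree / total

# Share of each clinical group recovered by the matching LC cluster.
recovered = {
    label: row[label] / sum(row.values()) for label, row in table.items()
}

print(f"{agree} of {total} cases agree ({rate:.1%})")
for label, share in recovered.items():
    print(f"{label}: {share:.1%}")
```

In total, 126 of the 145 cases (about 87%) fall on the diagonal.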

Table 4: BIC values for cancer example

                                                         Number of clusters
Model                                                    1       2       3       4
1. Local independence                                23762   23112   23089   23088
2. Model 1 + $\sigma_{56k}$                          23529   22889   22883   22887
3. Model 2 + $\sigma_{28k}$                          23502   22872   22875   22893
4. Model 3 + effect of $y_{12}$ on $y_8$ (8.12)      23473   22861   22866   22895
5. Model 4 + effect of $y_{12}$ on $y_{11}$ (11.12)  23322   22845   22855   22888

Table 5: Parameter estimates for prostate cancer example

                                          Cluster 1             Cluster 2
Parameter                             Estimate     S.E.     Estimate     S.E.
$\pi_k$                                   0.45     0.03         0.55     0.03
$\mu_{1k}$                               71.38     0.51        71.70     0.43
$\mu_{2k}$                               97.51     0.98       100.26     0.83
$\pi_{1,3k}$                              0.85     0.02         0.94     0.02
$\pi_{2,3k}$                              0.09     0.02         0.05     0.01
$\pi_{3,3k}$                              0.05     0.02         0.01     0.01
$\pi_{4,3k}$                              0.01     0.01         0.00     0.00
$\pi_{1,4k}$                              0.65     0.03         0.49     0.03
$\pi_{2,4k}$                              0.35     0.03         0.51     0.03
$\mu_{5k}$                               14.18     0.16        14.54     0.16
$\mu_{6k}$                                8.00     0.09         8.29     0.10
$\pi_{1,7k}$                              0.35     0.03         0.33     0.03
$\pi_{2,7k}$                              0.05     0.02         0.05     0.01
$\pi_{3,7k}$                              0.14     0.02         0.07     0.02
$\pi_{4,7k}$                              0.04     0.01         0.06     0.02
$\pi_{5,7k}$                              0.30     0.03         0.31     0.03
$\pi_{6,7k}$                              0.12     0.02         0.17     0.02
$\pi_{7,7k}$                              0.00     0.00         0.00     0.00
$\mu_{8k}$                              128.01     1.38       132.21     1.80
$\mu_{9k}$                                4.11     0.12         2.88     0.08
$\mu_{10k}$                              12.02     0.11         8.88     0.08
$\mu_{11k}$                               4.00     0.12         2.11     0.11
$\pi_{1,12k}$                             0.65     0.03         0.99     0.01
$\pi_{2,12k}$                             0.35     0.03         0.01     0.01
$\sigma^2_{1k}$                          52.35     5.36        43.97     4.15
$\sigma^2_{2k}$                         186.60    19.82       166.73    15.89
$\sigma^2_{5k}$                           4.98     0.50         6.60     0.59
$\sigma^2_{6k}$                           1.79     0.18         2.40     0.21
$\sigma^2_{8k}$                         355.82    35.44       325.52    29.47
$\sigma^2_{9k}$                           2.91     0.29         1.40     0.14
$\sigma^2_{10k}$                          2.05     0.21         1.25     0.13
$\sigma^2_{11k}$                          2.56     0.25         0.25     0.03
$\sigma_{28k}$                           61.98    19.14        47.56    15.12
$\sigma_{56k}$                            1.82     0.25         2.52     0.30
Effect of $y_{12}$ on $y_8$ (8.12)        5.76     1.35         5.76     1.35
Effect of $y_{12}$ on $y_{11}$ (11.12)   -0.49     0.11        -0.49     0.11
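As a rough aid to reading the within-cluster covariances in Table 5, they can be rescaled to correlations using the corresponding variances. The sketch below does this for the two pairs of continuous indicators with a free covariance. The values are transcribed from Table 5, and the names `pairs`, `correlation`, and `corr` are illustrative only.

```python
import math

# Per pair and cluster: (variance of first, variance of second, covariance),
# transcribed from Table 5.
pairs = {
    "y5-y6 (blood pressure)": {
        1: (4.98, 1.79, 1.82),
        2: (6.60, 2.40, 2.52),
    },
    "y2-y8 (weight index, haemoglobin)": {
        1: (186.60, 355.82, 61.98),
        2: (166.73, 325.52, 47.56),
    },
}


def correlation(var_a, var_b, cov):
    """Rescale a covariance to a correlation."""
    return cov / math.sqrt(var_a * var_b)


corr = {
    pair: {k: correlation(*values) for k, values in by_cluster.items()}
    for pair, by_cluster in pairs.items()
}

for pair, by_cluster in corr.items():
    for k, r in by_cluster.items():
        print(f"{pair}, cluster {k}: r = {r:.2f}")
```

The two blood pressure measures have a within-cluster correlation of roughly 0.6 in both clusters, while weight index and haemoglobin correlate at roughly 0.2.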

Table 6: Computer programs and their most important features

Name          Multivar. normal   Mixed-mode   Covar.   Estimation method   Algorithm   System / source
NORMIX        yes                no           no       ML                  EM          DOS
EMMIX         yes                no           no       ML                  EM          DOS + Fortran code
MCLUST        yes                no           no       ML                  EM          S-plus
LEM           no                 yes          yes      ML                  EM + NR     DOS + Windows
Classmix      no                 yes          no       ML                  EM          unknown
Autoclass     yes                yes          no       MAP                 EM          DOS + C code
MULTIMIX      yes                yes          no       ML                  EM          Fortran code
Mplus         yes                yes (1)      yes      ML                  EM          DOS
Latent GOLD   yes                yes          yes      ML + MAP            EM + NR     Windows

(1) In Mplus, categorical indicators must be dichotomous.

SOFTWARE
Several computer programs are available for estimating the various types of LC cluster models discussed in this paper. Table 6 lists the most important packages and gives information on the types of cluster models they implement (multivariate normal distributions and/or mixed-mode data); whether they allow users to include covariates in the model; the estimation method they use; the algorithm they use (EM or NR = Newton-Raphson); and the system for which an executable version is available and/or the type of source code that is available.

[INSERT TABLE 6 ABOUT HERE]

We will not repeat all the information listed in Table 6 but describe the main special features of some of the programs. NORMIX (Wolfe, 1970), EMMIX (McLachlan et al., 1999), and MCLUST (Fraley and Raftery, 1998a) are programs for LC clustering with continuous variables using multivariate normal distributions. Special features of EMMIX are that it uses multiple sets of starting values to prevent local solutions and that it performs likelihood-ratio tests for the number of clusters using parametric bootstrapping. MCLUST allows users to restrict the class-specific variance-covariance matrices using the eigenvalue decomposition described in equation (1).

LEM (Vermunt, 1997) and Classmix (Moustaki, 1996) are LC analysis programs that can be used for clustering with mixed-mode data. LEM can deal not only with (ordinal) categorical and continuous variables, but also with Poisson counts. In LEM, it is possible to include local dependencies between categorical variables.

MULTIMIX (Hunt and Jorgensen, 1999), Mplus (Muthen and Muthen, 1998), Autoclass (Cheeseman and Stutz, 1995), and Latent GOLD (Vermunt and Magidson, 2000) can deal with multivariate normal distributions, as well as with mixed-mode data. MULTIMIX allows users to specify local dependencies between categorical and continuous variables using conditional Gaussian distributions. Both Mplus and Latent GOLD are very flexible with respect to the specification of the structure of the error-covariance matrices: any covariance can be included in or excluded from the model. Two weak points of Mplus are that the categorical variables must be dichotomous and that the user has to provide starting values for all parameters. Autoclass is a program that has automated model selection using multiple sets of starting values (also for the number of classes). Latent GOLD is the only fully Windows-based program, which makes it very easy to use. Like LEM, it can deal not only with (ordinal) categorical and continuous variables, but also with Poisson counts. Its multiple sets of random starting values help users avoid ending up with a local solution, and its bivariate residual measures make it easy to detect local dependencies to be included in the model.
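The idea of using multiple sets of random starting values to guard against local solutions can be sketched as follows. This is a deliberately minimal illustration for a two-class mixture of univariate normals on simulated data. It is not the procedure implemented in EMMIX, Autoclass, or Latent GOLD, and all names (`em_two_class`, `normal_pdf`, `min_var`, the simulated data) are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_two_class(data, rng, n_iter=200, min_var=1e-3):
    """Run EM for a two-class univariate normal mixture from one random start."""
    n = len(data)
    mu = rng.sample(data, 2)  # random start: two observed values as means
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    var = [overall_var, overall_var]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities for each case.
        post = []
        for x in data:
            joint = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: update class sizes, means and variances.
        for k in range(2):
            nk = sum(p[k] for p in post)
            pi[k] = nk / n
            mu[k] = sum(p[k] * x for p, x in zip(post, data)) / nk
            ss = sum(p[k] * (x - mu[k]) ** 2 for p, x in zip(post, data))
            var[k] = max(ss / nk, min_var)  # floor avoids degenerate variances
    loglik = sum(
        math.log(sum(pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)))
        for x in data
    )
    return loglik, pi, mu, var


# Simulated data: two well-separated classes of 100 cases each.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
data += [rng.gauss(5.0, 1.0) for _ in range(100)]

# Several random starts; keep the solution with the highest log-likelihood.
runs = [em_two_class(data, rng) for _ in range(5)]
best = max(runs, key=lambda run: run[0])

print("log-likelihoods per start:", [round(run[0], 2) for run in runs])
print("best class sizes:", [round(p, 2) for p in best[1]])
print("best class means:", [round(m, 2) for m in best[2]])
```

Keeping only the run with the highest log-likelihood is the simplest selection rule. In practice one would also check that several starts reach the same maximum before trusting the solution.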

SYMBOLS
$K$                       number of classes or clusters
$J$                       number of indicator variables
$i$                       index to denote a particular case
$j$                       index to denote a particular indicator variable
$k$                       index to denote a particular class or cluster
$\mathbf{y}$              vector of indicator variables
$y$                       value of an indicator variable
$\mathbf{z}$              covariate vector
$f(\cdot|\cdot)$          density function
$\pi$                     probability
$\boldsymbol{\theta}$     parameter vector
$\boldsymbol{\mu}$        mean vector
$\boldsymbol{\Sigma}$     variance-covariance matrix
$\sigma^2_j$              variance of variable $j$
$\sigma_{j\ell}$          covariance between variables $j$ and $\ell$

FURTHER READING
Further reading on cluster analysis by means of latent class or finite mixture models can be found in McLachlan and Basford (1988) and Everitt (1993).
