Nobuoki Eshima
An Introduction
to Latent Class
Analysis
Methods and Applications
Behaviormetrics: Quantitative Approaches
to Human Behavior
Volume 14
Series Editor
Akinori Okada, Professor Emeritus, Rikkyo University,
Tokyo, Japan
This series covers in their entirety the elements of behaviormetrics, a term that
encompasses all quantitative approaches of research to disclose and understand
human behavior in the broadest sense. The term includes the concept, theory,
model, algorithm, method, and application of quantitative approaches from
theoretical or conceptual studies to empirical or practical application studies to
comprehend human behavior. The Behaviormetrics series deals with a wide range
of topics of data analysis and of developing new models, algorithms, and methods
to analyze these data.
The characteristics featured in the series have four aspects. The first is the variety
of the methods utilized in data analysis and a newly developed method that includes
not only standard or general statistical methods or psychometric methods
traditionally used in data analysis, but also cluster analysis, multidimensional scaling, machine learning, correspondence analysis, biplot, network analysis and graph theory, conjoint measurement, biclustering, visualization, and data and web mining. The second aspect is the variety of types of data, including ranking, categorical, preference, functional, angle, contextual, nominal, multi-mode multi-way, continuous, discrete, high-dimensional, and sparse data.
The third comprises the varied procedures by which the data are collected: by
survey, experiment, sensor devices, and purchase records, and other means. The
fourth aspect of the Behaviormetrics series is the diversity of fields from which the
data are derived, including marketing and consumer behavior, sociology, psychol-
ogy, education, archaeology, medicine, economics, political and policy science,
cognitive science, public administration, pharmacy, engineering, urban planning,
agriculture and forestry science, and brain science.
In essence, the purpose of this series is to describe the new horizons opening up
in behaviormetrics — approaches to understanding and disclosing human behaviors
both in the analyses of diverse data by a wide range of methods and in the
development of new methods to analyze these data.
Editor in Chief
Akinori Okada (Rikkyo University)
Managing Editors
Daniel Baier (University of Bayreuth)
Giuseppe Bove (Roma Tre University)
Takahiro Hoshino (Keio University)
An Introduction to Latent
Class Analysis
Methods and Applications
Nobuoki Eshima
Department of Pediatrics and Child Health
Kurume University
Kurume, Fukuoka, Japan
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
In observing human behaviors and responses to various stimuli and test items, it is natural to assume that they are governed by factors, for example, attitudes and abilities in psychology, and beliefs, social customs, folkways, and so on in sociology; however, such factors can rarely be observed or measured directly. The factors are hypothesized constructs that we employ in scientific research, and they are unobservable and treated as latent variables to explain the phenomena under consideration. The factors are sometimes called latent or internal factors. Although the latent factors cannot be measured directly, data on observable variables, such as responses to test items and interviews concerning national elections, are obtainable. These variables are referred to as the manifest variables. Based on the observations, the latent factors have to be estimated, and for this purpose latent structure analysis was proposed by Lazarsfeld (1950). To extract latent factors, it is necessary to collect multivariate data on manifest variables, and in measuring the variables we have to take response errors into consideration, such as those induced by the physical and mental conditions of subjects, intrusion (guessing) errors, omission (forgetting) errors, and so on. When observing the results of a test battery for examinees or subjects, it is critical how their abilities are assessed, and our interest is in ordering the examinees according to the latent factors rather than by simple scores, that is, sums of item scores. Latent structure analysis is classified
into latent class analysis, latent trait analysis and latent profile analysis according to
the types of manifest and latent variables, in a strict sense. Latent class analysis treats
discrete (categorical) manifest and latent variables; latent trait analysis deals with
discrete manifest variables and continuous latent variables; and latent profile analysis
handles continuous manifest variables and discrete latent variables. The purpose of
latent structure analysis is similar to that of factor analysis (Spearman, 1904), so in
a wide sense, factor analysis is also included in latent structure analysis. Introducing latent variables into data analysis is an idealization; nevertheless, it is sensible and meaningful to explain the phenomena under consideration by using latent variables. In this book, latent class analysis is the focus of discussion, and applications of latent class models to data analysis are treated under several themes, that is, exploratory latent class analysis, confirmatory latent class analysis, analysis of longitudinal data,
path analysis with latent class models, and so on. Along with this, latent profile and latent trait models are also treated with respect to parameter estimation. The author hopes that the present book will play a significant role in introducing latent structure analysis not only to young researchers and students studying the behavioral sciences, but also to those working in other scientific research fields.
References
Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, et al. (Eds.), Measurement and prediction: Studies in social psychology in World War II (Vol. 4). Princeton University Press.
Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.
1.1 Introduction
Latent structure analysis is classified into four analyses, i.e., latent class analysis,
latent profile analysis, latent trait analysis, and factor analysis. Latent class analysis
was introduced for explaining social phenomena by Lazarsfeld [11], and it analyzes
discrete (categorical) data, assuming a population or group under study is divided
into homogeneous subgroups which are called latent classes. As in the latent class
model, assuming latent classes in a population, latent profile analysis (Gibson, 1959)
was proposed for the study of interrelationships among continuous variables. In
this sense, the model may be regarded as a latent class model [1, 8]. Latent trait analysis has been developed in mental test theory (Lord, 1952; Lord & Novick, 1968) and is also employed in social attitude measurement [10]. The latent trait model
was designed to explain responses to manifest categorical variables depending on
latent continuous variables, for example, ability, attitude, and so on. The systematic
discussion on the above models was given in Lazarsfeld & Henry [9]. Factor analysis
dates back to the works of Spearman [20], and the single factor model was extended
to the multiple factor model [21]. The analysis treats manifest and latent continuous
variables and explains phenomena under study by extracting simple structures to
explain inter-relations between the manifest and latent variables. Although “latent
structure analysis” is now a general term for the analyses with the above models,
in many cases, the name is used for latent class analysis in a narrow sense after
Lazarsfeld [11]. In the early years of the development of latent structure models, the main efforts were placed on parameter estimation by solving the equations with respect to the means and covariances of the manifest variables, which are called the accounting equations; however, these methods are now only of historical interest. Since the efficiency of computers has increased rapidly, the method of maximum likelihood (ML) can nowadays be easily applied to data analysis. In particular, the expectation–maximization (EM) algorithm [3] made a great contribution to parameter estimation in latent structure analysis.
In this chapter, latent structure models are reviewed. Section 1.2 treats the latent
class model and the accounting equations are given. In Sect. 1.3, a latent trait model
with discriminant and item difficulty parameters is discussed. A comparison of the
model and a latent class model is also made. Section 1.4 treats the latent profile model,
which is regarded as a factor analysis model with categorical factors, and in Sect. 1.5,
the factor analysis model is briefly reviewed. Section 1.6 reviews generalized linear models (GLMs) [15, 16] and treats latent structure models in a GLM framework, and in Sect. 1.7, the EM algorithm for the ML estimation of latent structure models is summarized. Finally, in Sect. 1.8, a summary and discussion of the present chapter are provided.
1.2 Latent Class Model

In the latent class model, it is assumed that a population is divided into some subpopulations in which individuals are homogeneous in their responses to the items under study. The subpopulations are called latent classes in the analysis. Let X_i be manifest variables that take categories {1, 2, ..., K_i}, i = 1, 2, ..., I, which represent the responses to be observed, and let ξ be a latent variable that takes categories {1, 2, ..., A}, which denote the latent classes; the categories are expressed by integers for simplicity of notation. Let X = (X_1, X_2, ..., X_I)^T be the I-dimensional column vector of the manifest variables X_i; let P(X = x|a) be the conditional probability of X = x = (x_1, x_2, ..., x_I)^T for a given latent variable (class) a; and let P(X_i = x_i|a) be those of X_i = x_i. Then, in the latent class model, the conditional probability P(X = x|a) is expressed by

P(X = x|a) = \prod_{i=1}^{I} P(X_i = x_i|a).    (1.1)
The above equation indicates that manifest variables X i are statistically indepen-
dent in latent class a. The assumption is called that of local independence. Let va
be the probability that a randomly selected individual in a population is from latent
class a. Then, from (1.1), we have
P(X = x) = \sum_{a=1}^{A} v_a P(X = x|a) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i|a),    (1.2)

where

\sum_{a=1}^{A} v_a = 1; \quad \sum_{x_i=1}^{K_i} P(X_i = x_i|a) = 1, \quad i = 1, 2, \ldots, I.    (1.3)
The above equations are referred to as the accounting equations. In many data
analyses, manifest variables are binary, for example, binary categories are {yes, no},
{positive, negative}, {success, failure}, and so on. Such responses are formally
denoted as integers {1,0}. Let πai be the positive response probabilities of manifest
variables X_i, i = 1, 2, ..., I. Then, Eq. (1.2) is expressed as follows:

P(X = x) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} \pi_{ai}^{x_i} (1 - \pi_{ai})^{1-x_i}.    (1.4)
According to the above accounting Eqs. (1.2) and (1.4), the latent class model is
also viewed as a mixture of the independent response models (1.1). The interpre-
tation of latent classes is done by latent response probabilities (πa1 , πa2 , . . . , πa I )
(Table 1.1). Exploratory latent class analysis is performed with the general model (1.2) and (1.4), where no restrictions are placed on the model parameters v_a and P(X_i = x_i|a). On the other hand, in confirmatory analysis, some constraints are
placed on the model parameters. The constraints are made according to phenomena
under study or the information of practical scientific research.
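As a minimal numerical sketch of the accounting equations, the following Python code evaluates (1.1), (1.2), and (1.4) for binary items; the class proportions and response probabilities below are illustrative values, not estimates from any data set in this book.

import numpy as np
from itertools import product

# Illustrative parameters: A = 2 latent classes, I = 3 binary items.
v = np.array([0.4, 0.6])                 # latent class proportions v_a (sum to 1)
pi = np.array([[0.9, 0.8, 0.7],          # pi_ai = P(X_i = 1 | class a), class 1
               [0.2, 0.3, 0.1]])         # class 2

def p_x_given_class(x, a):
    # Local independence (1.1): product of Bernoulli probabilities within class a.
    return np.prod(pi[a] ** x * (1.0 - pi[a]) ** (1 - x))

def p_x(x):
    # Accounting equations (1.2)/(1.4): mixture of the within-class probabilities.
    return sum(v[a] * p_x_given_class(x, a) for a in range(len(v)))

x = np.array([1, 0, 1])
print(p_x_given_class(x, 0), p_x(x))

# The manifest probabilities sum to one over all 2^I response patterns.
print(sum(p_x(np.array(pattern)) for pattern in product([0, 1], repeat=3)))   # approximately 1.0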
Remark 1.1 In latent class model (1.2) with constraints (1.3), the number of parameters v_a is A − 1 and that of P(X_i = x_i|a) is A \sum_{i=1}^{I} (K_i − 1). Since the number of manifest probabilities (parameters) P(X = x) is \prod_{i=1}^{I} K_i − 1, in order to identify the latent class model, the following inequality has to hold:

\prod_{i=1}^{I} K_i - 1 > (A - 1) + A \sum_{i=1}^{I} (K_i - 1).    (1.5)
Remark 1.2 In the latent class model (1.2), single latent variable ξ has been assumed
for explaining a general framework of the model. In a confirmatory latent class
analysis, some latent variables can be set for the analysis, for example, an application
of the latent class model to explain skill acquisition patterns [2] and latent class factor
analysis model [14]; however, in such cases, since latent variables are categorical and
finite, the models can be viewed as restricted cases of the general latent class model.
For example, for a latent class model with two latent variables ξ_j with categorical sample spaces {1, 2, ..., A_j}, j = 1, 2, setting (ξ_1, ξ_2) = (a, b) as a new latent variable ζ = a + A_1(b − 1), a = 1, 2, ..., A_1, b = 1, 2, ..., A_2, the model can be viewed as a restricted case of the general model.
Let θ be a continuous latent trait with density function ϕ(θ), and assume the local independence of the manifest variables given θ:

P(X = x|θ) = \prod_{i=1}^{I} P(X_i = x_i|θ).

Then, we have

P(X = x) = \int_{-\infty}^{+\infty} ϕ(θ) \prod_{i=1}^{I} P(X_i = x_i|θ)\, dθ.    (1.6)
Comparing (1.2) and (1.6), model (1.6) can be approximated by a latent class
model. Let
-\infty = θ_{(0)} < θ_{(1)} < θ_{(2)} < \cdots < θ_{(A-1)} < θ_{(A)} = +\infty,    (1.7)
and let

v_a = \int_{θ_{(a-1)}}^{θ_{(a)}} ϕ(θ)\, dθ, \quad a = 1, 2, \ldots, A.    (1.8)

Then, for representative values θ_a of the latent trait in the respective intervals,

P(X = x) ≈ \sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i|θ_a),    (1.9)
where we set

P_i(θ) = \frac{1}{1 + \exp(-D a_i(θ - d_i))} = \frac{\exp(D a_i(θ - d_i))}{1 + \exp(D a_i(θ - d_i))}, \quad i = 1, 2, \ldots, I,    (1.10)
where a_i and d_i are the discriminant and difficulty parameters, respectively, for test item i, i = 1, 2, ..., I, and D = 1.7. This model is an extension of the Rasch model (1960) and is popularly used in item response theory. In general, the above functions are referred to as item characteristic functions. The positive response probabilities P_i(θ) are usually continuous functions of the latent trait θ (Fig. 1.1).
[Fig. 1.1 Item characteristic functions (1.10) of the two-parameter logistic model plotted against the latent trait θ for several values of the discriminant parameter a]
we have

Φ(a_i(θ - d_i)) ≈ \frac{\exp(D a_i(θ - d_i))}{1 + \exp(D a_i(θ - d_i))}.

The treatment of the logistic models in both theoretical and practical discussion is easier than that of the normal ogive model Φ(a_i(θ − d_i)), so the logistic models are used in item response models. The graded response model [18, 19] is an extension of this model.
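The following Python sketch illustrates the item characteristic function (1.10), its closeness to the normal ogive, and the latent-class approximation (1.7)–(1.9) of the marginal probability (1.6); the item parameters, the number of classes A, and the choice of representative points θ_a are illustrative assumptions, not values from the book.

import numpy as np
from scipy.stats import norm

D = 1.7  # scaling constant in (1.10)

def icc(theta, a_i, d_i):
    # Two-parameter logistic item characteristic function (1.10).
    return 1.0 / (1.0 + np.exp(-D * a_i * (theta - d_i)))

# The logistic curve closely approximates the normal ogive Phi(a_i(theta - d_i)).
print(icc(0.5, 1.0, 0.0), norm.cdf(1.0 * 0.5))

# Latent-class approximation (1.7)-(1.9): split the theta axis into A intervals,
# with v_a the standard-normal mass of each interval and theta_a a representative point.
a_par = np.array([1.0, 1.5, 0.8])     # illustrative discriminant parameters
d_par = np.array([-0.5, 0.0, 1.0])    # illustrative difficulty parameters
x = np.array([1, 1, 0])               # one response pattern

A = 10
cuts = np.concatenate(([-np.inf], np.linspace(-3, 3, A - 1), [np.inf]))  # thresholds (1.7)
v = norm.cdf(cuts[1:]) - norm.cdf(cuts[:-1])                             # class proportions (1.8)
theta_a = norm.ppf(np.cumsum(v) - v / 2)                                 # representative points (assumed choice)

def p_x_given_theta(theta):
    p = icc(theta, a_par, d_par)
    return np.prod(p ** x * (1 - p) ** (1 - x))

approx = np.sum(v * np.array([p_x_given_theta(t) for t in theta_a]))     # right-hand side of (1.9)
print(approx)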
In order to improve the Guttman scale model, the latent distance model [9] was proposed by using step functions. Let θ be a latent trait on the interval [0, 1] and let the thresholds θ_{(i)}, i = 1, 2, ..., I, be given as

θ_{(0)} = 0 < θ_{(1)} < θ_{(2)} < \cdots < θ_{(I)} < 1 = θ_{(I+1)}.    (1.11)

The item characteristic functions are the step functions

P_i(θ) = π_{iL} \ (θ < θ_{(i)}); \quad P_i(θ) = π_{iH} \ (θ ≥ θ_{(i)}), \quad i = 1, 2, \ldots, I,

where 0 ≤ π_{iL} < π_{iH} ≤ 1. The probabilities π_{iL} represent guessing errors and 1 − π_{iH} forgetting errors. In the above model, the thresholds (1.11) also express the difficulties of the items. If the intervals defined by the thresholds are regarded as latent classes,
[Fig. 1.2 Item characteristic step functions of the latent distance model with thresholds θ = 0.1, 0.3, 0.4, 0.7, and 0.9]
the latent distance model is a restricted version of the latent class model. As in the
two-parameter logistic model, the graphs of Pi (θ ) are illustrated in Fig. 1.2.
In the latent profile model, the manifest variables X_i are continuous. Let f(x|a) be the conditional density function of X given latent class a, and let f_i(x_i|a) be those of X_i. Under the assumption of local independence,

f(x|a) = \prod_{i=1}^{I} f_i(x_i|a).    (1.14)

As in the latent class model, we have

f(x) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} f_i(x_i|a),    (1.15)
where

\sum_{a=1}^{A} v_a = 1.
If the conditional density functions f_i(x_i|a) are normal with means μ_{ai} and variances ψ_i^2, i = 1, 2, ..., I, then, from (1.14), we have

f(x|a) = \prod_{i=1}^{I} \frac{1}{\sqrt{2π ψ_i^2}} \exp\left(-\frac{(x_i - μ_{ai})^2}{2ψ_i^2}\right).    (1.16)
This model can be expressed with linear equations, and in latent class a, the
following equations are given:
X i = μai + ei , i = 1, 2, . . . , I, (1.17)
where e_i are error terms that are independently distributed according to normal distributions with means 0 and variances ψ_i^2, i = 1, 2, ..., I. From the above equations, the variances and covariances of the manifest variables, σ_i^2 = Var(X_i) and σ_{ij} = Cov(X_i, X_j), are described as follows:

σ_i^2 = \sum_{a=1}^{A} v_a (μ_{ai} - μ_i)^2 + ψ_i^2 \quad (i = 1, 2, \ldots, I),
σ_{ij} = \sum_{a=1}^{A} v_a (μ_{ai} - μ_i)(μ_{aj} - μ_j) \quad (i \neq j),    (1.18)

where

E(X_i) = μ_i = \sum_{a=1}^{A} v_a μ_{ai}, \quad i = 1, 2, \ldots, I.    (1.19)
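As a numerical check of (1.18) and (1.19), the following sketch compares the mixture moments computed from these formulas with Monte Carlo estimates; the two-class, two-variable parameters are illustrative (the first variable mimics the mixture 0.6N(−2, 1) + 0.4N(5, 1) of Fig. 1.3).

import numpy as np

rng = np.random.default_rng(0)

v = np.array([0.6, 0.4])                     # class proportions
mu = np.array([[-2.0, 0.0],                  # mu_ai: rows = classes, columns = variables
               [ 5.0, 3.0]])
psi2 = np.array([1.0, 2.0])                  # error variances psi_i^2 (common across classes)

# Formulas (1.19) and (1.18)
mu_bar = v @ mu                                                        # E(X_i)
var = v @ (mu - mu_bar) ** 2 + psi2                                    # Var(X_i)
cov12 = np.sum(v * (mu[:, 0] - mu_bar[0]) * (mu[:, 1] - mu_bar[1]))    # Cov(X_1, X_2)

# Monte Carlo check: draw class labels, then add normal errors within each class
n = 200_000
labels = rng.choice(2, size=n, p=v)
x = mu[labels] + rng.normal(scale=np.sqrt(psi2), size=(n, 2))
print(mu_bar, x.mean(axis=0))
print(var, x.var(axis=0))
print(cov12, np.cov(x.T)[0, 1])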
Fig. 1.3 The density function of 0.6N(−2, 1) + 0.4N(5, 1)
In the factor analysis model, the manifest variables are expressed as

X_i = \sum_{j=1}^{m} λ_{ij} ξ_j + ε_i, \quad i = 1, 2, \ldots, I,
where

E(X_i) = E(ε_i) = 0, \quad i = 1, 2, \ldots, I;
E(ξ_j) = 0, \quad j = 1, 2, \ldots, m;
\mathrm{Var}(ξ_j) = 1, \quad j = 1, 2, \ldots, m;
\mathrm{Var}(ε_i) = ψ_i^2 > 0, \quad i = 1, 2, \ldots, I;
\mathrm{Cov}(ε_k, ε_l) = 0, \quad k \neq l.
Comparing (1.16) and (1.21), the latent profile model can be viewed as a factor
analysis model with categorical factors.
Related to latent class analysis, the latent class factor analysis model [14] is briefly reviewed here. Let ξ_j, j = 1, 2, ..., m, be binary latent variables; let X_i, i = 1, 2, ..., I, be binary manifest variables; and let f(x|ξ) be the conditional probability function of X = (X_1, X_2, ..., X_I)^T given ξ = (ξ_1, ξ_2, ..., ξ_m)^T. Assuming there are no interactions between the latent variables, the model is expressed as follows:
f(x|ξ) = \prod_{i=1}^{I} \frac{\exp\{x_i(α_i + \sum_{j=1}^{m} λ_{ij} ξ_j)\}}{1 + \exp(α_i + \sum_{j=1}^{m} λ_{ij} ξ_j)}.    (1.22)
As in a factor analysis model, the predictor in the above model is \sum_{j=1}^{m} λ_{ij} ξ_j, and the regression coefficients λ_{ij} are log odds with respect to the binary variables X_i and ξ_j. The parameters are interpreted as the effects of the latent variables ξ_j on the manifest variables X_i, that is, the odds of a positive response are multiplied by exp(λ_{ij}). The latent class factor analysis model (1.22) is similar to the factor analysis model (1.21).
1.6 Latent Structure Models in a Generalized Linear Model Framework

Generalized linear models (GLMs) are widely applied to regression analyses for both
continuous and categorical response variables [15, 16]. As in the above discussion,
let f i (xi |ξ ) be the conditional density or probability function of manifest variables
X i given latent variable vector ξ . Then, in GLMs, the function is assumed to be the
following exponential family of distributions:
f_i(x_i|ξ) = \exp\left\{\frac{x_i θ_i - b_i(θ_i)}{a_i(ϕ_i)} + c_i(x_i, ϕ_i)\right\}, \quad i = 1, 2, \ldots, I,    (1.23)
where θi and ϕi are parameters and ai (ϕi )(> 0), bi (θi ) and ci (xi , ϕi ) are specific
functions for response manifest variables X i , i = 1, 2, . . . , I . This assumption is
referred to as the random component. If X i is the Bernoulli trial with P(X i = 1) = πi ,
then, the conditional probability function is
f_i(x_i|ξ) = π_i^{x_i} (1 - π_i)^{1-x_i} = \exp\left\{x_i \log\frac{π_i}{1 - π_i} + \log(1 - π_i)\right\},
In this formulation, for binary manifest variables, latent class model (1.1) can be expressed as follows:

P(X = x|a) = \prod_{i=1}^{I} \exp\left\{x_i \log\frac{π_{ai}}{1 - π_{ai}} + \log(1 - π_{ai})\right\}
= \exp\left\{\sum_{i=1}^{I} x_i θ_{ai} + \sum_{i=1}^{I} \log(1 - π_{ai})\right\}
= \exp\left\{x^T θ_a + \sum_{i=1}^{I} \log(1 - π_{ai})\right\},    (1.25)

where

θ_a^T = (θ_{a1}, θ_{a2}, \ldots, θ_{aI}), \quad θ_{ai} = \log\frac{π_{ai}}{1 - π_{ai}}, \quad i = 1, 2, \ldots, I.
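A small sketch verifying that the exponential family form (1.25), with the logit canonical parameters, reproduces the product-Bernoulli probability (1.1); the response probabilities below are illustrative.

import numpy as np

pi_a = np.array([0.9, 0.3, 0.6])      # illustrative P(X_i = 1 | class a)
x = np.array([1, 0, 1])

theta_a = np.log(pi_a / (1 - pi_a))   # canonical (logit) parameters theta_ai

# Exponential family form (1.25) versus the direct product of Bernoulli probabilities
p_expfam = np.exp(x @ theta_a + np.sum(np.log(1 - pi_a)))
p_direct = np.prod(pi_a ** x * (1 - pi_a) ** (1 - x))
print(p_expfam, p_direct)             # identical up to rounding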
For a normal variable X_i with mean μ_i and variance ψ_i^2, the conditional density function is

f_i(x_i|ξ) = \frac{1}{\sqrt{2π ψ_i^2}} \exp\left\{\frac{x_i μ_i - \frac{1}{2} μ_i^2}{ψ_i^2} - \frac{x_i^2}{2ψ_i^2}\right\},

where

θ_i = μ_i, \quad a_i(ϕ_i) = ψ_i^2, \quad b_i(θ_i) = \frac{1}{2} μ_i^2, \quad c_i(x_i, ϕ_i) = -\frac{x_i^2}{2ψ_i^2} - \frac{1}{2}\log(2π ψ_i^2).
Then, the joint conditional density function of X given ξ is

f(x|ξ) = \prod_{i=1}^{I} f_i(x_i|ξ) = \exp\left\{\sum_{i=1}^{I} \frac{x_i θ_i}{ψ_i^2} - \sum_{i=1}^{I} \frac{b_i(θ_i)}{ψ_i^2} + \sum_{i=1}^{I} c_i(x_i, ϕ_i)\right\}.    (1.26)
Let us set

Ψ = \mathrm{diag}(ψ_1^2, ψ_2^2, \ldots, ψ_I^2), \quad θ^T = (θ_1, θ_2, \ldots, θ_I), \quad x^T = (x_1, x_2, \ldots, x_I).
f(x|ξ) = \prod_{i=1}^{I} f_i(x_i|ξ) = \exp\left\{\sum_{i=1}^{I} \frac{x_i θ_i - b_i(θ_i)}{a_i(ϕ_i)} + \sum_{i=1}^{I} c_i(x_i, ϕ_i)\right\}
= \exp\left\{x^T Ψ^{-1} θ - \mathbf{1}^T Ψ^{-1} b(θ) + \sum_{i=1}^{I} c_i(x_i, ϕ_i)\right\},    (1.28)

where b(θ) = (b_1(θ_1), b_2(θ_2), \ldots, b_I(θ_I))^T and \mathbf{1} is the I-dimensional vector of ones.
From the above discussion, by using appropriate linear predictors and link functions, latent structure models can be expressed as GLMs. For example, in the factor analysis model (1.21) and the latent class factor analysis model (1.22), the linear predictors are expressed by \sum_{j=1}^{m} λ_{ij} ξ_j with the identity link, and then θ_i = \sum_{j=1}^{m} λ_{ij} ξ_j, i = 1, 2, \ldots, I. Hence, the effects of the latent variables on the manifest variables can be measured with the entropy coefficient of determination (ECD) [4]. Let f(x) and g(ξ) be the marginal density or probability functions of manifest variable vector X and latent variable vector ξ, respectively. Then, in the latent structure model (1.28), we have
\mathrm{KL}(X, ξ) = \int f(x|ξ) g(ξ) \log\frac{f(x|ξ)}{f(x)}\, dx\, dξ + \int f(x) g(ξ) \log\frac{f(x)}{f(x|ξ)}\, dx\, dξ
= \int (f(x|ξ) - f(x)) g(ξ) \log f(x|ξ)\, dx\, dξ = \mathrm{tr}\, Ψ^{-1} \mathrm{Cov}(θ, X).    (1.29)
If the manifest and latent variables are discrete (categorical), the related integrals in (1.29) are replaced with appropriate summations. From the above KL information, the entropy coefficient of determination (ECD) is given by

\mathrm{ECD}(X, ξ) = \frac{\mathrm{KL}(X, ξ)}{\mathrm{KL}(X, ξ) + 1}.    (1.30)
The ECD expresses the explanatory or predictive power of the GLMs. Applying
ECD to model (1.21), we have
\mathrm{ECD}(X, ξ) = \frac{\sum_{i=1}^{I} \sum_{j=1}^{m} λ_{ij}^2 / ψ_i^2}{\sum_{i=1}^{I} \sum_{j=1}^{m} λ_{ij}^2 / ψ_i^2 + 1} = \frac{\sum_{i=1}^{I} \frac{R_i^2}{1 - R_i^2}}{\sum_{i=1}^{I} \frac{R_i^2}{1 - R_i^2} + 1},
where R_i^2 are the coefficients of determination of the predictors θ_i = \sum_{j=1}^{m} λ_{ij} ξ_j on the manifest variables X_i, i = 1, 2, ..., I [5, 7]. Similarly, from model (1.22), we also get
\mathrm{ECD}(X, ξ) = \frac{\sum_{i=1}^{I} \sum_{j=1}^{m} λ_{ij} \mathrm{Cov}(X_i, ξ_j)}{\sum_{i=1}^{I} \sum_{j=1}^{m} λ_{ij} \mathrm{Cov}(X_i, ξ_j) + 1}.
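A brief numerical sketch of the ECD for the factor analysis model (1.21), using illustrative loadings and unique variances (and assuming uncorrelated unit-variance factors), together with the equivalent expression in terms of R_i^2:

import numpy as np

# Illustrative loadings (I = 4 variables, m = 2 factors) and unique variances
lam = np.array([[0.8, 0.1],
                [0.6, 0.3],
                [0.2, 0.7],
                [0.1, 0.9]])
psi2 = np.array([0.35, 0.55, 0.47, 0.18])

snr = (lam ** 2).sum(axis=1) / psi2          # sum_j lambda_ij^2 / psi_i^2 for each variable
ecd = snr.sum() / (snr.sum() + 1.0)

# Equivalent form via R_i^2, with Var(X_i) = sum_j lambda_ij^2 + psi_i^2 (orthogonal factors assumed)
R2 = (lam ** 2).sum(axis=1) / ((lam ** 2).sum(axis=1) + psi2)
ecd_check = (R2 / (1 - R2)).sum() / ((R2 / (1 - R2)).sum() + 1.0)
print(ecd, ecd_check)                        # the two expressions coincide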
Discussions of ECD in factor analysis and latent trait analysis are made in Eshima
et al. [5] and Eshima [6]. In this book, ECD is used for measuring the predictive power
of latent variables for manifest variables, and is also applied to make path analysis
in latent class models.
Remark 1.4 In the basic latent structure models (1.23) treated in this chapter, since

\mathrm{KL}(X_i, ξ) = \frac{\mathrm{Cov}(θ_i, X_i)}{a_i(ϕ_i)}, \quad i = 1, 2, \ldots, I,

we have

\mathrm{KL}(X, ξ) = \sum_{i=1}^{I} \mathrm{KL}(X_i, ξ) = \sum_{i=1}^{I} \frac{\mathrm{Cov}(θ_i, X_i)}{a_i(ϕ_i)}.

In models (1.21),

\mathrm{ECD}(X_i, ξ) = \frac{\mathrm{KL}(X_i, ξ)}{\mathrm{KL}(X_i, ξ) + 1} = R_i^2, \quad i = 1, 2, \ldots, I.
The expectation–maximization (EM) algorithm [3] for the maximum likelihood (ML)
estimation from incomplete data is reviewed in the latent structure model framework.
The algorithm is a powerful tool for the ML estimation of latent structure models. In
latent structure analysis, (X, ξ ) and X are viewed as the complete and incomplete
data, respectively. In the latent structure model, for parameter vector φ, the condi-
tional density or probability function of X given ξ , f (x|ξ ), the marginal density or
probability function of X, f (x), and that of ξ , g(ξ ) are denoted by f (x|ξ )φ , f (x)φ ,
and g(ξ )φ , respectively. Let f (x, ξ )φ be the joint density function of the complete
data (x, ξ); then,

f(x, ξ)_φ = f(x|ξ)_φ\, g(ξ)_φ.

Let

Q(φ'|φ) = E[\log f(x, ξ)_{φ'} \mid x, φ],

and let \hat{φ} denote the ML estimate based on the incomplete data, that is,

l(\hat{φ}) = \max_φ \log f(x)_φ.
By using the above iterative procedure, the ML estimates of latent structure models
can be obtained. If there exists a sufficient statistic t(x, ξ ) for parameter vector φ
such that
f(x, ξ|φ) = b(x, ξ)\, \frac{\exp\{φ\, t(x, ξ)^T\}}{s(φ)},    (1.33)
(ii) M-step
1.8 Discussion
Basic latent structure models, i.e., the latent class model, latent trait model, latent profile model, and factor analysis model, are overviewed in this chapter. These models are based on what is called the assumption of local independence, that is, the manifest variables are statistically independent given the latent variables. These models can be
expressed in a GLM framework, and the multivariate formulation of latent structure
models is also given in (1.28). Studies of latent structure models through a GLM
framework will be important to grasp the models in a general way and to apply
them in various research domains, and it may lead to the construction of new latent
structure models. It is expected that new latent structure models are designed in the
applications. The EM algorithm is a useful tool to perform the ML estimation of
latent structure models, and a brief review of the method is also given in this chapter.
In the following chapters, the EM algorithm is used to estimate the model parameters.
References
1. Bartholomew, D. J. (1987). Latent variable models and factor analysis. Charles & Griffin.
2. Dayton, M., & Macready, G. B. (1976). A probabilistic model for validation of behavioral
hierarchies. Psychometrika, 41, 190–204.
3. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, B, 39,
1–38.
4. Eshima, N., & Tabata, M. (2010). Entropy coefficient of determination for generalized linear
models. Computational Statistics and Data Analysis, 54, 1381–1389.
5. Eshima, N., Tabata, M., & Borroni, C. G. (2018). An entropy-based approach for measuring
factor contributions in factor analysis models. Entropy, 20, 634.
6. Eshima, N. (2020). Statistical data analysis and entropy. Springer Nature.
7. Eshima, N., Borroni, C. G., Tabata, M., & Kurosawa, T. (2021). An entropy-based tool to help
the interpretation of common-factor spaces in factor analysis. Entropy, 23, 140. https://doi.org/
10.3390/e23020140-24
8. Everitt, B. S. (1984). An introduction to latent variable models. Chapman & Hall.
9. Gibson, W. A. (1959). Three multivariate models: Factor analysis, latent structure analysis,
and latent profile analysis, Psychometrika 24, 229–252.
10. Lazarsfeld, P. F., & Henry, N. M. (1968). Latent structure analysis. Houghton Mifflin.
11. Lazarsfeld, P. F. (1959). Latent structure analysis. In S. Koch (Ed.), Psychology: A study of a science. New York: McGraw-Hill.
12. Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, et al. (Eds.), Measurement and prediction: Studies in social psychology in World War II (Vol. 4). Princeton University Press.
13. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
14. Lord, F. M. (1952). A theory of test scores (Psychometric Monograph, No. 7), Richmond VA:
Psychometric Corporation.
15. Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models: Bi-plots, and
related graphical displays. Sociological Methodology, 31, 223–264.
16. McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
17. Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society A, 135, 370–384.
18. Rasch, G. (1960). Probabilistic model for some intelligence and attainment tests. Danish
Institute for Educational Research.
19. Samejima, F. (1973). A method of estimating item characteristic functions using the maximum
likelihood estimate of ability. Psychometrika, 38, 163–191.
20. Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimen-
sional latent space. Psychometrika, 39, 111–121.
21. Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.
22. Thurstone, L. L. (1935). The vectors of mind: Multiple-factor analysis for the isolation of primary traits. Chicago, IL: The University of Chicago Press.
Chapter 2
Latent Class Cluster Analysis
2.1 Introduction
In behavioral sciences, there are many cases where we can assume that human behav-
iors and responses depend on latent concepts, which are not directly observed. In such
cases, it is significant to elucidate the latent factors to affect and cause human behav-
iors and responses. For this objective, latent class analysis was proposed by Lazars-
feld [11] to explore discrete (categorical) latent factors that explain the relationships
among responses to items under study. The responses to the items are treated as manifest variables and the factors as latent variables. By using models with manifest and latent variables, it is possible to analyze the phenomena concerned. A general
latent class model is expressed with (1.2) and (1.3), and for binary manifest variables
the model is expressed by (1.4). These equations are called accounting equations.
The parameters in models (1.2) and (1.4) are manifest probabilities P(X = x) and
latent probabilities va , P(X i = xi |a), and πai . Although the manifest probabilities
can be estimated directly by the relative frequencies of responses X = x as consistent
estimates, the latent probabilities cannot be estimated easily. In the early stages of
the development of latent class analysis, the efforts for the studies were concentrated
on parameter estimation by solving the accounting equations, for example, Green [10], Anderson [1], Gibson [6, 7], Madansky [13], and so on; however, these studies are now only of historical interest. As computer efficiency increased, methods for ML estimation came to be widely applied; however, it was critical to obtain proper estimates of the latent probabilities, that is, estimates lying between 0 and 1. The usual ML estimation methods often produced improper solutions in real data analyses, that is, latent probability estimates outside the interval [0, 1]. To overcome the problem, two ways for the ML estimation
were proposed. One is a proportional fitting method by Goodman [8, 9], which is
included in the EM algorithm for the ML estimation [3]. The second is a method in
which a parameter transformation is employed to deal with the ML estimation by a
direct use of the Newton–Raphson algorithm [5]. Although the convergence rate of
Goodman’s method is slow, the method is simple and flexible to apply in real data
analysis. In this sense, Goodman’s contribution to the development of latent class
analysis is great.
This chapter consists of seven sections including this section. In Sect. 2.2,
Goodman’s method for the ML estimation of the latent class model is derived from the
EM algorithm and the property of the algorithm is discussed. Section 2.3 applies the
ML estimation algorithm to practical data analyses. In Sect. 2.4, in order to measure
the goodness-of-fit of latent class models, the entropy coefficient of determination [4]
is used. In Sect. 2.5, two methods for comparing latent classes are discussed and the methods are illustrated by using numerical examples. In Sect. 2.6, a method
for the ML estimation of the latent profile model is constructed according to the EM
algorithm, and a numerical example is also given to demonstrate the method. Finally,
Sect. 2.7 provides a discussion on the latent class analysis presented in this chapter
and a perspective of the analysis leading to further studies in the future.
2.2 The ML Estimation of Parameters in the Latent Class Model

In latent class model (1.2) with constraints (1.3), the complete data would be
obtained as responses to I items X = (X 1 , X 2 , . . . , X I )T in latent classes a.
Let n(x1 , x2 , . . . , x I ) and n(x1 , x2 , . . . , x I , a) be the numbers of observations with
response x = (x1 , x2 , . . . , x I )T and those in latent class a, respectively, and let
φ = ((va ), (P(X i = xi |a))) be the parameter row vector. Concerning numbers of
observations n(x) and n(x, a), it follows that

n(x) = \sum_{a=1}^{A} n(x, a).

Since the latent class model is

P(X = x) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i|a),    (2.1)
the log likelihood function of parameter vector φ based on incomplete data (n(x))
is given by
l(φ|(n(x))) = \sum_{x} n(x) \log\left\{\sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i|a)\right\},    (2.2)
where the summation \sum_{x} in the above formula is made over all response patterns x = (x_1, x_2, ..., x_I). Since the direct maximization of log likelihood function (2.2)
with respect to φ is very complicated, the EM algorithm is employed. Given the complete data, the statistics t(x, a) = (n(x, a)) in (1.33) are obtained as a sufficient statistic vector for parameter φ, and we have the log likelihood function of parameter vector φ as follows:
l(φ|(n(x, a))) = \sum_{a=1}^{A} \sum_{x} n(x, a) \log\left\{v_a \prod_{i=1}^{I} P(X_i = x_i|a)\right\}
= \sum_{a=1}^{A} \sum_{x} n(x, a)\left\{\log v_a + \sum_{i=1}^{I} \log P(X_i = x_i|a)\right\}.    (2.3)
In this sense, sufficient statistic vector t(x, a) = (n(x, a)) is viewed as the
complete data. Let s φ = ((s va ), (s P(X i = xi |a))) be the estimate of φ at the sth
iteration in the EM algorithm. Then, from (1.34) and (1.35), the E- and M-steps are
formulated as follows.
The EM algorithm for model (1.2) with constraints (1.3).
(i) E-step
Let

^{s+1}t(x, a) = (^{s+1}n(x, a))

be the conditional expectation of the sufficient statistic for the given incomplete (observed) data x and parameters ^{s}φ. From (1.34), we have

^{s+1}n(x, a) = n(x) \frac{^{s}v_a \prod_{i=1}^{I} {^{s}P}(X_i = x_i|a)}{\sum_{b=1}^{A} {^{s}v_b} \prod_{i=1}^{I} {^{s}P}(X_i = x_i|b)}.    (2.4)

From the above results, we can get ^{s+1}t(x, a) = (^{s+1}n(x, a)).
(ii) M-step
Let Ω be the sample space of the manifest variable vector X; let Ω(X_i = x_i) be the sample subspaces of X for given X_i = x_i, i = 1, 2, ..., I; let Ω_i be the sample spaces of the manifest variables X_i, i = 1, 2, ..., I; and let N = \sum_{x} n(x). From (1.35), we have

^{s+1}n(x, a) = N v_a \prod_{i=1}^{I} P(X_i = x_i|a), \quad x \in Ω, \ a = 1, 2, \ldots, A.    (2.5)

Note that

\sum_{x \in Ω} \prod_{i=1}^{I} P(X_i = x_i|a) = 1, \quad a = 1, 2, \ldots, A,

and

\sum_{x \in Ω(X_i = x_i)} \prod_{j=1}^{I} P(X_j = x_j|a) = P(X_i = x_i|a), \quad i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A,

where \sum_{x \in Ω} is the summation over all response patterns X = x, and \sum_{x \in Ω(X_i = x_i)} is the summation over all response patterns X = x for given X_i = x_i. Solving the equations in (2.5) with respect to parameters v_a, P(X_i = x_i|a), i = 1, 2, ..., I, under constraints (1.3), it follows that
^{s+1}v_a = \frac{1}{N} \sum_{x \in Ω} {^{s+1}n}(x, a), \quad a = 1, 2, \ldots, A;
^{s+1}P(X_i = x_i|a) = \frac{1}{N \cdot {^{s+1}v_a}} \sum_{x \in Ω(X_i = x_i)} {^{s+1}n}(x, a), \quad x_i \in Ω_i, \ i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A.    (2.6)
The above updating procedure corresponds to the method of Goodman [8, 9], and the ML estimates of the parameters \hat{v}_a and \hat{P}(X_i = x_i|a) can be obtained as the convergence values of the above estimates, ^{∞}v_a and ^{∞}P(X_i = x_i|a). From (2.6), it is seen that for any integer s,

0 ≤ {^{s}v_a} ≤ 1, \quad a = 1, 2, \ldots, A;
0 ≤ {^{s}P}(X_i = x_i|a) ≤ 1, \quad x_i \in Ω_i, \ i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A.
Thus, if the above algorithm converges, the estimates are proper, and they satisfy the following likelihood equations:

\frac{∂}{∂v_a} l(φ|(n(x))) = 0, \quad a = 1, 2, \ldots, A;
\frac{∂}{∂P(X_i = x_i|a)} l(φ|(n(x))) = 0, \quad x_i \in Ω_i, \ i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A.
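For binary items, the E-step (2.4) and M-step (2.6) can be implemented compactly. The following Python sketch is a minimal illustration (not the author's program); it works with individual 0/1 response vectors rather than grouped response patterns, which is equivalent, and the starting values and simulated data are arbitrary.

import numpy as np

def lca_em(X, A, n_iter=500, seed=0):
    # EM (Goodman-type) estimation of a binary latent class model.
    # X: (N, I) matrix of 0/1 responses; A: number of latent classes.
    rng = np.random.default_rng(seed)
    N, I = X.shape
    v = np.full(A, 1.0 / A)                    # class proportions v_a
    pi = rng.uniform(0.2, 0.8, size=(A, I))    # positive response probabilities pi_ai
    for _ in range(n_iter):
        # E-step (2.4): posterior probability of each class for each respondent
        logp = X @ np.log(pi.T) + (1 - X) @ np.log(1 - pi.T) + np.log(v)   # (N, A)
        post = np.exp(logp - logp.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step (2.6): update v_a and pi_ai from the expected counts
        v = post.mean(axis=0)
        pi = (post.T @ X) / (N * v[:, None])
        pi = np.clip(pi, 1e-6, 1 - 1e-6)       # keep probabilities inside (0, 1)
    return v, pi, post

# Illustrative use with data simulated from a two-class model
rng = np.random.default_rng(1)
true_v = np.array([0.3, 0.7])
true_pi = np.array([[0.9, 0.8, 0.9, 0.7],
                    [0.2, 0.3, 0.1, 0.2]])
z = rng.choice(2, size=1000, p=true_v)
X = (rng.random((1000, 4)) < true_pi[z]).astype(int)
v_hat, pi_hat, post = lca_em(X, A=2)
print(v_hat.round(3))
print(pi_hat.round(3))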
For observed data set {n(x)}, the goodness-of-fit test of a latent class model to the data can be carried out with the following log likelihood ratio test statistic:

G^2 = 2 \sum_{x} n(x)\left\{\log\frac{n(x)}{N} - \log \sum_{a=1}^{A} \hat{v}_a \prod_{i=1}^{I} \hat{P}(X_i = x_i|a)\right\}.    (2.7)
The degrees of freedom of the test statistic are the number of the manifest parameters minus that of the latent parameters v_a and P(X_i = x_i|a), that is, from (1.5) we have

\prod_{i=1}^{I} K_i - 1 - (A - 1) - A \sum_{i=1}^{I} (K_i - 1) = \prod_{i=1}^{I} K_i - A\left\{\sum_{i=1}^{I} K_i - (I - 1)\right\}.    (2.8)
After estimating a latent class model under study, the interpretation of the latent classes is made by considering the sets of the estimated latent response probabilities \hat{P}(X_i = x_i|a), i = 1, 2, ..., I; a = 1, 2, ..., A. In addition to the interpretation, it is significant to assess the manifest response vectors x = (x_1, x_2, ..., x_I) with respect to the latent classes, that is, to assign individuals with the manifest responses to the extracted latent classes. The best assignment to the latent classes is made with the maximum posterior probabilities, that is, if
with the maximum posterior probabilities, that is, if
I
va P(X i = xi |a)
P(a0 |(x1 , x2 , . . . , x I )) = max A i=1
I , (2.9)
b=1 vb i=1 P(X i = x i |b)
a
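With the posterior probabilities computed in the E-step (the array post in the EM sketch above), the assignment rule (2.9) is a single line; this is an illustrative continuation, not code from the book.

# Assign each respondent to the latent class with the largest posterior probability (2.9).
assigned_class = post.argmax(axis=1) + 1   # 1-based class labels a_0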
The updating formulae (2.6) are derived as follows. From (2.3), the conditional expectation of the complete-data log likelihood is

Q(φ|^{s}φ) = \sum_{a=1}^{A} \sum_{x} E[n(x, a)|x, {^{s}φ}]\left\{\log v_a + \sum_{i=1}^{I} \log P(X_i = x_i|a)\right\}
= \sum_{a=1}^{A} \sum_{x} {^{s+1}n}(x, a)\left\{\log v_a + \sum_{i=1}^{I} \log P(X_i = x_i|a)\right\}.
Under constraints (1.3), the following Lagrange function is used:

L(φ|^{s}φ) = Q(φ|^{s}φ) - λ \sum_{a=1}^{A} v_a - \sum_{a=1}^{A} \sum_{i=1}^{I} μ_{ai} \sum_{x_i} P(X_i = x_i|a)
= \sum_{a=1}^{A} \sum_{x} {^{s+1}n}(x, a)\left\{\log v_a + \sum_{i=1}^{I} \log P(X_i = x_i|a)\right\} - λ \sum_{a=1}^{A} v_a - \sum_{a=1}^{A} \sum_{i=1}^{I} μ_{ai} \sum_{x_i} P(X_i = x_i|a).
Differentiating with respect to v_a, we have

\frac{∂}{∂v_a} L(φ|^{s}φ) = \frac{∂}{∂v_a} Q(φ|^{s}φ) - λ = \frac{1}{v_a} \sum_{x} {^{s+1}n}(x, a) - λ = 0, \quad a = 1, 2, \ldots, A.

Summing over the latent classes, we get

λ = \sum_{a=1}^{A} \sum_{x} {^{s+1}n}(x, a) = N.
Similarly, differentiating with respect to P(X_i = x_i|a) gives

\frac{∂}{∂P(X_i = x_i|a)} L(φ|^{s}φ) = \frac{1}{P(X_i = x_i|a)} \sum_{x \in Ω(X_i = x_i)} {^{s+1}n}(x, a) - μ_{ai} = 0, \quad x_i \in Ω_i, \ i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A.

Summing over the categories of X_i, we get

μ_{ai} = \sum_{x_i=1}^{K_i} \sum_{x \in Ω(X_i = x_i)} {^{s+1}n}(x, a) = \sum_{x} {^{s+1}n}(x, a), \quad i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A.
2.3 Examples
Table 2.1 shows the data from respondents to questionnaire items on role conflict
[20], and the respondents are cross-classified with respect to whether they tend toward
universalistic values “1” or particularistic values “0” when confronted by each of four
different situations of role conflict [18]. Assuming A latent classes in the population,
latent class model (2.1) is applied. Let X i , i = 1, 2, 3, 4 be the responses to the four
situations. According to the condition of model identification, the formula in (2.8)
have to be positive, so we have
From the above inequality, the number of latent classes has to be less than and equal
to 16
5
. Assuming three latent classes, with which the latent class model is denoted by
M(3), the EM algorithm with (2.4) and (2.6) is carried out. The ML estimates of the
parameters are illustrated in Table 2.2, and the following inequalities hold:
From the above results, the extracted latent classes 1, 2, and 3 in Table 2.2 can be interpreted as ordered latent classes, “low”, “medium”, and “high”, in the universalistic attitude in the role conflict situations. The latent class model with three latent classes for
Table 2.2 The estimates of the parameters for a latent class model with three latent classes
(Stouffer-Toby data in Table 2.1)
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
1 0.220 0.005 0.032 0.024 0.137
2 0.672 0.194 0.573 0.593 0.830
3 0.108 0.715 1.000 0.759 0.943
a The log likelihood ratio test statistic (2.7) is calculated as G²(3) = 0.387 (df = 1, P = 0.534)
Table 2.3 The estimates of the parameters for a latent class model with two latent classes (Stouffer-
Toby data in Table 2.1)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.279 0.007 0.060 0.073 0.231
2 0.721 0.286 0.670 0.646 0.868
a G 2 (2) = 2.720(d f = 6, P = 0.843)
four binary items has only one degree of freedom left for the test of goodness-of-fit
to the data, so a latent class model with two latent classes M(2) is estimated and the
results are shown in Table 2.3. In this case, two ordered latent classes, “low” and
“high” in the universalistic attitude, are extracted. The goodness-of-fit of both models
to the data is good. In order to compare the two models, the relative goodness-of-fit of M(2) to M(3) can be assessed by

G^2(2) - G^2(3) = 2.720 - 0.387 = 2.333 \quad (df = 6 - 1 = 5).

From this, M(2) is better than M(3) for explaining the present response behavior.
Stouffer and Toby [20] observed the data in Table 2.1 to order the respondents in
a latent continuum with respect to the relative priority of personal and impersonal
considerations in social obligations. In this sense, it is significant to have obtained
the ordered latent classes in the present latent class analysis. According to posterior
probabilities (2.9), the assessment results of respondents with manifest responses
x = (x1 , x2 , x3 , x4 )T for M(2) and M(3) are demonstrated in Table 2.4. Both results
are almost the same. As shown in this data analysis, we can assess the respondents with their response patterns x = (x_1, x_2, x_3, x_4), not with the simple totals of the responses to the test items, \sum_{i=1}^{4} x_i (Table 2.4).
Table 2.5 illustrates test data on creative ability in machine design [15]. Engi-
neers are cross-classified with respect to their dichotomized scores, that is, above
the subtest mean (1) or below (0), obtained on each of four subtests that measured
creative abilities in machine design [18]. If we can assume a one-dimensional latent
continuum with respect to the creative ability, it may be reasonable to expect to
Table 2.4 Assignment of the manifest responses to the extracted latent classes (Data in Table 2.1)
Response M(2) latent M(3) latent Response M(2) latent M(3) latent
pattern class class pattern class class
0000 1 1 0001 1 2
1000 2 2 1001 2 2
0100 2 2 0101 2 2
1100 2 2 1101 2 2
0010 1 2 0011 2 2
1010 2 2 1011 2 2
0110 2 2 0111 2 2
1110 2 2 1111 2 3
derive ordered latent classes in latent class analysis as in the analysis of Stouffer-
Toby’s data. First, for three latent classes, we have the results of latent class analysis
shown in Table 2.6. The goodness-of-fit of the model to the data set is bad, since
we get G 2 (3) = 4.708(d f = 1, P = 0.030). Similarly, for a latent class model with
two latent classes, the goodness-of-fit of the model to the data set is also bad, that
is, G 2 (2) = 25.203(d f = 6, P = 0.000). From the results, it is not appropriate to
apply the latent class cluster analysis to the data set. For each of the four different
Table 2.6 The estimates of the parameters for a latent class model with three latent classes (data
in Table 2.5)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.198 0.239 0.000 0.808 0.803
2 0.398 0.324 0.360 0.089 0.111
3 0.404 0.810 1.000 0.926 0.810
a G 2 (3) = 4.708(d f = 1, P = 0.030)
subtests, it may be needed to assume a particular skill to obtain scores above the
mean, where the four skills cannot be ordered with respect to difficulty for obtaining
them. Assuming the particular skills for solving the subtests, a confirmatory latent
class analysis of the data is carried out in Chap. 4.
The third data set (Table 2.7) was obtained from responses of noncommissioned officers to items on attitude toward the Army [18]. The respondents were cross-classified with respect to their dichotomized responses, “1” for “favorable” and “0” for “unfavorable” toward the Army, for each of the four different items on general attitude toward the Army. If there exists a latent
continuum with respect to the attitude, we can assume ordered latent classes as in
the first data (Table 2.1). The estimated latent class models with three and two latent
classes are given in Tables 2.8 and 2.9, respectively. As shown in the results of the test of the goodness-of-fit of the models, the degrees of fit of the models are fair. As shown in the tables, the estimated latent classes can be ordered, because, for example, in Table 2.8, the following inequalities hold:

\hat{π}_{1i} < \hat{π}_{2i} < \hat{π}_{3i}, \quad i = 1, 2, 3, 4.

Hence, the extracted latent classes 1–3 can be interpreted as “low”, “medium”, and “high” groups in favorable attitude toward the Army, respectively.
Table 2.8 The estimates of the parameters for a latent class model with three latent classes
(Lazarsfeld-Stouffer’s data in Table 2.7)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.260 0.000 0.296 0.386 0.406
2 0.427 0.374 0.641 0.672 0.768
3 0.313 0.637 0.880 1.000 1.000
a G 2 (3) = 1.787(d f = 1, P = 0.181)
Table 2.9 The estimates of the parameters for a latent class model with two latent classes
(Lazarsfeld-Stouffer’s data in Table 2.7)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.445 0.093 0.386 0.442 0.499
2 0.555 0.572 0.818 0.906 0.944
a G 2 (2) = 8.523(d f = 6, P = 0.202)
Comparing the goodness-of-fit test results of the two models, model M(2) is better than M(3). The data are treated in Chap. 3 again, assuming ordered latent classes.
Comparing Tables 2.2, 2.6, and 2.8, each of Tables 2.2 and 2.8 shows three ordered
latent classes; however, the estimated latent classes in Table 2.6 are not consistently
ordered with respect to the positive response probabilities for the four test items. It can be thought that the universalistic attitude in the role conflict in Stouffer-Toby's data and the favorable attitude toward the Army are one-dimensional, but in machine design, the latent classes may not be assessed one-dimensionally.
For the latent class model (2.1) with binary manifest variables, a_i(ϕ_i) = 1 in (1.23), so from Remark 1.4 we have

\mathrm{KL}(X, ξ) = \sum_{i=1}^{I} \mathrm{KL}(X_i, ξ) = \sum_{i=1}^{I} \mathrm{Cov}(θ_i, X_i).    (2.12)

Since

E(X_i) = \sum_{a=1}^{A} v_a π_{ai} = π_i

and

E(θ_i) = \sum_{a=1}^{A} v_a θ_{ai} = \sum_{a=1}^{A} v_a \log\frac{π_{ai}}{1 - π_{ai}},

we have

\mathrm{KL}(X_i, ξ) = \mathrm{Cov}(θ_i, X_i) = \sum_{a=1}^{A} v_a (π_{ai} - π_i) \log\frac{π_{ai}}{1 - π_{ai}}, \quad i = 1, 2, \ldots, I.    (2.13)
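A short sketch of (2.12) and (2.13), computing the per-item KL information and the resulting ECD from class proportions and positive response probabilities; the numerical values below are illustrative (the estimates from the EM sketch in Sect. 2.2 could be used instead).

import numpy as np

def lca_kl_ecd(v, pi):
    # KL(X_i, xi) from (2.13) and ECD(X, xi) = KL/(KL + 1), as in (1.30), for a binary latent class model.
    pi_bar = v @ pi                                    # marginal positive probabilities pi_i
    theta = np.log(pi / (1 - pi))                      # theta_ai = logit(pi_ai)
    kl_i = np.sum(v[:, None] * (pi - pi_bar) * theta, axis=0)   # (2.13), one value per item
    kl = kl_i.sum()                                    # (2.12)
    return kl_i, kl / (kl + 1.0)

v = np.array([0.3, 0.7])
pi = np.array([[0.9, 0.8, 0.9, 0.7],
               [0.2, 0.3, 0.1, 0.2]])
kl_i, ecd = lca_kl_ecd(v, pi)
print(kl_i.round(3), round(ecd, 3))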
2.5 Comparison of Latent Classes

In latent class model (2.1), the latent classes are interpreted with the latent response probabilities to the items, {P(X_i = x_i|a), i = 1, 2, ..., I}, a = 1, 2, ..., A. When there are two latent classes in a population, we can always say that one latent class is higher than the other in a concept or latent trait. However, when the number of latent classes is greater than two, we cannot easily assess and compare the latent classes without latent concepts or traits, and so it is meaningful to develop methods for
comparing latent classes in latent class cluster analysis. In this section, first, a tech-
nique similar to canonical analysis is employed to make a latent space to compare
the latent classes, that is, to construct a latent space for locating the latent classes.
We discuss the case where the manifest variables X_i are binary. Let

π_{ai} = P(X_i = 1|a), \quad i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A,

and

π_i = P(X_i = 1) = \sum_{a=1}^{A} v_a π_{ai}.

Consider the following score of the manifest variables:

T = \sum_{i=1}^{I} \{c_{i0}(1 - X_i) + c_{i1} X_i\},    (2.14)
where ci0 and ci1 are the weights for responses X i = 0 and X i = 1, respectively. Let
Z i = ci0 (1 − X i ) + ci1 X i , i = 1, 2, . . . , I.
and

\mathrm{Cov}(Z_i, Z_j) = (c_{i0} - c_{i1})(c_{j0} - c_{j1}) \sum_{a=1}^{A} v_a (π_{ai} - π_i)(π_{aj} - π_j), \quad i \neq j.    (2.16)
According to the above formulae, ci0 and ci1 are not identifiable, so we set ci0 = 0,
ci1 = ci , and Z i = ci X i , i = 1, 2, . . . , I . Then, the above formulae are rewritten as
\mathrm{Var}(Z_i) = c_i^2 π_i(1 - π_i) = c_i^2 \sum_{a=1}^{A} v_a π_{ai}(1 - π_{ai}) + c_i^2 \sum_{a=1}^{A} v_a (π_{ai} - π_i)^2, \quad i = 1, 2, \ldots, I,

and

\mathrm{Cov}(Z_i, Z_j) = c_i c_j \sum_{a=1}^{A} v_a (π_{ai} - π_i)(π_{aj} - π_j), \quad i \neq j.
Let

σ_{ijB} = \sum_{a=1}^{A} v_a (π_{ai} - π_i)(π_{aj} - π_j)

and

σ_{ijW} = \sum_{a=1}^{A} v_a π_{ai}(1 - π_{ai}) \ (i = j); \quad σ_{ijW} = 0 \ (i \neq j).
The first term of the right-hand side of the above equation represents the between-class variance of T and the second term the within-class variance. Let

V_B(T) = \sum_{i=1}^{I} \sum_{j=1}^{I} c_i c_j σ_{ijB} = c Σ_B c^T

and

V_W(T) = \sum_{i=1}^{I} \sum_{j=1}^{I} c_i c_j σ_{ijW} = c Σ_W c^T,

where Σ_B = (σ_{ijB}), Σ_W = (σ_{ijW}), and c = (c_1, c_2, \ldots, c_I). To compare the latent classes, the weights c are determined so that

\frac{V_B(T)}{V_W(T)} \to \max_{c} \frac{V_B(T)}{V_W(T)}.    (2.20)

Under the constraint V_W(T) = 1, this is equivalent to

V_B(T) \to \max_{c} V_B(T).    (2.22)
In order to solve the maximization problem, the following Lagrange function is used:

g(c) = V_B(T) - λ(V_W(T) - 1),

where λ is the Lagrange multiplier. Differentiating the above function with respect to vector c = (c_1, c_2, \ldots, c_I), we have
\frac{∂g(c)}{∂c} = 2 Σ_B (c_1, c_2, \ldots, c_I)^T - 2λ Σ_W (c_1, c_2, \ldots, c_I)^T = 0.

If Σ_W is non-singular, it follows that

\left(Σ_W^{-1/2} Σ_B Σ_W^{-1/2} - λE\right) Σ_W^{1/2} (c_1, c_2, \ldots, c_I)^T = 0,
where E is the identity matrix. Let λ_1 ≥ λ_2 ≥ \cdots ≥ λ_K > 0 be the positive eigenvalues of the symmetric matrix Σ_W^{-1/2} Σ_B Σ_W^{-1/2}, let ξ_k be the corresponding eigenvectors, and let c_k = ξ_k^T Σ_W^{-1/2} be the weight vectors. Then, for the score functions

T_k = (X_1, X_2, \ldots, X_I) c_k^T, \quad k = 1, 2, \ldots, K,

we have

V_B(T_k) = c_k Σ_B c_k^T = ξ_k^T Σ_W^{-1/2} Σ_B Σ_W^{-1/2} ξ_k = λ_k, \quad k = 1, 2, \ldots, K.    (2.24)

From (2.24), it is seen that T_1 is the solution of (2.22). Since matrix Σ_W^{-1/2} Σ_B Σ_W^{-1/2} is symmetric, the eigenvectors ξ_k are orthogonal with respect to the inner product, and thus, it follows that

c_k Σ_B c_l^T = ξ_k^T Σ_W^{-1/2} Σ_B Σ_W^{-1/2} ξ_l = 0, \quad k \neq l.
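The construction of the score functions T_k can be carried out numerically from the latent class parameters. The following sketch is an illustration using the estimates of Table 2.2 as input (so its output will not reproduce Table 2.13, which is based on a different model); the weights are normalized so that V_W(T_k) = 1, as in the derivation above.

import numpy as np

def class_scores(v, pi, K=2):
    # Weight vectors c_k solving Sigma_B c^T = lambda Sigma_W c^T for binary items,
    # normalized so that the within-class variance of each score is 1.
    pi_bar = v @ pi
    dev = pi - pi_bar                                   # deviations pi_ai - pi_i, shape (A, I)
    sigma_B = (v[:, None] * dev).T @ dev                # between-class matrix (sigma_ijB)
    sigma_W = np.diag(v @ (pi * (1 - pi)))              # within-class matrix (sigma_ijW), diagonal
    w_inv_half = np.diag(1.0 / np.sqrt(np.diag(sigma_W)))
    eigval, xi = np.linalg.eigh(w_inv_half @ sigma_B @ w_inv_half)
    order = np.argsort(eigval)[::-1][:K]
    c = (w_inv_half @ xi[:, order]).T                   # rows are the weight vectors c_k
    return eigval[order], c

# Latent class estimates from Table 2.2 (Stouffer-Toby data), used here only as illustrative input
v = np.array([0.220, 0.672, 0.108])
pi = np.array([[0.005, 0.032, 0.024, 0.137],
               [0.194, 0.573, 0.593, 0.830],
               [0.715, 1.000, 0.759, 0.943]])
eigenvalues, weights = class_scores(v, pi)
print(eigenvalues.round(3))
print(weights.round(3))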
Table 2.13 Score functions of the latent class model in Table 2.12
Score function Eigenvalue Weight
X1 X2 X3 X4
T1 0.900 2.508 −0.151 0.468 −0.975
T2 0.530 −0.245 −0.741 0.2302 0.615
[Fig. 2.1 The latent classes of the latent class model in Table 2.12 located by the score functions T_1 and T_2]
The above Euclidean distances between latent classes give the tree graph shown in Fig. 2.2.
Second, a method for comparing latent classes based on entropy is considered.
For simplicity of the discussion, we discuss an entropy-based method for comparing
latent classes in cases where the manifest variables X i are binary. Let
Fig. 2.2 The tree graph of latent classes in Table 2.12 based on the Euclidean distance
p = (p_1, p_2, \ldots, p_K) and q = (q_1, q_2, \ldots, q_K) be discrete probability distributions. The KL divergences between them are defined by

D(p||q) = \sum_{k=1}^{K} p_k \log\frac{p_k}{q_k}, \quad D(q||p) = \sum_{k=1}^{K} q_k \log\frac{q_k}{p_k},    (2.26)

where

\sum_{k=1}^{K} p_k = \sum_{k=1}^{K} q_k = 1.

The symmetrized KL distance between the two distributions is

D^*(p||q) = D(p||q) + D(q||p) = \sum_{k=1}^{K} (p_k - q_k)(\log p_k - \log q_k).
From the assumption of local independence, the conditional distribution of X in latent class a is

P(X = x|a) = \prod_{i=1}^{I} P(X_i = x_i|a).    (2.28)

Then, the symmetrized KL distance between latent classes a and b decomposes as

D^*(p(X|a)||p(X|b)) = \sum_{i=1}^{I} D^*(p(X_i|a)||p(X_i|b))
= \sum_{i=1}^{I} \{D(p(X_i|a)||p(X_i|b)) + D(p(X_i|b)||p(X_i|a))\}.    (2.30)
In many data analyses, the manifest variables are binary, for example, {yes, no}, {positive, negative}, {success, failure}, and so on. Let π_{ai} be the positive response probabilities of manifest variables X_i, i = 1, 2, ..., I, in latent classes a = 1, 2, ..., A. Then, (2.30) becomes

D^*(P(X = x|a)||P(X = x|b)) = \sum_{i=1}^{I} \left\{π_{ai} \log\frac{π_{ai}}{π_{bi}} + (1 - π_{ai}) \log\frac{1 - π_{ai}}{1 - π_{bi}} + π_{bi} \log\frac{π_{bi}}{π_{ai}} + (1 - π_{bi}) \log\frac{1 - π_{bi}}{1 - π_{ai}}\right\}.    (2.31)
Applying the above results to Table 2.12, Table 2.15 illustrates the KL distances
between latent classes for manifest variables X i . From this table, we have
D^*(P(X = x|1)||P(X = x|2)) = \sum_{i=1}^{4} D^*(p(X_i|1)||p(X_i|2)) = 3.94,
D^*(P(X = x|2)||P(X = x|3)) = 2.34,
D^*(P(X = x|1)||P(X = x|3)) = 5.04.    (2.32)
Based on the above measures, cluster analysis is used to compare the latent classes. Latent classes 2 and 3 are first combined, and the distance between {class 2, class 3} and class 1 is calculated by

\min\{D^*(P(X = x|1)||P(X = x|2)), D^*(P(X = x|1)||P(X = x|3))\} = D^*(P(X = x|1)||P(X = x|2)) = 3.94.
From this, we have a tree graph shown in Fig. 2.3, and the result is similar to that
in Fig. 2.2. As demonstrated above, the entropy-based method for comparing latent
classes can be easily employed in data analyses.
Remark 2.3 From (2.31), we have

D^*(P(X = x|a)||P(X = x|b)) = \sum_{i=1}^{I} (π_{ai} - π_{bi})\left(\log\frac{π_{ai}}{1 - π_{ai}} - \log\frac{π_{bi}}{1 - π_{bi}}\right).
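Using the compact form in Remark 2.3, the symmetrized KL distances between latent classes are straightforward to compute; the positive response probabilities below are illustrative values (so the results do not correspond to (2.32), which refers to Table 2.12).

import numpy as np

def sym_kl(pi_a, pi_b):
    # Symmetrized KL distance (2.31) between latent classes a and b, via the form in Remark 2.3.
    return np.sum((pi_a - pi_b) * (np.log(pi_a / (1 - pi_a)) - np.log(pi_b / (1 - pi_b))))

# Illustrative positive response probabilities for three classes and four items
pi = np.array([[0.10, 0.15, 0.20, 0.25],
               [0.40, 0.55, 0.50, 0.60],
               [0.80, 0.85, 0.75, 0.90]])

dist = {(a + 1, b + 1): sym_kl(pi[a], pi[b])
        for a in range(3) for b in range(a + 1, 3)}
print({k: round(val, 3) for k, val in dist.items()})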
Fig. 2.3 The tree graph of latent classes in Table 2.12 based on KL information (2.32)
The above quantity can be interpreted similarly to the ECD (1.30), i.e., as the variation between the two distributions measured in entropy.
The latent profile model has been introduced in Sect. 1.4. In this section, an ML estimation procedure via the EM algorithm is constructed. Let X_i, i = 1, 2, ..., I, be manifest continuous variables; let X_{ai} be the latent variables in latent class a = 1, 2, ..., A, which are distributed according to normal distributions N(μ_{ai}, ψ_i^2); and let Z_a be the latent binary variables that take 1 for an individual in latent class a and 0 otherwise. Then, the model is expressed as

X_i = \sum_{a=1}^{A} Z_a X_{ai}, \quad i = 1, 2, \ldots, I.    (2.33)
where

x_{ij} = \sum_{a=1}^{A} Z_{aj} X_{aij}, \quad i = 1, 2, \ldots, I; \ j = 1, 2, \ldots, n.    (2.35)
Let φ = ((v_a), (μ_{ai}), (ψ_i^2)) be the parameter vector, and let ^{s}φ = ((^{s}v_a), (^{s}μ_{ai}), (^{s}ψ_i^2)) be the estimated parameters at the sth step of the EM algorithm. In order to construct the EM algorithm, the joint density function f(x|a) (1.16) is expressed by

f(x|(μ_{ai}), (ψ_i^2)) ≡ f(x|a) = \prod_{i=1}^{I} \frac{1}{\sqrt{2π ψ_i^2}} \exp\left(-\frac{(x_i - μ_{ai})^2}{2ψ_i^2}\right).    (2.36)
From (1.16), the log likelihood function of φ based on the complete data D is given by

\log l(φ|D) = \sum_{a=1}^{A} \sum_{j=1}^{n} Z_{aj} \log v_a + n \sum_{i=1}^{I} \log\frac{1}{\sqrt{2π ψ_i^2}} - \sum_{a=1}^{A} \sum_{j=1}^{n} \sum_{i=1}^{I} \frac{Z_{aj}(x_{ij} - μ_{ai})^2}{2ψ_i^2}
= \sum_{a=1}^{A} \sum_{j=1}^{n} Z_{aj} \log v_a - \frac{n}{2} \sum_{i=1}^{I} \log ψ_i^2 - \sum_{a=1}^{A} \sum_{j=1}^{n} \sum_{i=1}^{I} \frac{Z_{aj}(x_{ij} - μ_{ai})^2}{2ψ_i^2} - \frac{nI}{2} \log 2π.    (2.37)
Let x_j = (x_{1j}, x_{2j}, \ldots, x_{Ij})^T, j = 1, 2, \ldots, n, be the observed vectors for individuals j. Then, the EM algorithm is given as follows:

(i) E-step

For estimate ^{s}φ at the sth step, compute the conditional expectation of the complete-data log likelihood \log l(φ|D) given the incomplete data X = x and parameter ^{s}φ:

Q(φ|^{s}φ) = E[\log l(φ|D)|x, {^{s}φ}] = \sum_{a=1}^{A} \sum_{j=1}^{n} E[Z_{aj}|x_j, {^{s}φ}] \log v_a
- \frac{n}{2} \sum_{i=1}^{I} \log ψ_i^2 - \sum_{a=1}^{A} \sum_{j=1}^{n} \sum_{i=1}^{I} \frac{E[Z_{aj}|x_j, {^{s}φ}](x_{ij} - μ_{ai})^2}{2ψ_i^2} - \frac{nI}{2} \log 2π,    (2.38)

where

E[Z_{aj}|x_j, {^{s}φ}] = \frac{{^{s}v_a}\, f(x_j|({^{s}μ_{ai}}), ({^{s}ψ_i^2}))}{\sum_{b=1}^{A} {^{s}v_b}\, f(x_j|({^{s}μ_{bi}}), ({^{s}ψ_i^2}))}.    (2.39)
(ii) M-step
Under the constraint \sum_{a=1}^{A} v_a = 1, the following Lagrange function is used:

L(φ|^{s}φ) = Q(φ|^{s}φ) - λ \sum_{a=1}^{A} v_a.    (2.40)

Differentiating with respect to v_a, we have

\frac{∂}{∂v_a} L(φ|^{s}φ) = \frac{1}{v_a} \sum_{j=1}^{n} E[Z_{aj}|x_j, {^{s}φ}] - λ = 0, \quad a = 1, 2, \ldots, A,

and summing over the latent classes gives

λ = \sum_{a=1}^{A} \sum_{j=1}^{n} E[Z_{aj}|x_j, {^{s}φ}] = n.

Hence,

^{s+1}v_a = \frac{1}{n} \sum_{j=1}^{n} E[Z_{aj}|x_j, {^{s}φ}].    (2.41)

Similarly, the partial differentiation of the Lagrange function with respect to ψ_i^2 gives

\frac{∂}{∂ψ_i^2} L(φ|^{s}φ) = -\frac{n}{2ψ_i^2} + \sum_{a=1}^{A} \sum_{j=1}^{n} \frac{E[Z_{aj}|x_j, {^{s}φ}](x_{ij} - {^{s+1}μ_{ai}})^2}{2ψ_i^4} = 0, \quad i = 1, 2, \ldots, I.
By using the above algorithm, the ML estimates of parameters in the latent profile
model can be obtained. In some situations, we may relax the local independence of
manifest variables, for example, by assuming correlations between some variables; however, in view of the above process for constructing the EM algorithm, the modification is easy to make in a similar manner. In order to demonstrate the above algorithm, an artificial data set and the estimated parameters are given in Tables 2.16 and 2.17, respectively. The artificial data can be produced as a mixture of N(μ_1, Ψ) and N(μ_2, Ψ), where
Ψ = \mathrm{diag}(ψ_1^2, ψ_2^2, \ldots, ψ_{10}^2).
Remark 2.4 If, in the latent profile model, the error variances of the manifest variables X_i in latent classes a are different, that is, \mathrm{Var}(X_i|a) = ψ_{ai}^2, i = 1, 2, \ldots, I, then the estimates in the EM algorithm are modified as follows:

^{s+1}ψ_{ai}^2 = \frac{\sum_{j=1}^{n} E[Z_{aj}|x_j, {^{s}φ}](x_{ij} - {^{s+1}μ_{ai}})^2}{\sum_{j=1}^{n} E[Z_{aj}|x_j, {^{s}φ}]}, \quad i = 1, 2, \ldots, I; \ a = 1, 2, \ldots, A.
In order to apply the ECD to the latent profile model, the model is expressed in a GLM form. From the assumption of local independence, in latent class a we have

f(x|a) = \prod_{i=1}^{I} f_i(x_i|a) = \prod_{i=1}^{I} \frac{1}{\sqrt{2π ψ_i^2}} \exp\left(-\frac{(x_i - μ_i)^2}{2ψ_i^2}\right),

where f_i(x_i|a) are the conditional density functions in latent class a and μ_i denotes the mean of X_i in that class. As in the factor analysis model, we have

f_i(x_i|a) = \frac{1}{\sqrt{2π ψ_i^2}} \exp\left(\frac{x_i μ_i - \frac{1}{2} μ_i^2}{ψ_i^2} - \frac{x_i^2}{2ψ_i^2}\right),
Table 2.17 The estimated parameters in a latent profile model with two latent classes
X1 X2 X3 X4 X5 X6 X7 X8 X9 X 10
μ1i 31.26 27.21 30.59 28.08 31.14 32.08 33.74 29.25 30.25 22.61 v1 0.140
μ2i 38.11 39.90 40.42 40.74 39.53 39.91 39.03 40.15 38.69 42.04 v2 0.860
ψi2 48.11 36.51 8.81 64.74 26.79 34.01 65.52 11.77 46.33 81.96 – –
so we can set

θ_i = μ_i, \quad a_i(ϕ_i) = ψ_i^2, \quad b_i(θ_i) = \frac{1}{2} μ_i^2, \quad c_i(x_i, ϕ_i) = -\frac{x_i^2}{2ψ_i^2} - \frac{1}{2}\log(2π ψ_i^2), \quad i = 1, 2, \ldots, I.

In terms of the latent class indicator variables Z_a, the canonical parameters are

θ_i = μ_i = \sum_{a=1}^{A} μ_{ai} Z_a, \quad i = 1, 2, \ldots, I.
From (1.29), the KL information between X and Z = (Z_1, Z_2, \ldots, Z_A)^T is

\mathrm{KL}(X, Z) = \mathrm{tr}\, Ψ^{-1} \mathrm{Cov}(θ, X) = \sum_{i=1}^{I} \frac{1}{ψ_i^2} \mathrm{Cov}(θ_i, X_i) = \sum_{i=1}^{I} \frac{1}{ψ_i^2} \mathrm{Cov}\left(\sum_{a=1}^{A} μ_{ai} Z_a, X_i\right).    (2.44)
Since

E(X_i) = \sum_{a=1}^{A} v_a μ_{ai},

we have

\mathrm{Cov}\left(\sum_{a=1}^{A} μ_{ai} Z_a, X_i\right) = \sum_{a=1}^{A} v_a μ_{ai}^2 - μ_i^2,

where

μ_i = \sum_{a=1}^{A} v_a μ_{ai}, \quad i = 1, 2, \ldots, I.
Hence, the ECD is given by

\mathrm{ECD}(X, Z) = \frac{\mathrm{KL}(X, Z)}{\mathrm{KL}(X, Z) + 1}.    (2.45)

For each manifest variable X_i, we similarly have

\mathrm{ECD}(X_i, Z) = \frac{\mathrm{KL}(X_i, Z)}{\mathrm{KL}(X_i, Z) + 1},    (2.46)

where

\mathrm{KL}(X_i, Z) = \frac{1}{ψ_i^2} \mathrm{Cov}\left(\sum_{a=1}^{A} μ_{ai} Z_a, X_i\right).
This information is the amount of information on X that the latent variables carry. As in (2.9), the best way to classify observed data X = x into the latent classes is based on the maximum posterior probability of Z, that is, if

P(a_0|(x_1, x_2, \ldots, x_I)) = \max_a \frac{v_a \prod_{i=1}^{I} f_i(x_i|a)}{\sum_{b=1}^{A} v_b \prod_{i=1}^{I} f_i(x_i|b)},    (2.47)

the individual with observation x is assigned to latent class a_0.
Table 2.18 Assessment of the latent profile model with two latent classes in Table 2.17
X1 X2 X3 X4 X5 X6 X7 X8 X9 X 10 X
KL 0.12 0.54 1.36 0.30 0.33 0.23 0.06 1.24 0.19 0.56 4.92
ECD 0.11 0.35 0.58 0.23 0.25 0.18 0.05 0.55 0.16 0.36 0.83
2.7 Discussion
In this chapter, a general latent class analysis has first been discussed, and the EM algorithm for the ML estimation of the latent class model has been constructed. Latent class analysis has been demonstrated for three data sets. Concerning the χ²-test of the goodness-of-fit of the latent class model, the model fits the Stouffer-Toby and Lazarsfeld-Stouffer data sets; however, a good fit to McHugh's data set cannot be obtained. Since the estimated latent class models for the Stouffer-Toby and Lazarsfeld-Stouffer data sets have ordered latent classes, as shown in Tables 2.2, 2.3, 2.8, and 2.9, it is meaningful to discuss latent class analysis assuming ordered latent classes, for example, with the latent distance model. The basic latent class model treats the latent classes in parallel, that is, without any assumption on the latent response probabilities, and the analysis is then called an exploratory latent class analysis or latent class cluster analysis [14, 21]. The number of latent classes in the latent class model is restricted by inequality (1.5), so in order to handle a larger number of ordered latent classes, more parsimonious models are needed as another approach. In order to assess the model performance, the explanatory power or goodness-of-fit of the model can be measured with the ECD, as demonstrated in Sect. 2.4. For the interpretation of latent classes, a method for locating the latent classes in a Euclidean space has been given and illustrated. An entropy-based method to compare the latent classes has also been presented; the method measures, in a sense, the KL distances between the latent classes, and the relationships among the latent classes are illustrated with cluster analysis. In Sect. 2.6, the latent profile model has been considered, the ML estimation procedure via the EM algorithm has been constructed, and a numerical illustration of latent profile analysis has been given. Since computing power has greatly increased, the ML estimation procedures given in the present chapter can be carried out even in EXCEL worksheets, and the author recommends that readers perform the ML estimation of the latent class models for themselves. The present chapter has treated the basic latent class model, that is, an exploratory approach to latent class analysis. Further studies can develop latent structure analysis in several directions, for example, latent class models with ordered latent classes, extensions of the latent distance model (Lazarsfeld and Henry, 1968), and latent class models with explanatory variables [2]. Latent class analysis has also been applied in medical research [16, 17], besides psychological and social science research [22]. To advance confirmatory latent class approaches, it is important to extend the research areas in which the latent class model is applied, and through such applications new latent structure models will be constructed, making latent structure analysis more effective and useful.
Chapter 3
Latent Class Analysis with Ordered Latent Classes
3.1 Introduction
In latent class analysis with two latent classes, we can assume that one class is higher than the other in some sense; however, for more than two latent classes, we cannot necessarily order them in a one-dimensional sense. In such cases, to compare and interpret the latent classes, the dimension of the latent space in which to locate them is the number of latent classes minus one. In Chap. 2, a method for locating latent classes in a latent space has been considered; the distances between latent classes are measured with the Euclidean distance in the latent space, and then cluster analysis is applied to compare the latent classes. Moreover, the Kullback–Leibler information (divergence) is also applied to measure the distances between the latent classes. Let $\pi_{ai}$, $i = 1, 2, \ldots, I$; $a = 1, 2, \ldots, A$ be the positive response probabilities to binary item $X_i$ in latent class $a$. If
In the Guttman scaling, test items are ordered by response difficulty, and the purpose of the scaling is to evaluate the subjects under study on a one-dimensional scale (trait or ability). Let $X_i$, $i = 1, 2, \ldots, I$ be the responses to test items $i$, and let us set $X_i = 1$ for positive responses and $X_i = 0$ for negative ones. In the Guttman scaling, if $X_i = 1$, then $X_{i-1} = 1$, $i = 2, 3, \ldots, I$, and thus, in a strict sense, only the $I+1$ response patterns $(0, 0, \ldots, 0), (1, 0, \ldots, 0), \ldots, (1, 1, \ldots, 1)$ would be observed; in real observations, however, other response patterns also occur, owing to two kinds of response errors: the intrusion error and the omission error. The Guttman scale items can be regarded as corresponding to skills with which the subjects can solve or respond successfully to the items; for example, suppose that the ability of calculation in arithmetic is measured by the following three items:
then the items are ordered in difficulty as above, and the response patterns to be observed would be $(0, 0, 0)$, $(1, 0, 0)$, $(1, 1, 0)$, $(1, 1, 1)$. However, it is sensible to take response errors into account. Let $S_i$ be the skills for manifest responses $X_i$, $i = 1, 2, \ldots, I$; let $S_i = 1$ denote the state of skill acquisition for solving item $i$ and $S_i = 0$ that of non-acquisition for item $i$, and let us set
$$
P(X_i=1\mid S_i=s_i) = \begin{cases}\pi_{Li} & (s_i = 0)\\ \pi_{Hi} & (s_i = 1)\end{cases},\quad i = 1, 2, \ldots, I. \tag{3.2}
$$
Then, the intrusion error probabilities are $\pi_{Li}$, the omission error probabilities are $1-\pi_{Hi}$, and the following inequalities should hold:
The latent classes and the positive response probabilities $\pi_{Li}$ and $\pi_{Hi}$ are illustrated in Table 3.1, where the latent classes are denoted by the numbers of skill acquisitions, $\sum_{i=1}^{I}s_i$. In this sense, the latent classes are ordered in a hypothesized ability or trait, and it is meaningful to assign an individual with response $\mathbf{x} = (x_1, x_2, \ldots, x_I)^T$ to one of the latent classes, that is, to assess the individual's ability. In the latent distance model (3.2), the number of parameters is $3I$.
Remark 3.1 Term “skill” is used in the above explanation. Since it can be thought
that skills represent thresholds in a continuous trait or ability to respond for test
binary items, term “skill” is employed for convenience’s sake in this book.
Overviewing the latent distance models historically, they are restricted versions of (2.2), designed with the ease of parameter estimation in mind. Proctor [14] proposed a model with
$$
P(X_i=1\mid S_i=s_i) = \begin{cases}\pi_L & (s_i = 0)\\ 1-\pi_L & (s_i = 1)\end{cases},\quad i = 1, 2, \ldots, I. \tag{3.4}
$$
The intrusion and omission error probabilities are both equal to $\pi_L$, constant across all items. Dayton and Macready [2] used the model with
$$
P(X_i=1\mid S_i=s_i) = \begin{cases}\pi_L & (s_i = 0)\\ \pi_H & (s_i = 1)\end{cases},\quad i = 1, 2, \ldots, I, \tag{3.5}
$$
and the following improved version of the above model was also proposed by Dayton and Macready [3]:
In the above model, the intrusion and omission error probabilities are both equal to $\pi_{Li}$ for item $i$, $i = 1, 2, \ldots, I$. The present model (3.2) with (3.3) is a general version of the above models.
In the present chapter, the ML estimation of model (3.2) with (3.3) is considered.
For this model, the following reparameterization is employed [6]:
$$
P(X_i=1\mid S_i=s_i) = \begin{cases}\dfrac{\exp(\alpha_i)}{1+\exp(\alpha_i)}\ (=\pi_{Li}) & (s_i = 0)\\[3mm]\dfrac{\exp(\alpha_i+\exp(\beta_i))}{1+\exp(\alpha_i+\exp(\beta_i))}\ (=\pi_{Hi}) & (s_i = 1)\end{cases},\quad i = 1, 2, \ldots, I. \tag{3.7}
$$
In this expression, the constraints (3.3) are satisfied. The above model expression can be simplified as follows:
$$
P(X_i=1\mid S_i=s_i) = \frac{\exp(\alpha_i+s_i\exp(\beta_i))}{1+\exp(\alpha_i+s_i\exp(\beta_i))},\quad i = 1, 2, \ldots, I. \tag{3.8}
$$
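A quick numerical check of the reparameterization (3.7)–(3.8) is given below: for any real $\alpha_i$ and $\beta_i$, the implied $\pi_{Li}$ is smaller than $\pi_{Hi}$, so the ordering constraints in (3.3) hold automatically. The function is a minimal sketch whose names mirror the notation of (3.8).

```python
import numpy as np

def response_prob(alpha, beta, s):
    """P(X_i = 1 | S_i = s) under the reparameterized
    latent distance model (3.8)."""
    eta = alpha + s * np.exp(beta)
    return 1.0 / (1.0 + np.exp(-eta))

# For any alpha, beta the constraint pi_Li < pi_Hi holds, because
# exp(beta) > 0 shifts the logit upward when s = 1.
alpha, beta = -0.8, 0.3
pi_L = response_prob(alpha, beta, 0)
pi_H = response_prob(alpha, beta, 1)
assert pi_L < pi_H
```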
Let $\mathbf{S} = (S_1, S_2, \ldots, S_I)^T$ be a latent response (skill acquisition) vector, and let $\mathbf{X} = (X_1, X_2, \ldots, X_I)^T$ be a manifest response vector. Then, the latent classes corresponding to latent responses $\mathbf{s} = (s_1, s_2, \ldots, s_I)^T$ can be described with the score $k = \sum_{i=1}^{I}s_i$. From (2.1), we have
$$
P(\mathbf{X}=\mathbf{x}\mid\mathbf{S}=\mathbf{s})
= \prod_{i=1}^{I}\left(\frac{\exp(\alpha_i+s_i\exp(\beta_i))}{1+\exp(\alpha_i+s_i\exp(\beta_i))}\right)^{x_i}\left(\frac{1}{1+\exp(\alpha_i+s_i\exp(\beta_i))}\right)^{1-x_i}
= \prod_{i=1}^{I}\frac{\exp\{x_i(\alpha_i+s_i\exp(\beta_i))\}}{1+\exp(\alpha_i+s_i\exp(\beta_i))}
$$
and
$$
P(\mathbf{X}=\mathbf{x}) = \sum_{k=0}^{I}v_k P(\mathbf{X}=\mathbf{x}\mid k) = \sum_{k=0}^{I}v_k\prod_{i=1}^{I}\frac{\exp\{x_i(\alpha_i+s_i\exp(\beta_i))\}}{1+\exp(\alpha_i+s_i\exp(\beta_i))}.
$$
In order to estimate the parameters φ = ((vk ), (αi ), (βi ))T , the following EM
algorithm is used.
EM algorithm I
(i) E-step
where
$$
{}^{s}P(X_i=x_i\mid k) = \frac{\exp\{x_i({}^{s}\alpha_i+s_i\exp({}^{s}\beta_i))\}}{1+\exp({}^{s}\alpha_i+s_i\exp({}^{s}\beta_i))},\quad x_i = 0, 1.
$$
(ii) M-step
By using the complete data (3.9), the loglikelihood function based on the complete data ${}^{s+1}n(\mathbf{x},k)$ is given by
$$
Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi) = l\big(\boldsymbol\phi\mid{}^{s+1}n(\mathbf{x},k)\big)
= \sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\log\left(v_k\prod_{i=1}^{I}\frac{\exp\{x_i(\alpha_i+s_i\exp(\beta_i))\}}{1+\exp(\alpha_i+s_i\exp(\beta_i))}\right)
$$
$$
= \sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\log v_k
+ \sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\sum_{i=1}^{I}\big\{x_i(\alpha_i+s_i\exp(\beta_i)) - \log\big(1+\exp(\alpha_i+s_i\exp(\beta_i))\big)\big\}. \tag{3.10}
$$
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\alpha_i} = \sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\big(x_i - P(X_i=1\mid S_i=s_i)\big),\quad i = 1, 2, \ldots, I;
$$
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i} = \sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\big(x_i - P(X_i=1\mid S_i=s_i)\big)s_i\exp(\beta_i),\quad i = 1, 2, \ldots, I.
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\alpha_i^2} = -\sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)P(X_i=1\mid S_i=s_i)\big(1-P(X_i=1\mid S_i=s_i)\big),\quad i = 1, 2, \ldots, I;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\alpha_i\partial\beta_i} = -\sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)P(X_i=1\mid S_i=s_i)\big(1-P(X_i=1\mid S_i=s_i)\big)s_i\exp(\beta_i),\quad i = 1, 2, \ldots, I;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i^2} = \sum_{k=0}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\big\{x_i - P(X_i=1\mid S_i=s_i) - P(X_i=1\mid S_i=s_i)\big(1-P(X_i=1\mid S_i=s_i)\big)s_i\exp(\beta_i)\big\}s_i\exp(\beta_i),\quad i = 1, 2, \ldots, I;
$$
$$
H = \begin{pmatrix}\dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\alpha_i\partial\alpha_j} & \dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\alpha_i\partial\beta_i}\\[2mm]\dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i\partial\alpha_i} & \dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i\partial\beta_j}\end{pmatrix}. \tag{3.12}
$$
Let $\boldsymbol\phi_{(\alpha,\beta)} = ((\alpha_i),(\beta_i))$ and let the $t$ th iterative value of $\boldsymbol\phi_{(\alpha,\beta)}$ be $\boldsymbol\phi_{(\alpha,\beta)t} = ((\alpha_{it}),(\beta_{it}))$, where $\boldsymbol\phi_{(\alpha,\beta)1} = (({}^{s}\alpha_i),({}^{s}\beta_i))$. Then, $\boldsymbol\phi_{(\alpha,\beta)t+1}$ is obtained as follows:
$$
\boldsymbol\phi_{(\alpha,\beta)t+1} = \boldsymbol\phi_{(\alpha,\beta)t} - H_t^{-1}\mathbf{g}_t,\quad t = 1, 2, \ldots, \tag{3.13}
$$
where $H_t$ and $\mathbf{g}_t$ are the values of the Hessian matrix (3.12) and the gradient vector (3.11) at $\boldsymbol\phi = \boldsymbol\phi_t = \big(({}^{s+1}v_k),\boldsymbol\phi_{(\alpha,\beta)t}\big)$, respectively. From this algorithm, we can get $\lim_{t\to\infty}\boldsymbol\phi_{(\alpha,\beta)t} = \big(({}^{s+1}\alpha_i),({}^{s+1}\beta_i)\big)$.
Remark 3.2 The Newton–Raphson method for obtaining the estimates ${}^{s+1}\alpha_i$ and ${}^{s+1}\beta_i$ in the M-step yields quick convergence of the sequence $\boldsymbol\phi_{(\alpha,\beta)t}$ within several iterations.
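The following sketch illustrates the Newton–Raphson update (3.13) for a single item, assuming the E-step has already been reduced to posterior-weighted totals N[s] (weighted number of observations in latent classes with skill state s) and X1[s] (weighted number of positive responses among them); these variable names are assumptions made for illustration only.

```python
import numpy as np

def newton_update_item(alpha, beta, N, X1, n_iter=10):
    """Newton-Raphson M-step update (3.13) for one item of the
    latent distance model.  N[s], X1[s] are the posterior-weighted
    totals for skill states s in {0, 1} computed in the E-step."""
    for _ in range(n_iter):
        eb = np.exp(beta)
        p = 1.0 / (1.0 + np.exp(-(alpha + np.array([0.0, eb]))))  # p0, p1
        w = N * p * (1 - p)
        g_a = (X1 - N * p).sum()                        # dQ/d alpha
        g_b = (X1[1] - N[1] * p[1]) * eb                # dQ/d beta
        h_aa = -w.sum()
        h_ab = -w[1] * eb
        h_bb = ((X1[1] - N[1] * p[1]) - w[1] * eb) * eb
        H = np.array([[h_aa, h_ab], [h_ab, h_bb]])
        g = np.array([g_a, g_b])
        step = np.linalg.solve(H, g)
        alpha, beta = alpha - step[0], beta - step[1]
    return alpha, beta
```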
Remark 3.3 Without the constraints in (3.3), the latent distance model is a latent class model with the following equality constraints:
$$
P(X_i=1\mid 0) = P(X_i=1\mid 1) = \cdots = P(X_i=1\mid i-1)\ (=\pi_{Li}),
$$
$$
P(X_i=1\mid i) = P(X_i=1\mid i+1) = \cdots = P(X_i=1\mid I)\ (=\pi_{Hi}),\quad i = 1, 2, \ldots, I.
$$
Then, the EM algorithm can be applied for estimating the parameters. Let $\boldsymbol\phi = ((v_k),(\pi_{Li}),(\pi_{Hi}))^T$ be the parameters to be estimated and let ${}^{s}\boldsymbol\phi = (({}^{s}v_k),({}^{s}\pi_{Li}),({}^{s}\pi_{Hi}))^T$ be the estimates of the parameters at the $s$ th iteration. Then, the EM algorithm is given as follows:
EM algorithm II
(i) E-step
$$
{}^{s+1}n(\mathbf{x},k) = n(\mathbf{x})\,\frac{{}^{s}v_k\prod_{i=1}^{I}{}^{s}P(X_i=x_i\mid k)}{\sum_{m=0}^{I}{}^{s}v_m\prod_{i=1}^{I}{}^{s}P(X_i=x_i\mid m)},\quad k = 0, 1, 2, \ldots, I, \tag{3.14}
$$
(ii) M-step
$$
{}^{s+1}v_k = \frac{\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)}{\lambda} = \frac{\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)}{N},\quad k = 0, 1, 2, \ldots, I;
$$
$$
{}^{s+1}\hat{\pi}_{Hi} = \frac{\sum_{k=i}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\,x_i}{\sum_{k=i}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)},\quad i = 1, 2, \ldots, I; \tag{3.15}
$$
$$
{}^{s+1}\hat{\pi}_{Li} = \frac{\sum_{k=0}^{i-1}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)\,x_i}{\sum_{k=0}^{i-1}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},k)},\quad i = 1, 2, \ldots, I. \tag{3.16}
$$
The algorithm is a proportional fitting one; however, the results do not necessarily
guarantee the inequality constraints in (3.3).
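A minimal sketch of EM algorithm II (the proportional fitting version in (3.14)–(3.16)) is shown below; the function and variable names are illustrative and, as noted above, nothing in the updates enforces the inequality constraints (3.3).

```python
import numpy as np

def em_algorithm_2(patterns, counts, I, n_iter=200):
    """EM algorithm II for the latent distance model without the
    reparameterization (3.7): classes k = 0..I, with pi_Li for k < i
    and pi_Hi for k >= i.  patterns: (R, I) 0/1 array of distinct
    response patterns, counts: (R,) observed frequencies n(x)."""
    N = counts.sum()
    K = I + 1
    v = np.full(K, 1.0 / K)
    pi_L = np.full(I, 0.2)          # below-threshold response probabilities
    pi_H = np.full(I, 0.8)          # above-threshold response probabilities
    skill = (np.arange(K)[:, None] >= np.arange(1, I + 1)[None, :])  # (K, I)

    for _ in range(n_iter):
        # E-step (3.14): expected complete-data counts n(x, k)
        p = np.where(skill, pi_H, pi_L)                          # (K, I)
        like = np.prod(np.where(patterns[:, None, :] == 1, p, 1 - p), axis=2)
        joint = like * v                                         # (R, K)
        post = joint / joint.sum(axis=1, keepdims=True)
        n_xk = counts[:, None] * post                            # (R, K)

        # M-step (3.15)-(3.16): proportional fitting updates
        v = n_xk.sum(axis=0) / N
        for i in range(I):
            hi, lo = skill[:, i], ~skill[:, i]
            pi_H[i] = (n_xk[:, hi].sum(axis=1) @ patterns[:, i]) / n_xk[:, hi].sum()
            pi_L[i] = (n_xk[:, lo].sum(axis=1) @ patterns[:, i]) / n_xk[:, lo].sum()
    return v, pi_L, pi_H
```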
The data in Table 2.1 are analyzed by using the latent distance model. The data are from respondents to questionnaire items on role conflict [16], and the numbers of positive responses to items $X_i$, $i = 1, 2, 3, 4$ are 171, 108, 111, and 67, respectively. It seems valid that item 1 is the easiest and item 4 the most difficult for obtaining positive responses, whereas items 2 and 3 are intermediate. The estimated class proportions and positive response probabilities are given in Table 3.2, and the goodness-of-fit of the model to the data is very good. The assessment of the responses in the five latent classes is illustrated in Table 3.3, and the results are compared with the response scores $\sum_{i=1}^{4}x_i$. Assuming a latent continuum in the population, the estimated item response probabilities in the latent distance model and the five latent classes are illustrated in Fig. 3.1. As demonstrated in this example, it is significant to grade the respondents with their response patterns instead of simple scores; for example, the response patterns $(1, 1, 1, 0)$, $(1, 1, 0, 1)$, $(1, 0, 1, 1)$, and $(0, 1, 1, 1)$ all have manifest score 3, yet they are assigned to latent classes 3, 4, 1, and 0, respectively.
Table 3.2 Results of latent distance analysis of the data in Table 2.1
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
0 0.296 0.324 0.253 0.364 0.136
1 0.344 0.988 0.253 0.364 0.136
2 0.103 0.988 0.940 0.364 0.136
3 0.049 0.988 0.940 0.948 0.136
4 0.208 0.988 0.940 0.948 0.973
G2 = 0.921(d f = 3, P = 0.845)
Table 3.3 Assignment of the manifest responses to the extracted latent classes (latent distance
analysis of data set in Table 2.1)
Response pattern Scorea Latent class Response pattern Score Latent class
0000 0 0 0001 1 0
1000 1 1 1001 2 1
0100 1 0 0101 2 0
1100 2 2 1101 3 4
0010 1 0 0011 2 0
1010 2 1 1011 3 1
0110 2 0 0111 3 0
1110 3 3 1111 4 4
a Scores imply the sums of the positive responses
Fig. 3.1 Graph of the estimated latent distance model for data set in Table 2.1
In order to assess the goodness-of-fit of the latent distance model, that is, its explanatory power, the entropy approach with (2.12) and (2.13) is applied to the above analysis. Comparing Tables 2.10, 2.11, and 3.4, the goodness-of-fit of the latent distance model is better than that of the other models. From Table 3.4, 70% of the entropy of the response variable vector $\mathbf{X} = (X_1, X_2, X_3, X_4)^T$ is explained by the five ordered latent classes in the latent distance model, and item 1 ($X_1$) is more strongly associated with the latent variable than the other manifest variables.
The data in Table 2.5 (McHugh's data) and those in Table 2.7 (Lazarsfeld-Stouffer's data) are also analyzed with the latent distance model. The first data set was obtained from four test items on creative ability in machine design [12], and the second came from noncommissioned officers who were cross-classified with respect to their dichotomous responses, "favorable" and "unfavorable" toward the Army, for each of four different items on general attitude toward the Army [13]. Before analyzing the data sets, the marginal frequencies of positive responses to the items are given in Table 3.5. Considering the marginal positive response frequencies, in McHugh's data set it is natural to think there is no order of difficulty with respect to the item responses $X_i$, whereas in Lazarsfeld-Stouffer's data set (Table 2.7) it may be appropriate to assume a difficulty order in the item responses, i.e., the skill acquisition order
$$
S_1 \prec S_2 \prec S_3 \prec S_4.
$$
Table 3.4 Assessment of the latent distance model for the Stouffer-Toby data
Manifest variable X1 X2 X3 X4 Total
KL 0.718 0.606 0.386 0.625 2.335
ECD 0.418 0.377 0.278 0.385 0.700
Table 3.5 Marginal positive response frequencies of McHugh's and Lazarsfeld-Stouffer's data
Data set Marginal positive response frequency
X1 X2 X3 X4
McHugh’s data 65 75 78 73
Lazarsfeld-Stouffer’s data 359 626 700 736
The results of the latent distance analysis of Lazarsfeld-Stouffer's data set are given in Table 3.6, and the estimated model is illustrated in Fig. 3.2. The lack of fit of the model to the data set is not statistically significant at the 0.05 level; however, comparing the results with those in Table 2.8 or Table 2.9, the latter models explain the data set better. Figure 3.2 demonstrates the estimated latent distance model, and the entropy-based assessment of the latent distance model is illustrated in Table 3.7.
The Guttman scaling is an efficient method to grade subjects with their response
patterns; however, in the practical observation or experiments, we have to take their
Table 3.6 Results of latent distance analysis of the data set in Table 2.7
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
0 0.388 0.027 0.366 0.445 0.498
1 0.030 0.569 0.366 0.445 0.498
2 0.038 0.569 0.813 0.445 0.498
3 0.031 0.569 0.813 0.914 0.498
4 0.513 0.569 0.813 0.914 0.981
G2 = 6.298(d f = 3, P = 0.098)
Fig. 3.2 Graph of the estimated latent distance model for data set in Table 2.7
Table 3.7 Assessment of the latent distance model for Lazarsfeld-Stouffer's data
Manifest variable X1 X2 X3 X4 Total
KL 0.496 0.219 0.300 0.478 1.493
ECD 0.332 0.179 0.231 0.324 0.599
response errors into account. In this respect, the latent distance model provides a good approach for dealing with such errors, and the approach is referred to as latent Guttman scaling in this book. In applying latent distance models to data sets, the contents of the items to be used have to be considered beforehand.
$$
\theta_{(0)} = 0,\qquad \theta_{(i)} = \sum_{k=0}^{i-1}v_k,\quad i = 1, 2, \ldots, I+1. \tag{3.17}
$$
Then, the $\theta_{(i)}$ are interpreted as the thresholds for positively or successfully responding to item $i$, that is, $X_i = 1$, $i = 1, 2, \ldots, I$. Let us assign the following scores to latent classes $i$:
$$
T(\theta) = \begin{cases}t_i & \theta\in\big[\theta_{(i)},\theta_{(i+1)}\big),\ i = 0, 1, \ldots, I-1\\ t_I & \theta\in\big[\theta_{(I)}, 1\big]\end{cases}. \tag{3.18}
$$
The information ratio about latent trait $\theta$ carried by score $T(\theta)$ is defined by
$$
K(T(\theta)|\theta) \equiv \mathrm{corr}(T(\theta),\theta)^2 = \frac{\mathrm{Cov}(T(\theta),\theta)^2}{\mathrm{Var}(T(\theta))\,\mathrm{Var}(\theta)}. \tag{3.19}
$$
Since $\theta$ is uniformly distributed on $[0, 1]$,
$$
E(\theta) = \frac{1}{2},\qquad \mathrm{Var}(\theta) = \frac{1}{12}.
$$
From (3.18), we also get
$$
E(T(\theta)) = \sum_{k=0}^{I}t_k v_k = \sum_{k=0}^{I}t_k\big(\theta_{(k+1)}-\theta_{(k)}\big), \tag{3.20}
$$
$$
\mathrm{Var}(T(\theta)) = \sum_{k=0}^{I}t_k^2\big(\theta_{(k+1)}-\theta_{(k)}\big) - E(T(\theta))^2, \tag{3.21}
$$
$$
\mathrm{Cov}(T(\theta),\theta) = \frac{1}{2}\sum_{k=0}^{I}t_k\big(\theta_{(k+1)}^2-\theta_{(k)}^2\big) - \frac{1}{2}E(T(\theta)). \tag{3.22}
$$
The amount of information about latent trait $\theta$ that the manifest variables have is defined by
$$
K(X_1, X_2, \ldots, X_I|\theta) = \max_{T(\theta)\in F}K(T(\theta)|\theta), \tag{3.24}
$$
where $F$ is the class of score functions defined by (3.18). We have the following theorem:
Theorem 3.1 Let $\theta$ be uniformly distributed on the interval $[0, 1]$, and let function $T(\theta)$ be defined by (3.18). Then, $K(T(\theta)|\theta)$ is maximized by
$$
t_i = a\,\frac{\theta_{(i+1)}+\theta_{(i)}}{2} + b,\quad i = 0, 1, 2, \ldots, I, \tag{3.25}
$$
where $a\ (\neq 0)$ and $b$ are arbitrary constants, and the maximum value is
$$
K(X_1, X_2, \ldots, X_I|\theta) = 3\sum_{i=0}^{I}\theta_{(i+1)}\theta_{(i)}\big(\theta_{(i+1)}-\theta_{(i)}\big). \tag{3.26}
$$
Proof. Without loss of generality, the scores may be standardized so that
$$
E(T(\theta)) = \sum_{i=0}^{I}t_i\big(\theta_{(i+1)}-\theta_{(i)}\big) = 0, \tag{3.27}
$$
$$
\mathrm{Var}(T(\theta)) = \sum_{i=0}^{I}t_i^2\big(\theta_{(i+1)}-\theta_{(i)}\big) - E(T(\theta))^2 = 1, \tag{3.28}
$$
that is,
$$
\mathrm{Var}(T(\theta)) = \sum_{i=0}^{I}t_i^2\big(\theta_{(i+1)}-\theta_{(i)}\big) = 1. \tag{3.29}
$$
Under these constraints, in order to maximize $K(T(\theta)|\theta)$ with respect to the scores $t_i$, it is sufficient to maximize
$$
\sum_{i=0}^{I}t_i\big(\theta_{(i+1)}^2-\theta_{(i)}^2\big),
$$
and the corresponding Lagrange function is
$$
g = \sum_{i=0}^{I}t_i\big(\theta_{(i+1)}^2-\theta_{(i)}^2\big) - \lambda\sum_{i=0}^{I}t_i\big(\theta_{(i+1)}-\theta_{(i)}\big) - \mu\sum_{i=0}^{I}t_i^2\big(\theta_{(i+1)}-\theta_{(i)}\big). \tag{3.30}
$$
From this,
$$
\big(\theta_{(i+1)}-\theta_{(i)}\big)\big(\theta_{(i+1)}+\theta_{(i)} - \lambda - 2\mu t_i\big) = 0,\quad i = 0, 1, \ldots, I. \tag{3.31}
$$
Summing (3.31) over $i$ gives
$$
1 - \lambda - 2\mu\sum_{i=0}^{I}t_i\big(\theta_{(i+1)}-\theta_{(i)}\big) = 0, \tag{3.32}
$$
and by (3.27) it follows that
$$
\lambda = 1,\qquad t_i = \frac{\theta_{(i+1)}+\theta_{(i)}-1}{2\mu},\quad i = 0, 1, 2, \ldots, I. \tag{3.33}
$$
Multiplying (3.31) by $t_i$ and summing up both sides with respect to $i = 0, 1, 2, \ldots, I$, we have
$$
\sum_{i=0}^{I}t_i\big(\theta_{(i+1)}^2-\theta_{(i)}^2\big) - \lambda\sum_{i=0}^{I}t_i\big(\theta_{(i+1)}-\theta_{(i)}\big) - 2\mu\sum_{i=0}^{I}t_i^2\big(\theta_{(i+1)}-\theta_{(i)}\big) = 0,
$$
so that, by (3.27) and (3.29),
$$
\sum_{i=0}^{I}t_i\big(\theta_{(i+1)}^2-\theta_{(i)}^2\big) - 2\mu = 0.
$$
Since $\mathrm{Cov}(T(\theta),\theta) = \tfrac{1}{2}\sum_{i=0}^{I}t_i(\theta_{(i+1)}^2-\theta_{(i)}^2) = \mu$ and $\mathrm{Var}(\theta)=1/12$, substituting (3.33) yields the maximized information ratio
$$
K(T(\theta)|\theta) = 12\,\mathrm{Cov}(T(\theta),\theta)^2 = 3\sum_{i=0}^{I}\theta_{(i+1)}\theta_{(i)}\big(\theta_{(i+1)}-\theta_{(i)}\big). \tag{3.35}
$$
Since $K(T(\theta)|\theta)$ in (3.19) is the square of the correlation coefficient between $T(\theta)$ and $\theta$, it is invariant under linear transformations of the scores, and hence the theorem follows.
From Theorem 3.1, we set
$$
t_i = \frac{\theta_{(i+1)}+\theta_{(i)}}{2},\quad i = 0, 1, 2, \ldots, I. \tag{3.36}
$$
The above discussion is applied to the latent distance models estimated in Tables 3.2 and 3.6. For Table 3.2, we have
$$
\theta_{(0)} = 0,\ \theta_{(1)} = 0.296,\ \theta_{(2)} = 0.640,\ \theta_{(3)} = 0.743,\ \theta_{(4)} = 0.792,\ \theta_{(5)} = 1.
$$
From the results, the latent Guttman scaling in Table 3.2 is better than that in Table 3.6.
The following theorem gives the maximization of (3.24) with respect to the thresholds $\theta_{(i)}$, $i = 1, 2, \ldots, I$.

Theorem 3.2 Let $\theta$ be uniformly distributed on $[0, 1]$. Then $K(X_1, X_2, \ldots, X_I|\theta)$ is maximized by the equally spaced thresholds
$$
\theta_{(i)} = \frac{i}{I+1},\quad i = 0, 1, 2, \ldots, I,
$$
and
$$
\max_{(\theta_{(a)})} K(X_1, X_2, \ldots, X_I|\theta) = \frac{I(I+2)}{(I+1)^2}. \tag{3.37}
$$
Proof. Differentiating (3.26) with respect to $\theta_{(i)}$ gives
$$
\frac{\partial}{\partial\theta_{(i)}} K(X_1, X_2, \ldots, X_I|\theta) = 3\big(\theta_{(i+1)}-\theta_{(i-1)}\big)\big(\theta_{(i+1)}+\theta_{(i-1)}-2\theta_{(i)}\big) = 0,
$$
whose solution is
$$
\theta_{(i)} = \frac{i}{I+1},\quad i = 0, 1, 2, \ldots, I + 1. \tag{3.38}
$$
The efficiency of the latent Guttman scaling is then defined by
$$
efficiency = \frac{K(X_1, X_2, \ldots, X_I|\theta)}{\max_{(\theta_{(a)})} K(X_1, X_2, \ldots, X_I|\theta)}. \tag{3.39}
$$
The efficiencies of the latent distance models in Tables 3.2 and 3.6 are calculated,
respectively, as 0.962 and 0.840.
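The calculations behind these efficiencies are easy to reproduce. The sketch below computes the optimal scores (3.36), the information ratio (3.26), and the efficiency (3.39) directly from the estimated class proportions; the function name is an assumption.

```python
import numpy as np

def guttman_scaling_efficiency(v):
    """Optimal class scores (3.36), information ratio (3.26), and
    efficiency (3.39) of a latent Guttman scaling, given the
    estimated latent class proportions v = (v_0, ..., v_I)."""
    I = len(v) - 1
    theta = np.concatenate(([0.0], np.cumsum(v)))       # thresholds (3.17)
    theta[-1] = 1.0                                      # guard against rounding
    t = (theta[1:] + theta[:-1]) / 2                     # optimal scores (3.36)
    K = 3 * np.sum(theta[1:] * theta[:-1] * (theta[1:] - theta[:-1]))   # (3.26)
    K_max = I * (I + 2) / (I + 1) ** 2                   # (3.37)
    return t, K, K / K_max                               # efficiency (3.39)

# Class proportions from Table 3.2 (Stouffer-Toby data)
t, K, eff = guttman_scaling_efficiency(np.array([0.296, 0.344, 0.103, 0.049, 0.208]))
print(round(eff, 3))   # approximately 0.962, as reported in the text
```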
Remark 3.5 The efficiency of the latent Guttman scaling may also be measured with entropy. Let $p = (p_1, p_2, \ldots, p_A)$ be any probability distribution. Then, the entropy is defined by
$$
H(p) = -\sum_{a=1}^{A}p_a\log p_a.
$$
The maximum of the above entropy is $\log A$, attained by the uniform distribution $q = \left(\tfrac{1}{A}, \tfrac{1}{A}, \ldots, \tfrac{1}{A}\right)$; the result is the same as that in Theorem 3.2. Then, the efficiency of distribution $p$ can be defined by
$$
efficiency = \frac{H(p)}{\log A}.
$$
Applying the above efficiency to the latent distance models estimated in Tables 3.2 and 3.6, we have 0.892 and 0.650, respectively. In the sense of entropy as well, the latent Guttman scaling in Table 3.2 is better than that in Table 3.6, as was illustrated above by using (3.39).
In this setup, the skills $S_{k1}\prec S_{k2}\prec\cdots\prec S_{kI_k}$ constitute the latent Guttman scaling. Let $\theta_{k(a)}$ be the thresholds for skills $S_{ka}$, $a = 0, 1, 2, \ldots, I_k + 1$; $k = 1, 2$, and set
$$
v_{mn} = P\big(\theta_{1(m)}\le\theta_1<\theta_{1(m+1)},\ \theta_{2(n)}\le\theta_2<\theta_{2(n+1)}\big),\quad m = 0, 1, \ldots, I_1;\ n = 0, 1, \ldots, I_2.
$$
Then, putting $\mathbf{s}_k = (s_{k1}, s_{k2}, \ldots, s_{kI_k})$, $k = 1, 2$, and
$$
m = \sum_{i=1}^{I_1}s_{1i},\qquad n = \sum_{i=1}^{I_2}s_{2i},
$$
we have
$$
P\big((\mathbf{X}_1,\mathbf{X}_2) = (\mathbf{x}_1,\mathbf{x}_2)\big) = \sum_{m=0}^{I_1}\sum_{n=0}^{I_2}v_{mn}P\big((\mathbf{X}_1,\mathbf{X}_2) = (\mathbf{x}_1,\mathbf{x}_2)\mid(\mathbf{S}_1,\mathbf{S}_2) = (\mathbf{s}_1,\mathbf{s}_2)\big),
$$
where
According to the model, the joint levels of traits θk of individuals can be scaled,
and the association between the traits can also be assessed. Let Tk (θk ), k = 1, 2 be
functions of scores for latent traits θk , which are made by (3.18) and (3.36). Then, the
correlation coefficient between scores Tk (θk ), k = 1, 2, Corr(T1 (θ1 ), T2 (θ2 )) is used
for measuring the association between traits θ1 and θ2 , because Corr(θ1 , θ2 ) cannot
be calculated. If θ1 and θ2 are statistically independent, T1 (θ1 ) and T2 (θ2 ) are also
independent, and then, we have Corr(T1 (θ1 ), T2 (θ2 )) = 0.
The above model is applied to the data in Table 3.8, which were obtained from 145 children from 1 to 5 years old. Latent traits $\theta_1$ and $\theta_2$ represent the general intelligence and the verbal ability of the children, respectively, and these abilities are measured with three manifest binary variables each, ordered as $X_{ki}$, $i = 1, 2, 3$; $k = 1, 2$. The parameters can be estimated via the EM algorithm as in the previous section. The estimated latent probabilities are illustrated in Table 3.9, and the responses to the manifest variables $X_{ki}$ are demonstrated in Figs. 3.3 and 3.4. The densities are illustrated in Fig. 3.5, and the association between traits $\theta_1$ and $\theta_2$ is summarized there. The association appears positive, and in effect we obtain the estimate $\widehat{\mathrm{Corr}}(T_1(\theta_1), T_2(\theta_2)) = 0.780$, which indicates a strong association between the two latent traits. The respondents shown in Table 3.8 are assigned to latent classes in Table 3.10, and this provides an assessment of the respondents' grades in the latent traits.
In this section, two-dimensional latent continuous traits are discretized, and an ordering of the latent classes can be carried out within each latent trait; however, it may be useful to grade all the latent classes by a single method, because without one, for latent classes $(i, j)$, $i = 0, 1, 2, 3$; $j = 0, 1, 2, 3$, we can only employ the simple scores $i + j$ to grade the latent classes. In Sect. 4.10 of Chap. 4, an entropy-based method to order latent classes is discussed, and the grading (ordering) of the above latent classes $(i, j)$, $i = 0, 1, 2, 3$; $j = 0, 1, 2, 3$ (Table 3.9) will be treated as an example.
In analyzing Stouffer-Toby’s data (Table 2.1), latent class cluster analysis and latent
distance analysis have been used. From the results in Tables 2.3 and 2.11, it is
appropriate to assume there exist ordered latent classes that explain behavior in the
Table 3.8 Data on the general intelligence ability and the verbal ability from 145 pupils
θ1 θ2 θ1 θ2 θ1 θ2
X 11 X 12 X 13 X 21 X 22 X 23 Freq X 11 X 12 X 13 X 21 X 22 X 23 Freq X 11 X 12 X 13 X 21 X 22 X 23 Freq
0 0 0 0 0 0 13 0 1 1 0 1 0 0 0 0 1 1 0 1 0
1 0 0 0 0 0 5 1 1 1 0 1 0 1 1 0 1 1 0 1 2
0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 1 1 1 0 1 0
1 1 0 0 0 0 2 1 0 0 1 1 0 7 1 1 1 1 0 1 5
0 0 1 0 0 0 1 0 1 0 1 1 0 3 0 0 0 0 1 1 0
1 0 1 0 0 0 2 1 1 0 1 1 0 5 1 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 1 1 0
1 1 1 0 0 0 2 1 0 1 1 1 0 2 1 1 0 0 1 1 1
0 0 0 1 0 0 9 0 1 1 1 1 0 2 0 0 1 0 1 1 0
1 0 0 1 0 0 3 1 1 1 1 1 0 11 1 0 1 0 1 1 0
0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1
1 1 0 1 0 0 2 1 0 0 0 0 1 0 1 1 1 0 1 1 1
0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0
1 0 1 1 0 0 2 1 1 0 0 0 1 0 1 0 0 1 1 1 1
0 1 1 1 0 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 0
1 1 1 1 0 0 4 1 0 1 0 0 1 0 1 1 0 1 1 1 3
0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 1 1 1 1 0
1 0 0 0 1 0 1 1 1 1 0 0 1 0 1 0 1 1 1 1 2
0 1 0 0 1 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 3
1 1 0 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 1 1 38
0 0 1 0 1 0 0 0 1 0 1 0 1 0
1 0 1 0 1 0 2 1 1 0 1 0 1 1
Source Eshima [8]
(Figures 3.3 and 3.4 plot the estimated response probabilities of X11, X12, X13 against θ1 and of X21, X22, X23 against θ2, respectively; Fig. 3.5 illustrates the estimated joint distribution of the latent traits (θ1, θ2).)
data set, that is, role conflict. The results in Table 2.3 are those based on a latent class cluster model, so the analysis is an exploratory latent class analysis, whereas the results in Table 3.2 are based on a confirmatory analysis. For the role conflict in Stouffer-Toby's data set, it may be suitable to assume ordered latent classes located in a latent continuum. For this data set, the number of latent classes in the latent class cluster model is at most three according to the condition of model
Table 3.10 Assignment of the manifest responses to the extracted latent classes based on Table 3.8
θ1 θ2 θ1 θ2 θ1 θ2
X 11 X 12 X 13 X 21 X 22 X 23 LCa X 11 X 12 X 13 X 21 X 22 X 23 LC X 11 X 12 X 13 X 21 X 22 X 23 LC
0 0 0 0 0 0 (0,0) 0 1 1 0 1 0 (3,2) 0 0 1 1 0 1 (3,3)
1 0 0 0 0 0 (1,0) 1 1 1 0 1 0 (3,2) 1 0 1 1 0 1 (3,3)
0 1 0 0 0 0 (0,0) 0 0 0 1 1 0 (0,1) 0 1 1 1 0 1 (3,3)
1 1 0 0 0 0 (2,0) 1 0 0 1 1 0 (1,2) 1 1 1 1 0 1 (3,3)
0 0 1 0 0 0 (0,0) 0 1 0 1 1 0 (2,2) 0 0 0 0 1 1 (0,0)
1 0 1 0 0 0 (1,0) 1 1 0 1 1 0 (2,2) 1 0 0 0 1 1 (1,3)
1 1 1 1 0 0 (3,1) 1 0 1 0 0 1 (3,3) 1 1 0 1 1 1 (3,3)
0 0 0 0 1 0 (0,0) 0 1 1 0 0 1 (3,3) 0 0 1 1 1 1 (3,3)
1 0 0 0 1 0 (1,0) 1 1 1 0 0 1 (3,3) 1 0 1 1 1 1 (3,3)
0 1 0 0 1 0 (0,0) 0 0 0 1 0 1 (0,1) 0 1 1 1 1 1 (3,3)
1 1 0 0 1 0 (2,2) 1 0 0 1 0 1 (1,3) 1 1 1 1 1 1 (3,3)
0 0 1 0 1 0 (0,0) 0 1 0 1 0 1 (3,3)
1 0 1 0 1 0 (1,1) 1 1 0 1 0 1 (3,3)
a LC implies latent class
identification, whereas it is six in the latent distance model. In order to extract ordered latent classes, it is sensible to make a parsimonious model. Let $\theta_a$, $a = 1, 2, \ldots, A$ be parameters that express the locations of the latent classes in a latent continuum, such that $\theta_a < \theta_{a+1}$, $a = 1, 2, \ldots, A-1$, and let $\pi_i(\theta_a)$, $i = 1, 2, \ldots, I$; $a = 1, 2, \ldots, A$ be the latent positive response probabilities to binary items $i$ in latent classes $a$, which satisfy the following inequalities:
The functions $\pi_i(\theta_a)$ are specified before analyzing the data set under study, and it is appropriate that the number of parameters in the model be as small as possible and that the parameters be easy to interpret. Since the positive response probabilities $\pi_i(\theta_a)$ are functions of the location or trait parameters $\theta_a$, such models are called structured latent class models. In this section, the following logistic model is used [7]:
$$
\pi_i(\theta_a) = \frac{\exp(\theta_a-d_i)}{1+\exp(\theta_a-d_i)},\quad a = 1, 2, \ldots, A;\ i = 1, 2, \ldots, I, \tag{3.41}
$$
where $d_i$ are item difficulty parameters as in the latent trait model, and we set $d_1 = 0$ for model identification. The above model is called the Rasch model [15], and the constraints (3.40) are satisfied by it. The number of parameters to be estimated is $2A + I - 1$. Thus, in order to identify the model, the following constraint has to be kept:
$$
2A + I - 1 < 2^I - 1 \iff A < \frac{1}{2}\big(2^I - I\big), \tag{3.42}
$$
because the number of the accounting equations is $2^I - 1$, i.e., the number of the joint probabilities of manifest responses $\mathbf{X} = (X_1, X_2, \ldots, X_I)^T$ minus one.
Let $P(\mathbf{X}=\mathbf{x}\mid a)$ be the joint probability of $\mathbf{X} = (x_1, x_2, \ldots, x_I)^T$ for a given latent class $a$. Then, from (3.41) we have
$$
P(\mathbf{X}=\mathbf{x}\mid a) = \prod_{i=1}^{I}\left(\frac{\exp(\theta_a-d_i)}{1+\exp(\theta_a-d_i)}\right)^{x_i}\left(\frac{1}{1+\exp(\theta_a-d_i)}\right)^{1-x_i}
= \prod_{i=1}^{I}\frac{\exp\{x_i(\theta_a-d_i)\}}{1+\exp(\theta_a-d_i)},
$$
and
$$
P(\mathbf{X}=\mathbf{x}) = \sum_{a=1}^{A}v_a P(\mathbf{X}=\mathbf{x}\mid a) = \sum_{a=1}^{A}v_a\prod_{i=1}^{I}\frac{\exp\{x_i(\theta_a-d_i)\}}{1+\exp(\theta_a-d_i)}.
$$
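The mixture probability above is straightforward to evaluate. The following sketch computes $\pi_i(\theta_a)$ and $P(\mathbf{X}=\mathbf{x})$ under (3.41); the parameter values in the example are purely illustrative and are not the estimates reported later in this section.

```python
import numpy as np

def marginal_prob(x, v, theta, d):
    """P(X = x) under the latent ordered-class (Rasch-type) model (3.41):
    a mixture over A ordered latent classes located at theta_1 < ... < theta_A.
    x: (I,) 0/1 pattern, v: (A,) class proportions,
    theta: (A,) class locations, d: (I,) item difficulties with d[0] = 0."""
    eta = theta[:, None] - d[None, :]                      # (A, I) logits
    pi = 1.0 / (1.0 + np.exp(-eta))                        # pi_i(theta_a)
    cond = np.prod(np.where(x == 1, pi, 1 - pi), axis=1)   # P(x | a)
    return float(v @ cond)

# Purely illustrative parameter values (not estimates from the text)
v = np.array([0.3, 0.5, 0.2])
theta = np.array([-1.0, 0.5, 2.0])
d = np.array([0.0, 0.4, 0.8, 1.2])
print(marginal_prob(np.array([1, 0, 0, 0]), v, theta, d))
```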
In order to estimate the parameters φ = ((va ), (θa ), (di ))T , the EM algorithm is
used and the summary of the algorithm is given as follows:
EM algorithm
(i) E-step
Let s φ = ((s va ), (s θa ), (s di ))T be the estimate of parameter vector φ at the s th
iteration in the EM algorithm. Then, the conditional expectations of complete data
(n(x, a)) given parameters s φ = ((s va ), (s θa ), (s di )) are calculated in the (s + 1) th
iteration as follows:
$$
{}^{s+1}n(\mathbf{x},a) = n(\mathbf{x})\,\frac{{}^{s}v_a\prod_{i=1}^{I}{}^{s}P(X_i=x_i\mid a)}{\sum_{b=1}^{A}{}^{s}v_b\prod_{i=1}^{I}{}^{s}P(X_i=x_i\mid b)},\quad a = 1, 2, \ldots, A, \tag{3.43}
$$
where
$$
{}^{s}P(X_i=x_i\mid a) = \frac{\exp\{x_i({}^{s}\theta_a-{}^{s}d_i)\}}{1+\exp({}^{s}\theta_a-{}^{s}d_i)},\quad x_i = 0, 1.
$$
(ii) M-step
As in (3.10), the loglikelihood function of the complete data ${}^{s+1}n(\mathbf{x},a)$ (3.43) is given by
$$
Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi) = l\big(\boldsymbol\phi\mid{}^{s+1}n(\mathbf{x},a)\big)
= \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\log\left(v_a\prod_{i=1}^{I}\frac{\exp\{x_i(\theta_a-d_i)\}}{1+\exp(\theta_a-d_i)}\right)
$$
$$
= \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\log v_a
+ \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\sum_{i=1}^{I}\big\{x_i(\theta_a-d_i) - \log\big(1+\exp(\theta_a-d_i)\big)\big\}. \tag{3.44}
$$
The class proportions are updated as before by ${}^{s+1}v_a=\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)/N$; however, for estimating the other parameters $\theta_a$ and $d_i$, the Newton-Raphson method needs to be used in the M-step. The first derivatives of $Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)$ with respect to $\theta_a$ and $d_i$, respectively, are calculated as follows:
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\theta_a} = \sum_{i=1}^{I}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\big(x_i - P(X_i=1|a)\big),\quad a = 1, 2, \ldots, A;
$$
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i} = -\sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\big(x_i - P(X_i=1|a)\big),\quad i = 2, 3, \ldots, I.
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\theta_a^2} = -\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\sum_{i=1}^{I}P(X_i=1|a)\big(1-P(X_i=1|a)\big)
= -N\,{}^{s+1}v_a\sum_{i=1}^{I}P(X_i=1|a)\big(1-P(X_i=1|a)\big),\quad a = 1, 2, \ldots, A;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\theta_a\partial d_i} = \sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)P(X_i=1|a)\big(1-P(X_i=1|a)\big)
= N\,{}^{s+1}v_a P(X_i=1|a)\big(1-P(X_i=1|a)\big),\quad a = 1, 2, \ldots, A;\ i = 1, 2, \ldots, I;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i^2} = -\sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)P(X_i=1|a)\big(1-P(X_i=1|a)\big)
= -\sum_{a=1}^{A}N\,{}^{s+1}v_a P(X_i=1|a)\big(1-P(X_i=1|a)\big),\quad i = 2, 3, \ldots, I;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\theta_a\partial\theta_b} = \frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i\partial d_j} = 0,\quad a\neq b;\ i\neq j.
$$
Table 3.11 The estimated latent ordered-class model with five latent classes (Stouffer-Toby’s data)
Latent class θ Proportion Latent positive item response probability
X1 X2 X3 X4
1(−1.563) 0.058 0.173 0.037 0.040 0.011
2(1.072) 0.124 0.745 0.348 0.365 0.133
3(1.244) 0.267 0.776 0.388 0.406 0.154
4(1.329) 0.320 0.791 0.408 0.427 0.166
5(4.771) 0.231 0.992 0.956 0.959 0.861
G2 = 1.092(d f = 3, P = 0.779)
$$
H = \begin{pmatrix}\dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\theta_a\partial\theta_b} & \dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\theta_a\partial d_i}\\[3mm]\dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i\partial\theta_a} & \dfrac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i\partial d_j}\end{pmatrix}. \tag{3.46}
$$
Let $\boldsymbol\phi_{(\theta,d)} = ((\theta_a),(d_i))$ and let the $t$ th iterative value of $\boldsymbol\phi_{(\theta,d)}$ be $\boldsymbol\phi_{(\theta,d)t} = ((\theta_{at}),(d_{it}))$, where $\boldsymbol\phi_{(\theta,d)1} = (({}^{s}\theta_a),({}^{s}d_i))$. Then, $\boldsymbol\phi_{(\theta,d)t+1}$ is obtained as follows:
$$
\boldsymbol\phi_{(\theta,d)t+1} = \boldsymbol\phi_{(\theta,d)t} - H_t^{-1}\mathbf{g}_t,\quad t = 1, 2, \ldots,
$$
where $H_t$ and $\mathbf{g}_t$ are the values of the Hessian matrix (3.46) and the gradient vector (3.45) at $\boldsymbol\phi_t = \big(({}^{s+1}v_a),\boldsymbol\phi_{(\theta,d)t}\big)$. From this algorithm, we can get $\lim_{t\to\infty}\boldsymbol\phi_{(\theta,d)t} = \big(({}^{s+1}\theta_a),({}^{s+1}d_i)\big)$.
The above model is applied to the analysis of Stouffer-Toby’s data in Table 2.1.
Considering the inequality constraint (3.42), first, for the number of ordered latent
classes A = 5, the results are shown in Table 3.11. The goodness-of-fit to the data set
is good as indicated in the table, G 2 = 1.092(d f = 3, P = 0.779). According to the
latent response probabilities to four items in latent classes 2 through 4 (Fig. 3.6), the
latent classes are similar, and second, the latent ordered-class model with three latent
classes is used for analyzing the data. The results are illustrated in Table 3.12, and it
shows the goodness-of-fit to the data is very good, G 2 = 1.092(d f = 7, P = 0.993).
The response probabilities for three latent classes are illustrated in Fig. 3.7. In
Sect. 2.3, the data have been analyzed with usual latent class models with three and
two latent classes. Although the ordered latent classes have been derived as shown in
Tables 2.2 and 2.3, the present models are more parsimonious than the usual models.
The assessment of respondents based on the latent ordered-class analysis in Table
3.12 is demonstrated in Table 3.13.
In this section, latent ordered-class analysis has been discussed by using model (3.41); however, the analysis is not confined to this model. Structured models can be constructed as long as they are identified. The present approach assumes ordered latent classes in the population that can be located in a latent continuum. In this sense, it is related to item response models, which assume latent continua in the populations.
Fig. 3.6 Locations of five ordered latent classes and their positive response probabilities in Table
3.11
Table 3.12 The estimated latent ordered-class model with three latent classes (Stouffer-Toby’s
data)
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
1(−0.397) 0.150 0.434 0.123 0.131 0.039
2(1.412) 0.647 0.811 0.439 0.458 0.184
3(5.160) 0.203 0.995 0.974 0.975 0.914
G2 = 1.092(d f = 7, P = 0.993)
where
$$
\mathrm{Cov}(X_i,\theta) = \sum_{a=1}^{A}v_a\pi_i(\theta_a)\theta_a - E(X_i)E(\theta)
$$
Fig. 3.7 Location of three latent ordered classes and their positive response probabilities in Table
3.12
Table 3.13 Assignment of the manifest responses to the extracted latent classes (latent ordered-
class analysis of data in Table 2.1)
Response pattern Score Latent class Response pattern Score Latent class
0000 0 1 0001 1 2
1000 1 2 1001 2 2
0100 1 2 0101 2 2
1100 2 2 1101 3 2
0010 1 2 0011 2 2
1010 2 2 1011 3 2
0110 2 2 0111 3 2
1110 3 2 1111 4 3
$$
= \sum_{a=1}^{A}v_a\pi_i(\theta_a)\theta_a - P(X_i=1)\sum_{a=1}^{A}v_a\theta_a,\quad i = 1, 2, \ldots, I.
$$
From this, we can obtain ECD (3.47), and the ECDs for the manifest variables $X_i$ are also given by
$$
ECD(X_i,\theta) = \frac{\mathrm{Cov}(X_i,\theta)}{1+\mathrm{Cov}(X_i,\theta)}. \tag{3.48}
$$
By using (3.47) and (3.48), the estimated model in Table 3.12 is assessed. The
results are demonstrated in Table 3.14. Since ECD(X, θ ) = 0.628, 62.8% of the
uncertainty of manifest variable vector X = (X 1 , X 2 , X 3 , X 4 )T is explained by the
model.
In this section, latent ordered-class model (3.41) has been discussed and the model
has been applied to data in Table 2.1. A more general model is given by
$$
\pi_i(\theta_a) = \frac{\exp(\beta_i(\theta_a-d_i))}{1+\exp(\beta_i(\theta_a-d_i))},\quad a = 1, 2, \ldots, A;\ i = 1, 2, \ldots, I, \tag{3.49}
$$
$$
2^I - 2(A + I - 1) > 0,
$$
and
$$
ECD(X_i,\theta) = \frac{\beta_i\,\mathrm{Cov}(X_i,\theta)}{1+\beta_i\,\mathrm{Cov}(X_i,\theta)},\quad i = 1, 2, \ldots, I.
$$
$$
P(\mathbf{X}=\mathbf{x}) = \sum_{a=1}^{A}v_a\prod_{i=1}^{I}P(X_i=x_i\mid\theta_a) = \sum_{a=1}^{A}v_a\prod_{i=1}^{I}\frac{\exp(x_i\beta_i(\theta_a-d_i))}{1+\exp(\beta_i(\theta_a-d_i))}, \tag{3.50}
$$
where
$$
{}^{s}P(X_i=x_i\mid\theta_a) = \frac{\exp\big(x_i\,{}^{s}\beta_i(\theta_a-{}^{s}d_i)\big)}{1+\exp\big({}^{s}\beta_i(\theta_a-{}^{s}d_i)\big)},\quad x_i = 0, 1.
$$
(ii) M-step
The log likelihood function of the complete data ${}^{s+1}n(\mathbf{x},a)$ (3.51) is given by
$$
Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi) = l\big(\boldsymbol\phi\mid{}^{s+1}n(\mathbf{x},a)\big)
= \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\log\left(v_a\prod_{i=1}^{I}\frac{\exp(x_i\beta_i(\theta_a-d_i))}{1+\exp(\beta_i(\theta_a-d_i))}\right)
$$
$$
= \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\log v_a
+ \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\sum_{i=1}^{I}\big\{x_i\beta_i(\theta_a-d_i) - \log\big(1+\exp(\beta_i(\theta_a-d_i))\big)\big\}. \tag{3.52}
$$
For estimating the other parameters $\beta_i$ and $d_i$, the Newton-Raphson method needs to be used in the M-step. The first derivatives of $Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)$ with respect to $\beta_i$ and $d_i$, respectively, are calculated as follows:
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i} = \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)(\theta_a-d_i)\big(x_i - P(X_i=1\mid\theta_a)\big),\quad i = 1, 2, \ldots, I; \tag{3.53}
$$
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i} = -\sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\beta_i\big(x_i - P(X_i=1\mid\theta_a)\big),\quad i = 1, 2, \ldots, I. \tag{3.54}
$$
$$
\mathbf{g} = \begin{pmatrix}\dfrac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i}\\[3mm]\dfrac{\partial Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i}\end{pmatrix}. \tag{3.55}
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i^2} = -\sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)(\theta_a-d_i)^2 P(X_i=1|a)\big(1-P(X_i=1|a)\big),\quad i = 1, 2, \ldots, I; \tag{3.56}
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i\partial d_i} = -\sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\big\{(x_i - P(X_i=1|a)) - (\theta_a-d_i)\beta_i P(X_i=1|a)(1-P(X_i=1|a))\big\},\quad i = 1, 2, \ldots, I; \tag{3.57}
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i^2} = -\sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)\beta_i^2 P(X_i=1|a)\big(1-P(X_i=1|a)\big),\quad i = 2, 3, \ldots, I; \tag{3.58}
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i\partial\beta_j} = \frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial d_i\partial d_j} = 0,\quad i\neq j. \tag{3.59}
$$
$$
{}^{s}\boldsymbol\phi_{t+1} = {}^{s}\boldsymbol\phi_t - H_t^{-1}\mathbf{g}_t,\quad t = 1, 2, \ldots,
$$
where $H_t$ and $\mathbf{g}_t$ are the values of the gradient vector (3.55) and the Hessian matrix (3.60) at ${}^{s}\boldsymbol\phi_t$.
Remark 3.6 The expectation of the Hessian matrix is minus the Fisher information matrix. Although the Fisher information matrix is positive definite, the Hessian matrices (3.60) calculated in the iterations are not necessarily negative definite. Since in latent class $a$
$$
E\big\{(X_i - P(X_i=1|a))^2\mid a\big\} = P(X_i=1|a)\big(1-P(X_i=1|a)\big),\quad i = 1, 2, \ldots, I,
$$
the cross derivative (3.57) can be approximated by
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{s}\boldsymbol\phi)}{\partial\beta_i\partial d_i} \approx \sum_{a=1}^{A}\sum_{\mathbf{x}}{}^{s+1}n(\mathbf{x},a)(\theta_a-d_i)\beta_i P(X_i=1|a)\big(1-P(X_i=1|a)\big),\quad i = 1, 2, \ldots, I. \tag{3.61}
$$
Table 3.15 Upper limits of latent classes θ(a) , class values θa and latent class proportions va
a 1 2 3 4 5 6 7 8 9 10
θ(a) −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 +∞
θa −2.5 −1.75 −1.25 −0.75 −0.25 0.25 0.75 1.25 1.75 2.5
va 0.023 0.044 0.092 0.150 0.191 0.191 0.150 0.092 0.044 0.023
Table 3.16 The estimated parameters in latent trait model (3.50) from the Stouffer-Toby’s data
Manifest variable X1 X2 X3 X4
βi 1.128 1.559 1.330 2.076
di −1.471 −0.006 −0.061 0.643
G 2 = 8.570, (d f = 7, P = 0.285)
are set. These values and the latent class proportions $v_a$ calculated from the standard normal distribution (1.8) are illustrated in Table 3.15, and they are fixed in the estimation procedure. In the estimation, (3.61) is employed in place of (3.57), and the estimated parameters are given in Table 3.16. According to the test of goodness-of-fit, latent trait model (3.50) fits the data set fairly well, and it is reasonable to assume a latent continuous trait underlying the responses to the four test items. The graphs of the item response functions are illustrated in Fig. 3.8. The assessment of the test items (manifest variables) as indicators of the latent trait is shown in Table 3.17. In order to estimate the latent trait $\theta$ of an individual with response vector $\mathbf{x}$, the following method is used.
Let $f(\mathbf{x},\theta)$ be the joint probability function of $\mathbf{x} = (x_1, x_2, x_3, x_4)$ and $\theta$. The estimate is given by $\theta_{\max}$ such that $f(\mathbf{x},\theta_{\max}) = \max_\theta f(\mathbf{x},\theta)$. Since $\theta$ is distributed according to the standard normal distribution $\varphi(\theta)$, from (1.6) we have
$$
\log f(\mathbf{x},\theta) = -\frac{1}{2}\log 2\pi - \frac{\theta^2}{2} + \sum_{i=1}^{I}\log\frac{\exp(x_i\beta_i(\theta-d_i))}{1+\exp(\beta_i(\theta-d_i))}
= -\frac{1}{2}\log 2\pi - \frac{\theta^2}{2} + \sum_{i=1}^{I}x_i\beta_i(\theta-d_i) - \sum_{i=1}^{I}\log\big(1+\exp(\beta_i(\theta-d_i))\big).
$$
Fig. 3.8 Graph of latent trait model (3.50) estimated from Stouffer-Toby’s data (Table 2.1)
$$
\frac{d}{d\theta}\log f(\mathbf{x},\theta) = -\theta + \sum_{i=1}^{I}\beta_i\big(x_i - P(X_i=1\mid\theta)\big) = 0. \tag{3.63}
$$
By solving the above equation with respect to $\theta$, we can get the estimate of the latent trait $\theta$ of a respondent with manifest response vector $\mathbf{x} = (x_1, x_2, x_3, x_4)$ (Table 3.18).
Remark 3.7 In order to solve Eq. (3.63), the following Newton–Raphson method is employed. Let $\theta^{(m)}$ be the estimate of $\theta$ in the $m$ th iteration. Then, the algorithm for obtaining a solution of (3.63) is given by
$$
\theta^{(m+1)} = \theta^{(m)} - \frac{\dfrac{d}{d\theta}\log f\big(\mathbf{x},\theta^{(m)}\big)}{\dfrac{d^2}{d\theta^2}\log f\big(\mathbf{x},\theta^{(m)}\big)},\quad m = 0, 1, 2, \ldots,
$$
where
Table 3.18 Assessment of respondents by using the estimated latent trait model (Table 3.15)
Response pattern θa Response pattern θa
0000 −1.097 0001 −0.320
1000 −0.515 1001 −0.150
0100 −0.836 0101 0.278
1100 0.289 1101 0.384
0010 −0.313 0011 0.051
1010 0.061 1011 0.157
0110 −0.464 0111 0.585
1110 0.658 1111 0.900
Table 3.19 The estimated parameters in latent trait model (3.50) from the Lazarsfeld-Stouffer’s
data
Manifest variable X1 X2 X3 X4
βi 1.672 1.099 1.391 1.577
di 0.525 −0.586 −0.831 −0.983
G 2 = 7.515, (d f = 7, P = 0.377)
$$
\frac{d^2}{d\theta^2}\log f(\mathbf{x},\theta) = -1 - \sum_{i=1}^{I}\beta_i^2 P(X_i=1\mid\theta)\big(1-P(X_i=1\mid\theta)\big).
$$
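The Newton–Raphson solution of (3.63) takes only a few lines of code. The sketch below uses the rounded parameter estimates of Table 3.16, so its results will be close to, but not exactly equal to, the values reported in Table 3.18.

```python
import numpy as np

def estimate_theta(x, beta, d, theta0=0.0, n_iter=20):
    """Newton-Raphson solution of (3.63): the posterior mode of the
    latent trait theta for response pattern x under latent trait
    model (3.50) with a standard normal prior."""
    theta = theta0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-beta * (theta - d)))       # P(X_i = 1 | theta)
        grad = -theta + np.sum(beta * (x - p))              # (3.63)
        hess = -1.0 - np.sum(beta ** 2 * p * (1 - p))       # second derivative
        theta -= grad / hess
    return theta

# Rounded item parameters for Stouffer-Toby's data (Table 3.16)
beta = np.array([1.128, 1.559, 1.330, 2.076])
d = np.array([-1.471, -0.006, -0.061, 0.643])
# Estimate for the all-negative pattern; compare with Table 3.18
print(round(estimate_theta(np.array([0, 0, 0, 0]), beta, d), 3))
```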
Second, McHugh’s data in Table 2.5 are analyzed with model (3.50). The log like-
lihood ratio test statistic G 2 = 22.011(d f = 7, P = 0.003) is obtained, and thus, the
model fitness to the data set is bad. It may be concluded that there is no latent contin-
uous trait distributed according to the standard normal distribution or the latent trait
space not one-dimensional. Finally, Lazarsfel-Stouffer’s data (Table 2.7) are analyzed
with model (3.50), and the estimated parameters and the latent response probabilities
P(X i = xi |θ ) are illustrated in Table 3.19 and Fig. 3.9, respectively. The latent trait
model makes a moderate fit to the data set, that is, G 2 = 7.515(d f = 7, P = 0.377).
The predictive or explanatory power of latent trait θ for manifest variables (Table
3.20) is similar to that of Stouffer-Toby’s data (Table 3.16). As demonstrated above,
the latent trait model can be estimated in a framework of the latent class model, and
the EM algorithm is effective to estimate the model parameters.
3.7 Discussion
In this chapter, latent class analyses with ordered latent classes have been discussed.
In latent distance analysis, the model is an extension of the Guttman scale model, and
the intrusion and omission errors are incorporated into the model itself. Assuming a
Fig. 3.9 Graph of latent trait model (3.50) estimated from Lazarsfeld-Stouffer’s data (Table 2.7)
References
1. Croon, M. A. (1990). Latent class analysis with ordered latent classes. British Journal of
Mathematical and Statistical Psychology, 43, 171–192.
2. Dayton, C. M., & Macready, G. B. (1976). A probabilistic model for validation of behavioral
hierarchies. Psychometrika, 43, 189–204.
3. Dayton, C. M., & Macready, G. B. (1980). A scaling model with response errors and intrinsically
unscalable responses. Psychometrika, 45, 343–356.
4. De Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch
models. Journal of Educational Statistics, 11, 183–196.
5. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm (with discussion). J R Stat Soc B, 39, 1–38.
6. Eshima, N., & Asano, C. (1988). On latent distance analysis and the MLE algorithm.
Behaviormetrika, 24, 25–32.
7. Eshima, N., & Asano, C. (1989). Latent ordered class analysis. Bull Comput Stat Jpn, 2, 25–34.
(in Japanese).
8. Eshima, N. (1992). A hierarchical assessment of latent traits by using latent Guttman scaling.
Behaviormetrika, 19, 97–116.
9. Lazarsfeld, P. F., & Henry, N. M. (1968). Latent structure analysis. Boston: Houghton Mifflin.
10. Lindsay, B., Clogg, C., & Grego, J. (1991). Semiparametric estimation in the Rasch model and
related exponential response models, including a simple latent class model for item analysis.
Journal of American Statistical Association, 86, 96–107.
11. Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models: Bi-plots, and
related graphical displays. Sociological Methodology, 31, 223–264.
12. McHugh, R. B. (1956). Efficient estimation of local identification in latent class analysis.
Psychometrika, 20, 331–347.
13. Price, L. C., Dayton, C. M., & Macready, G. B. (1980). Discovery algorithms for hierarchical
relations. Psychometrika, 45, 449–465.
14. Proctor, C. H. (1970). A probabilistic formulation and statistical analysis of Guttman scaling.
Psychometrika, 35, 73–78.
15. Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Illinois: The
University of Chicago Press.
16. Stouffer, S. A., & Toby, J. (1951). Role conflict and personality. American Journal of Sociology, 56, 395–406.
17. Vermunt, J. K. (2010). Latent class models. Int Encycl Educ, 7, 238–244.
Chapter 4
Latent Class Analysis with Latent Binary
Variables: An Application for Analyzing
Learning Structures
4.1 Introduction
Usual latent class analysis is carried out without any assumptions on the latent response probabilities for test items. In this sense, the latent classes in the analysis are treated in parallel, and the analysis is referred to as latent class cluster analysis [10]. In Chap. 3, latent class analyses with ordered latent classes have been discussed with models that incorporate the ordered structures into the models themselves. In latent distance analysis, the response items are ordered with respect to the item levels (difficulties), which are located in a one-dimensional latent continuum, and an individual above a level responds positively to the corresponding item with a higher probability than an individual below that level. The latent distance model can also be applied to learning studies, for example, to assessing individuals' acquisition states of several skills for solving binary test items. Let $X_i$, $i = 1, 2, \ldots, I$ be manifest response variables corresponding to items $i$, such that
$$
X_i = \begin{cases}1 & (\text{success on item } i)\\ 0 & (\text{failure})\end{cases}, \tag{4.1}
$$
In this case, the test scales the states of the skill acquisitions, which are not observed directly, and thus the $S_i$ are viewed as latent binary variables. Under the above assumption, the following inequalities for the success probabilities for the test items are naturally required:
If the skills under study have prerequisite relations, for example, skill $i$ being prerequisite to skill $i+1$, $i = 1, 2, \ldots, I$, then the latent states of skill acquisition are
and the latent states correspond to latent classes, so the model is the same as the latent distance model. For example, assuming $S_1$ is the state of the addition skill in arithmetic, $S_2$ that of the multiplication skill, and $S_3$ that of the division skill, the skill of addition is prerequisite to that of multiplication and the skill of multiplication is prerequisite to that of division, and the scale patterns $(S_1, S_2, S_3)$ are $(0, 0, 0)$, $(1, 0, 0)$, $(1, 1, 0)$, and $(1, 1, 1)$. However, in general, the skills under consideration may not have such a linear order; for example, for skills $S_1, S_2, S_3, S_4$, there may be a case with skill patterns $(0, 0, 0, 0)$, $(1, 0, 0, 0)$, $(1, 1, 0, 0)$, $(1, 0, 1, 0)$, $(1, 1, 1, 0)$, $(1, 1, 1, 1)$. For treating such cases, extensions of the latent distance model were proposed by several authors [3–5, 8, 11].
In this chapter, latent class analysis with latent binary variables is discussed.
Section 4.2 reviews latent class models for dealing with scale patterns of the latent
variables. In Sect. 4.3, the ML estimation procedure for a structured latent class
model for explaining learning structures is discussed. Section 4.4 provides numerical
examples to demonstrate the analysis. In Sect. 4.5, an approach to consider learning
or developmental processes is given. Sections 4.6 and 4.7 consider a method for
evaluating mixed ratios of learning processes in a population. In Sect. 4.8, a path
analysis in learning and/or developmental structures is treated, and in Sect. 4.9, a
numerical example is provided to demonstrate the analysis. Finally, in Sect. 4.10,
a summary of the present chapter and discussions on the latent class analysis with
binary latent variables are given for leading to further studies to develop the present
approaches in the future.
In (4.1) and (4.2), let be the sample space of latent variable (skill or trait acquisition)
vector S = (S1 , S2 , . . . , S I ) and let v(s) be the latent class proportions with latent
variable vector S = s ∈ , where s = (s1 , s2 , . . . , s I ). Then, an extended version
of latent distance model (3.6) was made as follows [5]:
P(X = x) = v(s)P(X = x|S = s), (4.3)
s
where
$$
P(\mathbf{X}=\mathbf{x}\mid\mathbf{S}=\mathbf{s}) = \prod_{i=1}^{I}P(X_i=x_i\mid S_i=s_i)
= \prod_{i=1}^{I}\left(\frac{\exp(\alpha_i+s_i\exp(\beta_i))}{1+\exp(\alpha_i+s_i\exp(\beta_i))}\right)^{x_i}\left(\frac{1}{1+\exp(\alpha_i+s_i\exp(\beta_i))}\right)^{1-x_i}
= \prod_{i=1}^{I}\frac{\exp\{x_i(\alpha_i+s_i\exp(\beta_i))\}}{1+\exp(\alpha_i+s_i\exp(\beta_i))}. \tag{4.4}
$$
i=1
In which follows, the term “skill” is employed for convenience of the discussion.
In the above model, the intrusion (guessing) and omission (forgetting) error prob-
abilities for responding to items i, P(X i = 1|Si = 0) and P(X i = 0|Si = 1), are,
respectively, expressed as follows:
exp(αi )
P(X i = 1|Si = 0) = , P(X i = 0|Si = 1)
1 + exp(αi )
1
= , i = 1, 2, . . . , I. (4.5)
1 + exp(αi + exp(βi ))
The above inequalities are satisfied by the structured model (4.5), so this model
is an extension of the following three models.
As reviewed in Chap. 3, in Proctor [11], the intrusion and omission error
probabilities were given by
In this model, the intrusion and omission error probabilities are constant through
test items. Following the above model, in Macready and Dayton [3], the following
error probabilities are used:
In the above model, the intrusion and omission error probabilities are, respectively,
constant through the items. In Dayton and Macready [4],
In this model, the intrusion and omission error probabilities are equal for each test
item. The above three models do not satisfy the inequalities (4.7) in the parameter
estimation, without making any structures as model (4.5). In the next section, an
ML estimation procedure for model (4.3) with (4.4) is given according to the EM
algorithm.
where
$$
{}^{t}P(X_i=x_i\mid S_i=s_i) = \frac{\exp\big\{x_i\big({}^{t}\alpha_i+s_i\exp({}^{t}\beta_i)\big)\big\}}{1+\exp\big({}^{t}\alpha_i+s_i\exp({}^{t}\beta_i)\big)},\quad x_i = 0, 1.
$$
$$
\boldsymbol\phi_{(\alpha,\beta)u+1} = \boldsymbol\phi_{(\alpha,\beta)u} - H_u^{-1}\mathbf{g}_u,\quad u = 1, 2, \ldots, \tag{4.13}
$$
where
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{t}\boldsymbol\phi)}{\partial\alpha_i} = \sum_{\mathbf{s}}\sum_{\mathbf{x}}{}^{t+1}n(\mathbf{x},\mathbf{s})\big(x_i - P(X_i=1\mid S_i=s_i)\big),\quad i = 1, 2, \ldots, I;
$$
$$
\frac{\partial Q(\boldsymbol\phi\mid{}^{t}\boldsymbol\phi)}{\partial\beta_i} = \sum_{\mathbf{s}}\sum_{\mathbf{x}}{}^{t+1}n(\mathbf{x},\mathbf{s})\big(x_i - P(X_i=1\mid S_i=s_i)\big)s_i\exp(\beta_i),\quad i = 1, 2, \ldots, I;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{t}\boldsymbol\phi)}{\partial\alpha_i^2} = -\sum_{\mathbf{s}}\sum_{\mathbf{x}}{}^{t+1}n(\mathbf{x},\mathbf{s})P(X_i=1\mid S_i=s_i)\big(1-P(X_i=1\mid S_i=s_i)\big),\quad i = 1, 2, \ldots, I;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{t}\boldsymbol\phi)}{\partial\alpha_i\partial\beta_i} = -\sum_{\mathbf{s}}\sum_{\mathbf{x}}{}^{t+1}n(\mathbf{x},\mathbf{s})P(X_i=1\mid S_i=s_i)\big(1-P(X_i=1\mid S_i=s_i)\big)s_i\exp(\beta_i),\quad i = 1, 2, \ldots, I;
$$
$$
\frac{\partial^2 Q(\boldsymbol\phi\mid{}^{t}\boldsymbol\phi)}{\partial\beta_i^2} = \sum_{\mathbf{s}}\sum_{\mathbf{x}}{}^{t+1}n(\mathbf{x},\mathbf{s})\big\{x_i - P(X_i=1\mid S_i=s_i) - P(X_i=1\mid S_i=s_i)\big(1-P(X_i=1\mid S_i=s_i)\big)s_i\exp(\beta_i)\big\}s_i\exp(\beta_i),\quad i = 1, 2, \ldots, I;
$$
The above algorithm has been constructed for the case in which the latent sample space $\Omega$ contains all the $2^I$ skill acquisition patterns; however, to identify the latent class model, the number of latent classes $A$ is restricted by
The above algorithm has the following property. If we set the initial value of class proportion $v(\mathbf{s})$ as ${}^{0}v(\mathbf{s}) = 0$, from (4.10) we have ${}^{1}n(\mathbf{x},\mathbf{s}) = 0$ for all the manifest response patterns $\mathbf{x}$. From (4.12) it follows that ${}^{1}v(\mathbf{s}) = 0$, and inductively we obtain ${}^{t}v(\mathbf{s}) = 0$, $t = 1, 2, 3, \ldots$
Thus, restricting the initial positive class proportions to the set
$$
\Omega_0 = \{(0, 0, 0, 0),\ (1, 0, 0, 0),\ (1, 1, 0, 0),\ (1, 1, 1, 0),\ (1, 1, 1, 1)\},
$$
the above algorithm can be used for estimating the latent distance model. In the present latent class analysis, it is meaningful to detect latent classes (skill acquisition patterns) $\mathbf{s}$ with positive class proportions $v(\mathbf{s}) > 0$. In the next section, through numerical examples with the practical data sets of Chap. 2, an exploratory method for determining the latent classes is demonstrated.
By using the Stouffer-Toby data (Table 2.1), McHugh data (Table 2.5), and
Lazarsfeld-Stouffer data (Table 2.7), the present latent class analysis is demonstrated.
From restriction (4.15) with I = 4, we have A < 8, so the maximum number of latent
classes is seven. From this, considering the response data in Tables 2.1, 2.5, and 2.7,
the following skill acquisition patterns (latent classes) are assumed in the data sets
Table 4.1 The sets of initial skill acquisition patterns (0 ) for the three data sets
Stouffer-Toby data (0, 0, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)
McHugh data (0, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)
Lazarsfeld-Stouffer (0, 0, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)
data
(Table 4.1) as the initial skill acquisition patterns (latent classes). In order to select the best model, a backward elimination procedure is used. Let $\Omega_0$ be the initial set of skill acquisition patterns, for example, as in Table 4.1; let $M(\Omega_0)$ be the initial model with $\Omega_0$; and let $\widehat{M}(\Omega_0)$ be the ML estimate of $M(\Omega_0)$. According to $\widehat{M}(\Omega_0)$, the latent class with the minimum estimated proportion $\widehat{v}(\mathbf{s})$ is deleted from the initial skill acquisition patterns. Let $\Omega_1$ be the set of the remaining patterns, and let $M(\Omega_1)$ be the model with $\Omega_1$. Then, the ML estimates $\widehat{M}(\Omega_0)$ and $\widehat{M}(\Omega_1)$ are compared with the log likelihood ratio test or the Pearson chi-square test; if the difference is not statistically significant at significance level $\alpha$, $M(\Omega_1)$ is accepted and the procedure continues in the same way with $\Omega_1$ as the initial skill acquisition pattern set, whereas if the difference is significant, the procedure stops and model $M(\Omega_0)$ is selected as the most suitable model. The algorithm is shown as follows:
Backward Elimination Procedure
(i) Set $\Omega_0$ as the initial set of skill acquisition patterns and put $k = 0$.
(ii) Obtain the ML estimate $\widehat{M}(\Omega_k)$.
(iii) Delete the pattern $\mathbf{s}_k$ with the minimum value of $\widehat{v}(\mathbf{s})$ from $\Omega_k$ and set $\Omega_{k+1} = \Omega_k\setminus\{\mathbf{s}_k\}$.
(iv) Obtain the ML estimate $\widehat{M}(\Omega_{k+1})$.
(v) If $\widehat{M}(\Omega_{k+1})$ is accepted for (better than) $\widehat{M}(\Omega_k)$ according to the log likelihood ratio test of the relative goodness-of-fit to the data set, go to (iii) with $k$ replaced by $k+1$; if not, the procedure stops. A minimal sketch of this procedure is given below.
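The following sketch outlines the backward elimination loop. It assumes two helper callables, fit_model and lr_test, which stand in for the EM estimation of the structured latent class model and the likelihood ratio comparison; both names are hypothetical.

```python
def backward_elimination(omega0, fit_model, lr_test, alpha=0.05):
    """Backward elimination over skill-acquisition patterns.  A sketch:
    fit_model(omega) is assumed to return a fitted model with attributes
    .class_proportions (dict: pattern -> v_hat) and .loglik;
    lr_test(m_small, m_big) is assumed to return the P-value of the
    likelihood ratio test of the reduced model against the larger one."""
    omega = list(omega0)
    current = fit_model(omega)
    while len(omega) > 1:
        # (iii) delete the pattern with the smallest estimated proportion
        weakest = min(omega, key=lambda s: current.class_proportions[s])
        reduced_omega = [s for s in omega if s != weakest]
        reduced = fit_model(reduced_omega)
        # (v) keep the reduced model only if the fit is not significantly worse
        if lr_test(reduced, current) > alpha:
            omega, current = reduced_omega, reduced
        else:
            break
    return omega, current
```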
According to the above procedure, we have the final models shown in Table 4.2. For the Stouffer-Toby data and the Lazarsfeld-Stouffer data, the results are the same as those in Tables 2.3 and 2.9, respectively. It may be said that, concerning the Stouffer-Toby data, there exist latent "universalistic" and "particularistic" states for responding to the test items, and, with respect to the Lazarsfeld-Stouffer data, latent "favorable" and "unfavorable" states toward the Army. Hence, all four skills (traits) are equal, i.e., $S_1 = S_2 = S_3 = S_4$, and the learning structure is expressed accordingly. For the McHugh data, the results are interpreted as $S_1 = S_2$ and $S_3 = S_4$; the learning structure can be expressed as in Fig. 4.1, and the following two learning processes can be assumed:
Table 4.2 The results of the analysis of the three data sets
Item positive response probability
Pattern* Proportion** X1 X2 X3 X4
Stouffer-Toby (0, 0, 0, 0) 0.279 0.007 0.060 0.073 0.231
(1, 1, 1, 1) 0.721 0.286 0.670 0.646 0.868
Test of GF*** G 2 = 2.720, d f = 6, P = 0.843
McHugh (0, 0, 0, 0) 0.396 0.239 0.244 0.112 0.204
(1, 1, 0, 0) 0.077 0.894 0.996 0.112 0.204
(0, 0, 1, 1) 0.200 0.239 0.244 0.979 0.827
(1, 1, 1, 1) 0.327 0.894 0.996 0.979 0.827
Test of GF*** G 2 = 1.100, d f = 4, P = 0.894
Lazarsfeld-Stouffer (0, 0, 0, 0) 0.445 0.093 0.386 0.442 0.499
(1, 1, 1, 1) 0.555 0.572 0.818 0.906 0.944
Test of GF*** G 2 = 8.523, d f = 6, P = 0.202
* Skill Acquisition Pattern; ** Class Proportion; ***Test of Goodness-of-Fit
S1 → S2 → · · · → S I . (4.17)
and the space is called a learning space in this book. As in the previous section,
notation (sequence) (4.17) can also be expressed by using skill acquisition patterns
in learning space :
Since in the sequence of latent variables {Si } (4.17), Si+1 depends on only the
state of Si from the above discussion, we have the following theorem:
Theorem 4.1 Sequence (4.17) is a Markov chain.
From the above discussion, prerequisite relations among skills to be scaled can be
interpreted as learning processes shown in (4.17). Below, “structure” and “process”
will be used interchangeably where appropriate. Let
q_{10,i} = v(1, …, 1, 0, …, 0) / Σ_{k=i}^{I} v(1, …, 1, 0, …, 0),   q_{11,i} = 1 − q_{10,i},   i = 1, 2, …, I − 1,
where the pattern in the numerator has i leading 1's and the k-th term of the sum has k leading 1's.
S_i = 1 ⟺ S = (S_1, S_2, …, S_I) ∈ Ω_i
and
S_i = 1, S_{i+1} = 0 ⟺ S = (1, …, 1, 0, …, 0)   (with i leading 1's).
Hence
P(S_i = 1) = P((S_1, S_2, …, S_I) ∈ Ω_i) = Σ_{k=i}^{I} v(1, …, 1, 0, …, 0)   (k leading 1's),
P(S_i = 1, S_{i+1} = 0) = v(1, …, 1, 0, …, 0)   (i leading 1's),
and
P(S_j = 1 | S_i = 1) = ∏_{k=i}^{j−1} q_{11,k},   j > i.   (4.20)
Proof Since sequence (4.17) is a Markov chain with transition matrices (4.19), the
theorem follows.
The probabilities in (4.20) are calculated by multiplying the related path
coefficients, so we have the following definition:
Definition 4.1 In learning structure (4.17), for j > i, the probabilities P(S_j = 1 | S_i = 1) in (4.20) are defined as the pathway effects of S_i on S_j through path
S_i → S_{i+1} → · · · → S_j ,
whose path coefficients are q_{11,i}, q_{11,i+1}, …, q_{11,j−1}. The pathway effects are denoted by e_path(S_i → S_{i+1} → · · · → S_j).
In the above definition, paths S_i → S_{i+1} → · · · → S_j are partial paths of (4.17). If path S_{i1} → S_{i2} → · · · → S_{ik} is not a partial path of (4.17), then we set e_path(S_{i1} → S_{i2} → · · · → S_{ik}) = 0.
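As a small numerical illustration of (4.20) and Definition 4.1, the sketch below computes the coefficients q_{10,i}, q_{11,i} and a pathway effect from a vector of pattern proportions; the proportions used in the example call are hypothetical, not estimates from the text.

```python
import numpy as np

def path_coefficients(v):
    """v[i] = proportion of the pattern with exactly i acquired skills,
    i = 0, 1, ..., I (so v has length I + 1 and sums to one)."""
    I = len(v) - 1
    q10 = np.array([v[i] / v[i:].sum() for i in range(1, I)])  # q_{10,i}
    q11 = 1.0 - q10                                            # q_{11,i}
    return q10, q11

def pathway_effect(q11, i, j):
    """Pathway effect of S_i on S_j (j > i) in the linear structure (4.17):
    the product of the path coefficients q_{11,i}, ..., q_{11,j-1}."""
    return np.prod(q11[i - 1:j - 1])

# Hypothetical pattern proportions v(0...0), v(10...0), ..., v(1...1).
v = np.array([0.30, 0.10, 0.15, 0.20, 0.25])
q10, q11 = path_coefficients(v)
print(q10, q11, pathway_effect(q11, 1, 4))
```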
The above discussion is applied to the results of the latent distance analysis (Table
3.2) of Stouffer-Toby data set (Table 2.1). Since
q̂_{10,1} = v̂(1, 0, 0, 0) / (v̂(1, 0, 0, 0) + v̂(1, 1, 0, 0) + v̂(1, 1, 1, 0) + v̂(1, 1, 1, 1)) = 0.489,   q̂_{11,1} = 0.511,
and q̂_{10,2} and q̂_{10,3} are obtained in the same way from v̂(1, 1, 0, 0) and v̂(1, 1, 1, 0), respectively.
All the pathway effects in sequence (4.21) are shown in Table 4.3.
For McHugh data, the results of latent class analysis show there are two learning
processes (4.16) (Fig. 4.1). In this case, for S1 (= S2 ) and S3 (= S4 ), the learning
structure is a mixture of the following processes:
and it is meaningful to consider the mixed ratios of the learning processes in the
population. To treat such cases, the next section discusses general learning structures.
v(s_1, s_2, s_3, s_4) = Σ_{k=1}^{3} v(s_1, s_2, s_3, s_4, Process k).   (4.25)
Fig. 4.2 Path diagram of (4.23) based on the sample space (4.24)
v(1, 0, 0, 0) = v(1, 0, 0, 0, Process 1)
v(1, 1, 0, 0) = v(1, 1, 0, 0, Process 1)
v(0, 1, 1, 0) = v(0, 1, 1, 0, Process 2)
v(1, 0, 1, 0) = v(1, 0, 1, 0, Process 3).   (4.28)
Σ_{k=1}^{3} w_k = 1.
In this chapter, the above Eqs. (4.30)–(4.32) are called separating equations for
evaluating the mixed proportions wk , k = 1, 2, 3. From the above equations, we get
w_1 = (v(1, 0, 0, 0) + v(1, 1, 0, 0)) / (1 − (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)))
w_2 = (1 − w_1) · v(0, 1, 1, 0) / (v(0, 1, 1, 0) + v(1, 0, 1, 0))   (4.33)
w_3 = 1 − w_1 − w_2.
Remark 4.2 Let us consider the solution of separating equations in (4.33). It is seen
that
0 < w_1 < 1, and from the second equation of (4.33), 0 < w_2 < 1 − w_1, so we have
0 < w_3 < 1.
Hence, solution (4.33) is proper, and such solutions are called proper solutions.
Properties of the separating equations are discussed generally in the next section.
By using the above method, the mixed proportions of learning processes 1 and 2 (4.22) in the McHugh data are calculated. From Table 4.2, we have the following equations:
w_1 = w_1 (v(0, 0, 0, 0) + v(1, 1, 1, 1)) + v(1, 1, 0, 0),
w_2 = 1 − w_1.
w_1 = v(1, 1, 0, 0) / (1 − (v(0, 0, 0, 0) + v(1, 1, 1, 1))),   w_2 = v(0, 0, 1, 1) / (1 − (v(0, 0, 0, 0) + v(1, 1, 1, 1))).   (4.34)
Hence, from (4.34) and Table 4.2, the estimates of the mixed proportions are
calculated as follows:
ŵ_1 = 0.077 / (1 − (0.396 + 0.327)) = 0.278,   ŵ_2 = 1 − ŵ_1 = 0.722.
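The same estimates can be reproduced directly from the class proportions in Table 4.2; the short sketch below only restates the arithmetic of (4.34).

```python
# Class proportions for the McHugh data taken from Table 4.2.
v = {(0, 0, 0, 0): 0.396, (1, 1, 0, 0): 0.077,
     (0, 0, 1, 1): 0.200, (1, 1, 1, 1): 0.327}

denom = 1.0 - (v[(0, 0, 0, 0)] + v[(1, 1, 1, 1)])   # 1 - 0.723 = 0.277
w1 = v[(1, 1, 0, 0)] / denom                         # 0.077 / 0.277
w2 = v[(0, 0, 1, 1)] / denom                         # 0.200 / 0.277
print(round(w1, 3), round(w2, 3))                    # 0.278 0.722
```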
4.7 Solution of the Separating Equations
The separating equations introduced in the previous section for estimating the mixed proportions of learning processes are now considered within a general framework of learning structures. First, learning structures are classified into two types.
Definition 4.2 If all learning processes in a population have skill acquisition patterns
peculiar to them, the learning structure is called a clear learning structure. If not, the
learning structure is referred to as an unclear learning structure.
In the above definition, learning structures (4.16) and (4.23) are clear learning
structures, as shown in Figs. 4.1 and 4.2. On the other hand, the following learning
structure is an unclear one:
Process 1 (w_1): S_1 → S_2 → S_3 → S_4 ,
Process 2 (w_2): S_2 → S_1 → S_3 → S_4 ,   (4.35)
Process 3 (w_3): S_2 → S_3 → S_1 → S_4 .
Fig. 4.3 a Path diagram of (4.35). b Path diagram of the learning structure with Processes 1 and 2
in (4.35)
From the above structure, Fig. 4.3a is obtained. From the figure, learning Processes 1 and 3 have skill acquisition patterns (1, 0, 0, 0) and (0, 1, 1, 0) peculiar to them, respectively; however, there is no skill acquisition pattern peculiar to Process 2. Even if Process 2 is deleted from (4.35), the sample space of (S_1, S_2, S_3, S_4) remains the same; the resulting structure, expressed as in Fig. 4.3b, is then a clear one. With respect to learning structure (4.35), from Fig. 4.3a, we have
the following separating equations:
w_1 = (w_1 / (w_1 + w_2)) v(1, 1, 0, 0) + w_1 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)) + v(1, 0, 0, 0),
w_2 = (w_2 / (w_1 + w_2)) v(1, 1, 0, 0) + (w_2 / (w_2 + w_3)) v(0, 1, 0, 0) + w_2 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)),
w_3 = (w_3 / (w_2 + w_3)) v(0, 1, 0, 0) + w_3 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)) + v(0, 1, 1, 0).
w_1 = v(1, 0, 0, 0) / (v(1, 0, 0, 0) + v(1, 1, 0, 0)),   w_2 = 1 − w_1 − w_3,   w_3 = v(0, 1, 1, 0) / (v(0, 1, 0, 0) + v(0, 1, 1, 0)).   (4.36)
In this solution, we see that 0 < w_1 < 1 and 0 < w_3 < 1; however, there are cases where the condition 0 < w_2 < 1 does not hold. For example, if v(1, 0, 0, 0) = 0.1, v(1, 1, 0, 0) = 0.1, v(0, 1, 1, 0) = 0.3, and v(0, 1, 0, 0) = 0.1, then we have w_1 = 0.5, w_3 = 0.75, and w_2 = 1 − w_1 − w_3 = −0.25.
The above solution is improper. If we set w2 = 0, that is, a clear learning structure
is shown in Fig. 4.3b, the mixed proportions wk are calculated as follows:
w_1 = (v(1, 0, 0, 0) + v(1, 1, 0, 0)) / (v(1, 0, 0, 0) + v(0, 1, 0, 0) + v(1, 1, 0, 0) + v(0, 1, 1, 0)),   w_2 = 0,   w_3 = 1 − w_1.
The above solution is viewed as a proper solution for learning structure with
Processes 1 and 3 in (4.35), and is referred to as a boundary solution for learning
structure (4.35). With respect to the separating equations, in general we have the
following theorem:
Theorem 4.4 Let a clear learning structure be made of K learning processes, and
let wk , k = 1, 2, . . . , K be the mixed proportions of the processes. Then, the set of
separating equations has a proper solution such that
wk > 0, k = 1, 2, . . . , K ,
and
Σ_{k=1}^{K} w_k = 1.   (4.37)
Proof Suppose that a clear learning structure consists of Process 1, Process 2,…, and
Process K. Let Ω be the sample space of skill acquisition patterns s = (s_1, s_2, …, s_I) in the clear learning structure; let Ω_k (≠ ∅), k = 1, 2, …, K, be the set of all skill acquisition patterns peculiar to Process k; and let w_k be the proportions of individuals following Process k in the population. Then, the separating equations are expressed as follows:
w_k = Σ_{s ∈ Ω_k} v(s) + f_k(w_1, w_2, …, w_K | v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k),   k = 1, 2, …, K,   (4.38)
where f_k(w_1, w_2, …, w_K | v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k) are positive and continuous functions of w_i, i = 1, 2, …, K, given v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k. From Ω_k ≠ ∅, we have
Σ_{s ∈ Ω_k} v(s) > 0,   k = 1, 2, …, K.
Σ_{k=1}^{K} u_k = 1.
Theorem 4.5 Let a learning structure be made of K learning processes, and let
wk , k = 1, 2, . . . , K be the mixed proportions of the processes. Then, the set of
separating equations has a solution such that
wk ≥ 0, k = 1, 2, . . . , K , (4.40)
and
Σ_{k=1}^{K} w_k = 1.
Proof For a clear learning structure, the theorem follows from Theorem 4.4. On the other hand, for an unclear learning structure, deleting some learning processes from the structure, that is, setting the mixed proportions of the corresponding learning processes to zero, w_k = 0, yields a clear learning structure that has the same sample space of skill acquisition patterns as the original unclear learning structure. Then, we have a solution as in (4.40). This completes the proof.
A general method for obtaining solutions of the separating equations is given. In
general, a system of separating equations is expressed as follows:
and the above set is convex and closed. Then, function (4.42) has a fixed point
(w1 , w2 , w3 ) ∈ C. From this, the fixed point can be obtained as a convergence value
of the following sequence (wn1 , wn2 , . . . , wn K ), n = 1, 2, . . . :
Σ_{k=1}^{K} w_k = 1.
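A minimal sketch of such a fixed-point iteration is given below, using the separating equations of structure (4.35) as the map; the starting point and the pattern proportions are hypothetical, and plain iteration of the map is not guaranteed to converge in every case.

```python
import numpy as np

def F(w, v):
    """One application of the separating equations for structure (4.35);
    v is a dict of pattern proportions, w = (w1, w2, w3)."""
    w1, w2, w3 = w
    rest = v[(0, 0, 0, 0)] + v[(1, 1, 1, 0)] + v[(1, 1, 1, 1)]
    f1 = w1 / (w1 + w2) * v[(1, 1, 0, 0)] + w1 * rest + v[(1, 0, 0, 0)]
    f2 = (w2 / (w1 + w2) * v[(1, 1, 0, 0)]
          + w2 / (w2 + w3) * v[(0, 1, 0, 0)] + w2 * rest)
    f3 = w3 / (w2 + w3) * v[(0, 1, 0, 0)] + w3 * rest + v[(0, 1, 1, 0)]
    return np.array([f1, f2, f3])

def solve(v, w0=(1/3, 1/3, 1/3), tol=1e-10, max_iter=1000):
    w = np.array(w0)
    for _ in range(max_iter):
        w_new = F(w, v)
        w_new = w_new / w_new.sum()   # renormalize against rounding in v
        if np.max(np.abs(w_new - w)) < tol:
            break
        w = w_new
    return w

# Hypothetical pattern proportions over the sample space of (4.35).
v = {(0, 0, 0, 0): 0.35, (1, 0, 0, 0): 0.10, (0, 1, 0, 0): 0.08,
     (1, 1, 0, 0): 0.12, (0, 1, 1, 0): 0.05, (1, 1, 1, 0): 0.15,
     (1, 1, 1, 1): 0.15}
print(solve(v))
```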
e_path(S_i → S_j) = Σ_{k=1}^{K} w_k e_path(S_i → S_j | Process k).   (4.45)
The effects are the probabilities that paths Si → S j exist in the population
(learning structure). By using the above definition, path coefficients in a general
learning structure (4.44) can be calculated. In (4.35), for example, since
we have
Similarly, we have
Definition 4.4 Let path Si1 → Si2 → · · · → Si J be a partial path in (4.44). Then,
the pathway effect of Si1 on Si J is defined by
e_path(S_{i1} → S_{i2} → · · · → S_{iJ}) = Σ_{k=1}^{K} w_k e_path(S_{i1} → S_{i2} → · · · → S_{iJ} | Process k).
By using learning structure (4.35), the above definition is demonstrated. The path
diagram of latent variables Si , i = 1, 2, 3, 4 is illustrated in Fig. 4.4. For example,
the pathway effect of S1 → S2 → S3 is
e_path(S_1 → S_2 → S_3) = Σ_{k=1}^{3} w_k e_path(S_1 → S_2 → S_3 | Process k).
e_path(S_3 → S_1 → S_4) = Σ_{k=1}^{3} w_k e_path(S_3 → S_1 → S_4 | Process k) = w_3 e_path(S_3 → S_1 → S_4 | Process 3).
From the above learning processes, we have seven sub-structures. Let Structure (i) be the learning structure made up of Process i alone, i = 1, 2, 3; Structure (i, j) be the structure composed of Processes i and j, i < j; and let Structure (1, 2, 3) be the structure formed by the three processes in (4.46). Then, the skill acquisition patterns
in the learning structures are illustrated in Table 4.5. From the table, Structures
(1,2,3) and (1,3) have the same skill acquisition patterns, so in this sense, we cannot
identify the structures from latent class model (4.3). The path diagram based on
Fig. 4.5 Path diagram of skill acquisition patterns in learning Structure (1,2,3)
Fig. 4.6 Path diagram of skill acquisition patterns in learning Structure (1,3)
Table 4.6 The estimated positive response probabilities in learning Structure (1,2,3)
Skill acquisition pattern Class proportion Item response probability
Item 1 Item 2 Item 3 Item 4 Item 5
(0,0,0,0,0) 0.144 0.097 0.294 0.145 0.198 0.140
(0,0,0,0,1) 0.016 0.097 0.294 0.145 0.198 0.969
(0,0,0,1,0) 0.049 0.097 0.294 0.145 0.812 0.140
(0,0,0,1,1) 0.065 0.097 0.294 0.145 0.812 0.969
(0,0,1,1,0) 0.041 0.097 0.294 0.864 0.812 0.140
(0,0,1,1,1) 0.046 0.097 0.297 0.864 0.812 0.969
(0,1,1,1,1) 0.092 0.097 0.781 0.864 0.812 0.969
(1,1,1,1,1) 0.548 0.923 0.781 0.864 0.812 0.969
G2 = 16.499, (d f = 15, P = 0.350)
w_1 = v(0, 0, 0, 0, 1) / (v(0, 0, 0, 1, 0) + v(0, 0, 0, 0, 1)),
w_2 = 1 − w_1 − w_3,
w_3 = v(0, 0, 1, 1, 0) / (v(0, 0, 1, 1, 0) + v(0, 0, 0, 1, 1)).
By using the estimates in Table 4.5, the estimates of the mixed proportions are calculated as follows:
ŵ_1 = 0.246,   ŵ_2 = 0.368,   ŵ_3 = 0.386.
The above solution is a proper solution. The discussion in Sect. 4.5 is applied
to this example. Let v(s1 , s2 , s3 , s4 , s5 |Pr ocess k) be the proportions of individuals
with skills (s1 , s2 , s3 , s4 , s5 ) in Process k, k = 1, 2, 3. For example, considering
(4.47) for Process 1, we have
so it follows that
v(0, 0, 0, 0, 1 | Process 1) = (1/w_1) v(0, 0, 0, 0, 1),
v(0, 0, 0, 1, 1 | Process 1) = (1/(w_1 + w_2)) v(0, 0, 0, 1, 1).
In Process 1 in (4.46), the sequence is a Markov chain; let q^1_{11,i} be the related transition probabilities, for example, q^1_{11,5} is related to path S_5 → S_4 and q^1_{11,4} to path S_4 → S_3, and so on. Then, from Theorem 4.2, we have
From the above results and Table 4.6, first, we have the following path coefficients:
Process 1 (ŵ_1 = 0.246): S_5 → S_4 → S_3 → S_2 → S_1, with path coefficients 0.924, 0.866, 0.933, and 0.856 on the successive arrows.
According to Definition 4.3, second, the path coefficients of the above learning
structure are calculated, for example, we have
e path (S4 → S3 ) = w1 e path (S4 → S3 |Pr ocess 1) + w3 e path (S4 → S3 |Pr ocess 3)
= 0.246 × 0.866 + 0.386 × 0.923 = 0.571.
All the path coefficients calculated as above are illustrated in Fig. 4.7a. Third,
some pathway effects in the learning structure are demonstrated. For example,
Fig. 4.7 a Path coefficients of learning structure (4.46). b Path coefficients of learning Structure
(1,3)
and so on.
If the solution is improper, that is, there are negative estimates in the solution,
Process 2 is deleted from the learning structure (4.46). Then, the learning structure
becomes as follows:
Process 1 (w_1): S_5 → S_4 → S_3 → S_2 → S_1 ,   (4.50)
Process 3 (w_3): S_4 → S_3 → S_5 → S_2 → S_1 ,
w_1 = (v(0, 0, 0, 1, 1) + v(0, 0, 0, 0, 1)) / (v(0, 0, 0, 1, 1) + v(0, 0, 0, 0, 1) + v(0, 0, 0, 1, 0) + v(0, 0, 1, 1, 0)),   w_3 = 1 − w_1.
ŵ_1 = (0.065 + 0.016) / (0.065 + 0.016 + 0.049 + 0.041) = 0.474,   ŵ_3 = 1 − ŵ_1 = 0.526.
By using the above results, the path diagram of the skill acquisitions of Si , i =
1, 2, 3, 4, 5 is illustrated in Fig. 4.7b.
4.10 A Method for Ordering Skill Acquisition Patterns
In this chapter, skill acquisition patterns, which are expressed as latent classes, are explained with a latent class model that is an extended version of the latent distance model discussed in Chap. 3. In the latent distance model, linear learning structures are treated, so the skill acquisition patterns are naturally ordered; however, in the analysis of general learning structures, as treated in this chapter, such a natural ordering of the skill acquisition patterns cannot be made. For example, in the learning structures in Figs. 4.5 and 4.6, skill acquisition patterns (0, 0, 0, 0, 1), (0, 0, 0, 1, 1), (0, 0, 0, 1, 0), and (0, 0, 1, 1, 0) cannot be ordered in a natural sense; however, it may still be required to assess their levels in some manner. In this example, since it is clear that pattern (0, 0, 0, 0, 0) is the lowest and (1, 1, 1, 1, 1) the highest, it is sensible to measure distances from (0, 0, 0, 0, 0) or (1, 1, 1, 1, 1) to skill acquisition patterns (s_1, s_2, s_3, s_4, s_5) with an appropriate method. To this end, the entropy-based method for measuring distances between latent classes proposed in Chap. 2 (2.30) is used. In
latent class model (4.3) with (4.4), let 0 = (0, 0, . . . , 0) and s = (s1 , s2 , . . . , s I ), and
let P(X = x|0) and P(X = x|s) be the conditional distributions with the skill acqui-
sition patterns, respectively. Then, from (2.30) the entropy-based distance between
the skill acquisition patterns, i.e., latent classes, is defined by
D*(P(X = x | s) || P(X = x | 0)) = Σ_{i=1}^{I} { P(X_i = 1 | 0) log [P(X_i = 1 | 0) / P(X_i = 1 | s_i)]
+ (1 − P(X_i = 1 | 0)) log [(1 − P(X_i = 1 | 0)) / (1 − P(X_i = 1 | s_i))]
+ P(X_i = 1 | s_i) log [P(X_i = 1 | s_i) / P(X_i = 1 | 0)]
+ (1 − P(X_i = 1 | s_i)) log [(1 − P(X_i = 1 | s_i)) / (1 − P(X_i = 1 | 0))] }
= Σ_{i=1}^{I} (P(X_i = 1 | 0) − P(X_i = 1 | s_i)) { log [P(X_i = 1 | 0) / (1 − P(X_i = 1 | 0))] − log [P(X_i = 1 | s_i) / (1 − P(X_i = 1 | s_i))] },   (4.51)
where D*(P(X_i = x_i | s_i) || P(X_i = x_i | 0)) denotes the i-th summand of (4.51). Let
D*(P(X = x | s) || P(X = x | 0)) = Σ_{i=1}^{I} D*(P(X_i = x_i | s_i) || P(X_i = x_i | 0)) = Σ_{i=1}^{I} s_i D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)).   (4.52)
In Fig. 4.6, let 0 = (0, 0, . . . , 0) and s = (0, 0, 1, 1, 0), then, from (4.51) we have
D*(P(X = x | s) || P(X = x | 0)) = Σ_{i=3}^{4} D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)).
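The per-item distances in Table 4.7 can be reproduced from the positive response probabilities of the lowest and highest classes in Table 4.6 with the reduced form of (4.51); the sketch below does this and evaluates (4.52) for s = (0, 0, 1, 1, 0). Small discrepancies with the tabulated values are due to rounding.

```python
import numpy as np

def item_distance(p1, p0):
    """Entropy-based distance D*(P(X_i|1) || P(X_i|0)) for one binary item,
    using the reduced form in (4.51): (p0 - p1) * (logit(p0) - logit(p1))."""
    logit = lambda p: np.log(p / (1.0 - p))
    return (p0 - p1) * (logit(p0) - logit(p1))

# Positive response probabilities from Table 4.6:
# first row = lowest class (0,0,0,0,0), last row = highest class (1,1,1,1,1).
p0 = np.array([0.097, 0.294, 0.145, 0.198, 0.140])   # P(X_i = 1 | skill absent)
p1 = np.array([0.923, 0.781, 0.864, 0.812, 0.969])   # P(X_i = 1 | skill acquired)

d = item_distance(p1, p0)
print(np.round(d, 3))          # approx. [3.894 1.046 2.605 1.757 4.359], cf. Table 4.7

s = np.array([0, 0, 1, 1, 0])
print(round(float(s @ d), 3))  # approx. 4.362 for pattern (0,0,1,1,0), cf. Table 4.8
```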
Applying the above method to Table 4.6, the skill acquisition patterns are ordered.
Table 4.7 shows the entropy-based distances D ∗ (P(X i = xi |1)||P(X i = xi |0)), and
by using the distances, we have the distances D ∗ (P(X = x|s)||P(X = x|0)) which
are in an increasing order (Table 4.8). For example, with respect to skill acquisition
Table 4.7 Entropy-based distances D ∗ (P(X i = xi |1)||P(X i = xi |0)) with respect to manifest
variables X i for Table 4.5
Manifest variable X1 X2 X3 X4 X5
D ∗ (P(X i = xi |1)||P(X i = xi |0)) 3.894 1.046 2.605 1.757 4.359
Table 4.8 Entropy-based distances D ∗ (P(X = x|s)||P(X = x|0)) with respect to skill acquisi-
tion patterns for Table 4.5
Skill acquisition pattern D ∗ (P(X = x|s)||P(X = x|0))
(0, 0, 0, 0, 0) 0
(0, 0, 0, 1, 0) 1.757
(0, 0, 0, 0, 1) 4.359
(0, 0, 1, 1, 0) 4.362
(0, 0, 0, 1, 1) 6.116
(0, 0, 1, 1, 1) 8.721
(0, 1, 1, 1, 1) 9.767
(1, 1, 1, 1, 1) 13.661
patterns (0, 0, 1, 1, 0) and (0, 0, 0, 1, 1), the latter can be regarded as a higher level
than the former.
Remark 4.3 In the above method for grading the latent classes (skill acquisition
patterns), we can use the distances from 1 = (1, 1, . . . , 1) as well. Then, it follows
that
Hence, the results from the present ordering (grading) method based on 1 =
(1, 1, . . . , 1) are intrinsically the same as that based on 0 = (0, 0, . . . , 0).
In latent class model (4.3), let s1 = (s11 , s12 , . . . , s1I ) and s2 = (s21 , s22 , . . . , s2I )
be skill acquisition patterns. From (4.50) and (4.51), the difference between the two
skill acquisition patterns, i.e., latent classes, is calculated by
D*(P(X = x | s_1) || P(X = x | s_2)) = Σ_{i=1}^{I} |s_{1i} − s_{2i}| D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)).   (4.53)
For example, in the example shown in Tables 4.7 and 4.8, for s_1 = (0, 0, 1, 1, 0) and s_2 = (0, 1, 1, 1, 1), the patterns differ in skills 2 and 5, so the difference between the latent classes is 1.046 + 4.359 = 5.405.
The difference calculated above can be interpreted as the distance between the
latent classes, measured in entropy. Figure 4.8 shows an undirected graph and the
values are the entropy-based difference between the latent classes. The distance
between the latent classes can be calculated by summing the values in the shortest
way between the latent classes, for example, in Fig. 4.8, there are two shortest ways
between (0, 0, 0, 1, 0) and (0, 0, 1, 1, 1):
By the first way, the distance is calculated as 4.359 + 2.605 = 6.964, and the
same result is also obtained from the second way. It may be significant to make a
tree graph of latent classes by using cluster analysis with entropy (Chap. 2, Sect. 5),
in order to show the relationship of the latent classes. From Fig. 4.8, we have a tree
graph of the latent classes (Fig. 4.9).
Fig. 4.8 Undirected graph for explaining the differences between the latent classes
Fig. 4.9 A tree graph of latent classes in Table 4.6 based on entropy (leaf order: 00000, 00010, 00110, 00001, 00011, 00111, 01111, 11111)
Table 4.9 Entropy-based distances D ∗ (P(X i = xi |1)||P(X i = xi |0)) with respect to manifest
variables X i for Table 3.8 in Chap. 3
Manifest variable X 11 X 12 X 13 X 21 X 22 X 23
D ∗ (P(X i = xi |1)||P(X i = xi |0)) 3.193 4.368 3.637 4.413 2.855 5.543
Table 4.10 Entropy-based distances D ∗ (P(X = x|s)||P(X = x|0)) with respect to skill acquisi-
tion patterns for Table 3.8 in Chap. 3
Latent class D ∗ (P(X = x|s)||P(X = x|0)) Latent class D ∗ (P(X = x|s)||P(X = x|0))
(0, 0) 0 (0, 2) 8.398
(1, 0) 3.367 (1, 2) 12.036
(2, 0) 8.005 (2, 2) 16.404
(3, 0) 11.198 (3, 2) 19.596
(0, 1) 5.543 (0, 3) 12.812
(1, 1) 9.180 (1, 3) 16.449
(2, 1) 13.549 (2, 3) 20.817
(3, 1) 16.741 (3, 3) 24.010
The present method for grading latent classes is applied to an example treated
in Sect. 3.4 (Chap. 3). The estimated latent classes are expressed as score vectors
(i, j), i = 0, 1, 2, 3; j = 0, 1, 2, 3, which imply pairs of levels for the general
intelligence θ1 and the verbal ability of children θ2 , respectively. In this example,
although the levels of child ability can be graded according to the sum of the scores,
t = i + j, it may be meaningful to use the present method for the grading. From
the estimated model shown in Table 3.8, we have the entropy-based distances with
respect to manifest variables (Table 4.9). Distances D ∗ (P(X = x|s)||P(X = x|0))
are calculated in Table 4.10, where 0 = (0, 0). By using the distances, a grading of the latent classes can be made. For example, for score t = 3, the order of the latent classes (3, 0), (2, 1), (1, 2), and (0, 3) by distance from (0, 0) is (3, 0), (1, 2), (0, 3), (2, 1).
4.11 Discussion
The present chapter has applied latent class analysis to explain learning structures.
Skill acquisitions are scaled with the related test items (manifest variables), and the
states of skill acquisition are expressed by latent binary variables, and thus, manifest
responses measure the states with response errors, i.e., omission (forgetting) and
intrusion (guessing) ones. The structures expressed in this context are called the
learning structures in this book. When the skills under consideration are ordered
with respect to prerequisite relationships, for example, for skills in calculation, (1)
addition, (2) multiplication, and (3) division, the learning structure is called a linear
learning structure. The model in this chapter is an extension of the latent distance
model. From the learning structure, the traces of skill learning process in a population
can be discussed through the path diagrams of skill acquisition patterns, and based
on the traces, dynamic interpretations of the learning structures can be made. In
general, learning structures are not necessarily linear, that is, there exist some learning
processes of skills in a population. Hence, it is valid to assume that the population
is divided into several subpopulations that depend on learning processes of their
own. The present chapter gives a method to explain learning processes of skills
by using cross-sectional data. It is assumed that manifest variables depend only on
the corresponding latent variables (states of skills); however, it is more realistic to
introduce “transfer effects” in the latent class models [1, 2]. In the above example
of skills for calculation, it is easily seen that the skill of addition is prerequisite to
that of multiplication. In this case, the mastery of the skill of multiplication will
facilitate the responses to test items for addition, i.e., a “facilitating” transfer effect
of multiplication on addition, and thus, it is more appropriate to take the transfer
effects into account when discussing learning structures. Conversely, there may be cases where "inhibiting" transfer effects should be considered in the analysis of learning structures [9]. In this chapter, "transfer effects" have not been hypothesized in the
latent class model. It is significant to go into further studies to handle the transfer
effects as well as prerequisite relationships between skills in studies on learning.
Approaches to pairwise assessment of prerequisite relationships between skills were
made by several authors, for example, White and Clark [12], Macready [9], and
Eshima et al. [5]. Macready [7] made the first attempt to deal with transfer effects in a pairwise assessment of skill acquisition by using latent class models with equality constraints. In order to improve the model, Eshima et al. [5] proposed a latent class model structured with skill acquisition and transfer effect parameters for making pairwise assessments of skill acquisition; however, the transfer effect parameters in the model are common to the related manifest variables. The study of the pairwise assessment of prerequisite relationships among skills is important for explaining learning structures, and studies based on latent structure models remain significant themes for future research.
References
6. Eshima, N., Asano, C., & Tabata, M. (1996). A developmental path model and causal analysis
of latent dichotomous variables. British Journal of Mathematical and Statistical Psychology,
49, 43–56.
7. Eshima, N., Asano, C., & Obana, E. (1990). A latent class model for assessing learning
structures. Behaviormetrika, 28, 23–35.
8. Goodman, L. A. (1975). A new model for scaling response patterns: An application of quasi-
independent concept. Journal of the American Statistical Association, 70, 755–768.
9. Macready, G. B. (1982). The use of latent class models for assessing prerequisite relations and
transference among traits. Psychometrika, 47, 477–488.
10. Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models: Bi-plots, and
related graphical displays. Sociological Methodology, 31, 223–264.
11. Proctor, C. H. (1970). A probabilistic formulation and statistical analysis of Guttman scaling.
Psychometrika, 35, 73–78.
12. White, R. T., & Clark, R. M. (1973). A test of inclusion which allows for errors of measurement.
Psychometrika, 38, 77–86.
Chapter 5
The Latent Markov Chain Model
5.1 Introduction
The Markov chain model is important for describing time-dependent changes of states
in human behavior; however, when observing changes of responses to a question
about a particular characteristic, individuals’ responses to the question may not reflect
the true states of the characteristic. As an extension of the Markov chain model, the
latent Markov chain model was proposed in an unpublished Ph.D. dissertation by L. M. Wiggins in 1955 [5, 14]. The assumptions of the model are (i) at every observed
time point, a population is divided into several latent states, which are called latent
classes as well in the present chapter; (ii) an individual in the population takes one of
the manifest states according to his or her latent state at the time point; and (iii) the
individual changes the latent states according to a Markov chain. The assumptions
are the same as those of the hidden Markov model in time series analysis [6, 7].
In behavioral sciences, for individuals in a population, responses to questions about
particular characteristics may be observed several times to explain the changes of
responses. In this case, the individuals’ responses to the questions are viewed as the
manifest responses that may not reflect their true states at the observed time points,
that is, intrusion and omission errors have to be taken into consideration. The response
categories to be observed are regarded as the manifest states and the true states of
the characteristics, which are not observed directly, as the latent states. Concerning
parameter estimation in the latent Markov chain model, algebraic methods were studied by Katz and Proctor [15] and by Lazarsfeld and Henry [14]. The methods were given for cases where the number of manifest states equals that of latent states, and so could not treat general cases. Moreover, the methods may produce improper estimates of the transition probabilities, for example, negative estimates, and no method for assessing the goodness-of-fit of the model to data sets was given. These shortcomings hindered the application of the model to practical research in the behavioral sciences. The Markov chain model is a discrete-time model, and may be regarded as an approximation to a continuous-time model. In most social or behavioral phenomena, an
Let X_t be manifest variables that take values on sample space Ω_manifest = {1, 2, . . . , J} at time points t = 1, 2, . . . , and let S_t be the corresponding latent variables on sample space Ω_latent = {1, 2, . . . , A}. In what follows, states on Ω_manifest are called manifest states and those on Ω_latent latent states. At time point t, it is assumed that an individual in a population takes a manifest state on Ω_manifest according to his or her latent state on Ω_latent, and that he or she changes the latent states according to a (first-order)
Markov chain St , t = 1, 2, . . . . First, the Markov chain is assumed to be time-
homogeneous, that is, the transition probabilities are independent of time points. Let
m ab , a, b = 1, 2, . . . , A be the transition probabilities; let va , a = 1, 2, . . . , A be
the probabilities of S1 = a, that is, the initial state distribution; and let pa j be the
probabilities of X t = j, given St = a, that is, pa j = P(X t = j|St = a), and let
p(x1 , x2 , . . . , x T ) be the probabilities with which an individual takes manifest state
transition x1 → x2 → · · · → x T . Then, the following accounting equations can be
obtained:
p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} p_{s_1 x_1} ∏_{t=1}^{T−1} m_{s_t s_{t+1}} p_{s_{t+1} x_{t+1}},   (5.1)
where the summation in the above equations implies that over all latent states s =
(s1 , s2 , . . . , sT ). The parameters are restricted as
Σ_{a=1}^{A} v_a = 1,   Σ_{b=1}^{A} m_{ab} = 1,   Σ_{x=1}^{J} p_{ax} = 1.   (5.2)
The above equations specify the time-homogeneous latent Markov chain model
that is an extension of the Markov chain model, which is expressed by setting A = J
and paa = 1, a = 1, 2, . . . , A. The Markov chain model is expressed as
p(x_1, x_2, . . . , x_T) = v_{x_1} ∏_{t=1}^{T−1} m_{x_t x_{t+1}}.
For the latent Markov chain model, the path diagram of manifest variables X t and
latent variables St is illustrated in Fig. 5.1.
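The probability p(x_1, ..., x_T) in (5.1) can be evaluated without enumerating all latent state transitions by a forward recursion over the latent states, as for hidden Markov models; a minimal sketch with hypothetical parameter values follows.

```python
import numpy as np

def sequence_probability(x, v, M, P):
    """p(x_1, ..., x_T) under the time-homogeneous latent Markov chain (5.1).
    v: initial latent distribution (A,), M: transition matrix (A, A),
    P: response probabilities (A, J) with P[a, j] = p_{aj}; x: 0-based states."""
    alpha = v * P[:, x[0]]                # alpha_1(a) = v_a * p_{a x_1}
    for t in range(1, len(x)):
        alpha = (alpha @ M) * P[:, x[t]]  # forward recursion over latent states
    return alpha.sum()

# Hypothetical two-state example (A = 2, J = 2).
v = np.array([0.6, 0.4])
M = np.array([[0.9, 0.1],
              [0.2, 0.8]])
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
print(sequence_probability([0, 0, 1], v, M, P))
```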
Second, the non-homogeneous model, that is, the latent Markov chain model
with non-stationary transition probabilities, is treated. Let m (t)ab , a, b = 1, 2, . . . , A
be transition probabilities at time point t = 1, 2, . . . , T − 1. Then, the accounting
equations are given by
p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} p_{s_1 x_1} ∏_{t=1}^{T−1} m_{(t) s_t s_{t+1}} p_{s_{t+1} x_{t+1}},   (5.3)
where
Σ_{a=1}^{A} v_a = 1,   Σ_{b=1}^{A} m_{(t)ab} = 1,   Σ_{x=1}^{J} p_{ax} = 1.   (5.4)
In the above model, it is assumed that the manifest response probabilities pa j are
independent of time points. If the probabilities depend on the observed time points,
the probabilities are expressed as p(t)ax , and then, the above accounting equations
are modified as follows:
p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} p_{(1) s_1 x_1} ∏_{t=1}^{T−1} m_{(t) s_t s_{t+1}} p_{(t+1) s_{t+1} x_{t+1}},   (5.5)
where
Σ_{a=1}^{A} v_a = 1,   Σ_{b=1}^{A} m_{(t)ab} = 1,   Σ_{x=1}^{J} p_{(t)ax} = 1.   (5.6)
p(x_1, x_2, . . . , x_T) = Σ_{a=1}^{A} v_a ∏_{t=1}^{T} p_{(t) a x_t},   (5.7)
p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} ∏_{t=1}^{T−1} m_{(t) s_t s_{t+1}} ∏_{t=1}^{T} p_{(t) s_t x_t},   (5.8)
Theorem 5.1 The latent class model (1.2) and the latent Markov chain model (5.5)
are equivalent.
Remark 5.1 The latent Markov chain models treated above have responses to
one question (manifest variable) at each observed time point. Extended versions of
the models can be constructed by introducing a set of questions (a manifest vari-
able vector) X = (X 1 , X 2 , . . . , X I ). For the manifest variable vector, responses are
observed as
x1 → x2 → · · · → xT ,
where
x t = (xt1 , xt2 , . . . , xt I ), t = 1, 2, . . . , T.
Setting T = 1, the above model is the usual latent class model (1.2). Let p_{i s_t x_{ti}}, i = 1, 2, . . . , I, be the response probabilities for manifest variables X_t = (X_1, X_2, . . . , X_I), given latent state S_t = s_t, that is, ∏_{i=1}^{I} p_{i s_t x_{ti}} at time points t = 1, 2, . . . , T. Then, model (5.5) is extended as
p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} ∏_{i=1}^{I} p_{i s_1 x_{1i}} ∏_{t=1}^{T−1} m_{(t) s_t s_{t+1}} ∏_{i=1}^{I} p_{i s_{t+1} x_{t+1,i}},   (5.9)
where
Σ_{a=1}^{A} v_a = 1,   Σ_{b=1}^{A} m_{(t)ab} = 1,   Σ_{x=1}^{J} p_{iax} = 1,
and notation Σ_s implies the summation over all latent state transitions s = (s_1, s_2, . . . , s_T). An ML estimation procedure via the EM algorithm can be built with a method similar to the above ones. The above model is related to a multivariate extension of the latent Markov chain model with covariates by Bartolucci and Farcomeni [3].
First, the EM algorithm is considered for model (5.1) with constraints (5.2). Let n(x_1, x_2, . . . , x_T) be the numbers of individuals who take manifest state transitions (responses) x_1 → x_2 → · · · → x_T; let n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T) be those with the manifest state transitions and latent state transitions s_1 → s_2 → · · · → s_T; and let N be the total number of observed individuals. Then,
n(x_1, x_2, . . . , x_T) = Σ_s n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T),   N = Σ_x n(x_1, x_2, . . . , x_T),
where Σ_s and Σ_x imply summations over s = (s_1, s_2, . . . , s_T) ∈ ∏_{t=1}^{T} Ω_latent and x = (x_1, x_2, . . . , x_T) ∈ ∏_{t=1}^{T} Ω_manifest, respectively. In this model, the complete and incomplete data are expressed by sets D_complete = {n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T)} and D_incomplete = {n(x_1, x_2, . . . , x_T)}, respectively. Let ϕ = ((v_a), (m_{ab}), (p_{ax})) be the parameter vector. Then, we have the following log likelihood function of ϕ, given the complete data:
l(ϕ | D_complete) = Σ_{x,s} n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T) { log v_{s_1} + Σ_{t=1}^{T−1} log m_{s_t s_{t+1}} + Σ_{t=1}^{T} log p_{s_t x_t} },   (5.10)
where Σ_{x,s} implies the summation over manifest and latent state transition patterns x = (x_1, x_2, . . . , x_T) and s = (s_1, s_2, . . . , s_T). The model parameters ϕ are estimated by the EM algorithm. Let ʳϕ = ((ʳv_a), (ʳm_{ab}), (ʳp_{ax})) be the estimates at the r-th iteration in the M-step, and let ʳ⁺¹D_complete = {ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T)} be the conditional expectations of the complete data D_complete at the (r + 1)-th iteration in the E-step. Then, the E- and M-steps are given as follows.
(i) E-step
In this step, the conditional expectation of (5.10) given parameters r ϕ and Dincomplete
is calculated, that is,
Q(ϕ | ʳϕ) = E[ l(ϕ | D_complete) | ʳϕ, D_incomplete ].   (5.11)
Since the complete data are sufficient statistics, the step is reduced to calculating
the conditional expectations of the complete data n(x1 , x2 , . . . , x T ; s1 , s2 , . . . , sT )
and we have
ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T) = n(x_1, x_2, . . . , x_T) · [ ʳv_{s_1} ∏_{t=1}^{T−1} ʳm_{s_t s_{t+1}} ∏_{t=1}^{T} ʳp_{s_t x_t} ] / [ Σ_s ʳv_{s_1} ∏_{t=1}^{T−1} ʳm_{s_t s_{t+1}} ∏_{t=1}^{T} ʳp_{s_t x_t} ].   (5.12)
(ii) M-step
Function (5.11) is maximized with respect to parameters vs1 , m st st+1 , and pst xt under
constraints (5.2). By using Lagrange multipliers, κ, λa , a = 1, 2, . . . , A and μc , c =
1, 2, . . . , A, the Lagrange function is given by
L = Q(ϕ | ʳϕ) − κ Σ_{a=1}^{A} v_a − Σ_{a=1}^{A} λ_a Σ_{b=1}^{A} m_{ab} − Σ_{c=1}^{A} μ_c Σ_{x=1}^{J} p_{cx}.   (5.13)
∂L/∂v_a = 0,   a = 1, 2, . . . , A;   ∂L/∂m_{ab} = 0,   a, b = 1, 2, . . . , A;   ∂L/∂p_{ax} = 0,   a = 1, 2, . . . , A,   x = 1, 2, . . . , J.
Solving the above equations, we obtain
ʳ⁺¹v_a = (1/N) Σ_{x, s\1} ʳ⁺¹n(x_1, x_2, . . . , x_T; a, s_2, . . . , s_T),   a = 1, 2, . . . , A,   (5.14)
where Σ_{x, s\1} implies the summation over all x = (x_1, x_2, . . . , x_T) and s\1 = (s_2, s_3, . . . , s_T);
ʳ⁺¹m_{ab} = [ Σ_{t=1}^{T−1} Σ_{x, s\t,t+1} ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, . . . , s_{t−1}, a, b, s_{t+2}, . . . , s_T) ] / [ Σ_{t=1}^{T−1} Σ_{x, s\t} ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T) ],   a, b = 1, 2, . . . , A,   (5.15)
where Σ_{x, s\t} implies the summation over all x = (x_1, x_2, . . . , x_T) and s\t = (s_1, . . . , s_{t−1}, s_{t+1}, . . . , s_T);
ʳ⁺¹p_{ab} = [ Σ_{t=1}^{T} Σ_{x\t, s\t} ʳ⁺¹n(x_1, . . . , x_{t−1}, b, x_{t+1}, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T) ] / [ Σ_{t=1}^{T} Σ_{x, s\t} ʳ⁺¹n(x_1, . . . , x_{t−1}, x_t, x_{t+1}, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T) ],   a = 1, 2, . . . , A;   b = 1, 2, . . . , J,   (5.16)
where Σ_{x\t, s\t} implies the summation over all x\t = (x_1, . . . , x_{t−1}, x_{t+1}, . . . , x_T) and s\t = (s_1, . . . , s_{t−1}, s_{t+1}, . . . , s_T).
Similarly, for the other models, the model identification conditions can be
derived.
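For small A and T, the E-step (5.12) can be written down directly by enumerating all latent state transitions; the sketch below does this for the time-homogeneous model with hypothetical parameter values. It is meant only to make the formula concrete; a practical implementation would use forward-backward recursions instead of full enumeration.

```python
import itertools
import numpy as np

def e_step_counts(n_obs, v, M, P):
    """Conditional expectations (5.12) of the complete-data counts
    n(x_1,...,x_T; s_1,...,s_T), given current parameters (v, M, P).
    n_obs maps an observed sequence x (tuple of 0-based states) to n(x)."""
    A = len(v)
    expected = {}
    for x, n in n_obs.items():
        T = len(x)
        weights = {}
        for s in itertools.product(range(A), repeat=T):
            w = v[s[0]] * P[s[0], x[0]]
            for t in range(T - 1):
                w *= M[s[t], s[t + 1]] * P[s[t + 1], x[t + 1]]
            weights[s] = w
        total = sum(weights.values())        # denominator of (5.12)
        for s, w in weights.items():
            expected[(x, s)] = n * w / total
    return expected

# Hypothetical parameters; the expected counts for one sequence sum to n(x).
v = np.array([0.5, 0.5])
M = np.array([[0.9, 0.1], [0.2, 0.8]])
P = np.array([[0.8, 0.2], [0.3, 0.7]])
print(sum(e_step_counts({(0, 1): 10}, v, M, P).values()))  # 10.0
```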
Second, an ML estimation procedure for model (5.3) with constraints (5.4) is given. Let ʳϕ = ((ʳv_a), (ʳm_{(t)ab}), (ʳp_{ax})) be the estimates at the r-th iteration in the M-step, and let ʳ⁺¹D_complete = {ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T)} be the conditional expectations of the complete data D_complete at the (r + 1)-th iteration in the E-step. Then, the E- and M-steps are given as follows.
(i) E-step
ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T) = n(x_1, x_2, . . . , x_T) · [ ʳv_{s_1} ∏_{t=1}^{T−1} ʳm_{(t) s_t s_{t+1}} ∏_{t=1}^{T} ʳp_{s_t x_t} ] / [ Σ_s ʳv_{s_1} ∏_{t=1}^{T−1} ʳm_{(t) s_t s_{t+1}} ∏_{t=1}^{T} ʳp_{s_t x_t} ].   (5.19)
(ii) M-step
Estimates r +1 va and r +1 pax are given by (5.14) and (5.16), respectively. We have
ʳ⁺¹m_{(t)ab} as follows:
ʳ⁺¹m_{(t)ab} = [ Σ_{x, s\t,t+1} ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, . . . , s_{t−1}, a, b, s_{t+2}, . . . , s_T) ] / [ Σ_{x, s\t} ʳ⁺¹n(x_1, x_2, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T) ],   a, b = 1, 2, . . . , A;   t = 1, 2, . . . , T − 1.   (5.20)
(ii) M-step
Estimates r +1 va and r +1 m (t)ab are given by (5.14) and (5.20), respectively.
Estimates ʳ⁺¹p_{(t)ab} are calculated as follows:
ʳ⁺¹p_{(t)ab} = [ Σ_{x\t, s\t} ʳ⁺¹n(x_1, . . . , x_{t−1}, b, x_{t+1}, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T) ] / [ Σ_{x, s\t} ʳ⁺¹n(x_1, . . . , x_{t−1}, x_t, x_{t+1}, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T) ],   a = 1, 2, . . . , A;   b = 1, 2, . . . , J;   t = 1, 2, . . . , T.   (5.22)
The parameter estimation procedures in the previous section have the following
properties.
Theorem 5.2 In the parameter estimation procedures (5.11)–(5.16) for the time-
homogeneous latent Markov chain model (5.1) with (5.2), if some of the initial trial values ⁰v_a, ⁰m_{ab}, and ⁰p_{ab} are set to the extreme values 0 or 1, then the iterative values remain fixed at those values throughout the algorithm.
By using the above values, formula (5.16) derives ¹p_{ab} = 0. Hence, inductively it follows that
ᵗp_{ab} = 0,   t = 1, 2, . . . .
The above model can be estimated via the procedure mentioned in the previous section by setting
⁰m_{12} = ⁰m_{13} = ⁰m_{21} = 0.
From Theorem 5.2, the above values are held fixed through the iterations, that is,
ʳm_{12} = ʳm_{13} = ʳm_{21} = 0,   r = 1, 2, . . . .
Similarly, setting A = J and ⁰p_{ab} = 0 for a ≠ b, the estimation procedure derives the parameter estimates for the time-homogeneous Markov chain model. In the general model (5.5) with (5.6), if we set
⁰m_{(t)ab} = 0,   a ≠ b,
and formally identify the states X_t at time points t as item responses, the EM algorithm for the model derives the ML estimates of the usual latent class model.
Table 5.1 shows the data [15] obtained by observing the changes in the configuration
of interpersonal relationships in a group of 25 pupils at three time points: September,
November, and January. They were asked “with whom would you like to sit?”, and
considering the state of each pair of pupils, the state concerned is one of the following three: mutual choice, one-way choice, and indifference, coded as "2", "1", and "0", respectively. The observations were carried out three times at two-month intervals.
In this case, the latent Markov chain model (5.1) or (5.5) can be used. First, the data
are analyzed by use of the latent Markov chain models and the Markov chain models
with three latent classes (states). The results of the analysis are shown in Table 5.2,
and the estimated parameters of Markov models are illustrated in Tables 5.3 and 5.4.
The latent Markov chain models fit the data set well according to the log likelihood ratio test statistic G²,
G² = 2 Σ_x n(x_1, x_2, . . . , x_T) log [ n(x_1, x_2, . . . , x_T) / ( N p̂(x_1, x_2, . . . , x_T) ) ].
Table 5.2 Results of the analysis of data set I with Markov models
Model G2 df P-val. AIC
Time-homogeneous Markov chain model 27.903 18 0.064 100.369
Time-homogeneous latent Markov chain model 18.641 16 0.288 95.106
Non-homogeneous Markov chain model 17.713 14 0.220 98.179
Non-homogeneous latent Markov chain model 12.565 12 0.401 97.030
Table 5.5 shows an artificial data set, and binary variables X ti , t = 1, 2 imply mani-
fest variables for the same items i = 1, 2, 3, where variables X ti , i = 1, 2, 3 are
indicators of latent variables St , t = 1, 2, assuming the three questions are asked to
the same individuals at two time points. All variables are binary, so the states are
denoted as 1 and 0. The data sets are made in order to demonstrate the estimation
procedure for model (5.9) with two latent classes A = 2 and the number of obser-
vation time points T = 2. Since the ML estimation procedure via the EM algorithm
can be constructed as in Sect. 5.3, the details are left for readers. The results of
the parameter estimation are given in Table 5.6. According to the transition matrix,
latent state “1” may be interpreted as a conservative one, and latent state 2 a less
conservative one. In effect, the latent state distribution at the second time point is
calculated by
(0.644, 0.356) [0.854 0.146; 0.492 0.508] = (0.725, 0.275),
and it implies that the proportion of individuals in the first latent state increases. If necessary, the distributions of S_t, t ≥ 3, are calculated by
(0.644, 0.356) [0.854 0.146; 0.492 0.508]^{t−1},   t = 3, 4, . . . .
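The above calculation is a single vector-matrix multiplication; the sketch below reproduces it, and the t-step extension, with the estimates quoted in the text.

```python
import numpy as np

v1 = np.array([0.644, 0.356])            # estimated latent distribution at t = 1
M = np.array([[0.854, 0.146],
              [0.492, 0.508]])           # estimated transition matrix

v2 = v1 @ M
print(np.round(v2, 3))                   # [0.725 0.275], as in the text

# Distribution of S_t for t >= 3: multiply by M^(t-1).
v4 = v1 @ np.linalg.matrix_power(M, 3)   # e.g. t = 4
print(np.round(v4, 3))
```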
Before constructing a more general model, a data set given in Bye and Schechter [8] is
discussed. Table 5.7 illustrates the data from Social Security Administration services,
and the individuals were assessed as severe or not severe with respect to the extent
of work limitations. The observations were made in 1971, 1972, and 1974, where
response “severe” is represented as “1” and “not severe” “0”. The interval between
1972 and 1974 is two years and that between 1971 and 1972 is one year, that is,
the time interval between 1972 and 1974 is twice as long as that between 1971
and 1972. When applying the Markov model to Data Set II, it is valid to assume
that the observation in 1973 was missed, though the changes of latent states still took place; that is, the transitions of manifest and latent states are X_1 → X_2 → X_3 and S_1 → S_2 → U → S_3, respectively. Thus, the joint state transition can be expressed
by
(X 1 , S1 ) → (X 2 , S2 ) → U → (X 3 , S3 ).
In order to analyze the data set, a more general model was proposed by Bye
and Schechter [8]. By using the notations in the previous section, for the time-
homogeneous latent Markov chain model, the accounting equations are given by
p(x_1, x_2, x_3) = Σ_s Σ_{u=1}^{A} v_{s_1} m_{s_1 s_2} m_{s_2 u} m_{u s_3} ∏_{t=1}^{3} p_{s_t x_t},   (5.23)
(ii) M-step
ʳ⁺¹v_a = (1/N) Σ_{x, s\1} Σ_{u=1}^{A} ʳ⁺¹n(x_1, x_2, x_3; a, s_2, s_3; u),   a = 1, 2, . . . , A;   (5.25)
ʳ⁺¹p_{ab} = [ Σ_{t=1}^{T} Σ_{x\t, s\t} Σ_{u=1}^{A} ʳ⁺¹n(x_1, . . . , x_{t−1}, b, x_{t+1}, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T; u) ] / [ Σ_{t=1}^{T} Σ_{x, s\t} Σ_{u=1}^{A} ʳ⁺¹n(x_1, . . . , x_{t−1}, x_t, x_{t+1}, . . . , x_T; s_1, . . . , s_{t−1}, a, s_{t+1}, . . . , s_T; u) ],   T = 3,   a = 1, 2, . . . , A;   b = 1, 2, . . . , J,   (5.26)
ʳ⁺¹m_{ab} = (D_{ab} + E_{ab} + F_{ab}) / Σ_{b=1}^{A} (D_{ab} + E_{ab} + F_{ab}),   (5.27)
where
D_{ab} = Σ_{x, s\1,2} Σ_{u=1}^{A} ʳ⁺¹n(x_1, x_2, x_3; a, b, s_3; u),
E_{ab} = Σ_{x, s\2} ʳ⁺¹n(x_1, x_2, x_3; s_1, a, s_3; b),
F_{ab} = Σ_{x, s\3} ʳ⁺¹n(x_1, x_2, x_3; s_1, s_2, b; a).
For the data set, model (5.23) is used for A = 2, J = 2 and the ML estimates
of the parameters are obtained with the above procedure. Testing the goodness-of-fit
of the model to the data set, we have G 2 = 4.754, d f = 2, and P = 0.093 and it
implies the goodness-of-fit of the model to the data set is fair. The estimates of the
parameters are given in Table 5.8.
Remark 5.3 Bye and Schechter [8] used the Newton method to obtain the ML estimates of the parameters in (5.23) for A = J = 2. In order to keep the parameters within their admissible ranges, the following logistic reparameterization is used:
v_1 = 1/(1 + exp(β)),   v_2 = exp(β)/(1 + exp(β));
m_{a1} = 1/(1 + exp(δ_a)),   m_{a2} = exp(δ_a)/(1 + exp(δ_a)),   a = 1, 2;
p_{a0} = 1/(1 + exp(ε_a)),   p_{a1} = exp(ε_a)/(1 + exp(ε_a)),   a = 1, 2.
The time-homogeneous latent Markov chain model mentioned in the previous section [8] is extended to a general one. For observed time points t_i, i = 1, 2, . . . , T, manifest responses X_i, i = 1, 2, . . . , T, are observed, and the responses depend on the latent states S_i at those time points, where the time points t_i are assumed to be integers such that t_1 < t_2 < · · · < t_T. If the interval t_{i+1} − t_i > 1, there are t_{i+1} − t_i − 1 time points (integers) between t_i and t_{i+1}, and at those time points the latent states u_{ij}, j = 1, 2, . . . , t_{i+1} − t_i − 1, change as follows:
whereas the manifest states are not observed at the time points when the above
sequences of latent states take place, for example, in Data Set II, it is assumed u 21
would occur at time point 1973. The above chain (5.31) is denoted as u i j , i =
1, 2, . . . , T − 1 for short. Then, the changes of latent states are expressed as
s1 → u 1 j → s2 → u 2 j → · · · → u T −1, j → sT . (5.32)
The manifest variables X t are observed with latent state St and the responses
depend on the latent states at time points t, t = 1, 2, . . . , T . This model is
depicted in Fig. 5.3. It is assumed that sequence (5.32) with (5.31) is distributed
according to a time-homogeneous Markov chain with transition matrix (m_{ij}). Let p(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T; u_{1j}, u_{2j}, . . . , u_{T−1,j}) be the joint probabilities of manifest responses (x_1, x_2, . . . , x_T) and latent state transition (5.32). Then, we have
p(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T; u_{1j}, u_{2j}, . . . , u_{T−1,j}) = v_{s_1} ∏_{t=1}^{T} p_{s_t x_t} ∏_{t=1}^{T−1} m_{s_t u_{t1}} ( ∏_{j=1}^{h_t−1} m_{u_{tj} u_{t,j+1}} ) m_{u_{t,h_t} s_{t+1}},   (5.33)
where h_t denotes the number of unobserved time points between the observed time points t and t + 1, and
m_{s_t u_{t1}} ( ∏_{j=1}^{h_t−1} m_{u_{tj} u_{t,j+1}} ) m_{u_{t,h_t} s_{t+1}} = m_{s_t u_{t1}} m_{u_{t1} s_{t+1}}   (h_t = 1);   = m_{s_t s_{t+1}}   (h_t = 0).   (5.34)
In repeated measurements, various time units are used, for example, day, week, month, and year. Although observations are planned at regular intervals, there may be cases where they are not carried out as planned. On such occasions, the above model may be applicable. The ML estimation procedure via the EM algorithm can be constructed by extending (5.24) to (5.27).
Human behavior and responses take place continuously in time; however, observations are made at discrete time points, for example, daily, weekly, monthly, and so on. In this section, the change of latent states in Ω_latent is assumed to occur in continuous
time. Before constructing the model, the Markov process model is briefly reviewed
[13]. It is assumed that an individual in a population takes decisions to change states
in time interval (0, t) according to a Poisson distribution with mean λt, where λ > 0, and that state changes take place according to a Markov chain with transition matrix Q = (q_{ij}). Let t_i, i = 1, 2, . . . , be the decision time points, and let S(t) be the latent state at time point t. Then, given the time points t_i, i = 1, 2, . . . , the following sequence is distributed according to the Markov chain with transition matrix Q:
S(t_1) → S(t_2) → · · · .
The transition matrix over a time interval of length t is then
M(t) = Σ_{n=0}^{∞} e^{−λt} ((λt)^n / n!) Qⁿ,
and differentiating with respect to t,
(d/dt) M(t) = −λ Σ_{n=0}^{∞} e^{−λt} ((λt)^n / n!) Qⁿ + λ Σ_{n=1}^{∞} e^{−λt} ((λt)^{n−1} / (n − 1)!) Qⁿ
= −λ M(t) + λ Q Σ_{n=0}^{∞} e^{−λt} ((λt)^n / n!) Qⁿ
= −λ M(t) + λ Q M(t) = λ(Q − E) M(t).
Setting R = λ(Q − E), we have
(d/dt) M(t) = R M(t).
From the above equation, given the initial condition M(0) = E, we get
M(t) = exp(t R),   (5.36)
where, for a square matrix B,
exp(B) ≡ Σ_{n=0}^{∞} (1/n!) Bⁿ,
with (1/0!) B⁰ ≡ E. In (5.36), the matrix R = (r_{ij}) is called a generator matrix and, from its definition, the following constraints hold:
r_{ii} ≤ 0;   r_{ij} ≥ 0,   i ≠ j;   Σ_{j=1}^{J} r_{ij} = 0.   (5.37)
From the above equation, if we observe a change of states at every time interval
t (Fig. 5.5), the following sequence is the Markov chain with transition matrix
M(t):
S(t) → S(2t) → · · · → S(kt) → . . . .
Considering the above basic discussion, the latent Markov process model is constructed. Let t_i, i = 1, 2, . . . , K, be time points at which manifest states X(t_i) on the finite state space Ω_manifest are observed; let S(t) be the latent Markov process with generator matrix R under constraints (5.37); let M(t) = (m_{(t)ij}) be the transition matrix at time point t; and let p_{sx} be the probabilities of X(t_i) = x, given S(t_i) = s. For simplicity of notation, given the time points, we use the following notation:
X_i = X(t_i);   S_i = S(t_i),   i = 1, 2, . . . , K.
Fig. 5.6 The latent Markov process due to observations at any time interval
Then, by using notations similar to those for the latent Markov chain models, the following accounting equations can be obtained:
p(x_1, x_2, . . . , x_K) = Σ_s v_{s_1} ∏_{i=1}^{K−1} m_{(t_{i+1} − t_i) s_i s_{i+1}} ∏_{i=1}^{K} p_{s_i x_i}.   (5.40)
The above model (X (t), S(t)) is called the latent Markov process model in the
present chapter. The process is illustrated in Fig. 5.6.
In order to estimate the model parameters v_a, r_{ij}, and p_{ab}, the estimation procedure in the previous section may be used, because it is complicated to construct a procedure for estimating r_{ij} directly, that is, the parameters r_{ij} are elements of the generator matrix R in (5.36). Usually, repeated observations of state changes are carried out at intervals of whole time units, for example, daily, weekly, monthly, and so on, as in Table 5.7, so such time points can be viewed as integers. From this, for transition matrix M(t) = (m_{(t)ij}), we have
M = C DC −1 ,
where
D = diag(ρ_1, ρ_2, . . . , ρ_A).
(x − 1)(x − m 11 − m 22 + 1) = 0.
M = exp(R)
It follows that
R = [r_{11} r_{12}; r_{21} r_{22}] = [1 m_{12}; 1 −m_{21}] [0 0; 0 log(m_{11} + m_{22} − 1)] [1 m_{12}; 1 −m_{21}]^{−1}
= (log(m_{11} + m_{22} − 1) / (m_{12} + m_{21})) [m_{12} −m_{12}; −m_{21} m_{21}],   (5.46)
and the condition for the generator matrix (5.37) is met for A = 2. Thus, the tran-
sition matrix (5.44) is embeddable under the condition (5.45). Applying the above
discussion to Table 5.8, since
condition (5.45) is satisfied by the estimated transition matrix. From Theorem 5.3,
there exists a unique generator matrix in equation M = exp(R) and then, by using
(5.43), we have
R = [−0.085 0.085; 0.046 −0.046].
According to the above generator matrix, the transition matrix can be calculated
at any time point by (5.36), for example, we have
M(1.5) = [0.875 0.125; 0.068 0.932],   M(2.5) = [0.806 0.194; 0.106 0.894],
M(3.5) = [0.746 0.254; 0.139 0.861],   M(4.5) = [0.694 0.306; 0.167 0.833],   . . . ,
M(∞) = [0.353 0.647; 0.353 0.647].
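Numerically, a candidate generator matrix can be obtained with a matrix logarithm, and the transition matrices at arbitrary time points with a matrix exponential; the sketch below uses scipy on a hypothetical 2 × 2 one-step transition matrix (not the estimate of Table 5.8) and checks the conditions in (5.37).

```python
import numpy as np
from scipy.linalg import expm, logm

M1 = np.array([[0.92, 0.08],      # hypothetical one-step transition matrix M(1)
               [0.05, 0.95]])

R = np.real(logm(M1))             # candidate generator with exp(R) = M(1);
print(np.round(R, 4))             # real part taken to guard against round-off

# Check the generator conditions (5.37): nonpositive diagonal,
# nonnegative off-diagonal elements, zero row sums.
off = R - np.diag(np.diag(R))
assert np.all(np.diag(R) <= 1e-10) and np.all(off >= -1e-10)
assert np.allclose(R.sum(axis=1), 0.0, atol=1e-10)

print(np.round(expm(1.5 * R), 3))  # transition matrix M(1.5) at time 1.5
```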
Next, for the following transition matrix with three latent states:
M = [m_{11} m_{12} m_{13}; m_{21} m_{22} m_{23}; m_{31} m_{32} m_{33}],
Setting
[g_{11} g_{12}; g_{21} g_{22}] = [m_{22} − m_{12}   m_{23} − m_{13}; m_{32} − m_{12}   m_{33} − m_{13}],   (5.47)
if
(g_{11} − 1)(g_{22} − 1) − g_{12} g_{21} ≠ 0,
g_{11} g_{22} − g_{12} g_{21} > 0,   (5.48)
g_{11} + g_{22} > 0,
from Theorem 5.3, the above matrix has a unique matrix R such that M = exp(R).
Remark 5.5 Condition (5.48) does not always imply the matrix obtained with (5.42)
is the generator matrix of a Markov process.
and from (5.48) the sufficient condition for getting the unique solution R of M =
exp(R) is checked as follows:
(g_{11} − 1)(g_{22} − 1) − g_{12} g_{21} = 0.321,
g_{11} g_{22} − g_{12} g_{21} = 0.159,
g_{11} + g_{22} = 0.838.
From this, the three conditions in (5.48) are met. In effect, we can get the generator
matrix by (5.43) as follows:
R = [1 0.145 0.138; 1 −0.491 −0.848; 1 −0.859 0.512] · diag(log 1, log 0.548, log 0.290) · [1 0.145 0.138; 1 −0.491 −0.848; 1 −0.859 0.512]^{−1}
= [−0.148 0.166 −0.018; 0.641 −0.949 0.307; 0.382 0.361 −0.743].
However, the above estimate of the generator matrix is improper, because the
condition in (5.37) is not met, that is, r 13 = −0.018 < 0. For the latent Markov
chain model in Table 5.3, the transition matrix is estimated as
M = [0.957 0.043 0.000; 0.116 0.743 0.141; 0.279 0.101 0.621].
The eigenvalues of the above transition matrix are distinct and positive. Through
a similar discussion above, we have the estimate of the generator matrix of the latent
Markov chain model as
R = [−0.046 0.051 −0.005; 0.104 −0.314 0.210; 0.353 0.140 −0.493].
However, this case also gives an improper estimate of the generator. Although
the ML estimation procedure for the continuous-time mover-stayer model is compli-
cated, Cook et al. [9] proposed a generalized version of the model, and gave an ML
estimation procedure for the model.
5.10 Discussion
This chapter has considered the latent Markov chain models for explaining changes
of latent states in time. The ML estimation procedures of the models have been
constructed via the EM algorithm, and the methods are demonstrated by using numer-
ical examples. As in Chap. 2, the latent states in the latent Markov chain model are
treated parallelly, so in this sense, the analysis can be viewed as latent class cluster
analysis as well, though the latent Markov analysis is a natural extension of the
latent class analysis. In confirmatory contexts, as discussed in Sect. 5.4 and shown in Theorem 5.2, the ML estimation procedures are flexible enough to handle constraints that fix some of the model parameters at the extreme values 0 or 1. As in Chaps. 3 and 4, it is important to structure the latent Markov models for the measurement of latent states of an ability or trait, in which logit models are effective for expressing the effects of latent states [3]. For the structure of latent state transitions, logit models
with the effects of covariates have been applied to the initial distribution and the
transition matrices in the latent Markov chain model [19], and the extensions have
also been studied by Bartolucci et al. [4], Bartolucci and Pennoni [5], and Bartolucci
and Farcomeni [3]. Such approaches may link to path analysis with generalized
linear models ([10–12], Chap. 6), and further studies for extending latent Markov
approaches will be expected.
References
12. Eshima, N., Tabata, M., & Zhi, G. (2001). Path analysis with logistic regression models:
Effect analysis of fully recursive causal systems of categorical variables. Journal of the Japan
Statistical Society, 31, 1–14.
13. Hatori, H., & Mori, T. (1993). Finite Markov chains. Faifukan, Tokyo (in Japanese).
14. Lazarsfeld, P. F., & Henry, N. M. (1968). Latent structure analysis. Houghton Mifflin.
15. Katz, L., & Proctor, C. (1959). The concept of configuration of interpersonal relation in a group
as a time-dependent stochastic process. Psychometrika, 24, 317–327.
16. Singer, B., & Spilerman, S. (1975). Identifying structural parameters of social processes using
fragmentary data. Bulletin of International Statistical Institute, 46, 681–697.
17. Singer, B., & Spilerman, S. (1976). The representation of social processes by Markov models.
American Journal of Sociology, 82, 1–54.
18. Singer, B., & Spilerman, S. (1977). Fitting stochastic models to longitudinal survey data—some
examples in the social sciences. Bulletin of International Statistical Institute, 47, 283–300.
19. Vermunt, J. K., Langeheine, R., & Bockenholt, U. (1999). Discrete-time discrete-state latent
Markov models with time-constant and time-varying covariates. Journal of Educational and
Behavioral Statistics, 24, 179–207.
Chapter 6
The Mixed Latent Markov Chain Model
6.1 Introduction
As a model that explains time-dependent human behavior, the Markov chain has
been applied in various scientific fields [1, 2, 6, 7]. When employing the model for
describing human behavior, it may be usually assumed that every individual in a
population changes his or her states at any observational time point according to
the same low of probability, as an approximation; however, there are cases where
the population is not homogeneous, and it makes the analysis of human response
processes to be complicated [9]. In order to overcome the heterogeneity of the popu-
lation, it is valid to consider the population is divided into subpopulations that depend
on the Markov chains of their own. In order to analyze the heterogeneous popula-
tion, Blumen et al. [8] proposed the mover-stayer model, in which the population
is divided into two subpopulations of, what we call, “movers” and “stayers”. The
movers change their states according to a Markov chain and the stayers do not change
their states from the initial observed time points. Human behavior, which is observed
repeatedly, is more complicated than the mover-stayer model. An extended version
of the mover-stayer model is the mixed Markov chain model, which was first introduced by C. S. Poulsen in 1982 in his Ph.D. dissertation, though the work was not officially published [18, 19]. Eshima et al. [12, 13], Bye and Schechter [10], Van de Pol and de Leeuw [20], and Poulsen [17] also discussed similar topics. Van de Pol [18] proposed the mixed latent Markov chain model as an extension of the mixed Markov chain model. Figure 6.1 shows the relation of the above Markov models, in which the arrows indicate the natural directions of extension.
Following Chap. 5, this chapter provides a discussion of dynamic latent structure
analysis within a framework of the latent Markov chain model [11]. In Sect. 6.2,
dynamic latent structure models depicted in Fig. 6.1 are briefly reviewed, and the
equivalence of the latent Markov chain model and the mixed latent Markov chain
model is shown. Section 6.3 discusses the ML estimation procedure for the models via
the EM algorithm in relation to that for the latent Markov chain model. In Sect. 6.4,
Fig. 6.1 Relation of Markov models. *The directions expressed by the arrows imply the extensions
of the related models
6.2 Dynamic Latent Class Models
In the present section, the mover-stayer model, the mixed Markov chain model, and the mixed latent Markov chain model are reviewed.
(i) The mover-stayer model
Let $\{1, 2, \ldots, A\}$ be the manifest state space of the Markov chain. It is assumed that the population is divided into two types of individuals, movers and stayers, whose proportions are $\lambda_1$ and $\lambda_2$, respectively, where
$$\lambda_1 + \lambda_2 = 1.$$
Let $v_a$ be the probabilities that the movers take initial state $a$ and let $w_a$ be those of the stayers at the first time point, $a = 1, 2, \ldots, A$. Then,
$$\sum_{a=1}^{A} v_a = \sum_{a=1}^{A} w_a = 1,$$
and the probability of manifest response pattern $(x_1, x_2, \ldots, x_T)$ over $T$ time points is given by
$$p(x_1, x_2, \ldots, x_T) = \lambda_1 v_{x_1}\prod_{t=1}^{T-1} m_{x_t x_{t+1}} + \lambda_2 w_{x_1}\prod_{t=1}^{T-1}\delta_{x_t x_{t+1}}, \tag{6.1}$$
where $m_{x_t x_{t+1}}$ are the transition probabilities of the movers and
$$\delta_{x_t x_{t+1}} = \begin{cases} 1 & (x_t = x_{t+1})\\ 0 & (x_t \neq x_{t+1}). \end{cases}$$
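For concreteness, (6.1) can be evaluated directly; the sketch below (illustrative code with hypothetical parameter values, not taken from the text) computes the manifest probability of a given response pattern under the mover-stayer model.

```python
import numpy as np

def mover_stayer_prob(x, lam1, lam2, v, w, M):
    """Probability of manifest pattern x = (x_1, ..., x_T) under (6.1); states coded 0..A-1."""
    # Mover component: initial probability times the product of transition probabilities.
    p_mover = v[x[0]]
    for t in range(len(x) - 1):
        p_mover *= M[x[t], x[t + 1]]
    # Stayer component: nonzero only if the pattern never changes state.
    p_stayer = w[x[0]] if all(s == x[0] for s in x) else 0.0
    return lam1 * p_mover + lam2 * p_stayer

# Hypothetical example with A = 2 manifest states and T = 3 time points.
lam1, lam2 = 0.7, 0.3
v = np.array([0.6, 0.4])                 # initial distribution of movers
w = np.array([0.5, 0.5])                 # initial distribution of stayers
M = np.array([[0.8, 0.2],
              [0.3, 0.7]])               # mover transition matrix
print(mover_stayer_prob((0, 1, 1), lam1, lam2, v, w, M))  # only movers contribute
print(mover_stayer_prob((0, 0, 0), lam1, lam2, v, w, M))  # movers and stayers contribute
```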
Remark 6.1 When the Markov chain in (6.1) is time-dependent, the transition probabilities of the movers $m_{x_t x_{t+1}}$ are replaced by $m_{(t) x_t x_{t+1}}$, $t = 1, 2, \ldots, T-1$.
(ii) The mixed Markov chain model
Suppose that the population is divided into $K$ subpopulations with proportions $\psi_k$ and that the individuals in subpopulation $k$ follow a Markov chain with initial distribution $(v_{ka})$ and transition probabilities $(m_{kab})$. Then,
$$p(x_1, x_2, \ldots, x_T) = \sum_{k=1}^{K}\psi_k\, v_{k x_1}\prod_{t=1}^{T-1} m_{k x_t x_{t+1}}, \tag{6.2}$$
where
$$\sum_{k=1}^{K}\psi_k = 1, \qquad \sum_{a=1}^{A} v_{ka} = 1, \qquad \sum_{b=1}^{A} m_{kab} = 1.$$
For $K = 2$, setting
$$m_{2ab} = \begin{cases} 1 & (a = b)\\ 0 & (a \neq b), \end{cases}$$
model (6.2) reduces to the mover-stayer model (6.1).
(iii) The mixed latent Markov chain model
Let $\{1, 2, \ldots, B\}$ be the latent state space. In subpopulation $k$, the latent states are assumed to follow a Markov chain with initial distribution $(v_{kb})$ and transition probabilities $(m_{kbc})$, and $p_{kba}$ denotes the probability of manifest state $a$ given latent state $b$ in subpopulation $k$. Then,
$$p(x_1, x_2, \ldots, x_T) = \sum_{k=1}^{K}\psi_k \sum_{s} v_{k s_1}\, p_{k s_1 x_1}\prod_{t=1}^{T-1} p_{k s_{t+1} x_{t+1}}\, m_{k s_t s_{t+1}}, \tag{6.3}$$
where the summation $\sum_{s}$ is made over all latent state patterns $s = (s_1, s_2, \ldots, s_T)$ and
$$\sum_{k=1}^{K}\psi_k = 1, \qquad \sum_{b=1}^{B} v_{kb} = 1, \qquad \sum_{a=1}^{A} p_{kba} = 1, \qquad \sum_{c=1}^{B} m_{kbc} = 1. \tag{6.4}$$
For A = B and pkba = δab , where δab is the Kronecker delta, (6.3) expresses
(6.2), and setting K = 1, model (6.3) becomes the latent Markov chain model (5.1).
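The sum over latent state patterns in (6.3) can be evaluated without enumerating all $B^T$ patterns by a forward recursion within each subpopulation; the following sketch (assumed illustrative code with hypothetical parameter values) computes (6.3) in this way.

```python
import numpy as np

def mixed_latent_markov_prob(x, psi, v, P, M):
    """Manifest probability (6.3) of pattern x = (x_1, ..., x_T).

    psi[k]     : proportion of subpopulation k
    v[k, b]    : initial latent distribution in subpopulation k
    P[k, b, a] : response probability of manifest a given latent b in subpopulation k
    M[k, b, c] : latent transition probability from b to c in subpopulation k
    States are coded 0, 1, ... .
    """
    total = 0.0
    for k in range(len(psi)):
        # Forward recursion: alpha[b] = P(x_1, ..., x_t, S_t = b | subpopulation k).
        alpha = v[k] * P[k][:, x[0]]
        for t in range(1, len(x)):
            alpha = (alpha @ M[k]) * P[k][:, x[t]]
        total += psi[k] * alpha.sum()
    return total

# Hypothetical example: K = 2 subpopulations, B = 2 latent and A = 2 manifest states.
psi = np.array([0.6, 0.4])
v = np.array([[0.7, 0.3], [0.5, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.3, 0.7]]])
M = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[1.0, 0.0], [0.0, 1.0]]])   # the second subpopulation never changes state
print(mixed_latent_markov_prob((0, 0, 1), psi, v, P, M))
```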
Remark 6.2 When the manifest response probabilities and the transition probabilities depend on the observed time points $t$, $p_{kab}$ and $m_{kab}$ are replaced by $p_{(t)kab}$ and $m_{(t)kab}$, respectively.
Theorem 6.1 The latent Markov chain model and the mixed latent Markov chain
model are equivalent.
Proof In latent Markov chain model (5.1) with $B = CK$ latent states, let the latent state space $\{1, 2, \ldots, B\}$ of latent variables $S_t$ be divided into $K$ subspaces
$$\Omega_k = \{C(k-1)+1, C(k-1)+2, \ldots, Ck\}, \quad k = 1, 2, \ldots, K.$$
If the subspaces are closed with respect to the state transitions, the transition matrix of the latent Markov chain is of the following block-diagonal type:
$$M = \begin{pmatrix} M_1 & 0 & \cdots & 0\\ 0 & M_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & M_K \end{pmatrix}, \tag{6.5}$$
where the $C \times C$ blocks $M_k$ satisfy
$$\sum_{j=1}^{C} m_{C(k-1)+i,\,C(k-1)+j} = 1, \quad i = 1, 2, \ldots, C;\ k = 1, 2, \ldots, K.$$
Setting
$$\lambda_k = \sum_{i=1}^{C} v_{C(k-1)+i}, \qquad v_{(k)c} = \frac{v_{C(k-1)+c}}{\lambda_k}, \quad c = 1, 2, \ldots, C;\ k = 1, 2, \ldots, K, \tag{6.6}$$
it follows that
$$\sum_{c=1}^{C} v_{(k)c} = 1, \quad k = 1, 2, \ldots, K,$$
and the latent Markov chain model expresses the mixed latent Markov chain model. This completes the proof.
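As a small illustration of (6.6), suppose $K = 2$ and $C = 2$, and take illustrative (assumed) initial latent probabilities $v_1 = 0.1$, $v_2 = 0.2$, $v_3 = 0.3$, $v_4 = 0.4$ for the $B = 4$ latent states. Then
$$\lambda_1 = v_1 + v_2 = 0.3, \quad \lambda_2 = v_3 + v_4 = 0.7, \quad v_{(1)1} = \tfrac{0.1}{0.3} = \tfrac{1}{3}, \quad v_{(1)2} = \tfrac{2}{3}, \quad v_{(2)1} = \tfrac{3}{7}, \quad v_{(2)2} = \tfrac{4}{7},$$
so that $\sum_{c} v_{(k)c} = 1$ for each $k$, as required.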
In Chap. 5, the equivalence of the latent class model and the latent Markov chain
model is shown in Theorem 5.1, so the following result also holds true:
Theorem 6.2 The mixed latent Markov chain model is equivalent to the latent class
model.
In this section, the ML estimation procedures for the above latent class models via the EM algorithm are summarized. Although the procedures can be constructed directly for the individual latent structure models, as shown in the previous chapters, they are given here through the procedure for the latent Markov chain model in Chap. 5, based on Theorem 5.2.
(i) The mover-stayer model
Let the manifest and latent state spaces be $\{1, 2, \ldots, A\}$ and $\{1, 2, \ldots, 2A\}$, respectively, and let the latent state space be divided into the following two subspaces: the mover subspace $\Omega_1 = \{1, 2, \ldots, A\}$ and the stayer subspace $\Omega_2 = \{A+1, A+2, \ldots, 2A\}$. If the initial trial values of the estimates of the parameters in (5.15) and (5.16) in the ML estimation algorithm are put, respectively, as follows:
$$m^{0}_{ab} = \begin{cases} 0 & (a \in \Omega_1,\ b \in \Omega_2;\ a \in \Omega_2,\ b \in \Omega_1)\\ 1 & (a = b \in \Omega_2)\\ 0 & (a \neq b \in \Omega_2), \end{cases} \tag{6.7}$$
and
$$p^{0}_{ab} = \begin{cases} 1 & (a = b \ \text{or} \ a = b + A)\\ 0 & (\text{otherwise}), \end{cases} \tag{6.8}$$
then the expected frequencies of the complete-data patterns in the E-step satisfy
$$n(x_1, \ldots, x_{t-1}, b, x_{t+1}, \ldots, x_T;\ s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T) = 0 \quad \text{unless } a = b \text{ or } a = b + A. \tag{6.10}$$
Hence, by setting the initial trial values of the parameters as (6.7) and (6.8), the
ML estimates of parameters in the mover-stayer model can be obtained via the EM
algorithm for the latent Markov chain model in (5.11) through (5.16).
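The constraint-preserving behavior of the algorithm can be illustrated with a small forward–backward (Baum–Welch type) implementation; this is a sketch under the assumption of the standard complete-data EM updates for a latent Markov chain (function names, data, and initial values below are hypothetical), in which transition and response probabilities initialized at zero remain zero at every iteration, as required by (6.7) and (6.8).

```python
import numpy as np

def em_latent_markov(data, q0, M0, P0, n_iter=100):
    """EM estimation for a latent Markov chain with categorical responses.

    data : array (N, T) of manifest states coded 0..A-1
    q0   : initial latent distribution (B,)
    M0   : latent transition matrix (B, B)
    P0   : response probabilities P[b, a] = P(X = a | S = b), (B, A)
    Entries fixed at zero in q0, M0, or P0 stay zero in every iteration.
    """
    q, M, P = q0.copy(), M0.copy(), P0.copy()
    N, T = data.shape
    B = len(q)
    for _ in range(n_iter):
        q_new, M_num, P_num = np.zeros(B), np.zeros_like(M), np.zeros_like(P)
        for x in data:
            alpha = np.zeros((T, B)); beta = np.ones((T, B))
            alpha[0] = q * P[:, x[0]]
            for t in range(1, T):                      # forward pass
                alpha[t] = (alpha[t - 1] @ M) * P[:, x[t]]
            for t in range(T - 2, -1, -1):             # backward pass
                beta[t] = M @ (P[:, x[t + 1]] * beta[t + 1])
            lik = alpha[-1].sum()
            gamma = alpha * beta / lik                 # posterior P(S_t = b | x)
            q_new += gamma[0]
            for t in range(T - 1):                     # posterior joint P(S_t, S_{t+1} | x)
                M_num += (alpha[t][:, None] * M * (P[:, x[t + 1]] * beta[t + 1])[None, :]) / lik
            for t in range(T):
                P_num[:, x[t]] += gamma[t]
        # M-step: renormalize expected counts; rows with zero total keep their old values.
        q = q_new / q_new.sum()
        rs = M_num.sum(axis=1, keepdims=True)
        M = np.divide(M_num, rs, out=M.copy(), where=rs > 0)
        rs = P_num.sum(axis=1, keepdims=True)
        P = np.divide(P_num, rs, out=P.copy(), where=rs > 0)
    return q, M, P

# Hypothetical mover-stayer initialization for A = 2 (2A = 4 latent states), cf. (6.7)-(6.8):
# latent states 0, 1 are movers; latent states 2, 3 are stayers.
M0 = np.array([[0.5, 0.5, 0.0, 0.0],
               [0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
P0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
q0 = np.full(4, 0.25)
data = np.array([[0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 0, 1]])
q, M, P = em_latent_markov(data, q0, M0, P0)
```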
(ii) The mixed Markov chain model
For manifest state space $\{1, 2, \ldots, A\}$, the latent state space $\Omega = \{1, 2, \ldots, AK\}$ is divided into $K$ subspaces
$$\Omega_k = \{A(k-1)+1, A(k-1)+2, \ldots, Ak\}, \quad k = 1, 2, \ldots, K,$$
where $\Omega = \bigcup_{k=1}^{K}\Omega_k$. If we set the initial trial values for (5.15) and (5.16) in the EM algorithm for the latent Markov chain model (5.1) as
$$m^{0}_{ab} = 0 \quad (a \in \Omega_k,\ b \in \Omega_l,\ k \neq l),$$
and
$$p^{0}_{ab} = \begin{cases} 1 & (a = b + A(k-1),\ k = 1, 2, \ldots, K)\\ 0 & (\text{otherwise}), \end{cases}$$
the algorithm yields the ML estimates of the parameters in the mixed Markov chain model.
(iii) The mixed latent Markov chain model
By Theorem 6.1, the mixed latent Markov chain model with $C$ latent states in each of the $K$ subpopulations is expressed as a latent Markov chain model with $B = CK$ latent states whose transition matrix has the block-diagonal form (6.5) and whose response probabilities satisfy
$$p_{kcx} = p_{c + C(k-1),\,x}, \quad c = 1, 2, \ldots, C;\ x = 1, 2, \ldots, A;\ k = 1, 2, \ldots, K.$$
Hence, setting the initial trial values of the transition probabilities according to the block structure in (6.5), the EM algorithm for the latent Markov chain model in (5.11) through (5.16) produces the ML estimates of the parameters.
In order to demonstrate the above discussion, the mover-stayer model and the mixed latent Markov chain model are estimated for the data set in Table 5.1 via the EM algorithm for the latent Markov chain model in Chap. 5. The mover-stayer model is estimated with the method in the previous section. For the patterns of initial trial values of the parameters in (6.7) and (6.8), we have the ML estimates of the parameters in Table 6.1. The goodness-of-fit of the estimated models to the data set is assessed with the log likelihood ratio test; the result for the mixed latent Markov chain model is reported in Table 6.2.
6.5 Discussion
The present chapter has treated a basic version of the mixed latent Markov chain model, in which it is assumed that the response probabilities to test items at a time point and the transition probabilities depend only on the latent states at that time point. In this sense, the basic model gives an exploratory analysis similar to the latent
Table 6.2 The estimated parameters in the mixed latent Markov chain model
Initial distribution
Mover (latent state) Stayer (latent state)
1 2 3 4 5 6
0.215 0.124 0.033 0.611 0.000 0.017
Latent transition matrix
Latent state
1 2 3 4 5 6
Latent State 1 0.727 0.265 0.008 0* 0* 0*
2 0.368 0.491 0.141 0* 0* 0*
3 0.385 0.000 0.615 0* 0* 0*
4 0* 0* 0* 1.000 0.000 0.000
5 0* 0* 0* 0.000 0.598 0.402
6 0* 0* 0* 0.051 0.949 0.000
Response probability
Manifest state
1 2 3
Latent State 1 0.968 0.032 0.000
2 0.000 1.000 0.000
3 0.000 0.000 1.000
4 0.968 0.032 0.000
5 0.000 1.000 0.000
6 0.000 0.000 1.000
The log likelihood ratio statistic G 2 = 8.809, d f = 3, p = 0.032.
The numbers with “*” imply the fixed values
class cluster analysis. As discussed here, the model is an extension of the latent
structure models in Fig. 6.1 from a natural viewpoint; however, the mixed latent
Markov chain model is equivalent to the latent Markov chain model, and also to the
latent class model. The parameter estimation in the mixed latent Markov chain model
via the EM algorithm can be carried out by using that for the latent Markov chain
model as shown in Sect. 6.3, and the method has been demonstrated in Sect. 6.4.
The estimation algorithm conveniently handles constraints with extreme values, that is, fixing a part of the response and transition probabilities at zero or one, and this property is exploited in the parameter estimation of the mixed latent Markov chain model. In applying the model to various research fields, there may be cases where the response probabilities of the manifest variables and the transition probabilities are influenced by covariates and by histories of latent state transitions; for dealing with such cases, mixed latent Markov chain models have been developed extensively in applications by Langeheine [16], Vermunt et al. [21], Bartolucci [3], Bartolucci & Farcomeni [4], Bartolucci et al. [5], and others. Figure 6.3 illustrates the manifest variables as influenced by the histories of the latent state transitions.
Fig. 6.3 A path diagram of the mixed latent Markov chain model with the effects of latent state
histories on manifest variables
In such a model, the manifest response at time $t$ can be described by a logit model with parameters $\alpha(t)$ and $\beta(t)_{t-1}$, the latter representing the effect of the latent state at time $t-1$. For polytomous variables, generalized logit models can be constructed by considering the phenomena under study. Similarly, for Fig. 6.4, appropriate logit models can also be discussed. In such models, it may be
useful to carry out a path analysis of the system of variables. Further developments of latent Markov modeling in data analysis can be expected. In Chap. 7, an entropy-based approach to path analysis [14, 15] is applied to latent class models.
References
1. Andersen, E. B. (1977). Discrete statistical models with social science application. Amsterdam:
North-Holland Publishing Co.
2. Bartholomew, D. J. (1983). Some recent development of social statistics. International
Statistical Review, 51, 1–9.
3. Bartolucci, F. (2006). Likelihood inference for a class of latent Markov models under linear
hypotheses on the transition probabilities. Journal of the Royal Statistical Society, B, 68, 155–
178.
4. Bartolucci, F., & Farcomeni, A. (2009). A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. Journal of the American Statistical Association.
Chapter 7
Path Analysis in Latent Class Models
7.1 Introduction
In path analysis, the total effect of a variable on another is decomposed as follows:
The total effect = the direct effect + the indirect effect. (7.1)
Eshima et al. [8] proposed path analysis for categorical variables in logit models by using log odds ratios, and the above decomposition was given. Kuha and Goldthorpe [16] also gave a path analysis method for categorical variables by using odds ratios; in their method, however, decomposition (7.1) holds only approximately. Following these approaches, an entropy-based method of path analysis for generalized linear models, which achieves the effect decomposition shown in (7.1), was proposed by Eshima et al. [9].
The present chapter applies the method of path analysis of Eshima et al. [8, 9] to multiple-indicator, multiple-cause models and to the latent Markov chain model. Section 7.2 discusses a multiple-indicator, multiple-cause model. In Sect. 7.3, the path analysis method is reviewed, and the effects of variables are calculated in some examples. Section 7.4 applies the method to the multiple-indicator, multiple-cause model, and Sect. 7.5 gives a numerical illustration of the path analysis. In Sect. 7.6, path analysis in the latent Markov chain model is considered, and in Sect. 7.7, a numerical example is presented to demonstrate the path analysis. Section 7.8 provides a discussion and a further perspective of path analysis in latent class models.
where notation $\sum_{s}$ implies the summation over all latent variable patterns $s = (s_1, s_2, \ldots, s_K)$, and suitable inequality constraints have to hold among the latent probabilities. Although the model can be estimated according to the EM algorithm for the usual latent class model in Chap. 2 by imposing the equality constraints, there may be cases where the estimated models are not identified with the hypothesized structures. Hence, in order to estimate the latent probabilities $P(X_{ki} = 1 \mid S_k = s_k)$, it is better to formulate them as in the models of Chaps. 3 and 4, that is,
$$P(X_{ki} = 1 \mid S_k = s_k) = \begin{cases} \dfrac{\exp(\alpha_{ki})}{1+\exp(\alpha_{ki})} & (s_k = 0)\\[2mm] \dfrac{\exp(\alpha_{ki}+\beta_{ki})}{1+\exp(\alpha_{ki}+\beta_{ki})} & (s_k = 1), \end{cases} \tag{7.4}$$
where $\alpha_{ki}$ and $\beta_{ki}$ are parameters. Since $f_{ki}(x_{ki} \mid s_k) = P(X_{ki} = x_{ki} \mid S_k = s_k)$, for binary latent variables $S_k$ the formulae in (7.4) are expressed as
$$f_{ki}(x_{ki} \mid s_k) = \frac{\exp\{x_{ki}(\alpha_{ki} + \beta_{ki} s_k)\}}{1 + \exp(\alpha_{ki} + \beta_{ki} s_k)}, \quad x_{ki}, s_k \in \{0, 1\}. \tag{7.7}$$
Similarly, we have
$$g_{12}(s_2 \mid s_1) = \frac{\exp(s_2\gamma_1 + s_2\delta_1 s_1)}{1 + \exp(\gamma_1 + \delta_1 s_1)}, \quad s_1, s_2 \in \{0, 1\}, \tag{7.8}$$
where $\gamma_1$ and $\delta_1$ are parameters.
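These logit parametrizations are easy to evaluate numerically; the sketch below (illustrative code with hypothetical parameter values) returns the response probabilities of (7.7) and the transition probabilities of (7.8).

```python
import numpy as np

def response_prob(x, s, alpha, beta):
    """f(x | s) in the logit form of (7.7) for binary x and s."""
    eta = alpha + beta * s
    return np.exp(x * eta) / (1.0 + np.exp(eta))

def transition_prob(s2, s1, gamma, delta):
    """g12(s2 | s1) in the logit form of (7.8) for binary latent variables."""
    eta = gamma + delta * s1
    return np.exp(s2 * eta) / (1.0 + np.exp(eta))

# Hypothetical parameter values.
alpha, beta = -1.0, 2.0
gamma, delta = -0.5, 1.5
print(response_prob(1, 1, alpha, beta))      # P(X = 1 | S = 1)
print(response_prob(1, 0, alpha, beta))      # P(X = 1 | S = 0)
print(transition_prob(1, 0, gamma, delta) + transition_prob(0, 0, gamma, delta))  # sums to 1
```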
7.3 An Entropy-Based Path Analysis of Categorical Variables
For the recursive system of categorical variables $U_1$, $U_2$, $U_3$, and $Y$ in Fig. 7.3, the joint probability function is decomposed as follows:
$$f(u_1, u_2, u_3, y) = f_1(u_1)\, f_2(u_2 \mid u_1)\, f_3(u_3 \mid u_1, u_2)\, f(y \mid u_1, u_2, u_3),$$
where functions $f_i(\ast\mid\ast)$, $i = 1, 2, 3$, and $f(\ast\mid\ast)$ imply the (conditional) probability functions related to the variables. In Fig. 7.3, the relationship is expressed as follows:
$$U_1 \prec U_2 \prec U_3 \prec Y.$$
Below, the total, direct, and indirect effects of parent variables $U_k$ on descendant variable $Y$ are discussed. For baseline category $(U_1, U_2, U_3, Y) = (u_1^*, u_2^*, u_3^*, y^*)$, the total effect of $(U_1, U_2, U_3) = (u_1, u_2, u_3)$ on $Y = y$ can be defined by the following log odds ratio:
$$\log\frac{f(y \mid u_1, u_2, u_3)\, f(y^* \mid u_1^*, u_2^*, u_3^*)}{f(y^* \mid u_1, u_2, u_3)\, f(y \mid u_1^*, u_2^*, u_3^*)} = \log f(y \mid u_1, u_2, u_3) - \log f(y^* \mid u_1, u_2, u_3) - \bigl\{\log f(y \mid u_1^*, u_2^*, u_3^*) - \log f(y^* \mid u_1^*, u_2^*, u_3^*)\bigr\}$$
$$= (y - y^*)\sum_{k=1}^{3}\beta_k (u_k - u_k^*). \tag{7.10}$$
The above log odds ratio implies the decrease of the uncertainty of response variable $Y$ for a change of the parent variables $(U_1, U_2, U_3)$ from baseline $(u_1^*, u_2^*, u_3^*)$, that is, the amount of information on $Y$ explained by the parent variables. Since the logit model in (7.9) has no interactive effects of the explanatory variables $U_i$, the log odds ratio is a bilinear form of $y - y^*$ and $u_k - u_k^*$ with respect to the regression coefficients $\beta_k$. Replacing the baseline $(u_1^*, u_2^*, u_3^*, y^*)$ formally by the related means of the variables $(\mu_1, \mu_2, \mu_3, \nu)$, the total effect of $(U_1, U_2, U_3) = (u_1, u_2, u_3)$ on $Y = y$ is defined by
$$\sum_{k=1}^{3}(y - \nu)\,\beta_k\,(u_k - \mu_k). \tag{7.11}$$
Similarly, the total effect of $(U_2, U_3) = (u_2, u_3)$ on $Y = y$ at $U_1 = u_1$ is defined by
$$\sum_{k=2}^{3}(y - \nu(u_1))\,\beta_k\,(u_k - \mu_k(u_1)), \tag{7.12}$$
where $\nu(u_1)$ and $\mu_k(u_1)$ are the conditional means of $Y$ and $U_k$, $k = 2, 3$, given $U_1 = u_1$, respectively. The above formula can be derived by formally setting $(u_1^*, u_2^*, u_3^*, y^*) = (u_1, \mu_2(u_1), \mu_3(u_1), \nu(u_1))$ in (7.10). Subtracting (7.12) from (7.11), it follows that the total effect of $U_1 = u_1$ on $Y = y$ is given by
$$\sum_{k=1}^{3}(y - \nu)\,\beta_k\,(u_k - \mu_k) - \sum_{k=2}^{3}(y - \nu(u_1))\,\beta_k\,(u_k - \mu_k(u_1)).$$
Putting $(u_1^*, u_2^*, u_3^*, y^*) = (\mu_1(u_2, u_3), u_2, u_3, \nu(u_2, u_3))$, where $\nu(u_2, u_3)$ and $\mu_1(u_2, u_3)$ are the conditional means of $Y$ and $U_1$ given $(U_2, U_3) = (u_2, u_3)$, respectively, the direct effect of $U_1 = u_1$ on $Y = y$ at $(U_2, U_3) = (u_2, u_3)$ is defined by
$$(y - \nu(u_2, u_3))\,\beta_1\,(u_1 - \mu_1(u_2, u_3)).$$
Remark 7.2 The effects defined in this section are interpreted in terms of information, and their exponentials are viewed as multiplicative effects in terms of odds ratios.
In order to summarize and standardize the effects based on log odds ratios, the entropy coefficient of determination (ECD) [6] is used. In logit model (7.9), the standardized summary total effect of $(U_1, U_2, U_3)$ on $Y$ is given by
$$e_T((U_1, U_2, U_3) \to Y) = \frac{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k)}{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k) + 1}. \tag{7.19}$$
Remark 7.3 By taking the expectation of (7.11) over all $(u_1, u_2, u_3)$ and $y$, we have $\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k)$, and (7.19) is the ECD of $(U_1, U_2, U_3)$ and $Y$.
Summarizing and standardizing the effects from (7.12) to (7.17) as in (7.19), we also have
$$e_T((U_2, U_3) \to Y) = \frac{\sum_{k=2}^{3}\beta_k\,\mathrm{Cov}(Y, U_k \mid U_1)}{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k) + 1},$$
$$e_T(U_1 \to Y) = \frac{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k) - \sum_{k=2}^{3}\beta_k\,\mathrm{Cov}(Y, U_k \mid U_1)}{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k) + 1},$$
$$e_D(U_1 \to Y) = \frac{\beta_1\,\mathrm{Cov}(Y, U_1 \mid U_2, U_3)}{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k) + 1},$$
$$e_T(U_3 \to Y) = e_D(U_3 \to Y) = \frac{\beta_3\,\mathrm{Cov}(Y, U_3 \mid U_1, U_2)}{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k) + 1},$$
$$e_D(U_2 \to Y) = \frac{\beta_2\,\mathrm{Cov}(Y, U_2 \mid U_1, U_3)}{\sum_{k=1}^{3}\beta_k\,\mathrm{Cov}(Y, U_k) + 1},$$
where notations $e_D(\ast)$ and $e_T(\ast)$ imply the standardized summary direct and total effects of the related variables, respectively. In the above path analysis, from (7.18), we also have
$$e_T((U_1, U_2, U_3) \to Y) = \sum_{k=1}^{3} e_T(U_k \to Y).$$
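Since the standardized summary effects above are ratios of $\beta$-weighted (conditional) covariances with a common denominator, they can be computed directly once the joint distribution of $(U_1, U_2, U_3, Y)$ is available. The following sketch is an illustrative computation under an assumed recursive logit system with hypothetical coefficients (the variable names and numbers are not from the text); it evaluates $e_T$ and $e_D$ for the system of Fig. 7.3.

```python
import numpy as np
from itertools import product

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical recursive logit system for binary (U1, U2, U3, Y).
beta0, beta = -1.0, np.array([0.9, 0.7, 1.2])       # logit model for Y given (U1, U2, U3)
p = np.zeros((2, 2, 2, 2))                           # joint pmf p[u1, u2, u3, y]
for u1, u2, u3, y in product(range(2), repeat=4):
    p1 = 0.6 if u1 == 0 else 0.4                                   # P(U1)
    p2 = sigmoid(-0.5 + 1.0 * u1); p2 = p2 if u2 == 1 else 1 - p2  # P(U2 | U1)
    p3 = sigmoid(-0.3 + 0.8 * u1 + 0.6 * u2); p3 = p3 if u3 == 1 else 1 - p3
    py = sigmoid(beta0 + beta @ np.array([u1, u2, u3])); py = py if y == 1 else 1 - py
    p[u1, u2, u3, y] = p1 * p2 * p3 * py

cells = np.array(list(product(range(2), repeat=4)))  # all (u1, u2, u3, y) configurations
probs = np.array([p[tuple(c)] for c in cells])

def cov(a, b, given=()):
    """Covariance of columns a and b of `cells`, averaged over the values of `given` columns."""
    out = 0.0
    combos = list(product(range(2), repeat=len(given))) if given else [()]
    for key in combos:
        mask = np.ones(len(cells), dtype=bool)
        for col, val in zip(given, key):
            mask &= cells[:, col] == val
        w = probs[mask].sum()
        q, x = probs[mask] / w, cells[mask]
        out += w * ((q * x[:, a] * x[:, b]).sum()
                    - (q * x[:, a]).sum() * (q * x[:, b]).sum())
    return out

U1, U2, U3, Y = 0, 1, 2, 3
denom = 1 + sum(beta[k] * cov(Y, k) for k in range(3))
eT_all = sum(beta[k] * cov(Y, k) for k in range(3)) / denom            # (7.19)
eT_U23 = sum(beta[k] * cov(Y, k, given=(U1,)) for k in (1, 2)) / denom
eT_U1 = eT_all - eT_U23
eD_U1 = beta[0] * cov(Y, U1, given=(U2, U3)) / denom
eD_U2 = beta[1] * cov(Y, U2, given=(U1, U3)) / denom
eT_U3 = eD_U3 = beta[2] * cov(Y, U3, given=(U1, U2)) / denom
print(eT_all, eT_U1, eD_U1, eD_U2, eD_U3)
```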
Next, path analysis for the path system in Fig. 7.4 is carried out. The joint probability function of the three variables is decomposed as follows:
$$f(u_1, u_2, y) = f_{12}(u_1, u_2)\, f(y \mid u_1, u_2).$$
The total effect of $(U_1, U_2) = (u_1, u_2)$ on $Y = y$ is
$$(y - \nu)\,\beta_1\,(u_1 - \mu_1) + (y - \nu)\,\beta_2\,(u_2 - \mu_2),$$
the direct effect of $U_1 = u_1$ on $Y = y$ at $U_2 = u_2$ is
$$(y - \nu(u_2))\,\beta_1\,(u_1 - \mu_1(u_2)),$$
and the direct effect of $U_2 = u_2$ on $Y = y$ at $U_1 = u_1$ is
$$(y - \nu(u_1))\,\beta_2\,(u_2 - \mu_2(u_1)).$$
Summarizing and standardizing these effects, we have
$$e_T((U_1, U_2) \to Y) = \frac{\beta_1\,\mathrm{Cov}(Y, U_1) + \beta_2\,\mathrm{Cov}(Y, U_2)}{1 + \beta_1\,\mathrm{Cov}(Y, U_1) + \beta_2\,\mathrm{Cov}(Y, U_2)},$$
$$e_D(U_1 \to Y) = \frac{\beta_1\,\mathrm{Cov}(Y, U_1 \mid U_2)}{1 + \beta_1\,\mathrm{Cov}(Y, U_1) + \beta_2\,\mathrm{Cov}(Y, U_2)},$$
$$e_D(U_2 \to Y) = \frac{\beta_2\,\mathrm{Cov}(Y, U_2 \mid U_1)}{1 + \beta_1\,\mathrm{Cov}(Y, U_1) + \beta_2\,\mathrm{Cov}(Y, U_2)}.$$
The above method can be applied to the causal system in Fig. 7.2b.
For $S_1 \to X_{1i}$, applying the discussion in Sect. 7.3 to (7.7) and using a method similar to (7.11), we have the total (direct) effects of $S_1 = s_1$ on $X_{1i} = x_{1i}$, and their standardized summaries are
$$e_T(S_1 \to X_{1i}) = \frac{\beta_{1i}\,\mathrm{Cov}(X_{1i}, S_1)}{\beta_{1i}\,\mathrm{Cov}(X_{1i}, S_1) + 1}, \quad i = 1, 2, \ldots, I_1. \tag{7.21}$$
Similarly, from logit model (7.8), the standardized summary total (direct) effect of $S_1$ on $S_2$ is
$$e_T(S_1 \to S_2) = \frac{\delta_1\,\mathrm{Cov}(S_1, S_2)}{\delta_1\,\mathrm{Cov}(S_1, S_2) + 1}. \tag{7.23}$$
From logit model (7.7), the total effects of $(S_1, S_2) = (s_1, s_2)$ on $X_{2i} = x_{2i}$ are given by
$$(x_{2i} - \nu_{2i})\,\beta_{2i}\,(s_2 - \mu_2), \quad i = 1, 2, \ldots, I_2, \tag{7.24}$$
and the direct effects of $S_2 = s_2$ on $X_{2i} = x_{2i}$ at $S_1 = s_1$ are
$$(x_{2i} - \nu_{2i}(s_1))\,\beta_{2i}\,(s_2 - \mu_2(s_1)), \quad i = 1, 2, \ldots, I_2, \tag{7.25}$$
where $\nu_{2i}(s_1)$ and $\mu_2(s_1)$ are the conditional expectations of $X_{2i}$ and $S_2$ given $S_1 = s_1$, respectively. The variables are binary, so it follows that the indirect effects of $S_1 = s_1$ on $X_{2i} = x_{2i}$ are
$$(x_{2i} - \nu_{2i})\,\beta_{2i}\,(s_2 - \mu_2) - (x_{2i} - \nu_{2i}(s_1))\,\beta_{2i}\,(s_2 - \mu_2(s_1)), \quad i = 1, 2, \ldots, I_2.$$
Summarizing and standardizing the above effects, we have
$$e_T((S_1, S_2) \to X_{2i}) = \frac{\beta_{2i}\,\mathrm{Cov}(X_{2i}, S_2)}{1 + \beta_{2i}\,\mathrm{Cov}(X_{2i}, S_2)}, \quad i = 1, 2, \ldots, I_2, \tag{7.26}$$
$$e_T(S_2 \to X_{2i}) = e_D(S_2 \to X_{2i}) = \frac{\beta_{2i}\,\mathrm{Cov}(X_{2i}, S_2 \mid S_1)}{1 + \beta_{2i}\,\mathrm{Cov}(X_{2i}, S_2)}, \quad i = 1, 2, \ldots, I_2, \tag{7.27}$$
and
$$e_T(S_1 \to X_{2i}) = e_I(S_1 \to X_{2i}) = \frac{\beta_{2i}\,\mathrm{Cov}(X_{2i}, S_2) - \beta_{2i}\,\mathrm{Cov}(X_{2i}, S_2 \mid S_1)}{1 + \beta_{2i}\,\mathrm{Cov}(X_{2i}, S_2)}, \quad i = 1, 2, \ldots, I_2.$$
More complicated models as shown in Fig. 7.5 can also be considered. The partial
path system of S1 , S2 , (X 1i ), and (X 2i ) can be analyzed as above, so it is sufficient
to discuss the partial path system of S1 , S2 , S3 , and (X 3i ) as shown in Fig. 7.6. The
diagram is a special case of Fig. 7.3. Hence, the discussion on the path diagram in
Fig. 7.3 can be directly employed.
For the path diagram in Fig. 7.2b, the effects of the variables are calculated as follows. From logit model (7.7), the total effects of $(S_1, S_2) = (s_1, s_2)$ on $X_{1i} = x_{1i}$ are given by
$$(x_{1i} - \nu_{1i})\,\beta_{1i}\,(s_1 - \mu_1), \quad i = 1, 2, \ldots, I_1.$$
Since the direct effects of $S_2 = s_2$ on $X_{1i} = x_{1i}$ are zero, the above effects are also the total effects of $S_1 = s_1$. With a method similar to (iii) in Subsection 7.4.1, the direct effects of $S_1 = s_1$ on $X_{1i} = x_{1i}$ at $S_2 = s_2$ are given by
$$(x_{1i} - \nu_{1i}(s_2))\,\beta_{1i}\,(s_1 - \mu_1(s_2)), \quad i = 1, 2, \ldots, I_1,$$
where $\nu_{1i}(s_2)$ and $\mu_1(s_2)$ are the conditional expectations of $X_{1i}$ and $S_1$ given $S_2 = s_2$, respectively. Hence, the indirect effects of $S_1 = s_1$ on $X_{1i} = x_{1i}$ are
$$(x_{1i} - \nu_{1i})\,\beta_{1i}\,(s_1 - \mu_1) - (x_{1i} - \nu_{1i}(s_2))\,\beta_{1i}\,(s_1 - \mu_1(s_2)), \quad i = 1, 2, \ldots, I_1,$$
and these are also the indirect effects of $S_2 = s_2$ on $X_{1i} = x_{1i}$. Summarizing and standardizing the above effects, we have
$$e_T((S_1, S_2) \to X_{1i}) = e_T(S_1 \to X_{1i}) = \frac{\beta_{1i}\,\mathrm{Cov}(X_{1i}, S_1)}{1 + \beta_{1i}\,\mathrm{Cov}(X_{1i}, S_1)}, \quad i = 1, 2, \ldots, I_1;$$
$$e_D(S_1 \to X_{1i}) = \frac{\beta_{1i}\,\mathrm{Cov}(X_{1i}, S_1 \mid S_2)}{1 + \beta_{1i}\,\mathrm{Cov}(X_{1i}, S_1)}, \quad i = 1, 2, \ldots, I_1; \tag{7.31}$$
$$e_D(S_2 \to X_{1i}) = 0, \quad i = 1, 2, \ldots, I_1.$$
Table 7.2 The estimated regression coefficients in (7.28), (7.30), and (7.32)
δ β11 β12 β13 β21 β22 β23
2.234 3.584 3.045 1.253 3.584 2.234 3.045
Table 7.3 The means of the latent and manifest variables, St and X ti
μ1 μ2 ν11 ν12 ν13 ν21 ν22 ν23
0.4 0.5 0.48 0.34 0.42 0.45 0.55 0.4
Table 7.5 The total (direct) effects of Latent variable S1 on Manifest variables X 1i , i = 1,2, 3
S1 X 11 X 12 X 13 S1 → X 11 S1 → X 12 S1 → X 13
1 1 1 1 1.118 1.206 0.436
0 1 1 1 −0.745 −0.804 −0.291
1 0 0 0 −1.032 −0.621 −0.316
0 0 0 0 0.688 0.414 0.210
Mean effect β1i Cov(X 1i , S1 ) 0.602 0.438 0.090
eT (S1 → X 1i ) = e D (S1 → X 1i ) 0.376 0.305 0.083
The above effects are the changes of information in $X_{1i}$ for latent variable $S_1$, as explained in (7.10), and the following exponentials of the quantities can be interpreted as the odds ratios:
$$\exp\{(x_{1i} - \nu_{1i})\,\beta_{1i}\,(s_1 - \mu_1)\}, \quad i = 1, 2, 3.$$
For baselines $\nu_{1i}$ and $\mu_1$, $i = 1, 2, 3$, in Table 7.3, the odds ratios with respect to the variables $(S_1, X_{1i})$, $i = 1, 2, 3$, are 3.058, 3.340, and 1.547, respectively. The standardized effects (7.21) are shown in the seventh row of Table 7.5. For example, $e_T(S_1 \to X_{11}) = 0.376$ implies that 37.6% of the variation of manifest variable $X_{11}$ in entropy is explained by latent variable $S_1$. By using (7.22) and (7.23), the effects of latent variable $S_1$ on $S_2$ are calculated in Table 7.6, and the explanation of the table can be given as in Table 7.5. In Table 7.7, the total effects of latent variables $(S_1, S_2)$ on manifest variables $X_{2i}$, $i = 1, 2, 3$, are presented.
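As a check of these figures, the standardized effect in Table 7.5 follows from (7.21) applied to the mean effect, and the quoted odds ratios are the exponentials of the first-row effects (up to rounding):
$$e_T(S_1 \to X_{11}) = \frac{0.602}{0.602 + 1} = 0.376, \qquad \exp(1.118) \approx 3.06, \quad \exp(1.206) \approx 3.34, \quad \exp(0.436) \approx 1.55.$$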
Table 7.7 The total effects of latent variables (S1 , S2 ) on manifest variables X 2i , i = 1,2, 3
S1 S2 X 21 X 22 X 23 (S1 , S2 ) → X 21 (S1 , S2 ) → X 22 (S1 , S2 ) → X 23
1 1 1 1 1 0.985 0.503 0.913
0 1 1 1 1 0.985 0.503 0.913
1 0 1 1 1 −0.985 −0.503 −0.913
0 0 1 1 1 −0.985 −0.503 −0.913
1 1 0 0 0 −0.806 −0.614 −0.609
0 1 0 0 0 −0.806 −0.614 −0.609
1 0 0 0 0 0.806 0.614 0.609
0 0 0 0 0 0.806 0.614 0.609
Mean effect β2i Cov(X 2i , S2 ) 0.627 0.279 0.457
eT ((S1 , S2 ) → X 2i ) 0.385 0.218 0.314
The exponentials of the above effects can be interpreted as odds ratios as in Table
7.5. The standardized effects are obtained through (7.26).
Remark 7.4 In Table 7.7, the absolute values of the effects of $(S_1, S_2) = (i, j)$, $i = 0, 1$, on $X_{2k}$ are the same for $j = 0, 1$; $k = 1, 2, 3$, because of (7.24) and because the mean of $S_2$ is $\mu_2 = 1/2$ (Table 7.3).
Table 7.8 shows the total (direct) effects of latent variable S2 on manifest variables
X 2i , i = 1,2, 3. The effects are calculated with (7.24) and Table 7.4. The standardized
Table 7.8 The total (direct) effects of latent variable S2 on manifest variables X 2i , i = 1,2, 3
S1 S2 X 21 X 22 X 23 S2 → X 21 S2 → X 22 S2 → X 23
1 1 1 1 1 0.244 0.134 0.256
1 1 0 0 0 −0.473 −0.313 −0.353
1 0 1 1 1 −0.975 −0.536 −1.023
1 0 0 0 0 1.892 1.251 1.413
0 1 1 1 1 1.731 0.860 1.534
0 1 0 0 0 −0.778 −0.704 −0.597
0 0 1 1 1 −0.742 −0.369 −0.658
0 0 0 0 0 0.333 0.302 0.256
Mean effect β2i Cov(X 2i , S2 |S1 ) 0.477 0.212 0.347
eT (S2 → X 2i ) 0.293 0.166 0.238
Table 7.9 The indirect effects of latent variable S1 on manifest variables X 2i , i = 1,2, 3
S1 S2 X 21 X 22 X 23 S2 → X 21 S2 → X 22 S2 → X 23
1 1 1 1 1 0.741 0.369 0.657
1 1 0 0 0 −0.333 −0.301 −0.256
1 0 1 1 1 −0.011 0.033 0.110
1 0 0 0 0 −1.086 −0.637 −0.804
0 1 1 1 1 −0.746 −0.357 −0.621
0 1 0 0 0 −0.028 0.090 −0.012
0 0 1 1 1 −0.243 −0.134 −0.256
0 0 0 0 0 0.473 0.312 0.353
Mean effect $\beta_{2i}\mathrm{Cov}(X_{2i}, S_2) - \beta_{2i}\mathrm{Cov}(X_{2i}, S_2 \mid S_1)$ 0.151 0.067 0.110
eT (S1 → X 2i ) = e I (S1 → X 2i ) 0.093 0.052 0.075
effects are given by (7.27). By subtracting Table 7.8 from Table 7.7, we have the
indirect effects of latent variable S1 on manifest variables X 2i , i = 1,2, 3, as shown
in Table 7.9.
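These results illustrate the additive structure of the standardized summary effects; for example, for $X_{21}$, the total effect of $(S_1, S_2)$ splits into the total (direct) effect of $S_2$ and the total (indirect) effect of $S_1$, up to rounding:
$$\underbrace{0.385}_{e_T((S_1, S_2)\to X_{21})} \approx \underbrace{0.293}_{e_T(S_2\to X_{21})} + \underbrace{0.093}_{e_T(S_1\to X_{21})}, \qquad 0.627 \approx 0.477 + 0.151 \ \text{(mean effects)}.$$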
In order to compare the results of path analysis in Figs. 7.7 and 7.8, the same param-
eters in Table 7.1 are used. In this case, the total effects of latent variable S1 on
manifest variables X 1i , i = 1,2, 3 are the same as in Table 7.5; however, the direct
effects of S1 are calculated according to (7.29), (7.30), and (7.31) and we have Table
7.10. Since according to Fig. 7.8, the total effects of (S1 , S2 ) on X 1i are the same
Table 7.10 The direct effects of latent variable S1 on manifest variables X 1i , i = 1,2, 3
S2 S1 X 11 X 12 X 13 S1 → X 11 S1 → X 12 S1 → X 13
1 1 1 1 1 0.245 0.228 0.191
1 0 1 1 1 −0.436 −0.974 −0.292
1 1 0 0 0 −1.045 −0.350 −0.304
1 0 0 0 0 1.858 1.492 0.466
0 1 1 1 1 2.228 1.885 0.744
0 0 1 1 1 −0.424 −0.662 −0.145
0 1 0 0 0 −0.783 −0.368 −0.304
0 0 0 0 0 0.149 0.129 0.059
Mean effect β1i Cov(X 1i , S1 |S2 ) 0.458 0.340 0.066
e D (S1 → X 1i ) 0.286 0.250 0.060
as those of S1 on X 1i , by subtracting Table 7.10 from Table 7.5, we have the indi-
rect effects of S1 on X 1i , which are shown in Table 7.11. The effects are also those
of S2 on X 1i . Similarly, the total effects of latent variable S2 on manifest variables
X 2i , i = 1,2, 3 are the same as those of (S1 , S2 ) shown in Table 7.7, and are given in
Table 7.11 The indirect effects of latent variable S1 (S2 ) on manifest variables X 1i , i = 1,2, 3
S2 S1 X 11 X 12 X 13 S1 → X 11 S1 → X 12 S1 → X 13
1 1 1 1 1 0.873 0.977 0.245
1 0 1 1 1 −0.310 0.170 0.001
1 1 0 0 0 2.163 1.556 0.740
1 0 0 0 0 −2.603 −2.296 −0.757
0 1 1 1 1 −3.260 −2.506 −1.060
0 0 1 1 1 1.112 1.076 0.355
0 1 0 0 0 −0.249 −0.253 −0.012
0 0 0 0 0 0.539 0.285 0.151
Mean effect $\beta_{1i}\mathrm{Cov}(X_{1i}, S_1) - \beta_{1i}\mathrm{Cov}(X_{1i}, S_1 \mid S_2)$ 0.144 0.079 0.024
e I (S1 → X 1i ) = e I (S2 → X 1i ) 0.090 0.055 0.022
Table 7.12 The total effects of latent variable S2 on manifest variables X 2i , i = 1,2, 3
S2 X 21 X 22 X 23 S2 → X 21 S2 → X 22 S2 → X 23
1 1 1 1 0.985 0.503 0.913
0 1 1 1 −0.985 −0.503 −0.913
1 0 0 0 −0.806 −0.614 −0.609
0 0 0 0 0.806 0.614 0.609
Mean effect β2i Cov(X 2i , S2 ) 0.627 0.279 0.457
eT (S2 → X 2i ) 0.385 0.218 0.314
Table 7.12. The direct effects of S2 on X 2i , i = 1,2, 3 are the same as those in Table
7.8. The indirect effects of S1 on X 2i , i = 1,2, 3 are the same as those calculated in
Table 7.9, and the effects are also the indirect effects of S2 on X 2i , i = 1,2, 3, based
on Fig. 7.8.
The above method is applied to the McHugh data on a test of creative ability in machine design (Chap. 2). Assuming latent skills $S_i$ for solving subtests $X_i$, $i = 1, 2, 3, 4$, a confirmatory latent class model for explaining a learning structure is used, and the results of the analysis are given in Table 4.2. From the results, two latent skills $S_1\,(= S_2)$ and $S_3\,(= S_4)$ for solving the test can be assumed. In Chap. 4, a path analysis has been performed by assuming learning processes in a population. In this section, the model is viewed as a multiple-indicator, multiple-cause model, and the present method of path analysis is applied. The path diagram of the manifest and latent variables is shown in Fig. 7.9. By using the present approach, we have the mean effects in Table 7.13. From the table, the mean total effects of $(S_1, S_2)$ on $X_1$ and $X_2$ are equal to those of $S_1$, and the mean total effects of $(S_1, S_2)$ on $X_3$ and $X_4$ are equal to those of $S_2$. The indirect effects of $S_1$ on $X_i$, $i = 1, 2, 3, 4$, are equal to those of $S_2$. Thus, in the path diagram in Fig. 7.9, the indirect effects are induced by the association between the latent variables $S_1$ and $S_2$. Using the mean effects in Table 7.13, the standardized effects are calculated according to the present method, and Table 7.14 shows the standardized versions of the mean effects in Table 7.13, which can be interpreted in terms of entropy as ECD. As shown in the table, the indirect effects are relatively small; for example, in the standardized effects of $S_1$ on $X_1$ and $X_2$, the indirect effects are about one quarter of the direct effects.
Table 7.13 Mean effects of latent variables S1 and S3 on manifest variables X i , i = 1,2, 3,4
Mean effect Manifest variable
X 11 X 21 X 12 X 22
Total effect of S1 and S2 0.519 1.204 1.278 0.454
Total effect of S1 0.519 1.204 0.277 0.099
Direct effect of S1 0.406 0.943 0 0
Indirect effect of S1 0.113 0.261 0.277 0.099
Total effect of S2 0.113 0.261 1.278 0.454
Direct effect of S2 0 0 1.001 0.356
Indirect effect of S2 0.113 0.261 0.277 0.099
Table 7.14 Standardized effects of latent variables S1 and S3 on manifest variables X i , i = 1,2, 3,4
Standardized effect Manifest variable
X 11 X 21 X 12 X 22
Total effect of S1 and S2 0.342 0.546 0.561 0.312
Total effect of S1 0.342 0.546 0.122 0.068
Direct effect of S1 0.268 0.428 0 0
Indirect effect of S1 0.074 0.118 0.122 0.068
Total effect of S2 0.074 0.118 0.561 0.312
Direct effect of S2 0 0 0.439 0.245
Indirect effect of S2 0.074 0.118 0.122 0.068
7.6 Path Analysis of the Latent Markov Chain Model
The present path analysis is applied to the latent Markov chain model treated in Chap. 5. As in Sect. 5.2, let $X_t$ be the manifest variables that take values in sample space $\{1, 2, \ldots, J\}$ at time points $t = 1, 2, \ldots$, and let $S_t$ be the corresponding latent variables in sample space $\{1, 2, \ldots, A\}$, which are assumed to follow a first-order time-homogeneous Markov chain. Let $m_{ab}$, $a, b = 1, 2, \ldots, A$, be the transition probabilities; let $q_a$, $a = 1, 2, \ldots, A$, be the probabilities of $S_1 = a$, that is, the initial state distribution; and let $p_{aj}$ be the probabilities of $X_t = j$ given $S_t = a$, that is, $p_{aj} = P(X_t = j \mid S_t = a)$, where the probabilities are independent of time $t$. In order to make a general discussion, the following dummy variables for the manifest and latent categories are introduced:
$$X_{tj} = \begin{cases} 1 & (X_t = j)\\ 0 & (\text{otherwise}), \end{cases} \qquad S_{ta} = \begin{cases} 1 & (S_t = a)\\ 0 & (\text{otherwise}). \end{cases}$$
Then, manifest and latent variables $X_t$ and $S_t$ are identified with the following dummy variable vectors, respectively:
$$\boldsymbol{X}_t = (X_{t1}, X_{t2}, \ldots, X_{tJ})^T, \qquad \boldsymbol{S}_t = (S_{t1}, S_{t2}, \ldots, S_{tA})^T.$$
The logit models for the manifest responses and the latent transitions are parameterized with the following vectors and matrices:
$$\boldsymbol{\alpha} = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_J\end{pmatrix}, \qquad B = \begin{pmatrix}\beta_{11} & \beta_{12} & \cdots & \beta_{1J}\\ \beta_{21} & \beta_{22} & \cdots & \beta_{2J}\\ \vdots & \vdots & \ddots & \vdots\\ \beta_{A1} & \beta_{A2} & \cdots & \beta_{AJ}\end{pmatrix}, \tag{7.32}$$
$$\boldsymbol{\gamma} = \begin{pmatrix}\gamma_1\\ \gamma_2\\ \vdots\\ \gamma_A\end{pmatrix}, \qquad \Delta = \begin{pmatrix}\delta_{11} & \delta_{12} & \cdots & \delta_{1A}\\ \delta_{21} & \delta_{22} & \cdots & \delta_{2A}\\ \vdots & \vdots & \ddots & \vdots\\ \delta_{A1} & \delta_{A2} & \cdots & \delta_{AA}\end{pmatrix}. \tag{7.33}$$
Figure 7.10 shows the path diagram of the latent Markov chain model treated
above. According to the model, the following sequence is a Markov chain:
S1 → S2 → S3 → · · · → St → X t . (7.34)
where $\boldsymbol{\mu}_t = E(\boldsymbol{S}_t)$ and $\boldsymbol{\nu}_t = E(\boldsymbol{X}_t)$, and $\boldsymbol{\mu}_t(\boldsymbol{s}_k)$ and $\boldsymbol{\nu}_t(\boldsymbol{s}_k)$ denote the conditional means of $\boldsymbol{S}_t$ and $\boldsymbol{X}_t$ given $\boldsymbol{S}_k = \boldsymbol{s}_k$. Since the total effect of $\boldsymbol{S}_3 = \boldsymbol{s}_3$ on $\boldsymbol{X}_3 = \boldsymbol{x}_3$ at $(\boldsymbol{S}_1, \boldsymbol{S}_2) = (\boldsymbol{s}_1, \boldsymbol{s}_2)$ is given by
$$(\boldsymbol{s}_3^T - \boldsymbol{\mu}_3^T(\boldsymbol{s}_2))\,B\,(\boldsymbol{x}_3 - \boldsymbol{\nu}_3(\boldsymbol{s}_2)), \tag{7.38}$$
the total (indirect) effect of $\boldsymbol{S}_2 = \boldsymbol{s}_2$ on $\boldsymbol{X}_3 = \boldsymbol{x}_3$ through $\boldsymbol{S}_3$ is obtained as
$$(\boldsymbol{s}_3^T - \boldsymbol{\mu}_3^T(\boldsymbol{s}_1))\,B\,(\boldsymbol{x}_3 - \boldsymbol{\nu}_3(\boldsymbol{s}_1)) - (\boldsymbol{s}_3^T - \boldsymbol{\mu}_3^T(\boldsymbol{s}_2))\,B\,(\boldsymbol{x}_3 - \boldsymbol{\nu}_3(\boldsymbol{s}_2)).$$
For sequence
$$\boldsymbol{S}_1 \to \boldsymbol{S}_2 \to \boldsymbol{S}_3,$$
the effects of $\boldsymbol{S}_1$ and $\boldsymbol{S}_2$ on $\boldsymbol{S}_3$ are computed as follows. Since the above sequence is a Markov chain, the direct effects of $\boldsymbol{S}_1$ on $\boldsymbol{S}_3$ are zeroes. The total effect of $(\boldsymbol{S}_1, \boldsymbol{S}_2) = (\boldsymbol{s}_1, \boldsymbol{s}_2)$ on $\boldsymbol{S}_3 = \boldsymbol{s}_3$ is given by
$$(\boldsymbol{s}_2^T - \boldsymbol{\mu}_2^T)\,\Delta\,(\boldsymbol{s}_3 - \boldsymbol{\mu}_3). \tag{7.39}$$
In the next section, the above path analysis for the latent Markov chain is demonstrated by using artificial data.
Remark 7.6 In order to determine the regression parameters $\beta_{aj}$ and $\delta_{ab}$ in (7.32) and (7.33), respectively, we have to put constraints on the parameters. In this section, the first manifest and latent categories are taken as the reference categories, so that the related parameters with subscript 1 are set to zero. Then, we have
$$\beta_{aj} = \log\frac{p_{aj}\,p_{11}}{p_{a1}\,p_{1j}}, \quad j = 2, 3, \ldots, J; \qquad \delta_{ab} = \log\frac{m_{ab}\,m_{11}}{m_{a1}\,m_{1b}}, \quad b = 2, 3, \ldots, A. \tag{7.41}$$
Remark 7.7 In sequence (7.34), we have
$$\boldsymbol{\mu}_t^T(\boldsymbol{s}_k) = \boldsymbol{s}_k^T M^{t-k}, \qquad \boldsymbol{\nu}_t^T(\boldsymbol{s}_k) = \boldsymbol{s}_k^T M^{t-k} P, \tag{7.43}$$
that is, the conditional distribution vectors $\boldsymbol{\mu}_t^T(\boldsymbol{s}_k)$ and $\boldsymbol{\nu}_t^T(\boldsymbol{s}_k)$ are given by the appropriate rows of matrices $M^{t-k}$ and $M^{t-k}P$, respectively. For example, for $\boldsymbol{s}_k^T = (1, 0, \ldots, 0)$, $\boldsymbol{\mu}_t^T(\boldsymbol{s}_k)$ and $\boldsymbol{\nu}_t^T(\boldsymbol{s}_k)$ are obtained as the first rows of the matrices, respectively. If the Markov chain with transition matrix $M$ is irreducible and recurrent, we have
$$M^{t-k} \to \begin{pmatrix}\pi_1 & \pi_2 & \cdots & \pi_A\\ \pi_1 & \pi_2 & \cdots & \pi_A\\ \vdots & \vdots & & \vdots\\ \pi_1 & \pi_2 & \cdots & \pi_A\end{pmatrix} \quad \text{as } t \to \infty, \tag{7.44}$$
where
$$\pi_a \ge 0, \quad a = 1, 2, \ldots, A; \qquad \sum_{a=1}^{A}\pi_a = 1.$$
7.7 Numerical Illustration II
In the model treated in the previous section, for $A = J = 3$, let the response probability and latent transition matrices be set as
$$\begin{pmatrix}p_{11} & p_{12} & p_{13}\\ p_{21} & p_{22} & p_{23}\\ p_{31} & p_{32} & p_{33}\end{pmatrix} = \begin{pmatrix}0.8 & 0.1 & 0.1\\ 0.2 & 0.7 & 0.1\\ 0.1 & 0.2 & 0.7\end{pmatrix}, \qquad \begin{pmatrix}m_{11} & m_{12} & m_{13}\\ m_{21} & m_{22} & m_{23}\\ m_{31} & m_{32} & m_{33}\end{pmatrix} = \begin{pmatrix}0.6 & 0.3 & 0.1\\ 0.2 & 0.7 & 0.1\\ 0.1 & 0.3 & 0.6\end{pmatrix}, \tag{7.45}$$
respectively; and let the initial distribution of latent state $\boldsymbol{S}_1 = (S_{11}, S_{12}, S_{13})^T$ be
$$\boldsymbol{\mu}_1^T = (0.3,\ 0.6,\ 0.1). \tag{7.46}$$
Then,
$$\boldsymbol{\mu}_t^T = (\mu_{t1}, \mu_{t2}, \mu_{t3}) = \boldsymbol{\mu}_1^T\begin{pmatrix}m_{11} & m_{12} & m_{13}\\ m_{21} & m_{22} & m_{23}\\ m_{31} & m_{32} & m_{33}\end{pmatrix}^{t-1} = \bigl(0.1\times0.5^{t-1} - 0.1\times0.4^{t-1} + 0.3,\ \ 0.1\times0.4^{t-1} + 0.5,\ \ 0.2 - 0.1\times0.5^{t-1}\bigr).$$
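The summary quantities of this illustration can be reproduced numerically from (7.41), (7.45), and (7.46); the sketch below is illustrative code (variable names are assumptions) that computes $B$, the marginal distributions of $\boldsymbol{S}_3$ and $\boldsymbol{X}_3$, and the mean total effect and ECD, which agree with the values 0.633 and 0.387 quoted below up to rounding.

```python
import numpy as np

P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])        # response probabilities p_aj in (7.45)
M = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])        # transition probabilities m_ab in (7.45)
mu1 = np.array([0.3, 0.6, 0.1])        # initial latent distribution (7.46)

# Regression parameter matrices from (7.41); the first row and column are zero
# because category 1 is the reference for both the latent and the manifest space.
B = np.log(P * P[0, 0] / np.outer(P[:, 0], P[0, :]))
D = np.log(M * M[0, 0] / np.outer(M[:, 0], M[0, :]))   # delta parameters (not used below)

mu3 = mu1 @ np.linalg.matrix_power(M, 2)     # E(S_3)
nu3 = mu3 @ P                                # E(X_3)

# Total effect of S_3 = e_a on X_3 = e_j (rows: latent states, columns: manifest states).
effect = np.array([[(np.eye(3)[a] - mu3) @ B @ (np.eye(3)[j] - nu3)
                    for j in range(3)] for a in range(3)])

# Mean total effect and the entropy coefficient of determination (cf. 0.633 and 0.387 below).
joint = mu3[:, None] * P                     # P(S_3 = a, X_3 = j)
mean_total = float((joint * effect).sum())
ecd = mean_total / (mean_total + 1)
print(round(mean_total, 3), round(ecd, 3))
```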
In the present example, state vector $\boldsymbol{s}_k^T$ in (7.43) is one of the unit vectors $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$. The effects in (7.47) are those of $(\boldsymbol{S}_1, \boldsymbol{S}_2, \boldsymbol{S}_3)$ on $\boldsymbol{X}_3$; for example, for $\boldsymbol{S}_3 = (0, 1, 0)^T$, the effect on $\boldsymbol{X}_3 = (0, 0, 1)^T$ is $-0.330$, which is in the second row and the third column. Since sequence $\boldsymbol{S}_1$, $\boldsymbol{S}_2$, $\boldsymbol{S}_3$, and $\boldsymbol{X}_3$ is a Markov chain, the effects in (7.47) are independent of latent states $\boldsymbol{S}_1$ and $\boldsymbol{S}_2$. The mean total effect of the elements in (7.47) can be obtained as 0.633. Hence, the entropy coefficient of determination of the explanatory variables is
$$ECD((\boldsymbol{S}_1, \boldsymbol{S}_2, \boldsymbol{S}_3), \boldsymbol{X}_3) = ECD(\boldsymbol{S}_3, \boldsymbol{X}_3) = \frac{0.633}{0.633 + 1} = 0.387.$$
From path analysis [9], the above quantity is the standardized summary total
effect of (S1 , S2 , S3 ) on X 3 , and is denoted by eT ((S1 , S2 , S3 ) → X 3 ). Similarly,
from (7.36) the total effects of (S2 , S3 ) = (s2 , s3 ) on X 3 = x 3 are given as follows:
$$\begin{pmatrix} 0.892 & -0.922 & -0.294\\ -0.595 & 0.922 & -0.394\\ -0.891 & 0.066 & 1.949 \end{pmatrix} \ \text{at } \boldsymbol{S}_1 = (1, 0, 0)^T; \tag{7.48a}$$
$$\begin{pmatrix} 1.355 & -0.994 & -0.053\\ -0.451 & 0.532 & -0.473\\ -0.694 & 0.270 & 1.924 \end{pmatrix} \ \text{at } \boldsymbol{S}_1 = (0, 1, 0)^T; \tag{7.48b}$$
$$\begin{pmatrix} 1.729 & -0.780 & -0.464\\ -0.045 & 0.775 & -0.855\\ -0.727 & 0.463 & 1.106 \end{pmatrix} \ \text{at } \boldsymbol{S}_1 = (0, 0, 1)^T. \tag{7.48c}$$
In the above effects, for example, for $\boldsymbol{s}_1 = (0, 1, 0)^T$, $\boldsymbol{s}_3 = (0, 0, 1)^T$, and $\boldsymbol{x}_3 = (1, 0, 0)^T$, the total effect is $-0.694$, which is in the third row and the first column of matrix (7.48b). Sequence $\boldsymbol{S}_1$, $\boldsymbol{S}_2$, $\boldsymbol{S}_3$, and $\boldsymbol{X}_3$ is a Markov chain, so the above effects of $(\boldsymbol{S}_2, \boldsymbol{S}_3) = (\boldsymbol{s}_2, \boldsymbol{s}_3)$ on $\boldsymbol{X}_3 = \boldsymbol{x}_3$ are independent of latent state $\boldsymbol{S}_2$. The conditional means of the above effects given $\boldsymbol{S}_1 = \boldsymbol{s}_1$ are obtained as
$$E\bigl[(\boldsymbol{S}_3^T - \boldsymbol{\mu}_3^T(\boldsymbol{s}_1))\,B\,(\boldsymbol{X}_3 - \boldsymbol{\nu}_3(\boldsymbol{s}_1)) \,\big|\, \boldsymbol{s}_1\bigr] = \begin{cases} 0.652 & \boldsymbol{S}_1 = (1, 0, 0)^T\\ 0.584 & \boldsymbol{S}_1 = (0, 1, 0)^T\\ 0.658 & \boldsymbol{S}_1 = (0, 0, 1)^T. \end{cases}$$
Since the distribution of $\boldsymbol{S}_1$ is given in (7.46), the mean total effect of $(\boldsymbol{S}_2, \boldsymbol{S}_3)$ on $\boldsymbol{X}_3$ is computed as
$$0.3 \times 0.652 + 0.6 \times 0.584 + 0.1 \times 0.658 = 0.612. \tag{7.49}$$
Subtracting the effects in (7.48a)–(7.48c) from those in (7.47), from (7.37), the total (indirect) effects of $\boldsymbol{S}_1 = \boldsymbol{s}_1$ on $\boldsymbol{X}_3 = \boldsymbol{x}_3$ through $(\boldsymbol{S}_2, \boldsymbol{S}_3) = (\boldsymbol{s}_2, \boldsymbol{s}_3)$ are obtained as
$$\begin{pmatrix} 0.334 & -0.127 & 0.285\\ 0.113 & -0.348 & 0.064\\ 0.107 & -0.354 & 0.058 \end{pmatrix} \ \text{at } \boldsymbol{S}_1 = (1, 0, 0)^T; \tag{7.50a}$$
$$\begin{pmatrix} -0.130 & -0.058 & 0.044\\ -0.031 & 0.041 & 0.143\\ -0.090 & -0.018 & 0.083 \end{pmatrix} \ \text{at } \boldsymbol{S}_1 = (0, 1, 0)^T; \tag{7.50b}$$
$$\begin{pmatrix} -0.503 & -0.271 & 0.455\\ -0.433 & -0.201 & 0.525\\ -0.057 & 0.175 & 0.901 \end{pmatrix} \ \text{at } \boldsymbol{S}_1 = (0, 0, 1)^T. \tag{7.50c}$$
In the above matrices, for $\boldsymbol{s}_1 = (0, 0, 1)^T$, $\boldsymbol{s}_3 = (0, 0, 1)^T$, and $\boldsymbol{x}_3 = (0, 1, 0)^T$, the total effect is found in the third row and the second column of matrix (7.50c) and is given by 0.175. The above effects are independent of latent state $\boldsymbol{S}_2$, because sequence $\boldsymbol{S}_1$, $\boldsymbol{S}_2$, $\boldsymbol{S}_3$, and $\boldsymbol{X}_3$ is a Markov chain. The summary total effect of $\boldsymbol{S}_1$ on $\boldsymbol{X}_3$ through $(\boldsymbol{S}_2, \boldsymbol{S}_3)$ is given by subtracting that of $(\boldsymbol{S}_2, \boldsymbol{S}_3)$, i.e., 0.612, from that of $(\boldsymbol{S}_1, \boldsymbol{S}_2, \boldsymbol{S}_3)$, i.e., 0.633. Hence, the effect is
$$e_T(\boldsymbol{S}_1 \to \boldsymbol{X}_3) = \frac{0.021}{0.633 + 1} = 0.013.$$
In the above effects, for $\boldsymbol{s}_2 = (0, 1, 0)^T$, $\boldsymbol{s}_3 = (1, 0, 0)^T$, and $\boldsymbol{x}_3 = (0, 0, 1)^T$, the effect is in the first row and the third column of matrix (7.51b) and is 0.230. Since the marginal distribution of $\boldsymbol{S}_2$ is given by
$$\boldsymbol{\mu}_2^T = (0.31,\ 0.54,\ 0.15),$$
in the same way as in (7.49) we have the mean of the effects in (7.51a)–(7.51c) as 0.099, and hence
$$e_T(\boldsymbol{S}_2 \to \boldsymbol{X}_3) = \frac{0.099}{0.633 + 1} = 0.061.$$
The above path analysis has thus been carried out along the sequence $\boldsymbol{S}_1 \to \boldsymbol{S}_2 \to \boldsymbol{S}_3 \to \boldsymbol{X}_3$.
7.8 Discussion
In this chapter, path analysis has been carried out in latent class models, i.e., multiple-indicator, multiple-cause models and the latent Markov chain model. In path analysis,
it is critical how the effects of variables are measured and also how the total effects of
variables are decomposed into the sums of the direct and indirect effects. Although
the approach is significant for discussing causal systems of categorical variables,
path analysis of categorical variables is more complicated than that of continuous
variables, because in the former analysis the effects of categories of parent vari-
ables on those of descendant variables have to be calculated. In order to assess the
effects and to summarize them, in this chapter an entropy-based path analysis [9] has been applied to latent class models in a GLM framework. In this approach, the total and direct effects are defined through log odds ratios, and the effects can be interpreted in terms of information (entropy). Consequently, although the indirect effects are defined by subtracting the direct effects from the total effects, they can also be interpreted in terms of information. This point is significant for putting the analysis into practice. Measuring pathway effects based on the present method of path analysis is important as well, and further development of pathway effect analysis is left to readers. Moreover, applications of the present approach to practical latent class analyses are also expected in future studies.
References
1. Albert, J. M., & Nelson, S. (2011). Generalized causal mediation analysis. Biometrics, 1028–
1038.
2. Bentler, P. M., & Weeks, D. B. (1980). Linear structural equations with latent variables.
Psychometrika, 45, 289–308.
3. Christoferson, A. (1975). Factor analysis of dichotomous variables. Psychometrika, 40, 5–31.
4. Eshima, N., & Tabata, M. (1999). Effect analysis in loglinear model approach to path analysis
of categorical variables. Behaviormetrika, 26, 221–233.
5. Eshima, N., & Tabata, M. (2007). Entropy correlation coefficient for measuring predictive
power of generalized linear models. Statistics and Probability Letters, 77, 588–593.
6. Eshima, N., & Tabata, M. (2010). Entropy coefficient of determination for generalized linear
models. Computational Statistics and Data Analysis, 54, 1381–1389.
7. Eshima, N., Asano, C., & Obana, E. (1990). A latent class model for assessing learning
structures. Behaviormetrika, 28, 23–35.
8. Eshima, N., Tabata, M., & Geng, Z. (2001). Path analysis with logistic regression models:
Effect analysis of fully recursive causal systems of categorical variables. Journal of the Japan
Statistical Society, 31, 1–14.
9. Eshima, N., Tabata, M., Borroni, C. G., & Kano, Y. (2015). An entropy-based approach to path
analysis of structural generalized linear models: A basic idea. Entropy, 17, 5117–5132.
10. Fienberg, S. E. (1991). The analysis of cross-classified categorical data (2nd ed.). Cambridge,
England: The MIT Press.
11. Goodman, L. A. (1973b). The analysis of multidimensional contingency tables when some
variables are posterior to others: A modified path analysis approach. Biometrika, 60, 179–192.
12. Goodman, L. A. (1973a). Causal analysis of data from panel studies and other kinds of surveys.
American Journal of Sociology, 78, 1135–1191.
13. Goodman, L. A. (1974). The analysis of systems of qualitative variables when some of the
variables are unidentifiable: Part I. A modified latent structure approach. American Journal of
Sociology, 79, 1179–1259.
14. Hagenaars, J. A. (1998). Categorical causal modeling: Latent class analysis and directed
loglinear models with latent variables. Sociological Methods & Research, 26, 436–489.
15. Jöreskog, K.G., & Sörbom, D. (1996). LISREL8: user’s reference guide (2nd ed.). Chicago:
Scientific Software International.
16. Kuha, J., & Goldthorpe, J. H. (2010). Path analysis for discrete variables: The role of education
in social mobility. Journal of Royal Statistical Society, A, 173, 351–369.
17. Lazarsfeld, P. F. (1948). The use of panels in social research. Proceedings of the American
Philosophical Society, 92, 405–410.
18. Macready, G. B. (1982). The use of latent class models for assessing prerequisite relations and transference among traits. Psychometrika, 47, 477–488.
19. McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman
and Hall.
20. Muthen, B. (1978). Contribution of factor analysis of dichotomous variables. Psychometrika,
43, 551–560.
21. Muthen, B. (1984). A general structural equation model with dichotomous ordered categorical
and continuous latent variable indicators. Psychometrika, 49, 114–132.
22. Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear model. Journal of the Royal
Statistical Society A, 135, 370–384.
23. Owston, R. D. (1979). A maximum likelihood approach to the “test of inclusion.” Psychome-
trika, 44, 421–425.
24. White, R. T., & Clark, R. M. (1973). A test of inclusion which allows for errors of measurement.
Psychometrika, 38, 77–86.
25. Wright, S. (1934). The method of path coefficients. The Annals of Mathematical Statistics, 5,
161–215.