
Behaviormetrics:

Quantitative Approaches to Human Behavior 14

Nobuoki Eshima

An Introduction
to Latent Class
Analysis
Methods and Applications
Behaviormetrics: Quantitative Approaches
to Human Behavior

Volume 14

Series Editor
Akinori Okada, Professor Emeritus, Rikkyo University,
Tokyo, Japan
This series covers in their entirety the elements of behaviormetrics, a term that
encompasses all quantitative approaches of research to disclose and understand
human behavior in the broadest sense. The term includes the concept, theory,
model, algorithm, method, and application of quantitative approaches from
theoretical or conceptual studies to empirical or practical application studies to
comprehend human behavior. The Behaviormetrics series deals with a wide range
of topics of data analysis and of developing new models, algorithms, and methods
to analyze these data.
The characteristics featured in the series have four aspects. The first is the variety of the methods utilized in data analysis and of newly developed methods, which include not only standard or general statistical methods and psychometric methods traditionally used in data analysis, but also cluster analysis, multidimensional scaling, machine learning, correspondence analysis, biplots, network analysis and graph theory, conjoint measurement, biclustering, visualization, and data and web mining. The second aspect is the variety of types of data, including ranking, categorical, preference, functional, angle, contextual, nominal, multi-mode multi-way, continuous, discrete, high-dimensional, and sparse data. The third comprises the varied procedures by which the data are collected: by survey, experiment, sensor devices, purchase records, and other means. The fourth aspect of the Behaviormetrics series is the diversity of fields from which the data are derived, including marketing and consumer behavior, sociology, psychology, education, archaeology, medicine, economics, political and policy science, cognitive science, public administration, pharmacy, engineering, urban planning, agriculture and forestry science, and brain science.
In essence, the purpose of this series is to describe the new horizons opening up
in behaviormetrics — approaches to understanding and disclosing human behaviors
both in the analyses of diverse data by a wide range of methods and in the
development of new methods to analyze these data.

Editor in Chief
Akinori Okada (Rikkyo University)

Managing Editors
Daniel Baier (University of Bayreuth)
Giuseppe Bove (Roma Tre University)
Takahiro Hoshino (Keio University)

More information about this series at https://link.springer.com/bookseries/16001


Nobuoki Eshima

An Introduction to Latent
Class Analysis
Methods and Applications
Nobuoki Eshima
Department of Pediatrics and Child Health
Kurume University
Kurume, Fukuoka, Japan

ISSN 2524-4027 ISSN 2524-4035 (electronic)


Behaviormetrics: Quantitative Approaches to Human Behavior
ISBN 978-981-19-0971-9 ISBN 978-981-19-0972-6 (eBook)
https://doi.org/10.1007/978-981-19-0972-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

In observing human behaviors and responses to various stimuli and test items, it is reasonable to assume that they are governed by underlying factors, for example, attitudes and abilities in psychology, and beliefs, social customs, and folkways in sociology; however, such factors can rarely be observed or measured directly. The factors are hypothesized components employed in scientific research; they are unobservable and are treated as latent variables to explain the phenomena under consideration. They are sometimes called latent or internal factors. Although the latent factors cannot be measured directly, data on observable variables, such as responses to test items and interviews concerning national elections, are obtainable. These variables are referred to as manifest variables. The latent factors have to be estimated from the observations, and latent structure analysis was proposed by Lazarsfeld (1950) for this purpose. To extract latent factors, it is necessary to collect multivariate data on manifest variables, and in measuring the variables we have to take response errors into consideration, such as those induced by the physical and mental conditions of subjects, intrusion (guessing) errors, omission (forgetting) errors, and so on. When observing the results of a test battery administered to examinees or subjects, the critical question is how their abilities can be assessed, and our interest is in ordering the examinees according to the latent factors rather than by simple scores, that is, sums of item scores on the test battery. In a strict sense, latent structure analysis is classified into latent class analysis, latent trait analysis, and latent profile analysis according to the types of manifest and latent variables. Latent class analysis treats discrete (categorical) manifest and latent variables; latent trait analysis deals with discrete manifest variables and continuous latent variables; and latent profile analysis handles continuous manifest variables and discrete latent variables. The purpose of latent structure analysis is similar to that of factor analysis (Spearman, 1904), so in a wide sense factor analysis is also included in latent structure analysis. Introducing latent variables into data analysis is an idealization; nevertheless, it is sensible and meaningful to explain the phenomena under consideration by using latent variables. In this book, latent class analysis is the focus of discussion, and applications of latent class models to data analysis are treated under several themes, namely exploratory latent class analysis, confirmatory latent class analysis, analysis of longitudinal data, path analysis with latent class models, and so on. Along with these, latent profile and latent trait models are also treated in connection with parameter estimation. The author hopes that the present book will play a significant role in introducing latent structure analysis not only to young researchers and students in the behavioral sciences, but also to those working in other scientific research fields.

Kurume, Japan Nobuoki Eshima

References
Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, et al. (Eds.), Measurement and prediction: Studies in social psychology in World War II (Vol. 4). Princeton University Press.
Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.
Acknowledgements

I would like to express my sincere gratitude to Prof. Yushiro Yamashita, chairman of the Department of Pediatrics & Child Health, Kurume University School of Medicine, for providing me with an excellent environment and with encouragement to complete this book. I am also very much indebted to Dr. Shigeru Karukaya for his useful advice, which spurred me on to finish this book.
Contents

1 Overview of Basic Latent Structure Models . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Latent Class Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Latent Trait Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Latent Profile Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Factor Analysis Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Latent Structure Models in a Generalized Linear Model
Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 The EM Algorithm and Latent Structure Models . . . . . . . . . . . . . . . 14
1.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2 Latent Class Cluster Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 The ML Estimation of Parameters in the Latent Class Model . . . . 18
2.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Measuring Goodness-of-Fit of Latent Class Models . . . . . . . . . . . . 27
2.5 Comparison of Latent Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Latent Profile Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3 Latent Class Analysis with Ordered Latent Classes . . . . . . . . . . . . . . . . 47
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 Latent Distance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Assessment of the Latent Guttman Scaling . . . . . . . . . . . . . . . . . . . . 57
3.4 Analysis of the Association Between Two Latent Traits
with Latent Guttman Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Latent Ordered-Class Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.6 The Latent Trait Model (Item Response Model) . . . . . . . . . . . . . . . . 78
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85


4 Latent Class Analysis with Latent Binary Variables: An Application for Analyzing Learning Structures . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Latent Class Model for Scaling Skill Acquisition Patterns . . . . . . . 88
4.3 ML Estimation Procedure for Model (4.3) with (4.4) . . . . . . . . . . . 90
4.4 Numerical Examples (Exploratory Analysis) . . . . . . . . . . . . . . . . . . 92
4.5 Dynamic Interpretation of Learning (Skill Acquisition)
Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.6 Estimation of Mixed Proportions of Learning Processes . . . . . . . . . 98
4.7 Solution of the Separating Equations . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.8 Path Analysis in Learning Structures . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.9 Numerical Illustration (Confirmatory Analysis) . . . . . . . . . . . . . . . . 107
4.10 A Method for Ordering Skill Acquisition Patterns . . . . . . . . . . . . . . 113
4.11 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5 The Latent Markov Chain Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.2 The Latent Markov Chain Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.3 The ML Estimation of the Latent Markov Chain Model . . . . . . . . . 125
5.4 A Property of the ML Estimation Procedure via the EM
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.5 Numerical Example I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.6 Numerical Example II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.7 A Latent Markov Chain Model with Missing Manifest
Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.8 A General Version of the Latent Markov Chain Model
with Missing Manifest Observations . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.9 The Latent Markov Process Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.10 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6 The Mixed Latent Markov Chain Model . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.2 Dynamic Latent Class Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.3 The ML Estimation of the Parameters of Dynamic Latent
Class Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.4 A Numerical Illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7 Path Analysis in Latent Class Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.2 A Multiple-Indicator, Multiple-Cause Model . . . . . . . . . . . . . . . . . . 162
7.3 An Entropy-Based Path Analysis of Categorical Variables . . . . . . . 164

7.4 Path Analysis in Multiple-Indicator, Multiple-Cause Models . . . . . 169


7.4.1 The Multiple-Indicator, Multiple-Cause Model
in Fig. 7.2a . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7.4.2 The Multiple-Indicator, Multiple-Cause Model
in Fig. 7.2b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.5 Numerical Illustration I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.5.1 Model I (Fig. 7.2a) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.5.2 Model II (Fig. 7.2b) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.6 Path Analysis of the Latent Markov Chain Model . . . . . . . . . . . . . . 179
7.7 Numerical Illustration II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Chapter 1
Overview of Basic Latent Structure Models

1.1 Introduction

Latent structure analysis is classified into four analyses, i.e., latent class analysis, latent profile analysis, latent trait analysis, and factor analysis. Latent class analysis was introduced for explaining social phenomena by Lazarsfeld [11], and it analyzes discrete (categorical) data, assuming that a population or group under study is divided into homogeneous subgroups called latent classes. As in the latent class model, assuming latent classes in a population, latent profile analysis (Gibson, 1959) was proposed for studying the interrelationships among continuous variables; in this sense, the model may be regarded as a latent class model [1, 8]. Latent trait analysis has been developed in mental test theory (Lord, 1952; Lord & Novick, 1968) and is also employed in social attitude measurement [10]. The latent trait model was designed to explain responses to manifest categorical variables depending on latent continuous variables, for example, ability, attitude, and so on. A systematic discussion of the above models was given by Lazarsfeld and Henry [9]. Factor analysis dates back to the work of Spearman [20], and the single-factor model was extended to the multiple-factor model [21]. The analysis treats continuous manifest and latent variables and explains the phenomena under study by extracting simple structures that account for the interrelations between the manifest and latent variables. Although "latent structure analysis" is now a general term for analyses with the above models, in many cases the name is used for latent class analysis in a narrow sense, after Lazarsfeld [11]. In the early years of the development of latent structure models, the main effort was placed on parameter estimation by solving the equations relating the model parameters to the means and covariances of the manifest variables, the so-called accounting equations; these methods are now mainly of historical interest. As the efficiency of computers has increased rapidly, the method of maximum likelihood (ML) can nowadays be applied easily to data analysis. In particular, the expectation–maximization (EM) algorithm [3] has made a great contribution to parameter estimation in latent structure analysis.


In this chapter, latent structure models are reviewed. Section 1.2 treats the latent
class model and the accounting equations are given. In Sect. 1.3, a latent trait model
with discriminant and item difficulty parameters is discussed. A comparison of the
model and a latent class model is also made. Section 1.4 treats the latent profile model,
which is regarded as a factor analysis model with categorical factors, and in Sect. 1.5,
the factor analysis model is briefly reviewed. Section 1.6 reviews generalized linear models (GLMs) [15, 16] and treats latent structure models in a GLM framework, and Sect. 1.7 summarizes the EM algorithm for the ML estimation of latent structure models. Finally, Sect. 1.8 provides a summary and discussion of the present chapter.

1.2 Latent Class Model

In the latent class model, it is assumed that a population is divided into subpopulations in which individuals are homogeneous in their responses to the items under study. The subpopulations are called latent classes. Let $X_i$ be manifest variables that take categories $\{1, 2, \ldots, K_i\}$, $i = 1, 2, \ldots, I$, which represent the responses to be observed, and let $\xi$ be a latent variable that takes categories $\{1, 2, \ldots, A\}$, which denote the latent classes and are expressed by integers for simplicity of notation. Let $\boldsymbol{X} = (X_1, X_2, \ldots, X_I)^T$ be the $I$-dimensional column vector of the manifest variables $X_i$; let $P(\boldsymbol{X} = \boldsymbol{x} \mid a)$ be the conditional probability of $\boldsymbol{X} = \boldsymbol{x} = (x_1, x_2, \ldots, x_I)^T$ for given latent class $a$; and let $P(X_i = x_i \mid a)$ be that of $X_i = x_i$. Then, in the latent class model, the conditional probability $P(\boldsymbol{X} = \boldsymbol{x} \mid a)$ is expressed by

$$P(\boldsymbol{X} = \boldsymbol{x} \mid a) = \prod_{i=1}^{I} P(X_i = x_i \mid a). \qquad (1.1)$$

The above equation indicates that the manifest variables $X_i$ are statistically independent within latent class $a$. This assumption is called that of local independence. Let $v_a$ be the probability that a randomly selected individual in the population belongs to latent class $a$. Then, from (1.1), we have

$$P(\boldsymbol{X} = \boldsymbol{x}) = \sum_{a=1}^{A} v_a P(\boldsymbol{X} = \boldsymbol{x} \mid a) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i \mid a), \qquad (1.2)$$

where

$$\sum_{a=1}^{A} v_a = 1; \qquad \sum_{x_i=1}^{K_i} P(X_i = x_i \mid a) = 1, \quad i = 1, 2, \ldots, I. \qquad (1.3)$$
Table 1.1 Positive response probabilities of latent class model (1.4)

Latent class    X_1       X_2       ...    X_I
1               π_{11}    π_{12}    ...    π_{1I}
2               π_{21}    π_{22}    ...    π_{2I}
...             ...       ...       ...    ...
A               π_{A1}    π_{A2}    ...    π_{AI}

The above equations are referred to as the accounting equations. In many data analyses, the manifest variables are binary; for example, the binary categories are {yes, no}, {positive, negative}, {success, failure}, and so on. Such responses are formally denoted by the integers {1, 0}. Let $\pi_{ai}$ be the positive response probabilities of the manifest variables $X_i$, $i = 1, 2, \ldots, I$. Then, Eq. (1.2) is expressed as follows:

$$P(\boldsymbol{X} = \boldsymbol{x}) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} \pi_{ai}^{x_i} (1 - \pi_{ai})^{1 - x_i}. \qquad (1.4)$$

According to the above accounting Eqs. (1.2) and (1.4), the latent class model can also be viewed as a mixture of the independent response models (1.1). The interpretation of the latent classes is made from the latent response probabilities $(\pi_{a1}, \pi_{a2}, \ldots, \pi_{aI})$ (Table 1.1). Exploratory latent class analysis is performed with the general models (1.2) and (1.4), where no restrictions are placed on the model parameters $v_a$ and $P(X_i = x_i \mid a)$. On the other hand, in confirmatory analysis, some constraints are placed on the model parameters. The constraints are made according to the phenomena under study or information from practical scientific research.
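To make the accounting equations concrete, here is a minimal Python sketch (added for illustration; it is not part of the original text, and the class proportions and response probabilities used below are hypothetical) that evaluates the manifest cell probabilities (1.4) for binary items under local independence (1.1).

```python
import itertools
import numpy as np

def cell_probabilities(v, pi):
    """Manifest probabilities P(X = x) of a binary latent class model, Eq. (1.4).

    v  : array of shape (A,), latent class proportions (sums to 1)
    pi : array of shape (A, I), positive response probabilities pi[a, i]
    Returns a dict mapping each response pattern x to P(X = x).
    """
    v, pi = np.asarray(v, dtype=float), np.asarray(pi, dtype=float)
    A, I = pi.shape
    probs = {}
    for x in itertools.product([0, 1], repeat=I):
        x_arr = np.array(x)
        # local independence: product over items within each class, Eq. (1.1)
        class_probs = np.prod(pi**x_arr * (1 - pi)**(1 - x_arr), axis=1)
        # mixture over latent classes, Eq. (1.2)
        probs[x] = float(np.sum(v * class_probs))
    return probs

# Hypothetical two-class, three-item example (illustrative values only)
p = cell_probabilities(v=[0.4, 0.6], pi=[[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]])
assert abs(sum(p.values()) - 1.0) < 1e-12  # cell probabilities sum to one
```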

Remark 1.1 In latent class model (1.2) with constraints (1.3), the number of parameters $v_a$ is $A - 1$ and that of $P(X_i = x_i \mid a)$ is $A\sum_{i=1}^{I}(K_i - 1)$. Since the number of manifest probabilities (parameters) $P(\boldsymbol{X} = \boldsymbol{x})$ is $\prod_{i=1}^{I} K_i - 1$, in order to identify the latent class model, the following inequality has to hold:

$$\prod_{i=1}^{I} K_i - 1 > (A - 1) + A\sum_{i=1}^{I}(K_i - 1). \qquad (1.5)$$

Remark 1.2 In the latent class model (1.2), a single latent variable $\xi$ has been assumed for explaining the general framework of the model. In a confirmatory latent class analysis, several latent variables can be set for the analysis, for example, in an application of the latent class model to explain skill acquisition patterns [2] and in the latent class factor analysis model [14]; however, in such cases, since the latent variables are categorical and finite, the models can be viewed as restricted cases of the general latent class model. For example, for a latent class model with two latent variables $\xi_j$ with categorical sample spaces $\{1, 2, \ldots, A_j\}$, $j = 1, 2$, setting, for $(\xi_1, \xi_2) = (a, b)$, the new latent variable $\zeta = a + A_1(b - 1)$, $a = 1, 2, \ldots, A_1$, $b = 1, 2, \ldots, A_2$, the model can be viewed as a restricted case of the general model.

1.3 Latent Trait Model

Let $\theta$ be the latent trait (ability) of a randomly selected individual in a population, where the latent trait is a real value or a vector in a Euclidean space; let $X_i$ be manifest variables that take categories $\{1, 2, \ldots, K_i\}$, $i = 1, 2, \ldots, I$, as in Sect. 1.2; let $P(\boldsymbol{X} = \boldsymbol{x} \mid \theta)$ be the conditional probability of responses $\boldsymbol{X} = \boldsymbol{x} = (x_1, x_2, \ldots, x_I)^T$ given latent trait $\theta$; and let $P(X_i = x_i \mid \theta)$ be that of $X_i = x_i$. Under the assumption of local independence, the latent trait model is given by

$$P(\boldsymbol{X} = \boldsymbol{x} \mid \theta) = \prod_{i=1}^{I} P(X_i = x_i \mid \theta),$$

where $P(X_i = x_i \mid \theta)$, $i = 1, 2, \ldots, I$, are real-valued functions of $\theta$. Let $\varphi(\theta)$ be the standard normal density function of latent trait $\theta \in (-\infty, +\infty)$, i.e.,

$$\varphi(\theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\theta^2}{2}\right).$$

Then, we have

$$P(\boldsymbol{X} = \boldsymbol{x}) = \int_{-\infty}^{+\infty} \varphi(\theta) \prod_{i=1}^{I} P(X_i = x_i \mid \theta)\, d\theta. \qquad (1.6)$$

Comparing (1.2) and (1.6), model (1.6) can be approximated by a latent class model. Let

$$-\infty = \theta_{(0)} < \theta_{(1)} < \theta_{(2)} < \cdots < \theta_{(A-1)} < \theta_{(A)} = +\infty, \qquad (1.7)$$

and let

$$v_a = \int_{\theta_{(a-1)}}^{\theta_{(a)}} \varphi(\theta)\, d\theta, \quad a = 1, 2, \ldots, A. \qquad (1.8)$$

Then, (1.6) can be approximated as follows:

$$P(\boldsymbol{X} = \boldsymbol{x}) \approx \sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i \mid \theta_a), \qquad (1.9)$$

where we set

$$\theta_{(a-1)} < \theta_a < \theta_{(a)}, \quad a = 1, 2, \ldots, A.$$

For binary manifest variables, the positive response probabilities $P_i(\theta)\ (= P(X_i = 1 \mid \theta))$ are non-decreasing functions. For example, the two-parameter logistic model [12] is given by

$$P_i(\theta) = \frac{1}{1 + \exp(-D a_i(\theta - d_i))} = \frac{\exp(D a_i(\theta - d_i))}{1 + \exp(D a_i(\theta - d_i))}, \quad i = 1, 2, \ldots, I, \qquad (1.10)$$

where $a_i$ and $d_i$ are the discriminant and difficulty parameters, respectively, for test item $i$, $i = 1, 2, \ldots, I$, and $D = 1.7$. This model is an extension of the Rasch model (1960) and is popularly used in item response theory. In general, the above functions are referred to as item characteristic functions. The positive response probabilities $P_i(\theta)$ are usually continuous functions of the latent trait $\theta$ (Fig. 1.1).

Fig. 1.1 Two-parameter logistic models (1.10): item characteristic curves for several values of $a$ and for $d = -1, -0.5, 0, 0.5, 1$
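The discretized approximation (1.7)–(1.9) combined with the two-parameter logistic model (1.10) can be sketched in a few lines of Python (an added illustration; the item parameters and cut points are hypothetical, and the representative values $\theta_a$ are taken as the within-interval medians of the standard normal distribution, one admissible choice satisfying $\theta_{(a-1)} < \theta_a < \theta_{(a)}$).

```python
import numpy as np
from scipy.stats import norm

D = 1.7  # scaling constant used in Eq. (1.10)

def icc_2pl(theta, a, d):
    """Two-parameter logistic item characteristic function, Eq. (1.10)."""
    return 1.0 / (1.0 + np.exp(-D * a * (theta - d)))

def discretize_trait(cutpoints):
    """Class proportions v_a from N(0,1) cut points, Eq. (1.8), with
    representative trait values taken as the medians of the intervals."""
    edges = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    v = np.diff(norm.cdf(edges))                          # Eq. (1.8)
    reps = [norm.ppf((norm.cdf(lo) + norm.cdf(hi)) / 2)   # interval median
            for lo, hi in zip(edges[:-1], edges[1:])]
    return v, np.array(reps)

# Hypothetical items: discriminant a_i and difficulty d_i (illustrative values)
a = np.array([1.0, 1.5, 0.8]); d = np.array([-0.5, 0.0, 0.5])
v, theta_a = discretize_trait(cutpoints=[-1.0, 0.0, 1.0])  # A = 4 latent classes
# Approximate P(X = (1, 1, 1)) via Eq. (1.9)
p_111 = np.sum(v * np.prod(icc_2pl(theta_a[:, None], a, d), axis=1))
```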

Remark 1.3 Let $Y_i$ be latent traits for answering items $i$, $i = 1, 2, \ldots, I$, and let $\theta$ be a common latent trait for answering all the items. It is assumed that the variables $Y_i$ and $\theta$ are jointly distributed according to a bivariate normal distribution with mean vector $(0, 0)$ and variance–covariance matrix

$$\begin{pmatrix} 1 & \rho_i \\ \rho_i & 1 \end{pmatrix}.$$

From this, the conditional distributions of $Y_i$ given $\theta$ are normal, $N\bigl(\rho_i\theta,\, 1 - \rho_i^2\bigr)$. Let $\eta_i$ be the threshold of latent ability $Y_i$ for successfully answering item $i$, $i = 1, 2, \ldots, I$. The probabilities that an individual with latent trait $\theta$ gives correct answers to items $i$, $i = 1, 2, \ldots, I$, are computed as follows:

$$P_i(\theta) = P(Y_i > \eta_i \mid \theta) = \Phi\left( \frac{\rho_i}{\sqrt{1 - \rho_i^2}}\left( \theta - \frac{\eta_i}{\rho_i} \right) \right), \quad i = 1, 2, \ldots, I,$$

where $\Phi(x)$ is the standard normal distribution function. Setting

$$a_i = \frac{\rho_i}{\sqrt{1 - \rho_i^2}}, \quad d_i = \frac{\eta_i}{\rho_i}, \quad D = 1.7,$$

we have

$$\Phi\bigl(a_i(\theta - d_i)\bigr) \approx \frac{\exp\bigl(D a_i(\theta - d_i)\bigr)}{1 + \exp\bigl(D a_i(\theta - d_i)\bigr)}.$$

The treatment of the logistic models in both theoretical and practical discussion is easier than that of the normal distribution model $\Phi\bigl(a_i(\theta - d_i)\bigr)$, so the logistic models are used in item response models. The graded response model [18, 19] is an extension of this model.
In order to improve the Guttman scale model, the latent distance model [9] was proposed by using step functions. Let $\theta$ be a latent trait on the interval $[0, 1]$ and let the thresholds $\theta_{(i)}$, $i = 1, 2, \ldots, I$, be given as

$$0 = \theta_{(0)} < \theta_{(1)} < \theta_{(2)} < \cdots < \theta_{(I)} < 1 = \theta_{(I+1)}. \qquad (1.11)$$

Then, the item characteristic functions are defined by

$$P_i(\theta) = \begin{cases} \pi_{iL}, & \theta < \theta_{(i)}, \\ \pi_{iH}, & \theta \ge \theta_{(i)}, \end{cases} \quad i = 1, 2, \ldots, I, \qquad (1.12)$$

where $0 \le \pi_{iL} < \pi_{iH} \le 1$. The probabilities $\pi_{iL}$ represent guessing errors and $1 - \pi_{iH}$ forgetting errors. In the above model, the thresholds (1.11) also express the difficulties of the items. If we set

$$v_a = \theta_{(a)} - \theta_{(a-1)}, \quad a = 1, 2, \ldots, I + 1, \qquad (1.13)$$

the latent distance model is a restricted version of the latent class model. As in the two-parameter logistic model, the graphs of $P_i(\theta)$ are illustrated in Fig. 1.2.

Fig. 1.2 Latent distance model (1.12), with thresholds $\theta_{(i)} = 0.1, 0.3, 0.4, 0.7, 0.9$
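Because (1.13) turns the latent distance model into a restricted latent class model, the correspondence can be written out directly; the following added sketch (using the thresholds of Fig. 1.2 and hypothetical guessing/forgetting rates) builds the implied class proportions and item response probabilities.

```python
import numpy as np

def latent_distance_probs(thresholds, pi_L, pi_H):
    """Latent distance model (1.11)-(1.13) written as a restricted latent class
    model: returns class proportions v_a and item response probabilities
    P(X_i = 1 | a) for the A = I + 1 ordered classes induced by the thresholds."""
    t = np.concatenate(([0.0], np.asarray(thresholds, dtype=float), [1.0]))
    v = np.diff(t)                         # Eq. (1.13)
    theta_a = (t[:-1] + t[1:]) / 2         # one representative value inside each interval
    pi_L, pi_H = np.asarray(pi_L, dtype=float), np.asarray(pi_H, dtype=float)
    # Eq. (1.12): step item characteristic functions evaluated at theta_a
    P = np.where(theta_a[:, None] < np.asarray(thresholds), pi_L, pi_H)
    return v, P

# Thresholds as in Fig. 1.2; error rates are illustrative values only
v, P = latent_distance_probs(thresholds=[0.1, 0.3, 0.4, 0.7, 0.9],
                             pi_L=[0.10, 0.20, 0.10, 0.15, 0.10],
                             pi_H=[0.90, 0.85, 0.95, 0.90, 0.80])
```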

1.4 Latent Profile Model

Assuming that a population under study is decomposed into $A$ subpopulations, for continuous manifest variables $X_i$, $i = 1, 2, \ldots, I$, let $f(\boldsymbol{x} \mid a)$ and $f_i(x_i \mid a)$ be the conditional density functions of $\boldsymbol{X} = (X_1, X_2, \ldots, X_I)^T$ and $X_i$ in latent class (subpopulation) $a\ (= 1, 2, \ldots, A)$, respectively, and let $v_a$ be the proportions of the latent classes $a\ (= 1, 2, \ldots, A)$. Then, under the assumption of local independence, it follows that

$$f(\boldsymbol{x} \mid a) = \prod_{i=1}^{I} f_i(x_i \mid a). \qquad (1.14)$$

Hence, the joint density function of $\boldsymbol{X}$, $f(\boldsymbol{x})$, is given by

$$f(\boldsymbol{x}) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} f_i(x_i \mid a), \qquad (1.15)$$

where

$$\sum_{a=1}^{A} v_a = 1.$$

If the conditional density functions $f_i(x_i \mid a)$ are normal with means $\mu_{ai}$ and variances $\psi_i^2$, $i = 1, 2, \ldots, I$, then, from (1.14), we have

$$f(\boldsymbol{x} \mid a) = \prod_{i=1}^{I} \frac{1}{\sqrt{2\pi\psi_i^2}} \exp\left( -\frac{(x_i - \mu_{ai})^2}{2\psi_i^2} \right). \qquad (1.16)$$

This model can be expressed with linear equations; in latent class $a$, the following equations hold:

$$X_i = \mu_{ai} + e_i, \quad i = 1, 2, \ldots, I, \qquad (1.17)$$

where $e_i$ are error terms that are independently distributed according to normal distributions with means $0$ and variances $\psi_i^2$, $i = 1, 2, \ldots, I$. From the above equations, the variances and covariances of the manifest variables, $\sigma_i^2 = \mathrm{Var}(X_i)$ and $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$, are described as follows:

$$\sigma_i^2 = \sum_{a=1}^{A} v_a (\mu_{ai} - \mu_i)^2 + \psi_i^2 \;\; (i = 1, 2, \ldots, I), \qquad \sigma_{ij} = \sum_{a=1}^{A} v_a (\mu_{ai} - \mu_i)(\mu_{aj} - \mu_j) \;\; (i \ne j), \qquad (1.18)$$

where

$$E(X_i) = \mu_i = \sum_{a=1}^{A} v_a \mu_{ai}, \quad i = 1, 2, \ldots, I. \qquad (1.19)$$

Figure 1.3 illustrates the mixed distribution of the normal distributions $N(-2, 1)$ and $N(5, 1)$, where the mixing proportions are 0.6 and 0.4, respectively. The number of parameters in model (1.16) is $I(A + 2) - 1$ and the number of the above manifest moment equations is $\frac{1}{2}I(I + 3)$. From this, in order to identify the model, the following inequality has to hold:

$$I(A + 2) - 1 < \frac{1}{2}I(I + 3). \qquad (1.20)$$

The latent profile model can be viewed as a factor analysis model with categorical factors.
Fig. 1.3 The density function of 0.6N (−2, 1) + 0.4N (5, 1)
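As a numerical companion to Fig. 1.3 (an added sketch, not part of the original text), the latent profile density (1.15)–(1.16) for a single manifest variable is simply a finite mixture of normal densities; the example below reproduces the two-class mixture 0.6 N(−2, 1) + 0.4 N(5, 1).

```python
import numpy as np
from scipy.stats import norm

def latent_profile_density(x, v, mu, psi2):
    """Mixture density f(x) of Eq. (1.15) with normal components, Eq. (1.16),
    for a single manifest variable.
    v: class proportions, mu: class means mu_a, psi2: common variance psi^2."""
    x = np.atleast_1d(x)[:, None]
    comp = norm.pdf(x, loc=np.asarray(mu, dtype=float), scale=np.sqrt(psi2))
    return comp @ np.asarray(v, dtype=float)

# The two-class example of Fig. 1.3: 0.6 N(-2, 1) + 0.4 N(5, 1)
xs = np.linspace(-5, 8, 261)
f = latent_profile_density(xs, v=[0.6, 0.4], mu=[-2.0, 5.0], psi2=1.0)
# Mean implied by Eq. (1.19): 0.6 * (-2) + 0.4 * 5 = 0.8
```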

1.5 Factor Analysis Model

Let $X_i$ be manifest variables, $\xi_j$ latent variables (common factors), and $\varepsilon_i$ unique factors peculiar to $X_i$; and let $\lambda_{ij}$ be factor loadings, that is, the weights of the factors $\xi_j$ in explaining $X_i$. Then, the factor analysis model is given as follows:

$$X_i = \sum_{j=1}^{m} \lambda_{ij}\xi_j + \varepsilon_i, \quad i = 1, 2, \ldots, I,$$

where

$$E(X_i) = E(\varepsilon_i) = 0, \quad i = 1, 2, \ldots, I; \qquad E(\xi_j) = 0, \quad \mathrm{Var}(\xi_j) = 1, \quad j = 1, 2, \ldots, m;$$
$$\mathrm{Var}(\varepsilon_i) = \psi_i^2 > 0, \quad i = 1, 2, \ldots, I; \qquad \mathrm{Cov}(\varepsilon_k, \varepsilon_l) = 0, \quad k \ne l.$$

Assuming that the factors $\xi_j$, $j = 1, 2, \ldots, m$, and $\varepsilon_i$, $i = 1, 2, \ldots, I$, are normally distributed, the conditional density functions of the manifest variables $X_i$, $i = 1, 2, \ldots, I$, given the factors $\xi_j$, $j = 1, 2, \ldots, m$, are described by

$$f_i(x_i \mid \boldsymbol{\xi}) = \frac{1}{\sqrt{2\pi\psi_i^2}} \exp\left( -\frac{\bigl(x_i - \sum_{j=1}^{m}\lambda_{ij}\xi_j\bigr)^2}{2\psi_i^2} \right).$$

The conditional normal density function of $\boldsymbol{X}$ given $\boldsymbol{\xi}$ is expressed as

$$f(\boldsymbol{x} \mid \boldsymbol{\xi}) = \prod_{i=1}^{I} \frac{1}{\sqrt{2\pi\psi_i^2}} \exp\left( -\frac{\bigl(x_i - \sum_{j=1}^{m}\lambda_{ij}\xi_j\bigr)^2}{2\psi_i^2} \right). \qquad (1.21)$$

Comparing (1.16) and (1.21), the latent profile model can be viewed as a factor analysis model with categorical factors.
In connection with latent class analysis, the latent class factor analysis model [14] is briefly reviewed here. Let $\xi_j$, $j = 1, 2, \ldots, m$, be binary latent variables; let $X_i$, $i = 1, 2, \ldots, I$, be binary manifest variables; and let $f(\boldsymbol{x} \mid \boldsymbol{\xi})$ be the conditional probability function of $\boldsymbol{X} = (X_1, X_2, \ldots, X_I)^T$ given $\boldsymbol{\xi} = (\xi_1, \xi_2, \ldots, \xi_m)^T$. Assuming there are no interactions between the latent variables, the model is expressed as follows:

$$f(\boldsymbol{x} \mid \boldsymbol{\xi}) = \prod_{i=1}^{I} \frac{\exp\bigl(\alpha_i + x_i\sum_{j=1}^{m}\lambda_{ij}\xi_j\bigr)}{1 + \exp\bigl(\alpha_i + x_i\sum_{j=1}^{m}\lambda_{ij}\xi_j\bigr)}. \qquad (1.22)$$

As in a factor analysis model, the predictor in the above model is $\sum_{j=1}^{m}\lambda_{ij}\xi_j$, and the regression coefficients $\lambda_{ij}$ are log odds with respect to the binary variables $X_i$ and $\xi_j$; the parameters are interpreted as the effects of the latent variables $\xi_j$ on the manifest variables $X_i$, that is, the odds of a positive response are changed by the factor $\exp(\lambda_{ij})$. The latent class factor analysis model (1.22) is similar to the factor analysis model (1.21).

1.6 Latent Structure Models in a Generalized Linear Model Framework

Generalized linear models (GLMs) are widely applied in regression analyses for both continuous and categorical response variables [15, 16]. As in the above discussion, let $f_i(x_i \mid \boldsymbol{\xi})$ be the conditional density or probability function of manifest variable $X_i$ given the latent variable vector $\boldsymbol{\xi}$. Then, in GLMs, the function is assumed to belong to the following exponential family of distributions:

$$f_i(x_i \mid \boldsymbol{\xi}) = \exp\left( \frac{x_i\theta_i - b_i(\theta_i)}{a_i(\varphi_i)} + c_i(x_i, \varphi_i) \right), \quad i = 1, 2, \ldots, I, \qquad (1.23)$$

where $\theta_i$ and $\varphi_i$ are parameters and $a_i(\varphi_i)\ (> 0)$, $b_i(\theta_i)$, and $c_i(x_i, \varphi_i)$ are specific functions for the response manifest variables $X_i$, $i = 1, 2, \ldots, I$. This assumption is referred to as the random component. If $X_i$ is a Bernoulli trial with $P(X_i = 1) = \pi_i$, then the conditional probability function is

$$f_i(x_i \mid \boldsymbol{\xi}) = \pi_i^{x_i}(1 - \pi_i)^{1 - x_i} = \exp\left( x_i\log\frac{\pi_i}{1 - \pi_i} + \log(1 - \pi_i) \right).$$

Corresponding to (1.23), we have

$$\theta_i = \log\frac{\pi_i}{1 - \pi_i}, \quad a_i(\varphi_i) = 1, \quad b_i(\theta_i) = -\log(1 - \pi_i), \quad c_i(x_i, \varphi_i) = 0. \qquad (1.24)$$

In this formulation, for binary manifest variables, the latent class model (1.1) can be expressed as follows:

$$P(\boldsymbol{X} = \boldsymbol{x} \mid a) = \prod_{i=1}^{I} \exp\left( x_i\log\frac{\pi_{ai}}{1 - \pi_{ai}} + \log(1 - \pi_{ai}) \right) = \exp\left( \sum_{i=1}^{I} x_i\theta_{ai} + \sum_{i=1}^{I}\log(1 - \pi_{ai}) \right) = \exp\left( \boldsymbol{x}^T\boldsymbol{\theta}_a + \sum_{i=1}^{I}\log(1 - \pi_{ai}) \right), \qquad (1.25)$$

where

$$\boldsymbol{\theta}_a^T = (\theta_{a1}, \theta_{a2}, \ldots, \theta_{aI}), \quad \theta_{ai} = \log\frac{\pi_{ai}}{1 - \pi_{ai}}, \quad i = 1, 2, \ldots, I.$$

For a normal variable $X_i$ with mean $\mu_i$ and variance $\psi_i^2$, the conditional density function is

$$f_i(x_i \mid \boldsymbol{\xi}) = \frac{1}{\sqrt{2\pi\psi_i^2}} \exp\left( \frac{x_i\mu_i - \frac{1}{2}\mu_i^2}{\psi_i^2} - \frac{x_i^2}{2\psi_i^2} \right),$$

where

$$\theta_i = \mu_i, \quad a_i(\varphi_i) = \psi_i^2, \quad b_i(\theta_i) = \frac{1}{2}\mu_i^2, \quad c_i(x_i, \varphi_i) = -\frac{x_i^2}{2\psi_i^2} - \frac{1}{2}\log\bigl(2\pi\psi_i^2\bigr).$$

In the factor analysis model (1.21), the random component is reformulated as follows:

$$f(\boldsymbol{x} \mid \boldsymbol{\xi}) = \prod_{i=1}^{I} \exp\left( \frac{x_i\theta_i - \frac{1}{2}\theta_i^2}{\psi_i^2} + c_i\bigl(x_i, \psi_i^2\bigr) \right) = \exp\left( \sum_{i=1}^{I} \frac{x_i\theta_i - \frac{1}{2}\theta_i^2}{\psi_i^2} + \sum_{i=1}^{I} c_i\bigl(x_i, \psi_i^2\bigr) \right). \qquad (1.26)$$

Let us set

$$\Psi = \mathrm{diag}\bigl(\psi_1^2, \psi_2^2, \ldots, \psi_I^2\bigr), \quad \boldsymbol{\theta}^T = (\theta_1, \theta_2, \ldots, \theta_I), \quad \boldsymbol{x}^T = (x_1, x_2, \ldots, x_I).$$

Then, (1.26) is re-expressed as follows:

$$f(\boldsymbol{x} \mid \boldsymbol{\xi}) = \exp\left( \boldsymbol{x}^T\Psi^{-1}\boldsymbol{\theta} - \frac{1}{2}\boldsymbol{\theta}^T\Psi^{-1}\boldsymbol{\theta} - \frac{1}{2}\boldsymbol{x}^T\Psi^{-1}\boldsymbol{x} \right). \qquad (1.27)$$

As shown in (1.25) and (1.27), in latent structure models with multivariate response variables, the random components can be described as follows. Let us set

$$\Phi = \mathrm{diag}\bigl(a_1(\varphi_1), a_2(\varphi_2), \ldots, a_I(\varphi_I)\bigr), \quad \boldsymbol{\theta}^T = (\theta_1, \theta_2, \ldots, \theta_I), \quad \boldsymbol{x}^T = (x_1, x_2, \ldots, x_I), \quad \boldsymbol{1} = (1, 1, \ldots, 1)^T.$$

Then, the random component can be expressed as

$$f(\boldsymbol{x} \mid \boldsymbol{\xi}) = \prod_{i=1}^{I} f_i(x_i \mid \boldsymbol{\xi}) = \exp\left( \sum_{i=1}^{I} \frac{x_i\theta_i - b_i(\theta_i)}{a_i(\varphi_i)} + \sum_{i=1}^{I} c_i(x_i, \varphi_i) \right) = \exp\left( \boldsymbol{x}^T\Phi^{-1}\boldsymbol{\theta} - \boldsymbol{1}^T\Phi^{-1}\boldsymbol{b}(\boldsymbol{\theta}) + \sum_{i=1}^{I} c_i(x_i, \varphi_i) \right), \qquad (1.28)$$

where $\boldsymbol{b}(\boldsymbol{\theta}) = \bigl(b_1(\theta_1), b_2(\theta_2), \ldots, b_I(\theta_I)\bigr)^T$.

From the above discussion, by using appropriate linear predictors and link functions, latent structure models can be expressed as GLMs; for example, in the factor analysis model (1.21) and the latent class factor analysis model (1.22), the linear predictors are $\sum_{j=1}^{m}\lambda_{ij}\xi_j$ and the link functions are identity ones, so that $\theta_i = \sum_{j=1}^{m}\lambda_{ij}\xi_j$, $i = 1, 2, \ldots, I$. Hence, the effects of the latent variables on the manifest variables can be measured with the entropy coefficient of determination (ECD) [4]. Let $f(\boldsymbol{x})$ and $g(\boldsymbol{\xi})$ be the marginal density or probability functions of the manifest variable vector $\boldsymbol{X}$ and the latent variable vector $\boldsymbol{\xi}$, respectively. Then, in the latent structure model (1.28), we have

$$\mathrm{KL}(\boldsymbol{X}, \boldsymbol{\xi}) = \int f(\boldsymbol{x} \mid \boldsymbol{\xi})g(\boldsymbol{\xi})\log\frac{f(\boldsymbol{x} \mid \boldsymbol{\xi})}{f(\boldsymbol{x})}\,d\boldsymbol{x}\,d\boldsymbol{\xi} + \int f(\boldsymbol{x})g(\boldsymbol{\xi})\log\frac{f(\boldsymbol{x})}{f(\boldsymbol{x} \mid \boldsymbol{\xi})}\,d\boldsymbol{x}\,d\boldsymbol{\xi} = \int \bigl(f(\boldsymbol{x} \mid \boldsymbol{\xi}) - f(\boldsymbol{x})\bigr)g(\boldsymbol{\xi})\log f(\boldsymbol{x} \mid \boldsymbol{\xi})\,d\boldsymbol{x}\,d\boldsymbol{\xi} = \mathrm{tr}\bigl(\Phi^{-1}\mathrm{Cov}(\boldsymbol{\theta}, \boldsymbol{X})\bigr). \qquad (1.29)$$

If the manifest and latent variables are discrete (categorical), the related integrals in (1.29) are replaced by appropriate summations. From the above KL information, the entropy coefficient of determination (ECD) is given by

$$\mathrm{ECD}(\boldsymbol{X}, \boldsymbol{\xi}) = \frac{\mathrm{KL}(\boldsymbol{X}, \boldsymbol{\xi})}{\mathrm{KL}(\boldsymbol{X}, \boldsymbol{\xi}) + 1} = \frac{\mathrm{tr}\bigl(\Phi^{-1}\mathrm{Cov}(\boldsymbol{\theta}, \boldsymbol{X})\bigr)}{\mathrm{tr}\bigl(\Phi^{-1}\mathrm{Cov}(\boldsymbol{\theta}, \boldsymbol{X})\bigr) + 1}. \qquad (1.30)$$

The ECD expresses the explanatory or predictive power of a GLM. Applying ECD to model (1.21), we have

$$\mathrm{ECD}(\boldsymbol{X}, \boldsymbol{\xi}) = \frac{\sum_{i=1}^{I}\dfrac{\sum_{j=1}^{m}\lambda_{ij}^2}{\psi_i^2}}{\sum_{i=1}^{I}\dfrac{\sum_{j=1}^{m}\lambda_{ij}^2}{\psi_i^2} + 1} = \frac{\sum_{i=1}^{I}\dfrac{R_i^2}{1 - R_i^2}}{\sum_{i=1}^{I}\dfrac{R_i^2}{1 - R_i^2} + 1},$$

where $R_i^2$ are the coefficients of determination of the predictors $\theta_i = \sum_{j=1}^{m}\lambda_{ij}\xi_j$ for the manifest variables $X_i$, $i = 1, 2, \ldots, I$ [5, 7]. Similarly, from model (1.22), we also get

$$\mathrm{ECD}(\boldsymbol{X}, \boldsymbol{\xi}) = \frac{\sum_{i=1}^{I}\sum_{j=1}^{m}\lambda_{ij}\,\mathrm{Cov}(X_i, \xi_j)}{\sum_{i=1}^{I}\sum_{j=1}^{m}\lambda_{ij}\,\mathrm{Cov}(X_i, \xi_j) + 1}.$$

Discussions of ECD in factor analysis and latent trait analysis are given in Eshima et al. [5] and Eshima [6]. In this book, ECD is used for measuring the predictive power of the latent variables for the manifest variables, and it is also applied to path analysis in latent class models.
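For orientation, here is a minimal added sketch of the ECD computation for the factor analysis model (1.21), using the closed form above; the loadings and unique variances are hypothetical, and the factors are assumed uncorrelated with unit variances.

```python
import numpy as np

def ecd_factor_model(loadings, unique_var):
    """ECD for the factor analysis model (1.21), assuming uncorrelated factors
    with unit variances, so that Var(theta_i) = sum_j lambda_ij^2.

    loadings:   (I, m) array of factor loadings lambda_ij
    unique_var: (I,) array of unique variances psi_i^2
    """
    loadings = np.asarray(loadings, dtype=float)
    unique_var = np.asarray(unique_var, dtype=float)
    kl_terms = (loadings**2).sum(axis=1) / unique_var   # sum_j lambda_ij^2 / psi_i^2
    kl = kl_terms.sum()                                  # KL(X, xi), Eq. (1.29)
    return kl / (kl + 1.0)                               # Eq. (1.30)

# Hypothetical two-factor model for four manifest variables
ecd = ecd_factor_model(loadings=[[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.3, 0.6]],
                       unique_var=[0.35, 0.46, 0.18, 0.55])
```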

Remark 1.4 For the basic latent structure models treated in this chapter, i.e., those of the form (1.23), since

$$\mathrm{KL}(X_i, \boldsymbol{\xi}) = \frac{\mathrm{Cov}(\theta_i, X_i)}{a_i(\varphi_i)}, \quad i = 1, 2, \ldots, I,$$

from (1.29) we have

$$\mathrm{KL}(\boldsymbol{X}, \boldsymbol{\xi}) = \sum_{i=1}^{I}\mathrm{KL}(X_i, \boldsymbol{\xi}) = \sum_{i=1}^{I}\frac{\mathrm{Cov}(\theta_i, X_i)}{a_i(\varphi_i)}.$$

In model (1.21),

$$\mathrm{ECD}(X_i, \boldsymbol{\xi}) = \frac{\mathrm{KL}(X_i, \boldsymbol{\xi})}{\mathrm{KL}(X_i, \boldsymbol{\xi}) + 1} = R_i^2, \quad i = 1, 2, \ldots, I.$$

1.7 The EM Algorithm and Latent Structure Models

The expectation–maximization (EM) algorithm [3] for maximum likelihood (ML) estimation from incomplete data is reviewed here in the latent structure model framework. The algorithm is a powerful tool for the ML estimation of latent structure models. In latent structure analysis, $(\boldsymbol{X}, \boldsymbol{\xi})$ and $\boldsymbol{X}$ are viewed as the complete and incomplete data, respectively. In the latent structure model with parameter vector $\boldsymbol{\phi}$, the conditional density or probability function of $\boldsymbol{X}$ given $\boldsymbol{\xi}$, the marginal density or probability function of $\boldsymbol{X}$, and that of $\boldsymbol{\xi}$ are denoted by $f(\boldsymbol{x} \mid \boldsymbol{\xi})_{\boldsymbol{\phi}}$, $f(\boldsymbol{x})_{\boldsymbol{\phi}}$, and $g(\boldsymbol{\xi})_{\boldsymbol{\phi}}$, respectively. Let $f(\boldsymbol{x}, \boldsymbol{\xi})_{\boldsymbol{\phi}}$ be the joint density function of the complete data $(\boldsymbol{x}, \boldsymbol{\xi})$; then

$$f(\boldsymbol{x}, \boldsymbol{\xi})_{\boldsymbol{\phi}} = f(\boldsymbol{x} \mid \boldsymbol{\xi})_{\boldsymbol{\phi}}\,g(\boldsymbol{\xi})_{\boldsymbol{\phi}},$$

and the log likelihood function of $\boldsymbol{\phi}$ based on the incomplete data $\boldsymbol{X} = \boldsymbol{x}$ is expressed as

$$l(\boldsymbol{\phi} \mid \boldsymbol{x}) = \log f(\boldsymbol{x})_{\boldsymbol{\phi}}.$$

Let

$$Q\bigl(\boldsymbol{\phi}' \mid \boldsymbol{\phi}\bigr) = E\bigl[\log f(\boldsymbol{x}, \boldsymbol{\xi})_{\boldsymbol{\phi}'} \mid \boldsymbol{x}, \boldsymbol{\phi}\bigr]$$

be the conditional expectation of $\log f(\boldsymbol{x}, \boldsymbol{\xi})_{\boldsymbol{\phi}'}$ given $\boldsymbol{X} = \boldsymbol{x}$ and parameter $\boldsymbol{\phi}$. The above conditional expectation is obtained by integrating with respect to the latent variable vector $\boldsymbol{\xi}$. In order to obtain the ML estimate $\widehat{\boldsymbol{\phi}}$ of the parameters $\boldsymbol{\phi}$ in latent structure models, such that

$$l\bigl(\widehat{\boldsymbol{\phi}}\bigr) = \max_{\boldsymbol{\phi}}\log f(\boldsymbol{x})_{\boldsymbol{\phi}},$$

the EM algorithm consists of the following two steps.

(i) Expectation step (E-step)

For the estimate $^{s+1}\boldsymbol{\phi}$ at the $(s+1)$th step, compute the conditional expectation of $\log f(\boldsymbol{x}, \boldsymbol{\xi})_{\boldsymbol{\phi}}$ given the incomplete data $\boldsymbol{X} = \boldsymbol{x}$ and parameter $^{s}\boldsymbol{\phi}$:

$$Q\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr) = E\bigl[\log f(\boldsymbol{x}, \boldsymbol{\xi})_{\boldsymbol{\phi}} \mid \boldsymbol{x}, {}^{s}\boldsymbol{\phi}\bigr]. \qquad (1.31)$$

(ii) Maximization step (M-step)

Obtain $^{s+1}\boldsymbol{\phi}$ such that

$$Q\bigl({}^{s+1}\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr) = \max_{\boldsymbol{\phi}} Q\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr). \qquad (1.32)$$

By using the above iterative procedure, the ML estimates of latent structure models can be obtained. If there exists a sufficient statistic $\boldsymbol{t}(\boldsymbol{x}, \boldsymbol{\xi})$ for the parameter vector $\boldsymbol{\phi}$ such that

$$f(\boldsymbol{x}, \boldsymbol{\xi} \mid \boldsymbol{\phi}) = b(\boldsymbol{x}, \boldsymbol{\xi})\,\frac{\exp\bigl(\boldsymbol{\phi}\,\boldsymbol{t}(\boldsymbol{x}, \boldsymbol{\xi})^T\bigr)}{s(\boldsymbol{\phi})}, \qquad (1.33)$$

the EM algorithm is simplified as follows:

(i) E-step

Compute

$$^{s+1}\boldsymbol{t} = E\bigl[\boldsymbol{t}(\boldsymbol{x}, \boldsymbol{\xi}) \mid \boldsymbol{x}, {}^{s}\boldsymbol{\phi}\bigr]. \qquad (1.34)$$

(ii) M-step

Obtain $^{s+1}\boldsymbol{\phi}$ from the following equation:

$$^{s+1}\boldsymbol{t} = E\bigl[\boldsymbol{t}(\boldsymbol{X}, \boldsymbol{\xi}) \mid \boldsymbol{\phi}\bigr]. \qquad (1.35)$$
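Schematically, the two steps can be organized as an iteration like the following (an added sketch under the assumption that the model-specific E- and M-step functions are supplied by the user; it is not a specific algorithm from the text).

```python
import numpy as np

def em(phi0, e_step, m_step, max_iter=500, tol=1e-8):
    """Schematic EM iteration in the sense of (1.31)-(1.35).

    phi0   : initial parameter estimate (a NumPy array)
    e_step : function phi -> expected complete-data sufficient statistics,
             computed given the observed data and the current parameter
    m_step : function statistics -> parameter value solving the M-step equations
    """
    phi = np.asarray(phi0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(phi)                                # E-step, cf. (1.31)/(1.34)
        new_phi = np.asarray(m_step(stats), dtype=float)   # M-step, cf. (1.32)/(1.35)
        if np.max(np.abs(new_phi - phi)) < tol:
            return new_phi
        phi = new_phi
    return phi
```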

1.8 Discussion

Basic latent structure models, i.e., the latent class model, latent trait model, latent profile model, and factor analysis model, have been overviewed in this chapter. These models are based on what is called the assumption of local independence, that is, the manifest variables are statistically independent given the latent variables. These models can be expressed in a GLM framework, and a multivariate formulation of latent structure models is also given in (1.28). Studying latent structure models through a GLM framework is important for grasping the models in a general way and for applying them in various research domains, and it may lead to the construction of new latent structure models; it is expected that new latent structure models will be designed in such applications. The EM algorithm is a useful tool for performing the ML estimation of latent structure models, and a brief review of the method has also been given in this chapter. In the following chapters, the EM algorithm is used to estimate the model parameters.

References

1. Bartholomew, D. J. (1987). Latent variable models and factor analysis. Charles Griffin.
2. Dayton, C. M., & Macready, G. B. (1976). A probabilistic model for validation of behavioral hierarchies. Psychometrika, 41, 190–204.
3. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
4. Eshima, N., & Tabata, M. (2010). Entropy coefficient of determination for generalized linear models. Computational Statistics and Data Analysis, 54, 1381–1389.
5. Eshima, N., Tabata, M., & Borroni, C. G. (2018). An entropy-based approach for measuring factor contributions in factor analysis models. Entropy, 20, 634.
6. Eshima, N. (2020). Statistical data analysis and entropy. Springer Nature.
7. Eshima, N., Borroni, C. G., Tabata, M., & Kurosawa, T. (2021). An entropy-based tool to help the interpretation of common-factor spaces in factor analysis. Entropy, 23, 140. https://doi.org/10.3390/e23020140
8. Everitt, B. S. (1984). An introduction to latent variable models. Chapman & Hall.
9. Gibson, W. A. (1959). Three multivariate models: Factor analysis, latent structure analysis, and latent profile analysis. Psychometrika, 24, 229–252.
10. Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Houghton Mifflin.
11. Lazarsfeld, P. F. (1959). Latent structure analysis. In S. Koch (Ed.), Psychology: A study of a science. New York: McGraw-Hill.
12. Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In S. A. Stouffer, L. Guttman, et al. (Eds.), Measurement and prediction: Studies in social psychology in World War II (Vol. 4). Princeton University Press.
13. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
14. Lord, F. M. (1952). A theory of test scores (Psychometric Monograph No. 7). Richmond, VA: Psychometric Corporation.
15. Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models: Bi-plots, and related graphical displays. Sociological Methodology, 31, 223–264.
16. McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
17. Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370–384.
18. Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.
19. Samejima, F. (1973). A method of estimating item characteristic functions using the maximum likelihood estimate of ability. Psychometrika, 38, 163–191.
20. Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional latent space. Psychometrika, 39, 111–121.
21. Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201–293.
22. Thurstone, L. L. (1935). The vectors of mind: Multiple-factor analysis for the isolation of primary traits. Chicago, IL: The University of Chicago Press.
Chapter 2
Latent Class Cluster Analysis

2.1 Introduction

In the behavioral sciences, there are many cases where we can assume that human behaviors and responses depend on latent concepts that are not directly observed. In such cases, it is important to elucidate the latent factors that affect and cause human behaviors and responses. For this objective, latent class analysis was proposed by Lazarsfeld [11] to explore discrete (categorical) latent factors that explain the relationships among responses to the items under study. The responses to the items are treated as manifest variables and the factors as latent variables. By using models with manifest and latent variables, it is possible to analyze the phenomena concerned. A general latent class model is expressed by (1.2) and (1.3), and for binary manifest variables the model is expressed by (1.4). These equations are called accounting equations. The parameters in models (1.2) and (1.4) are the manifest probabilities $P(\boldsymbol{X} = \boldsymbol{x})$ and the latent probabilities $v_a$, $P(X_i = x_i \mid a)$, and $\pi_{ai}$. Although the manifest probabilities can be estimated directly and consistently by the relative frequencies of the responses $\boldsymbol{X} = \boldsymbol{x}$, the latent probabilities cannot be estimated so easily. In the early stages of the development of latent class analysis, efforts were concentrated on parameter estimation by solving the accounting equations, for example, Green [10], Anderson [1], Gibson [6, 7], Madansky [13], and so on; however, these studies are now mainly of historical interest. As computer efficiency increased, methods for ML estimation came to be widely applied; however, it remained critical to obtain proper estimates of the latent probabilities, that is, estimates lying between 0 and 1. The usual ML estimation methods often produced improper solutions in real data analyses, where improper solutions are latent probability estimates that fall outside the interval [0, 1]. To overcome this problem, two approaches to ML estimation were proposed. One is the proportional fitting method of Goodman [8, 9], which is subsumed in the EM algorithm for ML estimation [3]. The second is a method in which a parameter transformation is employed so that the ML estimation can be carried out by direct use of the Newton–Raphson algorithm [5]. Although the convergence rate of


Goodman's method is slow, the method is simple and flexible to apply in real data analysis. In this sense, Goodman's contribution to the development of latent class analysis is great.
This chapter consists of seven sections, including this one. In Sect. 2.2, Goodman's method for the ML estimation of the latent class model is derived from the EM algorithm, and the properties of the algorithm are discussed. Section 2.3 applies the ML estimation algorithm to practical data analyses. In Sect. 2.4, the entropy coefficient of determination [4] is used to measure the goodness-of-fit of latent class models. Section 2.5 discusses two methods for comparing latent classes, and the methods are illustrated with numerical examples. In Sect. 2.6, a method for the ML estimation of the latent profile model is constructed according to the EM algorithm, and a numerical example is given to demonstrate the method. Finally, Sect. 2.7 provides a discussion of the latent class analysis presented in this chapter and a perspective on further studies.

2.2 The ML Estimation of Parameters in the Latent Class Model

In latent class model (1.2) with constraints (1.3), the complete data would be obtained as the responses to the $I$ items $\boldsymbol{X} = (X_1, X_2, \ldots, X_I)^T$ together with the latent classes $a$. Let $n(x_1, x_2, \ldots, x_I)$ and $n(x_1, x_2, \ldots, x_I, a)$ be the numbers of observations with response $\boldsymbol{x} = (x_1, x_2, \ldots, x_I)^T$ and those in latent class $a$, respectively, and let $\boldsymbol{\phi} = \bigl((v_a), (P(X_i = x_i \mid a))\bigr)$ be the parameter row vector. Concerning the numbers of observations $n(\boldsymbol{x})$ and $n(\boldsymbol{x}, a)$, it follows that

$$n(\boldsymbol{x}) = \sum_{a=1}^{A} n(\boldsymbol{x}, a).$$

Since the probability of $\boldsymbol{X} = \boldsymbol{x}$ is expressed by

$$P(\boldsymbol{X} = \boldsymbol{x}) = \sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i \mid a), \qquad (2.1)$$

the log likelihood function of parameter vector $\boldsymbol{\phi}$ based on the incomplete data $(n(\boldsymbol{x}))$ is given by

$$l(\boldsymbol{\phi} \mid (n(\boldsymbol{x}))) = \sum_{\boldsymbol{x}} n(\boldsymbol{x})\log\left( \sum_{a=1}^{A} v_a \prod_{i=1}^{I} P(X_i = x_i \mid a) \right), \qquad (2.2)$$

where the summation $\sum_{\boldsymbol{x}}$ in the above formula is made over all response patterns $\boldsymbol{x} = (x_1, x_2, \ldots, x_I)$. Since the direct maximization of the log likelihood function (2.2) with respect to $\boldsymbol{\phi}$ is very complicated, the EM algorithm is employed. Given the complete data, the statistics $\boldsymbol{t}(\boldsymbol{x}, a) = (n(\boldsymbol{x}, a))$ in (1.33) form a sufficient statistic vector for the parameters $\boldsymbol{\phi}$, and we have the log likelihood function of $\boldsymbol{\phi}$ as follows:

$$l(\boldsymbol{\phi} \mid (n(\boldsymbol{x}, a))) = \sum_{a=1}^{A}\sum_{\boldsymbol{x}} n(\boldsymbol{x}, a)\log\left( v_a \prod_{i=1}^{I} P(X_i = x_i \mid a) \right) = \sum_{a=1}^{A}\sum_{\boldsymbol{x}} n(\boldsymbol{x}, a)\left( \log v_a + \sum_{i=1}^{I}\log P(X_i = x_i \mid a) \right). \qquad (2.3)$$

In this sense, the sufficient statistic vector $\boldsymbol{t}(\boldsymbol{x}, a) = (n(\boldsymbol{x}, a))$ is viewed as the complete data. Let $^{s}\boldsymbol{\phi} = \bigl(({}^{s}v_a), ({}^{s}P(X_i = x_i \mid a))\bigr)$ be the estimate of $\boldsymbol{\phi}$ at the $s$th iteration of the EM algorithm. Then, from (1.34) and (1.35), the E- and M-steps are formulated as follows.
The EM algorithm for model (1.2) with constraints (1.3).

(i) E-step

Let

$$^{s+1}\boldsymbol{t}(\boldsymbol{x}, a) = \bigl({}^{s+1}n(\boldsymbol{x}, a)\bigr)$$

be the conditional expectation of the sufficient statistic given the incomplete (observed) data $\boldsymbol{x}$ and parameters $^{s}\boldsymbol{\phi}$. From (1.34), we have

$$^{s+1}n(\boldsymbol{x}, a) = n(\boldsymbol{x})\,\frac{{}^{s}v_a \prod_{i=1}^{I}{}^{s}P(X_i = x_i \mid a)}{\sum_{b=1}^{A}{}^{s}v_b \prod_{i=1}^{I}{}^{s}P(X_i = x_i \mid b)}. \qquad (2.4)$$

From the above results, we can obtain $^{s+1}\boldsymbol{t}(\boldsymbol{x}, a) = \bigl({}^{s+1}n(\boldsymbol{x}, a)\bigr)$.

(ii) M-step

Let $\Omega$ be the sample space of the manifest variable vector $\boldsymbol{X}$; let $\Omega(X_i = x_i)$ be the subspaces of $\Omega$ with $X_i = x_i$ fixed, $i = 1, 2, \ldots, I$; let $\Omega_i$ be the sample spaces of the manifest variables $X_i$, $i = 1, 2, \ldots, I$; and let $N = \sum_{\boldsymbol{x}} n(\boldsymbol{x})$. From (1.35), we have

$$^{s+1}n(\boldsymbol{x}, a) = N v_a \prod_{i=1}^{I} P(X_i = x_i \mid a), \quad \boldsymbol{x} \in \Omega, \; a = 1, 2, \ldots, A. \qquad (2.5)$$
The following constraints hold:

$$\sum_{\boldsymbol{x} \in \Omega}\prod_{i=1}^{I} P(X_i = x_i \mid a) = 1, \quad a = 1, 2, \ldots, A,$$

and

$$\sum_{\boldsymbol{x} \in \Omega(X_i = x_i)}\prod_{j=1}^{I} P(X_j = x_j \mid a) = P(X_i = x_i \mid a), \quad i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A,$$

where $\sum_{\boldsymbol{x} \in \Omega}$ is the summation over all response patterns $\boldsymbol{X} = \boldsymbol{x}$, and $\sum_{\boldsymbol{x} \in \Omega(X_i = x_i)}$ is the summation over all response patterns $\boldsymbol{X} = \boldsymbol{x}$ with $X_i = x_i$ fixed. Solving the equations in (2.5) with respect to the parameters $v_a$, $P(X_i = x_i \mid a)$, $i = 1, 2, \ldots, I$, under constraints (1.3), it follows that

$$^{s+1}v_a = \frac{1}{N}\sum_{\boldsymbol{x} \in \Omega}{}^{s+1}n(\boldsymbol{x}, a), \quad a = 1, 2, \ldots, A;$$
$$^{s+1}P(X_i = x_i \mid a) = \frac{1}{N\cdot{}^{s+1}v_a}\sum_{\boldsymbol{x} \in \Omega(X_i = x_i)}{}^{s+1}n(\boldsymbol{x}, a), \quad x_i \in \Omega_i, \; i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A. \qquad (2.6)$$
The above algorithm is the same as the proportional fitting method of Goodman [8, 9], and the ML estimates of the parameters, $\widehat{v}_a$ and $\widehat{P}(X_i = x_i \mid a)$, can be obtained as the convergence values of the above estimates, $^{\infty}v_a$ and $^{\infty}P(X_i = x_i \mid a)$. From (2.6), it is seen that for any integer $s$,

$$0 \le {}^{s}v_a \le 1, \quad a = 1, 2, \ldots, A; \qquad 0 \le {}^{s}P(X_i = x_i \mid a) \le 1, \quad x_i \in \Omega_i, \; i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A.$$

Thus, if the above algorithm converges, the estimates are proper, and they satisfy the following likelihood equations:

$$\frac{\partial}{\partial v_a}l(\boldsymbol{\phi} \mid (n(\boldsymbol{x}))) = 0, \quad a = 1, 2, \ldots, A; \qquad \frac{\partial}{\partial P(X_i = x_i \mid a)}l(\boldsymbol{\phi} \mid (n(\boldsymbol{x}))) = 0, \quad x_i \in \Omega_i, \; i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A.$$
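A minimal Python implementation of this proportional fitting (EM) procedure for binary items is sketched below (an added illustration, not part of the original text); it iterates the E-step (2.4) and the M-step (2.6) on data supplied as counts of response patterns.

```python
import numpy as np

def lca_em(patterns, counts, A, n_iter=2000, tol=1e-10, seed=0):
    """EM estimation of a binary latent class model, Eqs. (2.4) and (2.6).

    patterns : (R, I) array of distinct 0/1 response patterns
    counts   : (R,) array of observed frequencies n(x)
    A        : number of latent classes
    Returns (v, pi): class proportions (A,) and response probabilities (A, I).
    """
    rng = np.random.default_rng(seed)
    patterns = np.asarray(patterns, dtype=float)
    counts = np.asarray(counts, dtype=float)
    R, I = patterns.shape
    N = counts.sum()
    v = np.full(A, 1.0 / A)
    pi = rng.uniform(0.25, 0.75, size=(A, I))   # random start to break symmetry
    for _ in range(n_iter):
        # E-step (2.4): expected counts n(x, a) for each pattern and class
        cond = np.prod(pi[None, :, :]**patterns[:, None, :] *
                       (1 - pi[None, :, :])**(1 - patterns[:, None, :]), axis=2)
        joint = v * cond                                   # shape (R, A)
        post = joint / joint.sum(axis=1, keepdims=True)    # posterior P(a | x)
        n_xa = counts[:, None] * post                      # expected n(x, a)
        # M-step (2.6): update class proportions and response probabilities
        new_v = n_xa.sum(axis=0) / N
        new_pi = (n_xa.T @ patterns) / n_xa.sum(axis=0)[:, None]
        converged = (np.max(np.abs(new_v - v)) + np.max(np.abs(new_pi - pi))) < tol
        v, pi = new_v, new_pi
        if converged:
            break
    return v, pi
```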
For the observed data set $\{n(\boldsymbol{x})\}$, the goodness-of-fit test of a latent class model to the data can be carried out with the following log likelihood ratio test statistic:

$$G^2 = 2\sum_{\boldsymbol{x}} n(\boldsymbol{x})\left( \log\frac{n(\boldsymbol{x})}{N} - \log\left( \sum_{a=1}^{A}\widehat{v}_a\prod_{i=1}^{I}\widehat{P}(X_i = x_i \mid a) \right) \right). \qquad (2.7)$$
For sufficiently large sample size $N$, the above statistic is asymptotically $\chi^2$-distributed with degrees of freedom equal to the number of manifest parameters $P(\boldsymbol{X} = \boldsymbol{x})$ minus the number of latent parameters $v_a$ and $P(X_i = x_i \mid a)$; that is, from (1.5) we have

$$\prod_{i=1}^{I} K_i - 1 - (A - 1) - A\sum_{i=1}^{I}(K_i - 1) = \prod_{i=1}^{I} K_i - A\left( \sum_{i=1}^{I} K_i - (I - 1) \right). \qquad (2.8)$$
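The test statistic (2.7) and the degrees of freedom (2.8) translate directly into code; a small added sketch for binary items ($K_i = 2$) follows, assuming every listed response pattern has a positive observed count.

```python
import numpy as np

def g_squared(patterns, counts, v, pi):
    """Log likelihood ratio statistic G^2 of Eq. (2.7) for binary items.
    Assumes every listed pattern has a positive observed count."""
    patterns, counts = np.asarray(patterns, dtype=float), np.asarray(counts, dtype=float)
    N = counts.sum()
    fitted = np.array([np.sum(v * np.prod(pi**x * (1 - pi)**(1 - x), axis=1))
                       for x in patterns])
    return 2.0 * np.sum(counts * (np.log(counts / N) - np.log(fitted)))

def degrees_of_freedom(K, A):
    """Degrees of freedom of Eq. (2.8); K is the list of category numbers K_i."""
    K = np.asarray(K)
    return int(np.prod(K) - A * (np.sum(K) - (len(K) - 1)))
```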
After estimating a latent class model under study, the interpretation of the latent classes is made by considering the sets of estimated latent response probabilities $\widehat{P}(X_i = x_i \mid a)$, $i = 1, 2, \ldots, I$, $a = 1, 2, \ldots, A$. In addition to the interpretation, it is important to assess the manifest response vectors $\boldsymbol{x} = (x_1, x_2, \ldots, x_I)$ with respect to the latent classes, that is, to assign individuals with the manifest responses to the extracted latent classes. The best way to assign them to the latent classes is by the maximum posterior probability; that is, if

$$P(a_0 \mid (x_1, x_2, \ldots, x_I)) = \max_{a}\frac{v_a\prod_{i=1}^{I}P(X_i = x_i \mid a)}{\sum_{b=1}^{A}v_b\prod_{i=1}^{I}P(X_i = x_i \mid b)}, \qquad (2.9)$$

an individual with response $\boldsymbol{x} = (x_1, x_2, \ldots, x_I)$ is assigned to latent class $a_0$.
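Continuing the added sketches above, the posterior probabilities (2.9) and the modal assignment are obtained from the same quantities as in the E-step.

```python
import numpy as np

def assign_classes(patterns, v, pi):
    """Posterior class probabilities and modal assignment, Eq. (2.9)."""
    patterns = np.asarray(patterns, dtype=float)
    cond = np.prod(pi[None, :, :]**patterns[:, None, :] *
                   (1 - pi[None, :, :])**(1 - patterns[:, None, :]), axis=2)
    joint = v * cond
    post = joint / joint.sum(axis=1, keepdims=True)   # P(a | x)
    return post, post.argmax(axis=1)                  # assigned class a_0 per pattern
```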
Remark 2.1 The EM algorithm for the latent class model mentioned above has been constructed by using E-step (1.34) and M-step (1.35). The algorithm can also be derived directly with (1.31) and (1.32). The process is given as follows.

(i) E-step

$$Q\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr) = E\left[ \sum_{a=1}^{A}\sum_{\boldsymbol{x}} n(\boldsymbol{x}, a)\left( \log v_a + \sum_{i=1}^{I}\log P(X_i = x_i \mid a) \right) \,\middle|\, \boldsymbol{x}, {}^{s}\boldsymbol{\phi} \right] = \sum_{a=1}^{A}\sum_{\boldsymbol{x}} E\bigl[n(\boldsymbol{x}, a) \mid \boldsymbol{x}, {}^{s}\boldsymbol{\phi}\bigr]\left( \log v_a + \sum_{i=1}^{I}\log P(X_i = x_i \mid a) \right) = \sum_{a=1}^{A}\sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a)\left( \log v_a + \sum_{i=1}^{I}\log P(X_i = x_i \mid a) \right),$$

where $^{s+1}n(\boldsymbol{x}, a)$ are given in (2.4).

(ii) M-step

Considering the constraints in (1.3), let $\lambda$ and $\mu_{ai}$, $a = 1, 2, \ldots, A$; $i = 1, 2, \ldots, I$, be Lagrange multipliers. Then, the Lagrange function is given by

$$L\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr) = Q\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr) - \lambda\sum_{a=1}^{A}v_a - \sum_{a=1}^{A}\sum_{i=1}^{I}\sum_{x_i}\mu_{ai}P(X_i = x_i \mid a) = \sum_{a=1}^{A}\sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a)\left( \log v_a + \sum_{i=1}^{I}\log P(X_i = x_i \mid a) \right) - \lambda\sum_{a=1}^{A}v_a - \sum_{a=1}^{A}\sum_{i=1}^{I}\sum_{x_i}\mu_{ai}P(X_i = x_i \mid a).$$

Differentiating the Lagrange function with respect to $v_a$, we have

$$\frac{\partial}{\partial v_a}L\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr) = \frac{\partial}{\partial v_a}Q\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr) - \lambda = \frac{1}{v_a}\sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a) - \lambda = 0, \quad a = 1, 2, \ldots, A.$$

From the above equations, it follows that

$$v_a\lambda = \sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a), \quad a = 1, 2, \ldots, A.$$

Summing both sides of the above equations over $a = 1, 2, \ldots, A$, we obtain

$$\lambda = \sum_{a=1}^{A}\sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a) = N.$$

Hence, the $(s+1)$th estimates of $v_a$ are derived as follows:

$$^{s+1}v_a = \frac{\sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a)}{\lambda} = \frac{\sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a)}{N}, \quad a = 1, 2, \ldots, A. \qquad (2.10)$$

Similarly, differentiating the Lagrange function with respect to $P(X_i = x_i \mid a)$, we have

$$\frac{\partial L\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr)}{\partial P(X_i = x_i \mid a)} = \frac{\partial Q\bigl(\boldsymbol{\phi} \mid {}^{s}\boldsymbol{\phi}\bigr)}{\partial P(X_i = x_i \mid a)} - \mu_{ai} = \frac{\sum_{\boldsymbol{x} \in \Omega(X_i = x_i)}{}^{s+1}n(\boldsymbol{x}, a)}{P(X_i = x_i \mid a)} - \mu_{ai} = 0, \quad i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A.$$

From this, it follows that

$$P(X_i = x_i \mid a)\,\mu_{ai} = \sum_{\boldsymbol{x} \in \Omega(X_i = x_i)}{}^{s+1}n(\boldsymbol{x}, a), \quad i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A.$$

Summing both sides of the above equations with respect to $x_i$, we get

$$\mu_{ai} = \sum_{x_i = 1}^{K_i}\sum_{\boldsymbol{x} \in \Omega(X_i = x_i)}{}^{s+1}n(\boldsymbol{x}, a) = \sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a), \quad i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A.$$

Thus, (2.6) is obtained as follows:

$$^{s+1}P(X_i = x_i \mid a) = \frac{\sum_{\boldsymbol{x} \in \Omega(X_i = x_i)}{}^{s+1}n(\boldsymbol{x}, a)}{\sum_{\boldsymbol{x}}{}^{s+1}n(\boldsymbol{x}, a)}, \quad i = 1, 2, \ldots, I; \; a = 1, 2, \ldots, A. \qquad (2.11)$$
2.3 Examples

Table 2.1 shows the data from respondents to questionnaire items on role conflict
[20], and the respondents are cross-classified with respect to whether they tend toward
universalistic values “1” or particularistic values “0” when confronted by each of four
different situations of role conflict [18]. Assuming A latent classes in the population,
latent class model (2.1) is applied. Let X i , i = 1, 2, 3, 4 be the responses to the four
situations. According to the condition of model identification, the quantity in (2.8) has to be positive, so we have

$$2^4 - A\{2\times 4 - (4 - 1)\} = 16 - 5A > 0.$$

From the above inequality, the number of latent classes has to be less than $16/5$, that is, at most three. Assuming three latent classes, with which the latent class model is denoted by
M(3), the EM algorithm with (2.4) and (2.6) is carried out. The ML estimates of the parameters are illustrated in Table 2.2, and the following inequalities hold:

$$\hat{P}(X_i = 1|1) < \hat{P}(X_i = 1|2) < \hat{P}(X_i = 1|3), \quad i = 1, 2, 3, 4.$$

From the above results, the extracted latent classes 1, 2, and 3 in Table 2.2 can be interpreted as ordered latent classes, "low", "medium", and "high", in the universalistic attitude in the role conflict. The latent class model with three latent classes for

Table 2.1 Data of responses in four different situations of role conflict

| Response pattern | Frequency | Response pattern | Frequency |
|---|---|---|---|
| 0000 | 20 | 0001 | 2 |
| 1000 | 38 | 1001 | 7 |
| 0100 | 6 | 0101 | 1 |
| 1100 | 25 | 1101 | 6 |
| 0010 | 9 | 0011 | 2 |
| 1010 | 24 | 1011 | 6 |
| 0110 | 4 | 0111 | 1 |
| 1110 | 23 | 1111 | 42 |

Source: Stouffer and Toby [20], Goodman [8]

Table 2.2 The estimates of the parameters for a latent class model with three latent classes
(Stouffer-Toby data in Table 2.1)
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
1 0.220 0.005 0.032 0.024 0.137
2 0.672 0.194 0.573 0.593 0.830
3 0.108 0.715 1.000 0.759 0.943
a The log likelihood ratio test statistic (2.6) is calculated as G 2 (3) = 0.387(d f = 1, P = 0.534)

Table 2.3 The estimates of the parameters for a latent class model with two latent classes (Stouffer-
Toby data in Table 2.1)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.279 0.007 0.060 0.073 0.231
2 0.721 0.286 0.670 0.646 0.868
a G 2 (2) = 2.720(d f = 6, P = 0.843)

four binary items has only one degree of freedom left for the test of goodness-of-fit
to the data, so a latent class model with two latent classes M(2) is estimated and the
results are shown in Table 2.3. In this case, two ordered latent classes, “low” and
“high” in the universalistic attitude, are extracted. The goodness-of-fit of both models
to the data is good. In order to compare the two models, the relative goodness-of-fit of M(2) to M(3) can be assessed by

$$G^2(2) - G^2(3) = 2.720 - 0.387 = 2.333, \quad df = 5,\ P = 0.801.$$

From this, M(2) is preferred to M(3) for explaining the present response behavior.
Stouffer and Toby [20] observed the data in Table 2.1 to order the respondents in
a latent continuum with respect to the relative priority of personal and impersonal
considerations in social obligations. In this sense, it is significant to have obtained
the ordered latent classes in the present latent class analysis. According to posterior
probabilities (2.9), the assessment results of respondents with manifest responses
x = (x1 , x2 , x3 , x4 )T for M(2) and M(3) are demonstrated in Table 2.4. Both results
are almost the same. As shown in this data analysis, we can assess the respondents with their response patterns $x = (x_1, x_2, x_3, x_4)$, not with the simple totals of the responses to the test items, $\sum_{i=1}^{4}x_i$ (Table 2.4).
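The likelihood ratio statistics $G^2(A)$ quoted above compare the observed pattern frequencies with those expected under the fitted model. A minimal sketch of the computation (Python/NumPy, with SciPy assumed available for the p-value; the parameters are read from the rounded values in Table 2.3 and the frequencies from Table 2.1, so small differences from the reported $G^2(2) = 2.720$ are to be expected):

```python
import numpy as np
from itertools import product
from scipy.stats import chi2

# Observed frequencies of the 16 response patterns, Table 2.1
freq = {(0,0,0,0): 20, (0,0,0,1): 2, (1,0,0,0): 38, (1,0,0,1): 7,
        (0,1,0,0): 6,  (0,1,0,1): 1, (1,1,0,0): 25, (1,1,0,1): 6,
        (0,0,1,0): 9,  (0,0,1,1): 2, (1,0,1,0): 24, (1,0,1,1): 6,
        (0,1,1,0): 4,  (0,1,1,1): 1, (1,1,1,0): 23, (1,1,1,1): 42}
patterns = np.array(list(product([0, 1], repeat=4)))
n = np.array([freq[tuple(p)] for p in patterns], dtype=float)
N = n.sum()                                  # 216 respondents

# Estimated two-class model M(2), Table 2.3
v = np.array([0.279, 0.721])
pi = np.array([[0.007, 0.060, 0.073, 0.231],
               [0.286, 0.670, 0.646, 0.868]])

# Expected frequencies and the likelihood ratio statistic G^2
p_x = np.array([(v * np.prod(pi ** p * (1 - pi) ** (1 - p), axis=1)).sum()
                for p in patterns])
m = N * p_x
G2 = 2 * np.sum(n * np.log(n / m))           # all observed counts are positive here
df = 15 - (2 - 1) - 2 * 4                    # (cells - 1) minus free parameters = 6
print(round(G2, 3), df, round(1 - chi2.cdf(G2, df), 3))
```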
Table 2.5 illustrates test data on creative ability in machine design [15]. Engi-
neers are cross-classified with respect to their dichotomized scores, that is, above
the subtest mean (1) or below (0), obtained on each of four subtests that measured
creative abilities in machine design [18]. If we can assume a one-dimensional latent
continuum with respect to the creative ability, it may be reasonable to expect to

Table 2.4 Assignment of the manifest responses to the extracted latent classes (data in Table 2.1)

| Response pattern | M(2) latent class | M(3) latent class | Response pattern | M(2) latent class | M(3) latent class |
|---|---|---|---|---|---|
| 0000 | 1 | 1 | 0001 | 1 | 2 |
| 1000 | 2 | 2 | 1001 | 2 | 2 |
| 0100 | 2 | 2 | 0101 | 2 | 2 |
| 1100 | 2 | 2 | 1101 | 2 | 2 |
| 0010 | 1 | 2 | 0011 | 2 | 2 |
| 1010 | 2 | 2 | 1011 | 2 | 2 |
| 0110 | 2 | 2 | 0111 | 2 | 2 |
| 1110 | 2 | 2 | 1111 | 2 | 3 |

Table 2.5 Data on creative ability in machine design (McHugh's data)

| Response pattern | Frequency | Response pattern | Frequency |
|---|---|---|---|
| 0000 | 23 | 0001 | 5 |
| 1000 | 6 | 1001 | 3 |
| 0100 | 8 | 0101 | 2 |
| 1100 | 9 | 1101 | 3 |
| 0010 | 5 | 0011 | 14 |
| 1010 | 2 | 1011 | 4 |
| 0110 | 3 | 0111 | 8 |
| 1110 | 8 | 1111 | 34 |

Source: McHugh [15], Proctor [19]

derive ordered latent classes in latent class analysis as in the analysis of Stouffer-
Toby’s data. First, for three latent classes, we have the results of latent class analysis
shown in Table 2.6. The goodness-of-fit of the model to the data set is poor, since we get $G^2(3) = 4.708$ ($df = 1$, $P = 0.030$). Similarly, for a latent class model with two latent classes, the goodness-of-fit of the model to the data set is also poor, that is, $G^2(2) = 25.203$ ($df = 6$, $P = 0.000$). From the results, it is not appropriate to
apply the latent class cluster analysis to the data set. For each of the four different

Table 2.6 The estimates of the parameters for a latent class model with three latent classes (data
in Table 2.5)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.198 0.239 0.000 0.808 0.803
2 0.398 0.324 0.360 0.089 0.111
3 0.404 0.810 1.000 0.926 0.810
a G 2 (3) = 4.708(d f = 1, P = 0.030)

subtests, it may be needed to assume a particular skill to obtain scores above the
mean, where the four skills cannot be ordered with respect to difficulty for obtaining
them. Assuming the particular skills for solving the subtests, a confirmatory latent
class analysis of the data is carried out in Chap. 4.
The third data set (Table 2.7) was obtained from noncommissioned officers responding to items on attitude toward the Army [18]. The respondents were cross-classified with respect to their dichotomous responses, coded "1" for "favorable" and "0" for "unfavorable" toward the Army, for each of the four different items on general attitude toward the Army. If there exists a latent
continuum with respect to the attitude, we can assume ordered latent classes as in
the first data (Table 2.1). The estimated latent class models with three and two latent
classes are given in Tables 2.8 and 2.9, respectively. As shown in the results of the goodness-of-fit tests, the fits of both models are fair. As shown in the tables, the estimated latent classes can be ordered because, for example, in
Table 2.8, the following inequalities hold:

$$\hat{P}(X_i = 1|1) < \hat{P}(X_i = 1|2) < \hat{P}(X_i = 1|3), \quad i = 1, 2, 3, 4.$$

Hence, the extracted latent classes 1–3 can be interpreted as “low”, “medium”,
and “high” groups in favorable attitude toward the Army, respectively. Comparing

Table 2.7 Data on attitude toward the Army (Lazarsfeld-Stouffer's data)

| Response pattern | Frequency | Response pattern | Frequency |
|---|---|---|---|
| 0000 | 75 | 0001 | 69 |
| 1000 | 3 | 1001 | 16 |
| 0100 | 42 | 0101 | 60 |
| 1100 | 10 | 1101 | 25 |
| 0010 | 55 | 0011 | 96 |
| 1010 | 8 | 1011 | 52 |
| 0110 | 45 | 0111 | 199 |
| 1110 | 16 | 1111 | 229 |

Source: Price et al. [18]

Table 2.8 The estimates of the parameters for a latent class model with three latent classes
(Lazarsfeld-Stouffer’s data in Table 2.7)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.260 0.000 0.296 0.386 0.406
2 0.427 0.374 0.641 0.672 0.768
3 0.313 0.637 0.880 1.000 1.000
a G 2 (3) = 1.787(d f = 1, P = 0.181)

Table 2.9 The estimates of the parameters for a latent class model with two latent classes
(Lazarsfeld-Stouffer’s data in Table 2.7)
Latent class Class proportion Latent positive item response probability
X1 X2 X3 X4
1 0.445 0.093 0.386 0.442 0.499
2 0.555 0.572 0.818 0.906 0.944
a G 2 (2) = 8.523(d f = 6, P = 0.202)

the models, since the relative goodness-of-fit of M(2) is

$$G^2(2) - G^2(3) = 8.523 - 1.787 = 6.736, \quad df = 5,\ P = 0.241,$$

model M(2) is better than M(3). The data are treated in Chapter 3 again, assuming
ordered latent classes.
Comparing Tables 2.2, 2.6, and 2.8, each of Tables 2.2 and 2.8 shows three ordered
latent classes; however, the estimated latent classes in Table 2.6 are not consistently
ordered with respect to the positive response probabilities for the four test items. It can be thought that the universalistic attitude in the role conflict in Stouffer-Toby's data and the favorable attitude toward the Army are one-dimensional, whereas in machine design the latent classes may not be assessed one-dimensionally.

2.4 Measuring Goodness-of-Fit of Latent Class Models

As in the ordinary linear regression analysis, it is meaningful to evaluate the predictive


power or goodness-of-fit of the latent class model. According to a GLM framework
of latent structure models (Chap. 1, Sect. 1.6), the KL information (1.29) is applied
to the latent class model (2.1). In order to facilitate the discussion, the application is
made for latent class models with binary response variables (1.4). According to the
assumption of local independence, from (1.24), (1.25), and (1.29), we have

$$\mathrm{KL}(X, \xi) = \sum_{i=1}^{I}\mathrm{KL}(X_i, \xi) = \sum_{i=1}^{I}\mathrm{Cov}(\theta_i, X_i). \qquad (2.12)$$

It means that "the variation of manifest variable vector $X$ in entropy" explained by the latent classes is decomposed into those of the manifest variables $X_i$. Since, from (1.25),

$$E(X_i) = \sum_{a=1}^{A} v_a\pi_{ai} = \pi_i,$$

Table 2.10 Assessment of latent class model M(3) in Table 2.2


Manifest variable X1 X2 X3 X4 (X 1 , X 2 , X 3 , X 4 )
KL 0.298 0.564 0.547 0.480 1.888
ECD 0.229 0.361 0.354 0.324 0.654

Table 2.11 Assessment of latent class model M(2) in Table 2.3


Manifest variable X1 X2 X3 X4 (X 1 , X 2 , X 3 , X 4 )
KL 0.229 0.425 0.361 0.395 1.410
ECD 0.186 0.298 0.265 0.283 0.585


$$E(\theta_i) = \sum_{a=1}^{A} v_a\theta_{ai} = \sum_{a=1}^{A} v_a\log\frac{\pi_{ai}}{1 - \pi_{ai}},$$

and we have

$$\mathrm{KL}(X_i, \xi) = \mathrm{Cov}(\theta_i, X_i) = \sum_{a=1}^{A} v_a(\pi_{ai} - \pi_i)\log\frac{\pi_{ai}}{1 - \pi_{ai}}, \quad i = 1, 2, \ldots, I. \qquad (2.13)$$

The larger the above information, the stronger the association between manifest variable $X_i$ and the latent variable (class) $\xi$. ECD in GLMs corresponds to the coefficient of determination $R^2$ in ordinary linear regression models. The above discussion
is applied to Table 2.2, and we calculate the KL information and ECDs (Table 2.10).
65.4% of the variation of response variable vector X = (X 1 , X 2 , X 3 , X 4 )T in entropy
is explained by the latent classes. According to the KL information criterion, the
association of manifest variable X 2 with the latent variable is the strongest among
the manifest variables. For latent class model M(2) in Table 2.3, the same assessment
is made and the results are illustrated in Table 2.11. The results are similar to those
for M(3) in Table 2.10.
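The quantities in Tables 2.10 and 2.11 can be obtained directly from the estimated parameters via (2.13) and $\mathrm{ECD} = \mathrm{KL}/(\mathrm{KL} + 1)$. A minimal sketch (Python/NumPy; the values are the rounded estimates of Table 2.2, and the boundary estimate 1.000 makes the logit infinite, so a small clipping guard is used here and the printed numbers will only approximate Table 2.10):

```python
import numpy as np

# Estimated three-class model M(3) for the Stouffer-Toby data, Table 2.2
v = np.array([0.220, 0.672, 0.108])
pi = np.array([[0.005, 0.032, 0.024, 0.137],
               [0.194, 0.573, 0.593, 0.830],
               [0.715, 1.000, 0.759, 0.943]])

eps = 1e-6                                   # crude guard for the boundary value 1.000
pi_c = np.clip(pi, eps, 1 - eps)
theta = np.log(pi_c / (1 - pi_c))            # canonical parameters theta_ai
p_marg = v @ pi                              # marginal positive probabilities pi_i

# KL(X_i, xi) = sum_a v_a (pi_ai - pi_i) log{pi_ai / (1 - pi_ai)}, cf. (2.13)
KL_i = np.sum(v[:, None] * (pi - p_marg) * theta, axis=0)
KL_total = KL_i.sum()                        # decomposition (2.12)
print(np.round(KL_i, 3), round(KL_total, 3))
print(np.round(KL_i / (KL_i + 1), 3), round(KL_total / (KL_total + 1), 3))  # ECDs
```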

2.5 Comparison of Latent Classes

In latent class model (2.1), the latent classes are interpreted with the latent response
probabilities to items, {P(X i = xi |a), i = 1, 2, . . . , I }, α = 1, 2, . . . , A. When
there are two latent classes in a population, we can always say one latent class
is higher than the other one in a concept or latent trait. However, where the number
of latent classes is greater than two, we cannot easily assess and compare the latent
classes without latent concepts or traits, and so it is meaningful to make methods for

comparing latent classes in latent class cluster analysis. In this section, first, a tech-
nique similar to canonical analysis is employed to make a latent space to compare
the latent classes, that is, to construct a latent space for locating the latent classes.
We discuss the case where the manifest variables $X_i$ are binary. Let

$$\pi_{ai} = P(X_i = 1|a), \quad i = 1, 2, \ldots, I;\ a = 1, 2, \ldots, A$$

and

$$\pi_i = P(X_i = 1) = \sum_{a=1}^{A} v_a\pi_{ai}.$$

For manifest responses $X = (X_1, X_2, \ldots, X_I)^T$, the following score is given:

$$T = \sum_{i=1}^{I}\{c_{i0}(1 - X_i) + c_{i1}X_i\}, \qquad (2.14)$$

where $c_{i0}$ and $c_{i1}$ are the weights for responses $X_i = 0$ and $X_i = 1$, respectively. Let

$$Z_i = c_{i0}(1 - X_i) + c_{i1}X_i, \quad i = 1, 2, \ldots, I.$$

In this setup, we have

$$\mathrm{Var}(Z_i) = (c_{i0} - c_{i1})^2\pi_i(1 - \pi_i), \quad i = 1, 2, \ldots, I, \qquad (2.15)$$

and

$$\mathrm{Cov}(Z_i, Z_j) = (c_{i0} - c_{i1})(c_{j0} - c_{j1})\sum_{a=1}^{A} v_a(\pi_{ai} - \pi_i)(\pi_{aj} - \pi_j), \quad i \neq j. \qquad (2.16)$$

According to the above formulae, $c_{i0}$ and $c_{i1}$ are not identifiable, so we set $c_{i0} = 0$, $c_{i1} = c_i$, and $Z_i = c_iX_i$, $i = 1, 2, \ldots, I$. Then, the above formulae are rewritten as

$$\mathrm{Var}(Z_i) = c_i^2\pi_i(1 - \pi_i) = c_i^2\sum_{a=1}^{A} v_a\pi_{ai}(1 - \pi_{ai}) + c_i^2\sum_{a=1}^{A} v_a(\pi_{ai} - \pi_i)^2, \quad i = 1, 2, \ldots, I,$$

and

$$\mathrm{Cov}(Z_i, Z_j) = c_ic_j\sum_{a=1}^{A} v_a(\pi_{ai} - \pi_i)(\pi_{aj} - \pi_j), \quad i \neq j.$$

Let

$$\sigma_{ijB} = \sum_{a=1}^{A} v_a(\pi_{ai} - \pi_i)(\pi_{aj} - \pi_j)$$

and

$$\sigma_{ijW} = \begin{cases}\sum_{a=1}^{A} v_a\pi_{ai}(1 - \pi_{ai}), & i = j;\\ 0, & i \neq j.\end{cases}$$

Then, the between- and within-class variance matrices of responses $X = (X_1, X_2, \ldots, X_I)^T$ are defined by $\Sigma_B = (\sigma_{ijB})$ and $\Sigma_W = (\sigma_{ijW})$, respectively. In the above setup, the variance of score $T$ (2.14) is calculated as follows:

$$\mathrm{Var}(T) = (c_1, c_2, \ldots, c_I)\Sigma_B(c_1, c_2, \ldots, c_I)^T + (c_1, c_2, \ldots, c_I)\Sigma_W(c_1, c_2, \ldots, c_I)^T. \qquad (2.17)$$

The first term of the right-hand side of the above equation represents the between-class variance of $T$ and the second term the within-class variance. Let

$$V_B(T) = (c_1, c_2, \ldots, c_I)\Sigma_B(c_1, c_2, \ldots, c_I)^T \qquad (2.18)$$

and

$$V_W(T) = (c_1, c_2, \ldots, c_I)\Sigma_W(c_1, c_2, \ldots, c_I)^T. \qquad (2.19)$$

For determining the weight vector $c = (c_1, c_2, \ldots, c_I)$ that assesses the differences among the latent classes, the following criterion is used:

$$\frac{V_B(T)}{V_W(T)} \to \max_{c}\frac{V_B(T)}{V_W(T)}. \qquad (2.20)$$

In order to avoid the indeterminacy with respect to $(c_1, c_2, \ldots, c_I)$, we impose the constraint

$$V_W(T) = (c_1, c_2, \ldots, c_I)\Sigma_W(c_1, c_2, \ldots, c_I)^T = 1 \qquad (2.21)$$

on maximization (2.20). Then, the criterion is reduced to

$$V_B(T) \to \max_{c} V_B(T). \qquad (2.22)$$

Remark 2.2 In criterion (2.20), variance $V_B(T)$ can be regarded as that according to the latent classes. In this sense, it is interpreted as the signal variance of $T$, which is explained by the $A$ latent classes. On the other hand, the denominator $V_W(T)$ can be viewed as the noise variance of $T$. Hence, the ratio

$$\frac{V_B(T)}{V_W(T)}$$

is the signal-to-noise ratio. In this sense, the above criterion is similar to the KL information (1.29) and can be interpreted as entropy. □

In order to obtain the optimal weight vector $c = (c_1, c_2, \ldots, c_I)^T$, the following Lagrange function is introduced:

$$g(c) = (c_1, c_2, \ldots, c_I)\Sigma_B(c_1, c_2, \ldots, c_I)^T - \lambda(c_1, c_2, \ldots, c_I)\Sigma_W(c_1, c_2, \ldots, c_I)^T, \qquad (2.23)$$

where $\lambda$ is the Lagrange multiplier. Differentiating the above function with respect to vector $c = (c_1, c_2, \ldots, c_I)$, we have

$$\frac{\partial g(c)}{\partial c} = 2\Sigma_B(c_1, c_2, \ldots, c_I)^T - 2\lambda\Sigma_W(c_1, c_2, \ldots, c_I)^T = 0.$$
If $\Sigma_W$ is non-singular, it follows that

$$\left(\Sigma_W^{-1/2}\Sigma_B\Sigma_W^{-1/2} - \lambda E\right)\Sigma_W^{1/2}(c_1, c_2, \ldots, c_I)^T = 0,$$

where $E$ is the identity matrix of order $I$. Since $(c_1, c_2, \ldots, c_I) \neq 0$, $\lambda$ is an eigenvalue of $\Sigma_W^{-1/2}\Sigma_B\Sigma_W^{-1/2}$. Let $K$ be the rank of $\Sigma_W^{-1/2}\Sigma_B\Sigma_W^{-1/2}$; let $\lambda_k$ be the $k$th largest eigenvalue of the matrix; and let $\xi_k$, $k = 1, 2, \ldots, K$ be the corresponding eigenvectors. Putting

$$c_k = \Sigma_W^{-1/2}\xi_k, \quad T_k = (X_1, X_2, \ldots, X_I)c_k, \quad k = 1, 2, \ldots, K,$$

we have

$$V_B(T_k) = c_k\Sigma_B c_k^T = \xi_k^T\Sigma_W^{-1/2}\Sigma_B\Sigma_W^{-1/2}\xi_k = \lambda_k, \quad k = 1, 2, \ldots, K. \qquad (2.24)$$

From (2.24), it is seen that $T_1$ is the solution of (2.22). Since the matrix $\Sigma_W^{-1/2}\Sigma_B\Sigma_W^{-1/2}$ is symmetric, the eigenvectors $\xi_k$ are orthogonal with respect to the inner product, and thus, it follows that

$$c_k\Sigma_B c_l^T = \xi_k^T\Sigma_W^{-1/2}\Sigma_B\Sigma_W^{-1/2}\xi_l = 0, \quad k \neq l.$$

From this, the weight vectors $c_k$, $k = 1, 2, \ldots, K$ are orthogonal with respect to the between-class variance matrix $\Sigma_B$. The weight vectors make the scores, or dimensions, $T_k$ with which to compare or order the latent classes. The locations of the latent classes are based on the dimensions $T_k$, that is,

$$t_{ak} \equiv E(T_k|\text{latent class } a) = (\pi_{a1}, \pi_{a2}, \ldots, \pi_{aI})c_k, \quad k = 1, 2, \ldots, K;\ a = 1, 2, \ldots, A.$$

It is suitable to select two or three dimensions to express the locations of the latent classes, that is, $(t_{a1}, t_{a2})$ or $(t_{a1}, t_{a2}, t_{a3})$, $a = 1, 2, \ldots, A$.
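A minimal sketch of the computation (Python/NumPy; the parameter values are those of the hypothesized model in Table 2.12, so the output should be comparable, up to sign and rounding, to the score functions and locations in Tables 2.13 and 2.14 — eigenvectors are only determined up to sign):

```python
import numpy as np

# Hypothesized latent class model, Table 2.12
v = np.array([0.2, 0.5, 0.3])
pi = np.array([[0.1, 0.5, 0.3, 0.7],
               [0.8, 0.6, 0.1, 0.2],
               [0.9, 0.3, 0.7, 0.3]])

p = v @ pi                                          # marginal probabilities pi_i
dev = pi - p                                        # (pi_ai - pi_i)
Sigma_B = (v[:, None, None] * dev[:, :, None] * dev[:, None, :]).sum(axis=0)
sigma_W = v @ (pi * (1 - pi))                       # diagonal of Sigma_W
W_inv_half = np.diag(1.0 / np.sqrt(sigma_W))        # Sigma_W^{-1/2}

M = W_inv_half @ Sigma_B @ W_inv_half
lam, xi = np.linalg.eigh(M)                         # eigenvalues in ascending order
order = lam.argsort()[::-1]
lam, xi = lam[order], xi[:, order]

c = W_inv_half @ xi                                 # weight vectors c_k (columns)
t = pi @ c                                          # class locations t_{ak}
print(np.round(lam[:2], 3))                         # leading eigenvalues (cf. Table 2.13)
print(np.round(c[:, :2].T, 3))                      # weights of T1 and T2
print(np.round(t[:, :2], 3))                        # locations (t_a1, t_a2) (cf. Table 2.14)
```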
The above method for locating latent classes is demonstrated by the use of an
artificial latent class model shown in Table 2.12. For this latent class model, the first-
and second-best score functions (dimensions) T1 and T2 are derived according to
eigenvalues (Table 2.13) and the locations of the latent classes are measured with the
functions (Table 2.14). The score functions are interpreted according to the weights
for manifest response variables X i , i = 1, 2, 3, 4, and the locations of the latent
classes are illustrated in Fig. 2.1. In the practical data analysis, it is an appropriate
idea to interpret and compare the latent classes according to figures like Fig. 2.1.
According to the locations $(T_1, T_2)$ of the latent classes in Table 2.14, the distances between the latent classes are calculated. Let $d(a, b)$ be the Euclidean distance between latent classes $a$ and $b$. Then, we have

Table 2.12 A hypothesized latent class model


Latent class Proportion Positive response probability
X1 X2 X3 X4
1 0.2 0.1 0.5 0.3 0.7
2 0.5 0.8 0.6 0.1 0.2
3 0.3 0.9 0.3 0.7 0.3

Table 2.13 Score functions of the latent class model in Table 2.12
Score function Eigenvalue Weight
X1 X2 X3 X4
T1 0.900 2.508 −0.151 0.468 −0.975
T2 0.530 −0.245 −0.741 0.2302 0.615

Table 2.14 The two-dimensional scores of the three latent classes in Table 2.12

| Latent class | T1 | T2 |
|---|---|---|
| 1 | −0.366 | 0.726 |
| 2 | 1.768 | −0.288 |
| 3 | 2.247 | 1.352 |

[Figure: two-dimensional plot of the three class locations on the (T1, T2) plane]

Fig. 2.1 Locations of latent classes in Table 2.12

d(1, 2) = 2.36, d(2, 3) = 1.71, d(1, 3) = 2.69. (2.25)

The above Euclidean distances between the latent classes yield the tree graph shown in Fig. 2.2.
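A small sketch of this step (Python/NumPy, using the locations in Table 2.14); it reproduces the distances in (2.25) and shows which pair of classes is merged first in the tree:

```python
import numpy as np
from itertools import combinations

t = {1: np.array([-0.366, 0.726]),    # locations (t_a1, t_a2), Table 2.14
     2: np.array([1.768, -0.288]),
     3: np.array([2.247, 1.352])}

d = {(a, b): np.linalg.norm(t[a] - t[b]) for a, b in combinations(t, 2)}
for (a, b), dist in d.items():
    print(f"d({a},{b}) = {dist:.2f}")

# single-linkage agglomeration: the closest pair is merged first
(a, b), _ = min(d.items(), key=lambda kv: kv[1])
print(f"first merge: classes {a} and {b}")
```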
Second, a method for comparing latent classes based on entropy is considered. For simplicity of the discussion, we treat the case where the manifest variables $X_i$ are binary. Let

πai = P(X i = 1|a), i = 1, 2, . . . , I ; a = 1, 2, . . . , A,

and let p = ( p1 , p2 , . . . , p K ) and q = (q1 , q2 , . . . , q K ) be two probability


distributions. Then, the divergences between the distributions are calculated as
follows:

[Figure: dendrogram joining class 2 and class 3 first, then class 1]

Fig. 2.2 The tree graph of latent classes in Table 2.12 based on the Euclidean distance


$$D(p\|q) = \sum_{k=1}^{K} p_k\log\frac{p_k}{q_k}, \quad D(q\|p) = \sum_{k=1}^{K} q_k\log\frac{q_k}{p_k}, \qquad (2.26)$$

where

$$\sum_{k=1}^{K} p_k = \sum_{k=1}^{K} q_k = 1.$$

As in (1.29), the following KL information is used to measure the difference between the two distributions:

$$D^*(p\|q) = D(p\|q) + D(q\|p). \qquad (2.27)$$

From (2.26) and (2.27), we have

$$D^*(p\|q) = \sum_{k=1}^{K}(p_k - q_k)(\log p_k - \log q_k).$$

In model (2.1), the distribution of manifest variable vector $X = (X_1, X_2, \ldots, X_I)^T$ in latent class $a$ is

$$P(X = x|a) = \prod_{i=1}^{I} P(X_i = x_i|a). \qquad (2.28)$$

Let $p(X|a)$ be the probability distribution of $X = (X_1, X_2, \ldots, X_I)^T$ in latent class $a$ and let $p(X_i|a)$ be those of variables $X_i$, $i = 1, 2, \ldots, I$. Then, we have

$$D^*(p(X|a)\|p(X|b)) = D(p(X|a)\|p(X|b)) + D(p(X|b)\|p(X|a)). \qquad (2.29)$$

From (2.28), we also obtain

$$D^*(p(X|a)\|p(X|b)) = \sum_{i=1}^{I} D^*(p(X_i|a)\|p(X_i|b)) = \sum_{i=1}^{I}\{D(p(X_i|a)\|p(X_i|b)) + D(p(X_i|b)\|p(X_i|a))\}. \qquad (2.30)$$

We see that the KL information concerning manifest variable vector X (2.29)


is decomposed into I measures of KL information according to manifest variables
X i , i = 1, 2, . . . , I . When manifest variables are binary, binary categories are, for

Table 2.15 KL distances between latent classes for variables X i


X1 X2 X3 X4
D ∗ ( p(X i |1)|| p(X i |2)) 2.51 0.04 0.27 1.12
D ∗ ( p(X i |2)|| p(X i |3)) 0.08 0.38 1.83 0.05
D ∗ ( p(X i |1)|| p(X i |3)) 3.52 0.17 0.68 0.68

example, {yes, no}, {positive, negative}, {success, failure}, and so on. Let $\pi_{ai}$ be the positive response probabilities of manifest variables $X_i$, $i = 1, 2, \ldots, I$ in latent classes $a = 1, 2, \ldots, A$. Then, (2.30) becomes

$$D^*(P(X = x|a)\|P(X = x|b)) = \sum_{i=1}^{I}\left\{\pi_{ai}\log\frac{\pi_{ai}}{\pi_{bi}} + (1 - \pi_{ai})\log\frac{1 - \pi_{ai}}{1 - \pi_{bi}} + \pi_{bi}\log\frac{\pi_{bi}}{\pi_{ai}} + (1 - \pi_{bi})\log\frac{1 - \pi_{bi}}{1 - \pi_{ai}}\right\}. \qquad (2.31)$$

Applying the above results to Table 2.12, Table 2.15 illustrates the KL distances between the latent classes for manifest variables $X_i$. From this table, we have

$$\begin{cases} D^*(P(X = x|1)\|P(X = x|2)) = \sum_{i=1}^{4} D^*(p(X_i|1)\|p(X_i|2)) = 3.94,\\ D^*(P(X = x|2)\|P(X = x|3)) = 2.34,\\ D^*(P(X = x|1)\|P(X = x|3)) = 5.04.\end{cases} \qquad (2.32)$$

Based on the above measures, cluster analysis is used to compare the latent classes. Latent classes 2 and 3 are first combined, and the distance between {class 2, class 3} and class 1 is calculated by

$$\min\left\{D^*(P(X = x|1)\|P(X = x|2)),\ D^*(P(X = x|1)\|P(X = x|3))\right\} = D^*(P(X = x|1)\|P(X = x|2)) = 3.94.$$

From this, we have a tree graph shown in Fig. 2.3, and the result is similar to that
in Fig. 2.2. As demonstrated above, the entropy-based method for comparing latent
classes can be easily employed in data analyses.
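A minimal sketch of the entropy-based comparison (Python/NumPy, again using the hypothesized parameters of Table 2.12); it evaluates the item-wise symmetrized divergences of (2.31) via the compact form given in Remark 2.3, and their totals should agree with Table 2.15 and (2.32) up to rounding:

```python
import numpy as np

pi = np.array([[0.1, 0.5, 0.3, 0.7],     # Table 2.12, P(X_i = 1 | class a)
               [0.8, 0.6, 0.1, 0.2],
               [0.9, 0.3, 0.7, 0.3]])

def sym_kl_items(pa, pb):
    """Item-wise symmetrized KL divergences between two classes, cf. (2.31)."""
    return (pa - pb) * (np.log(pa / (1 - pa)) - np.log(pb / (1 - pb)))

for a, b in [(0, 1), (1, 2), (0, 2)]:
    d_items = sym_kl_items(pi[a], pi[b])
    print(f"classes {a+1} vs {b+1}:", np.round(d_items, 2),
          "total", round(d_items.sum(), 2))
```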
Remark 2.3 From (2.31), we have

$$D^*(P(X = x|a)\|P(X = x|b)) = \sum_{i=1}^{I}(\pi_{ai} - \pi_{bi})\left(\log\frac{\pi_{ai}}{1 - \pi_{ai}} - \log\frac{\pi_{bi}}{1 - \pi_{bi}}\right).$$

In a discussion similar to ECD in Sect. 1.6 (1.30), the above information


(entropy) can be interpreted as a signal-to-noise ratio. In this case, the signal is
D ∗ (P(X = x|a)||P(X = x|b)) and the noise is 1. From this, a standardized KL

[Figure: dendrogram joining class 2 and class 3 first, then class 1]

Fig. 2.3 The tree graph of latent classes in Table 2.12 based on KL information (2.32)

distance (2.31) can be defined by

$$\bar{D}^*(P(X = x|a)\|P(X = x|b)) = \frac{D^*(P(X = x|a)\|P(X = x|b))}{D^*(P(X = x|a)\|P(X = x|b)) + 1}.$$

The interpretation of the above quantity can be given as that similar to ECD (1.30),
i.e., the ratio of the variation of the two distributions in entropy.

2.6 Latent Profile Analysis

The latent profile model has been introduced in Sect. 1.4. In this section, an ML estimation procedure via the EM algorithm is constructed. Let $X_i$, $i = 1, 2, \ldots, I$ be manifest continuous variables; let $X_{ai}$ be the latent variables in latent class $a = 1, 2, \ldots, A$, which are distributed according to normal distributions $N(\mu_{ai}, \psi_i^2)$; and let $Z_a$ be the latent binary variables that take 1 for an individual in latent class $a$ and 0 otherwise. Then, the model is expressed as

$$X_i = \sum_{a=1}^{A} Z_aX_{ai}, \quad i = 1, 2, \ldots, I. \qquad (2.33)$$

Let $x_{ij}$, $i = 1, 2, \ldots, I$, $j = 1, 2, \ldots, n$ be the observed data that randomly selected individuals $j$ take for manifest variables $X_i$, and let $Z_{aj}X_{aij}$, $i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, n$; $a = 1, 2, \ldots, A$ be the unobserved data that randomly selected individuals $j$ in latent classes $a$ take for latent variables $Z_aX_{ai}$. Then, the incomplete and complete data are given, respectively, by

$$\text{Data} = \left\{x_{ij},\ i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, n\right\}, \quad D = \left\{Z_{aj}X_{aij},\ i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, n;\ a = 1, 2, \ldots, A\right\}, \qquad (2.34)$$

where

$$x_{ij} = \sum_{a=1}^{A} Z_{aj}X_{aij}, \quad i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, n. \qquad (2.35)$$

Let $\phi = ((v_a), (\mu_{ai}), (\psi_i^2))$ be the parameter vector, and let ${}^{s}\phi = (({}^{s}v_a), ({}^{s}\mu_{ai}), ({}^{s}\psi_i^2))$ be the estimated parameters at the $s$th step of the EM algorithm. In order to construct the EM algorithm, the joint density function $f(x|a)$ (1.16) is expressed by

$$f\left(x|\left(\mu_{ai}, \psi_i^2\right)\right) \equiv f(x|a) = \prod_{i=1}^{I}\frac{1}{\sqrt{2\pi\psi_i^2}}\exp\left\{-\frac{(x_i - \mu_{ai})^2}{2\psi_i^2}\right\}. \qquad (2.36)$$

From (1.16), the log likelihood function of $\phi$ based on the complete data $D$ is given by

$$\log l(\phi|D) = \sum_{a=1}^{A}\sum_{j=1}^{n} Z_{aj}\log v_a + n\sum_{i=1}^{I}\log\frac{1}{\sqrt{2\pi\psi_i^2}} - \sum_{a=1}^{A}\sum_{j=1}^{n}\sum_{i=1}^{I}\frac{Z_{aj}\left(x_{ij} - \mu_{ai}\right)^2}{2\psi_i^2}$$
$$= \sum_{a=1}^{A}\sum_{j=1}^{n} Z_{aj}\log v_a - \frac{n}{2}\sum_{i=1}^{I}\log\psi_i^2 - \sum_{a=1}^{A}\sum_{j=1}^{n}\sum_{i=1}^{I}\frac{Z_{aj}\left(x_{ij} - \mu_{ai}\right)^2}{2\psi_i^2} - \frac{nI}{2}\log 2\pi. \qquad (2.37)$$

Let $x_j = (x_{1j}, x_{2j}, \ldots, x_{Ij})$, $j = 1, 2, \ldots, n$ be the observed vectors for individuals $j$. Then, the EM algorithm is given as follows:
(i) E-step
For the estimate ${}^{s}\phi$ at the $s$th step, compute the conditional expectation of $\log l(\phi|D)$ given the incomplete data $X = x$ and parameter ${}^{s}\phi$:

$$Q\left(\phi|{}^{s}\phi\right) = E\left[\log l(\phi|D)|x, {}^{s}\phi\right] = \sum_{a=1}^{A}\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right]\log v_a - \frac{n}{2}\sum_{i=1}^{I}\log\psi_i^2 - \sum_{a=1}^{A}\sum_{j=1}^{n}\sum_{i=1}^{I}\frac{E\left[Z_{aj}|x_j, {}^{s}\phi\right]\left(x_{ij} - \mu_{ai}\right)^2}{2\psi_i^2} - \frac{nI}{2}\log 2\pi, \qquad (2.38)$$

where

$$E\left[Z_{aj}|x_j, {}^{s}\phi\right] = \frac{{}^{s}v_a f\left(x_j|\left({}^{s}\mu_{ai}, {}^{s}\psi_i^2\right)\right)}{\sum_{b=1}^{A} {}^{s}v_b f\left(x_j|\left({}^{s}\mu_{bi}, {}^{s}\psi_i^2\right)\right)}. \qquad (2.39)$$

(ii) M-step

By using the Lagrange multiplier $\lambda$, the Lagrange function is given by

$$L\left(\phi|{}^{s}\phi\right) = Q\left(\phi|{}^{s}\phi\right) - \lambda\sum_{a=1}^{A} v_a. \qquad (2.40)$$

Differentiating the above function with respect to $v_a$, we have

$$\frac{\partial}{\partial v_a} L\left(\phi|{}^{s}\phi\right) = \frac{1}{v_a}\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right] - \lambda = 0, \quad a = 1, 2, \ldots, A.$$

From the above equations, we obtain

$$\lambda = \sum_{a=1}^{A}\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right] = n.$$

From this, it follows that

$$^{s+1}v_a = \frac{1}{n}\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right]. \qquad (2.41)$$

By differentiating (2.40) with respect to $\mu_{ai}$, we get

$$\frac{\partial}{\partial\mu_{ai}} L\left(\phi|{}^{s}\phi\right) = \sum_{j=1}^{n}\frac{E\left[Z_{aj}|x_j, {}^{s}\phi\right]\left(x_{ij} - \mu_{ai}\right)}{\psi_i^2} = 0, \quad i = 1, 2, \ldots, I;\ a = 1, 2, \ldots, A.$$

From the above equations, we have

$$^{s+1}\mu_{ai} = \frac{\sum_{j=1}^{n} x_{ij}E\left[Z_{aj}|x_j, {}^{s}\phi\right]}{\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right]}, \quad i = 1, 2, \ldots, I;\ a = 1, 2, \ldots, A. \qquad (2.42)$$

Similarly, the partial differentiation of the Lagrange function with respect to $\psi_i^2$ gives

$$\frac{\partial}{\partial\psi_i^2} L\left(\phi|{}^{s}\phi\right) = -\frac{n}{2\psi_i^2} + \sum_{a=1}^{A}\sum_{j=1}^{n}\frac{E\left[Z_{aj}|x_j, {}^{s}\phi\right]\left(x_{ij} - {}^{s+1}\mu_{ai}\right)^2}{2\psi_i^4} = 0, \quad i = 1, 2, \ldots, I.$$

From this, we have

$$^{s+1}\psi_i^2 = \frac{1}{n}\sum_{a=1}^{A}\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right]\left(x_{ij} - {}^{s+1}\mu_{ai}\right)^2 = \frac{1}{n}\left(\sum_{j=1}^{n} x_{ij}^2 - \sum_{a=1}^{A}\left({}^{s+1}\mu_{ai}\right)^2\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right]\right), \quad i = 1, 2, \ldots, I. \qquad (2.43)$$

By using the above algorithm, the ML estimates of the parameters in the latent profile model can be obtained. In some situations, we may relax the local independence of the manifest variables, for example, by assuming correlations between some variables; reviewing the above process for constructing the EM algorithm, such a modification is easy to make in a similar manner. In order to demonstrate the above algorithm, an artificial data set and the estimated parameters are given in Tables 2.16 and 2.17, respectively. The artificial data were produced as a mixture of $N(\mu_1, \Psi)$ and $N(\mu_2, \Psi)$, where

$$\mu_a = (\mu_{a1}, \mu_{a2}, \ldots, \mu_{a,10})^T, \quad a = 1, 2; \qquad \Psi = \begin{pmatrix}\psi_1^2 & 0 & \cdots & 0\\ 0 & \psi_2^2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \psi_{10}^2\end{pmatrix}.$$

Remark 2.4 If, in the latent profile model, the error variances of manifest variables $X_i$ in latent classes $a$ are different, that is, $\mathrm{Var}(X_i|a) = \psi_{ai}^2$, $i = 1, 2, \ldots, I$, then the estimates in the EM algorithm are modified as follows:

$$^{s+1}\psi_{ai}^2 = \frac{\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right]\left(x_{ij} - {}^{s+1}\mu_{ai}\right)^2}{\sum_{j=1}^{n} E\left[Z_{aj}|x_j, {}^{s}\phi\right]}, \quad i = 1, 2, \ldots, I;\ a = 1, 2, \ldots, A.$$

In the framework of GLMs, the entropy coefficient of determination of the latent profile model can be calculated. The conditional density function of manifest variable vector $X$, i.e., the random component, is the following normal distribution:

$$f(x|a) = \prod_{i=1}^{I} f_i(x_i|a) = \prod_{i=1}^{I}\frac{1}{\sqrt{2\pi\psi_i^2}}\exp\left\{-\frac{(x_i - \mu_i)^2}{2\psi_i^2}\right\},$$

where $f_i(x_i|a)$ are the conditional density functions in latent class $a$. As in the factor analysis model, we have

$$f_i(x_i|a) = \frac{1}{\sqrt{2\pi\psi_i^2}}\exp\left\{\frac{x_i\mu_i - \frac{1}{2}\mu_i^2}{\psi_i^2} - \frac{x_i^2}{2\psi_i^2}\right\},$$

Table 2.16 Artificial data for a simulation


Numbera X1 X2 X3 X4 X5 X6 X7 X8 X9 X 10
1 29.98 21.40 21.91 12.54 30.46 35.05 30.07 24.04 30.85 11.73
2 42.93 46.17 38.75 51.48 47.47 41.30 38.79 41.12 33.39 42.63
3 29.51 49.00 34.85 46.67 50.78 35.30 20.67 41.24 41.98 16.49
4 34.42 43.37 42.53 43.41 39.21 50.28 48.68 39.36 35.10 45.68
5 35.22 43.66 44.31 46.80 34.07 36.81 49.54 39.73 49.23 44.65
6 30.61 39.94 42.09 43.11 27.25 36.45 49.81 39.11 48.37 16.05
7 27.23 25.86 30.55 23.60 31.98 25.60 41.37 26.07 14.37 30.25
8 43.66 30.58 31.76 41.63 26.89 34.02 29.57 37.23 30.71 25.05
9 19.90 30.27 32.75 18.27 34.13 26.17 20.84 29.68 36.82 23.96
10 29.32 28.90 33.04 45.77 27.73 33.10 45.27 26.15 28.78 22.88
11 39.38 36.16 40.36 47.56 46.10 38.43 44.71 40.33 41.82 44.21
12 35.49 25.73 32.99 21.80 35.73 38.15 38.54 31.39 37.91 15.65
13 48.73 34.27 43.21 45.65 40.69 34.57 28.97 47.08 43.34 34.43
14 25.89 31.23 41.40 44.30 30.56 41.38 33.50 34.12 53.05 39.51
15 37.39 33.85 35.73 43.08 43.26 41.03 49.84 48.73 20.86 41.47
16 38.33 39.71 39.04 21.27 42.99 41.01 30.16 37.33 38.08 52.02
17 43.56 48.59 37.68 46.43 33.38 27.64 42.92 39.14 23.48 32.16
18 41.40 42.39 43.62 26.89 36.97 33.15 51.55 39.81 41.17 36.23
19 23.36 27.78 31.13 33.09 31.00 32.45 30.42 30.28 32.34 28.80
20 35.49 41.00 42.27 41.90 36.44 38.88 44.68 46.66 29.26 40.37
21 42.48 43.97 41.58 44.15 43.34 42.98 39.59 39.07 51.95 51.45
22 42.21 30.43 46.75 50.77 36.22 31.73 43.94 39.71 31.36 38.43
23 34.83 44.82 37.13 40.10 43.17 46.87 26.20 39.91 37.02 36.81
24 39.68 33.38 44.34 48.00 32.46 49.79 39.56 42.72 42.54 54.89
25 57.73 41.44 40.58 51.59 37.74 39.89 34.12 37.63 40.65 49.14
26 31.38 25.03 35.82 36.18 42.96 43.10 17.13 41.86 39.48 40.61
27 25.13 40.22 39.07 53.04 36.44 44.52 36.56 36.05 40.40 39.12
28 41.28 45.86 40.68 36.38 35.56 36.01 35.67 38.75 50.98 35.68
29 40.29 43.51 41.62 43.54 40.50 48.23 43.45 39.34 40.04 46.94
30 29.10 39.90 39.22 33.01 36.63 33.99 48.90 35.36 33.86 48.62
31 23.15 46.68 38.54 44.25 43.59 40.44 27.82 40.59 36.07 53.05
32 37.25 43.15 43.34 48.70 36.06 37.95 39.80 40.58 35.96 44.28
33 41.90 27.29 40.14 34.73 42.43 45.44 46.53 34.71 39.54 34.85
34 35.89 42.12 34.88 28.27 38.00 37.72 40.20 39.50 38.75 50.63
35 38.06 46.52 42.25 40.60 48.88 41.51 41.22 39.26 42.17 42.69
36 54.10 41.94 34.37 42.40 40.55 48.24 39.59 37.43 36.21 52.97
37 34.51 39.71 36.92 34.76 45.32 41.96 40.77 37.48 29.30 45.20
38 43.49 29.91 40.68 35.27 37.08 44.86 45.41 35.67 41.09 52.50
39 33.28 48.02 39.74 49.58 46.76 35.46 42.19 46.45 31.18 45.67
40 43.19 41.79 40.63 34.80 40.94 33.25 45.27 41.52 38.99 44.07
41 41.84 38.39 42.78 42.70 29.65 28.38 47.16 40.80 36.83 39.81
42 39.58 35.49 40.89 41.51 44.39 40.39 36.03 41.40 39.28 24.35
43 45.69 30.53 39.37 46.24 32.83 42.10 40.59 37.02 39.02 40.29
44 32.33 51.63 41.19 43.05 37.27 56.24 26.20 46.26 39.84 26.44
45 35.71 46.13 44.59 28.89 34.21 42.95 35.78 37.53 33.84 44.41
46 48.10 41.86 41.00 34.68 43.57 47.68 37.73 39.94 30.89 32.79
47 36.24 42.08 40.92 31.25 46.97 32.84 47.80 42.12 35.10 51.52
48 41.80 38.64 38.81 36.58 37.25 37.30 26.64 41.30 44.83 41.18
49 29.41 26.24 43.38 32.65 45.29 34.81 39.09 38.78 43.61 59.67
50 36.18 39.57 41.21 35.43 34.36 33.46 33.51 44.07 43.94 53.88
a Number implies the data number produced for the simulation study

Table 2.17 The estimated parameters in a latent profile model with two latent classes
X1 X2 X3 X4 X5 X6 X7 X8 X9 X 10
μ1i 31.26 27.21 30.59 28.08 31.14 32.08 33.74 29.25 30.25 22.61 v1 0.140
μ2i 38.11 39.90 40.42 40.74 39.53 39.91 39.03 40.15 38.69 42.04 v2 0.860
ψi2 48.11 36.51 8.81 64.74 26.79 34.01 65.52 11.77 46.33 81.96 – –

so we can set

$$\theta_i = \mu_i, \quad a_i(\varphi_i) = \psi_i^2, \quad b_i(\theta_i) = \frac{1}{2}\mu_i^2, \quad c_i(x_i, \omega) = -\frac{x_i^2}{2\psi_i^2} - \log\sqrt{2\pi\psi_i^2}, \quad i = 1, 2, \ldots, I.$$

The latent variable vector in the latent profile model is $Z = (Z_1, Z_2, \ldots, Z_A)^T$, and the systematic components are given as follows:

$$\theta_i = \mu_i = \sum_{a=1}^{A}\mu_{ai}Z_a, \quad i = 1, 2, \ldots, I.$$

Let $\theta = (\theta_1, \theta_2, \ldots, \theta_I)^T$. Then, from (1.29) we have

$$\mathrm{KL}(X, Z) = \mathrm{tr}\left\{\Psi^{-1}\mathrm{Cov}(\theta, X)\right\} = \sum_{i=1}^{I}\frac{1}{\psi_i^2}\mathrm{Cov}(\theta_i, X_i) = \sum_{i=1}^{I}\frac{1}{\psi_i^2}\mathrm{Cov}\left(\sum_{a=1}^{A}\mu_{ai}Z_a, X_i\right). \qquad (2.44)$$

Since

$$E(X_i) = \sum_{a=1}^{A} v_a\mu_{ai},$$

we have

$$\mathrm{Cov}\left(\sum_{a=1}^{A}\mu_{ai}Z_a, X_i\right) = \sum_{a=1}^{A} v_a\mu_{ai}^2 - \mu_i^2,$$

where

$$\mu_i = \sum_{a=1}^{A} v_a\mu_{ai}, \quad i = 1, 2, \ldots, I.$$

The entropy coefficient of determination (ECD) is calculated by

$$\mathrm{ECD}(X, Z) = \frac{\mathrm{KL}(X, Z)}{\mathrm{KL}(X, Z) + 1}. \qquad (2.45)$$

Similarly, we also have

$$\mathrm{ECD}(X_i, Z) = \frac{\mathrm{KL}(X_i, Z)}{\mathrm{KL}(X_i, Z) + 1}, \qquad (2.46)$$

where

$$\mathrm{KL}(X_i, Z) = \frac{1}{\psi_i^2}\mathrm{Cov}\left(\sum_{a=1}^{A}\mu_{ai}Z_a, X_i\right).$$

The information is that of $X$ which the latent variable carries. As in (2.9), the best way to classify observed data $X = x$ into the latent classes is based on the maximum posterior probability of $Z$, that is, if

$$P(a_0|(x_1, x_2, \ldots, x_I)) = \max_a\frac{v_a\prod_{i=1}^{I} f_i(x_i|a)}{\sum_{b=1}^{A} v_b\prod_{i=1}^{I} f_i(x_i|b)}, \qquad (2.47)$$

an individual with response $x = (x_1, x_2, \ldots, x_I)$ is evaluated as a member of latent class $a_0$.
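A small sketch of the entropy assessment (Python/NumPy, using the rounded estimates in Table 2.17; the output should be close to Table 2.18 up to the rounding of the tabled parameters):

```python
import numpy as np

# Estimated latent profile model, Table 2.17
v = np.array([0.140, 0.860])
mu = np.array([[31.26, 27.21, 30.59, 28.08, 31.14, 32.08, 33.74, 29.25, 30.25, 22.61],
               [38.11, 39.90, 40.42, 40.74, 39.53, 39.91, 39.03, 40.15, 38.69, 42.04]])
psi2 = np.array([48.11, 36.51, 8.81, 64.74, 26.79, 34.01, 65.52, 11.77, 46.33, 81.96])

m = v @ mu                                     # overall means mu_i
cov = v @ (mu ** 2) - m ** 2                   # Cov(sum_a mu_ai Z_a, X_i)
KL_i = cov / psi2                              # KL(X_i, Z), cf. (2.44)
KL = KL_i.sum()
print(np.round(KL_i, 2), round(KL, 2))
print(np.round(KL_i / (KL_i + 1), 2), round(KL / (KL + 1), 2))   # ECDs, cf. (2.45)-(2.46)
```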
The above discussion is applied to the estimated latent profile model shown
in Table 2.17. Although in Table 2.18 there exist manifest variables that are less

Table 2.18 Assessment of the latent profile model with two latent classes in Table 2.17
X1 X2 X3 X4 X5 X6 X7 X8 X9 X 10 X
KL 0.12 0.54 1.36 0.30 0.33 0.23 0.06 1.24 0.19 0.56 4.92
ECD 0.11 0.35 0.58 0.23 0.25 0.18 0.05 0.55 0.16 0.36 0.83

explained by the latent variable Z= (Z 1 , Z 2 )T , for example, X 1 and X 7 , 83% of


the variation of manifest variable vector X in entropy is explained by the latent
variable. The ECDs in Table 2.18 are interpreted as the ratios of reduced uncer-
tainty with respect to latent variable vector Z= (Z 1 , Z 2 )T , that is, latent classes. In
effect, Table 2.19 shows the true and the estimated (assigned) latent classes of data
in Table 2.16. Based on the true latent classes, the data in Table 2.16 have been
made, and according to the estimated latent profile model, individuals are assigned
to latent classes by using (2.47) (Table 2.19). The consistency ratio between true and
estimated latent classes is 0.68(= 34/50).

Table 2.19 The true and assigned latent classes of individuals


Numbera 1 2 3 4 5 6 7 8 9 10
LCb 2 2 2 2 2 2 2 2 2 1
ALCc 1 2 2 2 2 2 1 1 1 1
Number 11 12 13 14 15 16 17 18 19 20
LC 1 1 2 2 2 2 2 2 2 2
ALC 2 1 2 2 2 2 2 2 1 2
Number 21 22 23 24 25 26 27 28 29 30
LC 2 2 2 2 1 2 1 2 1 1
ALC 2 2 2 2 2 2 2 2 2 2
Number 31 32 33 34 35 36 37 38 39 40
LC 2 1 1 2 2 2 2 2 2 1
ALC 2 2 2 2 2 2 2 2 2 2
Number 41 42 43 44 45 46 47 48 49 50
LC 2 2 2 1 2 2 2 1 1 2
ALC 2 2 2 2 2 2 2 2 2 2
a Numbers imply those corresponding to the data in Table 2.16
b LCs imply the true latent classes of the corresponding data in Table 2.16
c ALCs imply the latent classes of the corresponding data in Table 2.16, assigned with the estimated latent profile model in Table 2.17



2.7 Discussion

In this chapter, first, a general latent class analysis is discussed and for the ML
estimation of the latent class model, the EM algorithm is constructed. For three
data sets, latent class analysis has been demonstrated. Concerning the χ 2 -test of
the goodness-of-fit of the latent class model, the model fits the Stouffer-Toby and
Lazarsfeld-Stouffer’s data sets; however, we cannot have a good fit to McHugh’s
data set. Since the estimated latent class models for Stouffer-Toby’s and Lazarsfeld-
Stouffer’s data sets have latent classes ordered as shown in Tables 2.2, 2.3, 2.8, and
2.9, it is meaningful to discuss latent class analysis assuming ordered latent classes,
for example, the latent distance model. The basic latent class model treats latent
classes parallelly, that is, without any assumption on latent response probabilities, and
then, the analysis is called an exploratory latent class analysis or latent class cluster
analysis [14, 21]. The number of latent classes in the latent class model is restricted
by inequality (1.5), so to handle more ordered latent classes, it is needed to make
parsimonious models as another approach. In order to assess the model performance,
the explanatory power or goodness-of-fit of the model can be measured with ECD
as demonstrated in Sect. 2.4. In the interpretation of latent classes, a method for
locating the latent classes in a Euclidian space is given and the method is illustrated.
An entropy-based method to compare the latent classes is also presented, and the
method measures, in a sense, the KL distances between the latent classes, and the
relationship among the latent classes is illustrated with cluster analysis. In Sect. 2.6,
the latent profile model is considered, and the ML estimation procedure via the
EM algorithm is constructed. A numerical illustration is given to demonstrate the
latent profile analysis. These days, computer efficiency has been greatly increased,
so the ML estimation procedures given in the present chapter can be realized in the
EXCEL work files. The author recommends readers to make the calculations for
the ML estimation of the latent class models for themselves. The present chapter
has treated the basic latent class model, that is, an exploratory approach to latent
class analysis. There may exist further studies to develop latent structure analysis,
for example, making latent class model with ordered latent classes and extending
the latent distance model (Lazarsfeld and Henry, 1968) and latent class models with
explanatory variables [2]. Latent class analysis has also been applied in medical
research [16, 17], besides in psychological and social science research [22]. In order
to challenge confirmatory latent class approaches, it is important to extend research
areas to apply the latent class model, and due to that, new latent structure models will
be constructed to make effective and significant methods of latent structure analysis.

References

1. Anderson, T. W. (1954). On estimation of parameters in latent structure analysis. Psychometrika, 19, 1–10.
2. Dayton, C. M., & Macready, G. B. (1988). Concomitant-variable latent class models. Journal
of the American Statistical Association, 83, 173–178.
3. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, B, 39,
1–38.
4. Eshima, N., & Tabata, M. (2010). Entropy coefficient of determination for generalized linear
models. Computational Statistics and Data Analysis, 54, 1381–1389.
5. Forman, A. K. (1978). A note on parameter estimation for Lazarsfeld’s latent class analysis.
Psychometrika, 43, 123–126.
6. Gibson, W. A. (1955). An extension of Anderson’s solution for the latent structure equations.
Psychometrika, 20, 69–73.
7. Gibson, W. A. (1962). Extending latent class solutions to other variables. Psychometrika, 27,
73–81.
8. Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and
unidentifiable models. Biometrika, 61, 215–231.
9. Goodman, L. A. (1979). On the estimation of the parameters in latent structure analysis.
Psychometrika, 44, 123–128.
10. Green, B. F. (1951). A general solution for the latent class model of latent structure analysis.
Psychometrika, 16, 71–76.
11. Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In Stouffer, S. A., Guttman, L., et al. (Eds.), Measurement and prediction: Studies in social psychology in World War II (Vol. 4). Princeton University Press.
12. Lazarsfeld, P. F., & Henry, N. M. (1968). Latent structure analysis. Boston: Houghton Mifflin.
13. Madansky, A. (1960). Determinantal methods in latent class analysis. Psychometrika, 25, 183–
198.
14. Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models: Bi-plots, and
related graphical displays. Sociological Methodology, 31, 223–264.
15. McHugh, R. B. (1956). Efficient estimation of local identification in latent class analysis.
Psychometrika, 20, 331–347.
16. Nosetti, L., Paglietti, M. G., Brunetti, L., Masini, L., Grutta, S. L., & Cilluffo, G. (2020).
Application of latent class analysis in assessing the awareness, attitude, practice and satisfaction
of paediatricians on sleep disorder management in children in Italy. PLoS One, 15(2), e0228377.
https://doi.org/10.1371/journal.pone.0228377
17. Petersen, K. J., Qualter, P., & Humphrey, N. (2019). The application of latent class analysis for investigating population child mental health: A systematic review. Frontiers in Psychology, 10, 1214. https://doi.org/10.3389/fpsyg.2019.01214
18. Price, L. C., Dayton, C. M., & Macready, G. B. (1980). Discovery algorithms for hierarchical
relations. Psychometrika, 45, 449–465.
19. Proctor, C. H. (1970). A probabilistic formulation and statistical analysis of Guttman scaling. Psychometrika, 35, 73–78.
20. Stouffer, S. A., & Toby, J. (1951). Role conflict and personality. The American Journal of
Sociology, 56, 395–406.
21. Vermunt, J. K. (2010). Latent class models. International Encyclopedia of Education, 7, 238–244.
22. Vermunt, J. K. (2003). Applications of latent class analysis in social science research. In Nielsen,
T. D., Zhang, N. L. (Eds.), Symbolic and quantitative approaches to reasoning with uncertainty.
ECSQARU 2003. Lecture Notes in Computer Science (Vol. 2711). Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-540-45062-7_2.
Chapter 3
Latent Class Analysis with Ordered
Latent Classes

3.1 Introduction

In latent class analysis with two latent classes, we can assume one class is higher than the other in a sense; however, for more than two latent classes, we cannot necessarily order them in a one-dimensional sense. In such cases, to compare and interpret the latent classes, the dimension of the latent space in which to locate them is the number of latent classes minus one. In Chap. 2, a method for locating latent classes
in a latent space has been considered, and the distances between latent classes are
measured with the Euclidian distance in the latent space and then, cluster analysis is
applied to compare the latent classes. Moreover, the Kullback–Leibler information
(divergence) is also applied to measure the distances between the latent classes. Let
πai , i = 1, 2, . . . , I ; a = 1, 2, . . . , A be the positive response probabilities to binary
item X i in latent class a. If

π1i ≤ π2i ≤ · · · ≤ π Ai , i = 1, 2, . . . , I, (3.1)

the latent classes can be ordered in one-dimensional concept or continuum. As shown


in Examples in Sect. 2.3, latent class models with three latent classes have been
estimated in Tables 2.2 and 2.8, and the estimated latent response probabilities are
consistently ordered in the magnitudes as in (3.1), though the results came from
exploratory latent class analyses, which are called latent class cluster analyses [11,
17]. In general, exploratory latent class analysis cannot assure consistency in order such as (3.1) in the parameter estimation. Two attempts at considering ordered latent classes were proposed in Lazarsfeld and Henry [9], that is, the latent distance model, which is an extension of the Guttman scaling analysis, and a latent class model in which the latent classes are located on a one-dimensional continuum by using polynomial functions for describing the latent binary response probabilities $\pi_{ai}$. In the latter model, it is very difficult to estimate the model parameters with


constraints 0 ≤ πai ≤ 1, i = 1, 2, . . . , I ; a = 1, 2, . . . , A. For latent distance


analysis, the EM algorithm for the maximum likelihood estimation was given by
Eshima and Asano [6], and by using a logit model instead of polynomial functions, ordered latent classes were treated by Eshima and Asano [7]. Croon [1] discussed latent class analysis with ordered latent classes for polytomous manifest variables by a non-parametric approach; however, the number of parameters increases as the number of latent classes increases, so in order to analyze ordered latent classes, it is
better to use logit models for latent response probabilities, as shown in the subsequent
sections. The approach can also be viewed as item response models with discrete
latent traits. The ML estimation procedures for the Rasch models were discussed by
De Leeuw and Verhelst [4] and Lindsay et al. [10].
In Sect. 3.2, latent distance model is discussed and an ML estimation proce-
dure for the model parameters is constructed by the EM algorithm [6]. The proce-
dure is demonstrated by using data sets used in Chap. 2. Section 3.3 discusses a
method for assessing the latent Guttman scaling. In Sect. 3.4, the latent Guttman
scaling is applied for discussing the association between two latent continuous traits.
Section 3.5 provides an approach for dealing with ordered latent classes by the Rasch
model [15]. In Sect. 3.6, a two-parameter latent trait model is treated and the ML
estimation of the parameters is discussed through the EM algorithm [5]. Finally,
Sect. 3.7 gives a discussion to lead to further studies.

3.2 Latent Distance Analysis

In the Guttman scaling, test items are ordered in the response difficulty and the
purpose of the scaling is to evaluate the subjects under study in one-dimensional scale
(trait or ability). Let X i , i = 1, 2, . . . , I be responses to test items i, and let us set
X i = 1 for positive responses and X i = 0 for negative ones. If X i = 1, then, X i−1 =
1, i = 2, 3, . . . , I in the Guttman scaling, and thus, in a strict sense, there would
be I + 1 response patterns, (0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (1, 1, . . . , 1), which we
could observe; however, in the real observation, the other response patterns will also
occur due to two kinds of response errors. One is the intrusion error and the other
is the omission error. Hence, the Guttman scaling patterns can be regarded as reflecting skills with which the subjects can solve or respond successfully to the items. For example, suppose that the ability of calculation in arithmetic is measured by the following three items:

(i) X 1 : x + y =?, (ii) X 2 : x × y =?, and (iii) X 3 : x ÷ y =?,

then, the items are ordered in difficulty as the above order, and the response patterns to
be observed would be (0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1). However, it is sensible
to take the response errors into account. Let Si be skills for manifest responses
X i , i = 1, 2, . . . , I ; let Si = 1 be states of skill acquisitions for solving items i;
Si = 0 be those of non-skill acquisitions for items i and let us set.

$$P(X_i = 1|S_i = s_i) = \begin{cases}\pi_{Li} & (s_i = 0),\\ \pi_{Hi} & (s_i = 1),\end{cases} \quad i = 1, 2, \ldots, I. \qquad (3.2)$$

Then, the intrusion error probabilities are $\pi_{Li}$ and the omission error probabilities are $1 - \pi_{Hi}$, and the following inequalities should hold:

$$0 < \pi_{Li} < \pi_{Hi} < 1, \quad i = 1, 2, \ldots, I. \qquad (3.3)$$

The latent classes and the positive response probabilities $\pi_{Li}$ and $\pi_{Hi}$ are illustrated in Table 3.1, where the latent classes are denoted by the numbers of skill acquisitions, $\sum_{i=1}^{I}s_i$. In this sense, the latent classes are ordered in a hypothesized ability or trait, and it is significant to assign an individual with a response $x = (x_1, x_2, \ldots, x_I)^T$ to one of the latent classes, that is, to make an assessment of the individual's ability. In the latent distance model (3.2), the number of parameters is $3I$.

Remark 3.1 The term "skill" is used in the above explanation. Since it can be thought that skills represent thresholds in a continuous trait or ability for responding to binary test items, the term "skill" is employed for convenience's sake in this book.

Overviewing the latent distance models historically, the models are restricted versions of (2.2), considering the ease of parameter estimation. Proctor [14] proposed a model with

$$P(X_i = 1|S_i = s_i) = \begin{cases}\pi_L & (s_i = 0),\\ 1 - \pi_L & (s_i = 1),\end{cases} \quad i = 1, 2, \ldots, I. \qquad (3.4)$$

The intrusion and omission error probabilities are the same, $\pi_L$, constant through all items. Dayton and Macready [2] used the model with

$$P(X_i = 1|S_i = s_i) = \begin{cases}\pi_L & (s_i = 0),\\ \pi_H & (s_i = 1),\end{cases} \quad i = 1, 2, \ldots, I, \qquad (3.5)$$

and the following improved version of the above model was also proposed by Dayton and Macready [3]:

Table 3.1 Positive response probabilities in the latent distance model

| Latent class | 0 | 1 | 2 | ⋯ | I |
|---|---|---|---|---|---|
| X1 | π_{L1} | π_{H1} | π_{H1} | ⋯ | π_{H1} |
| X2 | π_{L2} | π_{L2} | π_{H2} | ⋯ | π_{H2} |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
| XI | π_{LI} | π_{LI} | π_{LI} | ⋯ | π_{HI} |

$$P(X_i = 1|S_i = s_i) = \begin{cases}\pi_{Li} & (s_i = 0),\\ 1 - \pi_{Li} & (s_i = 1),\end{cases} \quad i = 1, 2, \ldots, I. \qquad (3.6)$$

In the above model, the intrusion error and omission error probabilities are the
same as π Li for items i = 1, 2, . . . , I . The present model (3.2) with (3.3) is a general
version of the above models.
In the present chapter, the ML estimation of model (3.2) with (3.3) is considered.
For this model, the following reparameterization is employed [6]:

$$P(X_i = 1|S_i = s_i) = \begin{cases}\dfrac{\exp(\alpha_i)}{1 + \exp(\alpha_i)}\ (= \pi_{Li}) & (s_i = 0),\\[2mm] \dfrac{\exp(\alpha_i + \exp(\beta_i))}{1 + \exp(\alpha_i + \exp(\beta_i))}\ (= \pi_{Hi}) & (s_i = 1),\end{cases} \quad i = 1, 2, \ldots, I. \qquad (3.7)$$

In this expression, the constraints (3.3) are satisfied. The above model expression can be simplified as follows:

$$P(X_i = 1|S_i = s_i) = \frac{\exp(\alpha_i + s_i\exp(\beta_i))}{1 + \exp(\alpha_i + s_i\exp(\beta_i))}, \quad i = 1, 2, \ldots, I. \qquad (3.8)$$
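The reparameterization (3.7) guarantees $0 < \pi_{Li} < \pi_{Hi} < 1$ automatically, because $\exp(\beta_i) > 0$ and the logistic function is increasing. A minimal sketch (Python/NumPy; the numerical values of $\alpha_i$ and $\beta_i$ below are illustrative, not estimates):

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def response_probs(alpha, beta):
    """pi_Li and pi_Hi of the latent distance model under parameterization (3.7)."""
    pi_L = expit(alpha)                    # s_i = 0
    pi_H = expit(alpha + np.exp(beta))     # s_i = 1, shifted by exp(beta_i) > 0
    return pi_L, pi_H

alpha = np.array([-0.7, -1.1, -0.6, -1.9])     # illustrative values
beta = np.array([1.5, 1.2, 1.1, 1.3])
pi_L, pi_H = response_probs(alpha, beta)
print(np.round(pi_L, 3), np.round(pi_H, 3))    # always pi_L < pi_H
```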

Let $S = (S_1, S_2, \ldots, S_I)^T$ be a latent response (skill acquisition) vector and let $X = (X_1, X_2, \ldots, X_I)^T$ be a manifest response vector. Then, the latent classes corresponding to latent response $s = (s_1, s_2, \ldots, s_I)^T$ can be described with score $k = \sum_{i=1}^{I}s_i$. From (2.1), we have

$$P(X = x|S = s) = \prod_{i=1}^{I}\left\{\frac{\exp(\alpha_i + s_i\exp(\beta_i))}{1 + \exp(\alpha_i + s_i\exp(\beta_i))}\right\}^{x_i}\left\{\frac{1}{1 + \exp(\alpha_i + s_i\exp(\beta_i))}\right\}^{1 - x_i} = \prod_{i=1}^{I}\frac{\exp\{x_i(\alpha_i + s_i\exp(\beta_i))\}}{1 + \exp(\alpha_i + s_i\exp(\beta_i))}$$

and

$$P(X = x) = \sum_{k=0}^{I} v_kP(X = x|k) = \sum_{k=0}^{I} v_k\prod_{i=1}^{I}\frac{\exp\{x_i(\alpha_i + s_i\exp(\beta_i))\}}{1 + \exp(\alpha_i + s_i\exp(\beta_i))}.$$

In order to estimate the parameters φ = ((vk ), (αi ), (βi ))T , the following EM
algorithm is used.
EM algorithm I
(i) E-step

Let ${}^{s}\phi = (({}^{s}v_k), ({}^{s}\alpha_i), ({}^{s}\beta_i))^T$ be the estimate of parameter vector $\phi$ at the $s$th iteration of the EM algorithm. Then, in the $(s+1)$th iteration, the conditional expectations of the complete data $(n(x, k))$ for given parameters ${}^{s}\phi$ are calculated as follows:

$$^{s+1}n(x, k) = n(x)\frac{{}^{s}v_k\prod_{i=1}^{I} {}^{s}P(X_i = x_i|k)}{\sum_{m=0}^{I} {}^{s}v_m\prod_{i=1}^{I} {}^{s}P(X_i = x_i|m)}, \quad k = 0, 1, 2, \ldots, I, \qquad (3.9)$$

where

$$^{s}P(X_i = x_i|k) = \frac{\exp\{x_i({}^{s}\alpha_i + s_i\exp({}^{s}\beta_i))\}}{1 + \exp({}^{s}\alpha_i + s_i\exp({}^{s}\beta_i))}, \quad x_i = 0, 1.$$

(ii) M-step
By using the complete data (3.9), the log likelihood function based on the complete data $({}^{s+1}n(x, k))$ is given by

$$Q\left(\phi|{}^{s}\phi\right) = l\left(\phi|\left({}^{s+1}n(x, k)\right)\right) = \sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)\log\left(v_k\prod_{i=1}^{I}\frac{\exp\{x_i(\alpha_i + s_i\exp(\beta_i))\}}{1 + \exp(\alpha_i + s_i\exp(\beta_i))}\right)$$
$$= \sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)\log v_k + \sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)\sum_{i=1}^{I}\{x_i(\alpha_i + s_i\exp(\beta_i)) - \log(1 + \exp(\alpha_i + s_i\exp(\beta_i)))\}. \qquad (3.10)$$

With respect to ${}^{s+1}v_k$, as in (2.10), we have

$$^{s+1}v_k = \frac{\sum_{x} {}^{s+1}n(x, k)}{\lambda} = \frac{\sum_{x} {}^{s+1}n(x, k)}{N}, \quad k = 0, 1, 2, \ldots, I;$$

however, the other parameters $\alpha_i$ and $\beta_i$ cannot be obtained explicitly, so we have to use the Newton–Raphson method for maximizing $Q(\phi|{}^{s}\phi)$ in the M-step. The first derivatives of $Q(\phi|{}^{s}\phi)$ with respect to $\alpha_i$ and $\beta_i$, respectively, are calculated as follows:

$$\frac{\partial Q(\phi|{}^{s}\phi)}{\partial\alpha_i} = \sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)\left(x_i - \frac{\exp(\alpha_i + s_i\exp(\beta_i))}{1 + \exp(\alpha_i + s_i\exp(\beta_i))}\right) = \sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)(x_i - P(X_i = 1|S_i = s_i)), \quad i = 1, 2, \ldots, I;$$

$$\frac{\partial Q(\phi|{}^{s}\phi)}{\partial\beta_i} = \sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)(x_i - P(X_i = 1|S_i = s_i))s_i\exp(\beta_i), \quad i = 1, 2, \ldots, I.$$

Then, the $2I$-dimensional gradient vector is set as

$$g = \begin{pmatrix}\left(\dfrac{\partial Q(\phi|{}^{s}\phi)}{\partial\alpha_i}\right)\\[2mm] \left(\dfrac{\partial Q(\phi|{}^{s}\phi)}{\partial\beta_i}\right)\end{pmatrix}. \qquad (3.11)$$

Consequently, the second-order partial derivatives of $Q(\phi|{}^{s}\phi)$ are calculated as follows:

$$\frac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\alpha_i^2} = -\sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)P(X_i = 1|S_i = s_i)(1 - P(X_i = 1|S_i = s_i)), \quad i = 1, 2, \ldots, I;$$

$$\frac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\alpha_i\partial\beta_i} = -\sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)P(X_i = 1|S_i = s_i)(1 - P(X_i = 1|S_i = s_i))s_i\exp(\beta_i), \quad i = 1, 2, \ldots, I;$$

$$\frac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\beta_i^2} = \sum_{k=0}^{I}\sum_{x} {}^{s+1}n(x, k)\{x_i - P(X_i = 1|S_i = s_i) - P(X_i = 1|S_i = s_i)(1 - P(X_i = 1|S_i = s_i))s_i\exp(\beta_i)\}s_i\exp(\beta_i), \quad i = 1, 2, \ldots, I;$$

$$\frac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\alpha_i\partial\alpha_j} = \frac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\alpha_i\partial\beta_j} = \frac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\beta_i\partial\beta_j} = 0, \quad i \neq j.$$

From the above results, the Hessian matrix $H$ is set as follows:

$$H = \begin{pmatrix}\left(\dfrac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\alpha_i\partial\alpha_j}\right) & \left(\dfrac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\alpha_i\partial\beta_j}\right)\\[2mm] \left(\dfrac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\beta_i\partial\alpha_j}\right) & \left(\dfrac{\partial^2 Q(\phi|{}^{s}\phi)}{\partial\beta_i\partial\beta_j}\right)\end{pmatrix}. \qquad (3.12)$$

Let $\phi_{(\alpha,\beta)} = ((\alpha_i), (\beta_i))$ and let the $t$th iterative value of $\phi_{(\alpha,\beta)}$ be $\phi_{(\alpha,\beta)t} = ((\alpha_{it}), (\beta_{it}))$, where $\phi_{(\alpha,\beta)1} = (({}^{s}\alpha_i), ({}^{s}\beta_i))$. Then, $\phi_{(\alpha,\beta)t+1}$ is obtained as follows:

$$\phi_{(\alpha,\beta)t+1} = \phi_{(\alpha,\beta)t} - H_t^{-1}g_t, \quad t = 1, 2, \ldots, \qquad (3.13)$$

where $H_t$ and $g_t$ are the values of the Hessian matrix (3.12) and the gradient vector (3.11) at $\phi = \phi_t = \left(\left({}^{s+1}v_k\right), \phi_{(\alpha,\beta)t}\right)$, respectively. From this algorithm, we can get

$$\lim_{t\to\infty}\phi_{(\alpha,\beta)t} = \left(\left({}^{s+1}\alpha_i\right), \left({}^{s+1}\beta_i\right)\right).$$

Remark 3.2 The Newton–Raphson method for obtaining the estimates ${}^{s+1}\alpha_i$ and ${}^{s+1}\beta_i$ in the M-step makes the sequence $\phi_{(\alpha,\beta)t}$ converge quickly, within several iterations.

Remark 3.3 Without the constraints in (3.3), the latent distance model is a latent class model with the following equality constraints:

$$\begin{aligned} P(X_i = 1|0) &= P(X_i = 1|1) = \cdots = P(X_i = 1|i-1)\ (= \pi_{Li}),\\ P(X_i = 1|i) &= P(X_i = 1|i+1) = \cdots = P(X_i = 1|I)\ (= \pi_{Hi}),\end{aligned} \quad i = 1, 2, \ldots, I.$$

Then, the EM algorithm can be applied for estimating the parameters. Let $\phi = ((v_k), (\pi_{Li}), (\pi_{Hi}))^T$ be the parameters to be estimated and let ${}^{s}\phi = (({}^{s}v_k), ({}^{s}\pi_{Li}), ({}^{s}\pi_{Hi}))^T$ be the estimates of the parameters at the $s$th iteration. Then, the EM algorithm is given as follows:
EM algorithm II
(i) E-step

$$^{s+1}n(x, k) = n(x)\frac{{}^{s}v_k\prod_{i=1}^{I} {}^{s}P(X_i = x_i|k)}{\sum_{m=0}^{I} {}^{s}v_m\prod_{i=1}^{I} {}^{s}P(X_i = x_i|m)}, \quad k = 0, 1, 2, \ldots, I, \qquad (3.14)$$

where, from (3.7),

$$^{s}P(X_i = 1|k) = \begin{cases}{}^{s}\pi_{Li} & (k < i),\\ {}^{s}\pi_{Hi} & (k \geq i),\end{cases} \quad k = 0, 1, 2, \ldots, I.$$

(ii) M-step

$$^{s+1}v_k = \frac{\sum_{x} {}^{s+1}n(x, k)}{\lambda} = \frac{\sum_{x} {}^{s+1}n(x, k)}{N}, \quad k = 0, 1, 2, \ldots, I;$$

$$^{s+1}\hat{\pi}_{Hi} = \frac{\sum_{k=i}^{I}\sum_{x} {}^{s+1}n(x, k)\,x_i}{\sum_{k=i}^{I}\sum_{x} {}^{s+1}n(x, k)}, \quad i = 1, 2, \ldots, I; \qquad (3.15)$$

$$^{s+1}\hat{\pi}_{Li} = \frac{\sum_{k=0}^{i-1}\sum_{x} {}^{s+1}n(x, k)\,x_i}{\sum_{k=0}^{i-1}\sum_{x} {}^{s+1}n(x, k)}, \quad i = 1, 2, \ldots, I. \qquad (3.16)$$

The algorithm is a proportional fitting one; however, the results do not necessarily guarantee the inequality constraints in (3.3). □
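A minimal sketch of EM algorithm II (Python/NumPy; the function name and initialization are my own illustrative choices). The M-step uses the group-wise proportional updates (3.15)–(3.16), and, as noted in the remark, nothing in the algorithm forces the resulting estimates to satisfy (3.3):

```python
import numpy as np

def latent_distance_em(X, n_iter=500, seed=0):
    """EM algorithm II for the latent distance model with I binary items.

    Latent classes k = 0, 1, ..., I; class k means the first k skills are acquired,
    so P(X_i = 1 | k) = pi_H[i] if k >= i+1 (1-based item index), else pi_L[i].
    """
    rng = np.random.default_rng(seed)
    N, I = X.shape
    K = I + 1
    acquired = (np.arange(1, I + 1)[None, :] <= np.arange(K)[:, None])   # (K, I)
    v = np.full(K, 1.0 / K)
    pi_L = rng.uniform(0.1, 0.4, I)
    pi_H = rng.uniform(0.6, 0.9, I)
    for _ in range(n_iter):
        p = np.where(acquired, pi_H, pi_L)                       # (K, I) response probs
        lik = np.prod(p[None] ** X[:, None] * (1 - p[None]) ** (1 - X[:, None]), axis=2)
        w = v * lik
        w /= w.sum(axis=1, keepdims=True)                        # E-step posteriors (N, K)
        v = w.mean(axis=0)                                       # class proportions
        num = w.T @ X                                            # (K, I): weighted positives
        den = w.sum(axis=0)[:, None]                             # (K, 1): group sizes
        pi_H = (acquired * num).sum(axis=0) / (acquired * den).sum(axis=0)
        pi_L = (~acquired * num).sum(axis=0) / (~acquired * den).sum(axis=0)
    return v, pi_L, pi_H

# usage with simulated Guttman-type data
rng = np.random.default_rng(1)
k_true = rng.choice(5, size=300, p=[0.3, 0.2, 0.1, 0.1, 0.3])
acq = (np.arange(1, 5)[None, :] <= k_true[:, None])
X = (rng.random((300, 4)) < np.where(acq, 0.9, 0.15)).astype(int)
v_hat, piL_hat, piH_hat = latent_distance_em(X)
print(np.round(v_hat, 2), np.round(piL_hat, 2), np.round(piH_hat, 2))
```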
The data in Table 2.1 are analyzed by using the latent distance model. The data are from respondents to questionnaire items on role conflict [16], and the positive response frequencies for $X_i$, $i = 1, 2, 3, 4$ are 171, 108, 111, and 67, respectively. It may be valid to assume that item 1 is the easiest and item 4 the most difficult for obtaining positive responses, whereas items 2 and 3 are intermediate. The estimated class proportions and positive response probabilities are given in Table 3.2. From the test of goodness-of-fit, the fit of the model to the data is very good. The assessment of responses in the five latent classes is illustrated in Table 3.3, and the results are compared with the response scores $\sum_{i=1}^{4}x_i$. Assuming a latent continuum in the population, the estimated item response probabilities in the latent distance model and the five latent classes are illustrated in Fig. 3.1. As demonstrated in this example, it is significant to grade the respondents with their response patterns instead of simple scores; for example, the response patterns (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), and (0, 1, 1, 1) all have manifest score 3; however, they are assigned to latent classes 3, 4, 1, and 0, respectively.

Table 3.2 Results of latent distance analysis of the data in Table 2.1
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
0 0.296 0.324 0.253 0.364 0.136
1 0.344 0.988 0.253 0.364 0.136
2 0.103 0.988 0.940 0.364 0.136
3 0.049 0.988 0.940 0.948 0.136
4 0.208 0.988 0.940 0.948 0.973
G2 = 0.921(d f = 3, P = 0.845)

Table 3.3 Assignment of the manifest responses to the extracted latent classes (latent distance
analysis of data set in Table 2.1)
Response pattern Scorea Latent class Response pattern Score Latent class
0000 0 0 0001 1 0
1000 1 1 1001 2 1
0100 1 0 0101 2 0
1100 2 2 1101 3 4
0010 1 0 0011 2 0
1010 2 1 1011 3 1
0110 2 0 0111 3 0
1110 3 3 1111 4 4
a Scores imply the sums of the positive responses

[Figure: latent positive response probability curves for X1, X2, X3, and X4 over the latent continuum, with the positions of latent classes 0–4 indicated along the horizontal axis]

Fig. 3.1 Graph of the estimated latent distance model for data set in Table 2.1

In order to assess the goodness-of-fit of the latent distance model, that is, its explanatory power, the entropy approach with (2.12) and (2.13) is applied to the above analysis. Comparing Tables 2.10, 2.11, and 3.4, the goodness-of-fit of the latent distance model is better than that of the other models. From Table 3.4, 70% of the entropy of response variable vector $X = (X_1, X_2, X_3, X_4)^T$ is explained by the five ordered latent classes in the latent distance model, and item 1 ($X_1$) is more strongly associated with the latent variable than the other manifest variables.
The data in Table 2.5 (McHugh's data) and those in Table 2.7 (Lazarsfeld-Stouffer's data) are also analyzed with the latent distance model. The first data set was obtained from four test items on creative ability in machine design [12], and the second was from noncommissioned officers who were cross-classified with respect to their dichotomous responses, "favorable" and "unfavorable" toward the Army, for each of the four different items on general attitude toward the Army [13]. Before analyzing the data sets, the marginal frequencies of positive responses to the items are given in Table 3.5. Considering the marginal positive response frequencies, in McHugh's data set it is natural to think there is no order of difficulty with respect to the item responses $X_i$, whereas in Lazarsfeld-Stouffer's data set (Table 2.7) it may be appropriate to assume the difficulty order in the item responses, i.e., the skill acquisition order

S1 ≺ S2 ≺ S3 ≺ S4 .

Table 3.4 Assessment of the latent distance model for the Stouffer-Toby data
Manifest variable X1 X2 X3 X4 Total
KL 0.718 0.606 0.386 0.625 2.335
ECD 0.418 0.377 0.278 0.385 0.700
56 3 Latent Class Analysis with Ordered Latent Classes

Table 3.5 Marginal positive response frequencies of McHugh’s and Lazarsfel-Stouffer’s data
Data set Marginal positive response frequency
X1 X2 X3 X4
McHugh’s data 65 75 78 73
Lazarsfeld-Stouffer’s data 359 626 700 736

The results of latent distance analysis of Lazarsfel-Stouffer’s data set are given in
Table 3.6 and the estimated model is illustrated in Fig. 3.2. The goodness-of-fit of the
model to the data set is not statistically significant at the level of significance 0.05,
and comparing the results with those in Table 2.8 or Table 2.9, the latter is better to
explain the data set. Figure 3.2 demonstrates the estimated latent distance model, and
the entropy-based assessment of the latent distance model is illustrated in Table 3.7.
The Guttman scaling is an efficient method to grade subjects with their response
patterns; however, in the practical observation or experiments, we have to take their

Table 3.6 Results of latent distance analysis of the data set in Table 2.7
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
0 0.388 0.027 0.366 0.445 0.498
1 0.030 0.569 0.366 0.445 0.498
2 0.038 0.569 0.813 0.445 0.498
3 0.031 0.569 0.813 0.914 0.498
4 0.513 0.569 0.813 0.914 0.981
G2 = 6.298(d f = 3, P = 0.098)

1
0.9
0.8
0.7
0.6
0.5
0.4
0.3 Class 3
0.2 Class 2
Class 1 Class 4
0.1 Class 0
0
0.02
0.06
0.1
0.14
0.18
0.22
0.26
0.3
0.34
0.38
0.42
0.46
0.5
0.54
0.58
0.62
0.66
0.7
0.74
0.78
0.82
0.86
0.9
0.94
0.98

X1 X2 X3 X4

Fig. 3.2 Graph of the estimated latent distance model for data set in Table 2.7
3.2 Latent Distance Analysis 57

Table 3.7 Assessment of the latent distance model for Lazarsfel-Stouffer’s Data
Manifest variable X1 X2 X3 X4 Total
KL 0.496 0.219 0.300 0.478 1.493
ECD 0.332 0.179 0.231 0.324 0.599

response errors into account. In this respect, the latent distance model provides a
good approach to deal with the method. The approach is referred to as the latent
Guttman scaling in this book. In applying latent distance models to data sets, the
contents of items to be used have to be considered beforehand.

Remark 3.4 Setting initial estimates of π Li and π H i satisfying the constraints in


(3.3), if the estimates in the latent distance model by EM algorithm II satisfy the
same constraints, they are the same as those by EM algorithm I.

3.3 Assessment of the Latent Guttman Scaling

Let X 1 , X 2 , . . . , X I be manifest variables that make the latent Guttman scaling. As


in the previous discussion, the manifest variables are observed to assess the latent
continuous trait θ that is distributed according to the uniform distribution on interval
[0, 1]. In the latent distance model, we have I + 1 ordered latent classes, and it
is significant to assess the latent classes with scores on the interval, that is, locate
them on the interval. The assessment of the trait by using the latent Guttman scale
depends on the items to be employed, that is, the latent distance model, and so it is
meaningful to discuss the information about trait θ that the model has [8]. The amount
of information implies the model performance, that is, goodness of the scaling. Let


i−1
θ(0) = 0, θ(i) = vk , i = 1, 2, . . . , I + 1. (3.17)
k=0

Then, θ(i) are interpreted as the threshold for positively or successfully responding
to item i, that is,X i = 1, i = 1, 2, . . . , I . Let us assign the following scores to latent
classes i:


ti θ ∈ θ(i) , θ(i+1)
,i = 0, 1,
 . . . , I − 1
T (θ ) = . (3.18)
tI θ ∈ θ(I ) , 1

The information ratio about latent trait θ , which score T (θ ) has, is defined by

Cov(T (θ ), θ )2
K(T (θ )|θ ) ≡ corrr(T (θ ), θ )2 = . (3.19)
Var(T (θ ))Var(θ )
58 3 Latent Class Analysis with Ordered Latent Classes

From the above definition, we have

0 < K(T (θ )|θ ) < 1.

Since θ is uniformly distributed on interval [0, 1], we have

1 1
E(θ ) = , var(θ ) = .
2 12
From (3.18), we also get


I 
I


E(T (θ )) = tk vk = tk θ(k+1) − θ(k) , (3.20)
k=0 k=0


I


Var(T (θ )) = tk2 θ(k+1) − θ(k) − E(T (θ ))2 , (3.21)
k=0
 I 
1 
2
Cov(T (θ ), θ ) = tk θ(k+1) − θ(k) − E(T (θ )) .
2
(3.22)
2 k=0

By using the above results, we obtain


   2
I
k=0 tk θ(k+1) − θ(k) − E(T (θ ))
2 2
12Cov(T (θ ), θ )2 3
K(T (θ )|θ) = = .
Var(T (θ )) Var(T (θ ))
(3.23)

The amount of information about latent trait θ that the manifest variables have is
defined by

K(X 1 , X 2 , . . . , X I |θ) = max K(T (θ )|θ), (3.24)


T (θ)∈F

where F is the class of functions defined by (3.18). We have the following theorem:

Theorem 3.1 Let θ be uniformly distributed on interval [0, 1], and let function T (θ )
be defined by (3.18). Then, K(T (θ )|θ ) is maximized by

θi+1 + θi
ti = a + b, i = 0, 1, 2, . . . , I, (3.25)
2

where a and b are constant, and it follows that


I
K(X 1 , X 2 , . . . , X I |θ ) = 3 θi+1 θi (θi+1 − θi ). (3.26)
i=0
3.3 Assessment of the Latent Guttman Scaling 59

Proof In order to maximize (3.23) with respect to T (θ ), the following constraints


are imposed on the function, that is, normalization:


I


E(T (θ )) = ti θ(i+1) − θ(i) = 0, (3.27)
a=0


I


Var(T (θ )) = ti2 θ(i+1) − θ(i) − E(T (θ ))2 = 1. (3.28)
i=0

From (3.27), we have


I


Var(T (θ )) = ti2 θ(i+1) − θ(i) = 1. (3.29)
i=0

From constraints (3.27) and (3.28), it follows that


 2

I

2
K(T (θ )|θ ) = 3 ti θ(i+1) − θ(i)
2
.
i=0

In order to make the maximization of the above function with respect to scores
ti , it is sufficient to maximize the following one:


I

2
ti θ(i+1) − θ(i)
2
.
i=0

For Lagrange multipliers λ and μ, the following Lagrange function is made:


I

2 
I


I


g= ti θ(i+1) − θ(i)
2
−λ ti θ(i+1) − θ(i) − μ ti2 θ(i+1) − θ(i) .
i=0 i=0 a=0

Differentiating the above function with respect to ti , we have


I

2 
I


I


g= ti θ(i+1) − θ(i)
2
−λ ti θ(i+1) − θ(i) − μ ti2 θ(i+1) − θ(i) . (3.30)
i=0 i=0 a=0

From this,



θ(i+1) − θ(i) θ(i+1) + θ(i) − λ − 2μti = 0.

Since θ(i+1) − θ(i) = 0, we have


60 3 Latent Class Analysis with Ordered Latent Classes

θ(i+1) + θ(i) − λ − 2μti = 0. (3.31)

Summing up both sides of (3.30) with respect to i = 0, 1, 2, . . . , I , it follows that


I


1 − λ − 2μ ti θ(i+1) − θ(i) = 0. (3.32)
i=0

From (3.27), we have

λ = 1,

and from (3.31) we get

θ(i+1) + θ(i) − 1
ti = , i = 0, 1, 2, . . . , I. (3.33)

Multiplying (3.30) by ti and summing up both sides of the equations with respect
to i = 0, 1, 2, . . . , I , we have


I

2 
I


I


ti θ(i+1) − θ(i) − λ
2
ti θ(i+1) − θ(i) − 2μ ti2 θ(i+1) − θ(i) = 0.
i=0 i=0 i=0

From (3.27) and (3.29), it follows that


I

2
ti θ(i+1) − θ(i)
2
− 2μ = 0.
i=0

From the above equation, we have


I  
i=0 ti θ 2
(i+1) − θ 2
(i)
μ= . (3.34)
2
From (3.33) and (3.34), we get
I
 2 
I

θ + θ − 1 θ − θ 2
i=0 θ(i+1) θ(i) θ(i+1) − θ(i)
i=0 (i+1) (i) (i+1) (i)
μ= = .
4μ 4μ

By solving the above equation with respect to μ(> 0), we have




I
i=0 θ(i+1) θ(i) θ(i+1) − θ(i)
μ=
2
3.3 Assessment of the Latent Guttman Scaling 61

and (3.23) is maximized by T (θ ) with (3.33), that is,

K(X 1 , X 2 , . . . , X I |θ) = max K(T (θ )|θ)


T (θ)∈F
⎧
 2  ⎫2
⎨ i=0
I
θ(i+1) + θ(i) − 1 θ(i+1) 2 ⎬
− θ(i)
=3
⎩ 2μ ⎭


I


=3 θ(i+1) θ(i) θ(i+1) − θ(i) . (3.35)
i=0

Since K(T (θ )|θ ) (3.19) is the square of the correlation coefficient between T (θ )
and θ , hence the theorem follows. 
From Theorem 3.1, we set

θ(i+1) + θ(i)
ti = , i = 0, 1, 2, . . . , I. (3.36)
2
The above discussion is applied to the latent distance models estimated in
Tables 3.2 and 3.6. For Table 3.2, we have

θ(0) = 0, θ(1) = 0.296, θ(2) = 0.640, θ(3) = 0.743, θ(4) = 0.792, θ(4) = 1,

and from (3.33) it follows that

K(X 1 , X 2 , X 3 , X 4 |θ) = 0.923.

Similarly, for Table 3.6, we obtain

K(X 1 , X 2 , X 3 , X 4 |θ) = 0.806.

From the results, the latent Guttman scaling in Table 3.2 is better than that in
Table 3.6.
The following theorem gives the maximization of (3.24) with respect to θ(i) , i =
0, 1, 2, . . . , I .

Theorem 3.2 The amount of information about latent trait θ , K(X 1 , X 2 , . . . , X I |θ ),


is maximized with respect to θ(i) , i = 0, 1, 2, . . . , I by

i
θ(i) = , i = 0, 1, 2, . . . , I
I +1

and then, it follows that


62 3 Latent Class Analysis with Ordered Latent Classes

I (I + 2)
max K(X 1 , X 2 , . . . , X I |θ) = . (3.37)
(θ(a) ) (I + 1)2

Proof Differentiating K(X 1 , X 2 , . . . , X I |θ ) with respect to θ(i) , we have




K(X 1 , X 2 , . . . , X I |θ ) = 3 θ(i+1) − θ(i−1) θ(i+1) + θ(i−1) − 2θ(i) = 0.
∂θ(i)

Since θ(i+1) = θ(i−1) , we obtain

θ(i+1) + θ(i−1) − 2θ(i) = 0.

Therefore, it follows that

i
θ(i) = , i = 0, 1, 2, . . . , I + 1. (3.38)
I +1

From this, we get (3.35) and the theorem follows. 


By using the above theorem, the efficiency of the latent Guttman scaling can be
defined by

K(X 1 , X 2 , . . . , X I |θ )
e f f iciency = . (3.39)
max K(X 1 , X 2 , . . . , X I |θ)
( (a) )
θ

The efficiencies of the latent distance models in Tables 3.2 and 3.6 are calculated,
respectively, as 0.962 and 0.840.
Remark 3.5 The efficiency of the latent Guttman scaling may also be measured
with entropy. Let p = ( p1 , , p2 , , . . . , p A ) be any probability distribution. Then, the
entropy is defined by


A
H ( p) = − pa log pa .
a=1

1 The maximum of the above entropy is logA for the uniform distribution q =
,
A A
1
, . . . , 1
A
. The result is the same as that in Theorem 3.2. Then, the efficiency
of distribution p can be defined by

H ( p)
e f f iciency = .
logA

Applying the above efficiency to the latent distance models estimated in Tables 3.2
and 3.6, we have 0.892 and 0.650, respectively. In the sense of entropy, the latent
Guttman scaling in Table 3.2 is better than that in Table 3.6 as illustrated above by
using (3.39).
3.4 Analysis of the Association Between Two Latent Traits … 63

3.4 Analysis of the Association Between Two Latent Traits


with Latent Guttman Scaling

The latent distance model discussed in Sect. 3.2 is extended to a multidimensional


version to measure the association between latent traits [8]. Let X ki be binary
items for measuring the acquisition of skills Ski , i = 1, 2, . . . , Ik , k = 1, 2 for
hierarchically assessing continuous latent traits θk , k = 1, 2, which are ordered as
Sk1 ≺ Sk2 ≺ . . . Sk Ik by the difficulty of the skill
acquisition in trait θ k , k = 1, 2.

simplicity of the notations, let us set X k = X k1 , X k2 , . . . , X k Ik and Sk =


For
Sk1 , Sk2 , . . . , Sk Ik , k = 1, 2. In this setting, as in the previous section,
 
1 (success) 1 (acquition)
X ki = , Ski = , i = 1, 2, . . . , Ik ; k = 1, 2.
0 ( f ailur e) 0 (nonacquisition)

In this setup, the skills Sk1 ≺ Sk2 ≺ . . . Sk Ik constitute the latent Guttman scaling.
Let θk(a) be thresholds for skills Ska , a = 0, 1, 2, . . . , Ik + 1; k = 1, 2, and then, we
set


vmn = P θ1(m) ≤ θ1 < θ1(m+1) , θ2(n) ≤ θ2 < θ2(n+1) ,

m = 0, 1, 2, . . . , I1 ; n = 0, 1, 2, . . . , I2 .


Then, putting sk = sk1 , sk2 , . . . , sk Ik , k = 1, 2, and


I1 
I2
m= s1i , n = s2i ,
i=0 i=0

the model is given by


I1 
I2
P((X 1 , X 2 ) = (x 1 , x 2 )) = vmn P((X 1 , X 2 ) = (x 1 , x 2 )|(S1 , S2 ) = (s1 , s2 )),
k=0 l=0

where

P((X 1 , X 2 ) = (x 1 , x 2 )|(S1 , S2 ) = (s1 , s2 ))


2  Ik
exp{xki (αki + ski exp(βki ))}
= .
k=1 i=1
1 + exp(αki + ski exp(βki ))

According to the model, the joint levels of traits θk of individuals can be scaled,
and the association between the traits can also be assessed. Let Tk (θk ), k = 1, 2 be
64 3 Latent Class Analysis with Ordered Latent Classes

functions of scores for latent traits θk , which are made by (3.18) and (3.36). Then, the
correlation coefficient between scores Tk (θk ), k = 1, 2, Corr(T1 (θ1 ), T2 (θ2 )) is used
for measuring the association between traits θ1 and θ2 , because Corr(θ1 , θ2 ) cannot
be calculated. If θ1 and θ2 are statistically independent, T1 (θ1 ) and T2 (θ2 ) are also
independent, and then, we have Corr(T1 (θ1 ), T2 (θ2 )) = 0.
The above model is applied to data in Table 3.8, which were obtained from 145
children from 1 to 5 years old. Latent trait θ1 and θ2 implied the general intelligence
and the verbal ability of children, respectively, and these abilities are measured with
three manifest binary variables ordered as X ki , i = 1, 2, 3; k = 1, 2, respectively.
The parameter can be estimated via the EM algorithm as in the previous section. The
estimated latent probabilities are illustrated in Table 3.9, and the responses to the
manifest variables X ki are demonstrated in Figs. 3.3 and 3.4. From Fig. 3.3, we have

K(X 1 , X 2 , X 3 |θ1 ) = 0.840, e f f iciency = 0.896.

Similarly, for Table Fig. 3.4, it follows that

K(X 1 , X 2 , X 3 |θ2 ) = 0.898, e f f iciency = 0.958.



In this data, the mean densities of domains θ1(m) , θ1(m+1) × [θ2(n) .θ2(n+1) ) are
calculated as
vmn


, m, n = 0, 1, 23.
θ1(m+1) − θ1(m) × θ2(n+1) − θ2(n)

The densities are illustrated in Fig. 3.5, and the association between traits.θ1 and
θ2 is summarized. It seems the association is positive, and in effect we obtain estimate
"

Corr (T1 (θ1 ), T2 (θ2 )) = 0.780. From this, the association between the two latent traits
is strong. The respondents shown in Table 3.8 are assigned to latent classes in Table
3.10, and it implies an assessment of respondents’ grades in the latent traits.
In this section, two-dimensional latent continuous traits are discretized, and
ordering of latent classes can be carried out in each latent trait; however, it may
be useful to grade all the latent classes with a method, because without it, for latent
classes (i. j), i = 0, 1, 2, 3; j = 0, 1, 2, 3 we may simply employ scores i + j
to grade the latent classes. In Sect. 4.10 in Chap. 4, an entropy-based method to
order latent classes is discussed, and grading (ordering) of the above latent classes
(i. j), i = 0, 1, 2, 3; j = 0, 1, 2, 3 (Table 3.9) will be treated as an example.

3.5 Latent Ordered-Class Analysis

In analyzing Stouffer-Toby’s data (Table 2.1), latent class cluster analysis and latent
distance analysis have been used. From the results in Tables 2.3 and 2.11, it is
appropriate to assume there exist ordered latent classes that explain behavior in the
Table 3.8 Data on the general intelligence ability and the verbal ability from 145 pupils
θ1 θ2 θ1 θ2 θ1 θ2
X 11 X 12 X 13 X 21 X 22 X 23 Freq X 11 X 12 X 13 X 21 X 22 X 23 Freq X 11 X 12 X 13 X 21 X 22 X 23 Freq
0 0 0 0 0 0 13 0 1 1 0 1 0 0 0 0 1 1 0 1 0
1 0 0 0 0 0 5 1 1 1 0 1 0 1 1 0 1 1 0 1 2
0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 1 1 1 0 1 0
1 1 0 0 0 0 2 1 0 0 1 1 0 7 1 1 1 1 0 1 5
0 0 1 0 0 0 1 0 1 0 1 1 0 3 0 0 0 0 1 1 0
1 0 1 0 0 0 2 1 1 0 1 1 0 5 1 0 0 0 1 1 0
3.5 Latent Ordered-Class Analysis

0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 1 1 0
1 1 1 0 0 0 2 1 0 1 1 1 0 2 1 1 0 0 1 1 1
0 0 0 1 0 0 9 0 1 1 1 1 0 2 0 0 1 0 1 1 0
1 0 0 1 0 0 3 1 1 1 1 1 0 11 1 0 1 0 1 1 0
0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1
1 1 0 1 0 0 2 1 0 0 0 0 1 0 1 1 1 0 1 1 1
0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0
1 0 1 1 0 0 2 1 1 0 0 0 1 0 1 0 0 1 1 1 1
0 1 1 1 0 0 1 0 0 1 0 0 1 1 0 1 0 1 1 1 0
(continued)
65
Table 3.8 (continued)
66

θ1 θ2 θ1 θ2 θ1 θ2
X 11 X 12 X 13 X 21 X 22 X 23 Freq X 11 X 12 X 13 X 21 X 22 X 23 Freq X 11 X 12 X 13 X 21 X 22 X 23 Freq
1 1 1 1 0 0 4 1 0 1 0 0 1 0 1 1 0 1 1 1 3
0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 1 1 1 1 0
1 0 0 0 1 0 1 1 1 1 0 0 1 0 1 0 1 1 1 1 2
0 1 0 0 1 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 3
1 1 0 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 1 1 38
0 0 1 0 1 0 0 0 1 0 1 0 1 0
1 0 1 0 1 0 2 1 1 0 1 0 1 1
Source Eshima [8]
3 Latent Class Analysis with Ordered Latent Classes
3.5 Latent Ordered-Class Analysis 67

Table 3.9 The estimated latent probabilities (parameters)


Latent class (m, n) Class proportion Positive response probabilities to items
X 11 X 12 X 13 X 21 X 22 X 23
(0,0) 0.134 0.120 0.060 0.154 0.073 0.113 0.031
(1,0) 0.051 0.894 0.060 0.154 0.073 0.113 0.031
(2,0) 0.020 0.864 0.914 0.154 0.073 0.113 0.031
(3,0) 0.013 0.894 0.914 0.947 0.073 0.113 0.031
(0,1) 0.068 0.120 0.060 0.154 0.931 0.113 0.031
(1,1) 0.020 0.894 0.060 0.154 0.931 0.113 0.031
(2,1) 0.002 0.894 0.914 0.154 0.931 0.113 0.031
(3,1) 0.028 0.894 0.914 0.947 0.931 0.113 0.031
(0,2) 0.000 0.120 0.060 0.154 0.931 0.856 0.031
(1,2) 0.071 0.894 0.060 0.154 0.931 0.856 0.031
(2,2) 0.080 0.894 0.914 0.154 0.931 0.856 0.031
(3,2) 0.090 0.894 0.914 0.947 0.931 0.856 0.031
(0,3) 0.000 0.120 0.060 0.154 0.931 0.856 0.936
(1,3) 0.008 0.894 0.060 0.154 0.931 0.856 0.936
(2,3) 0.019 0.894 0.914 0.154 0.931 0.856 0.936
(3,3) 0.396 0.894 0.914 0.947 0.931 0.856 0.936
loglikelohood ratio statistic = 39.507, d f = 39, p = 0.447

1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
θ1
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95

X11 X12 X13

Fig. 3.3 The estimated response probabilities to X 11 , X 12 , and X 13 for measuring θ1


68 3 Latent Class Analysis with Ordered Latent Classes

1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0

θ2
0
0.04
0.08
0.12
0.16
0.2
0.24
0.28
0.32
0.36
0.4
0.44
0.48
0.52
0.56
0.6
0.64
0.68
0.72
0.76
0.8
0.84
0.88
0.92
0.96
X21 X22 X23

Fig. 3.4 The estimated response probabilities to X 21 , X 22 , and X 23 for measuring θ2

0.4

0.3

θ2
0.2
0.7885
0.1 0.4565

0 0.277
0.101
0.277
0.4125 0.109
0.7365 θ
1

0-0.1 0.1-0.2 0.2-0.3 0.3-0.4

Fig. 3.5 Summaries of mean densities between traits θ1 and θ2

data set, that is, role conflict. The results in Table 2.3 are those based on a latent class
cluster model and the analysis is an exploratory latent class analysis, and on the other
hand, the results in Table 3.2 are those by a confirmatory analysis. In the role conflict
for Stouffer-Toby’s data set, it may be suitable to assume ordered latent classes that are
located in a latent continuum. For the data set, the number of latent classes in the latent
class cluster model is less than and equal to three according to the condition of model
Table 3.10 Assignment of the manifest responses to the extracted latent classes based on Table 3.8
θ1 θ2 θ1 θ2 θ1 θ2
X 11 X 12 X 13 X 21 X 22 X 23 LCa X 11 X 12 X 13 X 21 X 22 X 23 LC X 11 X 12 X 13 X 21 X 22 X 23 LC
0 0 0 0 0 0 (0,0) 0 1 1 0 1 0 (3,2) 0 0 1 1 0 1 (3,3)
1 0 0 0 0 0 (1,0) 1 1 1 0 1 0 (3,2) 1 0 1 1 0 1 (3,3)
0 1 0 0 0 0 (0,0) 0 0 0 1 1 0 (0,1) 0 1 1 1 0 1 (3,3)
1 1 0 0 0 0 (2,0) 1 0 0 1 1 0 (1,2) 1 1 1 1 0 1 (3,3)
0 0 1 0 0 0 (0,0) 0 1 0 1 1 0 (2,2) 0 0 0 0 1 1 (0,0)
1 0 1 0 0 0 (1,0) 1 1 0 1 1 0 (2,2) 1 0 0 0 1 1 (1,3)
3.5 Latent Ordered-Class Analysis

0 1 1 0 0 0 (3,0) 0 0 1 1 1 0 (0,1) 0 1 0 0 1 1 (3,3)


1 1 1 0 0 0 (3,0) 1 0 1 1 1 0 (1,2) 1 1 0 0 1 1 (3,3)
0 0 0 1 0 0 (0,1) 0 1 1 1 1 0 (3,2) 0 0 1 0 1 1 (3,3)
1 0 0 1 0 0 (1,1) 1 1 1 1 1 0 (3,2) 1 0 1 0 1 1 (3,3)
0 1 0 1 0 0 (0,1) 0 0 0 0 0 1 (0,0) 0 1 1 0 1 1 (3,3)
1 1 0 1 0 0 (2,2) 1 0 0 0 0 1 (1,0) 1 1 1 0 1 1 (3,3)
0 0 1 1 0 0 (0,1) 0 1 0 0 0 1 (0,0) 0 0 0 1 1 1 (1,3)
1 0 1 1 0 0 (1,1) 1 1 0 0 0 1 (2,0) 1 0 0 1 1 1 (1,3)
0 1 1 1 0 0 (3,1) 0 0 1 0 0 1 (0,0) 0 1 0 1 1 1 (3,3)
(continued)
69
Table 3.10 (continued)
70

θ1 θ2 θ1 θ2 θ1 θ2
X 11 X 12 X 13 X 21 X 22 X 23 LCa X 11 X 12 X 13 X 21 X 22 X 23 LC X 11 X 12 X 13 X 21 X 22 X 23 LC
1 1 1 1 0 0 (3,1) 1 0 1 0 0 1 (3,3) 1 1 0 1 1 1 (3,3)
0 0 0 0 1 0 (0,0) 0 1 1 0 0 1 (3,3) 0 0 1 1 1 1 (3,3)
1 0 0 0 1 0 (1,0) 1 1 1 0 0 1 (3,3) 1 0 1 1 1 1 (3,3)
0 1 0 0 1 0 (0,0) 0 0 0 1 0 1 (0,1) 0 1 1 1 1 1 (3,3)
1 1 0 0 1 0 (2,2) 1 0 0 1 0 1 (1,3) 1 1 1 1 1 1 (3,3)
0 0 1 0 1 0 (0,0) 0 1 0 1 0 1 (3,3)
1 0 1 0 1 0 (1,1) 1 1 0 1 0 1 (3,3)
a LC implies latent class
3 Latent Class Analysis with Ordered Latent Classes
3.5 Latent Ordered-Class Analysis 71

identification, meanwhile, six in the latent distance model. In order to extract ordered
latent classes, it is sensible to make a parsimonious model. Let θa , a = 1, 2, . . . , A
be parameters that express the locations of latent classes in a latent continuum, such
that θa < θa+1 , a = 1, 2, . . . , A−1, and let πi (θa ), i = 1, 2, . . . , I ; a = 1, 2, . . . , A
be latent positive response probabilities to binary items i in latent classes a, which
satisfy the following inequalities:

πi (θa ) ≤ πi (θa+1 ), a = 1, 2, . . . , A − 1; i = 1, 2, . . . , I. (3.40)

The functions πi (θa ) are specified before analyzing a data set under study, and it is
appropriate that the number of parameters in the model is as small as possible and that
the parameters are easy to interpret. Since the positive response probabilities πi (θa )
are functions of location or trait parameters θa , such models are called structured
latent class model. In this section, the following logistic model is used [7]:

exp(θα − di )
πi (θα ) = , a = 1, 2, . . . A; i = 1, 2, . . . I, (3.41)
1 + exp(θα − di )

where di are item difficulty parameters as in the latent trait model and we set d1 = 0
for model identification. The above model is called the Rasch model [15]. Then,
the constraints (3.40) are held by the above model. The number of parameters to be
estimated is

2 A + I − 1.

Thus, in order to identify the model, we have to keep the following constrain:

1
I
2 A + I − 1 < 2I − 1 ⇔ A < 2 −I , (3.42)
2

because the number of the accounting equations is 2 I − 1, i.e., the number of the
joint probabilities of manifest responses X = (X 1 , X 2 , . . . , X I )T minus one.
Let P(X = x|a) be the joint probability of X = (x1 , x2 , . . . , x I )T for given latent
class a. Then, from (3.41) we have

I  xi  1−xi
exp(θa − di ) 1
P(X = x|a) =
i=1
1 + exp(θa − di ) 1 + exp(θa − di )
I
exp{xi (θa − di )}
= ,
i=1
1 + exp(θa − di )


A I 
A
exp{xi (θa − di )}
P(X = x) = va P(X = x|a) = va .
α=1 α=1 i=1
1 + exp(θa − di )
72 3 Latent Class Analysis with Ordered Latent Classes

In order to estimate the parameters φ = ((va ), (θa ), (di ))T , the EM algorithm is
used and the summary of the algorithm is given as follows:
EM algorithm
(i) E-step
Let s φ = ((s va ), (s θa ), (s di ))T be the estimate of parameter vector φ at the s th
iteration in the EM algorithm. Then, the conditional expectations of complete data
(n(x, a)) given parameters s φ = ((s va ), (s θa ), (s di )) are calculated in the (s + 1) th
iteration as follows:
I s
s
va i=1 P(X i = xi |a)
s+1
n(x, a) = n(x)  I I s , a = 0, 1, 2, . . . , I, (3.43)
sv
b=0 b i=1 P(X i = x i |b)

where
exp{xi (s θa − s di )}
s
P(X i = xi |a) = , xi = 0, 1.
1 + exp(s θa − s di )

(ii) M-step

s+1
As in (3.10), the loglikelihood function of the complete data n(x, a) (3.43) is
given by




Q φ|s φ = l φ| s+1 n(x, a)


A  I
exp{xi (θa − di )}
= s+1
n(x, a)log va
α=1 x i=1
1 + exp(θa − di )


A 
= s+1
n(x, a)logva
α=1 x
 I 

A  
+ s+1
n(x, a) {xi (θa − di ) − log(1 + exp(θa − di ))} . (3.44)
α=1 x i=1

With respect to s+1 va , we have


 s+1
n(x, a)
s+1
va = x
, a = 1, 2, . . . , A;
N

however, for estimating the other parameters θa and di , the Newton-Raphson method
needs to be used in the M-step. The first derivatives of Q(φ|s φ) with respect to θa
and di , respectively, are calculated as follows:
3.5 Latent Ordered-Class Analysis 73

∂ Q(φ|s φ)   s+1
I
= n(x, a)(xi − P(X i = 1|a)), a = 1, 2, . . . , A;
∂θa i=1 x

∂ Q(φ|s φ) A 
=− s+1
n(x, a)(xi − P(X i = 1|a)), i = 2, 3, . . . , I.
∂di α=1 x

Then, the (A + I )-dimensional gradient vector is set as


⎛ ⎞
∂ Q(φ|s φ)
g= ⎝  ∂θas  ⎠. (3.45)
∂ Q(φ| φ)
∂di

Consequently, the second partial derivatives of Q(φ|s φ) are calculated as follows:

∂ 2 Q(φ|s φ)  I
= − s+1
n(x, a) P(X i = 1|a)(1 − P(X i = 1|a))
∂θa2 x i=1


I
= −N · s+1 va P(X i = 1|a)(1 − P(X i = 1|a)),
i=1

a = 1, 2, . . . , A;

∂ 2 Q(φ|s φ)  s+1
= n(x, a)P(X i = 1|a)(1 − P(X i = 1|a))
∂θa ∂di x
= N · s+1 va P(X i = 1|a)(1 − P(X i = 1|a)),

a = 1, 2, . . . , A; i = 1, 2, . . . , I ;

∂ 2 Q(φ|s φ) A 
= − s+1
n(x, a)P(X i = 1|a)(1 − P(X i = 1|a))
∂di2 α=1 x


A
=− N · s+1 va P(X i = 1|a)(1 − P(X i = 1|a)),
α=1

i = 2, 3, . . . , I.

∂ 2 Q(φ|s φ) ∂ 2 Q(φ|s φ)
= = 0, a = b; i = j.
∂θa ∂θb ∂di ∂d j

From the above results, the Hessian matrix H is set as follows:


74 3 Latent Class Analysis with Ordered Latent Classes

Table 3.11 The estimated latent ordered-class model with five latent classes (Stouffer-Toby’s data)
Latent class θ Proportion Latent positive item response probability
X1 X2 X3 X4
1(−1.563) 0.058 0.173 0.037 0.040 0.011
2(1.072) 0.124 0.745 0.348 0.365 0.133
3(1.244) 0.267 0.776 0.388 0.406 0.154
4(1.329) 0.320 0.791 0.408 0.427 0.166
5(4.771) 0.231 0.992 0.956 0.959 0.861
G2 = 1.092(d f = 3, P = 0.779)

⎛  ⎞
∂ 2 Q(φ|s φ) ∂ 2 Q(φ|s φ)
H= ⎝  2∂θa ∂θsb   2∂θa ∂dsi  ⎠. (3.46)
∂ Q(φ| φ) ∂ Q(φ| φ)
∂di ∂θa ∂di ∂d j

Let φ (θ,d) = ((θa ), (di )) and let the t th iterative value of φ (θ,d) be φ (θ,d)t =
((θat ), (dit )), where φ (θ,d)1 = ((s θa ), (s di )). Then, φ (α,β)t+1 is obtained as follows:

φ (θ,d)t+1 = φ (θ,d)t − H −1
t g t , t = 1, 2, . . . ,

where H t and

g t are values
of the gradient vector (3.45) and the Hessian matrix
(3.46) at φ t = s+1 va , φ (θ,d)t . From this algorithm, we can get limt→∞ φ (θ,d)t =

s+1
s+1
θa , di .
The above model is applied to the analysis of Stouffer-Toby’s data in Table 2.1.
Considering the inequality constraint (3.42), first, for the number of ordered latent
classes A = 5, the results are shown in Table 3.11. The goodness-of-fit to the data set
is good as indicated in the table, G 2 = 1.092(d f = 3, P = 0.779). According to the
latent response probabilities to four items in latent classes 2 through 4 (Fig. 3.6), the
latent classes are similar, and second, the latent ordered-class model with three latent
classes is used for analyzing the data. The results are illustrated in Table 3.12, and it
shows the goodness-of-fit to the data is very good, G 2 = 1.092(d f = 7, P = 0.993).
The response probabilities for three latent classes are illustrated in Fig. 3.7. In
Sect. 2.3, the data have been analyzed with usual latent class models with three and
two latent classes. Although the ordered latent classes have been derived as shown in
Tables 2.2 and 2.3, the present models are more parsimonious than the usual models.
The assessment of respondents based on the latent ordered-class analysis in Table
3.12 is demonstrated in Table 3.13.
In this section, latent ordered-class analysis is discussed by using model (3.41);
however, the analysis does not confine us to the model. Structured models can be
made as far as the models are identified. The present approach assumes ordered latent
classes in the population, which can be located in a latent continuum. In this sense, it
is related to item response models that assume latent continuums in the populations.
3.5 Latent Ordered-Class Analysis 75

1.00
0.90
0.80
0.70
0.60
0.50
0.40
0.30
0.20
0.10
0.00
-1.56 1.07 1.24 1.33 4.77

X1 X2 X3 X4

Fig. 3.6 Locations of five ordered latent classes and their positive response probabilities in Table
3.11

Table 3.12 The estimated latent ordered-class model with three latent classes (Stouffer-Toby’s
data)
Latent class Proportion Latent positive item response probability
X1 X2 X3 X4
1(−0.397) 0.150 0.434 0.123 0.131 0.039
2(1.412) 0.647 0.811 0.439 0.458 0.184
3(5.160) 0.203 0.995 0.974 0.975 0.914
G2 = 1.092(d f = 7, P = 0.993)

The explanatory power of latent ordered-class model (3.41) is assessed by the


entropy coefficient determination (ECD) (1.30). In this model, the ECD is calculated
as
I
i=1 Cov(X i , θ )
ECD(X, θ ) = I , (3.47)
1 + i=1 Cov(X i , θ )

where


A
Cov(X i , θ ) = va πi (θa )θa − E(X i )E(θ )
a=1
76 3 Latent Class Analysis with Ordered Latent Classes

1.2

0.8

0.6

0.4

0.2

0
-0.397 1.412 5.160 θ
X1 X2 X3 X4

Fig. 3.7 Location of three latent ordered classes and their positive response probabilities in Table
3.12

Table 3.13 Assignment of the manifest responses to the extracted latent classes (latent ordered-
class analysis of data in Table 2.1)
Response pattern Score Latent class Response pattern Score Latent class
0000 0 1 0001 1 2
1000 1 2 1001 2 2
0100 1 2 0101 2 2
1100 2 2 1101 3 2
0010 1 2 0011 2 2
1010 2 2 1011 3 2
0110 2 2 0111 3 2
1110 3 2 1111 4 3


A 
A
= va πi (θa )θa − P(X i = 1) va θa ,
a=1 a=1

i = 1, 2, . . . , I.

From this, we can obtain ECD (3.47) and ECDs for manifest variables X i are also
given by
3.5 Latent Ordered-Class Analysis 77

Table 3.14 The explained


Manifest variable Cov(X i , θ) ECD
entropy Cov(X i , θ) and
ECDs in the estimated model X1 0.250 0.200
in Table 3.12 X2 0.459 0.315
X3 0.451 0.311
X4 0.528 0.345
Total 1.688 0.628

Cov(X i , θ )
ECD(X i , θ ) = . (3.48)
1 + Cov(X i , θ )

By using (3.47) and (3.48), the estimated model in Table 3.12 is assessed. The
results are demonstrated in Table 3.14. Since ECD(X, θ ) = 0.628, 62.8% of the
uncertainty of manifest variable vector X = (X 1 , X 2 , X 3 , X 4 )T is explained by the
model.
In this section, latent ordered-class model (3.41) has been discussed and the model
has been applied to data in Table 2.1. A more general model is given by

exp(βi (θa − di ))
πi (θa ) = , a = 1, 2, . . . , A; i = 1, 2, . . . , I, (3.49)
1 + exp(βi (θa − di ))

where parameters βi indicate regression parameters as in the two-parameter item


response model. Then, in order to identify the model, for example, we set β1 =
1, d1 = 0, and the following inequality has to be held:

2 I − 2(A + I − 1) > 0.

The ECDs of two-parameter model (3.49) are given by


I
βi Cov(X i , θ )
ECD(X, θ ) = i=1
I ,
1+ i=1 βi Cov(X i , θ )

and
βi Cov(X i , θ )
ECD(X i , θ ) = , i = 1, 2, . . . , I.
1 + βi Cov(X i , θ )
78 3 Latent Class Analysis with Ordered Latent Classes

3.6 The Latent Trait Model (Item Response Model)

In this section, the ML estimation of two-parameter logistic model (1.10) is consid-


ered. As in Sect. 1.3 in Chap. 1, the model is approximated by a latent class model
(1.9). As the positive response probabilities in the latent class model are equiv-
alent to those in (3.49), the discussion below will be made with (3.49), where
βi = Dai , i = 1, 2, . . . , I . In this case, parameters to be estimated are discriminant
parameters βi and item difficulties di , i = 1, 2, . . . , I , whereas the class propor-
tions va are given by the standard normal distribution and latent trait parameters
θa , a = 1, 2, . . . , A are also given for the approximation (1.7). For an appropriate
division of the latent continuum θ given in (1.7), we calculate class proportions va
by (1.8). Then, the latent class model is set as


A 
I 
A I
exp(xi βi (θa − di ))
P(X = x) = va P(X i = xi |θa ) = va ,
α=1 i=1 α=1 i=1
1 + exp(βi (θa − di ))
(3.50)

and the EM algorithm for the ML estimation is given as follows:


EM algorithm
(i) E-step
Let s φ = ((s βi ), (s di ))T be the estimate of parameter vector φ at the s th iteration
in the EM algorithm. Then, the conditional expectations of complete data (n(x, a))
for given parameters s φ = ((s βi ), (s di )) are calculated in the (s + 1) th iteration as
follows:
I s
va i=1 P(X i = xi |θa )
s+1
n(x, a) = n(x)  I I s , a = 0, 1, 2, . . . , I, (3.51)
b=0 va i=1 P(X i = x i |θb )

where
exp(xi s βi (θa − s di ))
s
P(X i = xi |θa ) = , xi = 0, 1.
1 + exp(s βi (θa − s di ))

(ii) M-step

s+1
The log likelihood function of the complete data n(x, a) (3.51) is given by




Q φ|s φ = l φ| s+1 n(x, a)


A  I
exp(xi βi (θa − di ))
= s+1
n(x, a)log va
α=1 x i=1
1 + exp(βi (θa − di ))
3.6 The Latent Trait Model (Item Response Model) 79


A 
= s+1
n(x, a)logva
α=1 x
 I 

A  
+ s+1
n(x, a) {xi βi (θa − di ) − log(1 + exp(βi (θa − di )))} . (3.52)
α=1 x i=1

For estimating the other parameters βi and di , the Newton-Raphson method needs
to be used in the M-step. The first derivatives of Q(φ|s φ) with respect to θa and di ,
respectively, are calculated as follows:

∂ Q(φ|s φ)   s+1
A
= n(x, a)(θa − di )(xi − P(X i = xi |θa )), i = 1, 2, . . . , I ;
∂βi a=1 x
(3.53)

∂ Q(φ|s φ) A 
=− s+1
n(x, a)βi (xi − P(X i = xi |θa )), i = 1, 2, . . . , I.
∂di α=1 x
(3.54)

Then, the 2I -dimensional gradient vector is set as

⎛ ⎞
∂ Q(φ|s φ)
g= ⎝  ∂βi s  ⎠. (3.55)
∂ Q(φ| φ)
∂di

Consequently, the second-order partial derivatives of Q(φ|s φ) are calculated as


follows:

∂ 2 Q(φ|s φ) A 
= − s+1
n(x, a)(θa − di )2 P(X i = 1|a)(1 − P(X i = 1|a)),
∂βi2 a=1 x
(3.56)

a = 1, 2, . . . , A;

∂ 2 Q(φ|s φ) A 
=− s+1
n(x, a)
∂βi ∂di a=1 x
{(xi − P(X i = 1|a)) − (θa − di )βi P(X i = 1|a)(1 − P(X i = 1|a))},
(3.57)

i = 1, 2, . . . , I ;
80 3 Latent Class Analysis with Ordered Latent Classes

∂ 2 Q(φ|s φ) A 
=− s+1
n(x, a)βi2 P(X i = 1|a)(1 − P(X i = 1|a)), (3.58)
∂di2 α=1 x

i = 2, 3, . . . , I.

∂ 2 Q(φ|s φ) ∂ 2 Q(φ|s φ)
= = 0, i = j. (3.59)
∂βi ∂β j ∂di ∂d j

From the above results, the Hessian matrix H is set as follows:


⎛ 2  2 ⎞
∂ Q(φ| φ)
s s
∂ Q(φ| φ)
∂βi ∂β j   ∂βi ∂di  ⎠
H = ⎝  ∂ 2 Q(φ| s
φ) ∂ 2 Q(φ|s φ)
. (3.60)
∂di ∂βi ∂di ∂d j

Let s φ t = ((s βit ), (s dit )) be the t th iterative value of φ, where s φ 1 =


(( βi ), (s di )). Then, s φ t+1 is obtained as follows:
s

s
φ t+1 = s φ t − H −1
t g t , t = 1, 2, . . . ,

where H t and g t are values of the gradient vector (3.55) and the Hessian matrix
(3.60) at s φ t .

Remark 3.6 The expectation of the Hessian matrix is minus the Fisher informa-
tion matrix. Although the Fisher information matrix is positive definite, however,
Hessian matrices (3.60) calculated in the iterations are not necessarily guaranteed to
be negative definite. Since in latent class a,

E{X i − P(X i = 1|a)|a} = 0,

# $
E (X i − P(X i = 1|a))2 |a = P(X i = 1|a)(1 − P(X i = 1|a)), i = 1, 2, . . . , I,

for large samples, we can use the following approximation of (3.57):

∂ 2 Q(φ|s φ)   s+1
A
≈ n(x, a)(θa − di )βi P(X i = 1|a)(1 − P(X i = 1|a)),
∂βi ∂di a=1 x
(3.61)

i = 1, 2, . . . , I ;

Then, the Hessian matrix is always negative definite. 


First, latent class model (3.50) is applied to estimate the latent trait model by using
Stouffer-Toby’s data in Table 2.1. Before analyzing the data, latent continuous trait
3.6 The Latent Trait Model (Item Response Model) 81

Table 3.15 Upper limits of latent classes θ(a) , class values θa and latent class proportions va
a 1 2 3 4 5 6 7 8 9 10
θ(a) −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 +∞
θa −2.5 −1.75 −1.25 −0.75 −0.25 0.25 0.75 1.25 1.75 2.5
va 0.023 0.044 0.092 0.150 0.191 0.191 0.150 0.092 0.044 0.023

Table 3.16 The estimated parameters in latent trait model (3.50) from the Stouffer-Toby’s data
Manifest variable X1 X2 X3 X4
βi 1.128 1.559 1.330 2.076
di −1.471 −0.006 −0.061 0.643
G 2 = 8.570, (d f = 7, P = 0.285)

θ is divided as in (1.7). In order to demonstrate the above method, latent trait θ is


divided into ten intervals (latent classes):



−∞ = θ(0) < θ(1) < θ(2) < · · · < θ(9) < +∞ = θ(10) , (3.62)

and the class values

θ(a−1) < θa ≤ θ(a) , a = 1, 2, . . . , 10

are set. The values and the latent class proportions va calculated with the standard
normal distribution (1.8) are illustrated in Table 3.15. These values are fixed in the
estimation procedure. In the estimation procedure, (3.61) is employed for (3.57), and
the estimated parameters are given in Table 3.16. According to the test of goodness-
of-fit, latent trait model (3.58) fairly fits the data set and it is reasonable to assume
a latent continuous trait to respond to the four test items. The graphs of the item
response functions are illustrated in Fig. 3.8. The assessment of test items (manifest
variables) as indicators of the latent trait is shown in Table 3.17. In order to estimate
latent trait θ of an individual with response vector x, the following method is used.
Let f (x, θ ) be the joint probability function of x = (x1 , x2 , x3 , x4 ) and θ . The
estimate is given by θmax such that f (x, θmax ) = max f (x, θ ). Since θ is distributed
θ
according to the standard normal distribution ϕ(θ ), from (1.6) we have

1 θ2 I
exp(xi βi (θ − di ))
log f (x, θ ) = − log2π − + log
2 2 i=1
1 + exp(βi (θ − di ))

θ2  
I I
1
= − log2π − + xi βi (θ − di ) + log(1 + exp(βi (θ − di ))).
2 2 i=1 i=1
82 3 Latent Class Analysis with Ordered Latent Classes

1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-3 -2 -1 0 1 2 3

X1 X2 X3 X4

Fig. 3.8 Graph of latent trait model (3.50) estimated from Stouffer-Toby’s data (Table 2.1)

Table 3.17 The explained


Manifest variable βi Cov(X i , θ) ECD
entropy Cov(X i , θ) and
ECDs in the estimated model X1 0.155 0.134
in Table 3.15 X2 0.271 0.213
X3 0.248 0.199
X4 0.271 0.213
Total 0.946 0.486

In order to maximize the above function with respect to θ , differentiating the


above function with respect to θ and setting it to zero, we obtain

d  I
log f (x, θ ) = −θ + βi (xi − P(X i = 1|θ )) = 0. (3.63)
dθ i=1

By solving the above equation with respect to θ , we can get the estimate of latent
trait θ of a respondent with manifest response vector x = (x1 , x2 , x3 , x4 ) (Table
3.18).

Remark 3.7 In order to solve Eq. (3.63), the following Newton–Raphson method is
employed. Let θ (m) be the estimate of θ in the m th iteration. Then, the algorithm for
obtaining a solution in (3.63) is given by


(m+1) (m)
d
log f x, θ (m)
θ =θ − dθ
2
, m = 0, 1, 2, . . . ,
d
dθ 2
log f x, θ (m)

where
3.6 The Latent Trait Model (Item Response Model) 83

Table 3.18 Assessment of respondents by using the estimated latent trait model (Table 3.15)
Response pattern θa Response pattern θa
0000 −1.097 0001 −0.320
1000 −0.515 1001 −0.150
0100 −0.836 0101 0.278
1100 0.289 1101 0.384
0010 −0.313 0011 0.051
1010 0.061 1011 0.157
0110 −0.464 0111 0.585
1110 0.658 1111 0.900

Table 3.19 The estimated parameters in latent trait model (3.50) from the Lazarsfeld-Stouffer’s
data
Manifest variable X1 X2 X3 X4
βi 1.672 1.099 1.391 1.577
di 0.525 −0.586 −0.831 −0.983
G 2 = 7.515, (d f = 7, P = 0.377)

d2  I

2
log f (x, θ ) = −1 − βi2 P(X i = 1|θ )(1 − P(X i = 1|θ )).
dθ i=1

Second, McHugh’s data in Table 2.5 are analyzed with model (3.50). The log like-
lihood ratio test statistic G 2 = 22.011(d f = 7, P = 0.003) is obtained, and thus, the
model fitness to the data set is bad. It may be concluded that there is no latent contin-
uous trait distributed according to the standard normal distribution or the latent trait
space not one-dimensional. Finally, Lazarsfel-Stouffer’s data (Table 2.7) are analyzed
with model (3.50), and the estimated parameters and the latent response probabilities
P(X i = xi |θ ) are illustrated in Table 3.19 and Fig. 3.9, respectively. The latent trait
model makes a moderate fit to the data set, that is, G 2 = 7.515(d f = 7, P = 0.377).
The predictive or explanatory power of latent trait θ for manifest variables (Table
3.20) is similar to that of Stouffer-Toby’s data (Table 3.16). As demonstrated above,
the latent trait model can be estimated in a framework of the latent class model, and
the EM algorithm is effective to estimate the model parameters.

3.7 Discussion

In this chapter, latent class analyses with ordered latent classes have been discussed.
In latent distance analysis, the model is an extension of the Guttman scale model, and
the intrusion and omission errors are incorporated into the model itself. Assuming a
84 3 Latent Class Analysis with Ordered Latent Classes

1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
-3 -2 -1 0 1 2 3

X1 X2 X3 X4

Fig. 3.9 Graph of latent trait model (3.50) estimated from Lazarsfeld-Stouffer’s data (Table 2.7)

Table 3.20 The explained


Manifest variable βi Cov(X i , θ) ECD
entropy Cov(X i , θ) and
ECDs in the estimated model X1 0.258 0.205
in Table 3.17 X2 0.208 0.172
X3 0.219 0.180
X4 0.215 0.177
Total 0.901 0.474

latent one-dimensional continuum, the positive response probabilities are structured


with threshold parameters to respond positively (successfully) to items. Another
model is constructed with a logit model with location parameters. The location
parameters are introduced to assess the levels of latent classes in one-dimensional
continuum, for example, trait, ability, and so on. In this sense, the model is viewed
as a discrete version of the latent trait model, that is, the Rash model. In the present
chapter, a latent trait model with discriminant parameters and item difficulties, that
is, a two-parameter logistic model, is also treated, and a latent class model approach
to the ML estimation of the parameters is provided, i.e., the ML estimation proce-
dure based on the EM algorithm. The method is demonstrated by using data sets in
Chapter 2. The latent class models in this chapter can deal with more latent classes
than latent class cluster model in Chapter 2. In practical data analyses, it is effec-
tive to use the latent class model that incorporates ordered latent classes into the
model itself, as demonstrated in this chapter. Moreover, it is sensible to challenge to
make new latent class models flexibly for purposes of data analyses, and it leads to
a development of latent class analysis.
References 85

References

1. Croon, M. A. (1990). Latent class analysis with ordered latent classes. British Journal of
Mathematical and Statistical Psychology, 43, 171–192.
2. Dayton, C. M., & Macready, G. B. (1976). A probabilistic model for validation of behavioral
hierarchies. Psychometrika, 43, 189–204.
3. Dayton, C. M., & Macready, G. B. (1980). A scaling model with response errors and intrinsically
unscalable responses. Psychometrika, 45, 343–356.
4. De Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch
models. Journal of Educational Statistics, 11, 183–196.
5. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm (with discussion). J R Stat Soc B, 39, 1–38.
6. Eshima, N., & Asano, C. (1988). On latent distance analysis and the MLE algorithm.
Behaviormetrika, 24, 25–32.
7. Eshima, N., & Asano, C. (1989). Latent ordered class analysis. Bull Comput Stat Jpn, 2, 25–34.
(in Japanese).
8. Eshima, N. (1992). A hierarchical assessment of latent traits by using latent Guttman scaling.
Behaviormetrika, 19, 97–116.
9. Lazarsfeld, P. F., & Henry, N. M. (1968). Latent structure analysis. Boston: Houghton Mifflin.
10. Lindsay, B., Clogg, C., & Grego, J. (1991). Semiparametric estimation in the Rasch model and
related exponential response models, including a simple latent class model for item analysis.
Journal of American Statistical Association, 86, 96–107.
11. Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models: Bi-plots, and
related graphical displays. Sociological Methodology, 31, 223–264.
12. McHugh, R. B. (1956). Efficient estimation of local identification in latent class analysis.
Psychometrika, 20, 331–347.
13. Price, L. C., Dayton, C. M., & Macready, G. B. (1980). Discovery algorithms for hierarchical
relations. Psychometrika, 45, 449–465.
14. Proctor, C. H. (1970). A probabilistic formulation and statistical analysis of Guttman scaling.
Psychometrika, 35, 73–78.
15. Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Illinois: The
University of Chicago Press.
16. Stouffer SA, Toby J (1951) Role conflict and personality. Am J Soc 56:395–406
17. Vermunt, J. K. (2010). Latent class models. Int Encycl Educ, 7, 238–244.
Chapter 4
Latent Class Analysis with Latent Binary
Variables: An Application for Analyzing
Learning Structures

4.1 Introduction

Usual latent class analysis is carried out without any assumptions on latent response
probabilities for test items. In this sense, latent classes in the analysis are treated
parallelly and the analysis is referred to as the latent class cluster analysis [10].
In Chap. 3, latent class analyses with ordered latent classes have been discussed
with models that incorporate the ordered structures into the models themselves. In
latent distance analysis, the response items are ordered with respect to the item levels
(difficulties) that are located in a one-dimensional latent continuum, and an individual
beyond the levels responds to the correspondent items with higher probabilities than
an individual with below the levels. The latent distance model can be applied to
learning studies as well, for example, assessing individuals’ acquisition states of
several skills for solving test binary items. Let X i , i = 1, 2, . . . , I be manifest
response variables corresponding to items i, such that

1 (success to item i)
Xi = , (4.1)
0 ( f ailur e)

and let Si , i = 1, 2, . . . , I be acquisition states of skills i for solving test items i,


such that

1 (acquisition o f skill i)
Si = . (4.2)
0 (non − acquisition)

In this case, the test scales the states of the skill acquisitions, which are not
observed directly, and thus, Si are viewed as latent binary variables. In the above
assumption, the following inequalities for success probabilities for test items are
naturally required:

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 87
N. Eshima, An Introduction to Latent Class Analysis, Behaviormetrics: Quantitative
Approaches to Human Behavior 14,
https://doi.org/10.1007/978-981-19-0972-6_4
88 4 Latent Class Analysis with Latent Binary Variables …

P(X i = 1|Si = 1) > P(X i = 1|Si = 0), i = 1, 2, . . . , I.

If the skills under study have prerequisite relations, for example, skill i prerequisite
to skill i + 1, i = 1, 2, . . . , I , then, the latent states of skill acquisition are

(S1 , S2 , . . . , S I ) = (0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (1, 1, . . . , 1)

and the latent states correspond to latent classes, so the model is the same as the
latent distance model. For example, assuming S1 is the state of addition skill in
arithmetic, S2 that of multiplication skill and S3 that of division skill, then, the
skill of addition is prerequisite to that of multiplication and the skill of multi-
plication is prerequisite to that of division, and the scale patterns (S1 , S2 , S3 ) are
(0, 0, 0), (1, 0, 0), (1, 1, 0), and(1, 1, 1). However, in general cases, skills under
consideration may have such a hierarchical order as the above, for example, for
skills S1 , S2 , S3 , S4 , there may be a case with skill patterns (0, 0, 0, 0),(1, 0, 0, 0),
(1, 1, 0, 0),(1, 1, 0, 0), (1, 0, 1, 0), (1, 1, 1, 0), (1, 1, 1, 1). For treating such cases,
extensions of the latent distance model were proposed by several authors [3–5, 8, 11].
In this chapter, latent class analysis with latent binary variables is discussed.
Section 4.2 reviews latent class models for dealing with scale patterns of the latent
variables. In Sect. 4.3, the ML estimation procedure for a structured latent class
model for explaining learning structures is discussed. Section 4.4 provides numerical
examples to demonstrate the analysis. In Sect. 4.5, an approach to consider learning
or developmental processes is given. Sections 4.6 and 4.7 consider a method for
evaluating mixed ratios of learning processes in a population. In Sect. 4.8, a path
analysis in learning and/or developmental structures is treated, and in Sect. 4.9, a
numerical example is provided to demonstrate the analysis. Finally, in Sect. 4.10,
a summary of the present chapter and discussions on the latent class analysis with
binary latent variables are given for leading to further studies to develop the present
approaches in the future.

4.2 Latent Class Model for Scaling Skill Acquisition


Patterns

In (4.1) and (4.2), let  be the sample space of latent variable (skill or trait acquisition)
vector S = (S1 , S2 , . . . , S I ) and let v(s) be the latent class proportions with latent
variable vector S = s ∈ , where s = (s1 , s2 , . . . , s I ). Then, an extended version
of latent distance model (3.6) was made as follows [5]:

P(X = x) = v(s)P(X = x|S = s), (4.3)
s
4.2 Latent Class Model for Scaling Skill Acquisition Patterns 89

where


I
P(X = x|S = s) = P(X i = xi |Si = si )
i=1
I 
 xi  1−x i
exp(αi + si exp(βi )) 1
=
1 + exp(αi + si exp(βi )) 1 + exp(αi + si exp(βi ))
i=1

I
exp{xi (αi + si exp(βi ))}
= . (4.4)
1 + exp(αi + si exp(βi ))
i=1

In which follows, the term “skill” is employed for convenience of the discussion.
In the above model, the intrusion (guessing) and omission (forgetting) error prob-
abilities for responding to items i, P(X i = 1|Si = 0) and P(X i = 0|Si = 1), are,
respectively, expressed as follows:

exp(αi )
P(X i = 1|Si = 0) = , P(X i = 0|Si = 1)
1 + exp(αi )
1
= , i = 1, 2, . . . , I. (4.5)
1 + exp(αi + exp(βi ))

Considering responses to test items, the following inequalities are needed.

P(X i = 1|Si = 0) < P(X i = 1|Si = 1), i = 1, 2, . . . , I. (4.6)

The above inequalities are satisfied by the structured model (4.5), so this model
is an extension of the following three models.
As reviewed in Chap. 3, in Proctor [11], the intrusion and omission error
probabilities were given by

P(X i = 1|Si = 0) = P(X i = 0|Si = 1) = π L , i = 1, 2, . . . , I. (4.7)

In this model, the intrusion and omission error probabilities are constant through
test items. Following the above model, in Macready and Dayton [3], the following
error probabilities are used:

P(X i = 1|Si = 0) = π L , P(X i = 0|Si = 1) = 1 − π H , i = 1, 2, . . . , I. (4.8)

In the above model, the intrusion and omission error probabilities are, respectively,
constant through the items. In Dayton and Macready [4],

P(X i = 1|Si = 0) = P(X i = 0|Si = 1) = π Li , i = 1, 2, . . . , I. (4.9)


90 4 Latent Class Analysis with Latent Binary Variables …

In this model, the intrusion and omission error probabilities are equal for each test
item. The above three models do not satisfy the inequalities (4.7) in the parameter
estimation, without making any structures as model (4.5). In the next section, an
ML estimation procedure for model (4.3) with (4.4) is given according to the EM
algorithm.

4.3 ML Estimation Procedure for Model (4.3) with (4.4)

For a practical convenience, it is assumed the sample space of latent vari-


able vector S = (S1 , S2 , . . . , S I )T , , includes all skill acquisition patterns,
(0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (1, 1, . . . , 1). Let φ = ((v(s)), (αi ), (βi )) be the
parameter vector of the latent class model (4.5), the EM algorithm for obtaining the
ML estimates of the parameters is given as follows:
EM algorithm
(i) E-step
     T
Let t φ = t v(s) , t αi , t βi be the estimate of parameter vector φ at the t
th iteration in the EM algorithm. Then, the conditional
   expectations
 of complete
data (n(x, s)) for given parameters t φ = t v(s) , t αi , t βi are calculated in the
(t + 1) th iteration as follows:
I
t
v(s) t
P(X = x|S = s)
t+1
n(x, s) = n(x)
i=1
I , s ∈ , (4.10)
t v(u) t P(X = x|S = u)
u i=1

where
  
exp xi t αi + si exp t βi
t
P(X = x|S = s) = , xi = 0, 1.
1 + exp(t αi + si exp(t βi ))

In (4.10), notation u implies the summation over all patterns u ∈ .


(ii) M-step
t+1 
The loglikelihood function of parameter vector φ for complete data n(x, s) is
given by
   
Q φ| t φ = l φ| t+1 n(x, s) = t+1 n(x, s)logv(s)
s x
⎡ ⎤
 I
+ t+1 n(x, s) ⎣ {xi (αi + si exp(βi )) − log(1 + exp(αi + si exp(βi )))}⎦. (4.11)
s x i=1
4.3 ML Estimation Procedure for Model (4.3) with (4.4) 91

Based on a similar discussion in the previous chapter, we have the estimates


t+1
v(s) as follows:

t+1
n(x, s)
t+1
v(s) = x
, s ∈ . (4.12)
N
With respect to parameters αi and βi , the Newton–Raphson method has to be
employed for maximizing Q φ|t φ in the M-step. Let φ (α,β) = ((αi ), (βi )), and
let u th iterative value of φ (α,β) in the M-step be φ (α,β)u = ((αiu ), (βiu )), where
   
φ (α,β)1 = t αi , t βi , Then, φ (α,β)u+1 is obtained as follows:

φ (α,β)u+1 = φ (α,β)u − H −1
u g u , u = 1, 2, . . . , (4.13)

where g u and H u are values


 of the gradient vector and the Hessian matrix at
φ = t+1 v(s) , φ (α,β)u . From this algorithm, we can get u → ∞limφ (α,β)u =
t+1  t+1 
αi , βi .
Remark 4.1 The gradient vector and the Hessian matrix in the above M-step are set
as follows:
⎛ ⎞ ⎛ 2  2 ⎞
∂ Q (φ| t φ ) ∂ Q (φ| t φ ) ∂ Q (φ| t φ )
∂α ∂α ∂α ∂α ∂β
g = ⎝ ∂ Q (φ|i t φ )  ⎠, H = ⎝ ∂ 2 Q (i φ|t jφ )  ∂ 2 Q (i φ|tiφ )  ⎠, (4.14)
∂αi ∂βi ∂αi ∂βi ∂β j

where
 
∂ Q φ| t φ 
= t+1
n(x, s)(xi − P(X i = 1|Si = si )),
∂αi s x
i = 1, 2, . . . , I ;
 
∂ Q φ| t φ 
= t+1
n(x, s)(xi − P(X i = 1|Si = si ))si exp(βi ),
∂βi s x
i = 1, 2, . . . , I ;
 
∂ 2 Q φ|t φ 
=− t+1
n(x, s)P(X i = 1|Si = si )(1 − P(X i = 1|Si = si )),
∂ai2 s x
i = 1, 2, . . . , I ;
 
∂ 2 Q φ|t φ 
=− t+1 n(x, s)P(X = 1|S = s )(1 − P(X = 1|S = s ))s exp(β ),
i i i i i i i i
∂αi ∂βi s x
i = 1, 2, . . . , I ;
92 4 Latent Class Analysis with Latent Binary Variables …

  
∂ 2 Q φ t φ 
t+1
= n(x, s){xi − P(X i = 1|Si = si )
∂βi2 s x
− P(X i = 1|Si = si )(1 − P(X i = 1|Si = si ))si exp(βi )}si exp(βi ),
i = 1, 2, . . . , I ;

∂ 2 Q(φ|s φ) ∂ 2 Q(φ|s φ) ∂ 2 Q(φ|s φ)


= = = 0, i = j.
∂αi ∂α j ∂αi ∂β j ∂βi ∂β j

The above algorithm has been made, where the latent sample space  has all the
2 I skill acquisition patters; however, to identify the latent class model, the number
of latent classes A are restricted by

A < 2 I − 2I. (4.15)

The above algorithm has the following property. If we set the initial value of class
proportion v(s) as 0 v(s) = 0, from (4.10) we have 1 n(x, s) = 0 for all the manifest
response patterns x. From (4.12) it follows that 1 v(s) = 0, and inductively, we obtain
t
v(s) = 0, t = 1.2, 3, . . .

Hence, if we set 0 v(s) = 0 for all skill acquisition patterns s ∈  − 0 in order to


identify the model, where 0 is a set of skill acquisition patterns assumed beforehand,
the class proportions are automatically set as zeroes, and the above algorithm can
work effectively to get the ML estimates of the model parameters. For example, if
for I = 4, we set

0 = {(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)},

the above algorithm can be used for estimating the latent distance model. In the
present latent class analysis, it is meaningful to detect latent classes (skill acquisition
patterns) s with positive class proportions v(s) > 0. In the next section, through
numerical examples with practical data sets in Chap. 2, an exploratory method for
determining the latent classes is demonstrated.

4.4 Numerical Examples (Exploratory Analysis)

By using the Stouffer-Toby data (Table 2.1), McHugh data (Table 2.5), and
Lazarsfeld-Stouffer data (Table 2.7), the present latent class analysis is demonstrated.
From restriction (4.15) with I = 4, we have A < 8, so the maximum number of latent
classes is seven. From this, considering response data in Tables 2.1, , 2.5, and 2.7,
the following skill acquisition patterns (latent classes) are assumed in the data sets
4.4 Numerical Examples (Exploratory Analysis) 93

Table 4.1 The sets of initial skill acquisition patterns (0 ) for the three data sets
Stouffer-Toby data (0, 0, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1), (0, 1, 0, 1), (1, 1, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)
McHugh data (0, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)
Lazarsfeld-Stouffer (0, 0, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)
data

(Table 4.1) as the initial skill acquisition patterns (latent classes). In order to select
the best model, a backward elimination procedure is used. Let 0 be the initial set of
skill acquisition patterns, for example, in Table 4.1; M(0 ) be the initial model with
 

0 ; and let M (0 ) be the ML estimate of M(0 ). According to M (0 ), the latent


class with the minimum proportion v (s) is deleted from the initial skill acquisition
patterns. Let 1 be the set of the patterns left, and let M(1 ) be the model with 1 .
 

Then, the ML estimates M (0 ) and M (1 ) are compared with the log likelihood
ratio test or the Pearson chi-square test, and if the results are statistically significant
with significance level α, then, M(1 ) is accepted and the procedure continues simi-
larly by setting 1 as the initial skill acquisition pattern set, whereas if the results are
not significant, the procedure stops and model M(0 ) is selected as a most suitable
model. The algorithm is shown as follows:
Backward Elimination Procedure
(i) Set Ω_0 as the initial set of skill acquisition patterns and set k = 0.
(ii) Obtain the ML estimate M̂(Ω_0).
(iii) Delete the pattern s_k with the minimum value of v̂(s) from Ω_k and set Ω_{k+1} = Ω_k \ {s_k}.
(iv) Obtain M̂(Ω_{k+1}).
(v) If M̂(Ω_{k+1}) is accepted relative to M̂(Ω_k) (i.e., not significantly worse) according to the log-likelihood ratio test for the relative goodness-of-fit to the data set, go to (iii) with k replaced by k + 1; if not, the procedure stops.
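The procedure above can be sketched in code. The following Python fragment is an illustration only: `fit_skill_lc_model` is a hypothetical routine that fits latent class model (4.3) for a given set of skill acquisition patterns and returns the maximized log-likelihood, the number of free parameters, and the estimated class proportions; it is not part of any published package.

```python
from scipy.stats import chi2

def backward_elimination(data, omega0, alpha=0.05):
    """Backward elimination over skill acquisition patterns (Sect. 4.4).

    `fit_skill_lc_model` is assumed to return (loglik, n_params, proportions),
    where `proportions` maps each pattern in the current set to its estimate.
    """
    current = list(omega0)
    loglik, n_par, props = fit_skill_lc_model(data, current)
    while len(current) > 2:                      # keep at least two latent classes
        weakest = min(current, key=lambda s: props[s])
        reduced = [s for s in current if s != weakest]
        loglik_r, n_par_r, props_r = fit_skill_lc_model(data, reduced)
        lr = 2.0 * (loglik - loglik_r)           # log-likelihood ratio statistic
        df = n_par - n_par_r
        p_value = chi2.sf(lr, df)
        if p_value < alpha:                      # reduced model fits significantly worse
            break                                # keep the current (larger) model
        current, loglik, n_par, props = reduced, loglik_r, n_par_r, props_r
    return current, props
```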
According to the above procedure, we have the final models shown in Table 4.2. For the Stouffer-Toby data and the Lazarsfeld-Stouffer data, the results are the same as those in Tables 2.3 and 2.9, respectively. It may be said that, concerning the Stouffer-Toby data, there exist latent “universalistic” and “particularistic” states for responding to the test items, and with respect to the Lazarsfeld-Stouffer data, latent “favorable” and “unfavorable” states toward the Army. Hence, it means that all four skills (traits) are equal, i.e., S1 = S2 = S3 = S4. The learning structure is expressed as

(0, 0, 0, 0) → (1, 1, 1, 1).

For the McHugh data, the results are interpreted as S1 = S2 and S3 = S4, the learning structure can be expected as in Fig. 4.1, and the following two learning processes can be assumed:

(i) (0, 0, 0, 0) → (1, 1, 0, 0) → (1, 1, 1, 1) and



Table 4.2 The results of the analysis of the three data sets
Item positive response probability
Pattern* Proportion** X1 X2 X3 X4
Stouffer-Toby (0, 0, 0, 0) 0.279 0.007 0.060 0.073 0.231
(1, 1, 1, 1) 0.721 0.286 0.670 0.646 0.868
Test of GF*** G 2 = 2.720, d f = 6, P = 0.843
McHugh (0, 0, 0, 0) 0.396 0.239 0.244 0.112 0.204
(1, 1, 0, 0) 0.077 0.894 0.996 0.112 0.204
(0, 0, 1, 1) 0.200 0.239 0.244 0.979 0.827
(1, 1, 1, 1) 0.327 0.894 0.996 0.979 0.827
Test of GF*** G 2 = 1.100, d f = 4, P = 0.894
Lazarsfeld-Stouffer (0, 0, 0, 0) 0.445 0.093 0.386 0.442 0.499
(1, 1, 1, 1) 0.555 0.572 0.818 0.906 0.944
Test of GF*** G 2 = 8.523, d f = 6, P = 0.202
* Skill Acquisition Pattern; ** Class Proportion; ***Test of Goodness-of-Fit

Fig. 4.1 The learning structure in the McHugh data set

(ii) (0, 0, 0, 0) → (0, 0, 1, 1) → (1, 1, 1, 1) (4.16)

In this case, it is significant to discuss the proportions of subpopulations according to the above two learning processes. The topic is treated in the next section.

4.5 Dynamic Interpretation of Learning (Skill Acquisition) Structures

Let S_i be the skill acquisition states of skill i = 1, 2, . . . , I, and let skill i be prerequisite to skill i + 1, i = 1, 2, . . . , I − 1. To discuss a dynamic interpretation of learning structures, the following notation is introduced:

S_1 → S_2 → · · · → S_I.   (4.17)

In the above prerequisite relation, the sample space of S = (S_1, S_2, . . . , S_I) is

Ω = {(0, 0, . . . , 0), (1, 0, . . . , 0), . . . , (1, 1, . . . , 1)},   (4.18)



and the space is called a learning space in this book. As in the previous section, notation (sequence) (4.17) can also be expressed by using skill acquisition patterns in learning space Ω:

(0, 0, . . . , 0) → (1, 0, . . . , 0) → · · · → (1, 1, . . . , 1).

In this case, the model is the latent distance model discussed in the previous chapter. Hence, the conditional probabilities P(S_{i+1} = s_{i+1} | (S_1, S_2, . . . , S_i) = (s_1, s_2, . . . , s_i)) are given by

P(S_{i+1} = s_{i+1} | (S_1, S_2, . . . , S_i) = (s_1, s_2, . . . , s_i))
  = 0                                  (s_i = 0, s_{i+1} = 1),
  = 1                                  (s_i = 0, s_{i+1} = 0),
  = P(S_{i+1} = s_{i+1} | S_i = s_i)   (s_i = 1),
for i = 1, 2, . . . , I − 1.

Since, in the sequence of latent variables {S_i} in (4.17), S_{i+1} depends only on the state of S_i by the above discussion, we have the following theorem:

Theorem 4.1 Sequence (4.17) is a Markov chain. □

From the above discussion, the prerequisite relations among the skills to be scaled can be interpreted as the learning process shown in (4.17). Below, “structure” and “process” will be used interchangeably where appropriate. Let

q_{s_i s_{i+1}, i} = P(S_{i+1} = s_{i+1} | S_i = s_i),  i = 1, 2, . . . , I − 1.

The transition matrix Q_i from S_i to S_{i+1} is given by

Q_i = ( 1         0
        q_{10,i}  q_{11,i} ),  i = 1, 2, . . . , I − 1.   (4.19)

Although the sequence {S_i} is not observed at successive points in time, Theorem 4.1 induces a dynamic interpretation of learning space (4.18). The transition probability q_{11,i} implies the intensity of acquisition of skill i + 1 given skill i, i = 1, 2, . . . , I − 1. We can give it the following dynamic interpretation: if an individual acquires skill i, then the individual acquires skill i + 1 with probability q_{11,i}. We have the following theorem [6]:
Theorem 4.2 The transition probabilities q_{10,i} and q_{11,i} of Markov chain (4.17) are expressed as follows:

q_{10,i} = v(1, . . . , 1, 0, . . . , 0) / Σ_{k=i}^{I} v(1, . . . , 1, 0, . . . , 0),  q_{11,i} = 1 − q_{10,i},  i = 1, 2, . . . , I − 1,

where the pattern in the numerator has i ones and the pattern in the k-th summand has k ones.

Proof  q_{10,i} = P(S_{i+1} = 0 | S_i = 1) = P(S_i = 1, S_{i+1} = 0) / P(S_i = 1).

Let the subset Ω_i ⊂ Ω be defined by

Ω_i = {(1, . . . , 1, 0, . . . , 0) with k ones, k = i, i + 1, . . . , I}.

Then, in Markov chain (4.17), it follows that

S_i = 1 ⟺ S = (S_1, S_2, . . . , S_I) ∈ Ω_i

and

S_i = 1, S_{i+1} = 0 ⟺ S = (1, . . . , 1, 0, . . . , 0) with i ones.

From this, we have

P(S_i = 1) = P((S_1, S_2, . . . , S_I) ∈ Ω_i) = Σ_{k=i}^{I} v(1, . . . , 1, 0, . . . , 0) (k ones),

P(S_i = 1, S_{i+1} = 0) = P((S_1, S_2, . . . , S_I) = (1, . . . , 1, 0, . . . , 0) with i ones) = v(1, . . . , 1, 0, . . . , 0) (i ones).

Thus, the theorem follows. □


The probabilities q_{11,i} are regarded as the path coefficients of the paths S_i → S_{i+1}, i = 1, 2, . . . , I − 1, and the direct effects of S_i on S_{i+1} are defined by the path coefficients. The path diagram of (4.17) can be illustrated as follows:

S_1 →(q_{11,1}) S_2 →(q_{11,2}) · · · →(q_{11,I−1}) S_I.

Moreover, we have the following theorem.

Theorem 4.3 In (4.17), the following formulae hold true:

P(S_j = 1 | S_i = 1) = Π_{k=i}^{j−1} q_{11,k},  j > i.   (4.20)

Proof Since sequence (4.17) is a Markov chain with transition matrices (4.19), the theorem follows. □

The probabilities in (4.20) are calculated by multiplying the related path coefficients, so we have the following definition:

Definition 4.1 In learning structure (4.17), for j > i, the probabilities P(S_j = 1 | S_i = 1) in (4.20) are defined as the pathway effects of S_i on S_j through path

S_i →(q_{11,i}) S_{i+1} →(q_{11,i+1}) · · · →(q_{11,j−1}) S_j.

The pathway effects are denoted by e_path(S_i → S_{i+1} → · · · → S_j).

In the above definition, the paths S_i → S_{i+1} → · · · → S_j are partial paths of (4.17). If path S_{i1} → S_{i2} → · · · → S_{ik} is not a partial path of (4.17), then we set

e_path(S_{i1} → S_{i2} → · · · → S_{ik}) = 0.

The above discussion is applied to the results of the latent distance analysis (Table 3.2) of the Stouffer-Toby data set (Table 2.1). Since

v̂(0, 0, 0, 0) = 0.296, v̂(1, 0, 0, 0) = 0.344, v̂(1, 1, 0, 0) = 0.103,
v̂(1, 1, 1, 0) = 0.049, v̂(1, 1, 1, 1) = 0.208,

from Theorem 4.2, we have

q̂_{10,1} = v̂(1, 0, 0, 0) / (v̂(1, 0, 0, 0) + v̂(1, 1, 0, 0) + v̂(1, 1, 1, 0) + v̂(1, 1, 1, 1)) = 0.489,  q̂_{11,1} = 0.511,

q̂_{10,2} = v̂(1, 1, 0, 0) / (v̂(1, 1, 0, 0) + v̂(1, 1, 1, 0) + v̂(1, 1, 1, 1)) = 0.286,  q̂_{11,2} = 0.714,

q̂_{10,3} = v̂(1, 1, 1, 0) / (v̂(1, 1, 1, 0) + v̂(1, 1, 1, 1)) = 0.191,  q̂_{11,3} = 0.809.

The path coefficients are illustrated with the sequence of S_i, i = 1, 2, 3, 4, and we have

S_1 →(0.511) S_2 →(0.714) S_3 →(0.809) S_4.   (4.21)

According to Theorem 4.3, for example, the pathway effect of S_1 on S_4 is calculated as

Table 4.3 Pathway effects of S_i on S_j, i < j, in sequence (4.21)
      S2     S3     S4
S1    0.511  0.375  0.295
S2    –      0.714  0.578
S3    –      –      0.809

0.511 × 0.714 × 0.809 = 0.295.

All the pathway effects in sequence (4.21) are shown in Table 4.3.
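The calculations in Theorems 4.2 and 4.3 are easy to script. The following Python sketch (NumPy only) reproduces the estimates above from the class proportions of Table 3.2; it is an illustration, not code from the book.

```python
import numpy as np

# Class proportions of the latent distance model (Table 3.2), indexed by the
# number of acquired skills: v_hat[k] is v(1,...,1,0,...,0) with k ones, k = 0,...,4.
v_hat = np.array([0.296, 0.344, 0.103, 0.049, 0.208])
I = 4

# Theorem 4.2: transition probabilities of the latent Markov chain S1 -> ... -> SI.
q11 = np.empty(I - 1)
for i in range(1, I):                       # i = 1, ..., I-1
    q10 = v_hat[i] / v_hat[i:].sum()        # v(i ones) / sum_{k >= i} v(k ones)
    q11[i - 1] = 1.0 - q10

# Theorem 4.3: the pathway effect of S_i on S_j (j > i) is the product of the
# path coefficients q11 along the partial path S_i -> ... -> S_j.
def pathway_effect(i, j):
    return np.prod(q11[i - 1:j - 1])

print(np.round(q11, 3))                     # approx. [0.511, 0.714, 0.809]
print(round(pathway_effect(1, 4), 3))       # approx. 0.295 (cf. Table 4.3)
```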
For McHugh data, the results of latent class analysis show there are two learning
processes (4.16) (Fig. 4.1). In this case, for S1 (= S2 ) and S3 (= S4 ), the learning
structure is a mixture of the following processes:

(i)S1 → S3 , (ii)S3 → S1 , (4.22)

and it is meaningful to consider the mixed ratios of the learning processes in the
population. To treat such cases, the next section discusses general learning structures.

4.6 Estimation of Mixed Proportions of Learning Processes

Suppose that there exist the following three learning processes:

(i)   S1 → S2 → S3 → S4,
(ii)  S3 → S2 → S1 → S4,      (4.23)
(iii) S3 → S1 → S2 → S4.

Then, the sample space of (S1, S2, S3, S4) is

Ω = {(0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 1, 0), (1, 1, 0, 0), (1, 0, 1, 0), (0, 1, 1, 0), (1, 1, 1, 0), (1, 1, 1, 1)},   (4.24)

so the above learning structure is expressed with skill acquisition patterns (s1, s2, s3, s4) (Fig. 4.2). It is assumed that the population is divided into three subpopulations, each of which depends on one of the three learning processes (structures) in (4.23). Let v(s1, s2, s3, s4, Process k) be the proportion of individuals with skill acquisition pattern (s1, s2, s3, s4) and learning process k, k = 1, 2, 3. Then, in general, it follows that

v(s1, s2, s3, s4) = Σ_{k=1}^{3} v(s1, s2, s3, s4, Process k).   (4.25)

Fig. 4.2 Path diagram of (4.23) based on the sample space (4.24)

In Fig. 4.2, the following equations hold:

v(0, 0, 0, 0) = Σ_{k=1}^{3} v(0, 0, 0, 0, Process k),
v(1, 1, 1, 0) = Σ_{k=1}^{3} v(1, 1, 1, 0, Process k),      (4.26)
v(1, 1, 1, 1) = Σ_{k=1}^{3} v(1, 1, 1, 1, Process k),

v(0, 0, 1, 0) = v(0, 0, 1, 0, Process 2) + v(0, 0, 1, 0, Process 3),   (4.27)

v(1, 0, 0, 0) = v(1, 0, 0, 0, Process 1),
v(1, 1, 0, 0) = v(1, 1, 0, 0, Process 1),
v(0, 1, 1, 0) = v(0, 1, 1, 0, Process 2),      (4.28)
v(1, 0, 1, 0) = v(1, 0, 1, 0, Process 3).

In (4.23), although each sequence is a Markov chain, the parameters v(s1, s2, s3, s4, Process k) in (4.26) and (4.27) are not identified, so we have to impose a constraint on the parameters. Let w_k, k = 1, 2, 3 be the proportions of the subpopulations with learning processes (4.23). Then,

Σ_{k=1}^{3} w_k = 1.

In order to identify the parameters v(s1, s2, s3, s4, Process k), in this chapter the following constraint is placed on the parameters. If a skill acquisition pattern is derived from several learning processes, it is assumed that the proportions of individuals with that skill acquisition pattern derived from those learning processes are in proportion to the related proportions w_k. For example, skill acquisition pattern (0, 0, 1, 0) comes from learning processes 2 and 3 (4.27), so we have the following equations:

v(0, 0, 1, 0, Process 2) = w_2/(w_2 + w_3) · v(0, 0, 1, 0),
v(0, 0, 1, 0, Process 3) = w_3/(w_2 + w_3) · v(0, 0, 1, 0) = v(0, 0, 1, 0) − v(0, 0, 1, 0, Process 2).

Under the above assumption, for learning process 1, we obtain

v(0, 0, 0, 0, Process 1) = w_1 v(0, 0, 0, 0),
v(1, 0, 0, 0, Process 1) = v(1, 0, 0, 0),
v(1, 1, 0, 0, Process 1) = v(1, 1, 0, 0),      (4.29)
v(1, 1, 1, 0, Process 1) = w_1 v(1, 1, 1, 0),
v(1, 1, 1, 1, Process 1) = w_1 v(1, 1, 1, 1).

From (4.29), we have

w_1 = v(0, 0, 0, 0, Process 1) + v(1, 0, 0, 0, Process 1) + v(1, 1, 0, 0, Process 1) + v(1, 1, 1, 0, Process 1) + v(1, 1, 1, 1, Process 1)
    = w_1 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)) + v(1, 0, 0, 0) + v(1, 1, 0, 0).   (4.30)

Similarly, it follows that

w_2 = w_2 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)) + w_2/(w_2 + w_3) · v(0, 0, 1, 0) + v(0, 1, 1, 0),   (4.31)

w_3 = w_3 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)) + w_3/(w_2 + w_3) · v(0, 0, 1, 0) + v(1, 0, 1, 0).   (4.32)

In this chapter, the above Eqs. (4.30)–(4.32) are called separating equations for evaluating the mixed proportions w_k, k = 1, 2, 3. From the above equations, we get

w_1 = (v(1, 0, 0, 0) + v(1, 1, 0, 0)) / (1 − (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1))),
w_2 = (1 − w_1) · v(0, 1, 1, 0) / (v(0, 1, 1, 0) + v(1, 0, 1, 0)),      (4.33)
w_3 = 1 − w_1 − w_2.

Remark 4.2 Let us consider the solution of the separating equations in (4.33). It is seen that

0 < w_1 < 1

and

0 < w_2 < 1 − w_1 < 1,

so we have

0 < w3 < 1.

Hence, solution (4.33) is proper, and such solutions are called proper solutions.
Properties of the separating equations are discussed generally in the next section.
By using the above method, the mixed proportions of learning processes 1 and 2 in (4.22) for the McHugh data are calculated. From Table 4.2, we have the following equations:

w_1 = w_1 (v(0, 0, 0, 0) + v(1, 1, 1, 1)) + v(1, 1, 0, 0),
w_2 = 1 − w_1.

From the above equations, we have the following solution:

w_1 = v(1, 1, 0, 0) / (1 − (v(0, 0, 0, 0) + v(1, 1, 1, 1))),  w_2 = v(0, 0, 1, 1) / (1 − (v(0, 0, 0, 0) + v(1, 1, 1, 1))).   (4.34)

Hence, from (4.34) and Table 4.2, the estimates of the mixed proportions are calculated as follows:

ŵ_1 = 0.077 / (1 − (0.396 + 0.327)) = 0.278,  ŵ_2 = 1 − ŵ_1 = 0.722.
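For the two-process structure (4.22), the separating equations reduce to the closed form (4.34); the following two-line check with the Table 4.2 estimates is purely illustrative.

```python
v0000, v1100, v0011, v1111 = 0.396, 0.077, 0.200, 0.327   # class proportions, Table 4.2 (McHugh)
w1 = v1100 / (1.0 - (v0000 + v1111))                      # Process 1: skills S1 = S2 acquired first
w2 = v0011 / (1.0 - (v0000 + v1111))                      # Process 2: skills S3 = S4 acquired first
print(round(w1, 3), round(w2, 3))                         # 0.278 0.722
```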

4.7 Solution of the Separating Equations

The separating equations introduced in the previous section for estimating the
mixed proportions of learning processes are considered in a framework of learning
structures. First, the learning structures are classified into two types.
Definition 4.2 If all learning processes in a population have skill acquisition patterns
peculiar to them, the learning structure is called a clear learning structure. If not, the
learning structure is referred to as an unclear learning structure.
In the above definition, learning structures (4.16) and (4.23) are clear learning
structures, as shown in Figs. 4.1 and 4.2. On the other hand, the following learning
structure is an unclear one:

Process 1 (w_1): S1 → S2 → S3 → S4,
Process 2 (w_2): S2 → S1 → S3 → S4,      (4.35)
Process 3 (w_3): S2 → S3 → S1 → S4.

Fig. 4.3 a Path diagram of (4.35). b Path diagram of the learning structure with Processes 1 and 3 in (4.35)

From the above structure, Fig. 4.3a is made. From the figure, learning processes 1 and 3 have skill acquisition patterns (1, 0, 0, 0) and (0, 1, 1, 0) peculiar to them, respectively; however, there are no skill acquisition patterns peculiar to Process 2. Even if Process 2 is deleted from (4.35), the sample space of (S1, S2, S3, S4) is the same as that of (4.35); however, the structure is then expressed as in Fig. 4.3b, and that structure is a clear one. With respect to learning structure (4.35), from Fig. 4.3a, we have the following separating equations:

w_1 = w_1/(w_1 + w_2) · v(1, 1, 0, 0) + w_1 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)) + v(1, 0, 0, 0),

w_2 = w_2/(w_1 + w_2) · v(1, 1, 0, 0) + w_2/(w_2 + w_3) · v(0, 1, 0, 0) + w_2 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)),

w_3 = w_3/(w_2 + w_3) · v(0, 1, 0, 0) + w_3 (v(0, 0, 0, 0) + v(1, 1, 1, 0) + v(1, 1, 1, 1)) + v(0, 1, 1, 0).

From the above equations, we have the following solution:

w_1 = v(1, 0, 0, 0) / (v(1, 0, 0, 0) + v(1, 1, 0, 0)),  w_2 = 1 − w_1 − w_3,  w_3 = v(0, 1, 1, 0) / (v(0, 1, 0, 0) + v(0, 1, 1, 0)).   (4.36)

In this solution, we see that 0 < w_1 < 1 and 0 < w_3 < 1; however, there are cases where the condition 0 < w_2 < 1 does not hold. For example, if v(1, 0, 0, 0) = 0.1, v(1, 1, 0, 0) = 0.1, v(0, 1, 1, 0) = 0.3, and v(0, 1, 0, 0) = 0.1, then we have

w_1 = 0.5, w_2 = −0.25, w_3 = 0.75.

The above solution is improper. If we set w_2 = 0, that is, if the clear learning structure shown in Fig. 4.3b is adopted, the mixed proportions w_k are calculated as follows:

w_1 = (v(1, 0, 0, 0) + v(1, 1, 0, 0)) / (v(1, 0, 0, 0) + v(0, 1, 0, 0) + v(1, 1, 0, 0) + v(0, 1, 1, 0)),  w_2 = 0,  w_3 = 1 − w_1.

The above solution is viewed as a proper solution for learning structure with
Processes 1 and 3 in (4.35), and is referred to as a boundary solution for learning
structure (4.35). With respect to the separating equations, in general we have the
following theorem:

Theorem 4.4 Let a clear learning structure be made of K learning processes, and let w_k, k = 1, 2, . . . , K be the mixed proportions of the processes. Then, the set of separating equations has a proper solution such that

w_k > 0, k = 1, 2, . . . , K,

and

Σ_{k=1}^{K} w_k = 1.   (4.37)

Proof Suppose that a clear learning structure consists of Process 1, Process 2, …, and Process K. Let Ω be the sample space of skill acquisition patterns s = (s_1, s_2, . . . , s_I) in the clear learning structure; let Ω_k (≠ ∅), k = 1, 2, . . . , K be the set of all skill acquisition patterns peculiar to Process k; and let w_k be the proportions of individuals according to Process k in the population. Then, the separating equations are expressed as follows:

w_k = Σ_{s ∈ Ω_k} v(s) + f_k(w_1, w_2, . . . , w_K | v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k),  k = 1, 2, . . . , K,   (4.38)

where the f_k(w_1, w_2, . . . , w_K | v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k) are positive and continuous functions of w_k, k = 1, 2, . . . , K, given v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k. From Ω_k (≠ ∅), we have

Σ_{s ∈ Ω_k} v(s) > 0,  k = 1, 2, . . . , K.

Let us consider the following function, (w_1, w_2, . . . , w_K) → (u_1, u_2, . . . , u_K):


u_k = Σ_{s ∈ Ω_k} v(s) + f_k(w_1, w_2, . . . , w_K | v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k),  k = 1, 2, . . . , K.   (4.39)

For w_k > 0, k = 1, 2, . . . , K, from (4.39) we have

u_k > Σ_{s ∈ Ω_k} v(s) > 0,  k = 1, 2, . . . , K,

and from the definition of the separating equations, it follows that

Σ_{k=1}^{K} u_k = 1.

From the above discussion, the function (w_1, w_2, . . . , w_K) → (u_1, u_2, . . . , u_K) is continuous on the domain

D = {(w_1, w_2, . . . , w_K) | Σ_{k=1}^{K} w_k = 1; w_k ≥ Σ_{s ∈ Ω_k} v(s) > 0}.

The above function can be regarded as a function D → D. Since set D is convex and closed, from Brouwer's fixed point theorem, there exists a point (w_{01}, w_{02}, . . . , w_{0K}) such that

w_{0k} = Σ_{s ∈ Ω_k} v(s) + f_k(w_{01}, w_{02}, . . . , w_{0K} | v(s), s ∈ Ω \ ∪_{k=1}^{K} Ω_k) > 0,  k = 1, 2, . . . , K.

Hence, the theorem follows. □


In general, we have the following theorem.

Theorem 4.5 Let a learning structure be made of K learning processes, and let w_k, k = 1, 2, . . . , K be the mixed proportions of the processes. Then, the set of separating equations has a solution such that

w_k ≥ 0, k = 1, 2, . . . , K,   (4.40)

and

Σ_{k=1}^{K} w_k = 1.

Proof For a clear learning structure, the theorem follows from Theorem 4.4. On the other hand, for an unclear learning structure, deleting some learning processes from the structure, that is, setting the mixed proportions of the corresponding learning processes to zero, w_k = 0, yields a clear learning structure that has the same sample space of skill acquisition patterns as the original unclear learning structure. Then, we have a solution as in (4.40). This completes the proof. □
A general method for obtaining solutions of the separating equations is given below. In general, a system of separating equations is expressed as follows:

w_k = g_k(w_1, w_2, . . . , w_K | v(s), s ∈ Ω),  k = 1, 2, . . . , K.   (4.41)

The following function (w_1, w_2, . . . , w_K) → (u_1, u_2, . . . , u_K):

u_k = g_k(w_1, w_2, . . . , w_K | v(s), s ∈ Ω),  k = 1, 2, . . . , K   (4.42)

is viewed as a continuous function C → C, where C is an appropriate convex and closed set; for example, in learning structure (4.35), from Fig. 4.3a, we can set

C = {(w_1, w_2, w_3) | Σ_{k=1}^{3} w_k = 1, w_1 + w_2 ≥ v(1, 0, 0, 0) + v(1, 1, 0, 0), w_3 ≥ v(0, 1, 1, 0)},

and the above set is convex and closed. Then, function (4.42) has a fixed point (w_1, w_2, w_3) ∈ C. From this, the fixed point can be obtained as the limit of the following sequence (w_{n1}, w_{n2}, . . . , w_{nK}), n = 1, 2, . . . :

w_{n+1,k} = g_k(w_{n1}, w_{n2}, . . . , w_{nK} | v(s), s ∈ Ω),  k = 1, 2, . . . , K;  n = 1, 2, . . . .   (4.43)
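As a concrete illustration of iteration (4.43), the following Python sketch solves the separating equations of learning structure (4.35) by fixed-point iteration; the function names and the dictionary of class proportions are ad hoc choices for the example, not conventions from the book.

```python
def separating_map(w, v):
    """One application of the map (4.42) for learning structure (4.35).

    w = (w1, w2, w3); v maps skill acquisition patterns to class proportions.
    """
    w1, w2, w3 = w
    shared = v[(0, 0, 0, 0)] + v[(1, 1, 1, 0)] + v[(1, 1, 1, 1)]   # patterns common to all processes
    u1 = w1 * shared + w1 / (w1 + w2) * v[(1, 1, 0, 0)] + v[(1, 0, 0, 0)]
    u2 = (w2 * shared + w2 / (w1 + w2) * v[(1, 1, 0, 0)]
          + w2 / (w2 + w3) * v[(0, 1, 0, 0)])
    u3 = w3 * shared + w3 / (w2 + w3) * v[(0, 1, 0, 0)] + v[(0, 1, 1, 0)]
    return (u1, u2, u3)

def solve_separating_equations(v, w0=(1/3, 1/3, 1/3), tol=1e-10, max_iter=10_000):
    """Fixed-point iteration (4.43): w_{n+1} = g(w_n)."""
    w = w0
    for _ in range(max_iter):
        u = separating_map(w, v)
        if max(abs(a - b) for a, b in zip(u, w)) < tol:
            return u
        w = u
    return w
```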

4.8 Path Analysis in Learning Structures

Let S_i, i = 1, 2, . . . , I be the skill acquisition states of skill i, and let S_{k_j}, j = 1, 2, . . . , I be those of skill k_j in Process k, k = 1, 2, . . . , K, where {S_i, i = 1, 2, . . . , I} = {S_{k_j}, j = 1, 2, . . . , I}. Then, we have the following learning processes:

Process k (w_k): S_{k_1} → S_{k_2} → · · · → S_{k_I},  k = 1, 2, . . . , K;   (4.44)

Σ_{k=1}^{K} w_k = 1,

where the w_k are the proportions of the subpopulations with Process k. The pathway effects of path S_i → S_j in Process k are defined according to Definition 4.1, and in general, path coefficients for path S_i → S_j in (4.44) are defined as follows [6].

Definition 4.3 Let e_path(S_i → S_j | Process k) be the pathway effects in Process k, k = 1, 2, . . . , K. Then, the pathway effects of S_i → S_j, i ≠ j are defined by

e_path(S_i → S_j) = Σ_{k=1}^{K} w_k e_path(S_i → S_j | Process k).   (4.45)

The effects are the probabilities that the paths S_i → S_j exist in the population (learning structure). By using the above definition, path coefficients in the general learning structure (4.44) can be calculated. In (4.35), for example, since

e_path(S_1 → S_2 | Process 2) = e_path(S_1 → S_2 | Process 3) = 0,

we have

e_path(S_1 → S_2) = w_1 e_path(S_1 → S_2 | Process 1).

Similarly, since S_1 → S_3 is a partial path only of Process 2, we have

e_path(S_1 → S_3) = w_2 e_path(S_1 → S_3 | Process 2).

In the next section, the above method is demonstrated. In general, as an extension of Definition 4.1, the following definition is made:

Definition 4.4 Let path S_{i_1} → S_{i_2} → · · · → S_{i_J} be a partial path in (4.44). Then, the pathway effect of S_{i_1} on S_{i_J} is defined by

e_path(S_{i_1} → S_{i_2} → · · · → S_{i_J}) = Σ_{k=1}^{K} w_k e_path(S_{i_1} → S_{i_2} → · · · → S_{i_J} | Process k).

By using learning structure (4.35), the above definition is demonstrated. The path diagram of the latent variables S_i, i = 1, 2, 3, 4 is illustrated in Fig. 4.4. For example, the pathway effect of S_1 → S_2 → S_3 is

e_path(S_1 → S_2 → S_3) = Σ_{k=1}^{3} w_k e_path(S_1 → S_2 → S_3 | Process k)
                        = w_1 e_path(S_1 → S_2 → S_3 | Process 1),

because Processes 2 and 3 do not have the path, i.e.,

e_path(S_1 → S_2 → S_3 | Process 2) = e_path(S_1 → S_2 → S_3 | Process 3) = 0.

Fig. 4.4 Path diagram for learning structure (4.35)

Similarly, we have

e_path(S_3 → S_1 → S_4) = Σ_{k=1}^{3} w_k e_path(S_3 → S_1 → S_4 | Process k)
                        = w_3 e_path(S_3 → S_1 → S_4 | Process 3).

The above method is demonstrated in a numerical example in the next section.
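In computational terms, Definitions 4.1, 4.3, and 4.4 amount to checking whether a path is a consecutive partial path of each process, multiplying the relevant path coefficients, and weighting by the mixed proportions. The following Python sketch illustrates this; the data structures (`processes`, `coeffs`) are ad hoc for the example and not from the book.

```python
def mixture_pathway_effect(path, processes, weights, coeffs):
    """Pathway effect of `path` in a mixture of learning processes (Definition 4.4).

    path      : tuple of skill labels, e.g. ("S1", "S2", "S3")
    processes : list of skill orderings, e.g. [("S1", "S2", "S3", "S4"), ...]
    weights   : mixed proportions w_k of the processes
    coeffs    : coeffs[k][(a, b)] = path coefficient q11 of edge a -> b in process k
    """
    total = 0.0
    for k, order in enumerate(processes):
        pos = {s: n for n, s in enumerate(order)}
        consecutive = all(pos[path[m + 1]] == pos[path[m]] + 1
                          for m in range(len(path) - 1))
        if consecutive:                          # path is a partial path of process k
            effect = 1.0
            for m in range(len(path) - 1):
                effect *= coeffs[k][(path[m], path[m + 1])]
            total += weights[k] * effect         # e_path(...|Process k) weighted by w_k
    return total
```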

4.9 Numerical Illustration (Confirmatory Analysis)

In this section, a confirmatory analysis for explaining a learning structure is demonstrated by using the above discussion. Table 4.4 shows the first data set in Proctor [11]. For performing the analysis, let us assume there exist at most the following three learning processes in a population:

Process 1 (w_1): S5 → S4 → S3 → S2 → S1,
Process 2 (w_2): S4 → S5 → S3 → S2 → S1,      (4.46)
Process 3 (w_3): S4 → S3 → S5 → S2 → S1.

From the above learning processes, we have seven sub-structures. Let Structure
(i) be the learning structures made by Process (i), i = 1, 2, 3; Structure (i, j) be
the structures composed of Processes i and j, i < j; and let Structure (1, 2, 3) be
the structure formed by three processes in (4.46). Then, the skill acquisition patterns
in the learning structures are illustrated in Table 4.5. From the table, Structures
(1,2,3) and (1,3) have the same skill acquisition patterns, so in this sense, we cannot
identify the structures from latent class model (4.3). The path diagram based on

skill acquisition patterns (s1, s2, s3, s4, s5) is produced by learning Structure (1,2,3) (Fig. 4.5), so learning Structure (1,2,3) is an unclear one. On the other hand, learning Structure (1,3) produces a clear structure, as shown in Fig. 4.6. In order to demonstrate the discussion in the previous section, latent class model (4.3) with the skill acquisition patterns of learning Structure (1,2,3) is used. The results of the analysis are given in Table 4.6. First, based on the path diagram shown in Fig. 4.5, the following separating equations are obtained:

w_1 = w_1 (v(0, 0, 0, 0, 0) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)) + w_1/(w_1 + w_2) · v(0, 0, 0, 1, 1) + v(0, 0, 0, 0, 1),   (4.47)

w_2 = w_2 (v(0, 0, 0, 0, 0) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)) + w_2/(w_2 + w_3) · v(0, 0, 0, 1, 0) + w_2/(w_1 + w_2) · v(0, 0, 0, 1, 1),   (4.48)

w_3 = w_3 (v(0, 0, 0, 0, 0) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)) + w_3/(w_2 + w_3) · v(0, 0, 0, 1, 0) + v(0, 0, 1, 1, 0).   (4.49)

From the above equations, we have

Table 4.4 Proctor’s first data


Response pattern Frequency Response pattern Frequency
00000 14 10000 2
00001 4 10001 1
00010 7 10010 0
00011 8 10011 3
00100 2 10100 2
00101 3 10101 7
00110 6 10110 1
00111 10 10111 14
01000 4 11000 0
01001 4 11001 3
01010 5 11010 0
01011 7 11011 9
01100 1 11100 1
01101 3 11101 10
01110 1 11110 2
01111 17 11111 62
Data Source Proctor [11]

Table 4.5 Skill acquisition patterns for the learning structures
Structure Skill acquisition patterns
Structure (1) (0,0,0,0,0), (0,0,0,0,1), (0,0,0,1,1), (0,0,1,1,1), (0,1,1,1,1), (1,1,1,1,1)
Structure (2) (0,0,0,0,0), (0,0,0,1,0), (0,0,0,1,1), (0,0,1,1,1), (0,1,1,1,1), (1,1,1,1,1)
Structure (3) (0,0,0,0,0), (0,0,0,1,0), (0,0,1,1,0), (0,0,1,1,1), (0,1,1,1,1), (1,1,1,1,1)
Structure (1,2) (0,0,0,0,0), (0,0,0,1,0), (0,0,0,0,1), (0,0,0,1,1), (0,0,1,1,1), (0,1,1,1,1), (1,1,1,1,1)
Structure (2,3) (0,0,0,0,0), (0,0,0,1,0), (0,0,0,1,1), (0,0,1,1,0), (0,0,1,1,1), (0,1,1,1,1), (1,1,1,1,1)
Structure (1,3) (0,0,0,0,0), (0,0,0,1,0), (0,0,0,0,1), (0,0,0,1,1), (0,0,1,1,0), (0,0,1,1,1), (0,1,1,1,1), (1,1,1,1,1)
Structure (1,2,3) (0,0,0,0,0), (0,0,0,1,0), (0,0,0,0,1), (0,0,0,1,1), (0,0,1,1,0), (0,0,1,1,1), (0,1,1,1,1), (1,1,1,1,1)

Fig. 4.5 Path diagram of skill acquisition patterns in learning Structure (1,2,3)

Fig. 4.6 Path diagram of skill acquisition patterns in learning Structure (1,3)

Table 4.6 The estimated positive response probabilities in learning Structure (1,2,3)
Skill acquisition pattern Class proportion Item response probability
                                            Item 1 Item 2 Item 3 Item 4 Item 5
(0,0,0,0,0) 0.144 0.097 0.294 0.145 0.198 0.140
(0,0,0,0,1) 0.016 0.097 0.294 0.145 0.198 0.969
(0,0,0,1,0) 0.049 0.097 0.294 0.145 0.812 0.140
(0,0,0,1,1) 0.065 0.097 0.294 0.145 0.812 0.969
(0,0,1,1,0) 0.041 0.097 0.294 0.864 0.812 0.140
(0,0,1,1,1) 0.046 0.097 0.294 0.864 0.812 0.969
(0,1,1,1,1) 0.092 0.097 0.781 0.864 0.812 0.969
(1,1,1,1,1) 0.548 0.923 0.781 0.864 0.812 0.969
G² = 16.499 (df = 15, P = 0.350)

w_1 = v(0, 0, 0, 0, 1) / (v(0, 0, 0, 1, 0) + v(0, 0, 0, 0, 1)),
w_2 = 1 − w_1 − w_3,
w_3 = v(0, 0, 1, 1, 0) / (v(0, 0, 1, 1, 0) + v(0, 0, 0, 1, 1)).

By using the estimates in Table 4.6, the estimates of the mixed proportions are calculated as follows:

ŵ_1 = 0.246,  ŵ_2 = 0.368,  ŵ_3 = 0.386.

The above solution is a proper solution. The discussion in Sect. 4.5 is applied to this example. Let v(s1, s2, s3, s4, s5 | Process k) be the proportions of individuals with skills (s1, s2, s3, s4, s5) in Process k, k = 1, 2, 3. For example, considering (4.47) for Process 1, we have

1 = v(0, 0, 0, 0, 0) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1) + v(0, 0, 0, 1, 1)/(w_1 + w_2) + v(0, 0, 0, 0, 1)/w_1,

so it follows that

v(0, 0, 0, 0, 0 | Process 1) = v(0, 0, 0, 0, 0),
v(0, 0, 0, 0, 1 | Process 1) = v(0, 0, 0, 0, 1)/w_1,
v(0, 0, 0, 1, 1 | Process 1) = v(0, 0, 0, 1, 1)/(w_1 + w_2),
v(0, 0, 1, 1, 1 | Process 1) = v(0, 0, 1, 1, 1),
v(0, 1, 1, 1, 1 | Process 1) = v(0, 1, 1, 1, 1),
v(1, 1, 1, 1, 1 | Process 1) = v(1, 1, 1, 1, 1).

In Process 1 in (4.46), the sequence is a Markov chain; let q^1_{11,i} be the related transition probabilities. For example, q^1_{11,5} is related to path S5 → S4, q^1_{11,4} to path S4 → S3, and so on. Then, from Theorem 4.2, we have

q^1_{11,5} = [v(0, 0, 0, 1, 1)/(w_1 + w_2) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)] / [v(0, 0, 0, 0, 1)/w_1 + v(0, 0, 0, 1, 1)/(w_1 + w_2) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)],

q^1_{11,4} = [v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)] / [v(0, 0, 0, 1, 1)/(w_1 + w_2) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)],

q^1_{11,3} = [v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)] / [v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)],

q^1_{11,2} = v(1, 1, 1, 1, 1) / [v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)].

From the above results and Table 4.6, first, we have the following path coefficients:

Process 1 (ŵ_1 = 0.246): S5 →(0.924) S4 →(0.866) S3 →(0.933) S2 →(0.856) S1.

Similarly, we get the estimates for Processes 2 and 3 as

Process 2 (ŵ_2 = 0.368): S4 →(0.924) S5 →(0.866) S3 →(0.933) S2 →(0.856) S1,

Process 3 (ŵ_3 = 0.386): S4 →(0.924) S3 →(0.866) S5 →(0.933) S2 →(0.856) S1.

According to Definition 4.3, second, the path coefficients of the above learning structure are calculated; for example, we have

e_path(S5 → S4) = w_1 e_path(S5 → S4 | Process 1) = 0.246 × 0.924 = 0.227,

e_path(S4 → S3) = w_1 e_path(S4 → S3 | Process 1) + w_3 e_path(S4 → S3 | Process 3)
               = 0.246 × 0.866 + 0.386 × 0.923 = 0.571.

All the path coefficients calculated in this way are illustrated in Fig. 4.7a. Third, some pathway effects in the learning structure are demonstrated. For example,

e_path(S3 → S2 → S1) = w_1 e_path(S3 → S2 → S1 | Process 1) + w_2 e_path(S3 → S2 → S1 | Process 2)
                     = 0.246 × 0.933 × 0.856 + 0.358 × 0.933 × 0.856 = 0.340,

e_path(S3 → S5 → S2) = w_3 e_path(S3 → S5 → S2 | Process 3) = 0.386 × 0.866 × 0.933 = 0.312,

e_path(S5 → S3 → S2 → S1) = w_2 e_path(S5 → S3 → S2 → S1 | Process 2) = 0.368 × 0.866 × 0.933 × 0.856 = 0.255,

and so on.

Fig. 4.7 a Path coefficients of learning structure (4.46). b Path coefficients of learning Structure (1,3)
If the solution is improper, that is, there are negative estimates in the solution, Process 2 is deleted from the learning structure (4.46). Then, the learning structure becomes as follows:

Process 1 (w_1): S5 → S4 → S3 → S2 → S1,      (4.50)
Process 3 (w_3): S4 → S3 → S5 → S2 → S1,

and based on the path diagram shown in Fig. 4.6, we have

w_1 = w_1 (v(0, 0, 0, 0, 0) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)) + v(0, 0, 0, 1, 1) + v(0, 0, 0, 0, 1),

w_3 = w_3 (v(0, 0, 0, 0, 0) + v(0, 0, 1, 1, 1) + v(0, 1, 1, 1, 1) + v(1, 1, 1, 1, 1)) + v(0, 0, 0, 1, 0) + v(0, 0, 1, 1, 0).

From the above equations, it follows that

w_1 = (v(0, 0, 0, 1, 1) + v(0, 0, 0, 0, 1)) / (v(0, 0, 0, 1, 1) + v(0, 0, 0, 0, 1) + v(0, 0, 0, 1, 0) + v(0, 0, 1, 1, 0)),  w_3 = 1 − w_1.

By using the estimates in Table 4.6, we obtain

ŵ_1 = (0.065 + 0.016) / (0.065 + 0.016 + 0.049 + 0.041) = 0.474,  ŵ_3 = 1 − ŵ_1 = 0.526.

In this learning structure, we have the following path coefficients:

Process 1 (ŵ_1 = 0.474): S5 →(0.961) S4 →(0.833) S3 →(0.933) S2 →(0.856) S1,
Process 3 (ŵ_3 = 0.526): S4 →(0.891) S3 →(0.898) S5 →(0.933) S2 →(0.856) S1.

By using the above results, the path diagram of the skill acquisitions of S_i, i = 1, 2, 3, 4, 5 is illustrated in Fig. 4.7b.
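The coefficients above follow from Theorem 4.2 applied within each process, using the conditional proportions v(s | Process k). The following Python sketch reproduces the Process 1 values for the reduced structure (4.50) from the Table 4.6 estimates; it is an illustration with hard-coded numbers, not code from the book.

```python
import numpy as np

# Class proportions from Table 4.6, listed along the Process 1 chain
# S5 -> S4 -> S3 -> S2 -> S1 of structure (4.50):
# patterns (00000), (00001), (00011), (00111), (01111), (11111).
v = np.array([0.144, 0.016, 0.065, 0.046, 0.092, 0.548])
w1 = (0.065 + 0.016) / (0.065 + 0.016 + 0.049 + 0.041)      # mixed proportion of Process 1

# Conditional proportions v(s | Process 1): the two patterns peculiar to Process 1
# in structure (4.50), namely (00001) and (00011), have proportions divided by w1.
v_cond = v.copy()
v_cond[1] = v[1] / w1         # (0,0,0,0,1)
v_cond[2] = v[2] / w1         # (0,0,0,1,1)

# Theorem 4.2 within Process 1: q11 for each transition along the chain.
q11 = [1.0 - v_cond[k] / v_cond[k:].sum() for k in range(1, 5)]
print(np.round(q11, 3))       # approx. [0.961, 0.833, 0.933, 0.856]
```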

4.10 A Method for Ordering Skill Acquisition Patterns

In this chapter, skill acquisition patterns, which are expressed as latent classes, are explained with a latent class model that is an extended version of the latent distance model discussed in Chap. 3. In the latent distance model, linear learning structures are discussed, so the skill acquisition patterns are naturally ordered; however, in the analysis of general learning structures, as treated in this chapter, such a natural ordering of the skill acquisition patterns cannot be made. For example, in the learning structures in Figs. 4.5 and 4.6, the skill acquisition patterns (0, 0, 0, 0, 1), (0, 0, 0, 1, 1), (0, 0, 0, 1, 0), (0, 0, 1, 1, 0) cannot be ordered in a natural sense; however, it may be required to assess their levels in some manner. In this example, since it is clear that pattern (0, 0, 0, 0, 0) is the lowest and (1, 1, 1, 1, 1) the highest, it is sensible to measure distances from (0, 0, 0, 0, 0) or (1, 1, 1, 1, 1) to the skill acquisition patterns (s1, s2, s3, s4, s5). For this purpose, the entropy-based method for measuring distances between latent classes proposed in Chap. 2 (2.30) is used. In latent class model (4.3) with (4.4), let 0 = (0, 0, . . . , 0) and s = (s1, s2, . . . , s_I), and let P(X = x | 0) and P(X = x | s) be the conditional distributions given the respective skill acquisition patterns. Then, from (2.30) the entropy-based distance between the skill acquisition patterns, i.e., latent classes, is defined by

D*(P(X = x | s) || P(X = x | 0)).

From the above distance, we have

D*(P(X = x | s) || P(X = x | 0))
 = Σ_{i=1}^{I} { P(X_i = 1 | 0) log [P(X_i = 1 | 0) / P(X_i = 1 | s_i)]
   + (1 − P(X_i = 1 | 0)) log [(1 − P(X_i = 1 | 0)) / (1 − P(X_i = 1 | s_i))]
   + P(X_i = 1 | s_i) log [P(X_i = 1 | s_i) / P(X_i = 1 | 0)]
   + (1 − P(X_i = 1 | s_i)) log [(1 − P(X_i = 1 | s_i)) / (1 − P(X_i = 1 | 0))] }
 = Σ_{i=1}^{I} (P(X_i = 1 | 0) − P(X_i = 1 | s_i)) { log [P(X_i = 1 | 0) / (1 − P(X_i = 1 | 0))] − log [P(X_i = 1 | s_i) / (1 − P(X_i = 1 | s_i))] },   (4.51)

where

P(X i = 1|0) = P(X i = 1|Si = 0), P(X i = 1|si ) = P(X i = 1|Si = si ).

Let

D*(P(X_i = x_i | s_i) || P(X_i = x_i | 0)) = (P(X_i = 1 | 0) − P(X_i = 1 | s_i)) { log [P(X_i = 1 | 0) / (1 − P(X_i = 1 | 0))] − log [P(X_i = 1 | s_i) / (1 − P(X_i = 1 | s_i))] }.

Then, the above quantity is an entropy-based distance between the distributions P(X_i = x_i | s_i) and P(X_i = x_i | 0). By using this notation, formula (4.51) becomes

D*(P(X = x | s) || P(X = x | 0)) = Σ_{i=1}^{I} D*(P(X_i = x_i | s_i) || P(X_i = x_i | 0))
                                 = Σ_{i=1}^{I} s_i D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)).   (4.52)

In Fig. 4.6, let 0 = (0, 0, . . . , 0) and s = (0, 0, 1, 1, 0); then, from (4.51) we have

D*(P(X = x | s) || P(X = x | 0)) = Σ_{i=3}^{4} D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)).
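A small Python sketch of (4.51)–(4.52), using the estimated item response probabilities of Table 4.6; the numbers reproduce Tables 4.7 and 4.8, but the code itself is only an illustration.

```python
from math import log

# P(X_i = 1 | S_i = 0) and P(X_i = 1 | S_i = 1) for items i = 1,...,5 (Table 4.6).
p0 = [0.097, 0.294, 0.145, 0.198, 0.140]
p1 = [0.923, 0.781, 0.864, 0.812, 0.969]

def logit(p):
    return log(p / (1.0 - p))

# Entropy-based distance for a single item, D*(P(Xi|1) || P(Xi|0)) in (4.51).
d_item = [(a - b) * (logit(a) - logit(b)) for a, b in zip(p0, p1)]

# Distance of a skill acquisition pattern s from 0 = (0,...,0), formula (4.52).
def d_pattern(s):
    return sum(si * d for si, d in zip(s, d_item))

print([round(d, 3) for d in d_item])            # approx. [3.894, 1.046, 2.605, 1.757, 4.359]  (Table 4.7)
print(round(d_pattern((0, 0, 1, 1, 0)), 3))     # approx. 4.362  (Table 4.8)
```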

Applying the above method to Table 4.6, the skill acquisition patterns are ordered. Table 4.7 shows the entropy-based distances D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)), and by using these distances, we obtain the distances D*(P(X = x | s) || P(X = x | 0)), which are shown in increasing order (Table 4.8). For example, with respect to skill acquisition

Table 4.7 Entropy-based distances D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)) with respect to the manifest variables X_i for Table 4.6
Manifest variable X1 X2 X3 X4 X5
D ∗ (P(X i = xi |1)||P(X i = xi |0)) 3.894 1.046 2.605 1.757 4.359

Table 4.8 Entropy-based distances D*(P(X = x | s) || P(X = x | 0)) with respect to skill acquisition patterns for Table 4.6
Skill acquisition pattern D*(P(X = x | s) || P(X = x | 0))
(0, 0, 0, 0, 0) 0
(0, 0, 0, 1, 0) 1.757
(0, 0, 0, 0, 1) 4.359
(0, 0, 1, 1, 0) 4.362
(0, 0, 0, 1, 1) 6.116
(0, 0, 1, 1, 1) 8.721
(0, 1, 1, 1, 1) 9.767
(1, 1, 1, 1, 1) 13.661

patterns (0, 0, 1, 1, 0) and (0, 0, 0, 1, 1), the latter can be regarded as a higher level than the former.

Remark 4.3 In the above method for grading the latent classes (skill acquisition patterns), we can use the distances from 1 = (1, 1, . . . , 1) as well. Then, it follows that

D*(P(X = x | s) || P(X = x | 1)) = D*(P(X = x | 1) || P(X = x | 0)) − D*(P(X = x | s) || P(X = x | 0)).

For example, in Table 4.8, for s = (0, 0, 1, 1, 1), we have

D*(P(X = x | s) || P(X = x | 1)) = 13.661 − 8.721 = 4.940.

Hence, the results from the present ordering (grading) method based on 1 = (1, 1, . . . , 1) are intrinsically the same as those based on 0 = (0, 0, . . . , 0). □
In latent class model (4.3), let s_1 = (s_{11}, s_{12}, . . . , s_{1I}) and s_2 = (s_{21}, s_{22}, . . . , s_{2I}) be skill acquisition patterns. From (4.51) and (4.52), the difference between the two skill acquisition patterns, i.e., latent classes, is calculated by

D*(P(X = x | s_1) || P(X = x | s_2)) = Σ_{i=1}^{I} |s_{1i} − s_{2i}| D*(P(X_i = x_i | 1) || P(X_i = x_i | 0)).   (4.53)

For example, in the example shown in Tables 4.7 and 4.8, for s_1 = (0, 0, 1, 1, 0) and s_2 = (0, 1, 1, 1, 1), the difference between the latent classes is calculated as follows:

D*(P(X = x | s_1) || P(X = x | s_2)) = D*(P(X_2 = x_2 | 1) || P(X_2 = x_2 | 0)) + D*(P(X_5 = x_5 | 1) || P(X_5 = x_5 | 0)) = 1.046 + 4.359 = 5.405.

The difference calculated above can be interpreted as the distance between the latent classes, measured in entropy. Figure 4.8 shows an undirected graph in which the edge values are the entropy-based differences between the latent classes. The distance between two latent classes can be calculated by summing the values along the shortest way between them; for example, in Fig. 4.8, there are two shortest ways
between (0, 0, 0, 1, 0) and (0, 0, 1, 1, 1):

(i) (0,0,0,1,0) ----- (0,0,0,1,1) ----- (0,0,1,1,1),


(ii) (0,0,0,1,0) ----- (0,0,1,1,0) ----- (0,0,1,1,1).

By the first way, the distance is calculated as 4.359 + 2.605 = 6.964, and the
same result is also obtained from the second way. It may be significant to make a
tree graph of latent classes by using cluster analysis with entropy (Chap. 2, Sect. 5),
in order to show the relationship of the latent classes. From Fig. 4.8, we have a tree
graph of the latent classes (Fig. 4.9).

Fig. 4.8 Undirected graph for explaining the differences between the latent classes

[Fig. 4.9 shows a dendrogram of the latent classes 00000, 00010, 00110, 00001, 00011, 00111, 01111, and 11111; the vertical axis is the entropy-based distance.]

Fig. 4.9 A tree graph of latent classes in Table 4.6 based on entropy

Table 4.9 Entropy-based distances D ∗ (P(X i = xi |1)||P(X i = xi |0)) with respect to manifest
variables X i for Table 3.8 in Chap. 3
Manifest variable X 11 X 12 X 13 X 21 X 22 X 23
D ∗ (P(X i = xi |1)||P(X i = xi |0)) 3.193 4.368 3.637 4.413 2.855 5.543

Table 4.10 Entropy-based distances D*(P(X = x | s) || P(X = x | 0)) with respect to skill acquisition patterns for Table 3.8 in Chap. 3
Latent class D*(P(X = x | s) || P(X = x | 0)) Latent class D*(P(X = x | s) || P(X = x | 0))
(0, 0) 0      (0, 2) 8.398
(1, 0) 3.367  (1, 2) 12.036
(2, 0) 8.005  (2, 2) 16.404
(3, 0) 11.198 (3, 2) 19.596
(0, 1) 5.543  (0, 3) 12.812
(1, 1) 9.180  (1, 3) 16.449
(2, 1) 13.549 (2, 3) 20.817
(3, 1) 16.741 (3, 3) 24.010

The present method for grading latent classes is applied to an example treated in Sect. 3.4 (Chap. 3). The estimated latent classes are expressed as score vectors (i, j), i = 0, 1, 2, 3; j = 0, 1, 2, 3, which imply pairs of levels of the general intelligence θ1 and the verbal ability of children θ2, respectively. In this example, although the levels of child ability can be graded according to the sum of the scores, t = i + j, it may be meaningful to use the present method for the grading. From the estimated model shown in Table 3.8, we have the entropy-based distances with respect to the manifest variables (Table 4.9). The distances D*(P(X = x | s) || P(X = x | 0)) are given in Table 4.10, where 0 = (0, 0). By using the distances, grading of the latent classes can be made. For example, for score t = 3, the order of the latent classes (3, 0), (2, 1), (1, 2), and (0, 3) is as follows:

(3, 0) < (1, 2) < (0, 3) < (2, 1).

4.11 Discussion

The present chapter has applied latent class analysis to explain learning structures.
Skill acquisitions are scaled with the related test items (manifest variables), and the
states of skill acquisition are expressed by latent binary variables, and thus, manifest
responses measure the states with response errors, i.e., omission (forgetting) and
intrusion (guessing) ones. The structures expressed in this context are called the
learning structures in this book. When the skills under consideration are ordered
with respect to prerequisite relationships, for example, for skills in calculation, (1)

addition, (2) multiplication, and (3) division, the learning structure is called a linear
learning structure. The model in this chapter is an extension of the latent distance
model. From the learning structure, the traces of skill learning process in a population
can be discussed through the path diagrams of skill acquisition patterns, and based
on the traces, dynamic interpretations of the learning structures can be made. In
general, learning structures are not necessarily linear, that is, there exist some learning
processes of skills in a population. Hence, it is valid to assume that the population
is divided into several subpopulations that depend on learning processes of their
own. The present chapter gives a method to explain learning processes of skills
by using cross-sectional data. It is assumed that manifest variables depend only on
the corresponding latent variables (states of skills); however, it is more realistic to
introduce “transfer effects” in the latent class models [1, 2]. In the above example
of skills for calculation, it is easily seen that the skill of addition is prerequisite to
that of multiplication. In this case, the mastery of the skill of multiplication will
facilitate the responses to test items for addition, i.e., a “facilitating” transfer effect
of multiplication on addition, and thus, it is more appropriate to take the transfer
effects into account when discussing learning structures. Conversely, there may be cases where “inhibiting” transfer effects should be considered in the analysis of learning structures [9]. In this chapter, “transfer effects” have not been hypothesized in the latent class model. It is significant to pursue further studies that handle transfer effects as well as prerequisite relationships between skills in studies on learning. Approaches to the pairwise assessment of prerequisite relationships between skills were made by several authors, for example, White and Clark [12], Macready [9], and Eshima et al. [5]. Macready [9] made the first attempt to deal with transfer effects in a pairwise assessment of skill acquisition by using latent class models with equality constraints. In order to improve the model, Eshima et al. [5] proposed a latent class model structured with skill acquisition and transfer effect parameters for making pairwise assessments of skill acquisition; however, the transfer effect parameters in the model are common to the related manifest variables. The study of the pairwise assessment of prerequisite relationships among skills is important for explaining learning structures, and studies based on latent structure models are left as significant themes for future research.

References

1. Bergan, J. R. (1980). The structural analysis of behavior: An alternative to the learning-hierarchy model. Review of Educational Research, 50, 625–646.
2. Cotton, J. W., Gallagher, J. P., & Marshall, S. P. (1977). The identification and decomposition of hierarchical tasks. American Educational Research Journal, 14, 189–212.
3. Dayton, M., & Macready, G. B. (1976). A probabilistic model for validation of behavioral
hierarchies. Psychometrika, 41, 190–204.
4. Dayton, M., & Macready, G. B. (1980). A scaling model with response errors and intrinsically
unscalable respondents. Psychometrika, 344–356.
5. Eshima, N. (1990). Latent class analysis for explaining a hierarchical learning structure. Journal
of the Japan Statistical Society, 20, 1–12.

6. Eshima, N., Asano, C., & Tabata, M. (1996). A developmental path model and causal analysis
of latent dichotomous variables. British Journal of Mathematical and Statistical Psychology,
49, 43–56.
7. Eshima, N., Asano, C., & Obana, E. (1990). A latent class model for assessing learning
structures. Behaviormetrika, 28, 23–35.
8. Goodman, L. A. (1975). A new model for scaling response patterns: An application of quasi-
independent concept. Journal of the American Statistical Association, 70, 755–768.
9. Macready, G. B. (1982). The use of latent class models for assessing prerequisite relations and
transference among traits. Psychometrika, 47, 477–488.
10. Magidson, J., & Vermunt, J. K. (2001). Latent class factor and cluster models: Bi-plots, and
related graphical displays. Sociological Methodology, 31, 223–264.
11. Proctor, C. H. (1970). A probabilistic formulation and statistical analysis of Guttman scaling.
Psychometrika, 35, 73–78.
12. White, R. T., & Clark, R. M. (1973). A test of inclusion which allows for errors of measurement.
Psychometrika, 38, 77–86.
Chapter 5
The Latent Markov Chain Model

5.1 Introduction

The Markov chain model is important for describing time-dependent changes of states
in human behavior; however, when observing changes of responses to a question
about a particular characteristic, individuals’ responses to the question may not reflect
the true states of the characteristic. As an extension of the Markov chain model, the
latent Markov chain model was proposed in an unpublished Ph.D. dissertation by
Wiggins L. M. in 1955 [5, 14]. The assumptions of the model are (i) at every observed
time point, a population is divided into several latent states, which are called latent
classes as well in the present chapter; (ii) an individual in the population takes one of
the manifest states according to his or her latent state at the time point; and (iii) the
individual changes the latent states according to a Markov chain. The assumptions
are the same as those of the hidden Markov model in time series analysis [6, 7].
In behavioral sciences, for individuals in a population, responses to questions about
particular characteristics may be observed several times to explain the changes of
responses. In this case, the individuals’ responses to the questions are viewed as the
manifest responses that may not reflect their true states at the observed time points,
that is, intrusion and omission errors have to be taken into consideration. The response
categories to be observed are regarded as the manifest states and the true states of
the characteristics, which are not observed directly, as the latent states. Concerning
parameter estimation in the latent Markov chain model, algebraic methods were studied by Katz and Proctor [15] and [14]. The methods were given for cases where the number of manifest states equals that of latent states, so they were not able to treat general cases. Moreover, the methods may yield improper estimates of the transition probabilities, for example, negative estimates, and no method of assessing the goodness-of-fit of the model to data sets was given. These shortcomings hinder the application of the model to practical research in the behavioral sciences. The Markov chain model is a discrete-time model and may serve as an approximation to the continuous-time model. In most social or behavioral phenomena, an


individual in a population changes his or her states in continuous time; however, it is


difficult to observe the changes continuously, and so data are collected, for example,
monthly or annually. Thus, it is significant to describe the changes with discrete-time
models, in which changes are treated as if they took place at the observed time points.
Continuous-time models are also important for explaining changes in human behavior, and discussions of their parameter estimation were given by Singer and Spilerman [2, 16–18], among others. In the present chapter, the discrete-time model is called the latent Markov chain model, and the continuous-time one the latent Markov process model, for convenience.
In the present chapter, the latent Markov models are discussed and an ML estimation procedure is constructed via the EM algorithm. In Sect. 5.2, the latent Markov chain model is explained and the relationship between the usual latent class model and the Markov model is considered. Section 5.3 constructs an ML estimation procedure for the latent Markov chain model via the EM algorithm, and Sect. 5.4 gives a property of the procedure that is preferable for the parameter estimation. In Sects. 5.5 and 5.6, numerical examples are given to demonstrate the ML estimation procedure. Section 5.7 considers an example of the latent Markov chain model with missing manifest observations, and Sect. 5.8 discusses a general model for treating such cases.
In Sect. 5.9, the latent Markov process model with finite manifest and latent states
is considered. Finally, in Sect. 5.10, discussions and themes to be studied through
further research are given.

5.2 The Latent Markov Chain Model

Let X_t be manifest variables that take values on the sample space Ω_manifest = {1, 2, . . . , J} at time points t = 1, 2, . . . , and let S_t be the corresponding latent variables on the sample space Ω_latent = {1, 2, . . . , A}. In what follows, states on Ω_manifest are called manifest states and those on Ω_latent latent ones. At time point t, it is assumed that an individual in the population takes a manifest state on Ω_manifest according to his or her latent state on Ω_latent, and that the latent states change according to a (first-order) Markov chain S_t, t = 1, 2, . . . . First, the Markov chain is assumed to be time-homogeneous, that is, the transition probabilities are independent of the time points. Let m_ab, a, b = 1, 2, . . . , A be the transition probabilities; let v_a, a = 1, 2, . . . , A be the probabilities of S_1 = a, that is, the initial state distribution; let p_aj be the probabilities of X_t = j given S_t = a, that is, p_aj = P(X_t = j | S_t = a); and let p(x_1, x_2, . . . , x_T) be the probabilities with which an individual takes manifest state transition x_1 → x_2 → · · · → x_T. Then, the following accounting equations are obtained:

p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} p_{s_1 x_1} Π_{t=1}^{T−1} m_{s_t s_{t+1}} p_{s_{t+1} x_{t+1}},   (5.1)

Fig. 5.1 Path diagram of the latent Markov chain model

where the summation in the above equations is over all latent state transitions s = (s_1, s_2, . . . , s_T). The parameters are restricted as

Σ_{a=1}^{A} v_a = 1,  Σ_{b=1}^{A} m_ab = 1,  Σ_{x=1}^{J} p_ax = 1.   (5.2)

The above equations specify the time-homogeneous latent Markov chain model, which is an extension of the Markov chain model; the latter is obtained by setting A = J and p_aa = 1, a = 1, 2, . . . , A. The Markov chain model is expressed as

p(x_1, x_2, . . . , x_T) = v_{x_1} Π_{t=1}^{T−1} m_{x_t x_{t+1}}.

For the latent Markov chain model, the path diagram of manifest variables X t and
latent variables St is illustrated in Fig. 5.1.
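The accounting equations (5.1) can be evaluated without enumerating all A^T latent paths by accumulating the sum from left to right, as in the usual forward recursion for hidden Markov models. A minimal Python sketch, with v, M, and P as assumed NumPy arrays:

```python
import numpy as np

def sequence_probability(x, v, M, P):
    """p(x_1,...,x_T) of the time-homogeneous latent Markov chain model (5.1).

    x : observed manifest states (0-based indices), length T
    v : initial latent state distribution, shape (A,)
    M : latent transition matrix, M[a, b] = m_ab, shape (A, A)
    P : manifest response matrix, P[a, j] = p_aj, shape (A, J)
    """
    alpha = v * P[:, x[0]]                 # alpha_1(a) = v_a * p_{a x_1}
    for t in range(1, len(x)):
        alpha = (alpha @ M) * P[:, x[t]]   # alpha_{t+1}(b) = sum_a alpha_t(a) m_ab * p_{b x_{t+1}}
        # summing over the latent states step by step instead of over all paths
    return alpha.sum()
```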
Second, the non-homogeneous model, that is, the latent Markov chain model with non-stationary transition probabilities, is treated. Let m_(t)ab, a, b = 1, 2, . . . , A be the transition probabilities at time points t = 1, 2, . . . , T − 1. Then, the accounting equations are given by

p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} p_{s_1 x_1} Π_{t=1}^{T−1} m_{(t) s_t s_{t+1}} p_{s_{t+1} x_{t+1}},   (5.3)

where

Σ_{a=1}^{A} v_a = 1,  Σ_{b=1}^{A} m_(t)ab = 1,  Σ_{x=1}^{J} p_ax = 1.   (5.4)

In the above model, it is assumed that the manifest response probabilities p_aj are independent of the time points. If the probabilities depend on the observed time points, they are expressed as p_(t)ax, and then the above accounting equations are modified as follows:

p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} p_{(1) s_1 x_1} Π_{t=1}^{T−1} m_{(t) s_t s_{t+1}} p_{(t+1) s_{t+1} x_{t+1}},   (5.5)

where

Σ_{a=1}^{A} v_a = 1,  Σ_{b=1}^{A} m_(t)ab = 1,  Σ_{x=1}^{J} p_(t)ax = 1.   (5.6)

In the above model, set m_(t)ab = 0 for a ≠ b; then, (5.5) becomes

p(x_1, x_2, . . . , x_T) = Σ_{a=1}^{A} v_a Π_{t=1}^{T} p_{(t) a x_t}.   (5.7)

In the above expression, regarding the variables X_t, t = 1, 2, . . . , T formally as T item responses, the above equations are those of the usual latent class model with A latent classes. In this sense, the latent Markov chain model is an extension of the usual latent class model. Conversely, since (5.5) can be rewritten as

p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} Π_{t=1}^{T−1} m_{(t) s_t s_{t+1}} Π_{t=1}^{T} p_{(t) s_t x_t},   (5.8)

regarding the latent state transition patterns in Ω_latent^T = Ω_latent × Ω_latent × · · · × Ω_latent (T times) as latent classes, from (5.8) the class proportions of latent state transitions s_1 → s_2 → · · · → s_T are given by v_{s_1} Π_{t=1}^{T−1} m_{(t) s_t s_{t+1}}. Hence, the above discussion yields the following theorem.

Theorem 5.1 The latent class model (1.2) and the latent Markov chain model (5.5)
are equivalent. 

Remark 5.1 The latent Markov chain models treated above have responses to one question (manifest variable) at each observed time point. Extended versions of the models can be constructed by introducing a set of questions (a manifest variable vector) X = (X_1, X_2, . . . , X_I). For the manifest variable vector, the responses are observed as

x_1 → x_2 → · · · → x_T,

where

x_t = (x_{t1}, x_{t2}, . . . , x_{tI}),  t = 1, 2, . . . , T.

Setting T = 1, the above model is the usual latent class model (1.2). Let p_{i s_t x_{ti}}, i = 1, 2, . . . , I be the response probabilities for the manifest variables X_t = (X_{t1}, X_{t2}, . . . , X_{tI}), given latent state S_t = s_t, that is, Π_{i=1}^{I} p_{i s_t x_{ti}}, at time points t = 1, 2, . . . , T. Then, model (5.5) is extended as

p(x_1, x_2, . . . , x_T) = Σ_s v_{s_1} Π_{i=1}^{I} p_{i s_1 x_{1i}} Π_{t=1}^{T−1} m_{(t) s_t s_{t+1}} Π_{i=1}^{I} p_{i s_{t+1} x_{t+1,i}},   (5.9)

where

Σ_{a=1}^{A} v_a = 1,  Σ_{b=1}^{A} m_(t)ab = 1,  Σ_{x=1}^{J} p_iax = 1,

and the notation Σ_s implies the summation over all latent state transitions s = (s_1, s_2, . . . , s_T). An ML estimation procedure via the EM algorithm can be built with a method similar to the above ones. The above model is related to a multivariate extension of the latent Markov chain model with covariates by Bartolucci and Farcomeni [3].

5.3 The ML Estimation of the Latent Markov Chain Model

First, the EM algorithm is considered for model (5.1) with constraints (5.2). Let n(x_1, x_2, . . . , x_T) be the numbers of individuals who take manifest state transitions (responses) x_1 → x_2 → · · · → x_T; let n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T) be those with the manifest state transitions and latent state transitions s_1 → s_2 → · · · → s_T; and let N be the total number of observed individuals. Then,

n(x_1, x_2, . . . , x_T) = Σ_s n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T),   N = Σ_x n(x_1, x_2, . . . , x_T),

where Σ_s and Σ_x imply the summations over s = (s_1, s_2, . . . , s_T) ∈ Ω_latent^T and x = (x_1, x_2, . . . , x_T) ∈ Ω_manifest^T, respectively. In this model, the complete and incomplete data are expressed by the sets D_complete = {n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T)} and D_incomplete = {n(x_1, x_2, . . . , x_T)}, respectively. Let ϕ = ((v_a), (m_ab), (p_ax)) be the parameter vector. Then, we have the following log likelihood function of ϕ, given the complete data:

l(ϕ | D_complete) = Σ_{x,s} n(x_1, x_2, . . . , x_T; s_1, s_2, . . . , s_T) { log v_{s_1} + Σ_{t=1}^{T−1} log m_{s_t s_{t+1}} + Σ_{t=1}^{T} log p_{s_t x_t} },   (5.10)


where x,s implies the summation over manifest and latent states transition patterns
x = (s1 , s2 , . . . , sT ) and s = (s1 , s2 , . . . , sT ). The model parameters ϕ are estimated
by the EM algorithm. Let r ϕ = (r va ), (r m ab ), (r pax )) be the estimates at the r th
iteration in the M-step; and let r +1 D complete = r +1 n(x1 , x2 , . . . , x T ; s1 , s2 , . . . , sT )
be the conditional expectations of the complete data Dcomplete at the r + 1 th iteration
in the E-step. Then, the E- and M-steps are given as follows.
(i) E-step
In this step, the conditional expectation of (5.10) given parameters r ϕ and Dincomplete
is calculated, that is,
  
Q ϕ|r ϕ = E l ϕ|Dcomplete |r ϕ, Dincomplete . (5.11)

Since the complete data are sufficient statistics, the step is reduced to calculating
the conditional expectations of the complete data n(x1 , x2 , . . . , x T ; s1 , s2 , . . . , sT )
and we have

r v T −1 r m T r
r +1 n(x , x , . . . , x ; s , s , . . . , s ) = n(x , x , . . . , x ) s1 t=1 st st+1 t=1 p st xt
1 2 T 1 2 T 1 2 T  T −1 r T r .
s v s1 t=1 m st st+1 t=1 p st xt
r
(5.12)

(ii) M-step
Function (5.11) is maximized with respect to the parameters v_{s_1}, m_{s_t s_{t+1}}, and p_{s_t x_t} under constraints (5.2). By using Lagrange multipliers κ, λ_a, a = 1, 2, . . . , A, and μ_c, c = 1, 2, . . . , A, the Lagrange function is given by

L = Q(ϕ | ^r ϕ) − κ Σ_{a=1}^{A} v_a − Σ_{a=1}^{A} λ_a Σ_{b=1}^{A} m_ab − Σ_{c=1}^{A} μ_c Σ_{x=1}^{J} p_cx.   (5.13)

From the following equations and constraints (5.2),

∂L/∂v_a = 0, a = 1, 2, . . . , A;  ∂L/∂m_ab = 0, a, b = 1, 2, . . . , A;  ∂L/∂p_ax = 0, a = 1, 2, . . . , A, x = 1, 2, . . . , J,

we have the following estimates:

^{r+1} v_a = Σ_{x, s\1} ^{r+1} n(x_1, x_2, . . . , x_T; a, s_2, . . . , s_T) / N,  a = 1, 2, . . . , A,   (5.14)

where Σ_{x, s\1} implies the summation over all x = (x_1, x_2, . . . , x_T) and s\1 = (s_2, s_3, . . . , s_T);
T −1 r +1
r +1 t=1 x,s\t,t+1 n(x1 , x2 , . . . , x T ; s1 , s2 , . . . , st−1 , a, b, st+2 , . . . , sT )
m ab = T −1 ,
r +1 n(x
t=1 x,s\t 1 , x 2 , . . . , x T ; s1 , s2 , . . . st−1 , a, st+1 , . . . , sT )

a, b = 1, 2, . . . , A, (5.15)

\t
where x,s\t implies the summation over all x = (x 1 , x 2 , . . . , x T ) and s =
(s1 , s2 , . . . , st−1 , st+1 , . . . , sT );
T −1 r +1 n x , x , . . . , x 
r +1 p t=1 ,x \t ,s\t 1 2 t−1 , b, x t+1 , . . . x T ; s1 , s2 , . . . , st−1 , a, st+1 , . . . , sT
ab = T −1
r +1 n x , x , . . . , x

t=1 ,x,,s\t, 1 2 t−1 , x t , x t+1 , . . . x T ; s1 , s2 , . . . , st−1 , a, st+1 , . . . , sT

a = 1, 2, . . . , A; b = 1, 2, . . . , J, (5.16)


where ,x \t ,s\t implies the summation over all x = (x1 , x2 , . . . , xt−1 , xt+1 , . . . , x T )
and s\t = (s1 , s2 , . . . , st−1 , st+1 , . . . , sT ).
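To make the E- and M-steps above concrete, the following is a minimal sketch in Python (my own illustration, not code from the text) of iterations (5.12) and (5.14)–(5.16) for the time-homogeneous latent Markov chain model. It enumerates all latent state sequences directly, which is feasible only for small A and T; practical implementations use forward–backward (Baum–Welch) recursions (cf. [7]). The function name and data layout are assumptions made for illustration.

```python
import itertools
import numpy as np

def em_latent_markov(counts, A, J, T, n_iter=200, seed=0):
    """EM for the time-homogeneous latent Markov chain model (5.1) with (5.2).
    counts: dict mapping manifest response tuples (x1,...,xT), coded 0..J-1,
    to observed frequencies; latent states are coded 0..A-1."""
    rng = np.random.default_rng(seed)
    v = rng.dirichlet(np.ones(A))              # initial latent distribution (v_a)
    M = rng.dirichlet(np.ones(A), size=A)      # latent transition matrix (m_ab)
    P = rng.dirichlet(np.ones(J), size=A)      # response probabilities (p_ax)
    latent_seqs = list(itertools.product(range(A), repeat=T))
    for _ in range(n_iter):
        num_v = np.zeros(A)
        num_M = np.zeros((A, A))
        num_P = np.zeros((A, J))
        for x, n_x in counts.items():
            # E-step (5.12): posterior weight of each latent sequence s given x
            w = np.array([v[s[0]]
                          * np.prod([M[s[t], s[t + 1]] for t in range(T - 1)])
                          * np.prod([P[s[t], x[t]] for t in range(T)])
                          for s in latent_seqs])
            w = n_x * w / w.sum()
            # accumulate the expected complete-data counts used in (5.14)-(5.16)
            for s, w_s in zip(latent_seqs, w):
                num_v[s[0]] += w_s
                for t in range(T - 1):
                    num_M[s[t], s[t + 1]] += w_s
                for t in range(T):
                    num_P[s[t], x[t]] += w_s
        # M-step (5.14)-(5.16): normalize the expected counts
        v = num_v / num_v.sum()
        M = num_M / num_M.sum(axis=1, keepdims=True)
        P = num_P / num_P.sum(axis=1, keepdims=True)
    return v, M, P
```

For the data in Table 5.1, for instance, counts would map each response pattern such as (0, 1, 2) to its observed frequency, with A = J = T = 3.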

Remark 5.2 In order to estimate the parameters of the Markov models explained in
Sect. 5.2, the conditions for model identification need to hold. For the identification
of model (5.1) with constraints (5.2), the following condition is needed:

$$\text{the number of } p(x_1, x_2, \ldots, x_T) > \text{the number of parameters } v_a, \; m_{ab}, \; \text{and } p_{ax}. \qquad (5.17)$$

According to constraints (5.2) and

$$\sum_{x} p(x_1, x_2, \ldots, x_T) = 1,$$

condition (5.17) becomes

$$J^T - 1 > (A - 1) + A(A - 1) + A(J - 1) = A(A + J - 1) - 1$$
$$\Leftrightarrow \; J^T - A(A + J - 1) > 0. \qquad (5.18)$$

Similarly, for the other models, the model identification conditions can be
derived.

Second, an ML estimation procedure for model (5.3) with constraints (5.4) is
given. Let ${}^r\varphi = \left(({}^r v_a), ({}^r m_{(t)ab}), ({}^r p_{ax})\right)$ be the estimates at the r-th iteration in the M-step;
and let ${}^{r+1}D_{complete} = \left\{{}^{r+1}n(x_1, x_2, \ldots, x_T; s_1, s_2, \ldots, s_T)\right\}$ be the conditional
expectations of the complete data $D_{complete}$ at the (r+1)-th iteration in the E-step.
Then, the E- and M-steps are given as follows.
Then, the E- and M-steps are given as follows.

(i) E-step

$${}^{r+1}n(x_1, x_2, \ldots, x_T; s_1, s_2, \ldots, s_T) = n(x_1, x_2, \ldots, x_T)\,
\frac{{}^r v_{s_1}\prod_{t=1}^{T-1}{}^r m_{(t)s_t s_{t+1}}\prod_{t=1}^{T}{}^r p_{s_t x_t}}
{\sum_{s}{}^r v_{s_1}\prod_{t=1}^{T-1}{}^r m_{(t)s_t s_{t+1}}\prod_{t=1}^{T}{}^r p_{s_t x_t}}. \qquad (5.19)$$

(ii) M-step
Estimates ${}^{r+1}v_a$ and ${}^{r+1}p_{ax}$ are given by (5.14) and (5.16), respectively. We have
${}^{r+1}m_{(t)ab}$ as follows:

$${}^{r+1}m_{(t)ab} = \frac{\sum_{x, s^{\setminus t, t+1}}{}^{r+1}n(x_1, x_2, \ldots, x_T; s_1, \ldots, s_{t-1}, a, b, s_{t+2}, \ldots, s_T)}
{\sum_{x, s^{\setminus t}}{}^{r+1}n(x_1, x_2, \ldots, x_T; s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T)},$$
$$a, b = 1, 2, \ldots, A; \; t = 1, 2, \ldots, T-1. \qquad (5.20)$$

Finally, the parameter estimation procedure for model (5.5) with (5.6) is
constructed. Let ${}^r\varphi = \left(({}^r v_a), ({}^r m_{(t)ab}), ({}^r p_{(t)ax})\right)$ be the estimates at the r-th itera-
tion in the M-step; and let ${}^{r+1}D_{complete} = \left\{{}^{r+1}n(x_1, x_2, \ldots, x_T; s_1, s_2, \ldots, s_T)\right\}$ be
the conditional expectations of the complete data $D_{complete}$ at the (r+1)-th iteration in
the E-step. The model is an extended version of model (5.3) with (5.4), so the EM
algorithm is presented as follows:

(i) E-step

$${}^{r+1}n(x_1, x_2, \ldots, x_T; s_1, s_2, \ldots, s_T) = n(x_1, x_2, \ldots, x_T)\,
\frac{{}^r v_{s_1}\prod_{t=1}^{T-1}{}^r m_{(t)s_t s_{t+1}}\prod_{t=1}^{T}{}^r p_{(t)s_t x_t}}
{\sum_{s}{}^r v_{s_1}\prod_{t=1}^{T-1}{}^r m_{(t)s_t s_{t+1}}\prod_{t=1}^{T}{}^r p_{(t)s_t x_t}}. \qquad (5.21)$$

(ii) M-step
Estimates ${}^{r+1}v_a$ and ${}^{r+1}m_{(t)ab}$ are given by (5.14) and (5.20), respectively.
Estimates ${}^{r+1}p_{(t)ab}$ are calculated as follows:

$${}^{r+1}p_{(t)ab} = \frac{\sum_{x^{\setminus t}, s^{\setminus t}}{}^{r+1}n\left(x_1, \ldots, x_{t-1}, b, x_{t+1}, \ldots, x_T;\, s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T\right)}
{\sum_{x, s^{\setminus t}}{}^{r+1}n\left(x_1, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots, x_T;\, s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T\right)},$$
$$a = 1, 2, \ldots, A; \; b = 1, 2, \ldots, J; \; t = 1, 2, \ldots, T. \qquad (5.22)$$

5.4 A Property of the ML Estimation Procedure via the EM Algorithm

The parameter estimation procedures in the previous section have the following
properties.

Theorem 5.2 In the parameter estimation procedure (5.11)–(5.16) for the time-
homogeneous latent Markov chain model (5.1) with (5.2), if some of the initial trial
values ${}^0v_a$, ${}^0m_{ab}$, and ${}^0p_{ab}$ are set to the extreme values 0 or 1, then the iterative values
are automatically held fixed at those values in the algorithm.

Proof Let us set ${}^0p_{ab} = 0$ for given a and b. From (5.12), we have

$${}^{1}n(x_1, \ldots, x_{t-1}, b, x_{t+1}, \ldots, x_T;\, s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T) = 0.$$

By using the above values, formula (5.16) derives ${}^1p_{ab} = 0$. Hence, inductively
it follows that

$${}^t p_{ab} = 0, \quad t = 1, 2, \ldots.$$

In the other cases,

$${}^0p_{ab} = 1; \quad {}^0v_a = 0, 1; \quad {}^0m_{ab} = 0, 1,$$

the theorem holds true in the same way. □


For the parameter estimation procedure for model (5.3) with (5.4), a similar prop-
erty can be proven. In this sense, the estimation procedures based on the EM algorithm
are convenient for handling such constraints.
By use of prior information about the phenomena concerned, some of the model
parameters may be fixed at the extreme values 0 or 1. In particular, it may be
meaningful to place such constraints on the transition matrices. For example, let us consider
the state transition diagram for the latent state space $\Omega_{latent} = \{1, 2, 3\}$ shown in Fig. 5.2.
From this figure, in model (5.1) with (5.2), the transition matrix is of the form

$$M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix}.$$

Fig. 5.2 A state transition path diagram

The above model can be estimated via the procedure mentioned in the previous
section by setting
$${}^0m_{12} = {}^0m_{13} = {}^0m_{21} = 0.$$

From Theorem 5.2, the above values are held fixed through the iterations, that is,

$${}^rm_{12} = {}^rm_{13} = {}^rm_{21} = 0, \quad r = 1, 2, \ldots.$$

In model (5.1) with (5.2), if we set

$$A = J, \qquad {}^0p_{ab} = 0, \; a \neq b,$$

the estimation procedure derives the parameter estimates for the time-homogeneous
Markov chain model. In the general model (5.5) with (5.6), if we set

$${}^0m_{(t)ab} = 0, \; a \neq b,$$

and formally identify the states $X_t$ at time points t as item responses, the EM algorithm
for the model derives the ML estimates of the usual latent class model.

5.5 Numerical Example I

Table 5.1 shows the data [15] obtained by observing the changes in the configuration
of interpersonal relationships in a group of 25 pupils at three time points: September,
November, and January. They were asked “with whom would you like to sit?”, and
considering the state of each pair of pupils, the state concerned is one of the following
three: mutual choice, one-way choice, and indifference, which are coded as "2",
"1", and "0", respectively. The observations were carried out three times at two-month intervals.
In this case, the latent Markov chain model (5.1) or (5.5) can be used. First, the data
are analyzed by use of the latent Markov chain models and the Markov chain models
with three latent classes (states). The results of the analysis are shown in Table 5.2,
and the estimated parameters of Markov models are illustrated in Tables 5.3 and 5.4.
The latent Markov chain models fit the data set well according to the log likelihood
ratio statistic $G^2$,

$$G^2 = 2\sum_{x} n(x_1, x_2, \ldots, x_T)\log\frac{n(x_1, x_2, \ldots, x_T)}{N\,\widehat{p}(x_1, x_2, \ldots, x_T)},$$

where $\widehat{p}(x_1, x_2, \ldots, x_T)$ are the ML estimates of $p(x_1, x_2, \ldots, x_T)$. The above
statistic is asymptotically $\chi^2$-distributed with degrees of freedom equal to "the number

Table 5.1 Data set I


Response Observed Response Observed Response Observed
pattern frequency pattern frequency pattern frequency
SNJ SNJ SNJ
000 197 100 15 200 3
001 20 101 6 201 0
002 0 102 0 202 0
010 12 110 6 210 0
011 9 111 9 211 3
012 1 112 4 212 2
020 0 120 3 220 3
021 0 121 0 221 0
022 1 122 2 222 4
*S: September; N: November; J: January
Source Katz and Proctor [15]

Table 5.2 Results of the analysis of data set I with Markov models
Model G2 df P-val. AIC
Time-homogeneous Markov chain model 27.903 18 0.064 100.369
Time-homogeneous latent Markov chain model 18.641 16 0.288 95.106
Non-homogeneous Markov chain model 17.713 14 0.220 98.179
Non-homogeneous latent Markov chain model 12.565 12 0.401 97.030

of manifest response patterns $(x_1, x_2, \ldots, x_T)$ minus 1" minus "the number of esti-
mated parameters". Based on AIC [1], the time-homogeneous latent Markov chain
model is selected as the most suitable one for explaining the data set, where

$$AIC = -2 \times (\text{the maximum log likelihood}) + 2 \times (\text{the number of estimated parameters}).$$
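As a small illustration (the helper function and its name are assumptions, not from the text), $G^2$, its degrees of freedom, and AIC can be computed from the observed pattern frequencies, the fitted pattern probabilities, and the maximized log likelihood as follows:

```python
import numpy as np

def g2_df_aic(n_obs, p_fit, n_params, max_loglik):
    """n_obs: observed frequencies n(x1,...,xT) over all response patterns;
    p_fit: ML estimates of p(x1,...,xT); n_params: number of estimated parameters;
    max_loglik: the maximum log likelihood."""
    n_obs = np.asarray(n_obs, dtype=float)
    p_fit = np.asarray(p_fit, dtype=float)
    N = n_obs.sum()
    pos = n_obs > 0                            # terms with n = 0 contribute 0
    g2 = 2.0 * np.sum(n_obs[pos] * np.log(n_obs[pos] / (N * p_fit[pos])))
    df = (n_obs.size - 1) - n_params           # (number of patterns - 1) - parameters
    aic = -2.0 * max_loglik + 2.0 * n_params
    return g2, df, aic
```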

5.6 Numerical Example II

Table 5.5 shows an artificial data set with binary manifest variables X_ti, t = 1, 2, for the
same items i = 1, 2, 3, where the variables X_ti, i = 1, 2, 3, are indicators of the latent
variables S_t, t = 1, 2; it is assumed that the three questions are asked of the same
individuals at two time points. All variables are binary, so the states are
denoted as 1 and 0. The data set is constructed in order to demonstrate the estimation

Table 5.3 Estimated parameters in the time-homogeneous Markov models


Markov chain model Latent Markov chain model
Initial distribution Initial distribution
State Latent state
0 1 2 0 1 2
0.800 0.150 0.050 0.816 0.122 0.062
Transition matrix Transition matrix
State Latent state
State 0 1 2 Latent State 0 1 2
0 0.898 0.100 0.002 0 0.957 0.043 0.000
1 0.426 0.440 0.134 1 0.116 0.743 0.141
2 0.321 0.179 0.500 2 0.279 0.101 0.621
Latent response probability Latent response probability
Latent State 0 1 2 Latent State 0 1 2
0 1* 0* 0* 0 0.952 0.043 0.000
1 0* 1* 0* 1 0.208 0.792 0.000
2 0* 0* 1* 2 0.000 0.135 0.815
* The numbers 0 and 1 are fixed

Table 5.4 Estimated parameters in the non-time-homogeneous Markov models


Markov chain model Latent Markov chain model
Initial distribution Initial distribution
State Latent state
0 1 2 0 1 2
0.800 0.150 0.050 0.823 0.127 0.050
Transition matrix Transition matrix
State Latent state
State 0 1 2 Latent State 0 1 2
Sept. 0 0.904 0.092 0.004 0 0.959 0.040 0.001
to 1 0.467 0.422 0.111 1 0.225 0.625 0.146
Nov. 2 0.200 0.333 0.467 2 0.191 0.343 0.467
Nov. 0 0.892 0.108 0.000 0 0.942 0.058 0.000
to 1 0.391 0.457 0.152 1 0.138 0.681 0.181
Jan. 2 0.462 0.000 0.538 2 0.462 0.000 0.538
Latent response Latent response
probability probability
Latent State 0 1 2 Latent State 0 1 2
0 1* 0* 0* 0 0.953 0.047 0.000
1 0* 1* 0* 1 0.121 0.879 0.000
2 0* 0* 1* 2 0.000 0.000 1.000

Table 5.5 An artificial data set for a longitudinal observation


Time point Time point
1 2 1 2
X 11 X 12 X 13 X 21 X 22 X 23 Freq X 11 X 12 X 13 X 21 X 22 X 23 Freq
0 0 0 0 0 0 14 0 0 0 0 0 1 27
1 0 0 0 0 0 13 1 0 0 0 0 1 12
0 1 0 0 0 0 5 0 1 0 0 0 1 6
1 1 0 0 0 0 26 1 1 0 0 0 1 30
0 0 1 0 0 0 25 0 0 1 0 0 1 51
1 0 1 0 0 0 5 1 0 1 0 0 1 12
0 1 1 0 0 0 2 0 1 1 0 0 1 2
1 1 1 0 0 0 4 1 1 1 0 0 1 0
0 0 0 1 0 0 16 0 0 0 1 0 1 4
1 0 0 1 0 0 25 1 0 0 1 0 1 8
0 1 0 1 0 0 11 0 1 0 1 0 1 1
1 1 0 1 0 0 75 1 1 0 1 0 1 14
0 0 1 1 0 0 22 0 0 1 1 0 1 17
1 0 1 1 0 0 6 1 0 1 1 0 1 3
0 1 1 1 0 0 3 0 1 1 1 0 1 0
1 1 1 1 0 0 3 1 1 1 1 0 1 0
0 0 0 0 1 0 9 0 0 0 0 1 1 2
1 0 0 0 1 0 15 1 0 0 0 1 1 3
0 1 0 0 1 0 6 0 1 0 0 1 1 0
1 1 0 0 1 0 31 1 1 0 0 1 1 6
0 0 1 0 1 0 12 0 0 1 0 1 1 4
1 0 1 0 1 0 3 1 0 1 0 1 1 0
0 1 1 0 1 0 1 0 1 1 0 1 1 0
1 1 1 0 1 0 0 1 1 1 0 1 1 0
0 0 0 1 1 0 38 0 0 0 1 1 1 0
1 0 0 1 1 0 86 1 0 0 1 1 1 4
0 1 0 1 1 0 33 0 1 0 1 1 1 2
1 1 0 1 1 0 191 1 1 0 1 1 1 16
0 0 1 1 1 0 51 0 0 1 1 1 1 5
1 0 1 1 1 0 21 1 0 1 1 1 1 3
0 1 1 1 1 0 6 0 1 1 1 1 1 0
1 1 1 1 1 0 10 1 1 1 1 1 1 0

Table 5.6 The estimated parameters from data set II


Initial distribution Transition matrix
Latent state Latent state
1 0 Latent state 1 0
0.644 0.356 1 0.854 0.146
0 0.492 0.508
Response probability for manifest variable
Latent State X t1 X t2 X t3
1 0.858 0.736 0.054
0 0.197 0.055 0.681
Log likelihood ratio statistic G² = 46.19, df = 54, P = 0.766.

procedure for model (5.9) with two latent classes A = 2 and the number of obser-
vation time points T = 2. Since the ML estimation procedure via the EM algorithm
can be constructed as in Sect. 5.3, the details are left for readers. The results of
the parameter estimation are given in Table 5.6. According to the transition matrix,
latent state "1" may be interpreted as a conservative one, and latent state "0" as a less
conservative one. In effect, the latent state distribution at the second time point is
calculated by

$$\begin{pmatrix} 0.644 & 0.356 \end{pmatrix}\begin{pmatrix} 0.854 & 0.146 \\ 0.492 & 0.508 \end{pmatrix} = \begin{pmatrix} 0.725 & 0.275 \end{pmatrix},$$

which implies that the proportion of individuals in the first latent state increases. If necessary,
the distributions of $S_t$, $t \geq 3$, are calculated by

$$\begin{pmatrix} 0.644 & 0.356 \end{pmatrix}\begin{pmatrix} 0.854 & 0.146 \\ 0.492 & 0.508 \end{pmatrix}^{t-1}, \quad t = 3, 4, \ldots.$$
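The propagation of the latent state distribution can be reproduced with a few lines of code; the following sketch uses the rounded estimates printed in Table 5.6:

```python
import numpy as np

v1 = np.array([0.644, 0.356])                 # estimated distribution of S_1
M = np.array([[0.854, 0.146],
              [0.492, 0.508]])                # estimated transition matrix (Table 5.6)
v2 = v1 @ M                                   # distribution of S_2: approx. (0.725, 0.275)

def dist_at(t):
    """Distribution of S_t, t >= 1, by repeated right-multiplication with M."""
    return v1 @ np.linalg.matrix_power(M, t - 1)

print(v2, dist_at(5))
```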

5.7 A Latent Markov Chain Model with Missing Manifest Observations

Before constructing a more general model, a data set given in Bye and Schechter [8] is
discussed. Table 5.7 illustrates the data from Social Security Administration services,
and the individuals were assessed as severe or not severe with respect to the extent
of work limitations. The observations were made in 1971, 1972, and 1974, where
response “severe” is represented as “1” and “not severe” “0”. The interval between
1972 and 1974 is two years and that between 1971 and 1972 is one year, that is,

the time interval between 1972 and 1974 is twice as long as that between 1971
and 1972. When applying the Markov model to Data Set II, it is valid to assume
that the observation in 1973 was missed, though the changes of latent states took
place, that is, the transitions of manifest and latent states are $X_1 \to X_2 \to X_3$ and
S1 → S2 → U → S3 , respectively. Thus, the joint state transition can be expressed
by

(X 1 , S1 ) → (X 2 , S2 ) → U → (X 3 , S3 ).

In order to analyze the data set, a more general model was proposed by Bye
and Schechter [8]. By using the notations in the previous section, for the time-
homogeneous latent Markov chain model, the accounting equations are given by

$$p(x_1, x_2, x_3) = \sum_{s}\sum_{u=1}^{A} v_{s_1}\, m_{s_1 s_2}\, m_{s_2 u}\, m_{u s_3}\prod_{t=1}^{3} p_{s_t x_t}, \qquad (5.23)$$

where $s = (s_1, s_2, s_3)$. The parameter estimation procedure via the EM algorithm
(5.11)–(5.16) for the time-homogeneous latent Markov chain model is modified as follows:

(i) E-step

$${}^{r+1}n(x_1, x_2, x_3; s_1, s_2, s_3; u) = n(x_1, x_2, x_3)\,
\frac{{}^r v_{s_1}\,{}^r m_{s_1 s_2}\,{}^r m_{s_2 u}\,{}^r m_{u s_3}\,{}^r p_{s_1 x_1}\,{}^r p_{s_2 x_2}\,{}^r p_{s_3 x_3}}
{\sum_{s}\sum_{u=1}^{A}{}^r v_{s_1}\,{}^r m_{s_1 s_2}\,{}^r m_{s_2 u}\,{}^r m_{u s_3}\,{}^r p_{s_1 x_1}\,{}^r p_{s_2 x_2}\,{}^r p_{s_3 x_3}}. \qquad (5.24)$$

(ii) M-step

$${}^{r+1}v_a = \frac{\sum_{x, s^{\setminus 1}}\sum_{u=1}^{A}{}^{r+1}n(x_1, x_2, x_3; a, s_2, s_3; u)}{N}, \quad a = 1, 2, \ldots, A; \qquad (5.25)$$

$${}^{r+1}p_{ab} = \frac{\sum_{t=1}^{T}\sum_{x^{\setminus t}, s^{\setminus t}}\sum_{u=1}^{A}{}^{r+1}n\left(x_1, \ldots, x_{t-1}, b, x_{t+1}, \ldots, x_T;\, s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T;\, u\right)}
{\sum_{t=1}^{T}\sum_{x, s^{\setminus t}}\sum_{u=1}^{A}{}^{r+1}n\left(x_1, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots, x_T;\, s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T;\, u\right)},$$
$$T = 3, \; a = 1, 2, \ldots, A; \; b = 1, 2, \ldots, J, \qquad (5.26)$$

$${}^{r+1}m_{ab} = \frac{D_{ab} + E_{ab} + F_{ab}}{\sum_{b=1}^{A}\left(D_{ab} + E_{ab} + F_{ab}\right)}, \qquad (5.27)$$

where

$$D_{ab} = \sum_{x, s^{\setminus 1,2}}\sum_{u=1}^{A}{}^{r+1}n(x_1, x_2, x_3; a, b, s_3; u),$$

Table 5.7 Data set II


Response Observed Response Observed
pattern frequency pattern frequency
t1 t2 t3 t1 t2 t3
000 145 100 39
001 47 101 34
010 18 110 41
011 45 111 219
*t1: 1971; t2: 1972; t3: 1974

Source Bye and Schechter [8]


$$E_{ab} = \sum_{x, s^{\setminus 2}}{}^{r+1}n(x_1, x_2, x_3; s_1, a, s_3; b),$$

$$F_{ab} = \sum_{x, s^{\setminus 3}}{}^{r+1}n(x_1, x_2, x_3; s_1, s_2, b; a).$$

For the data set, model (5.23) is used for A = 2, J = 2 and the ML estimates
of the parameters are obtained with the above procedure. Testing the goodness-of-fit
of the model to the data set, we have G² = 4.754, df = 2, and P = 0.093, which
implies that the fit of the model to the data set is fair. The estimates of the
parameters are given in Table 5.8.

Remark 5.3 Bye and Schechter [8] used the Newton method to obtain the ML
estimates of the parameters in (5.23) for A = J = 2. In order to keep the following
constraints

Table 5.8 The ML estimates of the parameters in model (5.23)
Latent Markov chain model
Initial distribution
Latent state
0 1
0.430 0.570
Transition matrix
Latent state
Latent state 0 1
0 0.914 0.086
1 0.047 0.953
Latent Response probability
Latent state 0 1
0 0.897 0.103
1 0.097 0.903

$$0 < v_a < 1; \quad 0 < m_{a1} < 1; \quad 0 < p_{ai} < 1, \quad a = 1, 2, \; i = 0, 1, \qquad (5.28)$$

the following parameter transformation was employed:

$$v_1 = \frac{1}{1 + \exp(\beta)}, \quad v_2 = \frac{\exp(\beta)}{1 + \exp(\beta)};$$
$$m_{a1} = \frac{1}{1 + \exp(\delta_a)}, \quad m_{a2} = \frac{\exp(\delta_a)}{1 + \exp(\delta_a)}, \quad a = 1, 2,$$
$$p_{a0} = \frac{1}{1 + \exp(\varepsilon_a)}, \quad p_{a1} = \frac{\exp(\varepsilon_a)}{1 + \exp(\varepsilon_a)}, \quad a = 1, 2.$$

However, the above transformation is specific to model (5.23) for A = J = 2,
and a general transformation for multiple categories A > 2 and/or J > 2 makes the
ML estimation complicated. In contrast, the EM method such as (5.24)–(5.27)
can be used easily, and the estimates always satisfy the constraints

$$0 < v_a < 1; \quad 0 < m_{ab} < 1; \quad 0 < p_{aj} < 1, \quad a = 1, 2, \ldots, A; \; j = 1, 2, \ldots, J. \qquad (5.29)$$

In this respect, the EM method is superior to the Newton–Raphson method with


the above parameter transformation.

5.8 A General Version of the Latent Markov Chain Model with Missing Manifest Observations

The time-homogeneous latent Markov chain model mentioned in the previous section
[8] is extended to a general one. For observed time points $t_i$, $i = 1, 2, \ldots, T$, manifest
responses $X_i$, $i = 1, 2, \ldots, T$, are observed, and the responses depend on the latent states
$S_i$ at those time points, where the time points $t_i$ are assumed to be integers such that

$$t_1 < t_2 < \cdots < t_T. \qquad (5.30)$$

If the interval $t_{i+1} - t_i > 1$, there are $t_{i+1} - t_i - 1$ time points (integers) between $t_i$
and $t_{i+1}$, and at these time points the latent states $u_{ij}$, $j = 1, 2, \ldots, t_{i+1} - t_i - 1$, change
as follows:

$$(s_i \to)\; u_{i1} \to u_{i2} \to \cdots \to u_{i h_i}\; (\to s_{i+1}), \quad h_i = t_{i+1} - t_i - 1, \qquad (5.31)$$

whereas the manifest states are not observed at the time points where the above
sequences of latent states take place; for example, in Data Set II, it is assumed that $u_{21}$
would occur at time point 1973. The above chain (5.31) is denoted as $\langle u_{ij}\rangle$, $i =
1, 2, \ldots, T-1$, for short. Then, the changes of latent states are expressed as

$$s_1 \to \langle u_{1j}\rangle \to s_2 \to \langle u_{2j}\rangle \to \cdots \to \langle u_{T-1,j}\rangle \to s_T. \qquad (5.32)$$

Fig. 5.3 Path diagram of the latent Markov chain model (5.33)

The manifest variables $X_t$ are observed with latent states $S_t$, and the responses
depend on the latent states at time points $t$, $t = 1, 2, \ldots, T$. This model is
depicted in Fig. 5.3. It is assumed that sequence (5.32) with (5.31) is distributed
according to a time-homogeneous Markov chain with transition matrix $(m_{ij})$. Let
$p\left(x_1, \ldots, x_T;\, s_1, \ldots, s_T;\, \langle u_{1j}\rangle, \langle u_{2j}\rangle, \ldots, \langle u_{T-1,j}\rangle\right)$ be the joint proba-
bilities of manifest responses $(x_1, x_2, \ldots, x_T)$ and latent state transition (5.32). Then,
we have

$$p\left(x_1, \ldots, x_T;\, s_1, \ldots, s_T;\, \langle u_{1j}\rangle, \langle u_{2j}\rangle, \ldots, \langle u_{T-1,j}\rangle\right)
= v_{s_1}\prod_{i=1}^{T} p_{s_i x_i} \times \prod_{t=1}^{T-1}\left(m_{s_t u_{t1}}\, m_{u_{t,h_t} s_{t+1}}\prod_{j=1}^{h_t-1} m_{u_{tj} u_{t,j+1}}\right), \qquad (5.33)$$

where

$$m_{s_t u_{t1}}\, m_{u_{t,h_t} s_{t+1}}\prod_{j=1}^{h_t-1} m_{u_{tj} u_{t,j+1}} =
\begin{cases} m_{s_t u_{t1}}\, m_{u_{t1} s_{t+1}} & (h_t = 1) \\ m_{s_t s_{t+1}} & (h_t = 0) \end{cases}. \qquad (5.34)$$

In repeated measurements, various time units are used, for example, day, week,
month, and year. Although the observations are planned to be made at regular intervals,
there may be cases where some of them are not carried out. On such occasions, the
above model is feasible to apply. The ML estimation procedure
via the EM algorithm can be constructed by extending (5.24) to (5.27).

5.9 The Latent Markov Process Model

Human behavior or responses take place continuously in time; however, our obser-
vations are made at discrete time points, for example, daily, weekly, monthly, and so on.
In this section, the change of latent states in $\Omega_{latent}$ is assumed to occur in continuous

time. Before constructing the model, the Markov process model is briefly reviewed
[13]. It is assumed that an individual in a population makes decisions to change states
in the time interval $(0, t)$ according to a Poisson distribution with mean $\lambda t$, where $\lambda > 0$,
and that the state changes take place according to a Markov chain with transition
matrix $Q = (q_{ij})$. Let $t_i$, $i = 1, 2, \ldots$, be the decision time points, and
let $S(t)$ be the latent state at time point $t$. Then, given the time points $t_i$, $i = 1, 2, \ldots$,
the following sequence is distributed according to the Markov chain with transition
matrix $Q$:

$$S(t_1) \to S(t_2) \to \cdots \to S(t_n) \to \cdots.$$



The process is depicted in Fig. 5.4. Let $M(t) = \left(m_{ij}(t)\right)$ be the transition matrix
of the Markov process $S(t)$ on state space $\Omega_{latent}$ at time point $t$. Then, we have

$$M(t) = \sum_{n=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^n}{n!}Q^n, \qquad (5.35)$$

where, for the $J \times J$ identity matrix $E$, we set $Q^0 = E$. By differentiating the above
matrix function with respect to time $t$, it follows that

$$\frac{d}{dt}M(t) = -\lambda\sum_{n=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^{n}}{n!}Q^{n} + \sum_{n=0}^{\infty} n\lambda\, e^{-\lambda t}\frac{(\lambda t)^{n-1}}{n!}Q^{n}
= -\lambda M(t) + \lambda Q\sum_{n=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^{n}}{n!}Q^{n}$$
$$= -\lambda M(t) + \lambda Q M(t) = \lambda(Q - E)M(t).$$

Setting $R = \lambda(Q - E)$, we have the following differential equation:

$$\frac{d}{dt}M(t) = R\,M(t).$$

From the above equation, given the initial condition $M(0) = E$, we get

$$M(t) = \exp(Rt), \qquad (5.36)$$

Fig. 5.4 Decision time points and latent state transitions



where for a square matrix $B$, we set

$$\exp(B) \equiv \sum_{n=0}^{\infty}\frac{1}{n!}B^n,$$

with $\frac{1}{0!}B^0 \equiv E$. In (5.36), matrix $R = (r_{ij})$ is called a generator matrix and, from
the definition, the following constraints hold:

$$r_{ii} \leq 0; \quad r_{ij} \geq 0, \; i \neq j; \quad \sum_{j=1}^{J} r_{ij} = 0. \qquad (5.37)$$

From (5.35), for $t, u > 0$, we also have

$$M(t + u) = \exp(R(t + u)) = \exp(tR)\exp(uR) = M(t)M(u). \qquad (5.38)$$

Especially, for an integer $k$ and $t = k\Delta t$, from (5.38), it follows that

$$M(k\Delta t) = M(\Delta t)^k. \qquad (5.39)$$

From the above equation, if we observe the change of states at every time interval
$\Delta t$ (Fig. 5.5), the following sequence is a Markov chain with transition matrix
$M(\Delta t)$:

$$S(\Delta t) \to S(2\Delta t) \to \cdots \to S(k\Delta t) \to \cdots.$$

Considering the above basic discussion, the latent Markov process model is
constructed. Let $t_i$, $i = 1, 2, \ldots, K$, be time points at which manifest states $X(t_i)$
on the finite state space $\Omega_{manifest}$ are observed; let $S(t)$ be the latent Markov process with generator
matrix $R$ satisfying constraints (5.37); let $M(t) = \left(m_{(t)ij}\right)$ be the transition matrix at time
point $t$; and let $p_{sx}$ be the probabilities of $X(t_i) = x$, given $S(t_i) = s$. For simplicity
of notation, given the time points, we use the following notation:

$$X_i = X(t_i); \quad S_i = S(t_i), \quad i = 1, 2, \ldots, K.$$

Fig. 5.5 Markov process due to observations at equal time intervals



Fig. 5.6 The latent Markov process due to observations at any time interval

Then, by using similar notations as for the latent Markov chain models, the
following accounting equations can be obtained:

$$p(x_1, x_2, \ldots, x_K) = \sum_{s} v_{s_1}\prod_{i=1}^{K-1} m_{(t_{i+1}-t_i)s_i s_{i+1}}\prod_{i=1}^{K} p_{s_i x_i}. \qquad (5.40)$$

The above model $(X(t), S(t))$ is called the latent Markov process model in the
present chapter. The process is illustrated in Fig. 5.6.

In order to estimate the model parameters $v_a$, $r_{ij}$, and $p_{ab}$, the estimation procedure
in the previous section may be used, because it is complicated to construct a procedure for
estimating $r_{ij}$ directly, that is, the parameters $r_{ij}$ are the elements of generator
matrix $R$ in (5.36). Usually, repeated observations of state changes are carried out
in intervals with common time units, for example, daily, weekly, monthly, and so on, as
in Table 5.7, so such time points can be viewed as integers. From this, for transition
matrix $M(t) = \left(m_{(t)ij}\right)$, we have

$$M(t_{i+1} - t_i) = \exp(R(t_{i+1} - t_i)) = M(1)^{t_{i+1} - t_i}. \qquad (5.41)$$

In order to simplify the notation, setting $M(1) \equiv M = (m_{ab})$, the same treatment
of the model as in Sect. 5.7 may be conducted. If the estimates of the transition prob-
abilities $m_{ab}$ are available, we can estimate the generator matrix $R$ by the formal
inversion:

$$R = \log M = \sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n}(M - E)^n, \qquad (5.42)$$

where $E$ is the identity matrix, whereas it is an important question whether there
exists a Markov process with the given transition matrix [2]. The problem is called that of
embeddability. Singer and Spilerman [17] gave the following condition
for obtaining the generator matrix.
Theorem 5.3 If the eigenvalues of transition matrix M are positive and distinct,
any solution of M = exp(R) is unique. 
Remark 5.4 In the above theorem, for an $A \times A$ matrix $M$, let $\rho_a$, $a = 1, 2, \ldots, A$,
be the positive and distinct eigenvalues. Then, there exists a non-singular matrix $C$,
and the transition matrix is expressed by

$$M = CDC^{-1},$$

where

$$D = \begin{pmatrix} \rho_1 & 0 & \cdots & 0 \\ 0 & \rho_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_A \end{pmatrix}.$$

From this, we can get

$$R = \log CDC^{-1} = C\begin{pmatrix} \log\rho_1 & 0 & \cdots & 0 \\ 0 & \log\rho_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \log\rho_A \end{pmatrix}C^{-1}. \qquad (5.43)$$

The result is the same as calculated by (5.42). 
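As an illustration of Remark 5.4, the following sketch (assumed code, not from the text) computes R = log M by the eigendecomposition (5.43) and checks the generator conditions (5.37); applied to the estimated transition matrices discussed below, it should reproduce, up to rounding, the generator estimates reported there.

```python
import numpy as np

def generator_from_transition(M, tol=1e-8):
    """Compute R = log M via the eigendecomposition (5.43) and check (5.37)."""
    rho, C = np.linalg.eig(np.asarray(M, dtype=float))
    if np.any(np.abs(rho.imag) > tol) or np.any(rho.real <= 0):
        raise ValueError("eigenvalues are not real and positive; (5.43) is not applicable")
    R = (C @ np.diag(np.log(rho.real)) @ np.linalg.inv(C)).real
    is_generator = (np.all(np.diag(R) <= tol)                        # r_ii <= 0
                    and np.all(R - np.diag(np.diag(R)) >= -tol)      # r_ij >= 0, i != j
                    and np.allclose(R.sum(axis=1), 0.0, atol=1e-6))  # zero row sums
    return R, is_generator
```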


For the number of latent states A = 2, the following transition matrix is
considered:

$$M = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}. \qquad (5.44)$$

Under the condition $m_{a1} + m_{a2} = 1$, $a = 1, 2$, the characteristic equation is given
by

$$(x - 1)(x - m_{11} - m_{22} + 1) = 0.$$

From the equation, we have two eigenvalues 1 and $m_{11} + m_{22} - 1$. If

$$(2 >)\; m_{11} + m_{22} > 1, \qquad (5.45)$$

from (5.42), the matrix equation

$$M = \exp(R)$$

is solved via (5.43). For transition matrix (5.44), we have

$$M = \begin{pmatrix} 1 & m_{12} \\ 1 & -m_{21} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & m_{11} + m_{22} - 1 \end{pmatrix}\begin{pmatrix} 1 & m_{12} \\ 1 & -m_{21} \end{pmatrix}^{-1}.$$

It follows that

      −1
r r 1 m 12 0 0 1 m 12
R = 11 12 =
r21 r22 1 −m 21 0 log(m 11 + m 22 − 1) 1 −m 21
 
m 12 log(m 11 + m 22 − 1) −m 12 log(m 11 + m 22 − 1)
= .
−m 21 log(m 11 + m 22 − 1) m 21 log(m 11 + m 22 − 1)
(5.46)

From (5.45), since 1 > m 11 + m 22 − 1 > 0, we see

m 12 log(m 11 + m 22 − 1) < 0, m 21 log(m 11 + m 22 − 1) < 0,

and the condition for the generator matrix (5.37) is met for A = 2. Thus, the tran-
sition matrix (5.44) is embeddable under the condition (5.45). Applying the above
discussion to Table 5.8, since
 

m 11 + m 22 = 0.957 + 0.743 = 1.700 > 1,

condition (5.45) is satisfied by the estimated transition matrix. From Theorem 5.3,
there exists a unique generator matrix in equation M = exp(R) and then, by using
(5.43), we have
 

$$R = \begin{pmatrix} -0.085 & 0.085 \\ 0.046 & -0.046 \end{pmatrix}.$$

According to the above generator matrix, the transition matrix can be calculated
at any time point by (5.36); for example, we have

$$M(1.5) = \begin{pmatrix} 0.875 & 0.125 \\ 0.068 & 0.932 \end{pmatrix}, \quad M(2.5) = \begin{pmatrix} 0.806 & 0.194 \\ 0.106 & 0.894 \end{pmatrix},$$
$$M(3.5) = \begin{pmatrix} 0.746 & 0.254 \\ 0.139 & 0.861 \end{pmatrix}, \quad M(4.5) = \begin{pmatrix} 0.694 & 0.306 \\ 0.167 & 0.833 \end{pmatrix}, \; \ldots,$$
$$M(\infty) = \begin{pmatrix} 0.353 & 0.647 \\ 0.353 & 0.647 \end{pmatrix}.$$
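These transition matrices at fractional time points can be reproduced numerically; the following sketch (with the generator values as printed above) uses the matrix exponential from scipy:

```python
import numpy as np
from scipy.linalg import expm

R = np.array([[-0.085,  0.085],
              [ 0.046, -0.046]])               # estimated generator matrix

for t in (1.5, 2.5, 3.5, 4.5, 1000.0):         # t = 1000 approximates M(infinity)
    print(t, expm(R * t).round(3))
```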

Next, the following transition matrix with three latent states is considered:

$$M = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix};$$

the characteristic function is calculated as


$$\det(M - xE) = \det\begin{pmatrix} m_{11} - x & m_{12} & m_{13} \\ m_{21} & m_{22} - x & m_{23} \\ m_{31} & m_{32} & m_{33} - x \end{pmatrix}
= (1 - x)\det\begin{pmatrix} 1 & m_{12} & m_{13} \\ 1 & m_{22} - x & m_{23} \\ 1 & m_{32} & m_{33} - x \end{pmatrix}$$
$$= (1 - x)\det\begin{pmatrix} 1 & 0 & 0 \\ 1 & m_{22} - m_{12} - x & m_{23} - m_{13} \\ 1 & m_{32} - m_{12} & m_{33} - m_{13} - x \end{pmatrix}
= (1 - x)\det\begin{pmatrix} m_{22} - m_{12} - x & m_{23} - m_{13} \\ m_{32} - m_{12} & m_{33} - m_{13} - x \end{pmatrix} = 0.$$

Setting

$$\begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} = \begin{pmatrix} m_{22} - m_{12} & m_{23} - m_{13} \\ m_{32} - m_{12} & m_{33} - m_{13} \end{pmatrix}, \qquad (5.47)$$

if

$$\begin{cases} (g_{11} - 1)(g_{22} - 1) - g_{12}g_{21} \neq 0, \\ g_{11}g_{22} - g_{12}g_{21} > 0, \\ g_{11} + g_{22} > 0, \end{cases} \qquad (5.48)$$

then, from Theorem 5.3, the above matrix has a unique matrix $R$ such that $M = \exp(R)$.

Remark 5.5 Condition (5.48) does not always imply the matrix obtained with (5.42)
is the generator matrix of a Markov process. 

The above discussion is applied to the estimated transition matrices of the
time-homogeneous Markov models in Table 5.3. For the Markov chain model, the
estimated transition matrix is

$$\widehat{M} = \begin{pmatrix} 0.898 & 0.100 & 0.002 \\ 0.426 & 0.440 & 0.134 \\ 0.321 & 0.179 & 0.500 \end{pmatrix}.$$

From (5.47), we have

$$\begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} = \begin{pmatrix} 0.340 & 0.132 \\ 0.079 & 0.498 \end{pmatrix},$$

and from (5.48) the sufficient condition for getting the unique solution $R$ of $M = \exp(R)$
is checked as follows:

$$\begin{cases} (g_{11} - 1)(g_{22} - 1) - g_{12}g_{21} = 0.321, \\ g_{11}g_{22} - g_{12}g_{21} = 0.159, \\ g_{11} + g_{22} = 0.838. \end{cases}$$

From this, the three conditions in (5.48) are met. In effect, we can get the generator
matrix by (5.43) as follows:

$$\widehat{R} = \begin{pmatrix} 1 & 0.145 & 0.138 \\ 1 & -0.491 & -0.848 \\ 1 & -0.859 & 0.512 \end{pmatrix}
\begin{pmatrix} \log 1 & 0 & 0 \\ 0 & \log 0.548 & 0 \\ 0 & 0 & \log 0.290 \end{pmatrix}
\begin{pmatrix} 1 & 0.145 & 0.138 \\ 1 & -0.491 & -0.848 \\ 1 & -0.859 & 0.512 \end{pmatrix}^{-1}
= \begin{pmatrix} -0.148 & 0.166 & -0.018 \\ 0.641 & -0.949 & 0.307 \\ 0.382 & 0.361 & -0.743 \end{pmatrix}.$$

However, the above estimate of the generator matrix is improper, because the
condition in (5.37) is not met, that is, $r_{13} = -0.018 < 0$. For the latent Markov
chain model in Table 5.3, the transition matrix is estimated as

$$\widehat{M} = \begin{pmatrix} 0.957 & 0.043 & 0.000 \\ 0.116 & 0.743 & 0.141 \\ 0.279 & 0.101 & 0.621 \end{pmatrix}.$$

The eigenvalues of the above transition matrix are distinct and positive. Through
a similar discussion to the above, we have the estimate of the generator matrix of the latent
Markov chain model as

$$\widehat{R} = \begin{pmatrix} -0.046 & 0.051 & -0.005 \\ 0.104 & -0.314 & 0.210 \\ 0.353 & 0.140 & -0.493 \end{pmatrix}.$$

However, this case also gives an improper estimate of the generator. Although
the ML estimation procedure for the continuous-time mover-stayer model is compli-
cated, Cook et al. [9] proposed a generalized version of the model, and gave an ML
estimation procedure for the model.

5.10 Discussion

This chapter has considered the latent Markov chain models for explaining changes
of latent states in time. The ML estimation procedures of the models have been
constructed via the EM algorithm, and the methods are demonstrated by using numer-
ical examples. As in Chap. 2, the latent states in the latent Markov chain model are
treated in parallel, so in this sense, the analysis can be viewed as latent class cluster
analysis as well, though the latent Markov analysis is a natural extension of the
latent class analysis. In confirmatory contexts as discussed in Sect. 5.4, as shown in
Theorem 5.2, the ML estimation procedures are flexible in handling constraints that
set some of the model parameters to the extreme values 0 or 1. As in Chaps. 3
and 4, it is important to make the latent Markov models structured for measurement
of latent states in an ability or trait, in which logit models are effective to express
the effects of latent states [3]. For the structure of latent state transition, logit models
with the effects of covariates have been applied to the initial distribution and the
transition matrices in the latent Markov chain model [19], and the extensions have
also been studied by Bartolucci et al. [4], Bartolucci and Pennoni [5], and Bartolucci
and Farcomeni [3]. Such approaches may link to path analysis with generalized
linear models ([10–12], Chap. 6), and further studies for extending latent Markov
approaches will be expected.

References

1. Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332.


2. Bartholomew, D. J. (1983). Some recent development in social statistics. International
Statistical Review, 51, 1–9.
3. Bartolucci, F., & Farcomeni, A. (2009). A multivariate extension of the dynamic logit model
for longitudinal data based on a latent Markov heterogeneity structure. Journal of the American
Statistical Association, 104, 816–831.
4. Bartolucci, F., Pennoni, F., & Francis, B. (2007). A latent Markov model for detecting patterns
of criminal activity. Journal of the Royal Statistical Society, A, 170, 115–132.
5. Bartolucci, F., Farcomeni, A., & Pennoni, F. (2014). Latent Markov models: A review of a
general framework for the analysis of longitudinal data with covariates. TEST, 23, 433–465.
6. Baum, L., & Petrie, T. (1966). Statistical inference for probabilistic functions of finite state
Markov chains. Annals of Mathematical Statistics, 37, 1554–1563.
7. Baum, L., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in
the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical
Statistics, 41, 164–171.
8. Bye, B. V., & Schechter, E. S. (1986). A latent Markov model approach to the estimation of
response error in multiway panel data. Journal of the American Statistical Association, 51,
702–704.
9. Cook, R. J., Kalbfleisch, J. D., & Yi, G. Y. (2002). A generalized mover-stayer model for panel
data. Biostatistics, 3, 407–420.
10. Eshima, N. (2020). Statistical data analysis and entropy. Springer.
11. Eshima, N., Tabata, M., Borroni, C. G., & Kano, Y. (2018). An entropy-based approach to path
analysis of structural generalized linear models: A basic approach. Entropy, 17, 5117–5132.

12. Eshima, N., Tabata, M., & Zhi, G. (2001). Path analysis with logistic regression models:
Effect analysis of fully recursive causal systems of categorical variables. Journal of the Japan
Statistical Society, 31, 1–14.
13. Hatori, H., & Mori, T. (1993). Finite Markov Chains, Faifukan: Tokyo (in Japanese).
14. Lazarsfeld, P. F., & Henry, N. M. (1968). Latent structure analysis. Houghton Mifflin.
15. Katz, L., & Proctor, C. (1959). The concept of configuration of interpersonal relation in a group
as a time-dependent stochastic process. Psychometrika, 24, 317–327.
16. Singer, B., & Spilerman, S. (1975). Identifying structural parameters of social processes using
fragmentary data. Bulletin of International Statistical Institute, 46, 681–697.
17. Singer, B., & Spilerman, S. (1976). The representation of social processes by Markov models.
American Journal of Sociology, 82, 1–54.
18. Singer, B., & Spilerman, S. (1977). Fitting stochastic models to longitudinal survey data—some
examples in the social sciences. Bulletin of International Statistical Institute, 47, 283–300.
19. Vermunt, J. K., Langeheine, R., & Bockenholt, U. (1999). Discrete-time discrete-state latent
Markov models with time-constant and time-varying covariates. Journal of Educational and
Behavioral Statistics, 24, 179–207.
Chapter 6
The Mixed Latent Markov Chain Model

6.1 Introduction

As a model that explains time-dependent human behavior, the Markov chain has
been applied in various scientific fields [1, 2, 6, 7]. When employing the model for
describing human behavior, it is usually assumed, as an approximation, that every individual in a
population changes his or her states at any observational time point according to
the same law of probability; however, there are cases where
the population is not homogeneous, which makes the analysis of human response
processes complicated [9]. In order to overcome the heterogeneity of the popu-
lation, it is valid to consider the population as divided into subpopulations that follow
Markov chains of their own. In order to analyze such a heterogeneous popula-
tion, Blumen et al. [8] proposed the mover-stayer model, in which the population
is divided into two subpopulations of, what we call, "movers" and "stayers". The
movers change their states according to a Markov chain, and the stayers do not change
their states from the initially observed time points. Human behavior that is observed
repeatedly is often more complicated than the mover-stayer model can describe. An extended version
of the mover-stayer model is the mixed Markov chain model, which was first introduced
by C. S. Poulsen in his 1982 Ph.D. dissertation, though the work was not
officially published [18, 19]. Eshima et al. [12, 13], Bye & Schechter [10], Van de
Pol & de Leeuw [20], and Poulsen [17] also discussed similar topics. Van de Pol [18]
proposed the mixed latent Markov chain model as an extension of the mixed Markov
chain model. Figure 6.1 shows the relation of the above Markov models, in which the
arrows indicate the natural directions of extension.
Following Chap. 5, this chapter provides a discussion of dynamic latent structure
analysis within a framework of the latent Markov chain model [11]. In Sect. 6.2,
dynamic latent structure models depicted in Fig. 6.1 are briefly reviewed, and the
equivalence of the latent Markov chain model and the mixed latent Markov chain
model is shown. Section 6.3 discusses the ML estimation procedure for the models via
the EM algorithm in relation to that for the latent Markov chain model. In Sect. 6.4,


Fig. 6.1 Relation of Markov models: the Markov chain model, the mover-stayer model, the mixed
Markov chain model, the latent Markov chain model, and the mixed latent Markov chain model.
*The directions expressed by the arrows imply the extensions of the related models

a numerical example is given to demonstrate the method for the ML estimation.


Finally, Sect. 6.5 briefly reviews advanced studies for the mixed Markov modeling
to link to further research.

6.2 Dynamic Latent Class Models

In the present section, the mover-stayer model, the mixed Markov chain model, and
the mixed latent Markov chain model are reviewed.
(i) The mover-stayer model
Let $\Omega = \{1, 2, \ldots, A\}$ be the manifest state space of the Markov chain. It is assumed
that a population is divided into two types of individuals, movers and stayers, and
the proportions are set as $\lambda_1$ and $\lambda_2$, respectively, where

$$\lambda_1 + \lambda_2 = 1.$$

Let $v_a$ be the probabilities that a mover takes initial state $a$ at the first time point,
and let $w_a$ be those of a stayer, $a = 1, 2, \ldots, A$. Then,

$$\sum_{a=1}^{A} v_a = \sum_{a=1}^{A} w_a = 1.$$

Suppose that the mover changes the states according to a time-homogeneous


Markov chain. Let m ab be the transition probabilities from state a at time point t to
state b at time point t + 1, and let p(x1 , x2 , . . . , x T ) be the probabilities of manifest
state transition, x1 → x2 → · · · → x T . Then, we have the following accounting
equations:

$$p(x_1, x_2, \ldots, x_T) = \lambda_1 v_{x_1}\prod_{t=1}^{T-1} m_{x_t x_{t+1}} + \lambda_2 w_{x_1}\prod_{t=1}^{T-1}\delta_{x_t x_{t+1}}, \qquad (6.1)$$

where

$$\delta_{x_t x_{t+1}} = \begin{cases} 1 & x_t = x_{t+1} \\ 0 & x_t \neq x_{t+1} \end{cases}.$$

Remark 6.1 When the Markov chain in (6.1) is time-dependent, the transition
probabilities for movers $m_{x_t x_{t+1}}$ are replaced by $m_{(t)x_t x_{t+1}}$, $t = 1, 2, \ldots, T-1$.
(ii) The mixed Markov chain model


It is assumed that a population is divided into K subpopulations that follow time-
homogeneous Markov chains of their own. Let $\psi_k$ be the proportions of the subpopu-
lations $k = 1, 2, \ldots, K$; let $v_{ka}$, $a = 1, 2, \ldots, A$, be the initial state distribution of
Markov chain $k$; and let $m_{k x_t x_{t+1}}$ be the transition probabilities from manifest state $x_t$ at
time point $t$ to $x_{t+1}$ at time point $t + 1$. The manifest state space is $\Omega = \{1, 2, \ldots, A\}$.
Then, the accounting equations are given by


$$p(x_1, x_2, \ldots, x_T) = \sum_{k=1}^{K}\psi_k\, v_{k x_1}\prod_{t=1}^{T-1} m_{k x_t x_{t+1}}, \qquad (6.2)$$

where

$$\sum_{k=1}^{K}\psi_k = 1, \quad \sum_{a=1}^{A} v_{ka} = 1, \quad \sum_{b=1}^{A} m_{kab} = 1.$$

For K = 2, setting

$$m_{2ab} = \begin{cases} 1 & a = b \\ 0 & a \neq b \end{cases},$$

model (6.2) expresses the mover-stayer model (6.1).


(iii) The mixed latent Markov chain model
Let latent variables $S_t$, $t = 1, 2, \ldots, T$, be Markov chains on $\Omega_{latent} = \{1, 2, \ldots, C\}$
in subpopulations $k = 1, 2, \ldots, K$; let $X_t$, $t = 1, 2, \ldots, T$, be manifest variables
on state space $\Omega = \{1, 2, \ldots, A\}$; and let $p_{k s_t x_t}$ be the conditional probability of
$X_t = x_t$ given latent state $S_t = s_t$ in subpopulation $k$ at time point $t$. Then, the
response probabilities are expressed as follows:

Fig. 6.2 Path diagram of the mixed latent Markov chain model (6.3)

$$p(x_1, x_2, \ldots, x_T) = \sum_{k=1}^{K}\psi_k\sum_{s} v_{k s_1}\, p_{k s_1 x_1}\prod_{t=1}^{T-1} p_{k s_{t+1} x_{t+1}}\, m_{k s_t s_{t+1}}, \qquad (6.3)$$

where

$$\sum_{k=1}^{K}\psi_k = 1, \quad \sum_{b=1}^{C} v_{kb} = 1, \quad \sum_{a=1}^{A} p_{kba} = 1, \quad \sum_{c=1}^{C} m_{kbc} = 1. \qquad (6.4)$$

For $A = C$ and $p_{kba} = \delta_{ab}$, where $\delta_{ab}$ is the Kronecker delta, (6.3) expresses
(6.2), and setting $K = 1$, model (6.3) becomes the latent Markov chain model (5.1).

Remark 6.2 When manifest response probabilities and transition ones are depen-
dent on observed time points t, pkab and m kab are replaced by p(t)kab and m (t)kab ,
respectively. 

Let $U \in \{1, 2, \ldots, K\}$ be a categorical latent variable that expresses the subpop-
ulations depending on the latent Markov chains with the initial state distributions
$(v_{ka})$ and the transition matrices $(m_{kab})$. Then, the conditional distri-
bution of the sequence $S_1 \to S_2 \to \cdots \to S_T$ given $U = k$ is a Markov chain with
initial distribution $(v_{ka})$ and transition matrix $(m_{kab})$, and the path diagram of the
variables $U$, $\{S_t\}$, and $\{X_t\}$ is illustrated in Fig. 6.2. Although a natural direction in the
extension of latent structure models is illustrated in Fig. 6.1, we have the following
theorem.

Theorem 6.1 The latent Markov chain model and the mixed latent Markov chain
model are equivalent.

Proof In latent Markov chain model (5.1) with $B = CK$ latent states, let the latent
state space $\Omega_{latent} = \{1, 2, \ldots, B\}$ of latent variables $S_t$ be divided into K subspaces
$\Omega_k = \{C(k-1)+1, C(k-1)+2, \ldots, Ck\}$, $k = 1, 2, \ldots, K$. If the subspaces are
closed with respect to the state transition, the transition matrix of the latent Markov
chain model is of the following type:

$$M = \begin{pmatrix} M_1 & 0 & \cdots & 0 \\ 0 & M_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & M_K \end{pmatrix}, \qquad (6.5)$$

where $M_k$, $k = 1, 2, \ldots, K$, are $C \times C$ transition matrices on the latent state spaces
$\Omega_k$. In model (6.5), we have

$$\sum_{j=1}^{C} m_{C(k-1)+i,\, C(k-1)+j} = 1, \quad i = 1, 2, \ldots, C; \; k = 1, 2, \ldots, K.$$

For the latent Markov chain model (5.1), setting

$$\lambda_k = \sum_{i=1}^{C} v_{C(k-1)+i}, \quad v_{(k)c} = \frac{v_{C(k-1)+c}}{\lambda_k}, \quad c = 1, 2, \ldots, C; \; k = 1, 2, \ldots, K, \qquad (6.6)$$

it follows that

$$\sum_{c=1}^{C} v_{(k)c} = 1, \quad k = 1, 2, \ldots, K,$$

and the latent Markov chain model expresses the mixed latent Markov chain model.
This completes the theorem. □
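The reparameterization (6.6) is straightforward to carry out numerically; the following sketch (assumed code, not from the text) recovers the mixture proportions λ_k and the within-subpopulation initial distributions from the initial distribution of the combined latent Markov chain model:

```python
import numpy as np

def split_initial_distribution(v, C, K):
    """Recover (lambda_k) and (v_(k)c) in (6.6) from the initial distribution v of the
    combined model with C latent states per subpopulation and K subpopulations."""
    v = np.asarray(v, dtype=float).reshape(K, C)   # row k holds states C(k-1)+1,...,Ck
    lam = v.sum(axis=1)                            # mixture proportions lambda_k
    return lam, v / lam[:, None]                   # within-subpopulation distributions
```

For example, split_initial_distribution([0.2, 0.3, 0.1, 0.4], C=2, K=2) gives λ = (0.5, 0.5) with within-subpopulation distributions (0.4, 0.6) and (0.2, 0.8).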
In Chap. 5, the equivalence of the latent class model and the latent Markov chain
model is shown in Theorem 5.1, so the following result also holds true:

Theorem 6.2 The mixed latent Markov chain model is equivalent to the latent class
model.

6.3 The ML Estimation of the Parameters of Dynamic Latent Class Models

In this section, the ML estimation procedures of latent class models via the EM algo-
rithm are summarized. Although the methods can be constructed directly for the individual
latent structure models as shown in the previous chapters, the ML estimation proce-
dures here are given through that for the latent Markov chain model in Chap. 5, based on
Theorem 5.2.

(i) The mover-stayer model
Let the manifest and latent state spaces be $\Omega_{manifest} = \{1, 2, \ldots, A\}$ and $\Omega_{latent} =
\{1, 2, \ldots, 2A\}$, respectively, and let $\Omega_{latent}$ be divided into the following two
subspaces:

$$\Omega_1 = \{1, 2, \ldots, A\} \quad \text{and} \quad \Omega_2 = \{A+1, A+2, \ldots, 2A\}.$$


If the initial trial values of the estimates of the parameters in (5.15) and (5.16) in the
ML estimation algorithm are set, respectively, as follows:

$${}^0m_{ab} = \begin{cases} 0 & (a \in \Omega_1, b \in \Omega_2; \; a \in \Omega_2, b \in \Omega_1) \\ 1 & (a = b \in \Omega_2) \\ 0 & (a \neq b \in \Omega_2) \end{cases}, \qquad (6.7)$$

and

$${}^0p_{ab} = \begin{cases} 1 & (a = b \text{ or } a = b + A) \\ 0 & (\text{otherwise}) \end{cases}, \qquad (6.8)$$

then, from (5.12), we have

$${}^1n(x_1, x_2, \ldots, x_T;\, s_1, \ldots, s_{t-1}, a, b, s_{t+2}, \ldots, s_T) = 0$$
$$(a \in \Omega_1, b \in \Omega_2; \; a \in \Omega_2, b \in \Omega_1; \; a \neq b \in \Omega_2), \qquad (6.9)$$

$${}^1n(x_1, \ldots, x_{t-1}, b, x_{t+1}, \ldots, x_T;\, s_1, \ldots, s_{t-1}, a, s_{t+1}, \ldots, s_T) = 0$$
$$(a \neq b \text{ and } a \neq b + A). \qquad (6.10)$$

Putting (6.9) into (5.15), it follows that

$${}^1m_{ab} = \begin{cases} 0 & (a \in \Omega_1, b \in \Omega_2; \; a \in \Omega_2, b \in \Omega_1) \\ 1 & (a = b \in \Omega_2) \\ 0 & (a \neq b \in \Omega_2) \end{cases}.$$

Similarly, by (6.10) we also have

$${}^1p_{ab} = \begin{cases} 1 & (a = b \text{ or } a = b + A) \\ 0 & (\text{otherwise}) \end{cases}$$

from (5.16). Inductively,

$${}^rp_{ab} = \begin{cases} 1 & (a = b \text{ or } a = b + A) \\ 0 & (\text{otherwise}) \end{cases},$$

$${}^rm_{ab} = \begin{cases} 0 & (a \in \Omega_1, b \in \Omega_2; \; a \in \Omega_2, b \in \Omega_1) \\ 1 & (a = b \in \Omega_2) \\ 0 & (a \neq b \in \Omega_2) \end{cases}, \quad r = 1, 2, \ldots.$$

Hence, by setting the initial trial values of the parameters as (6.7) and (6.8), the
ML estimates of parameters in the mover-stayer model can be obtained via the EM
algorithm for the latent Markov chain model in (5.11) through (5.16).
(ii) The mixed Markov chain model
For the manifest state space $\Omega_{manifest} = \{1, 2, \ldots, A\}$, the latent state space $\Omega_{latent}$ is divided into

$$\Omega_k = \{A(k-1)+1, A(k-1)+2, \ldots, Ak\}, \quad k = 1, 2, \ldots, K,$$

where $\Omega_{latent} = \bigcup_{k=1}^{K}\Omega_k$. If we set the initial trial values for (5.15) and (5.16) in the EM
algorithm for the latent Markov chain model (5.1) as

$${}^0m_{ab} = 0 \quad (a \in \Omega_k, \; b \in \Omega_l, \; k \neq l),$$

and

$${}^0p_{ab} = \begin{cases} 1 & (a = b + A(k-1), \; k = 1, 2, \ldots, K) \\ 0 & (\text{otherwise}) \end{cases},$$

then, the algorithm estimates the mixed Markov chain model.


(iii) The mixed latent Markov chain model
By using the parameterization in Theorem 6.1 and identifying the response probabilities
$p_{bx}$ in the latent Markov chain model (5.1) and $p_{kbx}$ in the mixed latent Markov
chain model (6.3) as

$$p_{kcx} = p_{c+C(k-1),\, x}, \quad c = 1, 2, \ldots, C; \; x = 1, 2, \ldots, A; \; k = 1, 2, \ldots, K,$$

and setting the initial trial values of the parameters as in (6.5), the algorithm for the
latent Markov chain model in (5.11) through (5.16) yields the ML estimates of the
parameters.
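As an illustration of the constraint settings above (the function and the random initialization of the free entries are assumptions for illustration), the initial trial values (6.7) and (6.8) for the mover-stayer case can be generated as follows; by Theorem 5.2, the entries set to 0 or 1 remain fixed throughout the EM iterations of Chap. 5:

```python
import numpy as np

def mover_stayer_initial_values(A, seed=0):
    """Initial trial values (6.7) and (6.8): latent states 0..A-1 are movers and
    A..2A-1 are stayers; entries set to 0 or 1 stay fixed by Theorem 5.2."""
    rng = np.random.default_rng(seed)
    B = 2 * A
    M0 = np.zeros((B, B))
    M0[:A, :A] = rng.dirichlet(np.ones(A), size=A)   # free mover transition block
    M0[A:, A:] = np.eye(A)                           # stayers never change state
    P0 = np.zeros((B, A))
    P0[:A, :] = np.eye(A)                            # p_ab = 1 iff a = b (movers)
    P0[A:, :] = np.eye(A)                            # p_ab = 1 iff a = b + A (stayers)
    v0 = rng.dirichlet(np.ones(B))                   # unconstrained initial distribution
    return v0, M0, P0
```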

6.4 A Numerical Illustration

In order to demonstrate the above discussion, for the data set in Table 5.1, the mover-
stayer model and the mixed latent Markov chain model are estimated via the EM
algorithm for the latent Markov chain model in Chap. 5. The mover-stayer model is
estimated with the method in the previous section. For the same patterns of initial trial
values of the parameters (6.7) and (6.8), we have the ML estimates of the parameters
in Table 6.1. According to the log likelihood test of goodness-of-fit to the data set,

Table 6.1 The estimated parameters in the mover-stayer model


Initial distribution
Mover (latent state) Stayer (latent state)
1 2 3 4 5 6
0.346 0.147 0.046 0.454 0.003 0.004
Latent transition matrix
Latent state
1 2 3 4 5 6
Latent State 1 0.765 0.230 0.005 0* 0* 0*
2 0.438 0.428 0.135 0* 0* 0*
3 0.348 0.193 0.459 0* 0* 0*
4 0* 0* 0* 1* 0* 0*
5 0* 0* 0* 0* 1* 0*
6 0* 0* 0* 0* 0* 1*
Response probability
Manifest state
1 2 3
Latent State 1 1* 0* 0*
2 0* 1* 0*
3 0* 0* 1*
4 1* 0* 0*
5 0* 1* 0*
6 0* 0* 1*
The log likelihood ratio statistic G 2 = 23.385, d f = 15, p = 0.076
The numbers with “*” imply the fixed values

G² = 23.385, df = 15, p = 0.076, the model is accepted at the significance level


0.05. Similarly, the ML estimation of the mixed latent Markov chain model is carried
out, and the estimates of the parameters are illustrated in Table 6.2. The results do not
provide a good fit for the data set. As illustrated in this section, the ML estimation
of the latent structure models in Fig. 6.1 can be made practically by using the ML
estimation procedure for the latent Markov chain model.

6.5 Discussion

The present chapter has treated a basic version of the mixed latent Markov chain
model, in which it is assumed the response probabilities to test items at a time point
and the transition probabilities depend only on the latent states at the time point.
In this sense, the basic model gives an exploratory analysis similar to the latent

Table 6.2 The estimated parameters in the mixed latent Markov chain model
Initial distribution
Mover (latent state) Stayer (latent state)
1 2 3 4 5 6
0.215 0.124 0.033 0.611 0.000 0.017
Latent transition matrix
Latent state
1 2 3 4 5 6
Latent State 1 0.727 0.265 0.008 0* 0* 0*
2 0.368 0.491 0.141 0* 0* 0*
3 0.385 0.000 0.615 0* 0* 0*
4 0* 0* 0* 1.000 0.000 0.000
5 0* 0* 0* 0.000 0.598 0.402
6 0* 0* 0* 0.051 0.949 0.000
Response probability
Manifest state
1 2 3
Latent State 1 0.968 0.032 0.000
2 0.000 1.000 0.000
3 0.000 0.000 1.000
4 0.968 0.032 0.000
5 0.000 1.000 0.000
6 0.000 0.000 1.000
The log likelihood ratio statistic G 2 = 8.809, d f = 3, p = 0.032.
The numbers with “*” imply the fixed values

class cluster analysis. As discussed here, the model is an extension of the latent
structure models in Fig. 6.1 from a natural viewpoint; however, the mixed latent
Markov chain model is equivalent to the latent Markov chain model, and also to the
latent class model. The parameter estimation in the mixed latent Markov chain model
via the EM algorithm can be carried out by using that for the latent Markov chain
model as shown in Sect. 6.3, and the method has been demonstrated in Sect. 6.4.
The estimation algorithm is convenient to handle the constraints for extreme values,
setting a part of the response and transition probabilities as zeroes and ones, and the
property is applied to the parameter estimation in the mixed latent Markov chain
model. Applying the model to various research fields, there may be cases where
the response probabilities to manifest variables and the transition probabilities are
influenced by covariates and histories of latent state transitions, and for dealing with
such cases, the mixed latent Markov chain models have been excellently developed
in applications by Langeheine [16], Vermunt et al. [21], Bartolucci [3], Bartolucci &
Farcomeni [4], Bartolucci et al. [5], and so on. Figure 6.3 illustrates the manifest

Fig. 6.3 A path diagram of the mixed latent Markov chain model with the effects of latent state
histories on manifest variables

Fig. 6.4 A path diagram of the mixed latent Markov chain model with the effects of covariates
and latent state histories on manifest variables

variables $X_t$, $t = 2, 3, \ldots$, depend on the histories $S_{t-1} \to S_t$, and Fig. 6.4 shows that covariate
$V$ influences manifest variables $X_t$, $t = 1, 2, \ldots$, in addition to the latent state histories
$S_{t-1} \to S_t$. In these cases, logit model approaches can be made as in Chaps. 3 and 4;
for example, in Fig. 6.3, for binary latent variables $S_t$ and binary manifest variables
$X_t$, assuming there are no interaction effects of $S_{t-1}$ and $S_t$ on $X_t$, the following logit
model can be made:

$$P(X_t = x_t \mid S_{t-1} = s_{t-1}, S_t = s_t) =
\frac{\exp\left(x_t\alpha_{(t)} + x_t\beta_{(t)t-1}s_{t-1} + x_t\beta_{(t)t}s_t\right)}
{1 + \exp\left(\alpha_{(t)} + \beta_{(t)t-1}s_{t-1} + \beta_{(t)t}s_t\right)}, \quad t = 2, 3, \ldots,$$

where $\alpha_{(t)}$, $\beta_{(t)t-1}$, and $\beta_{(t)t}$ are parameters. For polytomous variables, generalized logit
models can be constructed by considering phenomena under study. Similarly, for
Fig. 6.4, appropriate logit models can also be discussed. In such models, it may be
useful to make path analysis of the system of variables. Further developments of latent
Markov modeling in data analysis can be expected. In Chap. 7, an entropy-based
approach to path analysis [14, 15] is applied to latent class models.
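A minimal sketch of the binary logit model displayed above (the parameter names are illustrative) is:

```python
import numpy as np

def p_x_given_states(x_t, s_prev, s_t, alpha, beta_prev, beta_t):
    """P(X_t = x_t | S_{t-1} = s_prev, S_t = s_t) for binary x_t, s_prev, s_t in {0, 1}."""
    eta = alpha + beta_prev * s_prev + beta_t * s_t
    return np.exp(x_t * eta) / (1.0 + np.exp(eta))
```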

References

1. Andersen, E. B. (1977). Discrete statistical models with social science application. Amsterdam:
North-Holland Publishing Co.
2. Bartholomew, D. J. (1983). Some recent development of social statistics. International
Statistical Review, 51, 1–9.
3. Bartolucci, F. (2006). Likelihood inference for a class of latent Markov models under linear
hypotheses on the transition probabilities. Journal of the Royal Statistical Society, B, 68, 155–
178.
4. Bartolucci, F., & Farcomeni, A. (2009). A multivariate extension of the dynamic logit model
for longitudinal data based on a latent Markov heterogeneity structure. Journal of the American

Statistical Association, 104, 816–831.


5. Bartolucci, F., Lupparelli, M., & Montanari, G. E. (2009). Latent Markov model for binary
longituidinal data: An application to the performance evaluation of nursing homes. Annals of
Applied Statistics, 3, 611–636.
6. Bartolucci, F., Farcomeni, A., & Pennoni, F. (2010). An overview of latent Markov models for
longitudinal categorical data, arXiv:1003.2804 [math.ST].
7. Bartolucci, F., Farcomeni, A., & Pennoni, F. (2014). Latent Markov models: A review of a
general framework for the analysis of longitudinal data with covariates. TEST, 23, 433–465.
8. Blumen, I., Kogan, M., & McCarthy, P. J. (1955). The industry mobility of labor as a probability
process. Ithaca: Cornel University.
9. Bush, R. R., & Cohen, B. P. (1956). Book Review of. Journal of the American Statistical
Association, 51, 702–704.
10. Bye, B. V., & Schechter, E. S. (1986). A latent Markov model approach to the estimation of
response error in multiway panel data. Journal of the American Statistical Association, 81,
357–380.
11. Eshima, N. (1993). Dynamic latent structure analysis through the latent Markov chain model.
Behaviormetrika, 20, 151–160.
12. Eshima, N, Asano, C, & Watanabe, M (1984). A time-dependent latent class analysis based on
states-transition. In Proceedings of the First China-Japan Symposium on Statistics, pp. 62–66.
13. Eshima, N., Asano, C., & Watanabe, M. (1985). A time-dependent latent class analysis based
on states-transition. Sougo Rikougaku Kenkyuka Houkoku, 6, 243–249. (in Japanese).
14. Eshima, N., Tabata, M., & Zhi, G. (2001). Path analysis with logistic regression models: Effect
analysis of fully recursive causal systems of categorical variables. Journal of Japan Statistical
Society, 31, 1–14.
15. Eshima, N., Tabata, M., Borroni, C. G., & Kano, Y. (2018). An entropy-based approach to path
analysis of structural generalized linear models: a basic approach. Entropy, 17, 5117–5132.
16. Langeheine, R. (1988). Manifest and latent Markov chain models for categorical panel data.
Journal of Educational Statistics, 13, 299–312.
17. Poulsen, C. S. (1990). Mixed Markov and latent Markov modelling applied to brand choice
behaviour. International Journal of Research in Marketing, 7, 5–19.
18. Van de Pol, F. (1990). A unified framework for Markov modeling in discrete and discrete time.
Sociological Method and Research, 18, 416–441.
19. Van de Pol, F., & Langeheine, R. (1990). Mixed Markov latent class models. Sociological
Methodology, 20, 213–247.
20. Van de Pol, F., & de Leeuw, J. (1986). A latent Markov model to correct for measurement error.
Sociological Method and Research, 15, 118–141.
21. Vermunt, J. K., Langeheine, R., & Böckenholt, U. (1999). Discrete-time discrete-state latent
Markov models with time-constant and time-varying covariates. Journal of Educational and
Behavioral Statistics, 24, 179–207.
Chapter 7
Path Analysis in Latent Class Models

7.1 Introduction

It is a useful approach to analyze causal relationships among variables in latent struc-


ture models. The relationships are considered in real data analysis based on obser-
vational methods of the variables and by using particular scientific theories, and
according to it, causes and effects are hypothesized on sets of the variables before
statistical analysis. In many scientific fields, for example, sociology, psychology,
education, medicine, and so on, there are many cases where some of the observed
variables are regarded as indicators of latent variables, so meaningful causal rela-
tionships have to be discussed not only for manifest variables but also for latent
variables through path analysis methods [25]. Linear structural equation models
(Jöreskog and Sörbom, 1996) [2] are significant approaches for path analysis of
continuous variables, and the path analysis is easily carried out by using regression
coefficients in linear structural equations that express path diagrams among manifest
and latent variables. For categorical variables, Goodman’s approach to path analysis
with odds ratios [11, 12] provided a great stimulus for developing path analysis of cate-
gorical variables, and Goodman [13] explored a direction of path analysis in latent class
models, though the direct and indirect effects were not discussed. This approach was
performed in the case where all variables concerned are binary, and the models used
are called the multiple-indicator, multiple-cause models. Macready (1982) also used
a latent class model with four latent classes, which represent learning patterns of
two kinds of skill, to perform a causal analysis in explaining a learning structure.
Similar models are also included in White and Clark [24], Owston [23], and Eshima
et al. [7]. In path analysis, it is important how the total effects of parent variables on
descendant ones are measured and how the total effects are divided into the direct
and indirect effects, that is, the following additive decomposition is critical:

The total effect = the direct effect + the indirect effect. (7.1)


Eshima et al. [8] proposed path analysis for categorical variables in logit models by
using log odds ratios, and the above decomposition was given. Kuha and Goldthorpe
[16] also gave a path analysis method for categorical variables by using odds ratios;
however, decomposition (7.1) there holds only approximately. Following these approaches,
an entropy-based method of path analysis for generalized linear models, which can
make the effect decomposition shown in (7.1), was proposed by Eshima et al. [9].
The present chapter applies a method of path analysis in Eshima et al. [8, 9]
to multiple-indicator, multiple-cause models and the latent Markov chain model.
Section 7.2 discusses a multiple-indicator, multiple-cause model. In Sect. 7.3, the
path analysis method is reviewed, and the effects of variables are calculated in some
examples. Section 7.4 gives a numerical illustration to make a path analysis in the
multiple-indicator, multiple-cause model. In Sect. 7.5, path analysis in the latent
Markov chain model is considered, and in Sect. 7.6, a numerical example is presented
to demonstrate the path analysis. Section 7.7 provides discussions and a further
perspective of path analysis in latent class models.

7.2 A Multiple-Indicator, Multiple-Cause Model

Let binary variables X ki , i = 1,2, . . . , Ik be the indicators of latent variables Sk , k = 1,2, . . . , K , and let us suppose that the conditional probabilities of indicator variables X ki , given (S1 , S2 , . . . , SK ) = (s1 , s2 , . . . , s K ), depend only on Sk = sk , that is, for l ≠ k, latent variables Sl do not have direct effects on X ki , i = 1,2, . . . , Ik . Let v(s1 , s2 , . . . , s K ) be the probability of (S1 , S2 , . . . , SK ) = (s1 , s2 , . . . , s K ). Then, the latent class model is given as follows:

P(X ki = xki , i = 1,2, . . . , Ik ; k = 1,2, . . . , K )
   = Σ_s v(s) Π_{k=1}^{K} Π_{i=1}^{Ik} P(X ki = 1|Sk = sk )^xki (1 − P(X ki = 1|Sk = sk ))^(1−xki),   (7.2)


where the notation Σ_s implies the summation over all latent variable patterns s = (s1 , s2 , . . . , s K ). In this model, the following inequalities have to hold:

P(X ki = 1|Sk = 0) < P(X ki = 1|Sk = 1), i = 1,2, . . . , Ik ; k = 1,2, . . . , K ,   (7.3)

because X ki , i = 1,2, . . . , Ik are the indicators of latent variables (states) Sk , k = 1,2, . . . , K . The path diagram between Sk and X ki , i = 1,2, . . . , Ik is illustrated in Fig. 7.1. The probabilities P(X ki = 1|Sk = 0) imply guessing (intrusion) errors and 1 − P(X ki = 1|Sk = 1) forgetting (omission) ones.

Fig. 7.1 Path diagram of manifest variables X ki and latent variable Sk

Although the parameters can be

estimated according to the EM algorithm for the usual latent class model in Chap. 2,
imposing the equality constraints, there may be cases that the estimated models are
not identified with the hypothesized structures. Hence, in order to estimate the latent
probabilities P(X ki = 1|Sk = sk ), it is better to formulate the latent probabilities as
models in Chaps. 3 and 4, that is,
P(X ki = 1|Sk = sk ) = exp(αki )/(1 + exp(αki )) for sk = 0, and exp(αki + βki )/(1 + exp(αki + βki )) for sk = 1,   (7.4)

where

βki = exp(γki ), i = 1,2, . . . , Ik ; k = 1,2, . . . , K . (7.5)
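As a brief added illustration (not part of the original text), the parameterization (7.4)-(7.5) can be coded directly; the function name below is hypothetical, and the point is only that βki = exp(γki ) > 0 automatically enforces the monotonicity constraint (7.3).

import math

def response_prob(alpha, gamma, s):
    """P(X_ki = 1 | S_k = s) under (7.4)-(7.5); beta = exp(gamma) is always positive."""
    beta = math.exp(gamma)              # (7.5): guarantees beta > 0
    eta = alpha + beta * s              # logit for s in {0, 1}
    return math.exp(eta) / (1.0 + math.exp(eta))

# For example, alpha = -1.39 and gamma = 1.28 give P(X=1|S=0) close to 0.20 and
# P(X=1|S=1) close to 0.90, and inequality (7.3) holds whatever values alpha and gamma take.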

The ML estimation procedure based on the EM algorithm can be constructed by a method similar to those in Chaps. 2 and 3. For K = 2, a path diagram of the model is illustrated in Fig. 7.2a. The total, direct and indirect effects have to be calculated according to path diagrams.

Fig. 7.2 a Path diagram of manifest variables X ki , i = 1,2, . . . , Ik and latent variables Sk , k = 1,2, where S1 is a parent variable of S2 . b Path diagram of manifest variables X ki , i = 1,2, . . . , Ik and latent variables Sk , k = 1,2, where S1 and S2 have no causal order

Let f(s1 , s2 , x1i , x2 j ) be the joint probability functions of variables S1 , S2 , X 1i , X 2 j , g1 (s1 ) the marginal probability function of S1 , g12 (s2 |s1 ) the conditional probability function of S2 for given S1 = s1 , f 1i (x1i |s1 ) that of X 1i for given S1 = s1 , and let f 2 j (x2 j |s2 ) be that of X 2 j for given S2 = s2 . Then, from Fig. 7.2a, functions f(s1 , s2 , x1i , x2 j ) are decomposed as follows:

f(s1 , s2 , x1i , x2 j ) = g1 (s1 )g12 (s2 |s1 ) f 1i (x1i |s1 ) f 2 j (x2 j |s2 ),
   i = 1,2, . . . , I1 ; j = 1,2, . . . , I2 .   (7.6)

Since f ki (xki |sk ) = P(X ki = xki |Sk = sk ), for binary latent variables Sk , the formulae in (7.4) are expressed as

f ki (xki |sk ) = exp(xki αki + xki βki sk ) / (1 + exp(αki + βki sk )), i = 1,2, . . . , Ik ; k = 1,2.   (7.7)

Similarly, we have

g12 (s2 |s1 ) = exp(s2 γ1 + s2 δ1 s1 ) / (1 + exp(γ1 + δ1 s1 )), s1 , s2 ∈ {0,1},   (7.8)

where δ1 is a regression coefficient and γ1 is an intercept parameter. From the above formulation, the path system in Fig. 7.2a is viewed as a recursive system of logit models (a sampling sketch is given below). In Fig. 7.2a, S1 is the parent variable of the other variables, and S2 is the parent variable of manifest variables X 2i , i = 1,2, . . . , I2 . If there is no causal order between latent variables S1 and S2 , the path diagram is illustrated in Fig. 7.2b, and in this model the two latent variables have to be treated in parallel. Before making a path analysis of the multiple-indicator, multiple-cause model (7.2), an entropy-based path analysis method for generalized linear model (GLM) systems [8, 9] is considered for logit models in the next section.
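To make the recursive logit structure of Fig. 7.2a concrete, the following short sketch (an added illustration with hypothetical parameter values, not the author's code) draws one observation by sampling S1 from g1 , S2 from (7.8), and each indicator from (7.7).

import math, random

def bernoulli(p):
    return 1 if random.random() < p else 0

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def draw_once(p_s1, gamma1, delta1, alpha, beta):
    """alpha[k][i], beta[k][i] are the item parameters in (7.7); k = 0 for S1-items, k = 1 for S2-items."""
    s1 = bernoulli(p_s1)                               # S1 ~ g1
    s2 = bernoulli(logistic(gamma1 + delta1 * s1))     # S2 | S1 as in (7.8)
    s = (s1, s2)
    x = [[bernoulli(logistic(alpha[k][i] + beta[k][i] * s[k]))
          for i in range(len(alpha[k]))] for k in range(2)]   # indicators as in (7.7)
    return s1, s2, x

# Example call with hypothetical values (two indicators per latent variable):
# draw_once(0.4, -0.85, 2.23, alpha=[[-1.4, -2.2], [-2.2, -0.85]], beta=[[3.6, 3.0], [3.6, 2.2]])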

7.3 An Entropy-Based Path Analysis of Categorical Variables

For simplicity of the discussion, a system of variables Y and Ui , i = 1,2, 3 shown


in Fig. 7.3 is discussed. From the path diagram, the joint probability of the four
variables is recursively decomposed as follows:

f (u 1 , u 2 , u 3 , y) = f 1 (u 1 ) f 2 (u 2 |u 1 ) f 3 (u 3 |u 1 , u 2 ) f (y|u 1 , u 2 , u 3 ),

where functions f i (∗|∗), i = 1,2, 3 and f (∗|∗) imply the conditional probability
functions related to the variables. In Fig. 7.3, the relationship is expressed as follows:

Fig. 7.3 A path diagram of manifest variable Y and latent variables Ui , i = 1,2, 3

U1 ≺ U2 ≺ U3 ≺ Y.

For the following logit model with no interaction effects:

f (y|u 1 , u 2 , u 3 ) = exp(yα + yβ1 u 1 + yβ2 u 2 + yβ3 u 3 ) / (1 + exp(α + β1 u 1 + β2 u 2 + β3 u 3 )),   (7.9)

the total, direct, and indirect effects of parent variables Uk on descendant variable Y are discussed. For baseline category (U1 , U2 , U3 , Y ) = (u ∗1 , u ∗2 , u ∗3 , y ∗ ), the total effect of (U1 , U2 , U3 ) = (u 1 , u 2 , u 3 ) on Y = y can be defined by the following log odds ratio:

log [ f (y|u 1 , u 2 , u 3 ) f (y ∗ |u ∗1 , u ∗2 , u ∗3 ) / ( f (y ∗ |u 1 , u 2 , u 3 ) f (y|u ∗1 , u ∗2 , u ∗3 ) ) ]
   = {log f (y|u 1 , u 2 , u 3 ) − log f (y ∗ |u 1 , u 2 , u 3 )} − {log f (y|u ∗1 , u ∗2 , u ∗3 ) − log f (y ∗ |u ∗1 , u ∗2 , u ∗3 )}
   = Σ_{k=1}^{3} (y − y ∗ )βk (u k − u ∗k ).   (7.10)

The above log odds ratio implies the decrease of the uncertainty of response variable Y for a change of parent variables (U1 , U2 , U3 ) from baseline (u ∗1 , u ∗2 , u ∗3 ), that is, the amount of information on Y explained by the parent variables. Since the logit model in (7.9) has no interactive effects of the explanatory variables Uk , the log odds ratio is a bilinear form of y − y ∗ and u k − u ∗k with respect to the regression coefficients βk . Formally substituting the related means of the variables (μ1 , μ2 , μ3 , ν) for the baseline (u ∗1 , u ∗2 , u ∗3 , y ∗ ), the total effect of (U1 , U2 , U3 ) = (u 1 , u 2 , u 3 ) on Y = y is defined by

Σ_{k=1}^{3} (y − ν)βk (u k − μk ).   (7.11)

The total effect of (U2 , U3 ) = (u 2 , u 3 ) on Y = y at U1 = u 1 is defined by

Σ_{k=2}^{3} (y − ν(u 1 ))βk (u k − μk (u 1 )),   (7.12)

where ν(u 1 ) and μk (u 1 ) are the conditional means of Y and Uk , k = 2,3 given U1 = u 1 , respectively. The above formula can be derived by formally setting (u ∗1 , u ∗2 , u ∗3 , y ∗ ) = (u 1 , μ2 (u 1 ), μ3 (u 1 ), ν(u 1 )) in (7.10). Subtracting (7.12) from (7.11), it follows that

The total effect of U1 = u 1 on Y = y at (U2 , U3 ) = (u 2 , u 3 )
   = (the total effect of (U1 , U2 , U3 ) = (u 1 , u 2 , u 3 ) on Y = y)
   − (the total effect of (U2 , U3 ) = (u 2 , u 3 ) on Y = y at U1 = u 1 )
   = Σ_{k=1}^{3} (y − ν)βk (u k − μk ) − Σ_{k=2}^{3} (y − ν(u 1 ))βk (u k − μk (u 1 )).   (7.13)

 
Putting (u ∗1 , u ∗2 , u ∗3 , y ∗ ) = (μ1 (u 2 , u 3 ), u 2 , u 3 , ν(u 2 , u 3 )), where ν(u 2 , u 3 ) and μ1 (u 2 , u 3 ) are the conditional means of Y and U1 given (U2 , U3 ) = (u 2 , u 3 ), respectively, the direct effect of U1 = u 1 on Y = y at (U2 , U3 ) = (u 2 , u 3 ) is defined by

(y − ν(u 2 , u 3 ))β1 (u 1 − μ1 (u 2 , u 3 )).   (7.14)

From this, the indirect effect of U1 = u 1 on Y = y through (U2 , U3 ) = (u 2 , u 3 )


is calculated by subtracting (7.14) from (7.13).
Remark 7.1 The indirect effect of U1 = u 1 on Y = y is defined by the total effect
minus the direct effect as discussed above. Since the direct and the total effects of
U1 = u 1 can be interpreted as information, the indirect effect is also interpreted in
information. 
Second, the effects of U2 = u 2 on Y = y at (U1 , U3 ) = (u 1 , u 3 ) are computed. The total effect of U3 = u 3 on Y = y at (U1 , U2 ) = (u 1 , u 2 ) can be calculated by setting (u ∗1 , u ∗2 , u ∗3 , y ∗ ) = (u 1 , u 2 , μ3 (u 1 , u 2 ), ν(u 1 , u 2 )) in (7.10), that is,

(y − ν(u 1 , u 2 ))β3 (u 3 − μ3 (u 1 , u 2 )),   (7.15)

where ν(u 1 , u 2 ) and μ3 (u 1 , u 2 ) are the conditional means of Y and U3 given (U1 , U2 ) = (u 1 , u 2 ), respectively. From (7.12) and (7.15), the total effect of U2 = u 2 on Y = y at (U1 , U3 ) = (u 1 , u 3 ) is calculated as follows:

(the total effect of (U2 , U3 ) = (u 2 , u 3 ) on Y = y at U1 = u 1 )
   − (the total effect of U3 = u 3 on Y = y at (U1 , U2 ) = (u 1 , u 2 ))
   = Σ_{k=2}^{3} (y − ν(u 1 ))βk (u k − μk (u 1 )) − (y − ν(u 1 , u 2 ))β3 (u 3 − μ3 (u 1 , u 2 )),   (7.16)
 
and for baseline (u ∗1 , u ∗2 , u ∗3 , y ∗ ) = (u 1 , μ2 (u 1 , u 3 ), u 3 , ν(u 1 , u 3 )) in (7.10), the direct effect of U2 = u 2 on Y = y at (U1 , U3 ) = (u 1 , u 3 ) is given by

(y − ν(u 1 , u 3 ))β2 (u 2 − μ2 (u 1 , u 3 )),   (7.17)

where ν(u 1 , u 3 ) and μ2 (u 1 , u 3 ) are the conditional means of Y and U2 given (U1 , U3 ) = (u 1 , u 3 ), respectively. From the above calculation, we have the following
additive decomposition of the total effect of (U1 , U2 , U3 ) = (u 1 , u 2 , u 3 ) on Y = y:

(The total effect of (U1 , U2 , U3 ) = (u 1 , u 2 , u 3 ) on Y )


= (The total effect of U1 = u 1 on Y = y at (U2 , U3 ) = (u 2 , u 3 ))
+ (The total effect of U2 = u 2 on Y = y at (U1 , U3 ) = (u 1 , u 3 ))
+ (The total effect of U3 = u 3 on Y = y at (U1 , U2 ) = (u 1 , u 2 ))
(7.18)

Remark 7.2 The effects defined in this section are interpreted in information, and
the exponentials of them are viewed as the multiplicative effects in odds ratios. 
In order to summarize and standardize the effects based on log odds ratios, the
entropy coefficient of determination (ECD) [6] is used. In logit model (7.9), the
standardized summary total effect of (U1 , U2 , U3 ) on Y is given by
eT ((U1 , U2 , U3 ) → Y ) = Σ_{k=1}^{3} βk Cov(Y, Uk ) / (Σ_{k=1}^{3} βk Cov(Y, Uk ) + 1).   (7.19)

Remark 7.3 By taking the expectation of (7.11) over all (u 1 , u 2 , u 3 ) and y, we have Σ_{k=1}^{3} βk Cov(Y, Uk ), and (7.19) is the ECD of (U1 , U2 , U3 ) and Y .
Summarizing and standardizing the effects from (7.12) to (7.17) as in (7.19), we also have

eT ((U2 , U3 ) → Y ) = Σ_{k=2}^{3} βk Cov(Y, Uk |U1 ) / (Σ_{k=1}^{3} βk Cov(Y, Uk ) + 1),
eT (U1 → Y ) = (Σ_{k=1}^{3} βk Cov(Y, Uk ) − Σ_{k=2}^{3} βk Cov(Y, Uk |U1 )) / (Σ_{k=1}^{3} βk Cov(Y, Uk ) + 1),
e D (U1 → Y ) = β1 Cov(Y, U1 |U2 , U3 ) / (Σ_{k=1}^{3} βk Cov(Y, Uk ) + 1),
eT (U3 → Y ) = e D (U3 → Y ) = β3 Cov(Y, U3 |U1 , U2 ) / (Σ_{k=1}^{3} βk Cov(Y, Uk ) + 1),
eT (U2 → Y ) = eT ((U2 , U3 ) → Y ) − eT (U3 → Y ),
e D (U2 → Y ) = β2 Cov(Y, U2 |U1 , U3 ) / (Σ_{k=1}^{3} βk Cov(Y, Uk ) + 1),

where notations e D (∗) and eT (∗) imply the standardized summary direct and total effects of the related variables, respectively. In the above path analysis, from (7.18), we also have

eT ((U1 , U2 , U3 ) → Y ) = Σ_{k=1}^{3} eT (Uk → Y ).
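For readers who wish to evaluate these quantities numerically, the following Python sketch (added here; it is not the author's code) computes the standardized summary effects for binary (U1 , U2 , U3 , Y ) from a joint probability table and the coefficients βk of (7.9). The conditional covariances Cov(Y, Uk |·) are computed as expected conditional covariances, which is the interpretation used in the numerical tables of Sect. 7.5; the array layout and function names are hypothetical.

import numpy as np
from itertools import product

def _cov_y_u(table, u_axis):
    """Cov(Y, U) within the renormalized joint table; Y is the last axis."""
    w = table / table.sum()
    grid = np.indices(w.shape)
    y, u = grid[-1], grid[u_axis]
    e = lambda a: float((a * w).sum())
    return e(y * u) - e(y) * e(u)

def _cov_given(p, k, cond_axes):
    """Expected conditional covariance Cov(Y, U_k | conditioning variables)."""
    total = 0.0
    for vals in product((0, 1), repeat=len(cond_axes)):
        sl = [slice(None)] * p.ndim
        for ax, v in zip(cond_axes, vals):
            sl[ax] = v
        sub = p[tuple(sl)]
        w = float(sub.sum())
        if w > 0:
            k_new = k - sum(ax < k for ax in cond_axes)   # axis of U_k after slicing
            total += w * _cov_y_u(sub, k_new)
    return total

def summary_effects(p, beta):
    """p[u1, u2, u3, y]: joint probabilities; beta = (beta1, beta2, beta3) of (7.9)."""
    b = np.asarray(beta, dtype=float)
    c = np.array([_cov_y_u(p, k) for k in range(3)])       # Cov(Y, U_k)
    denom = float(b @ c) + 1.0
    eT_all = (denom - 1.0) / denom                          # (7.19)
    eT_23 = (b[1] * _cov_given(p, 1, (0,)) + b[2] * _cov_given(p, 2, (0,))) / denom
    eT3 = b[2] * _cov_given(p, 2, (0, 1)) / denom           # = eD(U3 -> Y)
    return {"eT(U1,U2,U3)": eT_all,
            "eT(U1)": eT_all - eT_23,
            "eD(U1)": b[0] * _cov_given(p, 0, (1, 2)) / denom,
            "eT(U2)": eT_23 - eT3,
            "eD(U2)": b[1] * _cov_given(p, 1, (0, 2)) / denom,
            "eT(U3)=eD(U3)": eT3}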

Next, path analysis for a path system in Fig. 7.4 is carried out. The joint probability
of the three variables is decomposed as follows:

f (u 1 , u 2 , y) = f 12 (u 1 , u 2 ) f (y|u 1 , u 2 ),

where f 12 (u 1 , u 2 ) is the joint probability function of U1 and U2 and f (y|u 1 , u 2 )


the conditional probability function of Y for given (U1 , U2 ) = (u 1 , u 2 ). For the
following logit model:

f (y|u 1 , u 2 ) = exp(yα + yβ1 u 1 + yβ2 u 2 ) / (1 + exp(α + β1 u 1 + β2 u 2 )),

the total effect of (U1 , U2 ) = (u 1 , u 2 ) on Y = y is given by

(y − ν)β1 (u 1 − μ1 ) + (y − ν)β2 (u 2 − μ2 ).

The direct effect of U1 = u 1 on Y = y at U2 = u 2 and that of U2 = u 2 on Y = y


at U1 = u 1 are given, respectively, as follows:

(y − ν(u 2 ))β1 (u 1 − μ1 (u 2 ))

Fig. 7.4 A path diagram of Y and its parent variables Ui , i = 1, 2

and

(y − ν(u 1 ))β2 (u 2 − μ2 (u 1 )),

where ν(u k ), k = 1,2 are the conditional expectations of Y , and μ1 (u 2 ) and μ2 (u 1 ) are those of U1 and U2 , respectively, as in the above discussion. The total effect of U1 = u 1 on Y = y
at U2 = u 2 is calculated by

(the total effect of (U1 , U2 ) = (u 1 , u 2 ) on Y = y)


− (the direct effect of U2 = u 2 on Y = y at U1 = u 1 )
= (y − ν)β1 (u 1 − μ1 ) + (y − ν)β2 (u 2 − μ2 ) − (y − ν(u 1 ))β2 (u 2 − μ2 (u 1 )).

Similarly, the total effect of U2 = u 2 on Y = y at U1 = u 1 can also be calculated.


Summarizing and standardizing the above effects, we have

eT ((U1 , U2 ) → Y ) = (β1 Cov(Y, U1 ) + β2 Cov(Y, U2 )) / (1 + β1 Cov(Y, U1 ) + β2 Cov(Y, U2 )),
e D (U1 → Y ) = β1 Cov(Y, U1 |U2 ) / (1 + β1 Cov(Y, U1 ) + β2 Cov(Y, U2 )),
e D (U2 → Y ) = β2 Cov(Y, U2 |U1 ) / (1 + β1 Cov(Y, U1 ) + β2 Cov(Y, U2 )),

eT (U1 → Y ) = eT ((U1 , U2 ) → Y ) − e D (U2 → Y ),

eT (U2 → Y ) = eT ((U1 , U2 ) → Y ) − e D (U1 → Y ).

The above method can be applied to the causal system in Fig. 7.2b.

7.4 Path Analysis in Multiple-Indicator, Multiple-Cause Models

7.4.1 The Multiple-Indicator, Multiple-Cause Model in Fig. 7.2a

The above method of path analysis is applied to the multiple-indicator, multiple-cause model in Fig. 7.2a. According to (7.6), the effects of S1 on X 1i , i = 1,2, . . . , I1 and those of S1 and S2 on X 2i , i = 1,2, . . . , I2 are calculated. Let νki and μk be the expectations of X ki and Sk , respectively. Since the variables concerned are binary, we have

μk = P(Sk = 1), k = 1,2;
νki = P(X ki = 1), i = 1,2, . . . , Ik , k = 1,2.

(i) The effects of S1 on X 1i

For S1 → X 1i , applying the discussion in Sect. 7.3 to (7.7), by using a method similar
to (7.11) we have the total (direct) effects of S 1 = s1 on X 1i = x1i as

(x1i − ν1i )β1i (s1 − μ1 ), i = 1,2, . . . , I1 . (7.20)

Summarizing and standardizing the above effects, we have

eT (S1 → X 1i ) = β1i Cov(X 1i , S1 ) / (β1i Cov(X 1i , S1 ) + 1), i = 1,2, . . . , I1 .   (7.21)

(ii) The effects of S1 on S2

In a way similar to (7.21), in (7.8) we have



the total (direct) effect of S1 = s1 on S2 = s2 = (s2 − μ2 )δ1 (s1 − μ1 ),   (7.22)

eT (S1 → S2 ) = δ1 Cov(S1 , S2 ) / (δ1 Cov(S1 , S2 ) + 1).   (7.23)

(iii) The effects of S1 and S2 on X 2i

From logit model (7.7), the total effects of (S1 , S2 ) = (s1 , s2 ) on X 2i = x2i are given
by

(x2i − ν2i )β2i (s2 − μ2 ), i = 1,2, . . . , I2 . (7.24)

According to (7.12), the total (direct) effects of S2 = s2 on X 2i = x2i at S1 = s1


are calculated as follows:

(x2i − ν2i (s1 ))β2i (s2 − μ2 (s1 )), i = 1,2, . . . , I2 , (7.25)

where ν2i (s1 ) and μ2 (s1 ) are the conditional expectations of X 2i and S2 given S1 = s1 , respectively. The variables are binary, so it follows that

μ2 (s1 ) = P(S2 = 1|S1 = s1 ), ν2i (s1 ) = P(X 2i = 1|S1 = s1 ).

Since the direct effects of S1 = s1 on X 2i = x2i are zero, by subtracting (7.25)


from (7.24), the total (indirect) effects of S1 = s1 on X 2i = x2i through S2 = s2 are
obtained as follows:

(x2i − ν2i )β2i (s2 − μ2 ) − (x2i − ν2i (s1 ))β2i (s2 − μ2 (s1 )), i = 1,2, . . . , I2 .

Summarizing and standardizing the above effects, we have

eT ((S1 , S2 ) → X 2i ) = β2i Cov(X 2i , S2 ) / (1 + β2i Cov(X 2i , S2 )), i = 1,2, . . . , I2 ,   (7.26)
eT (S2 → X 2i ) = e D (S2 → X 2i ) = β2i Cov(X 2i , S2 |S1 ) / (1 + β2i Cov(X 2i , S2 )), i = 1,2, . . . , I2 ,   (7.27)
eT (S1 → X 2i ) = e I (S1 → X 2i ) = eT ((S1 , S2 ) → X 2i ) − eT (S2 → X 2i )
   = (β2i Cov(X 2i , S2 ) − β2i Cov(X 2i , S2 |S1 )) / (1 + β2i Cov(X 2i , S2 )), i = 1,2, . . . , I2 .
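As a numerical sketch (added for illustration; not the author's code), the standardized effects (7.21), (7.23), (7.26), and (7.27) can be computed from the model parameters, namely the class proportions v(s1 , s2 ) and the latent response probabilities; the function and variable names are hypothetical.

import numpy as np

def mimic_effects(v, p1, p2):
    """
    v[s1, s2] : class proportions P(S1 = s1, S2 = s2), a 2 x 2 array.
    p1[s1, i] : P(X_1i = 1 | S1 = s1); p2[s2, j] : P(X_2j = 1 | S2 = s2).
    """
    v, p1, p2 = (np.asarray(a, float) for a in (v, p1, p2))
    logit = lambda p: np.log(p / (1 - p))
    pS1, pS2 = v.sum(axis=1), v.sum(axis=0)
    mu1, mu2 = pS1[1], pS2[1]
    beta1 = logit(p1[1]) - logit(p1[0])                   # item log odds ratios, cf. (7.7)
    beta2 = logit(p2[1]) - logit(p2[0])
    pS2_g1 = v / pS1[:, None]                             # rows: P(S2 | S1 = s1)
    delta1 = logit(pS2_g1[1, 1]) - logit(pS2_g1[0, 1])    # cf. (7.8)
    cov_x1_s1 = p1[1] * mu1 - (pS1 @ p1) * mu1            # Cov(X_1i, S1)
    cov_s1_s2 = v[1, 1] - mu1 * mu2
    cov_x2_s2 = p2[1] * mu2 - (pS2 @ p2) * mu2            # Cov(X_2j, S2)
    cov_x2_s2_g1 = sum(pS1[a] * (p2[1] * pS2_g1[a, 1] - (pS2_g1[a] @ p2) * pS2_g1[a, 1])
                       for a in (0, 1))                   # E[Cov(X_2j, S2 | S1)]
    eT_S12_X2 = beta2 * cov_x2_s2 / (1 + beta2 * cov_x2_s2)          # (7.26)
    eT_S2_X2 = beta2 * cov_x2_s2_g1 / (1 + beta2 * cov_x2_s2)        # (7.27)
    return {"eT(S1->X1i)": beta1 * cov_x1_s1 / (beta1 * cov_x1_s1 + 1),   # (7.21)
            "eT(S1->S2)": delta1 * cov_s1_s2 / (delta1 * cov_s1_s2 + 1),  # (7.23)
            "eT((S1,S2)->X2i)": eT_S12_X2,
            "eT(S2->X2i)": eT_S2_X2,
            "eI(S1->X2i)": eT_S12_X2 - eT_S2_X2}

With the parameters of Table 7.1 in Sect. 7.5 (v = [[0.42, 0.18], [0.08, 0.32]] and the latent response probabilities as p1 and p2), this sketch reproduces, for example, eT (S1 → X 11 ) = 0.376 and eT (S1 → S2 ) = 0.211.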

More complicated models as shown in Fig. 7.5 can also be considered. The partial
path system of S1 , S2 , (X 1i ), and (X 2i ) can be analyzed as above, so it is sufficient
to discuss the partial path system of S1 , S2 , S3 , and (X 3i ) as shown in Fig. 7.6. The
diagram is a special case of Fig. 7.3. Hence, the discussion on the path diagram in
Fig. 7.3 can be directly employed.

Fig. 7.5 Path diagram of manifest variables X ki , i = 1,2, . . . , Ik and latent variables Sk , k = 1,2, 3

Fig. 7.6 Partial path diagram of manifest variables X 3i and latent variables Sk , k = 1,2, 3

7.4.2 The Multiple-Indicator, Multiple-Cause Model in Fig. 7.2b

For the path diagram in Fig. 7.2b, the effects of variables are calculated as follows:

(i) The effects of S1 and S2 on X 1i

From logit model (7.7), the total effects of (S1 , S2 ) = (s1 , s2 ) on X 1i are given by

(x1i − ν1i )β1i (s1 − μ1 ), i = 1,2, . . . , I1 . (7.28)

Since the direct effects of S2 = s2 on X 1i = x1i are zero, the above effects are
also the total effects of S1 = s1 . With a method similar to (iii) in Subsection 7.4.1,
the direct effects of S1 = s1 on X 1i = x1i at S2 = s2 are given as follows:

(x1i − ν1i (s2 ))β1i (s1 − μ1 (s2 )), i = 1,2, . . . , I1 , (7.29)

where ν1i (s2 ) and μ1 (s2 ) are the conditional expectations of X 1i and S1 given S2 = s2 , that is,

μ1 (s2 ) = P(S1 = 1|S2 = s2 ), ν1i (s2 ) = P(X 1i = 1|S2 = s2 ). (7.30)

By subtracting (7.29) from (7.28), the indirect effects of S1 = s1 on X 1i = x1i


through the association with S2 = s2 are obtained as follows:

(x1i − ν1i )β1i (s1 − μ1 ) − (x1i − ν1i (s2 ))β1i (s1 − μ1 (s2 )), i = 1,2, . . . , I1 .

The above effects are also the indirect effects of S2 = s2 on X 1i = x1i as well.
Summarizing and standardizing the above effects, we have

eT ((S1 , S2 ) → X 1i ) = eT (S1 → X 1i ) = β1i Cov(X 1i , S1 ) / (1 + β1i Cov(X 1i , S1 )), i = 1,2, . . . , I1 ;
e D (S1 → X 1i ) = β1i Cov(X 1i , S1 |S2 ) / (1 + β1i Cov(X 1i , S1 )), i = 1,2, . . . , I1 ;   (7.31)
e I (S1 → X 1i )(= e I (S2 → X 1i )) = eT (S1 → X 1i ) − e D (S1 → X 1i )
   = (β1i Cov(X 1i , S1 ) − β1i Cov(X 1i , S1 |S2 )) / (1 + β1i Cov(X 1i , S1 )), i = 1,2, . . . , I1 ;

e D (S2 → X 1i ) = 0, i = 1,2, . . . , I1 .

(ii) The effects of S1 and S2 on X 2i



By a method similar to the calculation of the effects of S1 and S2 on X 1i , substituting


S1 , S2 , and X 1i in (i) for S2 , S1 , and X 2i , respectively, we can obtain the effects of S1
and S2 on X 2i .

7.5 Numerical Illustration I

7.5.1 Model I (Fig. 7.2a)

Table 7.1 shows artificial parameters of a multiple-indicator, multiple-cause model


with two binary latent variables and three indicator manifest variables for each latent
variable, for demonstrating the path analysis in Fig. 7.7. By using the parameters, we
can get the regression coefficients β and δ in (7.21) and (7.23), as shown in Table 7.2. Table 7.3 illustrates the means of latent variables St and manifest ones
X ti , and the conditional means of S2 and X 2i given S1 = 0 or 1 are given in Table
7.4. By using the parameters, the effects of variables are calculated. First, the effects
of latent variable S1 on manifest variables X 1i , i = 1,2, 3 are obtained in Table 7.5.
According to the path diagram in Fig. 7.7, the effects are direct ones, and also total
ones, that is,

the total effects of S1 = 1 on X1i = the direct effects of S1 = 1 on X1i .

From the table, for example,

Table 7.1 Parameters of a multiple-indicator, multiple-cause model in Fig. 7.7


Latent class Proportion Latent positive item response probability
X 11 X 12 X 13 X 21 X 22 X 23
(1,1) 0.32 0.9 0.7 0.6 0.8 0.8 0.7
(0,1) 0.18 0.2 0.1 0.3 0.8 0.8 0.7
(1,0) 0.08 0.9 0.7 0.6 0.1 0.3 0.1
(0,0) 0.42 0.2 0.1 0.3 0.1 0.3 0.1

Fig. 7.7 Path diagram of manifest and latent variables for Numerical Illustration I

Table 7.2 The estimated regression coefficients in (7.28), (7.30), and (7.32)
δ β11 β12 β13 β21 β22 β23
2.234 3.584 3.045 1.253 3.584 2.234 3.045

Table 7.3 The means of variables latent and manifest variables, St and X ti
μ1 μ2 ν11 ν12 ν13 ν21 ν22 ν23
0.4 0.5 0.48 0.34 0.42 0.45 0.55 0.4

Table 7.4 The conditional means of S2 and X 2i given S1 = 0 or 1


s1 μ2 (s1 ) ν21 (s1 ) ν22 (s1 ) ν23 (s1 )
1 0.8 0.66 0.7 0.58
0 0.3 0.31 0.45 0.28
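The entries of Tables 7.2, 7.3, and 7.4 can be reproduced directly from the parameters of Table 7.1; the following small sketch (an added illustration, not the author's code) does so, with hypothetical variable names.

import numpy as np

# Parameters of Table 7.1: class proportions v[s1, s2] and latent response probabilities.
v = np.array([[0.42, 0.18],      # (S1, S2) = (0,0), (0,1)
              [0.08, 0.32]])     # (S1, S2) = (1,0), (1,1)
p1 = np.array([[0.2, 0.1, 0.3],  # P(X_1i = 1 | S1 = 0)
               [0.9, 0.7, 0.6]]) # P(X_1i = 1 | S1 = 1)
p2 = np.array([[0.1, 0.3, 0.1],  # P(X_2i = 1 | S2 = 0)
               [0.8, 0.8, 0.7]]) # P(X_2i = 1 | S2 = 1)

logit = lambda p: np.log(p / (1 - p))
pS1, pS2 = v.sum(axis=1), v.sum(axis=0)

# Table 7.2: log-odds-ratio regression coefficients
delta = logit(v[1, 1] / pS1[1]) - logit(v[0, 1] / pS1[0])   # 2.234
beta1 = logit(p1[1]) - logit(p1[0])                          # 3.584, 3.045, 1.253
beta2 = logit(p2[1]) - logit(p2[0])                          # 3.584, 2.234, 3.045

# Table 7.3: means of the latent and manifest variables
mu = np.array([pS1[1], pS2[1]])            # 0.40, 0.50
nu1 = pS1 @ p1                             # 0.48, 0.34, 0.42
nu2 = pS2 @ p2                             # 0.45, 0.55, 0.40

# Table 7.4: conditional means given S1 = s1
pS2_given_S1 = v / pS1[:, None]
mu2_given = pS2_given_S1[:, 1]             # 0.3 (s1 = 0), 0.8 (s1 = 1)
nu2_given = pS2_given_S1 @ p2              # rows: (0.31, 0.45, 0.28) and (0.66, 0.70, 0.58)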

Table 7.5 The total (direct) effects of Latent variable S1 on Manifest variables X 1i , i = 1,2, 3
S1 X 11 X 12 X 13 S1 → X 11 S1 → X 12 S1 → X 13
1 1 1 1 1.118 1.206 0.436
0 1 1 1 −0.745 −0.804 −0.291
1 0 0 0 −1.032 −0.621 −0.316
0 0 0 0 0.688 0.414 0.210
Mean effect β1i Cov(X 1i , S1 ) 0.602 0.438 0.090
eT (S1 → X 1i ) = e D (S1 → X 1i ) 0.376 0.305 0.083

the total effects of S1 = 1 on X 11 = 1 = 1.118,


the total effects of S1 = 1 on X 12 = 1 = 1.206,
the total effects of S1 = 1 on X 13 = 1 = 0.436.

The above effects are the changes of information in X 1i for latent variable S1 =
1 as explained in (7.10), and the following exponentials of the quantities can be
interpreted as the odds ratios:

exp(1.118) = 3.058, exp(1.206) = 3.340, exp(0.436) = 1.547.

For baselines ν1 and μ1i , i = 1,2, 3 in Table 7.3, the odds ratios with respect
to the variables (S1 , X 1i ), i = 1,2, 3 are 3.058, 3.340, and 1.547, respectively. The
standardized effects (7.21) are shown in the seventh row of Table 7.5. For example,
eT (S1 → X 11 ) = 0.376 implies that 37.6% of the variation of manifest variable X 11
in entropy is explained by latent variable S1 . By using (7.22) and (7.23), the effects
of latent variable S1 on S2 are calculated in Table 7.6, and the explanation of the table
can be given as in Table 7.5. In Table 7.7, the total effects of latent variables (S1 , S2 )

Table 7.6 The total (direct) effects of latent variable S1 on S2
S1 S2 S1 → S2
1 1 −0.137
0 1 0.206
1 0 0.303
0 0 −0.104
Mean effect δ1 Cov(S1 , S2 ) 0.268
eT (S1 → S2 ) = e D (S1 → S2 ) 0.211

Table 7.7 The total effects of latent variables (S1 , S2 ) on manifest variables X 2i , i = 1,2, 3
S1 S2 X 21 X 22 X 23 (S1 , S2 ) → X 21 (S1 , S2 ) → X 22 (S1 , S2 ) → X 23
1 1 1 1 1 0.985 0.503 0.913
0 1 1 1 1 0.985 0.503 0.913
1 0 1 1 1 −0.985 −0.503 −0.913
0 0 1 1 1 −0.985 −0.503 −0.913
1 1 0 0 0 −0.806 −0.614 −0.609
0 1 0 0 0 −0.806 −0.614 −0.609
1 0 0 0 0 0.806 0.614 0.609
0 0 0 0 0 0.806 0.614 0.609
Mean effect β2i Cov(X 2i , S2 ) 0.627 0.279 0.457
eT ((S1 , S2 ) → X 2i ) 0.385 0.218 0.314

on manifest variables X 2i , i = 1,2, 3 are calculated. According to the path diagram in


Fig. 7.7, latent variable S1 and manifest variables X 2i are conditionally independent,
given S2 , so the total effects of (S1 , S2 ) = (1, s), s = 0,1 on X 2i , i = 1,2, 3 are equal
to those of (S1 , S2 ) = (0, s) on X 2i , that is, the effects depend only on S2 , as shown
in Table 7.7. The effects are calculated according to (7.24), for example,

the total effects of (S1 , S2 ) = (s, 1) on X 21 = 1 = 0.985, s = 0, 1;

the total effects of (S1 , S2 ) = (s, 0) on X 21 = 1 = −0.985, s = 0, 1;

the total effects of (S1 , S2 ) = (s, 1) on X 21 = 0 = 0.806, s = 0, 1;

the total effects of (S1 , S2 ) = (s, 0) on X 21 = 0 = 0.806, s = 0, 1.

The exponentials of the above effects can be interpreted as odds ratios as in Table
7.5. The standardized effects are obtained through (7.26).
Remark 7.4 In Table 7.7, the absolute values of effects of (S1 , S2 ) = (i, j), i = 0,1
on X 2k are the same for j = 0,1; k = 1,2, 3, because of (7.24) and because the mean of S2 (= μ2 ) equals 1/2 (Table 7.3).
Table 7.8 shows the total (direct) effects of latent variable S2 on manifest variables
X 2i , i = 1,2, 3. The effects are calculated with (7.24) and Table 7.4. The standardized

Table 7.8 The total (direct) effects of latent variable S2 on manifest variables X 2i , i = 1,2, 3
S1 S2 X 21 X 22 X 23 S2 → X 21 S2 → X 22 S2 → X 23
1 1 1 1 1 0.244 0.134 0.256
1 1 0 0 0 −0.473 −0.313 −0.353
1 0 1 1 1 −0.975 −0.536 −1.023
1 0 0 0 0 1.892 1.251 1.413
0 1 1 1 1 1.731 0.860 1.534
0 1 0 0 0 −0.778 −0.704 −0.597
0 0 1 1 1 −0.742 −0.369 −0.658
0 0 0 0 0 0.333 0.302 0.256
Mean effect β2i Cov(X 2i , S2 |S1 ) 0.477 0.212 0.347
eT (S2 → X 2i ) 0.293 0.166 0.238

Table 7.9 The indirect effects of latent variable S1 on manifest variables X 2i , i = 1,2, 3
S1 S2 X 21 X 22 X 23 S2 → X 21 S2 → X 22 S2 → X 23
1 1 1 1 1 0.741 0.369 0.657
1 1 0 0 0 −0.333 −0.301 −0.256
1 0 1 1 1 −0.011 0.033 0.110
1 0 0 0 0 −1.086 −0.637 −0.804
0 1 1 1 1 −0.746 −0.357 −0.621
0 1 0 0 0 −0.028 0.090 −0.012
0 0 1 1 1 −0.243 −0.134 −0.256
0 0 0 0 0 0.473 0.312 0.353
Mean effect 0.151 0.067 0.110
β2i Cov(X 2i , S2 ) − β2i Cov(X 2i , S2 |S1 )
eT (S1 → X 2i ) = e I (S1 → X 2i ) 0.093 0.052 0.075

effects are given by (7.27). By subtracting Table 7.8 from Table 7.7, we have the
indirect effects of latent variable S1 on manifest variables X 2i , i = 1,2, 3, as shown
in Table 7.9.

7.5.2 Model II (Fig. 7.2b)

In order to compare the results of path analysis in Figs. 7.7 and 7.8, the same param-
eters in Table 7.1 are used. In this case, the total effects of latent variable S1 on
manifest variables X 1i , i = 1,2, 3 are the same as in Table 7.5; however, the direct
effects of S1 are calculated according to (7.29), (7.30), and (7.31) and we have Table
7.10. Since according to Fig. 7.8, the total effects of (S1 , S2 ) on X 1i are the same

Fig. 7.8 Path diagram of manifest and latent variables for Numerical Illustration I

Table 7.10 The direct effects of latent variable S1 on manifest variables X 1i , i = 1,2, 3
S2 S1 X 11 X 12 X 13 S1 → X 11 S1 → X 12 S1 → X 13
1 1 1 1 1 0.245 0.228 0.191
1 0 1 1 1 −0.436 −0.974 −0.292
1 1 0 0 0 −1.045 −0.350 −0.304
1 0 0 0 0 1.858 1.492 0.466
0 1 1 1 1 2.228 1.885 0.744
0 0 1 1 1 −0.424 −0.662 −0.145
0 1 0 0 0 −0.783 −0.368 −0.304
0 0 0 0 0 0.149 0.129 0.059
Mean effect β1i Cov(X 1i , S1 |S2 ) 0.458 0.340 0.066
e D (S1 → X 1i ) 0.286 0.250 0.060

as those of S1 on X 1i , by subtracting Table 7.10 from Table 7.5, we have the indi-
rect effects of S1 on X 1i , which are shown in Table 7.11. The effects are also those
of S2 on X 1i . Similarly, the total effects of latent variable S2 on manifest variables
X 2i , i = 1,2, 3 are the same as those of (S1 , S2 ) shown in Table 7.7, and are given in

Table 7.11 The indirect effects of latent variable S1 (S2 ) on manifest variables X 1i , i = 1,2, 3
S2 S1 X 11 X 12 X 13 S1 → X 11 S1 → X 12 S1 → X 13
1 1 1 1 1 0.873 0.977 0.245
1 0 1 1 1 −0.310 0.170 0.001
1 1 0 0 0 2.163 1.556 0.740
1 0 0 0 0 −2.603 −2.296 −0.757
0 1 1 1 1 −3.260 −2.506 −1.060
0 0 1 1 1 1.112 1.076 0.355
0 1 0 0 0 −0.249 −0.253 −0.012
0 0 0 0 0 0.539 0.285 0.151
Mean effect 0.144 0.079 0.024
β1i Cov(X 1i , S1 ) − β1i Cov(X 1i , S1 |S2 )
e I (S1 → X 1i ) = e I (S2 → X 1i ) 0.090 0.055 0.022

Table 7.12 The total effects of latent variables S2 on manifest variables X 2i , i = 1,2, 3
S2 X 21 X 22 X 23 S2 → X 21 S2 → X 22 S2 → X 23
1 1 1 1 0.985 0.503 0.913
0 1 1 1 −0.985 −0.503 −0.913
1 0 0 0 −0.806 −0.614 −0.609
0 0 0 0 0.806 0.614 0.609
Mean effect β2i Cov(X 2i , S2 ) 0.627 0.279 0.457
eT (S2 → X 2i ) 0.385 0.218 0.314

Table 7.12. The direct effects of S2 on X 2i , i = 1,2, 3 are the same as those in Table
7.8. The indirect effects of S1 on X 2i , i = 1,2, 3 are the same as those calculated in
Table 7.9, and the effects are also the indirect effects of S2 on X 2i , i = 1,2, 3, based
on Fig. 7.8.
The above method is applied to the McHugh data, which come from a test of creative ability in machine design (Chap. 2). Assuming latent skills Si for solving subtests
X i , i = 1,2, 3,4, a confirmatory latent class model for explaining a learning structure
is used, and the results of the analysis are given in Table 4.2. From the results, two
latent skills S1 (= S2 ) and S3 (= S4 ) for solving the test can be assumed. In Chap. 4,
assuming learning processes in a population, a path analysis has been performed.
In this section, the model is viewed as a multiple-indicator, multiple-cause model,
and the present method of path analysis is applied. The path diagram of the manifest
and latent variables is shown in Fig. 7.9. By using the present approach, we have the
mean effects (Table 7.13). From the table, the mean total effects of (S1 , S2 ) on X 1
and X 2 are equal to those of S1 , and the mean total effects of (S1 , S2 ) on X 3 and X 4
are equal to those of S2 . The indirect effects of S1 on X i , i = 1,2, 3,4 are equal to
those of S2 . Thus, in the path diagram in Fig. 7.9, the indirect effects are induced by
the association between the latent variables S1 on S2 . Using the mean effects in Table
7.13, the standardized effects are calculated according to the present method (Table
7.14). In order to interpret the effects based on entropy as ECD, Table 7.14 illustrates
the standardized effects of the mean effects shown in Table 7.13. As shown in the
table, the indirect effects are relatively small; for example, for the standardized effects of S1 on X 1 and X 2 , the indirect effects are only about one quarter of the direct effects.

Fig. 7.9 Path diagram of manifest and latent variables for McHugh data

Table 7.13 Mean effects of latent variables S1 and S3 on manifest variables X i , i = 1,2, 3,4
Mean effect Manifest variable
X 11 X 21 X 12 X 22
Total effect of S1 and S2 0.519 1.204 1.278 0.454
Total effect of S1 0.519 1.204 0.277 0.099
Direct effect of S1 0.406 0.943 0 0
Indirect effect of S1 0.113 0.261 0.277 0.009
Total effect of S2 0.113 0.261 1.278 0.454
Direct effect of S2 0 0 1.001 0.356
Indirect effect of S2 0.113 0.261 0.277 0.009

Table 7.14 Standardized effects of latent variables S1 and S3 on manifest variables X i , i = 1,2, 3,4
Standardized effect Manifest variable
X 11 X 21 X 12 X 22
Total effect of S1 and S2 0.342 0.546 0.561 0.312
Total effect of S1 0.342 0.546 0.122 0.068
Direct effect of S1 0.268 0.428 0 0
Indirect effect of S1 0.074 0.118 0.122 0.068
Total effect of S2 0.074 0.118 0.561 0.312
Direct effect of S2 0 0 0.439 0.245
Indirect effect of S2 0.074 0.118 0.122 0.068

7.6 Path Analysis of the Latent Markov Chain Model

The present path analysis is applied to the latent Markov chain model treated in
Chap. 5. As in Sect. 5.2 in Chap. 5, let X t be manifest variables that take values on the sample space {1,2, . . . , J } at time points t = 1,2, . . . , and let St be the corresponding latent variables on the sample space {1,2, . . . , A}, which are assumed to form a first-order time-homogeneous Markov chain. Let m ab , a, b = 1,2, . . . , A
be the transition probabilities; let qa , a = 1,2, . . . , A be the probabilities of S1 = a,
that is, the initial state distribution; and let pa j be the probabilities of X t = j, given
St = a, that is, pa j = P(X t = j|St = a) and the probabilities are independent of
time t. In order to make a general discussion, the following dummy variables for
manifest and latent categories are introduced. Let
 
1 for X t = j, 1 for St = a,
Xt j = and Sta =
0 otherwise, 0 otherwise.

Then, manifest and latent variables X t and St are identified to the following
dummy variable vectors, respectively:

X t = (X t1 , X t2 , . . . , X t J )T and St = (St1 , St2 , . . . , St A )T .

For convenience of the discussion, based on the above identification, transition


probabilities m ab and response probabilities pa j are expressed as m st−1 st and pst x t ,
respectively, that is, if dummy state vectors st−1 and st have elements st−1,a = 1
and stb = 1, respectively, it implies that m st−1 st = m ab ; and if dummy state vector
st and dummy response vector x t have elements sta = 1 and xt j = 1, it means that
pst x t = pa j . Let the transition matrix and the response matrix be denoted by
 
M = (m ab ) and P = pa j ,

respectively. Then, the probabilities are re-expressed as follows:


   
p st x t = exp(x tT α + stT Bx t ) / Σ_{x t} exp(x tT α + stT Bx t )   and   m st−1 st = exp(stT γ + st−1T Δst ) / Σ_{s t} exp(s tT γ + st−1T Δs t ),

where

α = (α1 , α2 , . . . , α J )T , B = (βa j ) (an A × J matrix),   (7.32)
γ = (γ1 , γ2 , . . . , γ A )T , Δ = (δab ) (an A × A matrix).   (7.33)

Figure 7.10 shows the path diagram of the latent Markov chain model treated
above. According to the model, the following sequence is a Markov chain:

S1 → S2 → S3 → · · · → St → X t . (7.34)

Fig. 7.10 The latent Markov chain model
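As a small added illustration (not the author's code), one realization of the latent Markov chain model can be simulated directly from the initial distribution q, the transition matrix M, and the response matrix P; the function name is hypothetical.

import numpy as np

def simulate_latent_markov(q, M, P, T, rng=None):
    """Draw (S_1, ..., S_T) and (X_1, ..., X_T) from the latent Markov chain model."""
    rng = np.random.default_rng() if rng is None else rng
    A, J = P.shape
    states, responses = [], []
    s = rng.choice(A, p=q)                         # S_1 from the initial distribution
    for _ in range(T):
        states.append(s)
        responses.append(rng.choice(J, p=P[s]))    # X_t | S_t = s
        s = rng.choice(A, p=M[s])                  # S_{t+1} | S_t = s
    return np.array(states), np.array(responses)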



First, the effects of latent variables Su , u = 1,2, . . . , t on manifest variable X t


are discussed. For simplicity of the discussion, setting t = 3, the effects of the
latent variables are calculated. According to path analysis [9], the total effect of
(S1 , S2 , S3 ) = (s1 , s2 , s3 ) on X 3 = x 3 is computed as follows:
 
(s3T − μ3T )B(x 3 − ν 3 ),   (7.35)

where

μt = E(St ), ν t = E(X t ).

The total effects of (S2 , S3 ) = (s2 , s3 ) on X 3 = x 3 at S1 = s1 are calculated by


 
(s3T − μ3T (s1 ))B(x 3 − ν 3 (s1 )),   (7.36)

where

μ3 (s1 ) = E(S3 |S1 = s1 ), ν 3 (s1 ) = E(X 3 |S1 = s1 ).

The sequence (7.34) is a Markov chain, so the direct effects of St = st , t = 1,2


on X 3 = x 3 are zeroes. Subtracting (7.36) from (7.35), it follows that

The total (indirect) effect of S1 = s1 on X 3 = x 3 through (S2 , S3 ) = (s2 , s3 )


= (the total effect of (S1 , S2 , S3 ) = (s1 , s2 , s3 ) on X 3 = x 3 )
− (the total effect of (S2 , S3 ) = (s2 , s3 ) on X 3 = x 3 at S1 = s1 )
   
= (s3T − μ3T )B(x 3 − ν 3 ) − (s3T − μ3T (s1 ))B(x 3 − ν 3 (s1 )).   (7.37)

Remark 7.5 Since sequence in (7.34) is a Markov chain, it follows that

The total (indirect) effect of S1 = s1 on X 3 = x 3 through (S2 , S3 ) = (s2 , s3 )


= The total (indirect) effect of S1 = s1 on X 3 = x 3 through S3 = s3 .


Since the total effect of S3 = s3 on X 3 = x 3 at (S1 , S2 ) = (s1 , s2 ) is given by
 
(s3T − μ3T (s2 ))B(x 3 − ν 3 (s2 )),   (7.38)

the total (indirect) effect of S2 = s2 on X 3 = x 3 at S1 = s1 through S3 = s3 is


calculated by

(the total effect of (S2 , S3 ) = (s2 , s3 ) on X 3 = x 3 at S1 = s1 )


− (the total effect of S3 = s3 on X 3 = x 3 at (S1 , S2 ) = (s1 , s2 ))

   
= (s3T − μ3T (s1 ))B(x 3 − ν 3 (s1 )) − (s3T − μ3T (s2 ))B(x 3 − ν 3 (s2 )).

Similarly, in the following sequence:

S1 → S2 → S3 ,

the effects of S1 and S2 on S3 are computed as follows. Since the above sequence
is a Markov chain, the direct effects of S1 on S3 are zeroes. The total effect of
(S1 , S2 ) = (s1 , s2 ) on S3 = s3 is given by
 
(s2T − μ2T )Δ(s3 − μ3 ).   (7.39)

The total (direct) effect of S2 = s2 on S3 = s3 at S1 = s1 is calculated by

(s2T − μ2T (s1 ))Δ(s3 − μ3 (s1 )).   (7.40)

From this, the total (indirect) effect of S1 = s1 on S3 = s3 through S2 = s2 is


obtained by subtracting (7.40) from (7.39), that is,
   
(s2T − μ2T )Δ(s3 − μ3 ) − (s2T − μ2T (s1 ))Δ(s3 − μ3 (s1 )).

Finally, we have the total (direct) effect of S1 = s1 on S2 = s2 as


 
(s1T − μ1T )Δ(s2 − μ2 ).

In the next section, the above path analysis for the latent Markov chain is
demonstrated by using artificial data.
Remark 7.6 In order to determine the regression parameters βa j and δab in (7.32) and (7.33), respectively, we have to put a constraint on the parameters. In this section, we set

βa1 = β1 j = 0, a = 1,2, . . . , A; j = 1,2, . . . , J ; δa1 = δ1a = 0, a = 1,2, . . . , A.

Then, we have

βa j = log(pa j p11 /(pa1 p1 j )), j = 2,3, . . . , J ; δab = log(m ab m 11 /(m a1 m 1b )), b = 2,3, . . . , A.   (7.41)


Remark 7.7 In sequence (7.34), we have

The total effect of S1 = s1 on X t = x t through (S2 , . . . , St ) = (s2 , . . . , st )
   = (stT − μtT )B(x t − ν t ) − (stT − μtT (s1 ))B(x t − ν t (s1 )),

and the total effect of Su = su on X t = x t at (S1 , . . . , Su−1 ) = (s1 , . . . , su−1 ) through St = st is calculated by

(stT − μtT (su−1 ))B(x t − ν t (su−1 )) − (stT − μtT (su ))B(x t − ν t (su )),   (7.42)

where

μtT (sk ) = skT M t−k , ν t (sk ) = skT M t−k P, k = 1,2, . . . , t − 1. (7.43)

Since state vector skT is one of the following A unit vectors:

(1,0, . . . , 0), (0,1, 0, . . . , 0), . . . , (0, . . . , 0,1),

the conditional distribution vectors μtT (sk ) and ν t (sk ) are given by the appropriate rows of matrices M t−k and M t−k P, respectively. For example, for skT = (1,0, . . . , 0), μtT (sk ) and ν t (sk ) are obtained as the first rows of the matrices, respectively. If the Markov chain with transition matrix M is irreducible and recurrent, we have

M t−k → Π, as t → ∞,   (7.44)

in which Π is the A × A matrix all of whose rows are equal to (π1 , π2 , . . . , π A ),

where

πa ≥ 0, a = 1,2, . . . , A; Σ_{a=1}^{A} πa = 1.

Hence, for fixed integer u the effects in (7.42) tend to zeroes as t → ∞.
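As a numerical sketch of (7.43) and (7.44) (added here for illustration; names are hypothetical), the conditional distribution vectors can be obtained from powers of the transition matrix; M and P below are the matrices used in the numerical illustration of the next section.

import numpy as np

M = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])

def conditional_distributions(M, P, t, k, a):
    """mu_t(s_k)^T and nu_t(s_k)^T of (7.43) for S_k equal to the a-th unit vector."""
    Mpow = np.linalg.matrix_power(M, t - k)
    return Mpow[a], Mpow[a] @ P            # a-th rows of M^(t-k) and M^(t-k) P

print(conditional_distributions(M, P, t=3, k=1, a=0))
print(np.linalg.matrix_power(M, 50)[0])    # close to (0.3, 0.5, 0.2): the limit in (7.44)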

7.7 Numerical Illustration II

In the model treated in the previous section, for A = J = 3, let the response
probability and latent transition matrices be set as
⎛ ⎞ ⎛ ⎞
p11 p12 p13 0.8 0.1 0.1
⎝ p21 p22 p23 ⎠ = ⎝ 0.2 0.7 0.1 ⎠,
p31 p32 p33 0.1 0.2 0.7
⎛ ⎞ ⎛ ⎞
m 11 m 12 m 13 0.6 0.3 0.1
⎝ m 21 m 22 m 23 ⎠ = ⎝ 0.2 0.7 0.1 ⎠, (7.45)
m 31 m 32 m 33 0.1 0.3 0.6

respectively; and for the initial distribution of latent state S1 = (S11 , S12 , S13 ),

μ1 = (μ11 , μ12 , μ13 )T = (0.3,0.6,0.1)T . (7.46)

The path analysis in Sect. 7.6 is demonstrated. According to the eigenvalue


decomposition of transition matrix (7.45), we have
⎛ ⎞ ⎛ ⎞
m 11 m 12 m 13 0.577 0.236 0.577
⎝ m 21 m 22 m 23 ⎠ = ⎝ 0.577 0.236 −0.577 ⎠
m 31 m 32 m 33 0.577 −0.943 0.577
⎛ ⎞⎛ ⎞−1
1 0 0 0.577 0.236 0.577
⎝ 0 0.5 0 ⎠⎝ 0.577 0.236 −0.577 ⎠ ,
0 0 0.4 0.577 −0.943 0.577

and thus, for integer t, it follows that


⎛ ⎞t ⎛ ⎞⎛ ⎞⎛ ⎞−1
m 11 m 12 m 13 0.577 0.236 0.577 1 0 0 0.577 0.236 0.577
⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟
⎝ m 21 m 22 m 23 ⎠ = ⎝ 0.577 0.236 −0.577 ⎠⎝ 0 0.5t 0 ⎠⎝ 0.577 0.236 −0.577 ⎠ ,
m 31 m 32 m 33 0.577 −0.943 0.577 0 0 0.4t 0.577 −0.943 0.577
   ⎛ 0.2 × 0.5t + 0.5 × 0.4t + 0.3    −0.5 × 0.4t + 0.5    −0.2 × 0.5t + 0.2 ⎞
= ⎜ 0.2 × 0.5t − 0.5 × 0.4t + 0.3     0.5 × 0.4t + 0.5    −0.2 × 0.5t + 0.2 ⎟ .
   ⎝ 0.5 × 0.4t − 0.8 × 0.5t + 0.3    −0.5 × 0.4t + 0.5     0.8 × 0.5t + 0.2 ⎠

The above matrices imply the conditional distribution of St+u = (St+u,1 , St+u,2 , St+u,3 )T given Su . As shown in (7.44), as integer t goes to infinity,
we have
⎛ ⎞t ⎛ ⎞
m 11 m 12 m 13 0.3 0.5 0.2
⎝ m 21 m 22 m 23 ⎠ → ⎝ 0.3 0.5 0.2 ⎠.
m 31 m 32 m 33 0.3 0.5 0.2

At time t, the distribution of St = (St1 , St2 , St3 )T is calculated as

μtT = (μt1 , μt2 , μt3 ) = μ1T M t−1
   = (0.1 × 0.5t−1 − 0.1 × 0.4t−1 + 0.3, 0.1 × 0.4t−1 + 0.5, 0.2 − 0.1 × 0.5t−1 ),

and the marginal distribution of X t = (X t1 , X t2 , X t3 )T is calculated as

ν tT = μ1T M t−1 P
   = (0.07 × 0.5t−1 − 0.06 × 0.4t−1 + 0.36, 0.06 × 0.4t−1 − 0.01 × 0.5t−1 + 0.42, 0.22 − 0.06 × 0.5t−1 ).

By using (7.41), regression matrix B can be obtained as follows:


⎛ ⎞
0 0 0
B = ⎝ 0 3.332 1.386 ⎠.
0 2.773 4.025
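The matrix B above can be checked numerically from (7.41) and the response matrix P of (7.45); a short added sketch:

import numpy as np

P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])

# (7.41): beta_aj = log(p_aj * p_11 / (p_a1 * p_1j)); the first row and column are zero.
B = np.log(P * P[0, 0] / np.outer(P[:, 0], P[0, :]))
print(np.round(B, 3))
# [[0.    0.    0.   ]
#  [0.    3.332 1.386]
#  [0.    2.773 4.025]]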

In the present example, state vector skT in (7.43) is one of the following unit
vectors:

(1,0, 0), (0,1, 0), (0,0, 1).

To demonstrate the approach explained in the previous section, for t = 3 in (7.34),


the effects of latent variables Si , i = 1,2, 3 on X 3 are calculated. First, the total effects of the latent variables, (s3T − μ3T )B(x 3 − ν 3 ), in (7.35) are computed by using the
following matrix calculation:
⎛ ⎞⎛ ⎞⎛ ⎞
0.7 −0.6 −0.1 0 −2.079 −2.079 0.632 −0.368 −0.368
⎝ −0.3 0.4 −0.1 ⎠⎝ 0 1.253 −0.693 ⎠⎝ −0.427 0.573 −0.427 ⎠
−0.3 −0.6 0.9 0 0.693 1.946 −0.205 −0.205 0.795
⎛ ⎞
1.225 −1.051 −0.009
= ⎝ −0.482 0.574 −0.330 ⎠. (7.47)
−0.784 −0.288 2.007

The above effects are those of (S1 , S2 , S3 ) on X 3 , for example, for S3 = (0,1, 0)T ,
the effect on X 3 = (0,0, 1)T is −0.330, which is in the second row and the third
column. Since sequence S1 , S2 , S3 , and X 3 is a Markov chain, the effects in (7.47) are
independent of latent states S1 and S2 . The mean total effect of elements in (7.47) can
be obtained as 0.633. Hence, the entropy coefficient of determination of explanatory

variables Si , i = 1,2, 3 for response variable X 3 is

ECD((S1 , S2 , S3 ), X 3 ) = ECD(S3 , X 3 ) = 0.633/(0.633 + 1) = 0.387.

From path analysis [9], the above quantity is the standardized summary total
effect of (S1 , S2 , S3 ) on X 3 , and is denoted by eT ((S1 , S2 , S3 ) → X 3 ). Similarly,
from (7.36) the total effects of (S2 , S3 ) = (s2 , s3 ) on X 3 = x 3 are given as follows:
⎛ ⎞
0.892 −0.922 −0.294
⎝ −0.595 0.922 −0.394 ⎠ at S1 = (1,0, 0)T ; (7.48a)
−0.891 0.066 1.949
⎛ ⎞
1.355 −0.994 −0.053
⎝ −0.451 0.532 −0.473 ⎠ at S1 = (0,1, 0)T ; (7.48b)
−0.694 0.270 1.924
⎛ ⎞
1.729 −0.7804 −0.464
⎝ −0.045 0.775 −0.855 ⎠ at S1 = (0,0, 1)T . (7.48c)
−0.727 0.463 1.106

In the above effects, for example, for s1 = (0,1, 0)T , s3 = (0,0, 1)T , and x 3 =
(1,0, 0)T , the total effect is −0.694 which is in the third row and first column of the
matrix (7.48b). Sequence S1 , S2 , S3 , and X 3 is a Markov chain, so the above effects
of (S2 , S3 ) = (s2 , s3 ) on X 3 = x 3 are independent of latent state S2 . The conditional
means of the above effects given S1 = s1 are obtained by
E[(S3T − μ3T (s1 ))B(X 3 − ν 3 (s1 )) | S1 = s1 ] = 0.652 for s1 = (1,0, 0)T , 0.584 for s1 = (0,1, 0)T , and 0.658 for s1 = (0,0, 1)T .

Since the distribution of S1 is given in (7.46), the mean total effect of (S2 , S3 ) on X 3 is computed as

0.652 × 0.3 + 0.584 × 0.6 + 0.658 × 0.1 = 0.612.   (7.49)

Subtracting the effects in (7.48a)–(7.48c) from those in (7.47), from (7.37), the
total (indirect) effects of S1 = s1 on X 3 = x 3 through (S2 , S3 ) = (s2 , s3 ) are
obtained by
⎛ ⎞
0.334 −0.127 0.285  
⎝ 0.113 −0.348 0.064 ⎠ S1 = (1,0, 0)T ; (7.50a)
0.107 −0.354 0.058
⎛ ⎞
−0.130 −0.058 0.044  
⎝ −0.031 0.041 0.143 ⎠ S1 = (0,1, 0)T ; (7.50b)
−0.090 −0.018 0.083
⎛ ⎞
−0.503 −0.271 0.455  
⎝ −0.433 −0.201 0.525 ⎠ S1 = (0,0, 1)T . (7.50c)
−0.057 0.175 0.901

In the above matrices, for s1 = (0,0, 1)T , s3 = (0,0, 1)T , and x 3 = (0,1, 0)T ,
the total effect is calculated in the third row the second column in matrix (7.50c)
and given by 0.175. The above effects are independent of latent states S2 , because
sequence S1 , S2 , S3 , and X 3 is a Markov chain. The summary total effect of S1 on
X 3 through (S2 , S3 ) is given by subtracting that of (S2 , S3 ), i.e., 0.612, from that of
(S1 , S2 , S3 ), i.e., 0.633. Hence, the effect is

0.633 − 0.612 = 0.021.

From this, the standardized summary total (indirect) effect of S1 on X 3 is


computed as

0.021
eT (S1 → X 3 ) = = 0.013.
0.633 + 1

By using (7.38), the total (direct) effect of S3 = s3 on X 3 = x 3 at (S1 , S2 ) =


(s1 , s2 ) is calculated as follows:
⎛ ⎞
0.501 −0.776 −0.317
⎝ −0.687 1.368 −0.119 ⎠ at S2 = (1,0, 0)T , (7.51a)
−0.947 0.549 2.260
⎛ ⎞
1.603 −1.007 0.230
⎝ −0.385 0.337 −0.372 ⎠ at S2 = (0,1, 0)T , (7.51b)
−0.511 −0.348 2.142
⎛ ⎞
2.208 −0.455 −0.623
⎝ 0.437 1.106 −1.008 ⎠ at S2 = (0,0, 1)T . (7.51c)
−0.587 −0.477 0.608

In the above effects, for s2 = (0,1, 0)T , s3 = (1,0, 0)T , and x 3 = (0,0, 1)T , the
effect is in the first row and the third column in matrix (7.51b) and is 0.230. Since
the marginal distribution of S2 is given by

μ2T = μ1T M = (0.31,0.54,0.15),



in the same way as in (7.49), we have the mean of the effects in
(7.51a)–(7.51c) as

0.577 × 0.31 + 0.464 × 0.54 + 0.557 × 0.15 = 0.513.

From this we have


0.513
eT (S3 → X 3 ) = = 0.314.
0.633 + 1

The total (indirect) effects of S2 = s2 on X 3 = x 3 at S1 = s1 through S3 = s3


are calculated as follows. Subtracting the effects in (7.51a) from those in (7.48a), we
have the total (indirect) effects of S2 = s2 on X 3 = x 3 at S1 = (1,0, 0)T through
S3 = (1,0, 0)T :
⎛ ⎞
0.390 −0.148 0.023
⎝ 0.092 −0.446 −0.275 ⎠ at S1 = (1,0, 0)T through S3 = (1,0, 0)T .
0.056 −0.482 −0.312

In the above effects, the effect of S2 = (0,1, 0)T on X 3 = (1,0, 0) is in the


second row and the first column, and is given by 0.092. Similarly, the other effects
can be calculated. For example, the total (indirect) effects of S2 = s2 on X 3 = x 3
at S1 = (1,0, 0)T through S3 = (0,1, 0)T are obtained by subtracting the effects in
(7.51a) from those in (7.48b), and we have
⎛ ⎞
0.854 −0.218 0.264
⎝ 0.236 −0.836 −0.354 ⎠ at S1 = (1,0, 0)T through S3 = (0,1, 0)T .
0.253 −0.279 −0.336

From the effects calculated above, the summary effect is calculated as

0.612 − 0.513 = 0.099,

and the standardized summary total (indirect) effect is given by

0.099
eT (S2 → X 3 ) = = 0.061.
0.633 + 1

As shown above, calculation of the effects of latent variables St , t = 1,2, 3 on X 3


has been demonstrated. Similarly, the effects of Si on S j can also be computed. For the summary total effects eT (St → X 3 ) in this example, as expected from the relation

S1 → S2 → S3 → X 3 ,

the following inequality holds true:

eT (S1 → X 3 ) < eT (S2 → X 3 ) < eT (S3 → X 3 ).

7.8 Discussion

In this chapter, path analysis has been made in latent class models, i.e., multiple-
indicator, multiple-cause models and the latent Markov chain model. In path analysis,
it is critical how the effects of variables are measured and also how the total effects of
variables are decomposed into the sums of the direct and indirect effects. Although
the approach is significant for discussing causal systems of categorical variables,
path analysis of categorical variables is more complicated than that of continuous
variables, because in the former analysis the effects of categories of parent vari-
ables on those of descendant variables have to be calculated. In order to assess the
effects and to summarize them, in this chapter, an entropy-based path analysis [9]
has been applied to latent class models in a GLM framework. In the approach, the
total and direct effects are defined through log odds ratio and the effects can be inter-
preted in information (entropy). From this, although the indirect effects are defined
by subtracting the direct effects from the total effects, the effects can also be inter-
preted in information. This point is significant for putting the analysis into practice.
Measuring pathway effects based on the method of path analysis is important as
well, and further development of pathway effect analysis is left to readers. More-
over, applications of the present approach to practical latent class analyses are also
expected in future studies.

References

1. Albert, J. M., & Nelson, S. (2011). Generalized causal mediation analysis. Biometrics, 1028–
1038.
2. Bentler, P. M., & Weeks, D. B. (1980). Linear structural equations with latent variables.
Psychometrika, 45, 289–308.
3. Christoferson, A. (1975). Factor analysis of dichotomous variables. Psychometrika, 40, 5–31.
4. Eshima, N., & Tabata, M. (1999). Effect analysis in loglinear model approach to path analysis
of categorical variables. Behaviormetrika, 26, 221–233.
5. Eshima, N., & Tabata, M. (2007). Entropy correlation coefficient for measuring predictive
power of generalized linear models. Statistics and Probability Letters, 77, 588–593.
6. Eshima, N., & Tabata, M. (2010). Entropy coefficient of determination for generalized linear
models. Computational Statistics and Data Analysis, 54, 1381–1389.
7. Eshima, N., Asano, C., & Obana, E. (1990). A latent class model for assessing learning
structures. Behaviormetrika, 28, 23–35.
8. Eshima, N., Tabata, M., & Geng, Z. (2001). Path analysis with logistic regression models:
Effect analysis of fully recursive causal systems of categorical variables. Journal of the Japan
Statistical Society, 31, 1–14.

9. Eshima, N., Tabata, M., Borroni, C. G., & Kano, Y. (2015). An entropy-based approach to path
analysis of structural generalized linear models: A basic idea. Entropy, 17, 5117–5132.
10. Fienberg, S. E. (1991). The analysis of cross-classified categorical data (2nd ed.). Cambridge,
England: The MIT Press.
11. Goodman, L. A. (1973b). The analysis of multidimensional contingency tables when some
variables are posterior to others: A modified path analysis approach. Biometrika, 60, 179–192.
12. Goodman, L. A. (1973a). Causal analysis of data from panel studies and other kinds of surveys.
American Journal of Sociology, 78, 1135–1191.
13. Goodman, L. A. (1974). The analysis of systems of qualitative variables when some of the
variables are unidentifiable: Part I. A modified latent structure approach. American Journal of
Sociology, 79, 1179–1259.
14. Hagenaars, J. A. (1998). Categorical causal modeling: Latent class analysis and directed
loglinear models with latent variables. Sociological Methods & Research, 26, 436–489.
15. Jöreskog, K.G., & Sörbom, D. (1996). LISREL8: user’s reference guide (2nd ed.). Chicago:
Scientific Software International.
16. Kuha, J., & Goldthorpe, J. H. (2010). Path analysis for discrete variables: The role of education
in social mobility. Journal of Royal Statistical Society, A, 173, 351–369.
17. Lazarsfeld, P. F. (1948). The use of panels in social research. Proceedings of the American
Philosophical Society, 92, 405–410.
18. Macready, G. B. (1982). The use of latent class models for assessing prerequisite relations and transference among traits. Psychometrika, 47, 477–488.
19. McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman
and Hall.
20. Muthen, B. (1978). Contribution of factor analysis of dichotomous variables. Psychometrika,
43, 551–560.
21. Muthen, B. (1984). A general structural equation model with dichotomous ordered categorical
and continuous latent variable indicators. Psychometrika, 49, 114–132.
22. Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear model. Journal of the Royal
Statistical Society A, 135, 370–384.
23. Owston, R. D. (1979). A maximum likelihood approach to the “test of inclusion.” Psychome-
trika, 44, 421–425.
24. White, R. T., & Clark, R. M. (1973). A test of inclusion which allows for errors of measurement.
Psychometrika, 38, 77–86.
25. Wright, S. (1934). The method of path coefficients. The Annals of Mathematical Statistics, 5,
161–215.
