Factor Analysis
• The modern beginnings of factor analysis lie in the early 20th-century attempts of Karl
Pearson, Charles Spearman, and others to define and measure intelligence.
• The essential purpose of factor analysis is to describe the covariance relationships among
many variables in terms of a few underlying, but unobservable, random quantities called
factors.
• For example, correlations from the group of test scores in classics, French, English, mathematics, and music collected by Spearman suggested an underlying "intelligence" factor. A second group of variables, representing physical-fitness scores, if available, might correspond to another factor.
• Factor analysis can be considered an extension of principal component analysis. Both can
be viewed as attempts to approximate the covariance matrix Σ. However, the approxima-
tion based on the factor analysis model is more elaborate.
The coefficient ℓij is called the loading of the ith variable on the jth factor.
• In matrix notation,
X − µ = LF + ϵ
where L is the matrix of factor loadings.
• Compare with the multivariate regression model: there the predictors are observable, whereas the common factors F are unobservable.
• In the model X − µ = LF + ϵ:
µi : mean of variable i
ϵi : ith specific factor
Fj : jth common factor
ℓij : loading of the ith variable on the jth factor
• Additional assumptions (the orthogonal factor model): E(F) = 0, Cov(F) = Im ; E(ϵ) = 0, Cov(ϵ) = Ψ, a diagonal matrix; and Cov(F, ϵ) = 0. These assumptions imply the covariance structure below (see the numerical check after this list):
1. Cov(X) = LL′ + Ψ, or
\[ \operatorname{Var}(X_i) = \ell_{i1}^2 + \cdots + \ell_{im}^2 + \psi_i, \qquad \operatorname{Cov}(X_i, X_k) = \ell_{i1}\ell_{k1} + \cdots + \ell_{im}\ell_{km} \]
2. Cov(X, F) = L, or
\[ \operatorname{Cov}(X_i, F_j) = \ell_{ij} \]
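A quick numerical check of this covariance structure, simulated under the orthogonal factor model (the loadings and specific variances below are made-up illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 4, 2, 200_000

# Made-up loadings and specific variances, for illustration only.
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.6]])
Psi = np.diag([0.3, 0.4, 0.5, 0.6])

F = rng.standard_normal((n, m))                   # common factors, Cov(F) = I
eps = rng.standard_normal((n, p)) @ np.sqrt(Psi)  # specific factors, Cov(eps) = Psi
X = F @ L.T + eps                                 # X - mu = LF + eps (mu = 0 here)

print(np.allclose(np.cov(X, rowvar=False), L @ L.T + Psi, atol=0.02))  # Cov(X) = LL' + Psi
print(np.allclose(X.T @ F / n, L, atol=0.02))                          # Cov(X, F) = L
```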
• Example: not every covariance matrix admits a proper factor solution. Consider
\[ \Sigma = \begin{pmatrix} 1 & 0.9 & 0.7 \\ 0.9 & 1 & 0.4 \\ 0.7 & 0.4 & 1 \end{pmatrix} \]
For m = 1, the off-diagonal equations force ℓ11² = (0.9)(0.7)/0.4 = 1.575 > 1, hence ψ1 = 1 − ℓ11² = −0.575 < 0, which is inadmissible (verified numerically below).
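A two-line verification of the failure (a minimal sketch; the algebra above is the actual argument):

```python
import numpy as np

Sigma = np.array([[1.0, 0.9, 0.7],
                  [0.9, 1.0, 0.4],
                  [0.7, 0.4, 1.0]])

# m = 1: l1*l2 = 0.9, l1*l3 = 0.7, l2*l3 = 0.4, so l1^2 = (l1*l2)(l1*l3)/(l2*l3).
l1_sq = Sigma[0, 1] * Sigma[0, 2] / Sigma[1, 2]
print(l1_sq, 1.0 - l1_sq)  # 1.575 -0.575 -> psi_1 < 0, no proper solution
```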
3 Model Calibration
3.1 Methods of Estimation
• Given observations x1 , x2 , · · · , xn on p generally correlated variables.
• From the covariance structure Σ = LL′ + Ψ, we estimate L and Ψ.
• Given the estimates of L and Ψ, applying the linear factor model, we obtain estimates of F (the factor scores).
• The most popular methods of estimating L and Ψ are:
– The principal component method
– The maximum likelihood method
• The principal component method is based on the spectral decomposition
\[ \Sigma = \lambda_1 e_1 e_1' + \cdots + \lambda_p e_p e_p' \]
• We prefer models that explain the covariance structure in terms of just a few
common factors:
\[ \Sigma \approx \begin{bmatrix} \sqrt{\lambda_1}\, e_1 & \cdots & \sqrt{\lambda_m}\, e_m \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \vdots \\ \sqrt{\lambda_m}\, e_m' \end{bmatrix} + \begin{bmatrix} \psi_1 & 0 & \cdots & 0 \\ 0 & \psi_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \psi_p \end{bmatrix} = LL' + \Psi \]
where ψi = σii − ℓi1² − · · · − ℓim², i = 1, 2, · · · , p. (A numpy sketch of this solution follows.)
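A minimal numpy sketch of the principal component solution (the function name pc_factor_solution is our own; S is a sample covariance matrix and m the chosen number of factors):

```python
import numpy as np

def pc_factor_solution(S, m):
    """Principal component solution: loadings from the top m eigenpairs of S,
    specific variances from the residual diagonal."""
    eigval, eigvec = np.linalg.eigh(S)               # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:m]               # indices of the top m eigenpairs
    L_hat = eigvec[:, idx] * np.sqrt(eigval[idx])    # column j is sqrt(lambda_j) e_j
    psi_hat = np.diag(S) - np.sum(L_hat**2, axis=1)  # psi_i = s_ii - sum_j l_ij^2
    return L_hat, psi_hat

# Hypothetical usage with a data matrix X (n x p):
# S = np.cov(X, rowvar=False)
# L_hat, psi_hat = pc_factor_solution(S, m=2)
# Proportion of total sample variance due to factor j:
# np.sum(L_hat**2, axis=0) / np.trace(S)
```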
5 A Modified Approach: The Iterated Principal Factor Solution
The reduced correlation matrix Rr (the sample correlation matrix R with communality estimates in place of the 1s on the diagonal) is factored as
\[ R_r \doteq L_r^* L_r^{*\prime} \]
7 Comments:
• Choices for initial estimates of the specific variances: ψi∗ = 1/rii , where rii is the ith diagonal element of R−1 . The initial communality estimates then become
\[ h_i^{*2} = 1 - \psi_i^* = 1 - \frac{1}{r^{ii}}, \]
which is equal to the square of the multiple correlation coefficient between Xi and the other p − 1 variables. (A sketch of the iteration follows.)
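A sketch of the iterated principal factor solution under the stated choice of initial communalities (the function name and convergence settings are our own, and R is assumed invertible):

```python
import numpy as np

def iterated_principal_factor(R, m, n_iter=100, tol=1e-6):
    """Iterated principal factor solution on a correlation matrix R."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # initial communalities 1 - 1/r^ii
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                # reduced correlation matrix R_r
        eigval, eigvec = np.linalg.eigh(Rr)
        idx = np.argsort(eigval)[::-1][:m]      # top m eigenpairs of R_r
        L = eigvec[:, idx] * np.sqrt(np.maximum(eigval[idx], 0.0))
        h2_new = np.sum(L**2, axis=1)           # recompute communalities from loadings
        if np.max(np.abs(h2_new - h2)) < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return L, 1.0 - h2                          # loadings and specific variances
```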
8 The Maximum Likelihood Method
• Assume F ∼ Nm (0, Im ), ϵ ∼ Np (0, Ψ) and F and ϵ are independent. Then
X ∼ Np (µ, LL′ + Ψ).
• Up to an additive constant, the log-likelihood is
\[ -\frac{n}{2}\log|LL' + \Psi| - \frac{1}{2}\sum_{i=1}^{n} (x_i - \mu)'(LL' + \Psi)^{-1}(x_i - \mu) \]
• The maximum likelihood estimators L̂, Ψ̂ and µ̂ = x̄ maximize the log-likelihood subject to L̂′Ψ̂−1 L̂ being diagonal (a uniqueness condition; a hedged numerical sketch follows).
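In practice the maximization is done numerically. One readily available option is scikit-learn's EM-based fit; note, as an assumption to flag, that it does not impose the diagonal-L̂′Ψ̂−1L̂ constraint, so its loadings agree with the MLE described above only up to an orthogonal rotation:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))   # placeholder data; any n x p matrix works

fa = FactorAnalysis(n_components=2).fit(X)
L_hat = fa.components_.T            # p x m estimated loadings (up to rotation)
psi_hat = fa.noise_variance_        # estimated specific variances
Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)   # fitted LL' + Psi
```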
• Proportion of total sample variance due to the jth factor:
\[ \frac{\hat\ell_{1j}^2 + \cdots + \hat\ell_{pj}^2}{s_{11} + \cdots + s_{pp}} \]
• If the variables are standardized, Z = V−1/2 (X − µ), then the correlation matrix ρ has the representation
\[ \rho = L_z L_z' + \Psi_z, \qquad L_z = V^{-1/2} L, \qquad \Psi_z = V^{-1/2} \Psi V^{-1/2}, \]
and, by the invariance property of maximum likelihood estimators, L̂z = V̂−1/2 L̂, where V̂−1/2 and L̂ are the maximum likelihood estimators of V−1/2 and L, respectively.
• Proportion of total standardized sample variance due to the jth factor:
\[ \frac{\hat\ell_{1j}^2 + \cdots + \hat\ell_{pj}^2}{p} \]
9 A Large Sample Test for the Number of Common Factors
• Reject H0 : Σ = LL′ + Ψ (m common factors) at significance level α if
\[ \left(n - 1 - \frac{2p + 4m + 5}{6}\right) \ln\frac{|\hat L \hat L' + \hat\Psi|}{|S_n|} > \chi^2_{[(p-m)^2 - p - m]/2}(\alpha), \]
provided that m < (2p + 1 − √(8p + 1))/2. (A scipy sketch follows.)
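A direct transcription of the test into scipy (the function name is ours; psi_hat is the vector of estimated specific variances):

```python
import numpy as np
from scipy.stats import chi2

def lr_test_m_factors(Sn, L_hat, psi_hat, n, m, alpha=0.05):
    """Bartlett-corrected likelihood ratio test of H0: m common factors."""
    p = Sn.shape[0]
    Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)
    stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * np.log(
        np.linalg.det(Sigma_hat) / np.linalg.det(Sn))
    df = ((p - m) ** 2 - p - m) / 2               # must be positive; see bound on m
    return stat, stat > chi2.ppf(1 - alpha, df)   # statistic, reject-H0 flag
```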
• Kaiser proposed the varimax criterion: define ℓ̃∗ij = ℓ̂∗ij /ĥi , the rotated coefficients scaled by the square roots of the communalities. Select the orthogonal transformation T that makes
\[ V = \frac{1}{p} \sum_{j=1}^{m} \left[ \sum_{i=1}^{p} \tilde\ell_{ij}^{*4} - \Bigl( \sum_{i=1}^{p} \tilde\ell_{ij}^{*2} \Bigr)^2 \big/ p \right] \]
as large as possible. (One common implementation is sketched below.)
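One common SVD-based implementation of varimax with Kaiser normalization (a sketch; the iteration count and tolerance are arbitrary choices, and rows with zero communality would need special handling):

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Rotate a p x m loading matrix to maximize the varimax criterion V."""
    p, m = L.shape
    h = np.sqrt(np.sum(L**2, axis=1))     # square roots of the communalities
    A = L / h[:, None]                    # Kaiser normalization
    T = np.eye(m)
    d = 0.0
    for _ in range(n_iter):
        B = A @ T
        u, s, vt = np.linalg.svd(
            A.T @ (B**3 - B @ np.diag(np.sum(B**2, axis=0)) / p))
        T = u @ vt                        # best orthogonal T for the current rotation
        if s.sum() < d * (1 + tol):       # criterion stopped increasing
            break
        d = s.sum()
    return (A @ T) * h[:, None], T        # rotated loadings, rotation matrix
```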
9.2 Factor Scores
• Factor scores are estimates of the values attained by the unobserved random factor vectors Fj , j = 1, 2, · · · , n. That is, the factor score f̂j estimates the value fj attained by Fj in the jth case.
• The estimation situation is complicated by the fact that the unobserved quantities fj and ϵj outnumber the observed xj .
• Treat the estimated factor loadings ℓ̂ij and specific variances ψ̂i as if they were
the true values.
• We will not differentiate between the original estimated loadings and the estimated rotated loadings.
• The weighted least squares method: in the model X − µ = LF + ϵ, choose f to minimize the weighted sum of squared specific-factor errors
\[ \sum_{i=1}^{p} \frac{\epsilon_i^2}{\psi_i} = \epsilon' \Psi^{-1} \epsilon = (x - \mu - Lf)' \Psi^{-1} (x - \mu - Lf), \]
which yields f̂j = (L̂′Ψ̂−1 L̂)−1 L̂′Ψ̂−1 (xj − x̄).
• If rotated loadings L∗ = L̂T are used in place of the original loadings, the
subsequent factor scores, f̂j∗ , are related to f̂j by f̂j∗ = T′ f̂j , j = 1, 2, · · · , n.
• If the factor loadings are estimated by the principal component method, it is customary to generate factor scores using an unweighted (ordinary) least squares procedure:
\[ \hat f_j = (\tilde L' \tilde L)^{-1} \tilde L' (x_j - \bar x) \]
• When the common factors F and the specific factors ϵ are jointly normally dis-
tributed:
X − µ = LF + ϵ ∼ Np (0, LL′ + Ψ)
• Moreover, the joint distribution of (X − µ) and F is Nm+p (0, Σ∗ ), where
\[ \Sigma^* = \begin{bmatrix} LL' + \Psi & L \\ L' & I \end{bmatrix} \]
• Given any vector of observations xj , and taking the maximum likelihood estimates L̂ and Ψ̂ as the true values, the jth factor score vector is given by the regression estimate
\[ \hat f_j = \hat L' (\hat L \hat L' + \hat\Psi)^{-1} (x_j - \bar x) \]
• The relationship between the weighted least squares estimates f̂jLS and the regression estimates f̂jR is
\[ \hat f_j^{LS} = \bigl( I + (\hat L' \hat\Psi^{-1} \hat L)^{-1} \bigr) \hat f_j^R \]
(A sketch computing all three kinds of factor scores follows.)
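A sketch collecting the three estimators in one helper (the name and interface are ours; L_hat is p x m, psi_hat a length-p vector, and the estimated loadings and specific variances are treated as true values):

```python
import numpy as np

def factor_scores(X, L_hat, psi_hat, method="regression"):
    """Weighted LS, ordinary LS, or regression factor scores, one row per case."""
    xc = X - X.mean(axis=0)                 # x_j - x_bar, row-wise
    if method == "wls":                     # (L'Psi^-1 L)^-1 L'Psi^-1 (x_j - x_bar)
        W = L_hat / psi_hat[:, None]        # Psi^{-1} L
        return xc @ W @ np.linalg.inv(L_hat.T @ W)
    if method == "ols":                     # (L'L)^-1 L' (x_j - x_bar)
        return xc @ L_hat @ np.linalg.inv(L_hat.T @ L_hat)
    # regression method: L' (LL' + Psi)^{-1} (x_j - x_bar)
    Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)
    return xc @ np.linalg.solve(Sigma_hat, L_hat)
```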
• Example 9.12
• Example 9.13
• Repeat the first three steps for other numbers of common factors m
• For large data sets, split them in half and perform a factor analysis on each part; compare the two solutions to check the stability of the factors (a rough sketch follows).
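A rough version of that split-half check (assumptions: sklearn's EM-based fit stands in for whichever estimation method is used, and the implied covariance matrices are compared because loadings match only up to rotation):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def split_half_check(X, m, seed=0):
    """Fit m-factor models on two random halves and compare implied covariances."""
    rng = np.random.default_rng(seed)
    halves = np.array_split(rng.permutation(len(X)), 2)
    fits = [FactorAnalysis(n_components=m).fit(X[h]) for h in halves]
    covs = [f.components_.T @ f.components_ + np.diag(f.noise_variance_)
            for f in fits]
    return np.max(np.abs(covs[0] - covs[1]))   # large value -> unstable solution
```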