Factor Analysis

The document provides an overview of factor analysis, detailing its historical context, mathematical models, and estimation methods. It explains the orthogonal factor model, methods for estimating factor loadings and specific variances, and introduces the principal component solution and maximum likelihood method. Additionally, it discusses factor rotation and the estimation of factor scores, emphasizing the importance of understanding covariance structures in various data applications.


1 Introduction

• The modern beginnings of factor analysis lie in the early 20th-century attempts of Karl
Pearson, Charles Spearman, and others to define and measure intelligence.
• The essential purpose of factor analysis is to describe the covariance relationships among
many variables in terms of a few underlying, but unobservable, random quantities called
factors.
• For example, correlations from the group of test scores in classics, French, English, mathematics, and music collected by Spearman suggested an underlying "intelligence" factor. A second group of variables, representing physical-fitness scores, if available, might correspond to another factor.
• Factor analysis can be considered an extension of principal component analysis. Both can
be viewed as attempts to approximate the covariance matrix Σ. However, the approxima-
tion based on the factor analysis model is more elaborate.

2 The Orthogonal Factor Model


• The observable random vector X, with p components, has mean µ and covariance matrix Σ. The factor model postulates that X is linearly dependent upon a few unobservable random variables F1 , F2 , · · · , Fm , called common factors, and p additional sources of variation ϵ1 , ϵ2 , · · · , ϵp , called errors or, sometimes, specific factors.
• The factor model is:

X1 − µ1 = ℓ11 F1 + ℓ12 F2 + · · · + ℓ1m Fm + ϵ1


X2 − µ2 = ℓ21 F1 + ℓ22 F2 + · · · + ℓ2m Fm + ϵ2
..
.
Xp − µp = ℓp1 F1 + ℓp2 F2 + · · · + ℓpm Fm + ϵp

The coefficient ℓij is called the loading of the ith variable on the jth factor.

• In matrix notation
X − µ = LF + ϵ
where L is the p × m matrix of factor loadings.
• Compare with the multivariate regression model: in regression the explanatory variables are observable, whereas the factors F here are unobservable.

• Additional assumptions:

– E(F) = 0, Cov(F) = E[FF′ ] = I


– E(ϵ) = 0, Cov(ϵ) = E[ϵϵ′ ] = Ψ = diag(ψ1 , ψ2 , · · · , ψp )
– Cov(ϵ, F) = E(ϵF′ ) = 0

X − µ = LF + ϵ
µi : mean of variable i
ϵi : ith specific factor
Fj : jth common factor
ℓij : loading of the ith variable on the jth factor

The unobservable random vectors F and ϵ satisfy the following conditions:


E(F) = 0, Cov(F) = I
E(ϵ) = 0, Cov(ϵ) = Ψ, where Ψ is a diagonal matrix.
F and ϵ are independent.

1. Cov(X) = LL′ + Ψ
or
V ar(Xi ) = ℓ2i1 + · · · + ℓ2im + ψi
Cov(Xi , Xk ) = ℓi1 ℓk1 + · · · + ℓim ℓkm

2. Cov(X, F) = L
or
Cov(Xi , Fj ) = ℓij
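
As an illustration (not part of the original notes), the following Python sketch simulates data from the orthogonal factor model with arbitrary illustrative loadings L and specific variances Ψ, and checks empirically that Cov(X) ≈ LL′ + Ψ and Cov(X, F) ≈ L:

import numpy as np

rng = np.random.default_rng(0)
p, m, n = 4, 2, 100_000

L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])                  # p x m loadings (illustrative values)
Psi = np.diag([0.2, 0.3, 0.4, 0.3])         # diagonal matrix of specific variances

F = rng.standard_normal((n, m))                           # common factors: E(F) = 0, Cov(F) = I
eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)   # specific factors: Cov(eps) = Psi
X = F @ L.T + eps                                         # X - mu = L F + eps  (mu = 0 here)

print(np.round(np.cov(X, rowvar=False), 2))   # close to L L' + Psi
print(np.round(L @ L.T + Psi, 2))
print(np.round((X.T @ F) / n, 2))             # close to Cov(X, F) = L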

Consider the following covariance matrices:


1.
\[
\Sigma = \begin{bmatrix} 19 & 30 & 2 & 12 \\ 30 & 57 & 5 & 23 \\ 2 & 5 & 38 & 47 \\ 12 & 23 & 47 & 68 \end{bmatrix}
\]

2.
\[
\Sigma = \begin{bmatrix} 1 & 0.9 & 0.7 \\ 0.9 & 1 & 0.4 \\ 0.7 & 0.4 & 1 \end{bmatrix}
\]
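
As a numerical check (a worked verification, not necessarily the decomposition intended in the notes), the Python sketch below confirms that the first matrix admits an exact two-factor representation Σ = LL′ + Ψ with the particular L and Ψ shown, whereas for the second matrix a one-factor model is improper: the off-diagonal equations force ℓ11² = (0.9)(0.7)/0.4 = 1.575 > 1, so ψ1 = 1 − ℓ11² < 0.

import numpy as np

# Matrix 1: one admissible two-factor decomposition Sigma = L L' + Psi.
Sigma1 = np.array([[19, 30,  2, 12],
                   [30, 57,  5, 23],
                   [ 2,  5, 38, 47],
                   [12, 23, 47, 68]], dtype=float)
L = np.array([[ 4, 1],
              [ 7, 2],
              [-1, 6],
              [ 1, 8]], dtype=float)
Psi = np.diag([2.0, 4.0, 1.0, 3.0])
print(np.allclose(Sigma1, L @ L.T + Psi))    # True: the factorization is exact

# Matrix 2: with m = 1, l1*l2 = 0.9, l1*l3 = 0.7, l2*l3 = 0.4 imply
# l1^2 = (0.9 * 0.7) / 0.4 = 1.575, so psi_1 = 1 - l1^2 = -0.575 < 0 (improper).
l1_sq = 0.9 * 0.7 / 0.4
print(l1_sq, 1.0 - l1_sq)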

3 Model Calibration
3.1 Methods of Estimation
• Given observations x1 , x2 , · · · , xn on p generally correlated variables.
• From the covariance structure Σ = LL′ + Ψ, we estimate L and Ψ.

• Given the estimates of L and Ψ, we apply the linear factor model to obtain estimates of F (the factor scores).
• The most popular methods of estimating L and Ψ are:

– The principal component method
– The maximum likelihood method

• Let Σ have eigenvalue-eigenvector pairs (λi , ei ) with λ1 ≥ λ2 ≥ · · · ≥ λp ≥ 0, then

Σ = λ1 e1 e′1 + · · · + λp ep e′p

• We prefer models that explain the covariance structure in terms of just a few
common factors:
 
√  ψ1 0 · · · 0
λ1 e1  0 ψ2 · · · 
. p p  ..   0 
Σ = [ λ1 e1 · · · λm em ]   +  .. .. .. .. 
√ .  . . . . 
λm em
0 0 ··· ψp
LL′ + Ψ
=
Pm
where ψi = σii − j=1 ℓ2ij , i = 1, 2, · · · , p.

4 Principal Component Solution of the Factor Model


• Given a data set x1 , x2 , · · · , xn . First, center the observations: xj − x̄, j = 1, 2, · · · , n, or standardize the data:
\[
z_j = \Big[\tfrac{x_{j1}-\bar{x}_1}{\sqrt{s_{11}}}, \cdots, \tfrac{x_{jp}-\bar{x}_p}{\sqrt{s_{pp}}}\Big]', \quad j = 1, 2, \cdots, n
\]
• The representation of the covariance structure, applied to the sample covariance matrix S or the sample correlation matrix R, is known as the principal component solution.
• The eigenvalue-eigenvector pairs for sample covariance matrix S: (λ̂1 , ê1 ), · · · , (λ̂p , êp ),
where λ̂1 ≥ · · · ≥ λ̂p . Let m < p be the number of common factors. Then,
– the estimated factor loadings: \(\tilde{L} = [\sqrt{\hat\lambda_1}\,\hat{e}_1, \cdots, \sqrt{\hat\lambda_m}\,\hat{e}_m]\)
– the estimated specific variances: \(\tilde\psi_i = s_{ii} - \sum_{j=1}^{m} \tilde\ell_{ij}^{\,2}\)
– the estimated communalities: \(\tilde{h}_i^2 = \tilde\ell_{i1}^{\,2} + \cdots + \tilde\ell_{im}^{\,2}\)
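
A minimal Python sketch of the principal component solution, assuming a data matrix X (n × p) is available; the data generated below are purely illustrative:

import numpy as np

def pc_factor_solution(X, m):
    """Principal component estimates of loadings, specific variances, communalities."""
    S = np.cov(X, rowvar=False)                      # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(S)
    order = np.argsort(eigval)[::-1]                 # sort eigenvalues in decreasing order
    eigval, eigvec = eigval[order], eigvec[:, order]
    L = eigvec[:, :m] * np.sqrt(eigval[:m])          # loadings: sqrt(lambda_j) * e_j
    psi = np.diag(S) - np.sum(L**2, axis=1)          # specific variances
    h2 = np.sum(L**2, axis=1)                        # communalities
    return L, psi, h2

# Example usage with simulated data standing in for a real data set:
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))
L_tilde, psi_tilde, h2_tilde = pc_factor_solution(X, m=2)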
If the number of common factors is not determined by a priori considerations, such
as by theory or the work of other researchers, the choice of m can be based on the
following rules
• Choose m to minimize the elements in the residual matrix: S − (LL′ + Ψ)
• Choose m to make \(\big(\sum_{j=1}^{m}\hat\lambda_j\big)\big/(s_{11}+\cdots+s_{pp})\) suitably large.

• Set m equal to the number of eigenvalues of R greater than 1 if the sample correlation matrix is factored, or equal to the number of positive eigenvalues of S if the sample covariance matrix is factored.

• Factor analysis of consumer-preference data


• Factor analysis of stock-price data

5 A Modified Approach: the Iterated Principal Factor Solution

• We describe the reasoning in terms of a factor analysis of R, although the procedure is also appropriate for S.

• If the factor model ρ = LL′ + Ψ is correctly specified, the m common factors should account for the off-diagonal elements of ρ, as well as the communality portions of the diagonal elements ρii = 1 = h2i + ψi .
• If the specific factor contribution ψi is removed from the diagonal or, equivalently, the 1 replaced by h2i , the resulting matrix is ρ − Ψ = LL′ .

6 The Iterated Principal Factor Solution


1. Suppose, now, that initial estimates ψi∗ of the specific variances are available. Then replacing the ith diagonal element of R by \(h_i^{*2} = 1 - \psi_i^*\), we obtain a "reduced" sample correlation matrix
\[
R_r = \begin{bmatrix} h_1^{*2} & r_{12} & \cdots & r_{1p} \\ r_{12} & h_2^{*2} & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{1p} & r_{2p} & \cdots & h_p^{*2} \end{bmatrix}
\]
Rr is factored as
\[
R_r \doteq L_r^* L_r^{*\prime}
\]

2. Update the estimate of Ψ, Ψ = diag{R − L∗r L∗r ′ }

3. Repeat the above two steps until convergence.
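
A minimal Python sketch of this iteration on a given correlation matrix R, using the initial estimates ψi∗ = 1/r^ii discussed in the comments below (negative eigenvalues of the reduced matrix, which can occur, are truncated at zero):

import numpy as np

def iterated_principal_factor(R, m, n_iter=100, tol=1e-8):
    """Iterated principal factor solution for a correlation matrix R with m factors."""
    psi = 1.0 / np.diag(np.linalg.inv(R))           # initial specific variances psi_i* = 1/r^ii
    for _ in range(n_iter):
        Rr = R - np.diag(psi)                       # "reduced" matrix: diagonal replaced by 1 - psi_i*
        eigval, eigvec = np.linalg.eigh(Rr)
        order = np.argsort(eigval)[::-1]
        eigval, eigvec = eigval[order], eigvec[:, order]
        eigval_m = np.clip(eigval[:m], 0.0, None)   # Rr may have negative eigenvalues
        L = eigvec[:, :m] * np.sqrt(eigval_m)       # principal factor loadings
        psi_new = np.diag(R) - np.sum(L**2, axis=1) # update Psi = diag{R - L* L*'}
        if np.max(np.abs(psi_new - psi)) < tol:     # stop once the update has converged
            psi = psi_new
            break
        psi = psi_new
    return L, psi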

7 Comments:
• Choice for the initial estimates of the specific variances: \(\psi_i^* = 1/r^{ii}\), where \(r^{ii}\) is the ith diagonal element of R−1 . The initial communality estimates then become \(h_i^{*2} = 1 - \psi_i^* = 1 - 1/r^{ii}\), which is equal to the square of the multiple correlation coefficient between Xi and the other p − 1 variables.


• The principal component method for R can be regarded as a principal factor
method with initial communality estimates of unity, or specific variances equal
to zero.
• Rr is not always positive definite, so some of its eigenvalues may be negative; the principal factor solution is sensitive to the chosen number of common factors; and if m is too large, some communalities may exceed 1.

8 The Maximum Likelihood Method
• Assume F ∼ Nm (0, Im ), ϵ ∼ Np (0, Ψ) and F and ϵ are independent. Then
X ∼ Np (µ, LL′ + Ψ).

• The log likelihood is

\[
-\frac{n}{2}\log|LL' + \Psi| - \frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'(LL' + \Psi)^{-1}(x_i - \mu)
\]

• Impose the computationally convenient uniqueness condition: L′ Ψ−1 L = ∆, a diagonal matrix.

• The maximum likelihood estimators L̂, Ψ̂ and µ̂ = x̄ maximize the log likelihood subject to L̂′ Ψ̂−1 L̂ being diagonal.
• Proportion of total sample variance due to jth factor:

\[
\frac{\hat\ell_{1j}^2 + \cdots + \hat\ell_{pj}^2}{s_{11} + \cdots + s_{pp}}
\]

• If the variables are standardized, Z = V−1/2 (X − µ), then the covariance matrix of Z, the correlation matrix ρ, has the representation

ρ = V−1/2 ΣV−1/2 = (V−1/2 L)(V−1/2 L)′ + V−1/2 ΨV−1/2

• By the invariance property of maximum likelihood estimators, the maximum likelihood estimator of ρ is

ρ̂ = (V̂−1/2 L̂)(V̂−1/2 L̂)′ + V̂−1/2 Ψ̂V̂−1/2 = L̂z L̂′z + Ψ̂z

where V̂−1/2 and L̂ are the maximum likelihood estimators of V−1/2 and L
respectively.
• Proportion of total standardized sample variance due to jth factor is
\[
\frac{\hat\ell_{1j}^2 + \cdots + \hat\ell_{pj}^2}{p}
\]

where ℓ̂ij ’s denote the elements of L̂z .
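
For comparison, scikit-learn's FactorAnalysis fits a Gaussian factor model with diagonal noise by maximum likelihood; it does not necessarily impose the uniqueness condition above, so its loadings may differ from a textbook solution by an orthogonal rotation. A minimal sketch, with illustrative stand-in data:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 6)) @ rng.standard_normal((6, 6))   # illustrative data

m = 2
fa = FactorAnalysis(n_components=m).fit(X)
L_hat = fa.components_.T            # p x m estimated loadings
Psi_hat = fa.noise_variance_        # estimated specific variances

# Proportion of total sample variance attributed to the jth factor:
total_var = np.trace(np.cov(X, rowvar=False))
print(np.round(np.sum(L_hat**2, axis=0) / total_var, 3))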

• Factor analysis of stock-price data using the maximum likelihood method


• Factor analysis of Olympic decathlon data

9 A Large Sample Test for the Number of Common Factors

• Test the adequacy of the m common factor model:

H0 : Σ = LL′ + Ψ vs. H1 : Σ is any other positive definite matrix.

• The likelihood ratio statistic for testing H0 is
\[
-2\ln\Lambda = -2\ln\left(\frac{|\hat{L}\hat{L}' + \hat\Psi|}{|S_n|}\right)^{-n/2} \overset{\text{approx.}}{\sim} \chi^2_{df}
\]
where \(df = \tfrac{1}{2}\big[(p - m)^2 - p - m\big]\).


• Using Bartlett's correction, we reject H0 at the α level of significance if
\[
\big(n - 1 - (2p + 4m + 5)/6\big)\,\ln\!\left(\frac{|\hat{L}\hat{L}' + \hat\Psi|}{|S_n|}\right) > \chi^2_{\frac{1}{2}[(p-m)^2 - p - m]}(\alpha)
\]
provided that \(m < \tfrac{1}{2}\big(2p + 1 - \sqrt{8p + 1}\big)\).
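
A minimal Python sketch of this test, assuming the maximum likelihood estimates L̂ (p × m), the specific variances Ψ̂ (length p), and the sample covariance matrix Sn are already available; the function name and arguments are illustrative:

import numpy as np
from scipy.stats import chi2

def lr_test_m_factors(L_hat, psi_hat, Sn, n, alpha=0.05):
    """Large-sample test of H0: Sigma = L L' + Psi with Bartlett's correction."""
    p, m = L_hat.shape
    Sigma_hat = L_hat @ L_hat.T + np.diag(psi_hat)
    # Bartlett-corrected statistic: (n - 1 - (2p + 4m + 5)/6) * ln(|Sigma_hat| / |Sn|)
    stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * np.log(
        np.linalg.det(Sigma_hat) / np.linalg.det(Sn))
    df = ((p - m) ** 2 - p - m) / 2
    p_value = chi2.sf(stat, df)
    return stat, df, p_value, bool(p_value < alpha)   # last entry: reject H0?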

9.1 Factor Rotation


• Factor loadings L are determined only up to an orthogonal matrix T. Thus, the
loadings
L∗ = LT and L
both give the same representation. The communalities, given by the diagonal
elements of LL′ = L∗ L∗′ are also unaffected by the choice of T.
• If L̂ is the p × m matrix of estimated factor loadings, then

L̂∗ = L̂T, where TT′ = T′ T = I

is a matrix of "rotated" loadings.

• Kaiser proposed the varimax criterion: define \(\tilde\ell_{ij}^* = \hat\ell_{ij}^*/\hat{h}_i\), the rotated coefficients scaled by the square roots of the communalities. Select the orthogonal transformation T that makes
\[
V = \frac{1}{p}\sum_{j=1}^{m}\Big[\sum_{i=1}^{p}\tilde\ell_{ij}^{*4} - \Big(\sum_{i=1}^{p}\tilde\ell_{ij}^{*2}\Big)^{2}\!\big/p\Big]
\]
as large as possible.
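
A minimal Python sketch of a standard SVD-based varimax iteration (stated without the Kaiser normalization by the communalities; rows of L can be divided by ĥi beforehand and rescaled afterwards to obtain the normalized version):

import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Return rotated loadings L @ T and the orthogonal rotation matrix T."""
    p, m = L.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient-like update for the varimax criterion, solved via an SVD
        B = L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        d_new = np.sum(s)
        if d_new - d_old < tol:   # the criterion value is nondecreasing
            break
        d_old = d_new
    return L @ T, T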

• Rotated loadings for subject tests


• Rotated loadings for the consumer-preference data

• Rotated loadings for the stock-price data


• Rotated loadings for the Olympic decathlon data

9.2 Factor Scores
• Factor scores are estimates of values for the unobserved random factor vectors
Fj , j = 1, 2, · · · , n. That is, factor scores

f̂j = estimate of the values fj attained by Fj (jth case)

• The estimation situation is complicated by the fact that the unobserved quantities fj and ϵj outnumber the observed xj .

• Treat the estimated factor loadings ℓ̂ij and specific variances ψ̂i as if they were
the true values.
• We will not differentiate between the original estimated loadings and the estimated rotated loadings.

The Weighted Least Squares Method


• Suppose first that the mean vector µ, the factor loadings L, and the specific variances Ψ are known for the factor model

X − µ = LF + ϵ

• Since the variances Var(ϵi ) = ψi , i = 1, 2, · · · , p, need not be equal, Bartlett proposed to minimize

\[
\sum_{i=1}^{p}\frac{\epsilon_i^2}{\psi_i} = (x - \mu - Lf)'\Psi^{-1}(x - \mu - Lf)
\]

The solution is f̂ = (L′ Ψ−1 L)−1 L′ Ψ−1 (x − µ)


• We take the maximum likelihood estimates L̂, Ψ̂ and µ̂ = x̄ as the true values and obtain the factor
scores for the jth case as

f̂j = (L̂′ Ψ̂−1 L̂)−1 L̂′ Ψ̂−1 (xj − x̄)

• If rotated loadings L∗ = L̂T are used in place of the original loadings, the
subsequent factor scores, f̂j∗ , are related to f̂j by f̂j∗ = T′ f̂j , j = 1, 2, · · · , n.
• If the factor loadings are estimated by the principal component method, it is customary to generate factor scores using an unweighted (ordinary) least squares procedure:
\[
\hat{f}_j = (\tilde{L}'\tilde{L})^{-1}\tilde{L}'(x_j - \bar{x})
\]

f̂j are the first m scaled principal components, evaluated at xj
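
A minimal Python sketch of both score estimators, assuming estimated loadings and specific variances are given (L_hat, psi_hat for the weighted version; L_tilde for the principal component version); X holds the observations as rows:

import numpy as np

def wls_factor_scores(X, L_hat, psi_hat):
    """Bartlett's weighted least squares scores: (L'Psi^-1 L)^-1 L'Psi^-1 (x_j - xbar)."""
    Xc = X - X.mean(axis=0)
    Psi_inv = np.diag(1.0 / psi_hat)
    A = L_hat.T @ Psi_inv @ L_hat
    return np.linalg.solve(A, L_hat.T @ Psi_inv @ Xc.T).T   # one row of scores per case

def ols_factor_scores(X, L_tilde):
    """Unweighted scores used with principal component loadings: (L'L)^-1 L'(x_j - xbar)."""
    Xc = X - X.mean(axis=0)
    return np.linalg.solve(L_tilde.T @ L_tilde, L_tilde.T @ Xc.T).T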

The Regression Method

• When the common factors F and the specific factors ϵ are jointly normally dis-
tributed:
X − µ = LF + ϵ ∼ Np (0, LL′ + Ψ)

• Moreover, the joint distribution of (X − µ) and F is Nm+p (0, Σ∗ ), where
\[
\Sigma^* = \begin{bmatrix} LL' + \Psi & L \\ L' & I \end{bmatrix}
\]

• The conditional distribution of F|x is multivariate normal with

E(F|x) = L′ (LL′ + Ψ)−1 (x − µ)

Cov(F|x) = I − L′ (LL′ + Ψ)−1 L

• Given any vector of observations, and taking the maximum likelihood estimates L̂ and Ψ̂ as the true values, we see that the jth factor score vector is given by

f̂j = L̂′ (L̂L̂′ + Ψ̂)−1 (xj − x̄)

• The relationship between the weighted least squares estimates f̂jLS and the regression estimates f̂jR is f̂jLS = (I + (L̂′ Ψ̂−1 L̂)−1 )f̂jR

• In an attempt to reduce the effects of a possibly incorrect determination of the number of factors, we use the sample covariance matrix S in place of Σ̂ = L̂L̂′ + Ψ̂, i.e., f̂j = L̂′ S−1 (xj − x̄).
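
A minimal Python sketch of the regression-method scores, with an option to use the sample covariance matrix S in place of L̂L̂′ + Ψ̂ as in the last bullet above; the function name and arguments are illustrative:

import numpy as np

def regression_factor_scores(X, L_hat, psi_hat, use_sample_cov=True):
    """Regression scores: L' M^-1 (x_j - xbar), with M = S or M = L L' + Psi."""
    Xc = X - X.mean(axis=0)
    if use_sample_cov:
        M = np.cov(X, rowvar=False)                 # S replaces L L' + Psi
    else:
        M = L_hat @ L_hat.T + np.diag(psi_hat)
    return (L_hat.T @ np.linalg.solve(M, Xc.T)).T   # one row of scores per case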

• Example 9.12
• Example 9.13

10 Perspectives and a Strategy For Factor Analysis


• Perform a principal component factor analysis

• Perform a maximum likelihood factor analysis, including a varimax rotation


• Compare the solutions obtained from the two factor analyses
– Do the loadings group in the same manner?
– Plot factor scores obtained for principal components against scores from
the maximum likelihood analysis

• Repeat the first three steps for other numbers of common factors m
• For large data sets, split them in half and perform a factor analysis on each part.
