Factor in R PDF
$$FS_1(n) = \sum_{i=1}^{10} \beta_{1i} X_i(n)$$
$$FS_2(n) = \sum_{i=1}^{10} \beta_{2i} X_i(n)$$
5 How many factors should be extracted?
There are some criteria, but no 100% foolproof statistical test exists.
Drawing a scree plot: Plot the eigenvalues (which represent the variance explained by each factor; sums of
squared factor loadings are sometimes used instead) for all possible factors, from the largest to the smallest,
and connect them. The adequate number of factors is the number of points before the sudden downward inflection
of the plot.
Parallel analysis: Compare the actual scree plot with a scree plot obtained from randomly resampled data. The
adequate number of factors is given by the crossing point of the two plots.
Eigenvalues > 1: The eigenvalues sum to the number of items, so a factor with an eigenvalue greater than 1 is
more informative than a single average item.
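The three criteria above can be tried out with base R alone. The following is a minimal sketch on simulated data, not part of the original handout: the 9 items, the 3 latent factors, the loading of 0.8 and the 50 random replicates are all assumptions chosen for illustration, and the parallel analysis here uses uncorrelated normal data as the reference rather than resampling the observed data.

```r
set.seed(1)
n <- 300; p <- 9
# Hypothetical data: 3 latent factors, each measured by 3 items (loading 0.8)
f <- matrix(rnorm(n * 3), n, 3)
load <- kronecker(diag(3), matrix(0.8, 3, 1))   # 9 x 3 loading matrix
x <- f %*% t(load) + matrix(rnorm(n * p, sd = 0.6), n, p)

# Eigenvalues of the correlation matrix, in descending order
ev <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues > 1 criterion
n_kaiser <- sum(ev > 1)

# Simplified parallel analysis: mean eigenvalues of random data of the same size
ev_rand <- replicate(50, eigen(cor(matrix(rnorm(n * p), n, p)),
                               symmetric = TRUE, only.values = TRUE)$values)
ev_ref <- rowMeans(ev_rand)
# Number of leading eigenvalues that lie above the random reference
n_parallel <- which(ev <= ev_ref)[1] - 1

# Scree plot with the random reference overlaid as a dashed line
plot(ev, type = "b", xlab = "Factor", ylab = "Eigenvalue")
lines(ev_ref, lty = 2)
abline(h = 1, lty = 3)
```

In this sketch the eigenvalues sum to the number of items (`sum(ev)` equals `p`), which is the reasoning behind the eigenvalue > 1 rule.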
6 Checking adequacy of factor analysis
There are several methods to check the adequacy of a factor analysis.
Criteria of sample size adequacy: a sample size of 50 is very poor, 100 poor, 200 fair, 300 good, 500 very good,
and more than 1,000 excellent (Comrey and Lee, 1992, p.217).
*4 In principal component analysis, each component can be formulated as a linear function of the measured
variables, so it doesn't need iterative estimation.
Kaiser-Meyer-Olkin sampling adequacy criterion (usually abbreviated as KMO) with MSA (individual measures
of sampling adequacy for each item): Tests whether there is a significant number of factors in the dataset.
Technically, it tests the ratio of item correlations to partial item correlations. If the partials are similar
to the raw correlations, the item doesn't share much variance with the other items. KMO ranges from 0.0
to 1.0, and the desired values are > 0.5 (*5). A variable with MSA below 0.5 does not belong to a group and
may be removed from the factor analysis.
Prof. Shigenobu Aoki provides the following function to calculate KMO and MSA on his web page:
kmo <- function(x)
{
    x <- subset(x, complete.cases(x))   # Omit missing values
    r <- cor(x)                         # Correlation matrix
    r2 <- r^2                           # Squared correlation coefficients
    i <- solve(r)                       # Inverse matrix of correlation matrix
    d <- diag(i)                        # Diagonal elements of inverse matrix
    p2 <- (-i/sqrt(outer(d, d)))^2      # Squared partial correlation coefficients
    diag(r2) <- diag(p2) <- 0           # Delete diagonal elements
    KMO <- sum(r2)/(sum(r2)+sum(p2))
    MSA <- colSums(r2)/(colSums(r2)+colSums(p2))
    return(list(KMO=KMO, MSA=MSA))
}
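The key step in the function is obtaining partial correlations from the inverse of the correlation matrix. The following sketch, which is an illustration on simulated data and not part of Prof. Aoki's code, checks that step for one pair of variables. The comparison uses the standard definition of a partial correlation: the correlation between the residuals left after regressing each of the two variables on all the others.

```r
set.seed(2)
# Hypothetical data: 4 variables, the first three correlated in a chain
x <- matrix(rnorm(200 * 4), 200, 4)
x[, 2] <- x[, 1] + rnorm(200)
x[, 3] <- x[, 2] + rnorm(200)

r <- cor(x)
i <- solve(r)                  # Inverse of the correlation matrix
d <- diag(i)
p <- -i / sqrt(outer(d, d))    # Off-diagonal elements are partial correlations

# Partial correlation of variables 1 and 2, controlling for variables 3 and 4
e1 <- resid(lm(x[, 1] ~ x[, 3] + x[, 4]))
e2 <- resid(lm(x[, 2] ~ x[, 3] + x[, 4]))
p12_check <- cor(e1, e2)

print(c(from_inverse = p[1, 2], from_residuals = p12_check))
```

The two values agree up to rounding error, so squaring the off-diagonal elements of `p` gives the squared partial correlations that enter the KMO and MSA ratios.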
Bartlett's sphericity test: Tests the hypothesis that the correlations between variables are greater than would
be expected by chance. Technically, it tests whether the correlation matrix is an identity matrix. The p-value
should be significant, i.e., the null hypothesis that all off-diagonal correlations are zero should be rejected.
Prof. Shigenobu Aoki provides the following function to conduct Bartlett's sphericity test on his web page:
Bartlett.sphericity.test <- function(x)
{
    method <- "Bartlett's test of sphericity"
    data.name <- deparse(substitute(x))
    x <- subset(x, complete.cases(x))   # Omit missing values
    n <- nrow(x)                        # Number of complete cases
    p <- ncol(x)                        # Number of variables
    chisq <- (1-n+(2*p+5)/6)*log(det(cor(x)))
    df <- p*(p-1)/2
    p.value <- pchisq(chisq, df, lower.tail=FALSE)
    names(chisq) <- "X-squared"
    names(df) <- "df"
    return(structure(list(statistic=chisq, parameter=df, p.value=p.value,
        method=method, data.name=data.name), class="htest"))
}
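The statistic computed in the function is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix. As a hedged illustration, the sketch below evaluates it step by step on simulated data. The 5 variables sharing one common factor, and the weight 0.7, are assumptions made only for this example.

```r
set.seed(3)
n <- 200; p <- 5
g <- rnorm(n)                                   # Hypothetical common factor
x <- sapply(1:p, function(j) 0.7 * g + rnorm(n))

r <- cor(x)
# det(r) is 1 for an identity matrix and shrinks toward 0 as correlations grow,
# so -log(det(r)) measures the departure from "no correlation at all"
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(r))
df <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)

print(c(chisq = chisq, df = df, p.value = p_value))
```

Because the simulated variables are clearly correlated, the p-value is very small here and the null hypothesis of an identity correlation matrix is rejected.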
7 Functions to conduct factor analysis in R
factanal This function is included in the standard installation (stats package). It uses maximum likelihood
estimation (MLE) to find the factor loadings. The number of factors to be extracted must be explicitly
specified. Varimax and promax rotations are possible. Input data may be a matrix or a data frame.
paf This function is included in the rela package. It uses the principal axis method to find the factor loadings.
The adequate number of factors is determined automatically by the eigenvalue criterion (you can specify the
criterion with the eigcrit= option: the default is 1). KMO and MSA are calculated automatically. Rotation is not
provided. Input data must be a matrix.
*5 According to the criteria suggested by Kaiser (1974), a KMO of less than 0.5 is unacceptable, [0.5, 0.6) is
miserable, [0.6, 0.7) is mediocre, [0.7, 0.8) is middling, [0.8, 0.9) is meritorious, and [0.9, 1.0) is marvelous.
fa This function is included in the psych package. The fm= option specifies the method of estimation ("minres"
for minimum residual, "ml" for maximum likelihood, and "pa" for the principal axis method). The
number of factors to be extracted must be specified with the nfactors= option. Various rotation methods can be
specified with the rotate= option ("none", "varimax", "quartimax", "bentlerT", "geominT", "oblimin",
"simplimax", "bentlerQ", "geominQ", and "cluster" are available).
alpha This function is included in the psych package. It calculates Cronbach's alpha.
cortest.bartlett This function is included in the psych package. It conducts Bartlett's sphericity test.
fa.parallel This function is included in the psych package. It returns the suggested number of factors to extract as $nfact.
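Of the functions listed above, only factanal needs no additional package, so it is a convenient starting point. The following is a minimal sketch on simulated data: the six items v1 to v6, the two latent factors and the loadings are hypothetical, chosen so that the expected structure is known in advance.

```r
set.seed(4)
n <- 500
# Hypothetical data: 2 latent factors, each measured by 3 items
f <- matrix(rnorm(n * 2), n, 2)
load <- kronecker(diag(2), matrix(c(0.8, 0.7, 0.6), 3, 1))   # 6 x 2 loadings
x <- f %*% t(load) + matrix(rnorm(n * 6, sd = 0.6), n, 6)
colnames(x) <- paste0("v", 1:6)

# Maximum likelihood factor analysis; the number of factors must be given
fit <- factanal(x, factors = 2, rotation = "varimax")

print(fit$loadings, cutoff = 0.3)   # Hide loadings smaller than 0.3
print(round(fit$uniquenesses, 2))   # Variance of each item not explained by the factors
```

With data generated this way, v1 to v3 should load mainly on one factor and v4 to v6 on the other. Passing rotation = "promax" instead gives an oblique rotation.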
8 Example 1
Let's analyze the variables p1-p40 in factorexdata05.txt, which is converted from Prof. Timothy Bates' SPSS
data (*6). Prof. Bates provides PDF documents for undergraduate students (*7).
The easiest way is the following. The number of factors is determined automatically. Factor loadings are saved as
res$Factor.Loadings.
library(foreign)
y <- read.spss("http://www.subjectpool.com/ed_teach/y3method/factorexdata05.sav")
x <- as.data.frame(y)
for (i in 1:length(x)) { x[,i] <- ifelse(x[,i]==999,NA,x[,i]) }
# The data x consists of 538 cases with 102 variables.
# it can be saved as "factorexdata05.txt" by the following line
# write.table(x,"factorexdata05.txt",quote=FALSE,sep="\t",row.names=FALSE)
# if so, the data can be read by:
# x <- read.delim("factorexdata05.txt")
Ps <- x[,4:43] # Extract variables p1-p40
Ps <- subset(Ps, complete.cases(Ps)) # Omit missings (511 cases remain)
library(rela)
res <- paf(as.matrix(Ps))
summary(res) # Automatically calculate KMO with MSA, determine the number of factors,
# calculate chi-square of Bartlett's sphericity test, communalities and
# factor loadings. Communalities are 1 minus uniquenesses.
barplot(res$Eigenvalues[,1]) # First column of eigenvalues.
resv <- varimax(res$Factor.Loadings) # Varimax rotation is possible later.
print(resv)
barplot(sort(colSums(loadings(resv)^2),decreasing=TRUE)) # screeplot using rotated SS loadings.
scores <- as.matrix(Ps) %*% as.matrix(resv$loadings) # Get factor scores in a simple manner.
library(psych)
cortest.bartlett(Ps) # Bartlett's sphericity test.
res2 <- fa.parallel(Ps)
res3 <- fa(Ps, fm="minres", nfactors=8, rotate="oblimin")
print(res3) # Factor loadings as $loadings
*6 http://www.subjectpool.com/ed_teach/y3method/factorexdata05.sav
*7 http://www.subjectpool.com/ed_teach/y3method/factorex05.pdf and http://www.subjectpool.com/ed_teach/y3method/fa.pdf