Bia b350f Unit 4
Thus, there is a need to reduce the number of variables to a few significant linear
combinations of the data that are easier to interpret and analyze. Under this context, each
linear combination will correspond to a principal component.
Principal component analysis is a technique that is used to simplify a data set. It can be
used to reduce dimensionality by eliminating principal components that are considered to
be relatively less important.
Cov(Yi, Yj) = ai' Σ aj,  i, j = 1, 2, …, k

In other words, the k principal components are those uncorrelated linear combinations Yi = ai'x with the top k largest variances Var(Yi) = ai' Σ ai that one could generate from different choices of the coefficient vector ai, provided that ai'ai = 1.
It turns out that the i-th eigenvector of Σ is the coefficient vector of the i-th principal component, and that the variance of the i-th principal component equals the i-th eigenvalue λi. The resulting principal components are uncorrelated with one another.
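These facts can be checked numerically. Below is a minimal NumPy sketch (the unit's own code is in R; the 3×3 covariance matrix here is made up for illustration) confirming that each eigenvector of Σ gives the coefficients of a principal component whose variance is the matching eigenvalue, and that distinct components are uncorrelated:

```python
import numpy as np

# Hypothetical 3x3 covariance matrix, for illustration only
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]            # re-sort so lambda_1 is largest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

for i in range(3):
    a_i = eigvecs[:, i]                      # coefficient vector of the i-th PC
    assert np.isclose(a_i @ a_i, 1.0)        # normalized: a_i' a_i = 1
    assert np.isclose(a_i @ Sigma @ a_i, eigvals[i])   # Var(Y_i) = lambda_i

# Distinct PCs are uncorrelated: Cov(Y_i, Y_j) = a_i' Sigma a_j = 0
assert np.isclose(eigvecs[:, 0] @ Sigma @ eigvecs[:, 1], 0.0)
```

Note also that the eigenvalues sum to the trace of Σ, so the total variance is preserved by the transformation.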
Hence, the proportion of total variance due to (explained by) the i-th principal component is given by:

λi / (λ1 + λ2 + … + λk),  i = 1, 2, …, k
Example 4.1
Suppose the random variables X1, X2, and X3 have the covariance matrix Σ.
Thus, the principal components Y1 and Y2 could replace the original three random variables with
negligible loss of information.
    ⎡ 1  0  0 ⎤
Σ = ⎢ 0  4  2 ⎥   with eigenvalue–eigenvector pairs:
    ⎣ 0  2  2 ⎦

λ1 = 5.236,  e1' = (0, 0.8507, 0.5257),   Y1 = e1'x = 0.8507 X2 + 0.5257 X3
λ2 = 1.000,  e2' = (1, 0, 0),             Y2 = e2'x = X1
λ3 = 0.764,  e3' = (0, -0.5257, 0.8507),  Y3 = e3'x = -0.5257 X2 + 0.8507 X3

Corr(Y1, X1) = e11 √λ1 / √σ11 = 0 × √5.236 / √1 = 0
The correlations between Y1 and X2 and between Y1 and X3 are –0.8489 and 0.8506 respectively, almost identical in magnitude. This indicates that the two variables are roughly equally important to Y1.
Corr(Y2, X1) = e21 √λ2 / √σ11 = 1 × √1 / √1 = 1
Corr(Y2, X2) = e22 √λ2 / √σ22 = 0 × √1 / √4 = 0
Corr(Y2, X3) = e23 √λ2 / √σ33 = 0 × √1 / √2 = 0
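These hand computations for Example 4.1 can be reproduced numerically. A NumPy check follows (Python standing in for the unit's R; the sign of each eigenvector is arbitrary, so magnitudes are compared for the correlations):

```python
import numpy as np

# Covariance matrix from Example 4.1
Sigma = np.array([[1.0, 0.0, 0.0],
                  [0.0, 4.0, 2.0],
                  [0.0, 2.0, 2.0]])

lam, E = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

# Eigenvalues: 3 + sqrt(5), 1, 3 - sqrt(5), i.e. 5.236, 1.000, 0.764
assert np.allclose(np.round(lam, 3), [5.236, 1.000, 0.764])

# Corr(Y_i, X_k) = e_ik sqrt(lambda_i) / sqrt(sigma_kk); the orientation of
# each eigenvector is arbitrary, so compare absolute values
corr = np.abs(E * np.sqrt(lam)) / np.sqrt(np.diag(Sigma))[:, None]
assert np.isclose(corr[0, 0], 0.0)               # Corr(Y1, X1) = 0
assert np.isclose(corr[0, 1], 1.0)               # |Corr(Y2, X1)| = 1
assert abs(corr[2, 0] - 0.8506) < 1e-4           # |Corr(Y1, X3)| ~ 0.8506
```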
Let x1, x2, x3, …, xn be n independent drawings from a k-dimensional population with mean vector µ and covariance matrix Σ. The sample mean vector and sample covariance matrix are x̄ and S respectively.
If the sample covariance matrix S has eigenvalue–eigenvector pairs (λ̂1, ê1), (λ̂2, ê2), …, (λ̂k, êk) with λ̂1 ≥ λ̂2 ≥ … ≥ λ̂k, and x is the vector of observed variables X1, X2, …, Xk, then the i-th sample principal component is given by ŷi = êi'x = êi1 x1 + êi2 x2 + … + êik xk.
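The sample version can be sketched as follows (Python with NumPy rather than the unit's R; the population used to generate the data is invented): the sample PCs come from the eigen-decomposition of S, and the sample variance of each score column reproduces the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample: n = 200 independent draws from a 3-dimensional population
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[4.0, 2.0, 0.0],
                                 [2.0, 3.0, 1.0],
                                 [0.0, 1.0, 2.0]],
                            size=200)

xbar = X.mean(axis=0)              # sample mean vector
S = np.cov(X, rowvar=False)        # sample covariance matrix (divisor n - 1)

lam_hat, E_hat = np.linalg.eigh(S)
order = np.argsort(lam_hat)[::-1]
lam_hat, E_hat = lam_hat[order], E_hat[:, order]

# i-th sample principal component score: y_i = e_i' (x - xbar)
scores = (X - xbar) @ E_hat

# Each score column's sample variance equals the corresponding eigenvalue of S
assert np.allclose(scores.var(axis=0, ddof=1), lam_hat)
```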
For housing and crime, the lower the score the better. For the rest of the variables, the
higher the score the better.
With nine variables, the covariance matrix may be too large to analyze and interpret properly; there would be too many pairwise covariances between the variables to study. Graphical displays of the data may also not be very helpful if the data set is large. To interpret the data in a more meaningful way, it is therefore necessary to reduce the number of variables to a few interpretable linear combinations, or principal components, of the data.
Covariance Matrix

          Climate        Housing        Health         Crime          Trans          Educate        Arts           Recreate       Econ
Climate    0.0128923499   0.0032677528   0.0054792649   0.0043741176   0.0003857247   0.0004415009   0.0106885887   0.0025732595  -0.0009661793
Housing    0.0032677528   0.0111161410   0.0145962100   0.0024830608   0.0052785799   0.0010695852   0.0292263029   0.0091269830   0.0026458304
Health     0.0054792649   0.0145962100   0.1027278915   0.0099549524   0.0211534636   0.0074778111   0.1184843654   0.0152994310   0.0014633998
Crime      0.0043741176   0.0024830608   0.0099549524   0.0286107020   0.0072989317   0.0004713186   0.0319465684   0.0092846815   0.0039464274
Trans      0.0003857247   0.0052785799   0.0211534636   0.0072989317   0.0248288688   0.0024618893   0.0470407089   0.0115674940   0.0008343588
Educate    0.0004415009   0.0010695852   0.0074778111   0.0004713186   0.0024618893   0.0025199764   0.0095204087   0.0008772470   0.0005464533
Arts       0.0106885887   0.0292263029   0.1184843654   0.0319465684   0.0470407089   0.0095204087   0.2971731520   0.0508599879   0.0062060281
Recreate   0.0025732595   0.0091269830   0.0152994310   0.0092846815   0.0115674940   0.0008772470   0.0508599879   0.0353078256   0.0027924140
Econ      -0.0009661793   0.0026458304   0.0014633998   0.0039464274   0.0008343588   0.0005464533   0.0062060281   0.0027924140   0.0071365383

Use sum(diag(R)) to obtain the total variance: 0.5223134457
“SS loadings” represents the eigenvalue for each PC. The sum of the SS loadings is 0.5223, the total variance. The proportion of variation explained by each eigenvalue is also reported in the output. For example, 0.377462 divided by 0.5223 equals 0.7227: about 72% of the total variation is explained by the first eigenvalue.
If you compute the differences of SS loadings between adjacent PCs, you can see the magnitude of the differences decreasing.
Subtracting the second eigenvalue, 0.051, from the first, 0.377, gives a difference of 0.326. The difference between the second and third eigenvalues is 0.0232; the next difference is 0.0049. Subsequent differences are even smaller. A sharp drop from one eigenvalue to the next may serve as another indicator of how many eigenvalues to retain.
The first three principal components explain 87% of the variation, which is an acceptably large percentage.
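The arithmetic above can be sketched in a few lines of Python (the unit's own code is in R; the third eigenvalue is back-calculated here from the quoted difference of 0.0232, an assumption rather than a value read from the output):

```python
# Eigenvalues (SS loadings) of the first three PCs as quoted in the text;
# the third is inferred from the stated difference 0.051 - 0.0232 = 0.0278
lam = [0.377462, 0.051, 0.051 - 0.0232]
total = 0.5223134457                     # sum(diag(R)), the total variance

props = [l / total for l in lam]
assert round(props[0], 4) == 0.7227      # first PC explains about 72%
assert round(sum(props), 2) == 0.87      # first three PCs explain about 87%
```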
The scree plot is utilized in parallel analysis. A parallel analysis simulates a random dataset (or resampled dataset) with the same sample size as the input dataset. Eigenvalues are then computed from this simulated (or resampled) dataset. Finally, these simulated eigenvalues are plotted along with the eigenvalues from the input dataset on the same scree plot; simulated eigenvalues are drawn as dashed lines.
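A parallel analysis can be sketched as follows (Python/NumPy standing in for the R routine; the sample size, variable count, and observed eigenvalues below are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 9            # hypothetical sample size and number of variables

# Average eigenvalues of the correlation matrix of pure-noise datasets:
# this is the dashed reference line drawn on the scree plot
n_sims = 50
sim_eigs = np.zeros((n_sims, k))
for s in range(n_sims):
    Z = rng.standard_normal((n, k))               # simulated random dataset
    R = np.corrcoef(Z, rowvar=False)
    sim_eigs[s] = np.sort(np.linalg.eigvalsh(R))[::-1]
mean_sim = sim_eigs.mean(axis=0)

# Hypothetical eigenvalues from the input dataset
obs_eigs = np.array([3.9, 1.6, 1.1, 0.7, 0.5, 0.4, 0.3, 0.3, 0.2])

# Retain components whose observed eigenvalue exceeds the simulated average
n_retain = int(np.sum(obs_eigs > mean_sim))
```

The point where the observed curve drops below the dashed simulated curve suggests how many components to keep.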
Kaiser rule
Kaiser (1960) recommends that we retain only eigenvalues that are at least equal to one. This is also known as the “eigenvalue > 1” rule.
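Applied to the eigenvalues of a correlation-matrix PCA, the rule is one line (the eigenvalues below are hypothetical):

```python
# Kaiser rule: retain components whose eigenvalue is at least 1; a
# standardized variable has variance 1, so such a component explains more
# variance than any single original variable
eigenvalues = [3.2, 1.8, 1.1, 0.9, 0.6, 0.4]   # hypothetical values
n_retain = sum(e >= 1 for e in eigenvalues)     # retains the first 3 here
```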
(Source: https://www.rasch.org/rmt/rmt191h.htm)
To interpret each component, the correlations between each original variable and each principal component are computed, and these correlations are used to interpret the components. Note that the principal components themselves are mutually uncorrelated.
For example, the first two principal component scores for an individual community of interest
can be computed using the elements of the eigenvector and the values of that community for
each of the nine variables :
#213 has a very high score for the first principal component
and it is expected that this community possesses high values
for the Arts, Health, Housing, Transportation and
Recreation.
#195 has a very high value for the second component. One can expect that such communities would be bad for Health.
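The score computation itself can be sketched as below (Python/NumPy with made-up data standing in for the nine Places Rated variables): each community's PC score is the dot product of the eigenvector with the community's centred values.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 9))     # hypothetical: 50 communities, 9 variables

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
lam, E = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
E = E[:, order]

# First two PC scores for one community: elements of the eigenvector
# multiplied by that community's (centred) value on each of the nine variables
x = X[0] - xbar
score1 = E[:, 0] @ x
score2 = E[:, 1] @ x

# Same numbers as row 0 of the full score matrix
scores = (X - xbar) @ E
assert np.isclose(score1, scores[0, 0]) and np.isclose(score2, scores[0, 1])
```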
Eigenvalues and eigenvectors are stored explicitly; call them out if needed.

## extract the eigenvalues and eigenvectors
eigen.values <- out$values
eigen.vectors <- out$loadings
The eigenvectors are labelled as “loadings” in R. The unstandardized loadings are the coefficients of the principal components in original units (i.e., eij √λi), and the standardized loadings are the correlations between the principal components and the variables (i.e., eij √λi / √sjj). So the sum of squares of the unstandardized loadings for each principal component gives the eigenvalue (i.e., Σj (eij √λi)² = λi).
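These relationships can be verified numerically (Python/NumPy here, with a randomly generated covariance matrix standing in for the unit's data):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((9, 9))
S = A @ A.T                          # an arbitrary valid covariance matrix

lam, E = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

# Unstandardized loadings: e_ij * sqrt(lambda_i), one column per PC
unstd = E * np.sqrt(lam)
# Standardized loadings: divide by sqrt(s_jj), giving Corr(Y_i, X_j)
std = unstd / np.sqrt(np.diag(S))[:, None]

# Sum of squares of unstandardized loadings per PC recovers the eigenvalue
assert np.allclose((unstd ** 2).sum(axis=0), lam)
# Standardized loadings are correlations, so they lie in [-1, 1]
assert np.all(np.abs(std) <= 1.0 + 1e-9)
```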
[Scree plot: eigenvalues of the PCA solution and of the resampled data, plotted against Component Number, used to decide how many components to extract.]
[Diagram: PCA yields loadings for the variables and scores for the samples.]
Workbook – Unit 4 Q1