
Computer Vision

Spring 2006 15-385,-685

Instructor: S. Narasimhan

Wean 5403
T-R 3:00pm – 4:20pm

Lecture #19
Principal Components Analysis

Lecture #19
Data Presentation

• Example: 53 blood and urine measurements (wet chemistry) from 65
  people (33 alcoholics, 32 non-alcoholics).

• Matrix Format:

        H-WBC    H-RBC    H-Hgb    H-Hct    H-MCV     H-MCH    H-MCHC
  A1    8.0000   4.8200   14.1000  41.0000   85.0000  29.0000  34.0000
  A2    7.3000   5.0200   14.7000  43.0000   86.0000  29.0000  34.0000
  A3    4.3000   4.4800   14.1000  41.0000   91.0000  32.0000  35.0000
  A4    7.5000   4.4700   14.9000  45.0000  101.0000  33.0000  33.0000
  A5    7.3000   5.5200   15.4000  46.0000   84.0000  28.0000  33.0000
  A6    6.9000   4.8600   16.0000  47.0000   97.0000  33.0000  34.0000
  A7    7.8000   4.6800   14.7000  43.0000   92.0000  31.0000  34.0000
  A8    8.6000   4.8200   15.8000  42.0000   88.0000  33.0000  37.0000
  A9    5.1000   4.7100   14.0000  43.0000   92.0000  30.0000  32.0000

• Spectral Format: [Figure: each person's measurements plotted as a curve, Value vs. Measurement index]
Data Presentation

[Figure: Univariate plot (H-Bands vs. Person), Bivariate plot (C-LDH vs. C-Triglycerides), and Trivariate plot (M-EPI vs. C-LDH and C-Triglycerides)]
Data Presentation

• Better presentation than ordinate axes?


• Do we need a 53 dimension space to view data?
• How to find the ‘best’ low dimension space that
conveys maximum useful information?
• One answer: Find “Principal Components”
Principal Components

• All principal components (PCs) start at the origin of the ordinate axes.

• First PC is the direction of maximum variance from the origin.

• Subsequent PCs are orthogonal to the 1st PC and describe maximum
  residual variance.

[Figure: scatter of points in the Wavelength 1 vs. Wavelength 2 plane, with PC 1 along the direction of maximum variance and PC 2 orthogonal to it]
The Goal

We wish to explain/summarize the underlying variance-covariance
structure of a large set of variables through a few linear
combinations of these variables.
Applications

• Uses:
  – Data Visualization
  – Data Reduction
  – Data Classification
  – Trend Analysis
  – Factor Analysis
  – Noise Reduction

• Examples:
  – How many unique "sub-sets" are in the sample?
  – How are they similar / different?
  – What are the underlying factors that influence the samples?
  – Which time / temporal trends are (anti)correlated?
  – Which measurements are needed to differentiate?
  – How to best present what is "interesting"?
  – Which "sub-set" does this new sample rightfully belong to?
Trick: Rotate Coordinate Axes
Suppose we have a population measured on p random
variables X1,…,Xp. Note that these random variables
represent the p-axes of the Cartesian coordinate system in
which the population resides. Our goal is to develop a new
set of p axes (linear combinations of the original p axes) in
the directions of greatest variability:
[Figure: scatter of data in the X1-X2 plane, with new axes rotated to align with the directions of greatest variability]

This is accomplished by rotating the axes.


FLASHBACK:

“BINARY IMAGES” LECTURE!


Geometric Properties of Binary Images

• Orientation: Difficult to define!

• Axis of least second moment

• For mass: Axis of minimum inertia

[Figure: binary image b(x, y); each point (x, y) lies at perpendicular distance r from a candidate axis at angle θ]

Minimize:   E = ∫∫ r² b(x, y) dx dy
Which equation of line to use?
[Figure: a line parameterized by angle θ and distance ρ from the origin, with a point (x, y) at perpendicular distance r from it]

y = mx + b ?   Problem: the slope satisfies 0 ≤ m ≤ ∞, so m can be infinite for vertical lines.

We use:   x sinθ − y cosθ + ρ = 0,   where θ and ρ are finite.
Minimizing Second Moment

Find ρ and θ that minimize E for a given b(x, y).

We can show that:   r = x sinθ − y cosθ + ρ

So,   E = ∫∫ (x sinθ − y cosθ + ρ)² b(x, y) dx dy

Using dE/dρ = 0 we get:   A (x̄ sinθ − ȳ cosθ + ρ) = 0,
where A = ∫∫ b(x, y) dx dy is the area of the region.

Note: the axis passes through the center (x̄, ȳ).

So, change coordinates:   x' = x − x̄,   y' = y − ȳ
Minimizing Second Moment

We get:   E = a sin²θ − b sinθ cosθ + c cos²θ

where,

  a = ∫∫ (x')² b(x, y) dx' dy'

  b = 2 ∫∫ (x' y') b(x, y) dx' dy'

  c = ∫∫ (y')² b(x, y) dx' dy'

These are the second moments w.r.t. (x̄, ȳ).

We are not done yet!!


Minimizing Second Moment

E = a sin²θ − b sinθ cosθ + c cos²θ

Using dE/dθ = 0 we get:   tan 2θ = b / (a − c)

  sin 2θ = ± b / √( b² + (a − c)² ),    cos 2θ = ± (a − c) / √( b² + (a − c)² )

The solutions with +ve sign must be used to minimize E. (Why?)

  E_min / E_max  →  roundedness
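As a quick numerical check of this derivation (a sketch of my own, not part of the lecture), the following NumPy snippet computes the centroid and second moments a, b, c of a synthetic binary image and recovers the axis orientation from tan 2θ = b / (a − c):

```python
import numpy as np

# Synthetic binary image: an elongated rectangle tilted by 30 degrees.
H, W = 200, 200
ys, xs = np.mgrid[0:H, 0:W].astype(float)
theta_true = np.deg2rad(30.0)
u = (xs - 100) * np.cos(theta_true) + (ys - 100) * np.sin(theta_true)
v = -(xs - 100) * np.sin(theta_true) + (ys - 100) * np.cos(theta_true)
b_img = ((np.abs(u) < 60) & (np.abs(v) < 15)).astype(float)

# Centroid (x_bar, y_bar) of the region; A is its area.
A = b_img.sum()
x_bar = (xs * b_img).sum() / A
y_bar = (ys * b_img).sum() / A

# Second moments about the centroid.
xp, yp = xs - x_bar, ys - y_bar
a = (xp**2 * b_img).sum()
b = 2.0 * (xp * yp * b_img).sum()
c = (yp**2 * b_img).sum()

# Axis of least second moment: tan(2*theta) = b / (a - c), positive-sign solution.
theta = 0.5 * np.arctan2(b, a - c)
print("estimated orientation (deg):", np.rad2deg(theta))   # ~30
```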
END of FLASHBACK!
Algebraic Interpretation

• Given m points in an n-dimensional space, for large n, how does
  one project onto a low-dimensional space while preserving broad
  trends in the data and allowing the data to be visualized?
Algebraic Interpretation – 1D

• Given m points in an n-dimensional space, for large n, how does
  one project onto a 1-dimensional space?

• Choose a line that fits the data so the points are spread out well
  along the line.
Algebraic Interpretation – 1D

• Formally, minimize sum of squares of distances to the line.

• Why sum of squares? Because it allows fast minimization, assuming the


line passes through 0
Algebraic Interpretation – 1D

• Minimizing sum of squares of distances to the line is the


same as maximizing the sum of squares of the
projections on that line, thanks to Pythagoras.
Algebraic Interpretation – 1D

• How is the sum of squares of projection lengths


expressed in algebraic terms?

Think of the m points as the rows of a matrix B (m × n), and let x be a
unit vector along the line. The projection of point i onto the line has
length (Bx)ᵢ, so the sum of squared projection lengths Pt1² + ... + Ptm² is

    (Bx)ᵀ(Bx) = xᵀBᵀBx
Algebraic Interpretation – 1D

• How is the sum of squares of projection lengths


expressed in algebraic terms?

max (xᵀBᵀBx),   subject to xᵀx = 1


Algebraic Interpretation – 1D

• Rewriting this (using xᵀx = 1):
      xᵀBᵀBx = e  =  e xᵀx  =  xᵀ(ex)
  ⇔  xᵀ(BᵀBx − ex) = 0

• Show that the maximum value of xᵀBᵀBx is obtained for x satisfying
  BᵀBx = ex

• So, find the largest e and associated x such that the matrix BᵀB, when
  applied to x, yields a new vector in the same direction as x, only
  scaled by a factor e.
Algebraic Interpretation – 1D

• In general, (BᵀB)x points in some other direction than x.

• x is an eigenvector and e an eigenvalue if

      (BᵀB)x = ex

[Figure: a vector x and the vector (BᵀB)x pointing in a different direction]
Algebraic Interpretation – 1D
• How many eigenvectors are there?
• For Real Symmetric Matrices

– except in degenerate cases when eigenvalues repeat, there are n eigenvectors


x1…xn are the eigenvectors
e1…en are the eigenvalues

– all eigenvectors are mutually orthogonal and therefore form a new basis
• Eigenvectors for distinct eigenvalues are mutually orthogonal
• Eigenvectors corresponding to the same eigenvalue have the property that any linear combination is also
an eigenvector with the same eigenvalue; one can then find as many orthogonal eigenvectors as the
number of repeats of the eigenvalue.
Algebraic Interpretation – 1D

• For matrices of the form BᵀB

  – All eigenvalues are non-negative (show this)
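A small NumPy sketch (my own illustration of the claims above, not from the slides): the eigenvalues of BᵀB are non-negative, and the unit vector maximizing xᵀBᵀBx is the eigenvector with the largest eigenvalue, which no random unit vector beats:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(200, 5))          # m = 200 points in n = 5 dimensions
B -= B.mean(axis=0)                    # center so the line passes through 0

# Eigen-decomposition of the symmetric matrix B^T B (eigenvalues in ascending order).
evals, evecs = np.linalg.eigh(B.T @ B)
print("eigenvalues (all non-negative):", evals)

x_best = evecs[:, -1]                  # eigenvector for the largest eigenvalue
best = x_best @ B.T @ B @ x_best       # equals the largest eigenvalue

# Compare against many random unit vectors: none should beat x_best.
xs = rng.normal(size=(10000, 5))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
random_scores = np.einsum('ij,jk,ik->i', xs, B.T @ B, xs)
print("max over random unit vectors:", random_scores.max(), "<=", best)
```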


PCA: General

From k original variables: x1,x2,...,xk:


Produce k new variables: y1,y2,...,yk:

y1 = a11x1 + a12x2 + ... + a1kxk


y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
PCA: General
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk

such that:

yk's are uncorrelated (orthogonal)


y1 explains as much as possible of original variance in data set
y2 explains as much as possible of remaining variance
etc.
[Figure: scatter of data with the 1st Principal Component, y1, along the direction of greatest variance, and the 2nd Principal Component, y2, orthogonal to it]
PCA Scores

[Figure: the same scatter in (xi1, xi2) coordinates; each point i has scores yi,1 and yi,2, its coordinates along the two principal components]
PCA Eigenvalues

[Figure: the same scatter; the eigenvalues λ1 and λ2 give the variance along the 1st and 2nd principal components]
PCA: Another Explanation
From k original variables: x1,x2,...,xk:
Produce k new variables: y1,y2,...,yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk

The yk's are the Principal Components.

such that:

yk's are uncorrelated (orthogonal)


y1 explains as much as possible of original variance in data set
y2 explains as much as possible of remaining variance
etc.
Principal Components Analysis on:

• Covariance Matrix:
– Variables must be in same units
– Emphasizes variables with most variance
– Mean eigenvalue ≠1.0

• Correlation Matrix:
– Variables are standardized (mean 0.0, SD 1.0)
– Variables can be in different units
– All variables have same impact on analysis
– Mean eigenvalue = 1.0
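To make the covariance-vs-correlation distinction concrete, here is a short NumPy sketch of my own (not from the slides). The correlation-matrix version is just the covariance-matrix version applied to standardized variables, and its eigenvalues average to 1.0:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])   # very different scales/units

# PCA on the covariance matrix: variables with the most variance dominate.
cov = np.cov(X, rowvar=False)
cov_evals = np.linalg.eigvalsh(cov)
print("covariance eigenvalues:", cov_evals, " mean:", cov_evals.mean())

# PCA on the correlation matrix: standardize first (mean 0, SD 1),
# so every variable has the same impact and the mean eigenvalue is 1.0.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.cov(Z, rowvar=False)          # equals the correlation matrix of X
corr_evals = np.linalg.eigvalsh(corr)
print("correlation eigenvalues:", corr_evals, " mean:", corr_evals.mean())
```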
PCA: General
{a11,a12,...,a1k} is the 1st eigenvector of the correlation/covariance
matrix, and the coefficients of the first principal component

{a21,a22,...,a2k} is the 2nd eigenvector of the correlation/covariance
matrix, and the coefficients of the 2nd principal component

...

{ak1,ak2,...,akk} is the kth eigenvector of the correlation/covariance
matrix, and the coefficients of the kth principal component
PCA Summary until now

• Rotates multivariate dataset into a new


configuration which is easier to interpret

• Purposes
– simplify data
– look at relationships between variables
– look at patterns of units
PCA: Yet Another Explanation
Classification in Subspace

• Classification can be expensive
  – Must either search (e.g., nearest neighbors) or store large probability density functions.

• Suppose the data points are arranged as above
  – Idea: fit a line; the classifier measures distance to the line.

• Convert x into (v1, v2) coordinates:

  What does the v2 coordinate measure?
    - distance to the line
    - use it for classification: near 0 for the orange points
  What does the v1 coordinate measure?
    - position along the line
    - use it to specify which orange point it is
Dimensionality Reduction

• Dimensionality reduction
– We can represent the orange points with only their v1 coordinates
• since v2 coordinates are all essentially 0
– This makes it much cheaper to store and compare points
– A bigger deal for higher dimensional problems
Linear Subspaces
Consider the variation along a direction v among all of the orange points:

    var(v) ∝ Σᵢ ((xᵢ − x̄) · v)²  =  vᵀ A v,   where A = Σᵢ (xᵢ − x̄)(xᵢ − x̄)ᵀ

What unit vector v minimizes var?

What unit vector v maximizes var?

Solution:  v1 is the eigenvector of A with the largest eigenvalue
           v2 is the eigenvector of A with the smallest eigenvalue
Higher Dimensions
• Suppose each data point is N-dimensional
– Same procedure applies:

– The eigenvectors of A define a new coordinate system


• eigenvector with largest eigenvalue captures the most variation among
training vectors x
• eigenvector with smallest eigenvalue has least variation
– We can compress the data by only using the top few eigenvectors
• corresponds to choosing a “linear subspace”
– represent points on a line, plane, or “hyper-plane”
• these eigenvectors are known as the principal components
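A sketch of this compression idea in NumPy (my own illustration under the assumptions above): project N-dimensional points onto the top-k eigenvectors of the covariance matrix, store only k coefficients per point, and reconstruct approximately:

```python
import numpy as np

rng = np.random.default_rng(2)
# 500 points in N = 10 dimensions that actually lie near a 2-D subspace, plus noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

mean = X.mean(axis=0)
A = np.cov(X - mean, rowvar=False)            # covariance ("scatter") matrix
evals, evecs = np.linalg.eigh(A)              # eigenvalues in ascending order
V = evecs[:, ::-1][:, :2]                     # top k = 2 eigenvectors = principal components

coeffs = (X - mean) @ V                       # each point stored as just 2 numbers
X_approx = coeffs @ V.T + mean                # reconstruction from the linear subspace
print("mean reconstruction error:", np.abs(X - X_approx).mean())
```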
End of Yet Another Explanation
A 2D Numerical Example
PCA Example –STEP 1

• Subtract the mean from each of the data dimensions: all the x values
  have x̄ (the mean of the x values) subtracted from them, and all the
  y values have ȳ subtracted from them. This produces a data set whose
  mean is zero.

  Subtracting the mean makes the variance and covariance calculations
  easier by simplifying their equations. The variance and covariance
  values are not affected by the mean value.
PCA Example –STEP 1
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

  DATA:             ZERO MEAN DATA:
    x      y           x       y
    2.5    2.4         .69     .49
    0.5    0.7       -1.31   -1.21
    2.2    2.9         .39     .99
    1.9    2.2         .09     .29
    3.1    3.0        1.29    1.09
    2.3    2.7         .49     .79
    2.0    1.6         .19    -.31
    1.0    1.1        -.81    -.81
    1.5    1.6        -.31    -.31
    1.1    0.9        -.71   -1.01
PCA Example – STEP 1
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

[Figure: plot of the original and mean-adjusted data]
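A minimal NumPy sketch of STEP 1 (my own code; it reproduces the zero-mean table above):

```python
import numpy as np

data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

mean = data.mean(axis=0)          # x_bar = 1.81, y_bar = 1.91
zero_mean = data - mean           # matches the ZERO MEAN DATA column above
print(zero_mean)
```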
PCA Example –STEP 2

• Calculate the covariance matrix


  cov = [ .616555556   .615444444
          .615444444   .716555556 ]

• Since the off-diagonal elements of this covariance matrix are
  positive, we should expect that the x and y variables increase
  together.
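The covariance matrix above can be checked with NumPy (my own sketch; note that np.cov divides by n − 1, which matches the numbers shown):

```python
import numpy as np

zero_mean = np.array([
    [ .69,  .49], [-1.31, -1.21], [ .39,  .99], [ .09,  .29], [1.29, 1.09],
    [ .49,  .79], [ .19, -.31], [-.81, -.81], [-.31, -.31], [-.71, -1.01],
])

cov = np.cov(zero_mean, rowvar=False)   # sample covariance, divides by n - 1 = 9
print(cov)
# approximately:
# [[0.61655556 0.61544444]
#  [0.61544444 0.71655556]]
```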
PCA Example –STEP 3

• Calculate the eigenvectors and eigenvalues of the covariance matrix:

  eigenvalues  =   .0490833989
                  1.28402771

  eigenvectors = [ -.735178656   -.677873399
                    .677873399   -.735178656 ]
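These values can be reproduced with np.linalg.eig (my own sketch; eigenvectors are only defined up to sign and ordering, so the output may differ from the slide by a sign or a column swap):

```python
import numpy as np

cov = np.array([[.616555556, .615444444],
                [.615444444, .716555556]])

eigenvalues, eigenvectors = np.linalg.eig(cov)
print(eigenvalues)    # approx [0.0490834, 1.28402771]
print(eigenvectors)   # columns are the eigenvectors (up to sign)
```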
PCA Example –STEP 3
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
[Figure: the mean-adjusted data with the two eigenvectors overlaid as dotted lines]

• The eigenvectors are plotted as diagonal dotted lines on the plot.
• Note that they are perpendicular to each other.
• Note that one of the eigenvectors goes through the middle of the
  points, like drawing a line of best fit.
• The second eigenvector gives us the other, less important, pattern
  in the data: all the points follow the main line but are off to the
  side of it by some amount.
PCA Example –STEP 4

• Reduce dimensionality and form a feature vector.

  The eigenvector with the highest eigenvalue is the principal
  component of the data set.

  In our example, the eigenvector with the largest eigenvalue was the
  one that pointed down the middle of the data.

  Once eigenvectors are found from the covariance matrix, the next
  step is to order them by eigenvalue, highest to lowest. This gives
  you the components in order of significance.
PCA Example –STEP 4

Now, if you like, you can decide to ignore the components of lesser
significance.

You do lose some information, but if the eigenvalues are small, you
don't lose much:

• n dimensions in your data
• calculate n eigenvectors and eigenvalues
• choose only the first p eigenvectors
• the final data set has only p dimensions
PCA Example –STEP 4

• Feature Vector

  FeatureVector = (eig1 eig2 eig3 ... eign)

  We can either form a feature vector with both of the eigenvectors:

    [ -.677873399   -.735178656
      -.735178656    .677873399 ]

  or we can choose to leave out the smaller, less significant
  component and keep only a single column:

    [ -.677873399
      -.735178656 ]
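A small NumPy sketch of STEP 4 (my own code): sort the eigenvectors by decreasing eigenvalue and keep the top p of them as the feature vector:

```python
import numpy as np

cov = np.array([[.616555556, .615444444],
                [.615444444, .716555556]])
eigenvalues, eigenvectors = np.linalg.eig(cov)

order = np.argsort(eigenvalues)[::-1]            # indices sorted by decreasing eigenvalue
feature_vector = eigenvectors[:, order]          # keep both components ...
feature_vector_1d = eigenvectors[:, order[:1]]   # ... or only the most significant one
print(feature_vector_1d)
```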
PCA Example –STEP 5

• Deriving the new data:

  FinalData = RowFeatureVector × RowZeroMeanData

  RowFeatureVector is the matrix with the eigenvectors in the columns
  transposed so that the eigenvectors are now in the rows, with the
  most significant eigenvector at the top.

  RowZeroMeanData is the mean-adjusted data transposed, i.e. the data
  items are in the columns, with each row holding a separate
  dimension.
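In NumPy, this step looks as follows (my own sketch, following the tutorial's row/column conventions; the output matches the FinalData table on the next slide):

```python
import numpy as np

# Mean-adjusted data from STEP 1 (one row per data item).
zero_mean = np.array([
    [ .69,  .49], [-1.31, -1.21], [ .39,  .99], [ .09,  .29], [1.29, 1.09],
    [ .49,  .79], [ .19, -.31], [-.81, -.81], [-.31, -.31], [-.71, -1.01],
])
# Feature vector from STEP 4: eigenvectors as columns, most significant first.
feature_vector = np.array([[-.677873399, -.735178656],
                           [-.735178656,  .677873399]])

row_feature_vector = feature_vector.T      # eigenvectors as rows
row_zero_mean_data = zero_mean.T           # one data item per column

final_data = row_feature_vector @ row_zero_mean_data   # 2 x 10
print(final_data.T)   # one transformed data item per row, matching the FinalData table
```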
PCA Example –STEP 5
FinalData transpose:
dimensions along columns
x y
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
PCA Example – STEP 5
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

[Figure: the transformed data plotted in the new (principal component) coordinate system]
Reconstruction of original Data
• If we reduced the dimensionality, obviously,
when reconstructing the data we would lose
those dimensions we chose to discard. In our
example let us assume that we considered only
the x dimension…
Reconstruction of original Data
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf

x
-.827970186
1.77758033
-.992197494
-.274210416
-1.67580142
-.912949103
.0991094375
1.14457216
.438046137
1.22382056
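A sketch of the reconstruction in NumPy (my own code, under the assumption that the inverse transform is applied with the transpose of the feature vector, which is valid because the eigenvector matrix is orthonormal, and then the mean is added back):

```python
import numpy as np

mean = np.array([1.81, 1.91])                                  # original means of x and y
row_feature_vector = np.array([[-.677873399, -.735178656]])    # only the 1st principal component kept
final_data = np.array([[-.827970186, 1.77758033, -.992197494, -.274210416, -1.67580142,
                        -.912949103, .0991094375, 1.14457216, .438046137, 1.22382056]])

# Inverse transform: the eigenvector matrix is orthonormal, so its inverse is its transpose.
row_zero_mean_approx = row_feature_vector.T @ final_data   # 2 x 10, lies exactly on the 1st PC
reconstructed = row_zero_mean_approx.T + mean              # add the mean back; one (x, y) row per item
print(reconstructed)
```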
Next Class

• Principal Components Analysis (continued)

• Reading: Notes, online reading material