PCA / LDA / ICA
• Representation methods
– Representation in the frequency domain: linear transform
• DFT, DCT, DST, DWT, …
• Used as compression methods
– Subspace derivation
• PCA, ICA, LDA
• Linear transform derived from training data
– Feature extraction methods
• Edge(Line) Detection
• Feature map obtained by filtering
• Gabor transform
• Active contours (Snakes)
• …
What is a subspace? (1/2)
• Find a basis in a low-dimensional sub-space:
− Approximate vectors by projecting them onto a low-dimensional sub-space:
(1) Original space representation:
x = a_1 v_1 + a_2 v_2 + … + a_N v_N, where (v_1, …, v_N) is a basis of the original N-dimensional space
• Motive
– Find bases which have high variance in the data
– Encode the data with a small number of bases with low MSE
Derivation of PCs
Assume that E[x] = 0 and project x onto a unit vector q:
a = x^T q = q^T x,    ||q|| = (q^T q)^{1/2} = 1
The variance of the projection is E[a^2] = q^T R q, with R = E[x x^T]; maximizing it under the unit-norm constraint gives the eigenvalue problem
R q = λ q
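As a minimal numerical sketch of this derivation (the synthetic data and NumPy usage below are illustrative assumptions, not from the slides), we can center the data, estimate R = E[x x^T], and check that the leading eigenvector q satisfies R q = λ q and that the projection a = q^T x has variance λ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 500 samples of a 5-dimensional random vector.
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))

# Assume E[x] = 0: remove the sample mean.
X = X - X.mean(axis=0)

# Correlation matrix R = E[x x^T], estimated from the samples.
R = (X.T @ X) / X.shape[0]

# Eigenvalue problem R q = lambda q.
eigvals, eigvecs = np.linalg.eigh(R)       # ascending order for symmetric R
order = np.argsort(eigvals)[::-1]          # re-sort in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

q = eigvecs[:, 0]                          # first principal direction, ||q|| = 1
a = X @ q                                  # projections a = q^T x

# The variance of the projection equals the largest eigenvalue.
print(np.var(a), eigvals[0])
```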
Dimensionality Reduction (1/2)
We can ignore the components of lesser significance.
[Scree plot: variance (%) explained by each of PC1–PC10]
You do lose some information, but if the eigenvalues are small, you
don’t lose much
– n dimensions in original data
– calculate n eigenvectors and eigenvalues
– choose only the first p eigenvectors, based on their eigenvalues
– final data set has only p dimensions
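A sketch of that selection step under an illustrative setup (synthetic data and an assumed 95% variance threshold): sort the eigenvalues, keep the first p eigenvectors, and project the data onto them.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))
X = X - X.mean(axis=0)

R = (X.T @ X) / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of variance carried by each PC (the bars of a scree plot).
variance_ratio = eigvals / eigvals.sum()

# Keep the first p eigenvectors that together explain, say, 95% of the variance.
p = int(np.searchsorted(np.cumsum(variance_ratio), 0.95)) + 1
U_p = eigvecs[:, :p]          # n x p projection matrix

# Final data set has only p dimensions.
Y = X @ U_p
print(p, Y.shape)
```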
Dimensionality Reduction (2/2)
[Plot: retained variance vs. dimensionality]
Reconstruction from PCs
[Figure: images reconstructed from q = 16, 32, 64, 100, … principal components, alongside the original image]
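The reconstruction itself is the projection mapped back into the original space, x̂ = U_q U_q^T x. A hedged sketch, with random data standing in for the face images, showing that the reconstruction error shrinks as q grows:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 100)) @ rng.normal(size=(100, 100))
X = X - X.mean(axis=0)

R = (X.T @ X) / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(R)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

for q in (16, 32, 64, 100):
    U_q = eigvecs[:, :q]
    X_hat = X @ U_q @ U_q.T            # reconstruct from q components
    mse = np.mean((X - X_hat) ** 2)    # error decreases as q increases
    print(f"q={q:3d}  MSE={mse:.4f}")
```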
LINEAR DISCRIMINANT ANALYSIS (LDA)
Limitations of PCA
Are the maximal variance dimensions the
relevant dimensions for preservation?
Linear Discriminant Analysis (1/6)
• What is the goal of LDA?
− Perform dimensionality reduction “while preserving as much of the
class discriminatory information as possible”.
− Seeks to find directions along which the classes are best separated.
− Takes into consideration not only the within-class scatter but also the
between-class scatter.
− In face recognition, for example, this makes LDA more capable of distinguishing
image variation due to identity from variation due to other sources
such as illumination and expression.
Linear Discriminant Analysis (2/6)
Scatter matrices (C classes, with n_i samples in class i):
S_w = Σ_{i=1}^{C} Σ_{x ∈ class i} (x − μ_i)(x − μ_i)^T,    S_b = Σ_{i=1}^{C} n_i (μ_i − μ)(μ_i − μ)^T
Projection with projection matrix U:  y = U^T x
Fisher criterion:
max_U |S̃_b| / |S̃_w| = max_U |U^T S_b U| / |U^T S_w U|    (products of eigenvalues!)
which leads to the eigenvalue problem  S_w^{-1} S_b = U Λ U^T
S̃_b, S̃_w : scatter matrices of the projected data y
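A minimal sketch of this procedure on made-up two-class data (the array names and class means are illustrative): build S_w and S_b as above, then take the leading eigenvectors of S_w^{-1} S_b as the discriminant directions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two illustrative classes in 4 dimensions.
X1 = rng.normal(loc=0.0, size=(100, 4))
X2 = rng.normal(loc=2.0, size=(100, 4))
X, labels = np.vstack([X1, X2]), np.array([0] * 100 + [1] * 100)

mu = X.mean(axis=0)
S_w = np.zeros((4, 4))
S_b = np.zeros((4, 4))
for c in np.unique(labels):
    Xc = X[labels == c]
    mu_c = Xc.mean(axis=0)
    S_w += (Xc - mu_c).T @ (Xc - mu_c)       # within-class scatter
    d = (mu_c - mu).reshape(-1, 1)
    S_b += Xc.shape[0] * (d @ d.T)           # between-class scatter

# Eigenvectors of S_w^{-1} S_b give the discriminant directions.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(eigvals.real)[::-1]
U = eigvecs[:, order].real

# With C = 2 classes, at most C - 1 = 1 useful direction.
y = X @ U[:, :1]
print(eigvals.real[order][:2])   # only the first eigenvalue is (numerically) non-zero
```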
Linear Discriminant Analysis (3/6)
• Does S_w^{-1} always exist?
− If S_w is non-singular, we can obtain a conventional eigenvalue
problem by writing:
S_w^{-1} S_b = U Λ U^T
− cf. Since S_b has at most rank C−1, the max number of eigenvectors
with non-zero eigenvalues is C−1 (i.e., the max dimensionality of the sub-space is C−1)
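A quick numerical check of that rank remark, on illustrative data with C = 3 classes: S_b is a sum of C rank-one terms whose class-mean deviations are linearly dependent (they sum to zero when weighted by class sizes), so rank(S_b) ≤ C − 1.

```python
import numpy as np

rng = np.random.default_rng(6)
C, n_per_class, dim = 3, 50, 10

# Illustrative data: each class gets its own random mean.
X = rng.normal(size=(C * n_per_class, dim)) \
    + np.repeat(rng.normal(size=(C, dim)), n_per_class, axis=0)
labels = np.repeat(np.arange(C), n_per_class)

mu = X.mean(axis=0)
S_b = np.zeros((dim, dim))
for c in range(C):
    d = (X[labels == c].mean(axis=0) - mu).reshape(-1, 1)
    S_b += n_per_class * (d @ d.T)

print(np.linalg.matrix_rank(S_b))   # at most C - 1 = 2
```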
Linear Discriminant Analysis (4/6)
• Does S_w^{-1} always exist? – cont.
− To alleviate this problem, we can use PCA first:
1) PCA is first applied to the data set to reduce its dimensionality.
2) LDA is then applied in the PCA-reduced space to find the most discriminative directions.
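One way this two-stage idea can be sketched (scikit-learn's PCA and LinearDiscriminantAnalysis are used here as convenient stand-ins, and the data are synthetic): PCA shrinks the dimensionality so that S_w is no longer singular, then LDA is applied in the reduced space.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)

# Illustrative "small sample size" setting: 60 samples, 200 dimensions, 3 classes,
# where S_w is singular and LDA alone is ill-posed.
X = rng.normal(size=(60, 200))
y = np.repeat([0, 1, 2], 20)
X += y[:, None] * 0.5          # inject some class-dependent structure

# 1) PCA reduces dimensionality (to fewer than the number of samples).
X_pca = PCA(n_components=30).fit_transform(X)

# 2) LDA in the reduced space; at most C - 1 = 2 dimensions remain.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_pca, y)
print(X_lda.shape)   # (60, 2)
```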
• ICA
– Focus on independent and non-Gaussian components
– Higher-order statistics
– Non-orthogonal transformation
Independent Component Analysis (1/5)
• Concept of ICA
– A given signal (x) is generated by linear mixing (A) of independent
components (s)
– ICA is a statistical analysis method to estimate those independent
components (z) and the unmixing matrix (W)
[Diagram: independent sources s_1, …, s_M are mixed by A_ij into observations x_1, …, x_M; ICA estimates W_ij to recover components z_1, …, z_M]
z = W x = W A s
We do not know both unknowns (A and s) ➔ some optimization function is required!
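A minimal sketch of this mixing/unmixing setup (the sources, the mixing matrix, and the use of scikit-learn's FastICA are all illustrative assumptions): generate independent non-Gaussian sources s, mix them with A, and estimate W and z from x alone. Since ICA recovers components only up to order, sign, and scale, W A comes out as approximately a scaled permutation matrix rather than the identity.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)
n_samples = 2000
t = np.linspace(0, 8, n_samples)

# Independent, non-Gaussian sources s (illustrative choices).
s = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t)), rng.laplace(size=n_samples)]

A = rng.normal(size=(3, 3))   # unknown mixing matrix
x = s @ A.T                   # observed signals: x = A s

# Estimate the unmixing matrix W and the components z = W x.
ica = FastICA(n_components=3, random_state=0)
z = ica.fit_transform(x)      # recovered components (up to order/sign/scale)
W = ica.components_           # estimated unmixing matrix

# W A is approximately a scaled permutation matrix, not exactly the identity.
print(np.round(W @ A, 2))
```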
Independent Component Analysis (2/5)
W X = Z,   X = A Z   ➔   W^{-1} = A
(the estimated unmixing matrix is the inverse of the mixing matrix)
Independent Component Analysis (3/5)
• What is an independent component?
– If one variable cannot be estimated from the other
variables, it is independent.
– By the Central Limit Theorem, a sum of two
independent random variables is more Gaussian
than the original variables ➔ the distributions of the
independent components are non-Gaussian
– To estimate the ICs, z should have a non-Gaussian
distribution, i.e., we should maximize
non-Gaussianity.
Independent Component Analysis (4/5)
• What is non-Gaussianity?
– Super-Gaussian (positive kurtosis)
– Sub-Gaussian (negative kurtosis)
– Low entropy (a Gaussian has the maximum entropy among variables with the same variance)
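One classical measure of non-Gaussianity is excess kurtosis: positive for super-Gaussian signals, negative for sub-Gaussian signals, and zero for a Gaussian. A small sketch (the distributions below are illustrative choices, not from the slides):

```python
import numpy as np

def excess_kurtosis(v):
    """E[v^4] / E[v^2]^2 - 3, which is zero for a Gaussian."""
    v = v - v.mean()
    return np.mean(v ** 4) / np.mean(v ** 2) ** 2 - 3.0

rng = np.random.default_rng(7)
n = 100_000
print(excess_kurtosis(rng.laplace(size=n)))           # super-Gaussian: > 0 (about +3)
print(excess_kurtosis(rng.uniform(-1, 1, size=n)))    # sub-Gaussian:  < 0 (about -1.2)
print(excess_kurtosis(rng.normal(size=n)))            # Gaussian: close to 0
```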