
Applications of random matrix theory to principal component analysis (PCA)

Jun Yin
IAS, UW-Madison

IAS, April 2014
Joint work with A. Knowles and H.-T. Yau.

Basic picture:
Let H be a Wigner (symmetric) random matrix:
    H = (H_ij)_{1 ≤ i,j ≤ N},   H = H*,   H_ij = N^{-1/2} h_ij,
a random matrix whose upper-triangular entries h_ij are independent random variables with mean 0 and variance 1, so that
    E H_ij = 0,   E |H_ij|^2 = 1/N,   1 ≤ i, j ≤ N.
Let A = A_{N×N} be a (full rank) deterministic symmetric matrix. Most of the eigenvalues of A are O(1).
What can we say about H + A?

[Knowles and Y., 2011-12]: the rank A = O(1) case.


, where d = 10, d = 1 for 2 6 k 6
Example: A = N
d
v
v
1
k
k
k
k=1
k
N/2 and other dk = 2.
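
A minimal numerical sketch of this example (illustrative assumptions: N = 1000 and v_k = e_k, so A is diagonal; the spike d_1 = 10 produces an outlier of H + A):

import numpy as np

N = 1000
rng = np.random.default_rng(0)

# Wigner matrix: symmetric, centered entries of variance 1/N
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)

# A = sum_k d_k v_k v_k* with d_1 = 10, d_k = 1 for k <= N/2, d_k = 2 otherwise;
# here v_k = e_k, i.e. A is diagonal (illustrative choice)
d = np.full(N, 2.0)
d[: N // 2] = 1.0
d[0] = 10.0
A = np.diag(d)

evals = np.linalg.eigvalsh(H + A)
print(evals[:3], evals[-3:])   # the top eigenvalue sits away from the rest (an outlier)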

Some basic questions:

Limiting spectral distribution of H + A: Voiculescu 1986, Speicher 1994 (free probability).

Local density / rigidity: let γ_k be the classical location of the k-th eigenvalue, defined by
    ∫_{-∞}^{γ_k} ρ_{H+A}(x) dx = k/N.
Then
    |λ_k - γ_k| ≤ N^ε × (natural fluctuation scale of λ_k)
holds with probability 1 - N^{-D} for any small ε > 0 and any D > 0.

Delocalization of eigenvectors (in any direction): let u_k (1 ≤ k ≤ N) be the ℓ²-normalized eigenvectors of H + A and let w be a deterministic unit vector. Then
    max_k ⟨u_k, w⟩² ≤ N^{-1+ε}
holds with probability 1 - N^{-D} for any small ε > 0 and any D > 0.
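
A quick numerical check of the delocalization bound in a deformed Wigner setup (a sketch with illustrative parameters; A is a randomly generated diagonal matrix with O(1) entries standing in for a deterministic one, and w is a single fixed unit vector):

import numpy as np

N = 1000
rng = np.random.default_rng(1)
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)
A = np.diag(rng.uniform(0.0, 2.0, N))     # random diagonal A with O(1) entries (stands in for a deterministic A)

_, U = np.linalg.eigh(H + A)              # columns are the l2-normalized eigenvectors u_k
w = np.ones(N) / np.sqrt(N)               # one fixed deterministic unit vector
overlaps = (U.T @ w) ** 2
print(overlaps.max(), "compare with 1/N =", 1.0 / N)   # max_k <u_k, w>^2 is O(N^{-1}) up to logs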



Some basic questions:

Behavior of the outliers.

Behavior of the eigenvectors of the outliers.

Joint distribution of the largest k non-outlier eigenvalues.

k-point correlation functions of H + A in the bulk.

A Wigner matrix has two important properties:

1. Independent entries.
2. Isotropic rows/columns (without the diagonal entry): let h_k = {H_kl}_{l≠k} and let w ∈ R^{N-1} be a deterministic vector; then
    ⟨h_k, w⟩ ≈ N(0, N^{-1} ||w||²_2)
(a numerical sketch follows below).

Why is property 2 important? Think about a growing matrix.

Clearly H + A does not have the second property. Two exceptional cases:
  A is diagonal;
  H is GOE.
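
A quick check of property 2 (a sketch; the row entries are taken Gaussian for convenience, but any mean-zero, variance-1/N entries give the same picture):

import numpy as np

N, trials = 2000, 5000
rng = np.random.default_rng(2)
w = rng.standard_normal(N - 1)
w /= np.linalg.norm(w)                     # a fixed deterministic direction, ||w||_2 = 1

dots = np.empty(trials)
for t in range(trials):
    h_k = rng.standard_normal(N - 1) / np.sqrt(N)   # one row h_k, entries of variance 1/N
    dots[t] = h_k @ w

print(dots.mean(), dots.var() * N)         # mean ~ 0, variance ~ ||w||^2 / N  (so var*N ~ 1)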

Sample covariance matrix

Let X X* be a sample covariance matrix:
    X = (X_ij)_{1 ≤ i ≤ M', 1 ≤ j ≤ N},   X_ij = (M' N)^{-1/4} x_ij,
a random matrix whose entries x_ij are independent random variables with mean 0 and variance 1.
Let T = T_{M×M'} (M' ≥ M) be a deterministic matrix such that T T* - I has full rank and most of the eigenvalues of T T* are O(1).
What can we say about
    T X X* T* ?
Furthermore, let e be the unit vector N^{-1/2}(1, 1, ..., 1, 1). How about
    T X (1 - e e*) X* T* ?

Real-life question:

Let y = (y(1), y(2), ..., y(M)) be some random vector with unknown distribution; for example, the price changes of M stocks.
How can one obtain the covariance matrix
    (Cov(y(i), y(j)))_{i,j=1}^{M} ?

Stat 101: measure y independently N times: y_1, y_2, ..., y_N. Then
    Cov(y(i), y(j)) = lim_{N→∞} (1/(N-1)) Σ_{α=1}^{N} (y_α(i) - ȳ(i)) (y_α(j) - ȳ(j)),
where
    ȳ(i) = (1/N) Σ_{α=1}^{N} y_α(i).

Model: we assume that y is a linear mix of some fundamental independent random variables x = (x(1), x(2), ..., x(M')):
    y = T x,   T = T_{M×M'}.
If you measure y N times, then
    Y = (y_1, y_2, ..., y_N) = T X = T (x_1, x_2, ..., x_N).
We know (see the sketch below)
    Cov(y(i), y(j)) = lim_{N→∞} (1/(N-1)) [T X (1 - e e*) X* T*]_{ij}.
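
A small sketch verifying this identity at finite N (illustrative dimensions; with Y = T X and e = N^{-1/2}(1, ..., 1)*, the centered sample covariance of the y-samples equals (1/(N-1)) T X (1 - e e*) X* T* exactly, so both sides converge to Cov(y(i), y(j))):

import numpy as np

M, Mp, N = 4, 6, 500
rng = np.random.default_rng(3)
T = rng.standard_normal((M, Mp))           # some mixing matrix T (M x M')
X = rng.standard_normal((Mp, N))           # N independent samples of x
Y = T @ X                                  # Y = (y_1, ..., y_N)

e = np.ones((N, 1)) / np.sqrt(N)
lhs = T @ X @ (np.eye(N) - e @ e.T) @ X.T @ T.T / (N - 1)

ybar = Y.mean(axis=1, keepdims=True)
rhs = (Y - ybar) @ (Y - ybar).T / (N - 1)  # Stat 101 centered sample covariance
print(np.allclose(lhs, rhs))               # True: the two expressions coincide exactly
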
For example: let v be a fixed vector, let x̃ be a random vector and ξ a random variable, with x̃ and ξ independent, and set
    y = x̃ + ξ v.
Then
    y = (I, v) (x̃, ξ)ᵀ = T x,   with x = (x̃, ξ)ᵀ and T = (I, v).

Model:
    y = T x,   x = (x(1), x(2), ..., x(M')),   T = T_{M×M'},
    Cov(y(i), y(j)) = lim_{N→∞} (1/(N-1)) [T X (1 - e e*) X* T*]_{ij}.   (*)
Without loss of generality we assume E x(i) = 0 (by replacing x(i) with x'(i) = x(i) - E x(i)) and E |x(i)|² = 1 (by rescaling T). With this setting,
    Cov(y(i), y(j)) = (T T*)_{ij}.

Recall the previous example:
    y = x̃ + ξ v,   E |x̃(i)|² = 1,   E |ξ|² = 1,   and here ||v||_2 ≫ 1.
Then
    y = (I, v) (x̃, ξ)ᵀ = T x,   T T* = I + v v*.
As we can see, T T* has a large eigenvalue (1 + ||v||²_2) with eigenvector v, and these are related to the "signal".

Though
    Cov(y(i), y(j)) = (T T*)_{ij} = lim_{N→∞} (1/(N-1)) [T X (1 - e e*) X* T*]_{ij},
in most cases this does not work very well, since N needs to be very large (like N ≫ M).

The basic idea is to estimate the principal components (the large eigenvalues and their eigenvectors) of the matrix
    T T*
with those of the matrix
    T X (1 - e e*) X* T*,
or just
    T X X* T*.
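
A sketch of this idea with one planted signal (all parameters illustrative): the top eigenvalue and eigenvector of T X X* T* track the large eigenvalue and eigenvector of T T*.

import numpy as np

M, N = 200, 400
rng = np.random.default_rng(4)
v = np.zeros(M); v[0] = 1.0                            # signal direction (illustrative)
T = np.eye(M) + 2.0 * np.outer(v, v)                   # T T* = I + 8 v v*: one large eigenvalue
X = rng.standard_normal((M, N)) / (M * N) ** 0.25      # X_ij = (M N)^{-1/4} x_ij

evals, evecs = np.linalg.eigh(T @ X @ X.T @ T.T)
u = evecs[:, -1]                                       # top sample eigenvector
print("top sample eigenvalue:", evals[-1])
print("overlap |<u, v>| with the signal direction:", abs(u @ v))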


PCA model:
    T = T_{M×M'},   M' = M + O(1),
    T T* = Σ_{α ∈ signal} d_α v_α v_α* + Σ_{β ∈ noise} d_β v_β v_β*,   |signal| = O(1).
The noise eigenvalues are comparable:
    c ≤ d_β ≤ C for β ∈ noise,
and log N ≍ log M. The d's and v's could depend on N or M.

Basic picture (noise = 1 and only one signal):

Let d := (N/M)^{1/2} (d_signal - d_noise). An outlier appears when d > 1, and the outlier satisfies
    λ = λ_+ - 2 + d + d^{-1} + error,
where λ_+ denotes the right edge of the noise spectrum.
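
A simulation sketch of this transition (assumption: the standard X X*/N normalization is used instead of the (M'N)^{-1/4} scaling above, so the noise bulk ends near the Marchenko-Pastur edge (1 + √(M/N))²; the qualitative picture is the same): as the planted signal grows, the largest eigenvalue detaches from the bulk edge.

import numpy as np

M, N = 300, 600
rng = np.random.default_rng(5)
mp_edge = (1 + np.sqrt(M / N)) ** 2               # right edge of the noise spectrum

v = np.zeros(M); v[0] = 1.0
for sig in [0.1, 0.3, 0.6, 1.0, 2.0]:
    T = np.eye(M) + sig * np.outer(v, v)          # population spike 1 + (2*sig + sig^2)
    X = rng.standard_normal((M, N))
    lam_max = np.linalg.eigvalsh(T @ X @ X.T @ T.T / N).max()
    print(f"signal={sig:3.1f}  largest eigenvalue={lam_max:6.3f}  noise edge={mp_edge:5.3f}")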


Detection of v_signal:
Let u be the eigenvector of the outlier λ. Then for any fixed normalized w we have
    ⟨w, u⟩² = f_d ⟨w, v_signal⟩² + error.
Distribution of u?
1. The angle satisfies ∠(v_signal, u) = arccos √f_d + error.
2. Delocalization in any direction orthogonal to v_signal: if ⟨w, v_signal⟩ = 0, then |⟨w, u⟩| ≤ M^{-1/2+ε}.
Briefly speaking, u - ⟨u, v_signal⟩ v_signal is random and isotropic.
The two-outlier case: see graph.

Application of delocalization:
Assume we know that v_signal ∈ R^M has only M̃ non-zero components, with M̃ ≪ M and
    v_signal(i) ≍ M̃^{-1/2}   if   v_signal(i) ≠ 0.
Then:
1. if v_signal(i) = 0, the delocalization property shows |u(i)| ≤ M^{-1/2+ε};
2. if v_signal(i) ≠ 0, the parallel property shows |u(i)| ≳ M̃^{-1/2}.
Using this method, we can determine which components of v_signal are non-zero (a sketch follows below).
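
A sketch of this detection scheme (all sizes and the threshold are illustrative): u is the outlier eigenvector of T X X* T* with a sparse planted v_signal, and a simple threshold on |u(i)| between M^{-1/2} and M̃^{-1/2} separates support from non-support coordinates.

import numpy as np

M, N, Mtilde = 400, 800, 10
rng = np.random.default_rng(6)
support = np.arange(Mtilde)
v = np.zeros(M)
v[support] = 1.0 / np.sqrt(Mtilde)                 # sparse signal: Mtilde components of size Mtilde^{-1/2}

T = np.eye(M) + 3.0 * np.outer(v, v)               # one strong signal
X = rng.standard_normal((M, N)) / np.sqrt(N)
_, evecs = np.linalg.eigh(T @ X @ X.T @ T.T)
u = evecs[:, -1]                                   # eigenvector of the outlier

detected = np.where(np.abs(u) > 3.0 / np.sqrt(M))[0]   # heuristic threshold between M^{-1/2} and Mtilde^{-1/2}
print("detected:", detected.tolist(), " true support:", support.tolist())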


Some previous results. Eigenvalues:

    T X X* T*,   T = T_{M×M'},   T T* = Σ_{signal} d_α v_α v_α* + Σ_{noise} d_β v_β v_β*.

Baik and Silverstein (2006): p = q, noise = 1; for fixed d they obtain the limit of the outlier λ.

Bai and Yao (2008): p = q, noise = 1, T of the block form T = diag(A, I) with A an O(1)×O(1) matrix; for fixed d they obtain the CLT of λ.

Bai and Yao (2008): p = q, T a symmetric matrix of the block form T = diag(A, T̃) with A an O(1)×O(1) matrix; for fixed d they obtain the limit of λ.

Nadler (2008): the spiked covariance model, T = (I, e_1).



Eigenvectors:
Recall that for any fixed w we have ⟨w, u⟩² = f_d ⟨w, v_signal⟩² + error.

Paul (2007): p = q, noise = 1, T diagonal and X_ij Gaussian.
Shi (2013): p = q, noise = 1, T diagonal.
Benaych-Georges, Nadakuditi (2010): p = q, noise = 1, T a random symmetric matrix independent of X; either X_ij is Gaussian or T is isotropic.
Benaych-Georges, Nadakuditi (2012): T = (I, v) with random isotropic v.
Results: the limit of ⟨u, v_signal⟩², except for the first one.

Main results

1. Rigidity of eigenvalues (including the outliers), up to an N^ε factor:
    λ_i - γ_i = error.
2. Delocalization of the eigenvectors of non-outliers.
3. Direction of the eigenvectors of outliers: u - ⟨u, v_signal⟩ v_signal is random and isotropic.
4. Some eigenvector information can be detected even if d - 1 = o(1).
5. Tracy-Widom distribution of the largest k non-outliers. (El Karoui (2007): the Gaussian case.)
6. Isotropic law for (H + A) and T X X* T*.
7. Bulk universality with 4-moment matching (for (H + A) and T X X* T*).

Strategy:

Write T = V D (I_M, 0) U*, with V = V_{M×M}, U = U_{M'×M'} and D = D_{M×M} diagonal, so that
    T T* = Σ_{signal} d_α v_α v_α* + Σ_{noise} d_β v_β v_β*.

Define S = S_{M×M'} by replacing the signal singular values by 1, so that
    S S* = Σ_{signal} 1 · v_α v_α* + Σ_{noise} d_β v_β v_β*.
Note: S has no signal (no outlier eigenvalues).

Represent
    G := (T X (1 - e e*) X* T* - z)^{-1}
in terms of
    G_S,   G_S X,   X* G_S X,   etc.,
where G_S := (S X X* S* - z)^{-1}.
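
A minimal sketch of the construction of S (assumptions: T is generated at random with one strong row playing the role of a signal, and the signal singular values are identified by a crude threshold): replacing the signal singular values of T by 1 removes the outlier of T T* while leaving the noise part untouched.

import numpy as np

rng = np.random.default_rng(7)
M, Mp = 50, 52
T = rng.standard_normal((M, Mp)) / np.sqrt(Mp)
T[0] *= 6.0                                        # one strong direction: the "signal"

V, sing, Uh = np.linalg.svd(T, full_matrices=False)    # T = V diag(sing) Uh
is_signal = sing**2 > 10.0 * np.median(sing**2)        # crude signal/noise split (heuristic)
S = V @ np.diag(np.where(is_signal, 1.0, sing)) @ Uh   # signal singular values replaced by 1

print("top eigenvalues of T T*:", np.round(np.linalg.eigvalsh(T @ T.T)[-3:], 2))   # outlier present
print("top eigenvalues of S S*:", np.round(np.linalg.eigvalsh(S @ S.T)[-3:], 2))   # no outlier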

Question:
Let A be a matrix with only one non-zero entry and let X' = X + A. Then how close is
    (X X* - z)^{-1}   to   (X' X'* - z)^{-1} ?


Isotropic law:

Wigner: let H be a Wigner matrix and G = (H - z)^{-1}. Then for any fixed unit vectors w and v we have
    ⟨w, (H - z)^{-1} v⟩ = m_sc(z) ⟨w, v⟩ + O((Nη)^{-1/2} (log N)^C),   η = Im z,
where m_sc(z) = ∫ ρ_sc(x) (x - z)^{-1} dx is the Stieltjes transform of the semicircle law. [Knowles and Y. (2011)] (A numerical sketch follows below.)

PCA: for fixed w and v, what are the behaviors of
    ⟨w, G_S v⟩,   ⟨w, G_S X v⟩,   ⟨w, X* G_S X v⟩,   G_S = (S X X* S* - z)^{-1} ?

Bloemendal, Erdős, Knowles, Yau and Y. (2013): the S = I_{M×M} case.
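
A numerical sketch of the Wigner isotropic law (Gaussian entries for convenience; z is fixed in the upper half-plane, and the principal branch of the square root used for m_sc is valid for this particular z):

import numpy as np

N = 2000
rng = np.random.default_rng(8)
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)                 # Wigner matrix, entry variance 1/N

z = 1.0 + 0.2j
m_sc = (-z + np.sqrt(z * z - 4)) / 2           # Stieltjes transform of the semicircle law at z
G = np.linalg.inv(H - z * np.eye(N))

w = rng.standard_normal(N)
w /= np.linalg.norm(w)                         # a fixed unit test vector
print(w @ G @ w, "  vs  m_sc(z) =", m_sc)      # the two agree up to a small error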


Isotropic law for general S or general A:
    (S X X* S* - z)^{-1},   (H + A - z)^{-1}.

Knowles and Y. (2014):
Let A = U D U* with D = diag(d_1, d_2, ..., d_N), where |d_i| ≤ C. Define
    m_i = (d_i - z - m)^{-1},   m := (1/N) Σ_j m_j.
Then for fixed w and v ∈ R^N,
    ⟨w, (H + A - z)^{-1} v⟩ = ⟨w, (A - z - m)^{-1} v⟩ + error.
Based on this result: rigidity, delocalization, TW law. (Capitaine, Péché 2014: GOE + A.)
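
A numerical sketch of this law for a diagonal A (illustrative parameters; the sign convention below, m_i = (d_i - z - m)^{-1}, is the one that reduces to the semicircle law when A = 0, and the fixed-point iteration for m is assumed to converge, which it does comfortably for Im z of order one):

import numpy as np

N = 2000
rng = np.random.default_rng(9)
W = rng.standard_normal((N, N))
H = (W + W.T) / np.sqrt(2 * N)

d = np.linspace(-1.0, 1.0, N)                    # spectrum of A = diag(d), |d_i| <= C
A = np.diag(d)
z = 0.5 + 0.5j

m = 1j                                           # start in the upper half-plane
for _ in range(200):                             # fixed-point iteration for m(z)
    m = np.mean(1.0 / (d - z - m))

G = np.linalg.inv(H + A - z * np.eye(N))
w = rng.standard_normal(N)
w /= np.linalg.norm(w)
lhs = w @ G @ w                                  # <w, (H + A - z)^{-1} w>
rhs = np.sum(w**2 / (d - z - m))                 # <w, (A - z - m)^{-1} w>
print(lhs, "  vs  ", rhs)                        # agree up to a small random error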


Basic idea of proving the isotropic law of H + A:

1. The isotropic law of GOE + A (polynomialization method). [Bloemendal, Erdős, Knowles, Yau and Y. (2013)]
2. Compare (H + A)^{-1} with (GOE + A)^{-1} by a Newton-type (interpolation) method.
Let ρ be the distribution density of H_ij and let ρ_G be the distribution density of the entries of the GOE. Let H^t be the Wigner matrix whose entries have distribution
    ρ_t = t ρ + (1 - t) ρ_G.
This is a continuous bridge between H (t = 1) and the GOE (t = 0).
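
A tiny sketch of the interpolation at the level of a single matrix entry (the original entry law ρ is taken to be Rademacher purely for illustration): for every t the rescaled entries have mean 0 and variance 1, and only the higher moments interpolate between ρ and the Gaussian ρ_G.

import numpy as np

def sample_entries(t, n, rng):
    """n samples of sqrt(N) * (an entry of H^t): the mixture t*rho + (1-t)*rho_G."""
    use_rho = rng.random(n) < t
    rho = rng.choice([-1.0, 1.0], size=n)        # illustrative entry law rho (Rademacher)
    rho_G = rng.standard_normal(n)               # Gaussian entry law rho_G (GOE)
    return np.where(use_rho, rho, rho_G)

rng = np.random.default_rng(10)
for t in [0.0, 0.5, 1.0]:                        # t = 0: GOE entries, t = 1: entries of H
    e = sample_entries(t, 200_000, rng)
    print(t, round(e.mean(), 3), round(e.var(), 3), round((e**4).mean(), 3))
    # mean ~ 0 and variance ~ 1 for every t; the fourth moment moves from 3 to 1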


Recall that
    E F(H) - E F(GOE) = ∫_0^1 ∂_t E F(H^t) dt.

Let H^{t,k,l,0} be the Wigner matrix whose entries have the same distribution as those of H^t, except that the (k, l) entry of H^{t,k,l,0} has distribution ρ.
Let H^{t,k,l,1} be the Wigner matrix whose entries have the same distribution as those of H^t, except that the (k, l) entry of H^{t,k,l,1} has distribution ρ_G.
Then
    ∂_t E F(H^t) = Σ_{k,l} [ E F(H^{t,k,l,0}) - E F(H^{t,k,l,1}) ].

Note: H^{t,k,l,0} and H^{t,k,l,1} are both very close to H^t.


For example, take
    F_{i,j,p}(H) = | [(H - z)^{-1}]_{ij} |^{2p}.

Goal: create a self-consistent differential equation for the family
    { E F_{i,j,p}(H^t) }_{i,j=1}^{N}
which is stable.


Thank you

