
EECS 16A Foundations of Signals, Systems, & Info. Processing
Spring 2025                                                                               Note 8

8.1 Singular Value Decomposition


For the past few weeks we’ve been talking about solving linear systems of equations. For instance, we might have
a system Ax = b where, given A and x, we want to compute the output b. Looking at that same equation another way, sometimes we
want to solve for x given an output vector b and a matrix of data points A. That should be familiar to you from the
context of matrix inverses and the QR decomposition. But we’ve mostly been looking at square matrices
when solving such problems. What if we wanted to solve Ax = b where A is non-square? If you recall, only square
matrices can have true inverses, so how would that even work?

Enter, the Singular Value Decomposition, or SVD. Just like how QR "factors" a matrix into components that we
can easily manipulate, SVD does the same thing, but more: the SVD can also give insight into the structure of our
data. We use the SVD to invert non-square matrices, pare down large datasets, and pick out important features in
machine learning. For those of you interested in Data Science, SVD is one of the most effective algorithms for
building recommendation systems (think Netflix movies, YouTube videos, and Spotify songs), and if you know a bit
of Python, you can build your own recommendation systems!

8.1.1 Full SVD


Definition 8.1 (Full SVD): Let A ∈ Rm×n have rank r ≤ min{m, n}. A (full) SVD of A is a decomposition

A = UΣV⊤ (1)

where

(i) U = [Ur Um−r] ∈ Rm×m is a square orthonormal matrix whose columns are the so-called left singular vectors
of A; here Ur ∈ Rm×r and Um−r ∈ Rm×(m−r) are tall matrices with orthonormal columns.

(ii) V = [Vr Vn−r] ∈ Rn×n is a square orthonormal matrix whose columns are the so-called right singular vectors
of A; here Vr ∈ Rn×r and Vn−r ∈ Rn×(n−r) are tall matrices with orthonormal columns.

(iii) Σ = [Σr 0r×(n−r); 0(m−r)×r 0(m−r)×(n−r)] ∈ Rm×n (block rows separated by semicolons) is a non-square diagonal matrix whose diagonal entries σ1 ≥ · · · ≥ σr >
σr+1 = · · · = σmin{m,n} = 0 are non-negative ordered real numbers, and are the so-called singular values of A.



First, notice that m and n are not necessarily equal, meaning SVD works for any matrix, not just for square ones.
Second, notice that U and V are orthonormal matrices, which we’ve seen are very nice to work with. We now present
some useful linear algebra properties from the SVD.

Theorem 8.1: Linear Algebra of the Full SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let A = UΣV⊤ be an SVD
of A.

(i) Col(Ur ) = Col(A).

(ii) Col(Vr ) = Col(A⊤ ).

(iii) Col(Um−r ) = Null(A⊤ ).

(iv) Col(Vn−r ) = Null(A).

(v) AA⊤ = UΣΣ⊤ U⊤ .

(vi) A⊤ A = VΣ⊤ ΣV⊤ .

(vii) AVr = Ur Σr .

The proof of the Linear Algebra of the Full SVD is on the longer side and may distract from the overall flow of this
note, so it is left to the appendix.

What do these properties mean? The first property takes the first r columns of the U matrix and claims they span the
same vector space as the columns of A. What that means is we can boil down the column space of A to Ur , where
Ur is much easier to work with since the columns of this matrix are orthonormal. You can draw a similar conclusion
about property (ii) with respect to the row space of A.

Where things get confusing is properties (iii) and (iv). Why is Vn−r related to the null space of A, even though
Vr is related to the row space of A? By the rank-nullity theorem, dim(Col(A)) + dim(Null(A)) = n, the
number of columns. Since dim(Col(A)) = r and Null(A) ≡ Col(Vn−r) accounts for the remaining n − r dimensions, the dimensions work out. Mathematically,
we show that:

A = UΣV⊤ (2)
AV = UΣV⊤ V (3)
AV = UΣ (4)

Next, we consider V as a block matrix of Vr and Vn−r, and compare the last n − r columns of both sides of AV = UΣ. The i-th column of AV is Avi, while the i-th column of UΣ is σi ui for i ≤ min{m, n} and the zero vector otherwise; since σi = 0 for every i > r, the last n − r columns of UΣ are all zero. Hence

Avi = 0 ∀ i ∈ [r + 1, n] (5)



We’ve shown that the columns of Vn−r (the last n − r columns of V) are in the null space of A.

But conceptually, this is quite difficult to grapple with, and is discussed in more detail in the Full SVD Intuition
section.

Properties (v) and (vi) become relevant when talking about inverting non-square matrices, which will be covered in
the next note!

Now that we know how useful the SVD is, how can we factorize an arbitrary matrix A into its SVD form?

8.1.2 Finding the SVD


Let’s walk through the process of finding the SVD for an arbitrary matrix A ∈ Rm×n :

1. Compute the symmetric matrix: First, find either the matrix A⊤ A (Case I) or AA⊤ (Case II). Choose whichever
product gives you a smaller matrix.

2. Compute the eigenvector-eigenvalue pairs of your chosen symmetric matrix.

3. Construct V or U:

• (Case I) Normalize each eigenvector to form the columns of V.
• (Case II) Normalize each eigenvector to form the columns of U.
• For both cases, if you don’t have enough eigenvectors to span the entire space (n for V, m for U), extend
the basis: add in unit vectors and apply Gram-Schmidt to ensure orthonormality.

4. Construct Σ: Take the square root of each eigenvalue, and place them along the diagonal of your Σ matrix in
descending order. Fill the remaining entries with zeros.

5. Construct U or V:

• (Case I) Each column vector of U is given by ui = (1/σi) Avi.
• (Case II) Each column vector of V is given by vi = (1/σi) A⊤ui.

6. Write out the full SVD: Take the transpose of your V matrix, then write A = UΣV⊤. Make sure to order
the columns of your eigenvectors in U and V to match with their eigenvalues. For V, since you eventually
transpose it, notice the first singular value maps to the first row of V⊤, and so on.
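As a sanity check on this procedure, here is a minimal sketch in Python (an assumption of this write-up, not part of the note) that follows Case I for a square, full-rank example, so no Gram-Schmidt extension is needed; the function name full_svd_by_hand is hypothetical.

    import numpy as np

    def full_svd_by_hand(A):
        # Sketch of the construction above (Case I). Assumes A is square and
        # invertible, so every sigma_i > 0 and no basis extension is required.
        m, n = A.shape
        evals, evecs = np.linalg.eigh(A.T @ A)      # steps 1-2: eigenpairs of A^T A
        order = np.argsort(evals)[::-1]             # sort eigenvalues descending
        evals, V = evals[order], evecs[:, order]    # step 3 (Case I): columns of V
        sigma = np.sqrt(np.clip(evals, 0.0, None))  # step 4: singular values
        Sigma = np.zeros((m, n))
        np.fill_diagonal(Sigma, sigma)
        U = (A @ V) / sigma                         # step 5 (Case I): u_i = A v_i / sigma_i
        return U, Sigma, V                          # step 6: A = U Sigma V^T

    A = np.array([[4.0, 4.0], [-3.0, 3.0]])
    U, Sigma, V = full_svd_by_hand(A)
    print(np.allclose(U @ Sigma @ V.T, A))          # True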

Theorem 8.2: Correctness of Construction of Full SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let U, Σ, V :=
FULLSVD(A) be the output of the full SVD construction algorithm. Then A = UΣV⊤ is a full SVD of A.

The proof of the Correctness of Construction of the Full SVD is on the longer side and may distract
from the overall flow of this note, so it is left to the appendix.



Why do we use A⊤ A and AA⊤ for the SVD? It turns out that they have special properties with respect to the eigen-
values and eigenvectors!

Let A ∈ Rm×n have rank r ≤ min{m, n}. Then A⊤ A ∈ Rn×n and AA⊤ ∈ Rm×m are symmetric matrices of rank r. Each
has exactly r nonzero eigenvalues, which are real and positive.

See the appendix for proof.

A⊤A and AA⊤ give us eigenvalues that we can turn into our Σ. Their corresponding eigenvectors are orthogonal,
which is exactly what we leverage to generate U and V!

8.1.3 Compact SVD


What if we didn’t care about null spaces? We can also rewrite the SVD in a compact form, representing only the
linearly independent rows or columns of our matrix A.

Definition 8.2 (Compact SVD): Let A ∈ Rm×n have rank r ≤ min{m, n}. A compact SVD of A is a decomposition

A = Ur Σr Vr⊤ (6)

where

(i) Ur ∈ Rm×r is a matrix with orthonormal columns, which are so-called left singular vectors of A.

(ii) Vr ∈ Rn×r is a matrix with orthonormal columns, which are so-called right singular vectors of A.

(iii) Σr ∈ Rr×r is a diagonal matrix whose diagonal entries σ1 ≥ · · · ≥ σr > 0 are positive ordered real numbers, and
are so-called singular values of A.

8.1.4 Finding the Compact SVD


To find the Compact SVD, follow the same process as the full SVD, except all matrices have r columns, where r is
the rank of A.

The compact SVD is connected to the full SVD via the following calculation.



 
UΣV⊤ = [Ur Um−r] [Σr 0r×(n−r); 0(m−r)×r 0(m−r)×(n−r)] [Vr Vn−r]⊤ (7)
= [Ur Um−r] [Σr 0r×(n−r); 0(m−r)×r 0(m−r)×(n−r)] [Vr⊤; Vn−r⊤] (8)
= [Ur Um−r] ([Σr; 0(m−r)×r] Vr⊤ + [0r×(n−r); 0(m−r)×(n−r)] Vn−r⊤) (9)
= [Ur Um−r] [Σr; 0(m−r)×r] Vr⊤ (10)
= [Ur Um−r] [Σr Vr⊤; 0(m−r)×n] (11)
= Ur Σr Vr⊤ + Um−r 0(m−r)×n (12)
= Ur Σr Vr⊤. (13)

Thus, the sub-matrices of a full SVD are a compact SVD. The compact SVD has the same properties as the full
SVD, except without the null spaces.

Theorem 8.3: Linear Algebra of the Compact SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let A = Ur Σr Vr⊤ be
a compact SVD of A.

(i) Col(Ur ) = Col(A).

(ii) Col(Vr ) = Col(A⊤ ).

(iii) AA⊤Ur = Ur Σr².

(iv) A⊤AVr = Vr Σr².

(v) AVr = Ur Σr .

Check your understanding: Prove the takeaways from the Linear Algebra of the Compact SVD theorem.

Theorem 8.4: Correctness of Construction of the Compact SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let
Ur, Σr, Vr := COMPACTSVD(A) be the output of the compact SVD construction algorithm. Then A = Ur Σr Vr⊤ is a
compact SVD of A.

Check your understanding: Prove the Correctness of the Construction of the Compact SVD.

8.1.5 Outer Product Form of the SVD


Now what if we wanted to break down the SVD in terms of individual vectors? Outer Product Form does just that!



Definition 8.3 (Outer Product Form of the SVD): Let A ∈ Rm×n have rank r ≤ min{m, n}. An outer product form
of an SVD of A is a decomposition

A = ∑_{i=1}^{r} σi ui vi⊤ (14)

where

(i) {u1 , . . . , ur } ⊆ Rm is an orthonormal set of vectors, and are so-called left singular vectors of A.

(ii) {v1 , . . . , vr } ⊆ Rn is an orthonormal set of vectors, and are so-called right singular vectors of A.

(iii) σ1 ≥ · · · ≥ σr > 0 are positive ordered scalars, and are so-called singular values of A.

While this form looks quite unassuming, it actually reveals several linear-algebraic properties of A, as we will see in
the following theorem.

Theorem 8.5: Linear Algebra of the Outer Product SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let A =
∑_{i=1}^{r} σi ui vi⊤ be an outer product form SVD of A.

(i) {u1 , . . . , ur } is an orthonormal basis for Col(A).

(ii) {v1 , . . . , vr } is an orthonormal basis for Col(A⊤ ).

(iii) For each i, 1 ≤ i ≤ r, we have (σi², ui) is an eigenvalue-eigenvector pair for AA⊤.

(iv) For each i, 1 ≤ i ≤ r, we have (σi², vi) is an eigenvalue-eigenvector pair for A⊤A.

(v) For each i, 1 ≤ i ≤ r, we have Avi = σi ui .

The proof of the Linear Algebra of the Outer Product SVD is on the longer side and may distract from the overall
flow of this note, so it is left to the appendix.

8.1.6 Finding the Outer Product Form of SVD


To find the Outer Product form of SVD, follow the same process as the full SVD, except express the result as a
sum of rank-1 terms σi ui vi⊤ rather than as a matrix product. The outer product form is easier to write and compute by hand.

Theorem 8.6: Correctness of Outer Product Form Algorithm: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let
{u1, . . . , ur}, {σ1, . . . , σr}, {v1, . . . , vr} := OUTERPRODUCTSVD(A) be the output of the outer product SVD construction algorithm. Then A = ∑_{i=1}^{r} σi ui vi⊤ is an outer product SVD of A.

The proof of the Correctness of the Outer Product Form Algorithm is on the longer side and may distract from the
overall flow of this note, so it is left to the appendix.



The Outer Product form of SVD is connected to the Compact SVD by the following calculation.

Ur Σr Vr⊤ = [u1 · · · ur] diag(σ1, . . . , σr) [v1 · · · vr]⊤ = [u1 · · · ur] diag(σ1, . . . , σr) [v1⊤; . . . ; vr⊤] (15)
= [u1 · · · ur] [σ1 v1⊤; . . . ; σr vr⊤] (16)
= ∑_{i=1}^{r} σi ui vi⊤. (17)
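The same identity is easy to check numerically. The sketch below (an assumed example, not from the note) rebuilds A as a sum of rank-1 outer products.

    import numpy as np

    # Sketch: rebuilding A as the sum of rank-1 terms sigma_i u_i v_i^T, as in
    # Eq. (15)-(17). The matrix here is an assumed example.
    A = np.array([[4.0, 4.0], [-3.0, 3.0]])
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))                               # numerical rank

    A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
    print(np.allclose(A_rebuilt, A))                         # True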

8.1.7 Comparison Between the SVD Forms


We identify weaknesses and strengths of using the SVD forms, so you can decide which one to use.

• Full SVD:

– Weaknesses: It is the most computationally intensive to compute, either by computer or by hand. It re-
quires running Gram-Schmidt twice to find Um−r and Vn−r . Also Σ is non-square and thus not invertible.
– Strengths: The matrices U and V are square orthonormal, and thus invertible. There is also a characteri-
zation of null spaces of A⊤ and A as the column spaces of Um−r and Vn−r respectively.

• Compact SVD:

– Weaknesses: The matrices Ur and Vr are not square, so they do not have an inverse. Also, there is no
characterization of null spaces, as in the full SVD.
– Strengths: The matrix Σr is square and invertible. It is also easier to construct by hand than the full SVD.

• Outer Product SVD:

– Weaknesses: The summation notation is messy and sometimes tedious to work with. Also, there is no
characterization of null spaces, as in the full SVD.
– Strengths: It is the most computationally efficient to construct by computer, and also saves the most
memory. It is also easier to construct by hand than the full SVD.

When using the SVD to compute things, a reasonable rule of thumb is to use the compact SVD unless there is a need
to analyze null spaces, in which case the full SVD is essentially required.



8.1.8 Full SVD Intuition
In the above description, we mentioned that Null(A) = Col(Vn−r ) and Null(A⊤ ) = Col(Um−r ). Why does this make
sense?

First of all, recall that null spaces are subsets of input spaces. Thus, since A = UΣV⊤ , the input space of A must
match the dimension of the column vectors of V, so it makes sense that Null(A) is related to the column vectors of
V. Similarly, A⊤ = VΣ⊤U⊤, so it makes sense that Null(A⊤) is related to the column vectors of U.

To explain the specific submatrices of V and U that describe the null spaces, let’s examine the outer product form of
SVD (for now, let’s just focus on the matrix A):

A = ∑_{i=1}^{r} σi ui vi⊤ (18)

Now, let’s see what happens if we multiply one of the "extra" vj vectors (which correspond to j = r + 1, . . . , n, the
column vectors of Vn−r) by the matrix A:

Avj = ∑_{i=1}^{r} σi ui vi⊤vj (19)

Since i ≠ j for all terms in this summation, vi⊤vj = 0 (since {vi} is an orthonormal set, by construction).

Thus:

Avj = 0 (20)

for j = r + 1, . . . , n, which means that these vj are in Null(A).

By linearity, every linear combination of these vj is also in Null(A). By the rank-nullity theorem, since rank(A) = r and A
has n columns, dim(Null(A)) = n − r, which is exactly the number of column vectors in Vn−r! Since we already
established that the column vectors of Vn−r are in Null(A), this implies that:

Null(A) = Col(Vn−r ) (21)

Symmetrically, the outer product SVD of A⊤ is:

A⊤ = ∑_{i=1}^{r} σi vi ui⊤ (22)

Applying the same reasoning as we did for A:


Null(A⊤ ) = Col(Um−r ) (23)
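These null space characterizations are easy to verify numerically. The following sketch uses an assumed rank-1 example (not from the note) and checks both statements.

    import numpy as np

    # Sketch: confirming Null(A) = Col(V_{n-r}) and Null(A^T) = Col(U_{m-r}) on an
    # assumed rank-1 example.
    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])              # rank 1, so r = 1
    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-12))

    print(np.allclose(A @ Vt[r:].T, 0))          # columns of V_{n-r} are killed by A
    print(np.allclose(A.T @ U[:, r:], 0))        # columns of U_{m-r} are killed by A^T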

8.2 Singular Value Decomposition: Geometric Properties


Let A = UΣV⊤ be the SVD of some matrix A ∈ Rm×n .



Recall that for any vector x and any orthogonal matrix U, ∥Ux∥ = ∥x∥. Thus we can interpret multiplication by
an orthonormal matrix as a combination of operations that don’t change length, such as rotations and reflections.

Since Σ has the singular values σ1, . . . , σr (followed by zeros) on its diagonal, multiplying a vector by Σ stretches the first entry of the vector by σ1,
the second entry by σ2, and so on.

Combining these observations, we interpret Ax as the composition of three operations:

1. V⊤x, which rotates x without changing its length.

2. ΣV⊤x, which stretches the resulting vector along each axis by the corresponding singular value, and may
also collapse or add new dimensions.

3. UΣV⊤x, which again rotates the resulting vector without changing its length.

The following figure illustrates these three operations moving from the right to the left.

[Figure: A = UΣV⊤ acting on the unit circle: V⊤ maps v1, v2 to e1, e2; Σ stretches these to σ1 e1, σ2 e2; U then maps them to σ1 u1, σ2 u2.]
Here as usual e1 , e2 are the first and second standard basis vectors.

The geometric interpretation above reveals that σ1 is the largest amplification factor a vector can experience upon
multiplication by A. More specifically, if ∥x∥ ≤ 1 then ∥Ax∥ ≤ σ1 . We achieve equality at x = v1 , because then

∥Ax∥ = ∥UΣV⊤v1∥ = ∥UΣe1∥ = ∥σ1 Ue1∥ = ∥σ1 u1∥ = σ1 ∥u1∥ = σ1. (24)
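A quick numerical check of this claim (a sketch, using an assumed 2×2 example which happens to be the same matrix as in the first numerical example below):

    import numpy as np

    # Sketch: sigma_1 is the largest amplification factor ||Ax|| over unit vectors x,
    # and it is attained at x = v_1.
    A = np.array([[4.0, 4.0], [-3.0, 3.0]])
    U, s, Vt = np.linalg.svd(A)

    rng = np.random.default_rng(2)
    X = rng.standard_normal((2, 10000))
    X /= np.linalg.norm(X, axis=0)                   # random unit vectors as columns
    print(np.linalg.norm(A @ X, axis=0).max())       # at most sigma_1 (about 5.657)
    print(np.linalg.norm(A @ Vt[0]), s[0])           # ||A v_1|| equals sigma_1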

8.3 Examples

8.3.1 Example SVD Interpretation and Application


Suppose we want to build a least squares regression model to predict whether users like a movie they’ve never seen
before, based on attributes of movies they have seen. We have an m × n matrix A of rank r that contains the ratings



of m viewers for n movies. Write

A = UΣV⊤ = ∑_{i=1}^{r} σi ui vi⊤. (25)

We can interpret each rank-1 matrix σi ui vi⊤ to be due to a particular attribute, e.g., comedy, action, sci-fi, or romance
content. Then σi determines how strongly the ratings depend on the i-th attribute; the entries of vi⊤ score each movie
with respect to this attribute, and the entries of ui evaluate how much each viewer cares about this particular attribute.

Interestingly, the attributes from the (r + 1)-th onward don’t influence the ratings at all, according to our analysis. This means the
SVD can reveal which attributes actually matter in our model, allowing us to build simpler models (saving compute),
and it also tells us whether we can actually use our model. To see why, suppose we have an attribute with a singular value of
0. The corresponding rank-1 term contributes nothing new to A (it lies beyond the rank of our
matrix), and a rank-deficient data matrix is exactly the situation in which plain least squares regression breaks down, because A⊤A is not invertible.
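To make this interpretation concrete, here is a hypothetical sketch: a small ratings matrix built from two made-up rank-1 "attribute" terms. The attribute names, scores, and weights are invented for illustration; they are not data from the note.

    import numpy as np

    # Hypothetical example: ratings from 6 viewers of 5 movies, generated by two
    # attributes (say, comedy and action). All numbers are made up.
    rng = np.random.default_rng(3)
    m, n = 6, 5
    comedy_taste, comedy_score = rng.random(m), rng.random(n)
    action_taste, action_score = rng.random(m), rng.random(n)
    A = 3.0 * np.outer(comedy_taste, comedy_score) \
      + 1.0 * np.outer(action_taste, action_score)

    U, s, Vt = np.linalg.svd(A)
    print(np.round(s, 3))   # only two singular values are (numerically) nonzero,
                            # so only two "attributes" actually drive the ratings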

Now, let’s dive into some real numbers!

8.3.2 Numerical Example 1


Let’s find the SVD for

A = [4 4; −3 3]. (26)
We find the SVD for A⊤ first and then take the transpose. We calculate
    
(A⊤)⊤(A⊤) = AA⊤ = [4 4; −3 3][4 −3; 4 3] = [32 0; 0 18]. (27)

This happens to be diagonal,¹ so we can read off the eigenvalues:

λ1 = 32, λ2 = 18 (28)

We can select the orthonormal eigenvectors:

u1 = [1; 0], u2 = [0; 1]. (29)
The singular values are

σ1 = √λ1 = √32 = 4√2, σ2 = √λ2 = √18 = 3√2. (30)
Then to find v1, v2, we do

v1 = A⊤u1/σ1 = (1/(4√2)) [4; 4] = [1/√2; 1/√2] (31)

v2 = A⊤u2/σ2 = (1/(3√2)) [−3; 3] = [−1/√2; 1/√2]. (32)
¹ In cases where A⊤A or AA⊤ aren’t so nice, we’d find the eigenvalues and eigenvectors of the symmetric matrix M (here M = AA⊤) the usual way: solve det(λI − M) = 0, then solve (λI − M)v = 0 for each eigenvalue.



Thus our SVD is

A = UΣV⊤ = [1 0; 0 1] [4√2 0; 0 3√2] [1/√2 1/√2; −1/√2 1/√2]. (33)
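As a sanity check (a sketch, not part of the note), we can verify this hand computation against numpy; signs of individual singular vectors may differ between conventions, but the product and the singular values must agree.

    import numpy as np

    # Sketch: verifying the hand-computed SVD of Eq. (33).
    A = np.array([[4.0, 4.0], [-3.0, 3.0]])
    U = np.eye(2)
    Sigma = np.diag([4 * np.sqrt(2), 3 * np.sqrt(2)])
    Vt = np.array([[ 1 / np.sqrt(2), 1 / np.sqrt(2)],
                   [-1 / np.sqrt(2), 1 / np.sqrt(2)]])

    print(np.allclose(U @ Sigma @ Vt, A))                    # True
    print(np.allclose(np.linalg.svd(A)[1], np.diag(Sigma)))  # same singular values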

Note that we can change the signs of u1, u2 and they are still orthonormal eigenvectors, and still produce a valid SVD.
However, changing the sign of ui requires us to also change the sign of vi = (1/σi)A⊤ui, so the product ui vi⊤
remains unchanged.

Another source of non-uniqueness arises when we have repeated singular values, as seen in the next example.

8.3.3 Numerical Example 2


We want to find an SVD for

A = [1 0; 0 −1]. (34)
Again, we find the SVD for A⊤ and then take the transpose. Note that AA⊤ = I2 , which has repeated eigenvalues at
λ1 = λ2 = 1. In particular, any pair of orthonormal vectors is a set of orthonormal eigenvectors for I2 = AA⊤ . We
can parameterize all such pairs as    
u1 = [cos(θ); sin(θ)], u2 = [−sin(θ); cos(θ)] (35)
where θ is a free parameter. Since σ1 = σ2 = 1, we obtain

v1 = A⊤u1/σ1 = [cos(θ); −sin(θ)], v2 = A⊤u2/σ2 = [−sin(θ); −cos(θ)]. (36)

Thus an SVD is

A = UΣV⊤ = [cos(θ) −sin(θ); sin(θ) cos(θ)] [1 0; 0 1] [cos(θ) −sin(θ); −sin(θ) −cos(θ)] (37)

for any value of θ. Thus this matrix has infinitely many valid SVDs, one for each value of θ in the interval [0, 2π).
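A short sketch (not part of the note) confirms that every member of this θ-parameterized family really is a valid SVD:

    import numpy as np

    # Sketch: each choice of theta in Eq. (37) reproduces A exactly.
    A = np.array([[1.0, 0.0], [0.0, -1.0]])

    for theta in (0.0, 0.7, np.pi / 3):
        c, s = np.cos(theta), np.sin(theta)
        U = np.array([[c, -s], [s, c]])
        Vt = np.array([[c, -s], [-s, -c]])          # this is V^T from Eq. (37)
        print(np.allclose(U @ np.eye(2) @ Vt, A))   # True for every theta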

8.4 Final Comments


We introduced the singular value decomposition, walked through its three forms and geometric interpretation, and
demonstrated a few hands-on examples.

The SVD is a very powerful method to do efficient and expressive data analysis. It can work with any matrix and
reveal the most important features in a data matrix. We will see more applications of this flavor in the next note.



A Proofs for SVD
A.1 Proof of Eigenvalues of A⊤ A and AA⊤

Proof. We have
(A⊤ A)⊤ = (A)⊤ (A⊤ )⊤ = A⊤ A (38)
so (A⊤ A)⊤ = A⊤ A, and thus A⊤ A is symmetric.

From note 3, we know that rank(A) = rank(A⊤ A), so we have that rank(A⊤ A) = r.

Now we show that A⊤ A has exactly r nonzero eigenvalues. Indeed,


 
dim(Null(A⊤A)) = n − rank(A⊤A) = n − r. (39)

Thus A⊤A has an (n − r)-dimensional null space, corresponding to an eigenvalue 0 with geometric multiplicity
n − r. By the Spectral Theorem (Note 7), and the fact that A⊤A is symmetric, we know that A⊤A is diagonalizable. We know that for a diagonalizable matrix, the geometric and algebraic multiplicities of each eigenvalue
agree, so the algebraic multiplicity of the eigenvalue 0 is also n − r. Since the algebraic multiplicities of all the eigenvalues of A⊤A sum to n,
this implies that A⊤A has r nonzero eigenvalues (counted with algebraic multiplicity).

We now show that all nonzero eigenvalues of A⊤A are real and positive. Since A⊤A is symmetric, the Spectral
Theorem says that A⊤A has real eigenvalues. It remains to show that the nonzero eigenvalues are positive. Let λ be a nonzero eigenvalue of A⊤A with eigenvector v. Then

A⊤Av = λv (40)
v⊤A⊤Av = λv⊤v (41)
⟨Av, Av⟩ = λ⟨v, v⟩ (42)
∥Av∥² = λ∥v∥². (43)

We know that λ is nonzero, and v is nonzero so ∥v∥² > 0. Hence ∥Av∥² = λ∥v∥² is nonzero, and being a squared norm it is positive. Thus

λ = ∥Av∥² / ∥v∥² (44)

is the quotient of positive numbers and thus positive.

Now let B := A⊤ and note that AA⊤ = (A⊤)⊤(A⊤) = B⊤B. Thus applying the same argument to the rank-r matrix
B = A⊤ shows that AA⊤ is symmetric, that rank(AA⊤) = r, and that AA⊤ has exactly r nonzero eigenvalues, which
are real and positive.
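A quick numerical illustration of this result (a sketch with an assumed example, not part of the note):

    import numpy as np

    # Sketch: for an assumed rank-2 matrix A, the symmetric matrix A^T A has
    # exactly 2 nonzero eigenvalues, all real and positive.
    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2

    evals = np.linalg.eigvalsh(A.T @ A)       # eigenvalues of the symmetric A^T A
    print(np.round(evals, 6))                 # two positive values, the rest ~ 0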



A.2 Proof of Correctness of Construction of the Outer Product SVD
Proof of Correctness of Construction of Outer Product SVD, 8.6. We first claim that σ1 ≥ · · · ≥ σr > 0 are real. Indeed, by the eigenvalue result stated in Section 8.1.2 and proved in Appendix A.1, the top r eigenvalues λ1 ≥ · · · ≥ λr > 0 of A⊤A are real and positive, so their square roots
σ1 ≥ · · · ≥ σr > 0 are real and positive.

Now we claim that {v1, . . . , vr} is an orthonormal set. This is immediate from the Spectral Theorem, which lets us choose the
eigenvectors of A⊤A to form an orthonormal set {v1, . . . , vn}; in particular, {v1, . . . , vr} is an orthonormal set.

Now we claim that {u1, . . . , ur} is an orthonormal set. Let 1 ≤ i < j ≤ r. Then

⟨ui, uj⟩ = ⟨(1/σi) Avi, (1/σj) Avj⟩ (45)
= (1/(σiσj)) vi⊤A⊤Avj (46)
= (1/(σiσj)) σi² vi⊤vj (47)
= 0 (48)

by orthonormality of the vi .

Now let 1 ≤ i ≤ r. Then

∥ui∥² = ∥(1/σi) Avi∥² (49)
= (1/σi²) ∥Avi∥² (50)
= (1/σi²) ⟨Avi, Avi⟩ (51)
= (1/σi²) σi² vi⊤vi (52)
= ∥vi∥² (53)
= 1 (54)

Thus {u1 , . . . , ur } is an orthonormal set.

Finally, we claim that A = ∑_{i=1}^{r} σi ui vi⊤. This equality holds if and only if, for every x ∈ Rn, we have Ax =
(∑_{i=1}^{r} σi ui vi⊤) x. We show the latter condition.

Since {v1, . . . , vn} is an orthonormal basis of Rn, we may let x = ∑_{i=1}^{n} αi vi for some constants α1, . . . , αn ∈ R. The
eigenvectors corresponding to the eigenvalue 0 of A⊤A (that is, vr+1, . . . , vn) are an orthonormal basis for
Null(A⊤A). We know that Null(A⊤A) = Null(A), so {vr+1, . . . , vn} is an orthonormal basis for Null(A). Thus



Avr+1 = · · · = Avn = 0m. With this, we can compute

(∑_{i=1}^{r} σi ui vi⊤) x = (∑_{i=1}^{r} σi ui vi⊤)(∑_{i=1}^{n} αi vi) (55)
= ∑_{i=1}^{r} ∑_{j=1}^{n} σi αj ui vi⊤vj (56)
= ∑_{i=1}^{r} ∑_{j=1}^{n} σi αj ⟨vj, vi⟩ ui (57)
= ∑_{i=1}^{r} αi σi ui (58)
= ∑_{i=1}^{r} αi Avi (59)
= ∑_{i=1}^{n} αi Avi (60)
= A (∑_{i=1}^{n} αi vi) (61)
= Ax. (62)

Thus A = ∑_{i=1}^{r} σi ui vi⊤ as desired.

A.3 Proof of the Linear Algebra of the Full SVD


Proof of the Linear Algebra of the Full SVD, Theorem 8.1.

(i) Follows from the equivalence of outer product and full SVDs: A = UΣV⊤ = ∑_{i=1}^{r} σi ui vi⊤.

(ii) Follows from the equivalence of outer product and full SVDs: A = UΣV⊤ = ∑_{i=1}^{r} σi ui vi⊤.

(iii) Since {u1 , . . . , um } is the orthonormal set of columns of U, we have that {u1 , . . . , ur } and {ur+1 , . . . , um }
are orthonormal bases for orthogonal subspaces. Since Span(u1 , . . . , ur ) = Col(A), we are left to show that
Null(A⊤ ) is exactly the set of vectors orthogonal to Col(A). However we know from note 3 that this is true.

(iv) Follows from applying (iii) to the SVD of A⊤, i.e., A⊤ = VΣ⊤U⊤.

(v) We have

AA⊤ = (UΣV⊤)(UΣV⊤)⊤ (63)
= UΣV⊤VΣ⊤U⊤ (64)
= UΣΣ⊤U⊤. (65)



(vi) We have

A⊤A = (UΣV⊤)⊤(UΣV⊤) (66)
= VΣ⊤U⊤UΣV⊤ (67)
= VΣ⊤ΣV⊤. (68)

(vii) Follows from the equivalence of compact and full SVDs: A = UΣV⊤ = Ur Σr Vr⊤, so AVr = Ur Σr Vr⊤Vr = Ur Σr.

A.4 Proof of Correctness of Construction of Full SVD


Proof of Correctness of Construction of Full SVD, 8.2. The only thing that has not already been shown from the
proof of correctness of the outer product SVD algorithm is that U and V are square orthonormal matrices. But this
is straightforward from Gram-Schmidt.

