Spring 2025 Note 8
Enter the Singular Value Decomposition, or SVD. Just as QR "factors" a matrix into components that we
can easily manipulate, the SVD does the same thing, but more: the SVD can also give insight into the structure of our
data. We use the SVD to invert non-square matrices, pare down large datasets, and pick out important features in
machine learning. For those of you interested in Data Science, the SVD is one of the most effective algorithms for
building recommendation systems (think Netflix movies, YouTube videos, and Spotify songs), and if you know a bit
of Python, you can build your own recommendation systems!
Definition 8.1 (Full SVD): Let A ∈ Rm×n have rank r ≤ min{m, n}. A full SVD of A is a decomposition
A = UΣV⊤ (1)
where
(i) U = [Ur Um−r] ∈ Rm×m is a square orthonormal matrix whose columns are the so-called left singular vectors
of A; here Ur ∈ Rm×r and Um−r ∈ Rm×(m−r) are tall matrices with orthonormal columns.
(ii) V = [Vr Vn−r] ∈ Rn×n is a square orthonormal matrix whose columns are the so-called right singular vectors
of A; here Vr ∈ Rn×r and Vn−r ∈ Rn×(n−r) are tall matrices with orthonormal columns.
(iii) Σ = $\begin{bmatrix} \Sigma_r & 0_{r\times(n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{bmatrix}$ ∈ Rm×n is a non-square diagonal matrix whose diagonal entries σ1 ≥ · · · ≥ σr > σr+1 = · · · = σmin{m,n} = 0 are non-negative ordered real numbers, and are the so-called singular values of A.
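To make these shapes concrete, here is a minimal sketch in NumPy (the 3 × 2 example matrix is hypothetical, chosen only for illustration) that computes a full SVD and checks the dimensions and orthonormality described above.

```python
import numpy as np

# Hypothetical 3 x 2 example matrix (m = 3, n = 2), chosen only for illustration.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
m, n = A.shape

# full_matrices=True returns the full SVD: U is m x m and Vt is n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)           # (3, 3) (2,) (2, 2)

# Rebuild the non-square "diagonal" Sigma from the singular values.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# A = U Sigma V^T, and U, V have orthonormal columns.
print(np.allclose(A, U @ Sigma @ Vt))       # True
print(np.allclose(U.T @ U, np.eye(m)))      # True
print(np.allclose(Vt @ Vt.T, np.eye(n)))    # True
```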
Theorem 8.1: Linear Algebra of the Full SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let A = UΣV⊤ be an SVD
of A.
(vii) AVr = Ur Σr .
The proof of the Linear Algebra of the Full SVD is on the longer side and may distract from the overall flow of this
note, so it is left to the appendix.
What do these properties mean? The first property takes the first r columns of the U matrix and claims they span the
same vector space as the columns of A. What that means is we can boil down the column space of A to Ur , where
Ur is much easier to work with since the columns of this matrix are orthonormal. You can draw a similar conclusion
about property (ii) with respect to the row space of A.
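As a quick numerical sanity check of the first property, here is a sketch assuming NumPy; the rank-2 example matrix is hypothetical. Projecting the columns of A onto the span of the columns of Ur leaves them unchanged, consistent with Col(A) = Col(Ur).

```python
import numpy as np

# Hypothetical 4 x 3 matrix of rank r = 2 (third column = first + second).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
r = np.linalg.matrix_rank(A)                # r = 2

U, s, Vt = np.linalg.svd(A)
Ur = U[:, :r]                               # first r left singular vectors

# Orthogonal projector onto Col(Ur); it leaves every column of A unchanged.
P = Ur @ Ur.T
print(np.allclose(P @ A, A))                # True: Col(A) lies inside Col(Ur)
```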
Where things get confusing is properties (iii) and (iv). Why are the columns of Ur related to the columns of Vn−r ,
even though Vn−r is related to the row space of A? By the rank nullity theorem, dim(Col(A)) + dim(Null(A)) = n, the
number of columns. Since Col(A) ≡ Col(Ur ), and Null(A) ≡ Col(Vn−r ), the dimensions work out. Mathematically,
we show that:
A = UΣV⊤ (2)
AV = UΣV⊤ V (3)
AV = UΣ (4)
Next, we consider V as the block matrix [Vr Vn−r]. Looking only at the last n − r columns of V means we only pick up the last n − r columns of Σ, which are entirely zero. Hence
Avi = 0 for all i ∈ {r + 1, . . . , n}. (5)
But conceptually, this is quite difficult to grapple with, and is discussed in more detail in the Full SVD Intuition
section.
Properties (v) and (vi) become relevant when talking about inverting non-square matrices, which will be covered in
the next note!
Now that we know how useful the SVD is, how can we factorize an arbitrary matrix A into its SVD form?
1. Compute the symmetric matrix: First, find either the matrix A⊤ A (Case I) or AA⊤ (Case II). Choose whichever
product gives you a smaller matrix.
2. Find the eigenvalues and eigenvectors: Diagonalize the symmetric matrix from step 1. Its eigenvalues are real and non-negative, and its eigenvectors can be chosen to be orthonormal.
3. Construct V or U: (Case I) The orthonormal eigenvectors of A⊤A, ordered by decreasing eigenvalue, form the columns of V. (Case II) The orthonormal eigenvectors of AA⊤ form the columns of U.
4. Construct Σ: Take the square root of each eigenvalue, and place them along the diagonal of your Σ matrix in
descending order. Fill the remaining entries with zeros.
5. Construct U or V:
• (Case I) Each column vector of U is given by ui = (1/σi) Avi .
• (Case II) Each column vector of V is given by vi = (1/σi) A⊤ui .
6. Write out the full SVD: Take the transpose of your V matrix, then write A = UΣV⊤. Make sure the eigenvector columns of U and V are ordered to match their eigenvalues. For V, since you eventually transpose it, notice that the first singular value maps to the first row of V⊤, and so on. A code sketch of this procedure is given below.
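Here is a minimal sketch of this recipe (Case I) in NumPy. The 3 × 2 example matrix is hypothetical, and the code assumes A has full column rank so that every σi > 0; for a rank-deficient or wide matrix you would keep only the nonzero σi and run Gram-Schmidt to fill out the remaining columns of U.

```python
import numpy as np

def svd_via_ATA(A):
    """Case I recipe: eigendecompose A^T A (assumes A has full column rank)."""
    m, n = A.shape
    # Steps 1-2: eigenvalues and orthonormal eigenvectors of the symmetric matrix A^T A.
    eigvals, V = np.linalg.eigh(A.T @ A)
    # Order the eigenpairs so the eigenvalues (hence singular values) are descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # Step 4: singular values are the square roots of the eigenvalues.
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))
    # Step 5 (Case I): u_i = (1/sigma_i) A v_i.
    U = np.column_stack([A @ V[:, i] / sigma[i] for i in range(n)])
    # Step 6: A = U Sigma V^T. Here U is m x n; Gram-Schmidt on the leftover
    # directions would extend it to the square U of the full SVD.
    Sigma = np.diag(sigma)
    return U, Sigma, V

# Hypothetical 3 x 2 example with full column rank.
A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
U, Sigma, V = svd_via_ATA(A)
print(np.round(np.diag(Sigma), 3))          # singular values, in descending order
print(np.allclose(A, U @ Sigma @ V.T))      # True
print(np.allclose(U.T @ U, np.eye(2)))      # True: columns of U are orthonormal
```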
Theorem 8.2: Correctness of Construction of Full SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let U, Σ, V := FULLSVD(A) be the output of the full SVD construction algorithm. Then A = UΣV⊤ is a full SVD of A.
The proof of the Correctness of Construction of the Full SVD is on the longer side and may distract
from the overall flow of this note, so it is left to the appendix.
Let A ∈ Rm×n have rank r ≤ min{m, n}. Then A⊤ A ∈ Rn×n and AA⊤ ∈ Rm×m are symmetric matrices of rank r. Each
has exactly r nonzero eigenvalues, which are real and positive.
In other words, A⊤A and AA⊤ supply the eigenvalues that we can turn into our Σ, and the corresponding eigenvectors are orthogonal, which is exactly what we leverage to generate U and V⊤!
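A quick numerical illustration of this fact, as a sketch assuming NumPy; the rank-2 example matrix is hypothetical.

```python
import numpy as np

# Hypothetical 4 x 3 matrix of rank r = 2 (third column = first + second).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # n = 3 eigenvalues
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # m = 4 eigenvalues

print(np.round(eig_AtA, 6))   # exactly r = 2 of them are (numerically) nonzero and positive
print(np.round(eig_AAt, 6))   # the same two nonzero eigenvalues, padded with zeros
print(np.allclose(eig_AtA[:2], eig_AAt[:2]))           # True
```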
Definition 8.2 (Compact SVD): Let A ∈ Rm×n have rank r ≤ min{m, n}. A compact SVD of A is a decomposition
A = Ur Σr Vr⊤ (6)
where
(i) Ur ∈ Rm×r is a matrix with orthonormal columns, which are so-called left singular vectors of A.
(ii) Vr ∈ Rn×r is a matrix with orthonormal columns, which are so-called right singular vectors of A.
(iii) Σr ∈ Rr×r is a diagonal matrix whose diagonal entries σ1 ≥ · · · ≥ σr > 0 are positive ordered real numbers, and
are so-called singular values of A.
The compact SVD is connected to the full SVD via the following calculation:
A = UΣV⊤ = $\begin{bmatrix} U_r & U_{m-r} \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_r & V_{n-r} \end{bmatrix}^\top = U_r \Sigma_r V_r^\top.$
Thus, the sub-matrices of a full SVD form a compact SVD. The compact SVD has the same properties as the full
SVD, except it does not capture the null spaces.
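A minimal NumPy sketch of this truncation (the rank-1 example matrix is hypothetical): keeping only the first r columns of U and V and the top-left r × r block of Σ already reproduces A.

```python
import numpy as np

# Hypothetical rank-1 example: every column is a multiple of [1, 2, 3]^T.
A = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])   # 3 x 2, rank r = 1
r = np.linalg.matrix_rank(A)

U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Keep only the first r singular vectors and values: the compact SVD.
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vt[:r, :].T
print(Ur.shape, Sr.shape, Vr.shape)         # (3, 1) (1, 1) (2, 1)
print(np.allclose(A, Ur @ Sr @ Vr.T))       # True: A = Ur Sigma_r Vr^T
```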
Theorem 8.3: Linear Algebra of the Compact SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let A = Ur Σr Vr⊤ be
a compact form SVD of A.
(v) AVr = Ur Σr .
Check your understanding: Prove the takeaways from the Linear Algebra of the Compact SVD theorem.
Theorem 8.4: Correctness of Construction of the Compact SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let
Ur , Σr , Vr := COMPACTSVD(A) be the output of the compact SVD construction algorithm. Then A = Ur Σr Vr⊤ is a
compact SVD of A.
Check your understanding: Prove the Correctness of the Construction of the Compact SVD.
Definition 8.3 (Outer Product Form SVD): Let A ∈ Rm×n have rank r ≤ min{m, n}. An outer product form SVD of A is a decomposition
A = ∑_{i=1}^{r} σi ui vi⊤
where
(i) {u1 , . . . , ur } ⊆ Rm is an orthonormal set of vectors, and are so-called left singular vectors of A.
(ii) {v1 , . . . , vr } ⊆ Rn is an orthonormal set of vectors, and are so-called right singular vectors of A.
(iii) σ1 ≥ · · · ≥ σr > 0 are positive ordered scalars, and are so-called singular values of A.
While this form looks quite unassuming, it actually reveals several linear-algebraic properties of A, as we will see in
the following theorem.
Theorem 8.5: Linear Algebra of the Outer Product SVD: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let A = ∑_{i=1}^{r} σi ui vi⊤ be an outer product form SVD of A.
The proof of the Linear Algebra of the Outer Product SVD is on the longer side and may distract from the overall
flow of this note, so it is left to the appendix.
Theorem 8.6: Correctness of Outer Product Form Algorithm: Let A ∈ Rm×n have rank r ≤ min{m, n}. Let
{u1 , . . . , ur }, {σ1 , . . . , σr }, {v1 , . . . , vr } := OUTERPRODUCTSVD(A) be the output of the outer product SVD construction algorithm. Then A = ∑_{i=1}^{r} σi ui vi⊤ is an outer product SVD of A.
The proof of the Correctness of the Outer Product Form Algorithm is on the longer side and may distract from the
overall flow of this note, so it is left to the appendix.
The outer product form is connected to the compact SVD via the following calculation:
$$U_r \Sigma_r V_r^\top = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} \begin{bmatrix} v_1^\top \\ \vdots \\ v_r^\top \end{bmatrix} \qquad (15)$$
$$= \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \begin{bmatrix} \sigma_1 v_1^\top \\ \vdots \\ \sigma_r v_r^\top \end{bmatrix} \qquad (16)$$
$$= \sum_{i=1}^{r} \sigma_i u_i v_i^\top. \qquad (17)$$
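The same identity is easy to verify numerically; here is a sketch assuming NumPy, with a random (hypothetical) test matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))             # generic matrix, rank r = 3 with probability 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = len(s)

# Sum of rank-1 outer products sigma_i * u_i * v_i^T ...
outer_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))

# ... equals the compact product U_r Sigma_r V_r^T, which equals A.
print(np.allclose(outer_sum, U @ np.diag(s) @ Vt))   # True
print(np.allclose(outer_sum, A))                     # True
```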
• Full SVD:
– Weaknesses: It is the most computationally intensive to compute, either by computer or by hand. It re-
quires running Gram-Schmidt twice to find Um−r and Vn−r . Also Σ is non-square and thus not invertible.
– Strengths: The matrices U and V are square orthonormal, and thus invertible. There is also a characteri-
zation of null spaces of A⊤ and A as the column spaces of Um−r and Vn−r respectively.
• Compact SVD:
– Weaknesses: The matrices Ur and Vr are not square, so they do not have an inverse. Also, there is no
characterization of null spaces, as in the full SVD.
– Strengths: The matrix Σr is square and invertible. It is also easier to construct by hand than the full SVD.
• Outer Product Form SVD:
– Weaknesses: The summation notation is messy and sometimes tedious to work with. Also, there is no
characterization of null spaces, as in the full SVD.
– Strengths: It is the most computationally efficient to construct by computer, and also saves the most
memory. It is also easier to construct by hand than the full SVD.
When using the SVD to compute things, one useful rule of thumb is to use the compact SVD unless there is a need
to analyze null spaces, in which case the full SVD is essentially required.
First of all, recall that null spaces are subsets of input spaces. Thus, since A = UΣV⊤ , the input space of A must
match the dimension of the column vectors of V, so it makes sense that Null(A) is related to the column vectors of
V. Similarly A⊤ = VΣ⊤ U⊤ so Null(A⊤ ) makes sense to be related to the column vectors of U.
To explain the specific submatrices of V and U that describe the null spaces, let’s examine the outer product form of
SVD (for now, let’s just focus on the matrix A):
A = ∑_{i=1}^{r} σi ui vi⊤ (18)
Now, let’s see what happens if we multiply one of the "extra" v j vectors (which correspond to j = r + 1, . . . , n, the
column vectors of Vn−r ) by the matrix A:
Avj = ∑_{i=1}^{r} σi ui vi⊤ vj (19)
Since the vi are orthonormal, vi⊤vj = 0 whenever i ≤ r < j, so every term in the sum vanishes. Thus:
Av j = 0 (20)
for j = r + 1, . . . , n, which means that these v j are in Null(A).
By linearity, every linear combination of these vj is also in Null(A). By the rank-nullity theorem, since rank(A) = r and A
has n columns, dim(Null(A)) = n − r, which is exactly the number of column vectors in Vn−r ! Since we already
established that the column vectors of Vn−r are in Null(A), this implies that Col(Vn−r) = Null(A).
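This conclusion is easy to check numerically; here is a sketch assuming NumPy, with a hypothetical rank-deficient matrix.

```python
import numpy as np

# Hypothetical 3 x 3 matrix of rank r = 2 (third column = first + second).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
r = np.linalg.matrix_rank(A)                # r = 2

U, s, Vt = np.linalg.svd(A)
V = Vt.T
V_extra = V[:, r:]                          # columns of V_{n-r}

# Every "extra" right singular vector is sent to (numerically) zero by A,
# and there are exactly n - r of them, matching rank-nullity.
print(np.allclose(A @ V_extra, 0.0))        # True
print(V_extra.shape[1] == A.shape[1] - r)   # True
```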
Since Σr is diagonal with entries σ1 , . . . , σr , multiplying a vector by Σ stretches the first entry of the vector by σ1 ,
the second entry by σ2 , and so on.
Multiplying a vector x by A = UΣV⊤ can therefore be broken into three operations:
1. V⊤x, which rotates the vector without changing its length,
2. ΣV⊤x, which stretches the resulting vector along each axis by the corresponding singular value, and may
also collapse or add new dimensions,
3. UΣV⊤x, which again rotates the resulting vector without changing its length.
The following figure illustrates these three operations moving from the right to the left.
[Figure: the action of A = UΣV⊤ on the unit circle, read right to left: V⊤ sends v1 , v2 to the standard basis vectors e1 , e2 ; Σ scales these to σ1 e1 , σ2 e2 ; and U rotates the result to σ1 u1 , σ2 u2 .]
Here as usual e1 , e2 are the first and second standard basis vectors.
The geometric interpretation above reveals that σ1 is the largest amplification factor a vector can experience upon
multiplication by A. More specifically, if ∥x∥ ≤ 1 then ∥Ax∥ ≤ σ1 . We achieve equality at x = v1 , because then ∥Av1 ∥ = ∥σ1 u1 ∥ = σ1 .
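A quick numerical check of this amplification bound, as a sketch assuming NumPy; the matrix and the random unit vector are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))             # hypothetical test matrix

U, s, Vt = np.linalg.svd(A)
sigma1, v1 = s[0], Vt[0, :]                 # largest singular value and v_1

# No unit vector is amplified by more than sigma_1 ...
x = rng.standard_normal(3)
x /= np.linalg.norm(x)
print(np.linalg.norm(A @ x) <= sigma1 + 1e-12)       # True

# ... and v_1 attains the bound: ||A v_1|| = sigma_1.
print(np.isclose(np.linalg.norm(A @ v1), sigma1))    # True
```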
8.3 Examples
We can interpret each rank-1 matrix σi ui vi⊤ to be due to a particular attribute, e.g., comedy, action, sci-fi, or romance
content. Then σi determines how strongly the ratings depend on the ith attribute; the entries of vi⊤ score each movie
with respect to this attribute, and the entries of ui evaluate how much each viewer cares about this particular attribute.
Interestingly, attributes from the (r + 1)th onward don’t influence the ratings at all, according to our analysis. This means the
SVD can reveal which attributes actually matter in our model, allowing us to build simpler models (saving compute),
and also tells us whether we can actually use our model. To see why, suppose we have an attribute with a singular value of
0. The corresponding column and row in A don’t add any new information (since they lie beyond the rank of our
matrix), and actually make our data matrix collapse such that we can’t perform least squares regression.
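As an illustration of this attribute picture, here is a sketch assuming NumPy; the viewers, attributes, and all numbers below are entirely hypothetical. A ratings matrix built from two rank-1 attribute terms has exactly two nonzero singular values.

```python
import numpy as np

# Hypothetical example: 4 viewers x 5 movies, built from two rank-1 "attributes"
# (say, comedy and action). Each u says how much a viewer cares about the attribute;
# each v says how strongly a movie exhibits it.
u_comedy = np.array([1.0, 0.2, 0.9, 0.1])
v_comedy = np.array([0.9, 0.8, 0.1, 0.0, 0.3])
u_action = np.array([0.1, 1.0, 0.3, 0.8])
v_action = np.array([0.1, 0.2, 0.9, 1.0, 0.5])

ratings = 5.0 * np.outer(u_comedy, v_comedy) + 3.0 * np.outer(u_action, v_action)

# Only two singular values are (numerically) nonzero, so two attributes explain
# every rating. (They need not equal 5 and 3, since these u, v are not orthonormal.)
s = np.linalg.svd(ratings, compute_uv=False)
print(np.round(s, 4))
```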
λ1 = 32, λ2 = 18 (28)
Note that we can change the signs of u1 , u2 and they are still orthonormal eigenvectors that produce a valid SVD.
However, changing the sign of ui requires us to change the sign of vi = (1/σi ) A⊤ui , so the product ui vi⊤
remains unchanged.
Another source of non-uniqueness arises when we have repeated singular values, as seen in the next example.
$$v_1 = \frac{A^\top u_1}{\sigma_1} = \begin{bmatrix} \cos(\theta) \\ -\sin(\theta) \end{bmatrix}, \qquad v_2 = \frac{A^\top u_2}{\sigma_2} = \begin{bmatrix} -\sin(\theta) \\ -\cos(\theta) \end{bmatrix}. \qquad (36)$$
Thus an SVD is
$$A = U\Sigma V^\top = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ -\sin(\theta) & -\cos(\theta) \end{bmatrix} \qquad (37)$$
for any value of θ . Thus this matrix has infinitely many valid SVDs, one for each value of θ in the interval [0, 2π).
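We can spot-check this family of SVDs numerically, as a sketch assuming NumPy. Multiplying out the factors in equation (37) gives A = diag(1, −1), so that is the matrix used below.

```python
import numpy as np

A = np.diag([1.0, -1.0])        # the matrix whose SVDs are parameterized above

def svd_factors(theta):
    """U, Sigma, V^T from equation (37) for a given theta."""
    c, s = np.cos(theta), np.sin(theta)
    U = np.array([[c, -s],
                  [s,  c]])
    Sigma = np.eye(2)
    Vt = np.array([[ c, -s],
                   [-s, -c]])
    return U, Sigma, Vt

# Every theta gives orthonormal U, V and reproduces the same A.
for theta in [0.0, 0.7, 2.1]:
    U, Sigma, Vt = svd_factors(theta)
    assert np.allclose(U @ Sigma @ Vt, A)
    assert np.allclose(U.T @ U, np.eye(2))
    assert np.allclose(Vt @ Vt.T, np.eye(2))
print("all checks passed")
```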
The SVD is a very powerful method to do efficient and expressive data analysis. It can work with any matrix and
reveal the most important features in a data matrix. We will see more applications of this flavor in the next note.
Proof. We have
(A⊤ A)⊤ = (A)⊤ (A⊤ )⊤ = A⊤ A (38)
so (A⊤ A)⊤ = A⊤ A, and thus A⊤ A is symmetric.
From note 3, we know that rank(A) = rank(A⊤ A), so we have that rank(A⊤ A) = r.
Thus A⊤A has an (n − r)-dimensional null space, corresponding to an eigenvalue 0 with geometric multiplicity
$m^g_{A^\top A}(0) = n - r$. By the Spectral Theorem (Note 7), and the fact that A⊤A is symmetric, we know that A⊤A is diagonalizable. We know that for a diagonalizable matrix, the geometric and algebraic multiplicities of each eigenvalue
agree, i.e., $m^a_{A^\top A}(\lambda) = m^g_{A^\top A}(\lambda)$ for each eigenvalue λ of A⊤A. So $m^a_{A^\top A}(0) = n - r$. Since $\sum_\lambda m^a_{A^\top A}(\lambda) = n$,
this implies that A⊤A has r nonzero eigenvalues.
We now show that all nonzero eigenvalues of A⊤A are real and positive. Since A⊤A is symmetric, the Spectral
Theorem says that A⊤A has real eigenvalues. It remains to show that the nonzero eigenvalues are positive. Let λ be a nonzero eigenvalue of A⊤A with eigenvector v. Then
$$A^\top A v = \lambda v \qquad (40)$$
$$v^\top A^\top A v = \lambda v^\top v \qquad (41)$$
$$\langle Av, Av\rangle = \lambda \langle v, v\rangle \qquad (42)$$
$$\|Av\|^2 = \lambda \|v\|^2. \qquad (43)$$
We know that λ is nonzero, and v is nonzero so ∥v∥ > 0. Hence Av is nonzero (since A⊤Av = λv ≠ 0), so ∥Av∥2 > 0. Thus
$$\lambda = \frac{\|Av\|^2}{\|v\|^2} \qquad (44)$$
is positive.
Now let B := A⊤ and note that AA⊤ = (A⊤ )⊤ (A⊤ ) = B⊤ B. Thus applying the same calculation to the rank-r matrix
B = A⊤ obtains that AA⊤ is symmetric, that rank(AA⊤ ) = r, and that AA⊤ has exactly r nonzero eigenvalues, which
are real and positive.
Now we claim that {v1 , . . . , vr } is an orthonormal set. This is immediate from the Spectral Theorem: the
eigenvectors of A⊤A can be chosen to form an orthonormal set {v1 , . . . , vn }, so in particular {v1 , . . . , vr } is an orthonormal set.
For i ≠ j, we have
$$\langle u_i, u_j \rangle = \left\langle \frac{1}{\sigma_i} A v_i, \frac{1}{\sigma_j} A v_j \right\rangle \qquad (45)$$
$$= \frac{1}{\sigma_i \sigma_j} v_i^\top A^\top A v_j \qquad (46)$$
$$= \frac{1}{\sigma_i \sigma_j} \sigma_i^2 v_i^\top v_j \qquad (47)$$
$$= 0 \qquad (48)$$
by orthonormality of the vi .
Since {v1 , . . . , vn } is an orthonormal basis of Rn , we may let x = ∑_{i=1}^{n} αi vi for some constants α1 , . . . , αn ∈ R. The
eigenvectors corresponding to the 0 eigenvalues of A⊤A, that is, vr+1 , . . . , vn , are an orthonormal basis for
Null(A⊤A). We know that Null(A⊤A) = Null(A), so {vr+1 , . . . , vn } is an orthonormal basis for Null(A). Thus
(∑_{i=1}^{r} σi ui vi⊤) x = Ax. (62)
Thus A = ∑_{i=1}^{r} σi ui vi⊤ as desired.
(i) Follows from the equivalence of outer product and full SVDs: A = UΣV⊤ = ∑_{i=1}^{r} σi ui vi⊤ .
(ii) Follows from the equivalence of outer product and full SVDs: A = UΣV⊤ = ∑_{i=1}^{r} σi ui vi⊤ .
(iii) Since {u1 , . . . , um } is the orthonormal set of columns of U, we have that {u1 , . . . , ur } and {ur+1 , . . . , um }
are orthonormal bases for orthogonal subspaces. Since Span(u1 , . . . , ur ) = Col(A), we are left to show that
Null(A⊤ ) is exactly the set of vectors orthogonal to Col(A). However we know from note 3 that this is true.
(v) We have
(vii) Follows from the equivalence of compact and full SVDs: A = UΣV⊤ = Ur Σr Vr⊤ , so AVr = Ur Σr Vr⊤Vr = Ur Σr .