Besides the one given by (1.153), the SVD of A ∈ R^{m×n} with rank r has many other forms as follows:

A = U_n Σ_n V^T (if m ≥ n),  or  A = U Σ_m V_m^T (otherwise)    (1.158)
  = U_r Σ_r V_r^T = ∑_{i=1}^{r} σ_i u_i v_i^T    (1.159)
where Σ_n = Diag(σ_1, ..., σ_r, 0, ..., 0) ∈ S^n_+, Σ_m ∈ S^m_+, U_n ∈ R^{m×n} and V_m ∈ R^{n×m} are semi-unitary, and (1.159) is the thin SVD of A (i.e., a sum of r rank-1 matrices u_i v_i^T, each weighted by the associated singular value σ_i). Moreover, it is noticeable from (1.159) and (1.155) that

A v_i = σ_i u_i ∈ R(A), i = 1, ..., r  ⟹  R(A) = R(U_r) and N(A^T) = R(U′),
A^T u_i = σ_i v_i ∈ R(A^T), i = 1, ..., r  ⟹  R(A^T) = R(V_r) and N(A) = R(V′).    (1.160)
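As an illustrative numerical sketch (assuming NumPy is available), the thin SVD (1.159) and the relations in (1.160) can be checked on a randomly generated low-rank test matrix:

```python
import numpy as np

# Check the thin SVD (1.159) and the relations A v_i = sigma_i u_i,
# A^T u_i = sigma_i v_i of (1.160) on a random rank-deficient matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # 5x4, rank r = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)                 # thin SVD
r = np.sum(s > 1e-10 * s[0])                                     # numerical rank
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

# A equals the sum of r rank-1 terms sigma_i * u_i * v_i^T
assert np.allclose(A, Ur @ Sr @ Vr.T)

# A v_i = sigma_i u_i and A^T u_i = sigma_i v_i for i = 1, ..., r
assert np.allclose(A @ Vr, Ur * s[:r])
assert np.allclose(A.T @ Ur, Vr * s[:r])
```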
The thin SVD above is computationally more economical than the full SVD.
For instance, the pseudoinverse of a matrix A ∈ R^{m×n} with rank r is defined as

A† ≜ V_r Σ_r^{-1} U_r^T = ∑_{i=1}^{r} (1/σ_i) v_i u_i^T ∈ R^{n×m}.    (1.161)
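The formula (1.161) can likewise be verified numerically; the following sketch (NumPy assumed) builds A† from the thin SVD and compares it with NumPy's own pseudoinverse routine:

```python
import numpy as np

# Pseudoinverse (1.161) assembled from the thin SVD, compared with np.linalg.pinv.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))    # 6x4, rank 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-10 * s[0])

# A^dagger = V_r Sigma_r^{-1} U_r^T = sum_i (1/sigma_i) v_i u_i^T
A_pinv = (Vt[:r].T / s[:r]) @ U[:, :r].T

assert np.allclose(A_pinv, np.linalg.pinv(A))
```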
Moreover, when A ∈ Sn , its EVD and SVD are closely related to each other.
This can be simply inferred as follows:
A = ∑_{i=1}^{r} λ_i q_i q_i^T    (EVD of A ∈ S^n)
  = ∑_{i=1}^{r} |λ_i| q_i (sgn(λ_i) q_i)^T ∈ S^n    (SVD of A)    (1.166)
implying that σi = |λi | > 0 and the associated singular vector pair (ui , vi ) =
(qi , sgn(λi )qi ) for all i ∈ {1, . . . , r}. Hence, when A ∈ Sn+ , its SVD and EVD are
identical and so λi (A) = σi (A); moreover, A† ∈ Sn+ and R(A† ) = R(A) due to
A† = ∑_{i=1}^{r} (1/λ_i) q_i q_i^T  (if r < n),  A† = A^{-1}  (otherwise)    (by (1.161))    (1.167)
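A short sketch (NumPy assumed) illustrating (1.166) and (1.167) on randomly generated symmetric and positive semidefinite test matrices:

```python
import numpy as np

# For a symmetric matrix, singular values are the absolute eigenvalues (1.166);
# for a PSD matrix, the pseudoinverse inverts the nonzero eigenvalues (1.167).
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B + B.T                                   # symmetric, generally indefinite

lam = np.linalg.eigvalsh(A)                   # real eigenvalues
sig = np.linalg.svd(A, compute_uv=False)      # singular values
assert np.allclose(np.sort(np.abs(lam)), np.sort(sig))   # sigma_i = |lambda_i|

C = B @ B.T                                   # PSD (full rank here)
lam_c, Q = np.linalg.eigh(C)
C_pinv = (Q / lam_c) @ Q.T                    # sum_i (1/lambda_i) q_i q_i^T
assert np.allclose(C_pinv, np.linalg.pinv(C))
```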
(b) The pseudoinverse A† and the projection matrix P_{A^T} are as follows:

A† = V_r Σ_r^{-1} U_r^T = (1/11) [7, −4; −4, 7; 1, 1],   P_{A^T} = A† A = (1/11) [10, −1, 3; −1, 10, 3; 3, 3, 2].

Note that Tr(P_{A^T}) = rank(A) = 2.
(c) Let S be the solution set of Ax = b. Then it is nonempty due to b ∈
R(A) = R2 and given by
S = {A† b + αv | α ∈ R} ⊂ R3 (cf. (1.170))
where v is given by (1.168). Note that S is an affine set (a convex set to be
introduced in the next chapter), and it is also a subspace only when b = 02 .
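Parts (b) and (c) can be reproduced with a few lines of code; the sketch below (NumPy assumed) uses A = [2, 1, 1; 1, 2, 1] (the matrix of Example 1.5, quoted here as the transpose of the A given later in Example 1.6), the null-space vector v of (1.168), and an arbitrary b ∈ R^2 for part (c):

```python
import numpy as np

# Numerical check of Example 1.5, parts (b) and (c).
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0]])                      # 2x3, rank 2
A_pinv = np.linalg.pinv(A)
P_At = A_pinv @ A                                    # projection onto R(A^T)

assert np.allclose(11 * A_pinv, [[7, -4], [-4, 7], [1, 1]])
assert np.allclose(11 * P_At, [[10, -1, 3], [-1, 10, 3], [3, 3, 2]])
assert np.isclose(np.trace(P_At), 2.0)               # Tr(P_{A^T}) = rank(A)

# Part (c): every point of S = {A^dagger b + alpha v} solves Ax = b
b = np.array([1.0, 2.0])                             # any b in R^2 = R(A)
v = np.array([-1.0, -1.0, 3.0]) / np.sqrt(11.0)      # spans N(A), cf. (1.168)
for alpha in (-2.0, 0.0, 5.0):
    assert np.allclose(A @ (A_pinv @ b + alpha * v), b)
```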
The SVD has been widely used to solve a set of linear equations characterized
by the matrix A ∈ Rm×n as follows:
Ax = b. (1.169)
If (1.169) is solvable (i.e., b ∈ R(A)), there exists a variety of methods for solving it, such as the widely used Gaussian elimination method, which actually finds an m × m invertible matrix E through row eliminations such that EAx = rref(A)x = Eb (cf. Example 1.1), thus readily yielding the solutions for x. Provided that rank(A) = r, the matrix rref(A) has r pivot rows (which also
constitute a basis for R(AT )) and r pivot columns (with the unit pivot being
the only nonzero entry in each pivot column) together with the last m − r zero
rows [Str19]. Note that the r columns in A that lead to the r pivot columns in
rref(A) also form a basis of R(A) (cf. Example 1.2).
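As a small illustration of the rref discussion (assuming SymPy is available), the reduced row echelon form and its pivot columns can be computed for an arbitrary test matrix:

```python
import sympy

# rref(A): the pivot columns identify columns of A that form a basis of R(A),
# and the nonzero rows of rref(A) span R(A^T).
A = sympy.Matrix([[1, 2, 3],
                  [2, 4, 6],
                  [1, 0, 1]])                 # rank 2

R, pivot_cols = A.rref()                      # reduced row echelon form
print(R)            # last m - r = 1 row is zero
print(pivot_cols)   # (0, 1): the first two columns of A form a basis of R(A)
```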
The solution of (1.169), denoted as x̂, exists only when b ∈ R(A) and it is given by

x̂ = A† b + v,  v ∈ N(A)    (1.170)

which is unique only when N(A) = {0_n} (i.e., A must be of full column rank). Note that A† b ∈ R(A^T) due to (1.161) and (1.160), and thus v^T A† b = 0. Otherwise, the solution of (1.169) does not exist since

A x̂ = A A† b = P_A b ≠ b  if b ∉ R(A)  (cf. (1.103) and Remark 1.13).    (1.171)
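A brief numerical sketch (NumPy assumed) of (1.170) and (1.171), using a randomly generated rank-deficient A and a right-hand side constructed to lie in R(A):

```python
import numpy as np

# Solvability check via P_A b, and solutions x = A^dagger b + v with v in N(A).
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 4))   # 3x4, rank 2
b = A @ rng.standard_normal(4)                                  # b in R(A)

A_pinv = np.linalg.pinv(A)
assert np.allclose(A @ (A_pinv @ b), b)        # P_A b = b, so (1.169) is solvable

# N(A) is spanned by the right singular vectors beyond the rank r
U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10 * s[0])
N = Vt[r:].T                                   # 4 x (n - r) basis of N(A)
x_hat = A_pinv @ b + N @ rng.standard_normal(N.shape[1])
assert np.allclose(A @ x_hat, b)               # still a solution of Ax = b
```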
Suppose that A ∈ Rm×n with rank(A) = r and the thin SVD of A is given by
(1.159). Let X_ℓ ∈ R^{m×n} with rank(X_ℓ) ≤ ℓ denote the optimal low-rank approximation to the matrix A ∈ R^{m×n} obtained by minimizing ‖X − A‖_F^2, which is widely known to be

X_ℓ = arg min_{rank(X) ≤ ℓ} ‖X − A‖_F^2 = ∑_{i=1}^{ℓ} σ_i u_i v_i^T    (1.172)
(which will also be proven via the use of a convex optimization condition and the EVD; cf. (4.61), (4.62), and Remark 4.9 in Chapter 4), and the associated approximation error is given by

ρ_ℓ = ‖X_ℓ − A‖_F^2 = ∑_{i=ℓ+1}^{r} σ_i^2.    (1.173)
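The best rank-ℓ approximation (1.172) and its error (1.173) can be checked numerically; a sketch (NumPy assumed) on a random test matrix:

```python
import numpy as np

# Truncated SVD as the best rank-l approximation; the squared Frobenius error
# equals the sum of the squared discarded singular values.
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

l = 3
X_l = U[:, :l] @ np.diag(s[:l]) @ Vt[:l]          # sum of the l largest terms
rho_l = np.linalg.norm(X_l - A, 'fro') ** 2

assert np.isclose(rho_l, np.sum(s[l:] ** 2))      # rho_l = sum_{i>l} sigma_i^2
```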
Remark 1.19 (Hermitian and Unitary matrices) The complex inner product of two vectors u, v ∈ C^n is defined as u^H v, and u and v are orthogonal if u^H v = 0. For any matrix X = X^H ∈ H^n with eigenpairs (λ_i ∈ R, q_i ∈ C^n), i = 1, ..., n (where λ_i ∈ R is due to (q_i^H X q_i)^H = q_i^H X q_i = λ_i q_i^H q_i = λ_i ‖q_i‖_2^2 ∈ R), its EVD is given by

X = Q Λ Q^H,    (1.174)
where Q = [q_1, ..., q_n] ∈ C^{n×n} is a unitary matrix for which

Q^H Q = Q Q^H = I_n,    (1.175)

and Λ = Diag(λ_1, ..., λ_n). Note that, for any unitary matrix Q,

‖Q z‖_2 = ‖z‖_2  ∀ z ∈ C^n    (1.176)

due to z^H Q^H Q z = z^H z = ‖z‖_2^2.
For a complex matrix A ∈ Cm×n with rank(A) = r and singular values
{σ1 , . . . , σr } ⊂ R++ , its SVD is given by
A = U Σ V^H = U_r Σ_r V_r^H,    (1.177)
where both U ∈ Cm×m and V ∈ Cn×n are unitary matrices, and Ur and Vr are
matrices containing the first r columns of U and V, respectively, and matrices
Σ and Σr are also given by (1.156) and (1.157), respectively.
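A short numerical sketch (NumPy assumed) of Remark 1.19, using a randomly generated Hermitian matrix and a complex matrix:

```python
import numpy as np

# Hermitian EVD with real spectrum and unitary Q (1.174)-(1.176), and the
# complex SVD (1.177) with unitary U and V.
rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
X = (B + B.conj().T) / 2                           # Hermitian: X = X^H

lam, Q = np.linalg.eigh(X)                         # real eigenvalues, unitary Q
assert np.allclose(Q.conj().T @ Q, np.eye(4))      # Q^H Q = I_n  (1.175)
assert np.allclose(Q @ np.diag(lam) @ Q.conj().T, X)

z = rng.standard_normal(4) + 1j * rng.standard_normal(4)
assert np.isclose(np.linalg.norm(Q @ z), np.linalg.norm(z))   # (1.176)

U, s, Vh = np.linalg.svd(B)                        # complex SVD (1.177)
assert np.allclose(U @ np.diag(s) @ Vh, B)
```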
Consider the LS (least squares) approximation problem for the linear model

b = A x + ε ∈ R^m    (1.178)

where A ∈ R^{m×n} is the given system matrix, b is the given data vector, and ε ∈ R^m is the measurement noise vector. The LS problem is to find an optimal x ∈ R^n by minimizing ‖Ax − b‖_2^2. The LS solution, denoted as x_LS, is known to be

x_LS ≜ arg min_{x ∈ R^n} ‖Ax − b‖_2^2 = A† b + v,  v ∈ N(A)    (1.179)
Example 1.6 (LS-approximation) Provided that the system matrix and the
data vector in the linear model (1.178) are given by
A = [2, 1; 1, 2; 1, 1]

and b = 1_3 = (1, 1, 1), respectively, find the LS solution x_LS and the associated LS error ρ_LS = ‖A x_LS − b‖_2^2.
Solution: Because the matrix A is of full column rank and is the transpose of the matrix A in Example 1.5, and b ∉ R(A) since P_A b = P_{A^T} 1_3 = (4/11)(3, 3, 2) ≠ 1_3 (with P_{A^T} as obtained in Example 1.5), one can directly obtain

A† = (1/11) [7, −4, 1; −4, 7, 1]  (the transpose of the A† in Example 1.5)  ⟹  x_LS = A† 1_3 = (1/11)(4, 4)

P⊥_A = I_3 − P_{A^T} = (1/11) [1, 1, −3; 1, 1, −3; −3, −3, 9]  ⟹  ρ_LS = ‖P⊥_A b‖_2^2 = (1_3^T v)^2 = 1/11

where v = [−1, −1, 3]^T/√11 is given by (1.168) and spans R(A)^⊥ = N(A^T) (i.e., the null space of the A in Example 1.5). Note that Tr(P⊥_A) = 1.
In other words, A x_LS is the image vector of b projected onto the range space R(A), and so the LS error (i.e., projection error) vector ε_LS ∈ R(A)^⊥ = N(A^T) is nonzero if b ∉ R(A), as in the instance presented in Example 1.6.
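The quantities computed in Example 1.6 can be verified with a few lines of code (a sketch assuming NumPy):

```python
import numpy as np

# Numerical check of Example 1.6: the LS solution and the LS error rho_LS = 1/11.
A = np.array([[2.0, 1.0],
              [1.0, 2.0],
              [1.0, 1.0]])
b = np.ones(3)

A_pinv = np.linalg.pinv(A)
x_ls = A_pinv @ b
assert np.allclose(11 * x_ls, [4, 4])                       # x_LS = (4/11)(1, 1)

P_A = A @ A_pinv                                            # projection onto R(A)
rho_ls = np.linalg.norm((np.eye(3) - P_A) @ b) ** 2
assert np.isclose(rho_ls, 1.0 / 11.0)                       # LS error

# np.linalg.lstsq returns the same minimizer of ||Ax - b||_2^2
x_np, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_np, x_ls)
```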
For the under-determined case (m < n), suppose that A is of full row rank
(i.e., rank(A) = m < n and dim(N (A)) = n − m > 0), and thus
A† = A^T (A A^T)^{-1}  ⟹  P_A = A A† = I_m (i.e., R(A) = R^m)  and  P_{A^T} = A† A = A^T (A A^T)^{-1} A.    (1.183)
It can be observed that the expression of A† for the overdetermined case (cf. (1.181)) and that for the underdetermined case (cf. (1.183)) are related in form: the matrix inverse operator (·)^{−1} applies to the product of the first (last) two elements of the finite sequence {A^T, A, A^T} for the former (latter). Then the optimal x_LS given by (1.179) is not unique, and

x_LS = A† b = A^T (A A^T)^{-1} b ∈ R(A^T)    (1.184)
is called the minimum-norm solution, since ‖x_LS + v‖_2 = (‖x_LS‖_2^2 + ‖v‖_2^2)^{1/2} > ‖x_LS‖_2 for any v ∈ N(A) \ {0_n}. Note that A x_LS = b ∈ R(A) = R^m due to
ρLS = 0 for this case, implying that every xLS is also a solution of the linear
equations Ax = b, as in the instance presented in part (c) of Example 1.5. There-
fore, the closed-form minimum-norm solution xLS given by (1.184) can also be
expressed as
x_LS = arg min_{x ∈ R^n} { ‖x‖_2^2 | Ax = b },    (1.185)
which is exactly the convex optimization problem in Example 9.1, for which the optimal solution x_LS can be obtained by solving the associated KKT conditions.
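A final numerical sketch (NumPy assumed) of the minimum-norm solution (1.184)-(1.185), using a randomly generated full-row-rank underdetermined system:

```python
import numpy as np

# Minimum-norm solution A^T (A A^T)^{-1} b of an underdetermined system Ax = b:
# it lies in R(A^T) and has the smallest norm among all solutions.
rng = np.random.default_rng(6)
A = rng.standard_normal((3, 5))                    # m = 3 < n = 5, full row rank
b = rng.standard_normal(3)

x_mn = A.T @ np.linalg.solve(A @ A.T, b)           # A^T (A A^T)^{-1} b
assert np.allclose(A @ x_mn, b)                    # exact solution (rho_LS = 0)
assert np.allclose(x_mn, np.linalg.pinv(A) @ b)    # equals A^dagger b

# Any other solution x_mn + v, v in N(A)\{0}, has strictly larger norm
_, _, Vt = np.linalg.svd(A)
v = Vt[3:].T @ rng.standard_normal(2)              # nonzero null-space vector
assert np.linalg.norm(x_mn + v) > np.linalg.norm(x_mn)
```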
However, if the system model is not linear, closed-form LS solutions usually do
not exist.