

So A = QΛQT is a PD matrix, where Q = [q1 , q2 , q3 ] is an orthogonal matrix,


and Λ = Diag(λ_1, λ_2, λ_3) = Diag(8, 2, 2). Hence,

B = QΛ^{1/2}Q^T = \frac{1}{3\sqrt{2}} \begin{bmatrix} 8 & 2 & 2 \\ 2 & 8 & 2 \\ 2 & 2 & 8 \end{bmatrix}

where Λ^{1/2} = Diag(\sqrt{λ_1}, \sqrt{λ_2}, \sqrt{λ_3}) = Diag(2\sqrt{2}, \sqrt{2}, \sqrt{2}). □

1.2.7 Singular value decomposition


The singular value decomposition (SVD) is an important factorization of a rectangular real or complex matrix, with many applications in signal processing and communications. Applications that employ the SVD include computation of the pseudoinverse, least-squares fitting of data, matrix approximation, and determination of the rank, range space, and null space of a matrix.
Let A ∈ R^{m×n} with rank(A) = r. The SVD of A is expressed in the form

A = UΣV^T.    (1.153)

In the full SVD (1.153) for A,

U = [U_r, U′] ∈ R^{m×m}
V = [V_r, V′] ∈ R^{n×n}    (1.154)

are orthogonal matrices in which both U_r = [u_1, . . . , u_r] (consisting of r left singular vectors) and V_r = [v_1, . . . , v_r] (consisting of r right singular vectors) are semi-unitary matrices (cf. (1.105)) due to

U_r^T U_r = V_r^T V_r = I_r,    (1.155)
and the singular value matrix of A is given by

Σ = \begin{bmatrix} Σ_r & 0_{r×(n−r)} \\ 0_{(m−r)×r} & 0_{(m−r)×(n−r)} \end{bmatrix} ∈ R^{m×n}    (1.156)

where

Σ_r = Diag(σ_1, . . . , σ_r) ∈ S^r_{++}    (1.157)

is a diagonal matrix containing the r positive singular values σ_i, assumed to be arranged in nonincreasing order. Note that Σ_r is invertible even though Σ is not a square matrix when m ≠ n, and that R(U_r) and R(U′) are orthogonal complements in R^m, as are R(V_r) and R(V′) in R^n.

Besides the one given by (1.153), the SVD of A ∈ R^{m×n} with rank r has many other forms as follows:

A = \begin{cases} U_n Σ_n V^T, & \text{if } m ≥ n \\ U Σ_m V_m^T, & \text{otherwise} \end{cases}    (1.158)
  = U_r Σ_r V_r^T = \sum_{i=1}^{r} σ_i u_i v_i^T    (1.159)

where Σ_n = Diag(σ_1, . . . , σ_r, 0, . . . , 0) ∈ S^n_+, Σ_m ∈ S^m_+, U_n ∈ R^{m×n} and V_m ∈ R^{n×m} are semi-unitary, and (1.159) is the thin SVD of A (i.e., a sum of r rank-1 matrices u_i v_i^T weighted by the associated singular values σ_i). Moreover, it is noticeable from (1.159) and (1.155) that

Av_i = σ_i u_i ∈ R(A), i = 1, . . . , r  =⇒  R(A) = R(U_r),  N(A^T) = R(U′)
A^T u_i = σ_i v_i ∈ R(A^T), i = 1, . . . , r  =⇒  R(A^T) = R(V_r),  N(A) = R(V′).    (1.160)
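To make (1.155), (1.159), and (1.160) concrete, here is a minimal NumPy sketch; the 4 × 3 matrix is a hypothetical rank-2 example chosen purely for illustration.

```python
import numpy as np

# Hypothetical rank-2 matrix (row 2 = 2 x row 1, row 1 = row 3 + 2 x row 4).
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # economy-size SVD
r = int(np.sum(s > 1e-10))                        # numerical rank (here r = 2)
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

# Thin SVD (1.159): A equals the sum of r rank-1 terms sigma_i u_i v_i^T.
print(np.allclose(A, Ur @ Sr @ Vr.T))                                 # True
# Semi-unitarity (1.155): U_r^T U_r = V_r^T V_r = I_r.
print(np.allclose(Ur.T @ Ur, np.eye(r)), np.allclose(Vr.T @ Vr, np.eye(r)))
```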

The thin SVD above is computationally more economical than the full SVD.
For instance, the pseudoinverse of a matrix A ∈ Rm×n with rank r is defined as
A^† ≜ V_r Σ_r^{-1} U_r^T = \sum_{i=1}^{r} \frac{1}{σ_i} v_i u_i^T ∈ R^{n×m}.    (1.161)

Thus (A^†)^T = (A^T)^† holds true, and

AA^† = U_r U_r^T = P_{U_r} = P_A
A^† A = V_r V_r^T = P_{V_r} = P_{A^T}    (by (1.100) and (1.160))    (1.162)

are also orthogonal projectors onto R(A) and R(A^T) (cf. (1.160)), respectively.
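The pseudoinverse (1.161) and the projectors (1.162) can be checked numerically in the same way; the sketch below reuses the same hypothetical rank-2 matrix and compares against NumPy's built-in pinv.

```python
import numpy as np

A = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.], [0., 1., 1.]])  # rank 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

A_pinv = Vr @ np.linalg.inv(Sr) @ Ur.T           # (1.161)
print(np.allclose(A_pinv, np.linalg.pinv(A)))    # matches NumPy's pseudoinverse

P_A, P_At = A @ A_pinv, A_pinv @ A               # (1.162)
print(np.allclose(P_A, Ur @ Ur.T), np.allclose(P_At, Vr @ Vr.T))
print(np.allclose(P_A @ P_A, P_A))               # orthogonal projectors are idempotent
```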
It can be easily seen from (1.160) that

AA^T u_i = σ_i^2 u_i
A^T A v_i = σ_i^2 v_i    =⇒  σ_i(A) = \sqrt{λ_i(A^T A)} = \sqrt{λ_i(AA^T)}    (1.163)

provided that both the singular values σ_i and the eigenvalues λ_i are in nonincreasing order. Meanwhile, (1.163) also implies that for σ_i(A) > 0, the ith right singular vector of A and the ith eigenvector of A^T A are also identical, and so are the ith left singular vector of A and the ith eigenvector of AA^T. It can be seen, from (1.163), (1.136) and (1.134), that

rank(A^T A) = rank(AA^T) = rank(A)    (1.164)
‖A‖_F^2 = Tr(AA^T) = \sum_{i=1}^{rank(A)} σ_i^2(A).    (1.165)
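A quick numerical check of (1.163)-(1.165), using a randomly generated matrix purely for illustration, might look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                    # generic, so rank(A) = 3

s = np.linalg.svd(A, compute_uv=False)             # singular values, nonincreasing
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # eigenvalues of A^T A, nonincreasing

print(np.allclose(s, np.sqrt(lam)))                                 # (1.163)
print(np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A))   # (1.164)
print(np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(s**2)))        # (1.165)
```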

Moreover, when A ∈ S^n, its EVD and SVD are closely related to each other. This can be simply inferred as follows:

A = \sum_{i=1}^{r} λ_i q_i q_i^T    (EVD of A ∈ S^n)
  = \sum_{i=1}^{r} |λ_i| q_i (sgn(λ_i) q_i)^T ∈ S^n    (SVD of A)    (1.166)

implying that σ_i = |λ_i| > 0 and the associated singular vector pair (u_i, v_i) = (q_i, sgn(λ_i) q_i) for all i ∈ {1, . . . , r}. Hence, when A ∈ S^n_+, its SVD and EVD are identical and so λ_i(A) = σ_i(A); moreover, A^† ∈ S^n_+ and R(A^†) = R(A) due to

A^† = \begin{cases} \sum_{i=1}^{r} \frac{1}{λ_i} q_i q_i^T, & \text{if } r < n \\ A^{-1}, & \text{otherwise} \end{cases}    (by (1.161))    (1.167)
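The relation (1.166) is easy to confirm numerically; the symmetric indefinite matrix below is an arbitrary illustrative choice.

```python
import numpy as np

A = np.array([[ 2., 1., 0.],
              [ 1., -3., 1.],
              [ 0., 1., 1.]])                      # symmetric but indefinite

lam = np.linalg.eigvalsh(A)                        # real eigenvalues (ascending)
s = np.linalg.svd(A, compute_uv=False)             # singular values (nonincreasing)

print(np.allclose(np.sort(np.abs(lam))[::-1], s))  # sigma_i = |lambda_i|, cf. (1.166)
```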

Example 1.5 (SVD) Let

A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \end{bmatrix}.

(a) Represent A in the SVD form, (b) find the pseudoinverse A^† and the projection matrix P_{A^T}, and (c) find the solutions to the linear equations Ax = b, where b ∈ R^2.
Solution: (a) One can obtain the two eigenpairs of AA^T as

(λ_1 = 11, u_1 = [1, 1]^T/\sqrt{2})
(λ_2 = 1, u_2 = [1, −1]^T/\sqrt{2}),

implying that r = rank(A) = 2, the semi-unitary matrix U_r = [u_1, u_2], and the two singular values σ_1 = \sqrt{λ_1} = \sqrt{11} and σ_2 = \sqrt{λ_2} = 1 together with Σ_r = Diag(σ_1, σ_2). Then we can obtain

v_1 = A^T u_1/σ_1 = \frac{1}{\sqrt{22}} \begin{bmatrix} 3 \\ 3 \\ 2 \end{bmatrix},  v_2 = A^T u_2/σ_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ −1 \\ 0 \end{bmatrix}

and the semi-unitary matrix V_r = [v_1, v_2]. Hence, the thin SVD of A is given by

A = U_r Σ_r V_r^T
  = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & −1/\sqrt{2} \end{bmatrix}
    \begin{bmatrix} \sqrt{11} & 0 \\ 0 & 1 \end{bmatrix}
    \begin{bmatrix} 3/\sqrt{22} & 3/\sqrt{22} & 2/\sqrt{22} \\ 1/\sqrt{2} & −1/\sqrt{2} & 0 \end{bmatrix}.

Alternatively, A = UΣV^T where U = U_r, Σ = [Σ_r 0_2] ∈ R^{2×3}, and V = [V_r v], in which

v = [−1, −1, 3]^T/\sqrt{11} ∈ N(A) (by (1.160)).    (1.168)

(b) The pseudoinverse A^† and the projection matrix P_{A^T} are as follows:

A^† = V_r Σ_r^{-1} U_r^T = \frac{1}{11} \begin{bmatrix} 7 & −4 \\ −4 & 7 \\ 1 & 1 \end{bmatrix},  P_{A^T} = A^† A = \frac{1}{11} \begin{bmatrix} 10 & −1 & 3 \\ −1 & 10 & 3 \\ 3 & 3 & 2 \end{bmatrix}.

Note that Tr(P_{A^T}) = rank(A) = 2.
(c) Let S be the solution set of Ax = b. Then it is nonempty due to b ∈ R(A) = R^2 and given by

S = {A^† b + αv | α ∈ R} ⊂ R^3 (cf. (1.170))

where v is given by (1.168). Note that S is an affine set (a convex set to be introduced in the next chapter), and it is also a subspace only when b = 0_2. □
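The numbers in Example 1.5 can be reproduced with a short NumPy verification sketch; the particular b used in part (c) below is an arbitrary choice, since R(A) = R^2.

```python
import numpy as np

A = np.array([[2., 1., 1.],
              [1., 2., 1.]])

s = np.linalg.svd(A, compute_uv=False)
print(np.allclose(s**2, [11., 1.]))                # sigma_1 = sqrt(11), sigma_2 = 1

A_pinv = np.linalg.pinv(A)
print(np.allclose(11 * A_pinv, [[7., -4.], [-4., 7.], [1., 1.]]))               # part (b)
print(np.allclose(11 * (A_pinv @ A),
                  [[10., -1., 3.], [-1., 10., 3.], [3., 3., 2.]]))              # P_{A^T}

# Part (c): x = A_pinv b + alpha * v solves A x = b for every alpha.
b = np.array([1., 2.])
v = np.array([-1., -1., 3.]) / np.sqrt(11)         # spans N(A), cf. (1.168)
print(all(np.allclose(A @ (A_pinv @ b + a * v), b) for a in (0.0, -2.0, 1.5)))
```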

The SVD has been widely used to solve a set of linear equations characterized
by the matrix A ∈ Rm×n as follows:
Ax = b. (1.169)
If (1.169) is solvable (i.e., b ∈ R(A)), there exists a variety of methods for solving it, such as the widely used Gaussian elimination method, which actually finds an m × m invertible matrix E through row eliminations such that EAx = rref(A)x = Eb (cf. Example 1.1), thus readily yielding the solutions for x. Provided that rank(A) = r, the matrix rref(A) has r pivot rows (which also constitute a basis for R(A^T)) and r pivot columns (with the unit pivot being the only nonzero entry in each pivot column), together with the last m − r zero rows [Str19]. Note that the r columns of A that lead to the r pivot columns in rref(A) also form a basis of R(A) (cf. Example 1.2).
The solution of (1.169), denoted as x̂, exists only when b ∈ R(A) and it is given by

x̂ = A^† b + v,  v ∈ N(A)    (1.170)

which is unique only when N(A) = {0_n} (i.e., A must be of full column rank). Note that A^† b ∈ R(A^T) due to (1.161) and (1.160), and thus v^T A^† b = 0. Otherwise, the solution of (1.169) does not exist since

Ax̂ = AA^† b = P_A b ≠ b if b ∉ R(A) (cf. (1.103) and Remark 1.13).    (1.171)

Suppose that A ∈ R^{m×n} with rank(A) = r and the thin SVD of A is given by (1.159). Let X_ℓ ∈ R^{m×n} with rank(X_ℓ) ≤ ℓ denote the optimal low-rank approximation to the matrix A obtained by minimizing ‖X − A‖_F^2, which is widely known to be

X_ℓ = arg min_{rank(X)≤ℓ} ‖X − A‖_F^2 = \sum_{i=1}^{ℓ} σ_i u_i v_i^T    (1.172)

(which will also be proven via the use of a convex optimization condition and EVD; cf. (4.61), (4.62) and Remark 4.9 in Chapter 4), and the associated approximation error is given by

ρ_ℓ = ‖X_ℓ − A‖_F^2 = \sum_{i=ℓ+1}^{r} σ_i^2,    (1.173)

which will be zero when ℓ ≥ r. This is also an example illustrating least-squares approximation via SVD, which is widely used in various applications in science and engineering. A numerical sketch of this low-rank approximation is given below, and some more introduction to LS approximation is given in the next subsection. We then conclude this subsection with the following remark about the EVD and SVD for complex matrices.
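The following sketch illustrates (1.172) and (1.173) on a randomly generated matrix (chosen only for illustration): the best rank-ℓ approximation keeps the ℓ largest singular values, and the squared Frobenius error equals the sum of the discarded σ_i^2.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

l = 2                                              # target rank
X_l = U[:, :l] @ np.diag(s[:l]) @ Vt[:l, :]        # keep the l largest singular values

err = np.linalg.norm(X_l - A, 'fro')**2
print(np.isclose(err, np.sum(s[l:]**2)))           # rho_l = sum_{i>l} sigma_i^2, cf. (1.173)
```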

Remark 1.19 (Hermitian and unitary matrices) The complex inner product of two vectors u, v ∈ C^n is defined as u^H v; they are orthogonal if u^H v = 0. For any matrix X = X^H ∈ H^n with eigenpairs (λ_i ∈ R, q_i ∈ C^n), i = 1, . . . , n (where λ_i ∈ R is due to (q_i^H X q_i)^H = q_i^H X q_i = λ_i q_i^H q_i = λ_i ‖q_i‖_2^2 ∈ R), its EVD is given by

X = QΛQ^H,    (1.174)

where Q = [q_1, . . . , q_n] ∈ C^{n×n} is a unitary matrix for which

Q^H Q = QQ^H = I_n,    (1.175)

and Λ = Diag(λ_1, . . . , λ_n). Note that, for any unitary matrix Q,

‖Qz‖_2 = ‖z‖_2  ∀ z ∈ C^n    (1.176)

due to z^H Q^H Q z = z^H z = ‖z‖_2^2.
For a complex matrix A ∈ Cm×n with rank(A) = r and singular values
{σ1 , . . . , σr } ⊂ R++ , its SVD is given by
A = UΣVH = Ur Σr VrH , (1.177)
where both U ∈ Cm×m and V ∈ Cn×n are unitary matrices, and Ur and Vr are
matrices containing the first r columns of U and V, respectively, and matrices
Σ and Σr are also given by (1.156) and (1.157), respectively. 
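As a small illustration of (1.177), note that NumPy's SVD routine returns V^H rather than V for complex inputs; the random matrix below is used only for this check.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))

U, s, Vh = np.linalg.svd(A, full_matrices=False)   # note: Vh is V^H, not V
print(np.allclose(A, U @ np.diag(s) @ Vh))         # A = U Sigma V^H, cf. (1.177)
print(np.allclose(U.conj().T @ U, np.eye(3)))      # U^H U = I
print(np.isrealobj(s), np.all(s >= 0))             # singular values are real and nonnegative
```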

1.2.8 Least-squares (LS) approximation


The method of least squares is extensively used to approximately solve for the
unknown variables of a linear system with a given set of noisy measurements.
LS approximation can be interpreted as a method of data fitting. The best fit between modeled and observed data, in the LS sense, is the one for which the sum of squared residuals attains its smallest value, where a residual is the difference between an observed value and the value computed from the model.
Consider a system characterized by a set of linear equations,
b = Ax + ε,    (1.178)

where A ∈ R^{m×n} is the given system matrix, b is the given data vector, and ε ∈ R^m is the measurement noise vector. The LS problem is to find an optimal x ∈ R^n by minimizing ‖Ax − b‖_2^2. The LS solution, denoted as x_LS, is given by

x_LS ≜ arg min_{x∈R^n} ‖Ax − b‖_2^2 = A^† b + v,  v ∈ N(A)    (1.179)

which is actually an unconstrained optimization problem and the solution (with the same form as the solution (1.170) to the linear equations (1.169)) may not be unique. The resulting LS error is given by

ρ_LS = ‖ε_LS‖_2^2 ≜ ‖Ax_LS − b‖_2^2 = ‖P_A^⊥ b‖_2^2 (by (1.171)).    (1.180)
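A minimal NumPy sketch of (1.178)-(1.180) follows; the problem sizes, the true parameter vector, and the noise level are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 4))                   # tall matrix, full column rank a.s.
x_true = np.array([1., -2., 0.5, 3.])
b = A @ x_true + 0.05 * rng.standard_normal(20)    # noisy measurements, cf. (1.178)

x_ls = np.linalg.pinv(A) @ b                       # A^dagger b, cf. (1.179)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]     # same minimizer via lstsq
print(np.allclose(x_ls, x_lstsq))

P_A_perp = np.eye(20) - A @ np.linalg.pinv(A)      # projector onto R(A)^perp
rho_ls = np.linalg.norm(A @ x_ls - b)**2
print(np.isclose(rho_ls, np.linalg.norm(P_A_perp @ b)**2))   # (1.180)
```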

Example 1.6 (LS approximation) Provided that the system matrix and the data vector in the linear model (1.178) are given by

A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ 1 & 1 \end{bmatrix}

and b = 1_3 = (1, 1, 1), respectively, find the LS solution x_LS and the associated LS error ρ_LS = ‖Ax_LS − b‖_2^2.
Solution: Because the matrix A is of full column rank and is the transpose of the matrix A in Example 1.5, and b ∉ R(A) since P_A b = P_{A^T} 1_3 = \frac{4}{11}(3, 3, 2) ≠ 1_3 (with P_{A^T} as found in Example 1.5), one can directly obtain, using (A^†)^T = (A^T)^† and the pseudoinverse found there,

A^† = \frac{1}{11} \begin{bmatrix} 7 & −4 & 1 \\ −4 & 7 & 1 \end{bmatrix}  =⇒  x_LS = A^† 1_3 = \frac{1}{11} \begin{bmatrix} 4 \\ 4 \end{bmatrix}

P_A^⊥ = I_3 − P_{A^T} = \frac{1}{11} \begin{bmatrix} 1 & 1 & −3 \\ 1 & 1 & −3 \\ −3 & −3 & 9 \end{bmatrix}  =⇒  ρ_LS = ‖P_A^⊥ b‖_2^2 = (1_3^T v)^2 = \frac{1}{11}

where v = [−1, −1, 3]^T/\sqrt{11}, given by (1.168), spans N(A^T) = R(A)^⊥. Note that Tr(P_A^⊥) = 1. □
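The quantities in Example 1.6 can be verified numerically as follows (verification sketch only).

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.],
              [1., 1.]])
b = np.ones(3)

x_ls = np.linalg.pinv(A) @ b
print(np.allclose(x_ls, np.array([4., 4.]) / 11))            # x_LS = (1/11)[4, 4]^T

P_A_perp = np.eye(3) - A @ np.linalg.pinv(A)
print(np.isclose(np.linalg.norm(P_A_perp @ b)**2, 1 / 11))   # rho_LS = 1/11
print(np.isclose(np.trace(P_A_perp), 1.0))                   # Tr(P_A^perp) = 1
```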

When m > n, the system (1.178) is an over-determined system (i.e., more equations than unknowns), while for m < n it is an under-determined system (i.e., more unknowns than equations). Suppose that A is of full column rank (i.e., rank(A) = n < m and N(A) = {0_n}) for the over-determined case, and thus

A^† = (A^T A)^{-1} A^T  =⇒  P_A = AA^† = A(A^T A)^{-1} A^T,
                            P_{A^T} = A^† A = I_n (i.e., R(A^T) = R^n)    (cf. (1.162)).    (1.181)

Then the optimal x_LS = A^† b is unique with the LS error

ρ_LS = ‖P_A^⊥ b‖_2^2 = ‖(I_m − A(A^T A)^{-1} A^T) b‖_2^2 (by (1.180)).    (1.182)

In other words, Ax_LS is the projection of b onto the range space R(A), and so the LS error (i.e., projection error) vector ε_LS ∈ R(A)^⊥ = N(A^T) is nonzero if b ∉ R(A), as in the instance presented in Example 1.6.
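For the full-column-rank case, (1.181) and (1.182) can be checked against the SVD-based pseudoinverse on randomly generated data (illustrative only).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))                     # tall, full column rank a.s.
b = rng.standard_normal(8)

A_dag = np.linalg.inv(A.T @ A) @ A.T                # (1.181)
print(np.allclose(A_dag, np.linalg.pinv(A)))        # agrees with the SVD-based pseudoinverse
print(np.allclose(A_dag @ A, np.eye(3)))            # P_{A^T} = A^dagger A = I_n

P_A = A @ A_dag                                     # projector onto R(A)
rho = np.linalg.norm((np.eye(8) - P_A) @ b)**2      # (1.182)
print(np.isclose(rho, np.linalg.norm(A @ (A_dag @ b) - b)**2))
```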
For the under-determined case (m < n), suppose that A is of full row rank (i.e., rank(A) = m < n and dim(N(A)) = n − m > 0), and thus

A^† = A^T (AA^T)^{-1}  =⇒  P_A = AA^† = I_m (i.e., R(A) = R^m),
                            P_{A^T} = A^† A = A^T (AA^T)^{-1} A.    (1.183)

It can be observed that the expression for A^† in the over-determined case (cf. (1.181)) and that in the under-determined case (cf. (1.183)) are related to each other in form: the matrix inverse operator (·)^{-1} applies to the product of the first (last) two elements of the finite sequence {A^T, A, A^T} for the former (latter). Then the optimal x_LS given by (1.179) is not unique, and

x_LS = A^† b = A^T (AA^T)^{-1} b ∈ R(A^T)    (1.184)

is called the minimum-norm solution since ‖x_LS + v‖_2 = (‖x_LS‖_2^2 + ‖v‖_2^2)^{1/2} > ‖x_LS‖_2 for any v ∈ N(A) \ {0_n}. Note that Ax_LS = b ∈ R(A) = R^m due to ρ_LS = 0 for this case, implying that every x_LS is also a solution of the linear equations Ax = b, as in the instance presented in part (c) of Example 1.5. Therefore, the closed-form minimum-norm solution x_LS given by (1.184) can also be expressed as

x_LS = arg min_{x∈R^n} { ‖x‖_2^2 | Ax = b },    (1.185)

which is exactly the convex optimization problem in Example 9.1, for which the optimal solution x_LS can be obtained by solving the associated KKT conditions. However, if the system model is not linear, closed-form LS solutions usually do not exist.
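Finally, a minimal sketch of the minimum-norm solution (1.184) for a hypothetical under-determined system, which also confirms that any other solution x_LS + v (with v in the null space of A) has a larger norm.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 6))                     # m < n, full row rank a.s.
b = rng.standard_normal(3)

x_mn = A.T @ np.linalg.solve(A @ A.T, b)            # A^T (A A^T)^{-1} b, cf. (1.184)
print(np.allclose(A @ x_mn, b))                     # exact solution: rho_LS = 0
print(np.allclose(x_mn, np.linalg.pinv(A) @ b))     # agrees with A^dagger b

U, s, Vt = np.linalg.svd(A)                         # full SVD: rows 3..5 of Vt span N(A)
v = Vt[3]                                           # one null-space direction
print(np.allclose(A @ v, 0))
print(np.linalg.norm(x_mn) < np.linalg.norm(x_mn + 0.3 * v))   # any other solution is longer
```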

1.3 Summary and discussion

In this chapter, we have revisited some mathematical basics of sets, functions, matrices, and vector spaces that will be very useful for understanding the remaining chapters, and we have also introduced the notation that will be used throughout this book. The mathematical preliminaries reviewed in this chapter are by no means complete. For further details, readers can refer to [Apo07] and [WZ97] for Section 1.1, to [HJ85], [HJ13] and [MS00] for Section 1.2, and to other related textbooks. Because the following chapters contain many nontrivial theoretical proofs, this section summarizes the main proof methods used in the book so that the reader can follow them more readily, followed by a synoptic discussion of the optimization problem and the process of finding the desired solutions.
