where we first expressed the new basis vectors c̃_k ∈ W as linear combinations of the basis vectors c_l ∈ W and then swapped the order of summation.
Alternatively, when we express the b̃_j ∈ V as linear combinations of the basis vectors b_i ∈ V, we arrive at
$$\Phi(\tilde{b}_j) = \Phi\!\left(\sum_{i=1}^{n} s_{ij} b_i\right) = \sum_{i=1}^{n} s_{ij}\,\Phi(b_i) = \sum_{i=1}^{n} s_{ij} \sum_{l=1}^{m} a_{li}\, c_l \tag{2.109a}$$
$$= \sum_{l=1}^{m}\left(\sum_{i=1}^{n} a_{li} s_{ij}\right) c_l\,, \qquad j = 1, \dots, n\,, \tag{2.109b}$$
and, therefore,
$$T\tilde{A}_\Phi = A_\Phi S\,, \tag{2.112}$$
such that
$$\tilde{A}_\Phi = T^{-1} A_\Phi S\,. \tag{2.113}$$
$$\tilde{B} = \left(\begin{bmatrix}1\\1\\0\end{bmatrix}, \begin{bmatrix}0\\1\\1\end{bmatrix}, \begin{bmatrix}1\\0\\1\end{bmatrix}\right) \in \mathbb{R}^3, \qquad \tilde{C} = \left(\begin{bmatrix}1\\1\\0\\0\end{bmatrix}, \begin{bmatrix}1\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\1\\1\\0\end{bmatrix}, \begin{bmatrix}1\\0\\0\\1\end{bmatrix}\right). \tag{2.119}$$
Then,
$$S = \begin{bmatrix}1 & 0 & 1\\ 1 & 1 & 0\\ 0 & 1 & 1\end{bmatrix}, \qquad T = \begin{bmatrix}1 & 1 & 0 & 1\\ 1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}, \tag{2.120}$$
where the ith column of S is the coordinate representation of b̃_i in terms of the basis vectors of B. Since B is the standard basis, the coordinate representation is straightforward to find. For a general basis B, we would need to solve a linear equation system to find the λ_i such that $\sum_{i=1}^{3} \lambda_i b_i = \tilde{b}_j$, j = 1, ..., 3. Similarly, the jth column of T is the coordinate representation of c̃_j in terms of the basis vectors of C.
Therefore, we obtain
$$\tilde{A}_\Phi = T^{-1} A_\Phi S = \frac{1}{2}\begin{bmatrix}1 & 1 & -1 & -1\\ 1 & -1 & 1 & -1\\ -1 & 1 & 1 & 1\\ 0 & 0 & 0 & 2\end{bmatrix}\begin{bmatrix}3 & 2 & 1\\ 0 & 4 & 2\\ 10 & 8 & 4\\ 1 & 6 & 3\end{bmatrix} \tag{2.121a}$$
$$= \begin{bmatrix}-4 & -4 & -2\\ 6 & 0 & 0\\ 4 & 8 & 4\\ 1 & 6 & 3\end{bmatrix}. \tag{2.121b}$$
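As a quick numerical check of (2.121), the following sketch (a minimal NumPy snippet, not part of the original example) builds S and T from (2.120), multiplies the product AΦS appearing in (2.121a) by T⁻¹, and reproduces the matrix in (2.121b).

```python
import numpy as np

# Basis-change matrices from (2.120): columns are the coordinates of the
# new basis vectors b~_i and c~_j with respect to the standard bases.
S = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
T = np.array([[1, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]])

# The product A_Phi @ S as it appears in (2.121a).
APhi_S = np.array([[3, 2, 1],
                   [0, 4, 2],
                   [10, 8, 4],
                   [1, 6, 3]])

# A~_Phi = T^{-1} (A_Phi S), cf. (2.113)/(2.121).
APhi_tilde = np.linalg.inv(T) @ APhi_S
print(APhi_tilde)  # [[-4 -4 -2], [6 0 0], [4 8 4], [1 6 3]]
```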
[Figure: the kernel ker(Φ) ⊆ V and the image Im(Φ) ⊆ W of a linear mapping Φ : V → W, containing the zero vectors 0_V and 0_W, respectively.]
The mapping
$$\Phi : \mathbb{R}^4 \to \mathbb{R}^2, \quad \begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} \mapsto \begin{bmatrix}1 & 2 & -1 & 0\\ 1 & 0 & 0 & 1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\\x_4\end{bmatrix} = \begin{bmatrix}x_1 + 2x_2 - x_3\\ x_1 + x_4\end{bmatrix} \tag{2.125a}$$
$$= x_1\begin{bmatrix}1\\1\end{bmatrix} + x_2\begin{bmatrix}2\\0\end{bmatrix} + x_3\begin{bmatrix}-1\\0\end{bmatrix} + x_4\begin{bmatrix}0\\1\end{bmatrix} \tag{2.125b}$$
is linear. To determine Im(Φ), we can take the span of the columns of the transformation matrix and obtain
$$\operatorname{Im}(\Phi) = \operatorname{span}\!\left[\begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}2\\0\end{bmatrix}, \begin{bmatrix}-1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right]. \tag{2.126}$$
To compute the kernel (null space) of Φ, we need to solve Ax = 0, i.e.,
we need to solve a homogeneous equation system. To do this, we use
Gaussian elimination to transform A into reduced row-echelon form:
$$\begin{bmatrix}1 & 2 & -1 & 0\\ 1 & 0 & 0 & 1\end{bmatrix} \rightsquigarrow \cdots \rightsquigarrow \begin{bmatrix}1 & 0 & 0 & 1\\ 0 & 1 & -\tfrac{1}{2} & -\tfrac{1}{2}\end{bmatrix}. \tag{2.127}$$
This matrix is in reduced row-echelon form, and we can use the Minus-1 Trick to compute a basis of the kernel (see Section 2.3.3). Alternatively, we can express the non-pivot columns (columns 3 and 4) as linear combinations of the pivot columns (columns 1 and 2). The third column a₃ is equivalent to −1/2 times the second column a₂. Therefore, 0 = a₃ + (1/2)a₂. In the same way, we see that a₄ = a₁ − (1/2)a₂ and, therefore, 0 = a₁ − (1/2)a₂ − a₄.
Overall, this gives us the kernel (null space) as
$$\ker(\Phi) = \operatorname{span}\!\left[\begin{bmatrix}0\\ \tfrac{1}{2}\\ 1\\ 0\end{bmatrix}, \begin{bmatrix}-1\\ \tfrac{1}{2}\\ 0\\ 1\end{bmatrix}\right]. \tag{2.128}$$
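The kernel can also be obtained numerically. The sketch below (a minimal SciPy/NumPy illustration added here, not part of the original example) computes an orthonormal basis of the null space of the transformation matrix from (2.125a)/(2.127); it spans the same subspace as (2.128), even though the basis vectors themselves differ.

```python
import numpy as np
from scipy.linalg import null_space

# Transformation matrix of Phi from (2.125a)/(2.127).
A = np.array([[1, 2, -1, 0],
              [1, 0, 0, 1]], dtype=float)

# Orthonormal basis of ker(Phi); spans the same space as (2.128).
K = null_space(A)
print(K.shape)                 # (4, 2): two basis vectors for the kernel
print(np.allclose(A @ K, 0))   # True: every kernel vector is mapped to 0
```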
Theorem 2.24 (Rank-Nullity Theorem). For vector spaces V, W and a linear mapping Φ : V → W it holds that
$$\dim(\ker(\Phi)) + \dim(\operatorname{Im}(\Phi)) = \dim(V)\,. \tag{2.129}$$
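For the mapping of the previous example this is easy to confirm: dim(ker(Φ)) = 2 and dim(Im(Φ)) = 2 sum to dim(V) = 4. The following sketch (a small NumPy/SciPy check added for illustration, not from the book) verifies the identity for the same matrix.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 2, -1, 0],
              [1, 0, 0, 1]], dtype=float)

dim_ker = null_space(A).shape[1]    # dimension of the kernel
dim_im = np.linalg.matrix_rank(A)   # dimension of the image (column space)
dim_V = A.shape[1]                  # dimension of the domain R^4

assert dim_ker + dim_im == dim_V    # rank-nullity theorem (2.129)
print(dim_ker, dim_im, dim_V)       # 2 2 4
```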
One-dimensional affine subspaces are called lines and can be written as y = x₀ + λb₁, where λ ∈ R and U = span[b₁] ⊆ R^n is a one-dimensional subspace of R^n. This means that a line is defined by a support point x₀ and a vector b₁ that defines the direction. See Figure 2.13 for an illustration.
Exercises
2.1 We consider (R\{−1}, ⋆), where
$$a \star b := ab + a + b\,, \qquad a, b \in \mathbb{R}\setminus\{-1\}\,. \tag{2.134}$$
Solve the equation
$$3 \star x \star x = 15$$
in this set.
$$\bar{k} = \{x \in \mathbb{Z} \mid x - k = 0 \ (\text{mod } n)\}$$
$$= \{x \in \mathbb{Z} \mid \exists a \in \mathbb{Z} : x - k = n \cdot a\}\,.$$
$$\mathbb{Z}_n = \{\bar{0}, \bar{1}, \dots, \overline{n-1}\}$$
$$\bar{a} \oplus \bar{b} := \overline{a + b}\,, \qquad \bar{a} \otimes \bar{b} := \overline{a \times b}\,, \tag{2.135}$$
2.4 Compute the following matrix products, if possible:
a.
$$\begin{bmatrix}1 & 2\\ 4 & 5\\ 7 & 8\end{bmatrix}\begin{bmatrix}1 & 1 & 0\\ 0 & 1 & 1\\ 1 & 0 & 1\end{bmatrix}$$
b.
$$\begin{bmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{bmatrix}\begin{bmatrix}1 & 1 & 0\\ 0 & 1 & 1\\ 1 & 0 & 1\end{bmatrix}$$
c.
$$\begin{bmatrix}1 & 1 & 0\\ 0 & 1 & 1\\ 1 & 0 & 1\end{bmatrix}\begin{bmatrix}1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{bmatrix}$$
d.
$$\begin{bmatrix}1 & 2 & 1 & 2\\ 4 & 1 & -1 & -4\end{bmatrix}\begin{bmatrix}0 & 3\\ 1 & -1\\ 2 & 1\\ 5 & 2\end{bmatrix}$$
e.
$$\begin{bmatrix}0 & 3\\ 1 & -1\\ 2 & 1\\ 5 & 2\end{bmatrix}\begin{bmatrix}1 & 2 & 1 & 2\\ 4 & 1 & -1 & -4\end{bmatrix}$$
2.5 Find the set S of all solutions in x of the following inhomogeneous linear
systems Ax = b, where A and b are defined as follows:
a.
$$A = \begin{bmatrix}1 & 1 & -1 & -1\\ 2 & 5 & -7 & -5\\ 2 & -1 & 1 & 3\\ 5 & 2 & -4 & 2\end{bmatrix}, \qquad b = \begin{bmatrix}1\\ -2\\ 4\\ 6\end{bmatrix}$$
b.
$$A = \begin{bmatrix}1 & -1 & 0 & 0 & 1\\ 1 & 1 & 0 & -3 & 0\\ 2 & -1 & 0 & 1 & -1\\ -1 & 2 & 0 & -2 & -1\end{bmatrix}, \qquad b = \begin{bmatrix}3\\ 6\\ 5\\ -1\end{bmatrix}$$
2.6 Using Gaussian elimination, find all solutions of the inhomogeneous equa-
tion system Ax = b with
$$A = \begin{bmatrix}0 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 1 & 0\\ 0 & 1 & 0 & 0 & 0 & 1\end{bmatrix}, \qquad b = \begin{bmatrix}2\\ -1\\ 1\end{bmatrix}\,.$$
and $\sum_{i=1}^{3} x_i = 1$.
2.8 Determine the inverses of the following matrices if possible:
a.
$$A = \begin{bmatrix}2 & 3 & 4\\ 3 & 4 & 5\\ 4 & 5 & 6\end{bmatrix}$$
b.
$$A = \begin{bmatrix}1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0\\ 1 & 1 & 0 & 1\\ 1 & 1 & 1 & 0\end{bmatrix}$$
Determine a basis of U1 ∩ U2 .
2.13 Consider two subspaces U1 and U2 , where U1 is the solution space of the
homogeneous equation system A1 x = 0 and U2 is the solution space of the
homogeneous equation system A2 x = 0 with
$$A_1 = \begin{bmatrix}1 & 0 & 1\\ 1 & -2 & -1\\ 2 & 1 & 3\\ 1 & 0 & 1\end{bmatrix}, \qquad A_2 = \begin{bmatrix}3 & -3 & 0\\ 1 & 2 & 3\\ 7 & -5 & 2\\ 3 & -1 & 2\end{bmatrix}\,.$$
where L1 ([a, b]) denotes the set of integrable functions on [a, b].
b.
Φ : C^1 → C^0
f ↦ Φ(f) = f′,
c.
Φ:R→R
x 7→ Φ(x) = cos(x)
d.
$$\Phi : \mathbb{R}^3 \to \mathbb{R}^2, \quad x \mapsto \begin{bmatrix}1 & 2 & 3\\ 1 & 4 & 3\end{bmatrix} x$$
and let us define two ordered bases B = (b1 , b2 ) and B ′ = (b′1 , b′2 ) of R2 .
a. Show that B and B ′ are two bases of R2 and draw those basis vectors.
b. Compute the matrix P₁ that performs a basis change from B′ to B.
c. We consider c1 , c2 , c3 , three vectors of R3 defined in the standard basis
of R3 as
$$c_1 = \begin{bmatrix}1\\ 2\\ -1\end{bmatrix}, \quad c_2 = \begin{bmatrix}0\\ -1\\ 2\end{bmatrix}, \quad c_3 = \begin{bmatrix}1\\ 0\\ -1\end{bmatrix}$$
3 Analytic Geometry
[Chapter overview figure: lengths, angles, rotations, and orthogonal projection as central concepts of analytic geometry.]
3.1 Norms
When we think of geometric vectors, i.e., directed line segments that start
at the origin, then intuitively the length of a vector is the distance of the
“end” of this directed line segment from the origin. In the following, we
will discuss the notion of the length of vectors using the concept of a norm.
Definition 3.1 (Norm). A norm on a vector space V is a function
$$\|\cdot\| : V \to \mathbb{R}\,, \tag{3.1}$$
$$x \mapsto \|x\|\,, \tag{3.2}$$
which assigns each vector x its length ∥x∥ ∈ R, such that for all λ ∈ R and x, y ∈ V the following hold:
Absolutely homogeneous: ∥λx∥ = |λ| ∥x∥
Triangle inequality: ∥x + y∥ ⩽ ∥x∥ + ∥y∥
Positive definite: ∥x∥ ⩾ 0 and ∥x∥ = 0 ⟺ x = 0
The Manhattan norm on R^n is defined for x ∈ R^n as
$$\|x\|_1 := \sum_{i=1}^{n} |x_i|\,, \tag{3.3}$$
where | · | is the absolute value. The left panel of Figure 3.3 shows all vectors x ∈ R^2 with ∥x∥₁ = 1. The Manhattan norm is also called ℓ₁ norm.
The Euclidean norm of x ∈ R^n is defined as
$$\|x\|_2 := \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^\top x} \tag{3.4}$$
and computes the Euclidean distance of x from the origin. The right panel of Figure 3.3 shows all vectors x ∈ R^2 with ∥x∥₂ = 1. The Euclidean norm is also called ℓ₂ norm.
Remark. Throughout this book, we will use the Euclidean norm (3.4) by
default if not stated otherwise. ♢
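As a small illustration (a NumPy sketch added here, not from the book; the vector x is an arbitrary example), the ℓ₁ and ℓ₂ norms can be computed directly from (3.3) and (3.4) or via numpy.linalg.norm:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])

# Manhattan (l1) norm, (3.3): sum of absolute values.
l1 = np.sum(np.abs(x))
# Euclidean (l2) norm, (3.4): square root of the sum of squares.
l2 = np.sqrt(np.sum(x**2))

# The same norms via NumPy's built-in routine.
assert np.isclose(l1, np.linalg.norm(x, ord=1))
assert np.isclose(l2, np.linalg.norm(x, ord=2))
print(l1, l2)  # 6.0 3.7416573867739413
```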
We will refer to this particular inner product as the dot product in this
book. However, inner products are more general concepts with specific
properties, which we will now introduce.
where A_ij := ⟨b_i, b_j⟩ and x̂, ŷ are the coordinates of x and y with respect to the basis B. This implies that the inner product ⟨·, ·⟩ is uniquely determined through A. The symmetry of the inner product also means that A is symmetric.
Any inner product induces a norm ∥x∥ := √⟨x, x⟩ in a natural way, such that we can compute lengths of vectors using the in-
ner product. However, not every norm is induced by an inner product. The
Manhattan norm (3.3) is an example of a norm without a corresponding
inner product. In the following, we will focus on norms that are induced
by inner products and introduce geometric concepts, such as lengths, dis-
tances, and angles.
Remark (Cauchy-Schwarz Inequality). For an inner product vector space (V, ⟨·, ·⟩) the induced norm ∥·∥ satisfies the Cauchy-Schwarz inequality
$$|\langle x, y\rangle| \leqslant \|x\|\,\|y\|\,. \tag{3.17}$$
♢
$$d(x, y) := \|x - y\| = \sqrt{\langle x - y,\, x - y\rangle} \tag{3.21}$$
is called the distance between x and y for x, y ∈ V. If we use the dot product as the inner product, then the distance is called Euclidean distance. The mapping
$$d : V \times V \to \mathbb{R} \tag{3.22}$$
$$(x, y) \mapsto d(x, y) \tag{3.23}$$
is called a metric. A metric d satisfies the following:
1. d is positive definite, i.e., d(x, y) ⩾ 0 for all x, y ∈ V and d(x, y) = 0 ⟺ x = y.
2. d is symmetric, i.e., d(x, y) = d(y, x) for all x, y ∈ V.
3. Triangle inequality: d(x, z) ⩽ d(x, y) + d(y, z) for all x, y, z ∈ V.
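The norm-induced distance (3.21) is a one-liner in code; the following sketch (added here for illustration, using arbitrary random points) spot-checks the three metric properties numerically:

```python
import numpy as np

def d(x, y):
    """Euclidean distance induced by the dot product, cf. (3.21)."""
    return np.linalg.norm(x - y)

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 4))    # three random points in R^4

assert d(x, y) >= 0 and np.isclose(d(x, x), 0.0)   # positive definite
assert np.isclose(d(x, y), d(y, x))                # symmetric
assert d(x, z) <= d(x, y) + d(y, z)                # triangle inequality
```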
Remark. At first glance, the lists of properties of inner products and met-
rics look very similar. However, by comparing Definition 3.3 with Defini-
tion 3.6 we observe that ⟨x, y⟩ and d(x, y) behave in opposite directions.
Very similar x and y will result in a large value for the inner product and
a small value for the metric. ♢
For x ≠ 0, y ≠ 0, the Cauchy-Schwarz inequality (3.17) implies
$$-1 \leqslant \frac{\langle x, y\rangle}{\|x\|\,\|y\|} \leqslant 1\,. \tag{3.24}$$
Therefore, there exists a unique ω ∈ [0, π], illustrated in Figure 3.4, with
$$\cos\omega = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}\,. \tag{3.25}$$
The number ω is the angle between the vectors x and y. Intuitively, the angle between two vectors tells us how similar their orientations are. For example, using the dot product, the angle between x and y = 4x, i.e., y is a scaled version of x, is 0: Their orientation is the same.
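A direct numerical translation of (3.25) with the dot product (a small NumPy sketch added here, not from the book) computes the angle from the inner product and the norms:

```python
import numpy as np

def angle(x, y):
    """Angle between x and y in radians, cf. (3.25) with the dot product."""
    cos_omega = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to [-1, 1] to guard against tiny floating-point overshoots.
    return np.arccos(np.clip(cos_omega, -1.0, 1.0))

x = np.array([1.0, 1.0])
print(angle(x, 4 * x))                               # 0.0: same orientation
print(np.degrees(angle(x, np.array([-1.0, 1.0]))))   # 90.0
```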
Consider two vectors x = [1, 1]⊤ , y = [−1, 1]⊤ ∈ R2 ; see Figure 3.6.
We are interested in determining the angle ω between them using two
different inner products. Using the dot product as the inner product yields
an angle ω between x and y of 90◦ , such that x ⊥ y . However, if we
choose the inner product
$$\langle x, y\rangle = x^\top \begin{bmatrix}2 & 0\\ 0 & 1\end{bmatrix} y\,, \tag{3.27}$$
we get ⟨x, y⟩ = −1 and ∥x∥ = ∥y∥ = √3, so that cos ω = −1/3 and ω ≈ 1.91 rad ≈ 109.5°: with respect to this inner product, x and y are no longer orthogonal.
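The contrast between the two inner products can be checked numerically. The sketch below (NumPy, added for illustration) evaluates the angle formula (3.25) once with the dot product and once with the inner product (3.27):

```python
import numpy as np

def angle(x, y, A):
    """Angle w.r.t. the inner product <x, y> = x^T A y, cf. (3.25)."""
    inner = lambda u, v: u @ A @ v
    cos_omega = inner(x, y) / np.sqrt(inner(x, x) * inner(y, y))
    return np.arccos(np.clip(cos_omega, -1.0, 1.0))

x = np.array([1.0, 1.0])
y = np.array([-1.0, 1.0])

print(np.degrees(angle(x, y, np.eye(2))))               # 90.0 (dot product)
print(np.degrees(angle(x, y, np.array([[2.0, 0.0],
                                        [0.0, 1.0]]))))  # ~109.47 (inner product (3.27))
```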
When we transform two vectors x, y with an orthogonal matrix A, the inner product (Ax)^⊤(Ay) = x^⊤A^⊤Ay = x^⊤y is unchanged, which gives exactly the angle between x and y. This means that orthogonal matrices A with A^⊤ = A^{−1} preserve both angles and distances. It turns out that orthogonal matrices define transformations that are rotations (with the possibility of flips). In Section 3.9, we will discuss more details about rotations.
If a basis {b₁, ..., bₙ} of an n-dimensional vector space V satisfies
$$\langle b_i, b_j\rangle = 0 \quad \text{for } i \neq j \tag{3.33}$$
$$\langle b_i, b_i\rangle = 1 \tag{3.34}$$
for all i, j = 1, ..., n, then the basis is called an orthonormal basis (ONB). If only (3.33) is satisfied, then the basis is called an orthogonal basis. Note that (3.34) implies that every basis vector has length/norm 1.
Recall from Section 2.6.1 that we can use Gaussian elimination to find a basis for a vector space spanned by a set of vectors. Assume we are given a set {b̃₁, ..., b̃ₙ} of non-orthogonal and unnormalized basis vectors. We concatenate them into a matrix B̃ = [b̃₁, ..., b̃ₙ] and apply Gaussian elimination to the augmented matrix (Section 2.3.2) [B̃B̃^⊤ | B̃] to obtain an orthonormal basis. This constructive way to iteratively build an orthonormal basis {b₁, ..., bₙ} is called the Gram-Schmidt process (Strang, 2003).
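A direct implementation of the classical Gram-Schmidt iteration (a minimal NumPy sketch added here; it is not the augmented-matrix construction described above, and the input matrix is an arbitrary example) subtracts from each vector its components along the already-orthonormalized ones and then normalizes:

```python
import numpy as np

def gram_schmidt(B_tilde):
    """Columns of B_tilde: non-orthogonal, unnormalized basis vectors.
    Returns a matrix whose columns form an orthonormal basis of the same span."""
    basis = []
    for b in B_tilde.T:
        # Remove the components along the already-computed orthonormal vectors.
        for q in basis:
            b = b - (q @ b) * q
        basis.append(b / np.linalg.norm(b))
    return np.column_stack(basis)

B_tilde = np.array([[2.0, 1.0],
                    [0.0, 1.0]])
Q = gram_schmidt(B_tilde)
print(Q)                                  # orthonormal basis vectors as columns
print(np.allclose(Q.T @ Q, np.eye(2)))    # True: columns are orthonormal
```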
An inner product of two functions u : R → R and v : R → R can be defined as the definite integral
$$\langle u, v\rangle := \int_a^b u(x)\,v(x)\,dx \tag{3.37}$$
for lower and upper limits a, b < ∞, respectively. As with our usual inner product, we can define norms and orthogonality by looking at the inner product. If (3.37) evaluates to 0, the functions u and v are orthogonal. To
make the preceding inner product mathematically precise, we need to take
care of measures and the definition of integrals, leading to the definition of
a Hilbert space. Furthermore, unlike inner products on finite-dimensional
vectors, inner products on functions may diverge (have infinite value). All
this requires diving into some more intricate details of real and functional
analysis, which we do not cover in this book.
If we choose u = sin(x) and v = cos(x) and integrate over an interval symmetric around 0 (e.g., [−π, π]), the integrand f(x) = sin(x) cos(x) is an odd function, so the inner product evaluates to 0. Therefore, sin and cos are orthogonal functions.
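This can be checked numerically, for instance with scipy.integrate.quad (a small sketch added here; the symmetric interval [−π, π] is chosen for illustration):

```python
import numpy as np
from scipy.integrate import quad

# Inner product of two functions as the definite integral (3.37) over [a, b].
def inner(u, v, a=-np.pi, b=np.pi):
    value, _ = quad(lambda x: u(x) * v(x), a, b)
    return value

print(inner(np.sin, np.cos))   # ~0.0: sin and cos are orthogonal on [-pi, pi]
print(inner(np.sin, np.sin))   # ~3.1416: squared norm of sin on [-pi, pi]
```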
Figure 3.9: Orthogonal projection (orange dots) of a two-dimensional dataset (blue dots) onto a one-dimensional subspace (straight line).
[Figure: (a) projection πU(x) of x ∈ R^2 onto a subspace U with basis vector b; (b) projection of a two-dimensional vector x with ∥x∥ = 1 onto a one-dimensional subspace spanned by b, with coordinates cos ω and sin ω.]
We can now exploit the bilinearity of the inner product and arrive at
$$\langle x, b\rangle - \lambda\langle b, b\rangle = 0 \iff \lambda = \frac{\langle x, b\rangle}{\langle b, b\rangle} = \frac{\langle b, x\rangle}{\|b\|^2}\,. \tag{3.40}$$
(With a general inner product, we get λ = ⟨x, b⟩ if ∥b∥ = 1.)
In the last step, we exploited the fact that inner products are symmetric. If we choose ⟨·, ·⟩ to be the dot product, we obtain
$$\lambda = \frac{b^\top x}{b^\top b} = \frac{b^\top x}{\|b\|^2}\,. \tag{3.41}$$
The projection point is then
$$\pi_U(x) = \lambda b = \frac{\langle x, b\rangle}{\|b\|^2}\, b = \frac{b^\top x}{\|b\|^2}\, b\,, \tag{3.42}$$
where the last equality holds for the dot product only. We can also compute the length of πU(x) by means of Definition 3.1 as
$$\|\pi_U(x)\| = \|\lambda b\| = |\lambda|\,\|b\|\,.$$
Hence, our projection is of length |λ| times the length of b. This also adds the intuition that λ is the coordinate of πU(x) with respect to the basis vector b that spans our one-dimensional subspace U.
If we use the dot product as an inner product, we get
$$\pi_U(x) = \lambda b = b\lambda = b\,\frac{b^\top x}{\|b\|^2} = \frac{b b^\top}{\|b\|^2}\, x\,, \tag{3.45}$$
and, since πU(x) = Pπ x, we immediately see that
$$P_\pi = \frac{b b^\top}{\|b\|^2}\,. \tag{3.46}$$
Note that bb^⊤ (and, consequently, Pπ) is a symmetric matrix (of rank 1), and ∥b∥² = ⟨b, b⟩ is a scalar; projection matrices are always symmetric.
The projection matrix P π projects any vector x ∈ Rn onto the line through
the origin with direction b (equivalently, the subspace U spanned by b).
Remark. The projection πU(x) ∈ R^n is still an n-dimensional vector and not a scalar. However, we no longer require n coordinates to represent the projection, but only a single coordinate, λ, if we want to express it with respect to the basis vector b that spans the subspace U. ♢
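Putting (3.41), (3.42), and (3.46) together, the projection onto a one-dimensional subspace is only a few lines of NumPy (a sketch added for illustration; the direction b and the point x are arbitrary examples):

```python
import numpy as np

b = np.array([2.0, 1.0])        # direction spanning the subspace U
x = np.array([1.0, 3.0])        # vector to be projected

lam = (b @ x) / (b @ b)         # coordinate of the projection, (3.41)
pi_x = lam * b                  # projection point pi_U(x), (3.42)
P = np.outer(b, b) / (b @ b)    # projection matrix P_pi, (3.46)

print(np.allclose(P @ x, pi_x))          # True: P_pi x equals lambda * b
print(np.allclose(b @ (x - pi_x), 0))    # True: the residual is orthogonal to b
print(np.allclose(P @ P, P))             # True: projecting twice changes nothing
```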