
Basics of Linear Algebra

These notes give a review of basic concepts from linear algebra, adapted from Rob Nowak's "Elements of Statistical Signal Processing," with permission. Data are often represented
and manipulated as matrices, and linear algebra becomes the natural tool.

1 Linear Vector Space


Definition 1 A linear vector space X is a collection of elements satisfying the following properties:

addition: ∀ x, y, z ∈ X

a) x + y ∈ X
b) x + y = y + x
c) (x + y) + z = x + (y + z)
d) ∃ 0 ∈ X such that x + 0 = x
e) ∀ x ∈ X, ∃ −x ∈ X such that x + (−x) = 0

multiplication: ∀ x, y ∈ X and a, b ∈ R

a) ax ∈ X
b) a(bx) = (ab)x
c) 1x = x, 0x = 0
d) a(x + y) = ax + ay

Example 1 Here are two examples of linear vector spaces: the familiar d-dimensional Euclidean space R^d and the space of finite-energy signals/functions supported on the interval [0, T],

L_2([0, T]) := { x : ∫_0^T x^2(t) dt < ∞ }

It is easy to verify the properties above for both examples.

Definition 2 A subset M ⊂ X is a subspace if x, y ∈ M ⇒ ax + by ∈ M for all scalars a, b.

Definition 3 An inner product is a mapping from X × X to R. The inner product between any x, y ∈ X is denoted by ⟨x, y⟩ and it satisfies the following properties for all x, y, z ∈ X:

a) ⟨x, y⟩ = ⟨y, x⟩

b) ⟨ax, y⟩ = a⟨x, y⟩ for all scalars a

c) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩

d) ⟨x, x⟩ ≥ 0

A space X equipped with an inner product is called an inner product space.


Example 2 Let X = R^n. Then ⟨x, y⟩ := x^T y = Σ_{i=1}^n x_i y_i.

Example 3 Let X = L_2([0, 1]). Then ⟨x, y⟩ := ∫_0^1 x(t) y(t) dt.

The inner product measures the alignment of the two vectors. The inner product induces a norm defined as ‖x‖ := √⟨x, x⟩. The norm measures the length/size of x. The inner product satisfies ⟨x, y⟩ = ‖x‖ ‖y‖ cos(θ), where θ is the angle between x and y. Thus, in general, for every x, y ∈ X we have |⟨x, y⟩| ≤ ‖x‖ ‖y‖, with equality if and only if x and y are linearly dependent or "parallel"; i.e., θ = 0 or π. This is called the Cauchy-Schwarz Inequality. Two vectors x, y are orthogonal if ⟨x, y⟩ = 0.
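As a quick numerical illustration (a minimal NumPy sketch; the two vectors are arbitrary choices, not taken from the notes), the following computes the inner product, the induced norms, and the angle θ, and confirms the Cauchy-Schwarz inequality:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 2.0])

ip = x @ y                                  # inner product <x, y> = x^T y
norm_x = np.sqrt(x @ x)                     # induced norm ||x|| = sqrt(<x, x>)
norm_y = np.sqrt(y @ y)
theta = np.arccos(ip / (norm_x * norm_y))   # angle between x and y

print(abs(ip), norm_x * norm_y)             # |<x, y>| <= ||x|| ||y|| (Cauchy-Schwarz)
print(np.degrees(theta))                    # angle in degrees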
  
Example 4 Let X = R², then x = [1, 0]^T and y = [0, 1]^T are orthogonal, as are u = [1, 1]^T and v = [1, −1]^T.

Definition 4 An inner product space that contains all its limits is called a Hilbert Space and in this case we often denote the space by H; i.e., if x_1, x_2, . . . are in H and lim_{n→∞} x_n exists, then the limit is also in H.

It is easy to verify that R^n, L_2([0, T]), and ℓ_2(Z), the set of all finite-energy sequences (e.g., discrete-time signals), are all Hilbert Spaces.

2 Bases and Representations


Definition 5 A collection of vectors {x_1, . . . , x_k} is said to be linearly independent if none of them can be represented as a linear combination of the others. That is, for any x_i and every set of scalar weights {θ_j} we have x_i ≠ Σ_{j≠i} θ_j x_j.

Definition 6 The set of all vectors that can be generated by taking linear combinations of {x_1, . . . , x_k}, i.e., all vectors of the form

v = Σ_{i=1}^k θ_i x_i,

is called the span of {x_1, . . . , x_k}, denoted span(x_1, . . . , x_k).

Definition 7 A set of linearly independent vectors {φ_i}_{i≥1} is a basis for H if every x ∈ H can be represented as a unique linear combination of the {φ_i}. That is, every x ∈ H can be expressed as

x = Σ_{i≥1} θ_i φ_i

for a certain unique set of scalar weights {θ_i}.


 
Example 5 Let H = R². Then [1, 0]^T and [0, 1]^T are a basis (since they are orthogonal). Also, [1, 0]^T and [1, 1]^T are a basis because they are linearly independent (although not orthogonal).

Definition 8 An orthonormal basis is one satisfying

⟨φ_i, φ_j⟩ = δ_ij :=  1 if i = j,  0 if i ≠ j

Every x ∈ H can be represented in terms of an orthonormal basis {φ_i}_{i≥1} (or "orthobasis" for short) according to

x = Σ_{i≥1} ⟨x, φ_i⟩ φ_i

This is easy to see as follows. Suppose x has a representation Σ_i θ_i φ_i. Then ⟨x, φ_j⟩ = ⟨Σ_i θ_i φ_i, φ_j⟩ = Σ_i θ_i ⟨φ_i, φ_j⟩ = Σ_i θ_i δ_ij = θ_j.
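To make this concrete, here is a small NumPy sketch that computes the coefficients ⟨x, φ_i⟩ for the rotated orthobasis of R² that reappears in Example 7 below, and verifies that the expansion recovers x (the vector x is an arbitrary choice):

import numpy as np

# Columns of Phi form an orthobasis of R^2.
Phi = np.array([[1.0,  1.0],
                [1.0, -1.0]]) / np.sqrt(2.0)
x = np.array([3.0, -2.0])

coeffs = Phi.T @ x                  # theta_i = <x, phi_i>
x_rebuilt = Phi @ coeffs            # sum_i <x, phi_i> phi_i

print(np.allclose(x, x_rebuilt))    # True: the expansion recovers x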

Example 6 Here is an orthobasis for L_2([0, 1]): φ_1(t) := 1, and for i = 1, 2, . . .

φ_{2i}(t) := √2 sin(2πit)
φ_{2i+1}(t) := √2 cos(2πit)

Doesn't it look familiar?

Any basis can be converted into an orthonormal basis using Gram-Schmidt Orthogonalization. Let {ψ_i} be a basis for a vector space X. An orthobasis {φ_i} for X can be constructed as follows.

1. φ_1 := ψ_1 / ‖ψ_1‖

2. φ′_2 := ψ_2 − ⟨φ_1, ψ_2⟩ φ_1;   φ_2 := φ′_2 / ‖φ′_2‖

   ⋮

k. φ′_k := ψ_k − Σ_{i=1}^{k−1} ⟨φ_i, ψ_k⟩ φ_i;   φ_k := φ′_k / ‖φ′_k‖

   ⋮
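Here is a minimal NumPy sketch of the procedure (it assumes the inputs are linearly independent vectors in R^n; the helper name gram_schmidt is ours):

import numpy as np

def gram_schmidt(psis):
    """Orthonormalize a list of linearly independent vectors in R^n."""
    phis = []
    for psi in psis:
        # Subtract the components of psi along the already-built orthobasis vectors.
        phi = psi - sum((phi_i @ psi) * phi_i for phi_i in phis)
        phis.append(phi / np.linalg.norm(phi))
    return phis

# Example: orthonormalize the (non-orthogonal) basis {[1, 0], [1, 1]} of R^2 from Example 5.
phis = gram_schmidt([np.array([1.0, 0.0]), np.array([1.0, 1.0])])
print(np.round(np.array(phis), 3))   # rows are orthonormal vectors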

3 Orthogonal Projections
One of the most important tools that we will use from linear algebra is the notion of an orthogonal projection. Let H be a Hilbert space and let M ⊂ H be a subspace. Every x ∈ H can be written as x = y + z, where y ∈ M and z ⊥ M, which is shorthand for z orthogonal to M; that is, ⟨v, z⟩ = 0 for all v ∈ M. The vector y is the optimal approximation to x in terms of vectors in M in the following sense:

‖x − y‖ = min_{v ∈ M} ‖x − v‖

The vector y is called the projection of x onto M.


Here is an application of this concept. Let M ⊂ H and let {φ_i}_{i=1}^r be an orthobasis for M. We say that the subspace M is spanned by {φ_i}_{i=1}^r. Note that this implies that M is an r-dimensional subspace of H (and it is isometric to R^r). For any x ∈ H, the projection of x onto M is given by

y = Σ_{i=1}^r ⟨φ_i, x⟩ φ_i

and this projection can be viewed as a sort of filter that removes all components of the signal that are orthogonal to M.
 
Example 7 Let H = R². Consider the canonical coordinate system φ_1 = [1, 0]^T and φ_2 = [0, 1]^T. Consider the subspace spanned by φ_1. The projection of any x = [x_1 x_2]^T ∈ R² onto this subspace is

P_1 x = ⟨x, φ_1⟩ φ_1 = ( [x_1 x_2] [1, 0]^T ) [1, 0]^T = [x_1, 0]^T

The projection operator P_1 is just a matrix and it is given by

P_1 := φ_1 φ_1^T = [1, 0]^T [1 0] = [1 0; 0 0]

It is also easy to check that φ_1 = [1/√2, 1/√2]^T and φ_2 = [1/√2, −1/√2]^T is an orthobasis for R². What is the projection operator onto the span of φ_1 in this case?
More generally, suppose we are considering R^n and we have an orthobasis {φ_i}_{i=1}^r for some r-dimensional (r < n) subspace M of R^n. Then the projection matrix is given by P_M = Σ_{i=1}^r φ_i φ_i^T. Moreover, if {ψ_i}_{i=1}^r is a basis for M, but not necessarily orthonormal, then P_M = Ψ(Ψ^T Ψ)^{−1} Ψ^T, where Ψ = [ψ_1, . . . , ψ_r] is a matrix whose columns are the basis vectors.
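Both formulas are easy to check numerically. The sketch below (a 2-dimensional subspace of R³ with an arbitrarily chosen non-orthonormal basis) builds P_M from the general formula, compares it with the orthobasis formula (using the columns of a QR factorization as the orthobasis), and checks idempotence:

import numpy as np

# A non-orthonormal basis for a 2-dimensional subspace M of R^3 (columns of Psi).
Psi = np.array([[1.0, 1.0],
                [0.0, 1.0],
                [1.0, 0.0]])

# General formula: P_M = Psi (Psi^T Psi)^{-1} Psi^T
P_general = Psi @ np.linalg.inv(Psi.T @ Psi) @ Psi.T

# Orthobasis formula: P_M = sum_i phi_i phi_i^T, with the phi_i taken as the
# columns of Q from a QR factorization (they form an orthobasis for M).
Q, _ = np.linalg.qr(Psi)
P_ortho = sum(np.outer(Q[:, i], Q[:, i]) for i in range(Q.shape[1]))

print(np.allclose(P_general, P_ortho))                 # the two formulas agree
print(np.allclose(P_general @ P_general, P_general))   # projections are idempotent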

Example 8 Let H = L_2([0, 1]) and let M = {linear functions on [0, 1]}. Since all linear functions have the form at + b, for t ∈ [0, 1], here is a basis for M: ψ_1(t) = 1, ψ_2(t) = t. Note that this means that M is two-dimensional. That makes sense since every line is defined by its slope and intercept (two real numbers). Using the Gram-Schmidt procedure we can construct the orthobasis φ_1(t) = 1, φ_2(t) = √12 (t − 1/2), where the factor √12 makes φ_2 unit norm. Now, consider any function x ∈ L_2([0, 1]). The projection of x onto M is

P_M x = ⟨x, φ_1⟩ φ_1 + ⟨x, φ_2⟩ φ_2
      = ∫_0^1 x(τ) dτ + 12 (t − 1/2) ∫_0^1 (τ − 1/2) x(τ) dτ
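As a numerical sanity check of this formula (a sketch using a midpoint-rule quadrature on a fine grid; the test function x(t) = t² is an arbitrary choice), the projection of t² onto the linear functions should come out to t − 1/6:

import numpy as np

N = 100_000
t = (np.arange(N) + 0.5) / N           # midpoint grid on [0, 1]

def inner(f, g):
    # Midpoint-rule approximation of the L2 inner product on [0, 1].
    return np.mean(f * g)

x = t**2                               # test function
phi1 = np.ones_like(t)                 # orthobasis element phi_1(t) = 1
phi2 = np.sqrt(12.0) * (t - 0.5)       # orthobasis element phi_2(t) = sqrt(12)(t - 1/2)

proj = inner(x, phi1) * phi1 + inner(x, phi2) * phi2

# Compare with the best linear approximation of t^2 in L2([0, 1]), which is t - 1/6.
print(np.max(np.abs(proj - (t - 1.0 / 6.0))))   # tiny (quadrature error only)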

4 Eigenanalysis
Suppose A is an m × n matrix with entries from a field (e.g., R or C, the latter being the complex numbers). Then there exists a factorization of the form

A = U D V^*

where U = [u_1 · · · u_m] is m × m with orthonormal columns, V = [v_1 · · · v_n] is n × n with orthonormal columns, the superscript * means transposition and conjugation (if complex-valued), and D is the m × n matrix with σ_1, . . . , σ_m on its main diagonal and zeros elsewhere,

D = [ diag(σ_1, . . . , σ_m)   0 ]

(written here for m ≤ n, so that 0 denotes an m × (n − m) block of zeros). The values σ_1, . . . , σ_m are called the singular values of A and the factorization is called the singular value decomposition (SVD). Because of the orthonormality of the columns of U and V we have A v_i = σ_i u_i and A^* u_i = σ_i v_i for i = 1, . . . , m.

A vector u is called an eigenvector of A if A u = λ u for some scalar λ. The scalar λ is called the eigenvalue associated with u. Symmetric matrices (which are always square) always have real eigenvalues and have an eigendecomposition of the form A = U D U^*, where the columns of U are the orthonormal eigenvectors of A and D is a diagonal matrix, written D = diag(λ_1, . . . , λ_n), whose diagonal entries are the eigenvalues. This is just a special case of the SVD. A symmetric positive-semidefinite matrix satisfies v^T A v ≥ 0 for all v. This implies that the eigenvalues of symmetric positive-semidefinite matrices are non-negative.
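The following NumPy sketch (on a small random matrix chosen for illustration) computes an SVD, checks the relation A v_i = σ_i u_i, and then computes the eigendecomposition of a symmetric positive-semidefinite matrix:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # an arbitrary 3 x 5 matrix

U, s, Vh = np.linalg.svd(A)              # A = U D V^*, with s = (sigma_1 >= ... >= sigma_m)
for i in range(len(s)):
    print(np.allclose(A @ Vh[i], s[i] * U[:, i]))    # A v_i = sigma_i u_i

S = A @ A.T                              # symmetric positive-semidefinite
lam, Q = np.linalg.eigh(S)               # real eigenvalues (ascending) and orthonormal eigenvectors
print(np.all(lam >= -1e-12))             # eigenvalues are non-negative (up to round-off)
print(np.allclose(Q @ np.diag(lam) @ Q.T, S))        # S = U D U^*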

Example 9 Let X be a random vector taking values in R^n and recall the definition of the covariance matrix:

Σ := E[(X − µ)(X − µ)^T]

It is easy to see that v^T Σ v ≥ 0, and of course Σ is symmetric. Therefore, every covariance matrix has an eigendecomposition of the form Σ = U D U^*, where D = diag(λ_1, . . . , λ_n) and λ_i ≥ 0 for i = 1, . . . , n.

The Karhunen-Loève Transform (KLT), which is also called Principal Component Analysis (PCA), is based on transforming a random vector X into the coordinate system associated with the eigendecomposition of the covariance of X. Let X be an n-dimensional random vector with covariance matrix Σ = U D U^*. Let u_1, . . . , u_n be the eigenvectors. Assume that the eigenvectors and eigenvalues are ordered such that λ_1 ≥ λ_2 ≥ · · · ≥ λ_n. The KLT or PCA representation of the random vector X is given by

X = Σ_{i=1}^n (u_i^T X) u_i

The coefficients in this representation can be arranged in a vector as θ = U^T X, where U is as defined above. The vector θ is called the KLT or PCA of X. Using this representation we can define the approximation to X in the span of the first r < n eigenvectors:

X_r = Σ_{i=1}^r (u_i^T X) u_i

Note that this approximation involves only r scalar random variables {u_i^T X}_{i=1}^r rather than n. In fact, it is easy to show that among all r-term linear approximations of X in terms of r random variables, X_r has the smallest mean square error; that is, if we let S_r denote all r-term linear approximations to X, then

E[‖X − X_r‖²] = min_{Y_r ∈ S_r} E[‖X − Y_r‖²]
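A minimal PCA/KLT sketch in NumPy (the data are synthetic correlated samples standing in for realizations of X, and the covariance is estimated from the samples rather than known exactly):

import numpy as np

rng = np.random.default_rng(1)
n, r, num_samples = 5, 2, 10_000

# Synthetic zero-mean samples of X with correlated coordinates (one sample per row).
mixing = rng.standard_normal((n, n))
X = rng.standard_normal((num_samples, n)) @ mixing.T

Sigma = np.cov(X, rowvar=False)                  # estimated covariance matrix
lam, U = np.linalg.eigh(Sigma)                   # eigenvalues in ascending order
order = np.argsort(lam)[::-1]                    # reorder so lambda_1 >= ... >= lambda_n
lam, U = lam[order], U[:, order]

theta = X @ U                                    # KLT/PCA coefficients theta = U^T X, per sample
X_r = theta[:, :r] @ U[:, :r].T                  # r-term approximation X_r

mse = np.mean(np.sum((X - X_r) ** 2, axis=1))    # empirical E ||X - X_r||^2
print(mse, lam[r:].sum())                        # the MSE is (approximately) the sum of the discarded eigenvalues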
