
MAST10007 Linear Algebra

THE UNIVERSITY OF MELBOURNE


SCHOOL OF MATHEMATICS AND STATISTICS

Semester 2, 2023

Lecture Notes
These notes have been made in accordance with the provisions of Part VB of the Copyright Act for teaching purposes of the University. These notes are for the use of students of the University of Melbourne enrolled in MAST10007 Linear Algebra.

1
Topic 1: Linear equations [AR 1.1 and 1.2]

1.1. Systems of linear equations. Row operations.

1.2. Reduction of systems to row-echelon form.

1.3. Reduction of systems to reduced row-echelon form.

1.4. Consistent and inconsistent systems.

2
1.1 Systems of linear equations. Row operations
Linear equations
Definition (Linear equation and linear system)
A linear equation in n variables, x1 , x2 , . . . , xn , is an equation of the form

a1 x1 + a2 x2 + · · · + an xn = b,

where the coefficients a1 , . . . , an and the constant term b are constants.

A finite collection of linear equations in the variables x1 , x2 , . . . , xn is


called a system of linear equations or a linear system.
Example

x1 + 5x2 + 6x3 = 100


x2 − x3 = −1
−x1 + x3 = 11
3
Definition (Homogeneous linear system)
A system of linear equations is called homogeneous if all the constants
on the right hand side are zero.
Example

x1 + 5x2 + 6x3 = 0
x2 − x3 = 0
−x1 + x3 = 0

Definition (Solution of a system of linear equations)


A solution to a system of linear equations in the variables x1 , . . . , xn is a
set of values of these variables which satisfy every equation in the
system.

4
Example 1. Data fitting using a polynomial
Find the coefficients c, α, β, γ such that the cubic polynomial
y = c + αx + βx^2 + γx^3 passes through the points specified below.

x y
−0.1 0.90483
0 1
0.1 1.10517
0.2 1.2214

Solution:
Substituting the points (x, y ) into the cubic polynomial gives:

5
Note:
Solving such equations by hand can be tedious, so we can use a
computer package such as MATLAB.

6
Example 2. (Revision)
Solve the following linear system.

2x − y = 3
x +y =0

Solution:
Graphically

Note:
◮ Need an accurate drawing.

◮ Not practical for three or more variables.


7
Elimination

Note:
Elimination of variables will always give a solution, but for three or more variables we need to do this systematically, not in an ad hoc manner.
8
Definition (Matrix)
A matrix is a rectangular array of numbers. An m × n matrix has m rows
and n columns.

Definition (Coefficient matrix and augmented matrix)


The coefficient matrix for a linear system is the matrix formed from the
coefficients in the equations.
The augmented matrix for a linear system is the matrix formed from the
coefficients in the equations and the constant terms. These are usually
separated by a vertical line.

9
Example 3.
Write down a coefficient matrix and augmented matrix for the following
linear system:
2x − y = 3
x +y =0
Solution:

10
Row Operations
To find the solutions of a linear system, we perform row operations to
simplify the augmented matrix. An essential condition is that the row
operations we perform do not change the set of solutions to the system.

Definition (Elementary row operations)


The elementary row operations are:
1. Interchanging two rows.
2. Multiplying a row by a non-zero constant.
3. Adding a multiple of one row to another row.

Note:
The matrices produced after each row operation are not equal but are
equivalent, meaning that the solution set is the same for the system
represented by each augmented matrix. We use the symbol ∼ to denote
equivalence of matrices.
11
Example 4.
Solve the following system using elementary row operations:

2x − y = 3
x +y =0
Solution:

12
1.2 Reduction of systems to row-echelon form

Definition (Leading entry)


The first non-zero entry in each row of a matrix is called the leading
entry.

Definition (Row-echelon form)


A matrix is in row-echelon form if:
1. For any two non-zero rows, the leading entry of the lower row is
further to the right than the leading entry of the higher row.
2. Any rows that consist entirely of zeros are grouped together at the
bottom of the matrix.

Note:
These conditions imply that in each column containing a leading entry,
all entries below the leading entry are zero.
13
Examples

\begin{bmatrix} 1 & -2 & 3 & 4 & 5 \end{bmatrix}   row-echelon form

\begin{bmatrix} 1 & 0 & 0 & 3 \\ 0 & 4 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}   row-echelon form

\begin{bmatrix} 0 & 0 & 0 & 2 & 4 \\ 0 & 0 & 3 & 1 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 2 & -3 & 6 & -4 & 9 \end{bmatrix}   not row-echelon form

14
Gaussian elimination
Gaussian elimination is a systematic way to reduce a matrix to
row-echelon form using row operations.
Note: The row-echelon form obtained is not unique.

Gaussian elimination
1. Interchange rows, if necessary, to bring a non-zero number to the
top of the first column with a non-zero entry.
2. (Optional, but often useful.) Divide the first row by its leading
entry to create a new leading entry 1.
3. Add suitable multiples of the top row to lower rows so that all
entries below the leading entry are zero.
4. Start again at Step 1 applied to the matrix without the first row.

To solve a linear system we read off the equations from the row-echelon
matrix and then solve the equations to find the unknowns, starting with
the last equation. This final step is called back substitution.
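
The notes defer such computations to MATLAB. Purely as an illustrative sketch (not part of the original notes), the algorithm above can be expressed in Python/NumPy; the function name and the choice to rescale each pivot row are assumptions made here, and the test matrix is the augmented matrix of Example 5 below.

import numpy as np

def gaussian_elimination(M):
    # Reduce a matrix to row-echelon form using the three row operations.
    A = M.astype(float)
    rows, cols = A.shape
    r = 0                                   # current pivot row
    for c in range(cols):
        # Step 1: find a non-zero entry in column c, at or below row r
        pivot = next((i for i in range(r, rows) if A[i, c] != 0), None)
        if pivot is None:
            continue
        A[[r, pivot]] = A[[pivot, r]]       # interchange two rows
        A[r] = A[r] / A[r, c]               # (optional) make the leading entry 1
        for i in range(r + 1, rows):        # add multiples of row r to rows below
            A[i] = A[i] - A[i, c] * A[r]
        r += 1
        if r == rows:
            break
    return A

M = np.array([[1, -3, 2, 11], [2, -3, -2, 13], [4, -2, 5, 31]])
print(gaussian_elimination(M))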
15
Example 5.
Solve the following system of linear equations using Gaussian
elimination:

x − 3y + 2z = 11
2x − 3y − 2z = 13
4x − 2y + 5z = 31

Solution:
Step 1. Write the system in augmented matrix form.

16
Step 2. Use elementary row operations to reduce the augmented matrix
to row-echelon form.

17
Step 3. Read off the equations from the augmented matrix and use back
substitution to solve for the unknowns, starting with the last equation.

18
1.3 Reduction of systems to reduced row-echelon form

Definition (Reduced row-echelon form)


A matrix is in reduced row-echelon form if the following three conditions
are satisfied:
1. It is in row-echelon form.
2. Each leading entry is equal to 1 (called a leading 1).
3. In each column containing a leading 1, all other entries are zero.

19
Examples

\begin{bmatrix} 1 & -2 & 3 & -4 & 5 \end{bmatrix}   reduced row-echelon form

\begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}   reduced row-echelon form

\begin{bmatrix} 1 & 0 & 0 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}   is not in reduced row-echelon form

\begin{bmatrix} 1 & 0 & 0 & 2 & 4 \\ 0 & 1 & 0 & 1 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -4 & 9 \end{bmatrix}   is not in reduced row-echelon form

20
Gauss-Jordan elimination
Gauss-Jordan elimination is a systematic way to reduce a matrix to
reduced row-echelon form using row operations.
Note: The reduced row-echelon form obtained is unique.

Gauss-Jordan elimination
1. Use Gaussian elimination to reduce the matrix to row-echelon form.
2. Multiply each non-zero row by an appropriate number to create a
leading 1 (type 2 row operations).
3. Use row operations (of type 3) to create zeros above the leading
entries.
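
As a hedged illustration (again not from the notes), SymPy's rref() performs exactly this Gauss-Jordan reduction and also reports which columns contain a leading 1; here it is applied to the augmented matrix of Example 6 below.

from sympy import Matrix

M = Matrix([[1, -3, 2, 11], [2, -3, -2, 13], [4, -2, 5, 31]])
R, pivot_columns = M.rref()   # Gauss-Jordan elimination
print(R)                      # the (unique) reduced row-echelon form
print(pivot_columns)          # indices of columns containing a leading 1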

21
Example 6.
Solve the following system of linear equations using Gauss-Jordan
elimination:

x − 3y + 2z = 11
2x − 3y − 2z = 13
4x − 2y + 5z = 31

Solution:
Step 1. Use Gaussian elimination to reduce the augmented matrix to
row-echelon form.

22
Step 2. Divide rows by their leading entry.

Step 3. Add multiples of the last row to the rows above to make the
entries above its leading 1 equal to zero.

23
Step 4. Add multiples of the penultimate (2nd last) row to the rows
above to make the entries above its leading 1 equal to zero.

Step 5. Read off the answers.

24
1.4 Consistent and inconsistent systems

Theorem
A system of linear equations has zero solutions, one solution, or
infinitely many solutions. There are no other possibilities.

We say that the system is:


◮ inconsistent if it has no solutions.
◮ consistent if it has at least one solution.

We can determine whether a system is consistent or inconsistent by


reducing its augmented matrix to row-echelon form.

Note:
Every homogeneous linear system is consistent, since it always has at
least one solution: x1 = 0, . . . , xn = 0.

25
Inconsistent systems

Theorem
The system is inconsistent if and only if there is at least one row in the
row-echelon matrix having all entries equal to zero except for a non-zero
final entry.

Example 7.
Solve the system with row-echelon form:

\left[\begin{array}{ccc|c} 1 & 0 & 1 & -2 \\ 0 & 2 & 2 & 4 \\ 0 & 0 & 0 & -3 \end{array}\right]
Solution:

26
Geometrically, an inconsistent system in 3 variables has no common
point of intersection for the planes determined by the system.

27
Consistent systems

Theorem
Suppose we have a consistent linear system with n variables.
◮ If the row-reduced augmented matrix has exactly n non-zero rows,
then the system has a unique solution.
◮ If the row-reduced augmented matrix has < n non-zero rows,
then the system has infinitely many solutions.
◮ If r is the number of non-zero rows in the row-echelon form,
then n − r parameters are needed to specify the solution set.

More precisely, in the row-echelon form of the augmented matrix, there


will be n − r columns which contain no leading entry. We can choose
the variable corresponding to such a column arbitrarily. The values for
the remaining variables are then uniquely determined.
Note: r is called the rank of the augmented matrix.

28
Example 8.
Solve the system with reduced row-echelon form:

\left[\begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -4 \\ 0 & 0 & 1 & 15 \end{array}\right]
Solution:

29
Example 9.
Solve the system with reduced row-echelon form:

\left[\begin{array}{ccccc|c} 1 & 2 & 0 & 0 & 5 & 1 \\ 0 & 0 & 1 & 0 & 6 & 2 \\ 0 & 0 & 0 & 1 & 7 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right]
Solution:

30
31
Geometrically, a consistent system in 3 variables has a common point,
line or plane of intersection for the planes determined by the system.

32
Example 10. Calculating flows in networks
At each node • we require
sum of flows in = sum of flows out.
Determine a, b, c and d in the following network.

33
Solution:

34
35
36
Note:
◮ x = t(−1, 1, −1, 1), t ∈ R is the general solution of the
homogeneous linear system.
◮ x = (9, 1, 6, 0) is called a particular solution of the linear system.
37
Example 11.
Find the values of a, b ∈ R for which the system

x − 2y + z = 4
2x − 3y + z = 7
3x − 6y + az = b

has
(a) no solution,
(b) a unique solution,
(c) an infinite number of solutions.

Solution:

38
39
Topic 2: Matrices and Determinants [AR 1.3–1.7, 2.1–2.4]

2.1 Matrix notation


2.2 Matrix operations
2.3 Matrix inverses
2.4 Elementary matrices
2.5 Linear systems revisited
2.6 Determinants

40
2.1 Matrix notation

Definition (Matrix)
A matrix of size m × n is a rectangular array of numbers with m rows and n columns.

A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{bmatrix}   or   A = [A_{ij}]

We denote by A_{ij} the entry in the i-th row and j-th column of A, where i = 1, 2, ..., m and j = 1, 2, ..., n.

41
Special matrices

◮ A matrix with the same number of rows as columns is called a square matrix.
◮ A matrix with one row is called a row matrix or row vector.
◮ A matrix with one column is called a column matrix or column vector.
◮ A square matrix with A_{ij} = 0 for i ≠ j is called a diagonal matrix.
  e.g. \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 5 \end{bmatrix}
◮ A square matrix with A_{ij} = 0 for i > j is called an upper triangular matrix.
  e.g. \begin{bmatrix} 2 & 1 & 3 \\ 0 & -1 & 4 \\ 0 & 0 & 5 \end{bmatrix}
◮ A square matrix with A_{ij} = 0 for i < j is called a lower triangular matrix.
  e.g. \begin{bmatrix} 2 & 0 & 0 \\ 1 & -1 & 0 \\ 5 & 3 & 5 \end{bmatrix}
42
◮ A matrix with all zero entries is called a zero matrix. We denote the zero matrix by 0 or, if size is important, we write 0_{m,n} to denote the zero matrix of size m × n.
  e.g. 0_{2,2} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, 0_{2,3} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
◮ A square matrix A satisfying A_{ij} = 1 if i = j and A_{ij} = 0 if i ≠ j is called an identity matrix. We write I to denote an identity matrix or, if size is important, we write I_n to denote the identity matrix of size n × n.
  e.g. I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

43
2.2 Matrix operations
Recall that we can multiply a matrix by a scalar, add matrices of the
same size, and multiply matrices if the sizes are compatible.

Definition (Scalar multiplication)


Let A be a matrix and c be any scalar (real or complex number). The
product cA is the matrix obtained by multiplying all entries of A by c.
It is called a scalar multiple of A:
(cA)ij = c(Aij )

Definition (Addition of matrices)


Let A and B be m × n matrices. The sum A + B is an m × n matrix
obtained by adding corresponding entries of A and B.

(A + B)ij = Aij + Bij

44
Definition (Matrix multiplication)
Let A be an m × n matrix and B be an n × q matrix. The product AB is a matrix of size m × q.
The entry in position (i, j) of the matrix product is obtained by taking row i of A and column j of B, then multiplying together the entries in order and adding:

(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}

Note:
◮ The product AB is only defined if the number of columns in A is
equal to the number of rows in B.
◮ We can think of matrix multiplication in terms of dot products:

(AB)ij = (row i of A) · (column j of B).
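
A quick NumPy check of this definition may help; A is the matrix of Example 3 below and B is made up for the illustration.

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])     # 2 x 3
B = np.array([[1, 0], [0, 1], [1, 1]])   # 3 x 2
AB = A @ B                               # the product has size 2 x 2

# Entry (i, j) is the dot product of row i of A with column j of B.
i, j = 0, 1
print(AB[i, j] == np.dot(A[i, :], B[:, j]))   # True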

45
Properties of matrix multiplication
Whenever the matrix products and sums are defined:
1. A(B + C ) = AB + AC (left distributivity)
2. (A + B)C = AC + BC (right distributivity)
3. A(BC ) = (AB)C (associativity)
4. A(αB) = α(AB)
5. AIn = Im A = A (where A has size m × n)
6. A0 = 0 and 0 A = 0

where α is a scalar and 0 denotes the zero matrix of appropriate size.

Definition (Matrix powers)


If A is a square matrix and n > 1 is an integer we define

An = AA · · · A

as the product of n copies of A.


46
Application of matrix powers

Example 1. Graphs and adjacency matrices


A graph is a set of points called vertices together with a set of lines or
curves called edges which connect certain vertices.

Every graph has a corresponding adjacency matrix that specifies which


vertices Vi are connected. If the graph has n vertices then the
adjacency matrix is n × n and the (i , j) entry Aij is equal to the number
of edges from Vi to Vj .
47
Determine the adjacency matrix for the above graph.

Solution:

48
Example 2. Walks in a graph
A walk in a graph is a sequence of edges linking one vertex to another.
The length of a walk is equal to the number of edges it traverses.

Theorem
If A is an n × n adjacency matrix of a graph and (A^k)_{ij} denotes the (i, j) entry of A^k, then (A^k)_{ij} is equal to the number of walks of length k from V_i to V_j, for each k ≥ 1.
In particular, the entry A_{ij} of A gives the number of walks of length 1 from V_i to V_j.

Problem:
Determine the number of walks of length 3 between V1 and V4 for the
graph in example 1.

49
Solution:
To find walks of length 3, we compute A^3:

A^3 = \begin{bmatrix} 2 & 6 & 3 & 3 & 5 \\ 6 & 6 & 6 & 7 & 7 \\ 3 & 6 & 2 & 5 & 3 \\ 3 & 7 & 5 & 4 & 7 \\ 5 & 7 & 3 & 7 & 4 \end{bmatrix}
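
Since the graph of Example 1 is given only as a picture, here is a hedged illustration of the theorem on a small made-up graph (a 4-cycle, not the graph from the lecture):

import numpy as np

# Adjacency matrix of the 4-cycle with edges V1-V2, V2-V3, V3-V4, V4-V1.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])

A3 = np.linalg.matrix_power(A, 3)
print(A3[0, 3])   # number of walks of length 3 from V1 to V4 (here 4)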

50
Definition (Transpose of a matrix)
Let A be an m × n matrix. The transpose of A, denoted by AT , is the
n × m matrix whose entries are given by interchanging the rows and
columns of A:
(AT )ij = Aji

Example 3.
Let A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}. Find A^T.

Solution:

51
Properties of the transpose
Let α be a scalar. The following properties hold whenever the matrix
products and sums exist:
1. (A^T)^T = A
2. (A + B)^T = A^T + B^T
3. (αA)^T = αA^T
4. (AB)^T = B^T A^T

Proof: Property 4.
(1) Check that (AB)T and B T AT are the same size.

52
(2) Show that the corresponding entries are equal for all i and j.

53
2.3 Matrix inverses
Definition (Matrix inverse)
An n × n matrix A is called invertible if there exists a matrix B such that

AB = BA = In .

The matrix B is called the inverse of A and is denoted by A−1 . If A is


not invertible, we say that A is singular.
We can prove from the definition that:
◮ (If it exists) A−1 has the same size as A.

◮ (If it exists) A−1 is unique.


◮ If A is invertible, then A−1 is invertible and (A−1 )−1 = A.
◮ In−1 = In
◮ 0 is singular.
◮ A matrix that has a row or column consisting entirely of zeros is
singular.
54
Properties of the matrix inverse
If A and B are n × n invertible matrices and α is a non-zero scalar, then
1. (αA)^{-1} = (1/α) A^{-1}
2. (AB)^{-1} = B^{-1} A^{-1}
3. (A^k)^{-1} = (A^{-1})^k (for all integers k ≥ 1)
4. (A^T)^{-1} = (A^{-1})^T

Proof: Property 2.

55
56
Theorem (Inverse of a 2 × 2 matrix)
Let A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. Then
1. A is invertible ⟺ ad − bc ≠ 0.
2. If ad − bc ≠ 0, then A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.

Example 4.
Find the inverse of A = \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}.
Solution:

57
Finding the inverse of a square matrix via Gauss-Jordan elimination
Calculating the inverse of a matrix
Algorithm: input: n × n matrix A
output: A−1 or “A is not invertible”.
1. Construct the augmented matrix [A | In ].

2. Apply row operations to [A | In ] to get the block corresponding to


A into reduced row-echelon form. This gives

[A | In ] ∼ [R | B]

where R is in reduced row-echelon form.

3. If R = I_n, then A is invertible and A^{-1} = B.
   If R ≠ I_n, then A is singular, i.e. A^{-1} does not exist.

58
Example 5.
Find the inverse of A = \begin{bmatrix} 1 & 2 & 1 \\ -1 & -1 & 1 \\ 0 & 1 & 3 \end{bmatrix}.

Solution:
Form the augmented matrix [A | I3 ] and perform row operations so that
A is in reduced row-echelon form.

59
60
2.4 Elementary matrices
The effect of an elementary row operation can be achieved by
multiplication on the left by an elementary matrix.

Definition (Elementary matrix)


An n × n matrix is an elementary matrix if it can be obtained from In by
performing a single elementary row operation.

Theorem
Let Ep be the elementary matrix obtained by applying a row operation p
to In .
If A is a matrix such that Ep A is defined, then Ep A is equal to the result
of performing p on A.

Thus, we can perform a sequence of elementary row operations using a


corresponding sequence of elementary matrices.
61
Example 6.
Find the elementary matrices obtained by applying the following row
operations on I3 .
(a) R2 ↔ R3 (b) R1 → 2R1
(c) R3 → R3 + 3R1

Solution:

62
Example 7.
Let A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}. Assume that

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} ∼ \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} ∼ \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

(a) Write I2 as a product of elementary matrices and A.


(b) Write A−1 as a product of elementary matrices.
(c) Write A as a product of elementary matrices.
Solution:

63
64
In general, if A ∼ In by a sequence of elementary row operations, then
there is a sequence of elementary matrices E1 , E2 , . . . , Ek such that

Ek Ek−1 · · · E2 E1 A = In .

Each of the elementary matrices Ep (p = 1, . . . k) is invertible.

Theorem
1. Let A be an n × n matrix. Then

A is invertible ⇐⇒ A ∼ In .

2. Every invertible matrix A can be written as a product of elementary matrices:

A = E_1^{-1} E_2^{-1} \cdots E_k^{-1}.
3. If A and B are n × n matrices such that AB = In , then A and B
are invertible with A−1 = B and B −1 = A.
65
2.5 Linear systems revisited
A linear system in the variables x1 , . . . , xn can be written in the form
Ax = b
where A is an m × n matrix, b is an m × 1 matrix and x is the n × 1
matrix [x1 , . . . , xn ]T .

Theorem
If A is an invertible n × n matrix, then every linear system of the form
Ax = b has a unique solution, given by x = A−1 b.

Proof:
Since A is an n × n invertible matrix, then AA−1 = A−1 A = In . So
Ax = b
⇒ A^{-1}Ax = A^{-1}b
⇒ I_n x = A^{-1}b
⇒ x = A^{-1}b
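
This theorem can be checked numerically; an illustration in NumPy, using the system of Example 8 below:

import numpy as np

A = np.array([[1, 2, 1], [-1, -1, 1], [0, 1, 3]])
b = np.array([-4, 11, 21])

x = np.linalg.inv(A) @ b        # x = A^(-1) b, as in the theorem
print(x)
print(np.linalg.solve(A, b))    # same solution, computed without forming A^(-1)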
66
Example 8.
Use a matrix inverse to solve the linear system

x + 2y + z = −4
−x − y + z = 11
y + 3z = 21
Solution:
Step 1. Write the system in the form Ax = b

Step 2. Find A−1 .

67
Step 3.
The solution is x = A−1 b.

Note:
Once we know A−1 we can solve the linear system Ax = b for any
choice of b.

68
Rank of a matrix

Definition (Rank of a matrix)


The rank of a matrix A is the number of non-zero rows in the reduced
row-echelon form of A.

Note:
◮ This is the same as the number of non-zero rows in any
row-echelon form of A.
◮ If A has size m × n, then rank(A) ≤ m and rank(A) ≤ n.
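
For computations, the rank is available directly in NumPy (an illustration; the matrix is the one from Example 9 below):

import numpy as np

A = np.array([[1, -1, 2, 1], [0, 1, 1, -2], [1, -3, 0, 5]])
print(np.linalg.matrix_rank(A))   # prints 2: only two non-zero rows survive row reduction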

69
Example 9.
Find the rank of

A = \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & 1 & -2 \\ 1 & -3 & 0 & 5 \end{bmatrix}
Solution:

70
Rank and solutions of linear systems
We can rewrite the results of section 1.4 for the solutions of a linear
system in terms of rank.

Theorem
The linear system Ax = b, where A is an m × n matrix, has
1. No solution if rank(A) < rank([A | b]).

2. A unique solution if rank(A) = rank([A | b]) and rank(A) = n.

3. Infinitely many solutions if rank(A) = rank([A | b]) and


rank(A) < n.

Note:
rank(A) ≤ rank([A | b])

71
Example 10.
Determine the number of solutions of the system with row-echelon form:

\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & 1 & -2 \\ 1 & -3 & 0 & 5 \end{bmatrix} ∼ \begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & 1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
Solution:

72
Theorem
If A is an n × n matrix, the following conditions are equivalent:
1. A is invertible.
2. Ax = b has a unique solution for any b.
3. The rank of A is n.
4. The reduced row-echelon form of A is In .

73
2.6 Determinants

The inverse of

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

exists if and only if ad − bc ≠ 0.

We call ad − bc the determinant of A, and write this as det(A) or |A|.

Geometrically, | det(A)| gives the area of the parallelogram spanned by


the row vectors v1 = (a, b), v2 = (c, d) of A.

The sign of det(A) depends on the orientation of v1 , v2 .

74
Examples of area and determinants

Multiplying one row by a positive scalar

The parallelograms of the row vectors of

A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} and B = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}

[Figure: the parallelogram of A has vertices at (1,0) and (0,2); that of B has vertices at (3,0) and (0,2).]

det(A) = 2, area equals 2; det(B) = 6, area equals 6.

75
Adding a multiple of one row to another row

The parallelograms of the row vectors of

A = \begin{bmatrix} 3 & 0 \\ 2 & 2 \end{bmatrix} and B = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}

[Figure: the parallelogram of A has vertices at (3,0) and (2,2); that of B has vertices at (3,0) and (0,2).]

det(A) = 6, area equals 6; det(B) = 6, area equals 6.

76
Multiplying a row by −1

The parallelograms of the row vectors of

A = \begin{bmatrix} 3 & 0 \\ 2 & 2 \end{bmatrix} and B = \begin{bmatrix} -3 & 0 \\ 2 & 2 \end{bmatrix}

[Figure: the parallelogram of A has vertices at (3,0) and (2,2), traversed anti-clockwise; that of B has vertices at (−3,0) and (2,2), traversed clockwise.]

det(A) = 6, area equals 6; det(B) = −6, area equals 6.

77
Interchanging two rows

The parallelograms of the row vectors of

A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} and B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}

[Figure: both are unit squares with vertices at (1,0) and (0,1); A has anti-clockwise orientation, B has clockwise orientation.]

det(A) = 1, area equals 1; det(B) = −1, area equals 1.

78
Defining determinants

For n × n matrices the determinant can be interpreted as the volume of


an n-dimensional parallelepiped.
For n × n matrices we can define and compute the determinant by
understanding how it changes under elementary row operations:

Definition (Determinant)
The determinant is a function that maps each n × n matrix A to a
number, denoted det(A) or |A|, that satisfies the following conditions:
D1. Adding one row to another row does not change the determinant.
D2. Multiplying any row by a scalar α multiplies the determinant by α.
D3. Swapping two rows multiplies the determinant by −1.
D4. The determinant of the identity matrix In equals 1.

79
Theorem
These four conditions specify the determinant uniquely.

Note:
◮ Definitions D1 and D2 imply that adding any multiple of one row
to another row does not change the determinant.
◮ All invertible n × n matrices have a non-zero determinant.

80
Example 11.
Compute the determinant of

A = \begin{bmatrix} 2 & 6 & 9 \\ 0 & 3 & 8 \\ 0 & 0 & -1 \end{bmatrix}

using row operations.


Solution:

81
82
A similar argument proves the following:

Theorem
If A is an upper triangular or lower triangular matrix, then det(A) is the
product of the elements on the main diagonal.

Calculating determinants
For a general matrix, we use Gaussian elimination to transform the
matrix into triangular form, keeping track of what each step does to the
determinant. Then we can read off the determinant using the theorem.
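
A small numerical check of the theorem (illustrative; the matrix is Example 11's):

import numpy as np

A = np.array([[2.0, 6, 9], [0, 3, 8], [0, 0, -1]])   # upper triangular
print(np.prod(np.diag(A)))    # product of the main diagonal: -6.0
print(np.linalg.det(A))       # agrees, up to floating-point rounding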

83
Example 12.
Compute the determinant of

A = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 1 \\ 0 & 1 & 3 \end{bmatrix}
using row operations.
Solution:

84
Properties of the determinant

Theorem
Let A, B be n × n matrices and let α be a scalar. Then
1. If A has a row (or column) of zeros, then det(A) = 0.
2. If A has a row (or column) which is a scalar multiple of another row (or column), then det(A) = 0.
3. det(αA) = α^n det(A)
4. det(AB) = det(A) det(B)
5. det(A^T) = det(A)
6. A is invertible if and only if det(A) ≠ 0.
7. If A is invertible, then det(A^{-1}) = 1/det(A).

Note:
Property 5 means that we can replace rows by columns in all the previous properties of determinants, including D1-D3.
85
Example 13.
Prove property 7 for the determinant, using the definition of
determinant and the previous properties.

Solution:

86
Cofactor expansion
Another way to calculate determinants is to use cofactor expansion.

Definition (Cofactor)
Let A be a square matrix. The (i, j)-cofactor of A, denoted by C_{ij}, is the number given by

C_{ij} = (−1)^{i+j} det(A(i, j))

where A(i, j) is the matrix obtained from A by deleting the i-th row and j-th column.

87
Example 14. 
1 2 1
If A =  −1 −1 1 , calculate C23 .
0 1 3

Solution:

88
Theorem (Cofactor expansion)
The determinant of an n × n matrix A can be computed by choosing any row (or column) of A, multiplying the entries in that row (or column) by their cofactors, and then adding the resulting products.
That is, for each 1 ≤ i ≤ n,

det(A) = A_{i1} C_{i1} + A_{i2} C_{i2} + \cdots + A_{in} C_{in} = \sum_{k=1}^{n} A_{ik} C_{ik}

(cofactor expansion along the i-th row)

and for each 1 ≤ j ≤ n,

det(A) = A_{1j} C_{1j} + A_{2j} C_{2j} + \cdots + A_{nj} C_{nj} = \sum_{k=1}^{n} A_{kj} C_{kj}

(cofactor expansion along the j-th column).
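
As an illustrative sketch (not part of the notes), cofactor expansion along the first row translates directly into a recursive function; det_cofactor is a name chosen here.

import numpy as np

def det_cofactor(A):
    # Determinant by cofactor expansion along the first row.
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for k in range(n):
        # A(1, k+1): delete row 1 and column k+1
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        total += (-1) ** k * A[0, k] * det_cofactor(minor)
    return total

A = np.array([[1, 2, 1], [-1, 1, 1], [0, 1, 3]])   # Example 15's matrix
print(det_cofactor(A))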

89
How do you remember the sign of the cofactor?
The (1, 1) cofactor always has sign +. Starting from there, imagine
walking to the square you want using either horizontal or vertical steps.
The appropriate sign will change at each step.

We can visualise this arrangement with the following matrix:

\begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}

So, for example, C_{13} is assigned + but C_{32} is assigned −.

Note:
The cofactor method is particularly useful for matrices with many zeros.
90
Example 15.
Compute the determinant of

A = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 1 \\ 0 & 1 & 3 \end{bmatrix}

using cofactor expansion.


Solution:

91
Example 16.
Calculate |A| = \begin{vmatrix} 1 & -2 & 0 & 1 \\ 3 & 2 & 2 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & -4 & 2 & 4 \end{vmatrix} using cofactors.
Solution:

92
93
Topic 3: Euclidean Vector Spaces [AR 3.1-3.5]

3.1 Vectors in Rn
3.2 Dot product
3.3 Cross product of vectors in R3
3.4 Lines and planes

94
3.1 Vectors in Rn

Definition (Addition and scalar multiplication of vectors in Rn )


We define Rn as the set of all ordered n-tuples of real numbers and
write these as row vectors:

v = (v1 , v2 , . . . , vn ), where vi ∈ R for i = 1, 2, . . . , n.

Let v = (v1 , v2 , . . . , vn ) ∈ Rn , u = (u1 , u2 , . . . , un ) ∈ Rn and α ∈ R.


Then addition and scalar multiplication are defined by:

v + u = (v1 , . . . , vn ) + (u1 , . . . , un ) = (v1 + u1 , . . . , vn + un )


αv = α(v1 , v2 , . . . , vn ) = (αv1 , αv2 , . . . , αvn )

Note:
In handwritten work, we denote vectors using the notation \vec{v}, or v with a tilde written above or below it.

95
In R3 , the unit vectors in the x, y , z directions are often denoted by
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)
so any vector in R3 can be written in the form
v = (v1 , v2 , v3 ) = v1 i + v2 j + v3 k.

Definition (Length of a vector)
The length (norm, magnitude) of a vector u = (u_1, u_2, ..., u_n) ∈ R^n is given by

\|u\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}.

A vector of length 1 is called a unit vector.

Note:
We say a vector u is parallel to a vector v if u is a scalar multiple of v. Then every non-zero vector u is parallel to a unit vector û, where

û = u / \|u\|.

96
Distance between two points
Let u, v ∈ Rn . The distance between u and v is
d(u, v) = ||v − u||.


Example 1.
Find the distance between the points P(1, 3, −1, 2) and Q(2, 1, −1, 3).
Solution:

97
3.2 Dot product

Definition (Dot product)
Let u = (u_1, u_2, ..., u_n) ∈ R^n and v = (v_1, v_2, ..., v_n) ∈ R^n.
We define the dot product (scalar product, Euclidean inner product) by

u · v = \sum_{k=1}^{n} u_k v_k = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.

Note:
If we write u and v as column matrices U = [u_1, ..., u_n]^T and V = [v_1, ..., v_n]^T, then

u · v = U^T V.

98
Properties of the dot product
Let u, v, w ∈ Rn and let α ∈ R be a scalar. Then
1. u · v is a scalar
2. u · v = v · u
3. (u + v) · w = u · w + v · w
4. u · (v + w) = u · v + u · w
5. (αu) · v = u · (αv) = α(u · v)
6. u · u = kuk2

99
Geometry of the dot product
We can also interpret the dot product geometrically by

u · v = \|u\| \|v\| \cos θ

where 0 ≤ θ ≤ π is the angle between the vectors u and v.

Note:
◮ Two non-zero vectors are perpendicular or orthogonal when u · v = 0, so that θ = π/2.
◮ Since |cos θ| ≤ 1, we have

|u · v| ≤ \|u\| \|v\|.

This is known as the Cauchy-Schwarz inequality.
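
A quick numerical check of the inequality (illustrative), using the vectors of Example 2 below:

import numpy as np

u = np.array([0, 2, 2, -1])
v = np.array([-1, 1, 1, -1])

lhs = abs(np.dot(u, v))
rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(lhs, rhs, lhs <= rhs)   # |u . v| <= ||u|| ||v||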

100
Example 2.
Verify that the Cauchy-Schwarz inequality holds for the vectors
u = (0, 2, 2, −1) and v = (−1, 1, 1, −1).
Solution:

101
Example 3. Correlation
Correlation is a measure of how closely two variables are dependent.
Suppose we want to compare how closely 7 students’ assignment marks
correlate with their exam marks.

Student Assignment Mark Exam Mark


S1 99 100
S2 80 82.5
S3 79 79
S4 75.5 82.5
S5 87.5 91
S6 67 67.5
S7 76 68
Average 80.5 81.5

To measure the correlation we first need to adjust the scores so that


they have mean zero. This can be done by subtracting the average
score X̄ from each student's score X.
102
Let's store the differences X − X̄ in a matrix:

X − X̄ = \begin{bmatrix} 18.5 & 18.5 \\ -0.5 & 1 \\ -1.5 & -2.5 \\ -5 & 1 \\ 7 & 9.5 \\ -13.5 & -14 \\ -4.5 & -13.5 \end{bmatrix}

To compare the exam and assignment scores, we compute the cosine of the angle between the two columns x_1, x_2 of this matrix:

cos θ = (x_1 · x_2) / (\|x_1\| \|x_2\|) = 656.75 / ((24.92)(28.62)) ≈ 0.92

A cosine value near 1 indicates that the scores are highly correlated.

103
Vector projection

Definition (Projection of v onto u)
Assuming u ≠ 0, the projection of v onto u is given by

proj_u v = \frac{u · v}{\|u\|^2} u = (û · v) û

[Figure: v decomposed into proj_u v along u and v − proj_u v perpendicular to u, with θ the angle between u and v.]

proj_u v = (\|v\| \cos θ) û = (\|û\| \|v\| \cos θ) û = (û · v) û

Note:
u · (v − proj_u v) = 0
104
Example 4.
Let u = (2, −1, −2) and v = (2, 1, 3). Find vectors v1 and v2 such that

v = v1 + v2

where v1 is parallel to u and v2 is perpendicular to u.


Solution:

105
3.3 Cross product of vectors in R3

Definition (Cross product)
Let u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) be two vectors in R^3. We define the cross product (or vector product) of u and v by

u × v = (u_2 v_3 − u_3 v_2) i + (u_3 v_1 − u_1 v_3) j + (u_1 v_2 − u_2 v_1) k.

It is easy to remember the cross product as a determinant expanded along the first row:

u × v = \begin{vmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix} i − \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix} j + \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} k

Note:
The cross product is defined only for R^3.
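
NumPy implements this formula as np.cross; a short illustrative check with the vectors of Example 5 below:

import numpy as np

u = np.array([1, 1, 1])
v = np.array([1, -1, -2])
n = np.cross(u, v)
print(n)                             # a vector perpendicular to both u and v
print(np.dot(n, u), np.dot(n, v))    # both dot products are 0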
106
Properties of the cross product
Let u, v, w ∈ R3 and α ∈ R be a scalar. Then
1. v × u = −(u × v)
2. (u + v) × w = u × w + v × w
3. u × (v + w) = u × v + u × w
4. (αu) × v = u × (αv) = α(u × v)
5. u × u = 0

Example
i × j = k, j × k = i, k × i = j.

107
Geometry of the cross product
We can also interpret the cross product geometrically by

u × v = \|u\| \|v\| \sin(θ) n̂

where
◮ 0 ≤ θ ≤ π is the angle between u and v
◮ n̂ is a unit vector
◮ n̂ is perpendicular to both u and v
◮ n̂ points in the direction given by the right-hand rule. The
direction of u × v is determined by curling your fingers on your
right hand following the angle from u to v and noting the direction
your thumb is pointing.
108
Example 5.
Find a vector perpendicular to both (1, 1, 1) and (1, −1, −2).
Solution:

109
Application: Area of a parallelogram
Suppose that u, v ∈ R3 . The area of the parallelogram defined by u and
v is equal to ku × vk.

[Figure: parallelogram with base u and side v; its height is \|v\| \sin θ.]

Area of parallelogram = (base) × (height) = \|u\| \|v\| \sin θ = \|u × v\|

110
Example 6.
Find the area of the triangle with vertices (2, −5, 4), (3, −4, 5) and
(3, −6, 2).
Solution:

111
Definition (Scalar triple product)
Let u = (u_1, u_2, u_3), v = (v_1, v_2, v_3) and w = (w_1, w_2, w_3) be vectors in R^3. The scalar triple product is the real number u · (v × w) given by

u · (v × w) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}.

Properties of the scalar triple product

1. v · (v × w) = w · (v × w) = 0
2. u · (v × w) = v · (w × u) = w · (u × v)
3. u · (v × w) = −u · (w × v) = −v · (u × w) = −w · (v × u)

Note:
Property 1 says that v × w is orthogonal to both v and w.

112
Application: Volume of a parallelepiped
Suppose that u, v, w ∈ R3 . Then, the parallelepiped defined by the
vectors u, v and w has volume equal to the absolute value of the scalar
triple product of u, v and w:
Volume of parallelepiped = |u · (v × w)|

[Figure: parallelepiped spanned by u, v and w, with θ the angle between u and v × w.]

Volume of parallelepiped = (area of base) × (height) = \|v × w\| \|u\| |\cos θ| = |u · (v × w)|

Note:
This is the absolute value of the determinant of the matrix with rows
given by u, v, w.
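
An illustrative computation of such a volume in NumPy, with the edge vectors obtained from points as in Example 7 below (e.g. u = Q − P):

import numpy as np

P, Q = np.array([2, 0, -1]), np.array([4, 1, 0])
R, S = np.array([3, -1, 1]), np.array([2, -2, 2])
u, v, w = Q - P, R - P, S - P            # adjacent edges of the parallelepiped
print(abs(np.dot(u, np.cross(v, w))))    # |u . (v x w)|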
113
Example 7.
Find the volume of the parallelepiped with adjacent edges \vec{PQ}, \vec{PR}, \vec{PS}, where the points are P(2, 0, −1), Q(4, 1, 0), R(3, −1, 1) and S(2, −2, 2).
Solution:

114
3.4 Lines and planes
Equations of a straight line
A vector equation of a line through a point P0 in the direction of a
vector v is given by
r = r0 + tv, t ∈ R
where r_0 = \vec{OP_0}.

[Figure: line through P_0 in the direction v; a general point P on the line has position vector r = r_0 + tv, measured from the origin O.]
Each point on the line is given by a unique value of t.
115
For example, in R3 , if r = (x, y , z), r0 = (x0 , y0 , z0 ) and v = (a, b, c),
the vector equation of the line is

(x, y , z) = (x0 , y0 , z0 ) + t(a, b, c), t∈R

Equating components gives parametric equations for the line:

x = x0 + ta
y = y0 + tb, t∈R
z = z0 + tc

If a ≠ 0, b ≠ 0 and c ≠ 0, we can solve the parametric equations for t and equate. This gives a Cartesian form of the line:

(x − x_0)/a = (y − y_0)/b = (z − z_0)/c
a b c
Note:
The equations for a line are not unique.
116
Example 8.
Determine vector, parametric and Cartesian equations of the line
passing through the points P(−1, 2, 3) and Q(4, −2, 5).
Solution:

117
118
Example 9.
Find a vector equation of the line through the point (2, 0, 1) that is
parallel to the line
(x − 1)/1 = (y + 2)/(−2) = (z − 6)/2.
Does the point (0, 4, −3) lie on the line?
Solution:

119
Definition
Two lines are said to
◮ intersect if there is a point lying on both lines.
◮ be parallel if their direction vectors are parallel.
◮ be skew if they do not intersect and are not parallel.

The angle between two lines is given by the angle θ between their
direction vectors. (We take the acute angle, which is the smaller of the
two possible angles θ and π − θ.)

120
Equations of planes
A vector equation of a plane through a point P0 that contains two
non-parallel vectors in the directions u and v is
r = r0 + su + tv, s, t ∈ R,
where r_0 = \vec{OP_0}.

[Figure: plane through P_0 containing the directions u and v; a general point has position vector r = r_0 + su + tv.]
Every value of s and t determines a point on the plane.


In R3 , a vector perpendicular to the plane is
n=u×v
We say that n is a normal vector to the plane.
121
If P is a point with position vector r, then

P lies on the plane ⇔ r − r0 is parallel to plane


⇔ r − r0 is perpendicular to n

[Figure: plane with normal vector n; the vector r − r_0 lies in the plane.]

Hence the equation of the plane in R3 can also be written in


point-normal form:
(r − r0 ) · n = 0
122
Putting r = (x, y , z), r0 = (x0 , y0 , z0 ) and n = (a, b, c) in the
point-normal form, we derive the Cartesian equation of a plane:

r · n = r0 · n
⇒ (x, y , z) · (a, b, c) = (x0 , y0 , z0 ) · (a, b, c)
⇒ ax + by + cz = ax0 + by0 + cz0
⇒ ax + by + cz = d

where d is the constant given by

d = ax0 + by0 + cz0 .

Note:
◮ The equations for a plane are not unique.
◮ The angle between two planes in R3 is given by the angle θ
between their normal vectors. (We take the acute angle, which is
the smaller of the two possible angles θ and π − θ.)
123
Example 10.
Find a Cartesian equation of the plane with vector form

(x, y , z) = s(1, −1, 0) + t(2, 0, 1) + (−1, 1, 1), s, t ∈ R

Solution:

124
125
Example 11.
Find a vector equation for the plane containing the points P(1, 0, 2),
Q(1, 2, 3) and R(4, 5, 6).
Solution:

126
Intersection of a line and a plane
Example 12.
Where does the line
(x − 1)/1 = (y − 2)/2 = (z − 3)/3
intersect the plane 3x + 2y + z = 20?
Solution:

127
Intersection of two planes
Example 13.
Find a vector form for the line of intersection of the two planes
x + 3y + 2z = 6 and 3x + 2y − z = 11.
Solution:

128
129
Topic 4: General Vector Spaces [AR 4.1 - 4.8]

4.1 General vector spaces


4.2 Subspaces
4.3 Linear combinations and spans
4.4 Linear dependence and linear independence
4.5 Bases and dimension
4.6 Solution space, column space, row space
4.7 Coordinates relative to a basis

130
4.1 General vector spaces

Vector spaces
We want to extend the properties of vectors in Rn to more general sets
of objects.

A vector space is a set V with two operations defined:


1. vector addition,
2. scalar multiplication.

We want these two operations to satisfy the basic algebraic properties


of vectors in Rn . This leads to a list of ten axioms (properties) that we
use as our definition.

131
Fields
The scalars are members of a number system F called a field where
addition, subtraction, multiplication, and division by a non-zero scalar
are defined.

Every field F contains a:


◮ zero scalar 0 such that α + 0 = α for all α ∈ F.
◮ unit scalar 1 (with 1 ≠ 0) such that 1α = α for all α ∈ F.

In this subject, we focus on the fields F = R and F = C.

We will also look at F = F2 = {0, 1}, the integers modulo 2.

132
Definition (Vector space)
Fix a field of scalars F. Then a vector space over F is a non-empty set
V with two operations: vector addition and scalar multiplication. These
operations are required to satisfy the following 10 rules (axioms).

For all u, v, w ∈ V :
Vector addition behaves well:
A1 u + v ∈ V . (closure of vector addition)
A2 (u + v) + w = u + (v + w). (associativity)
A3 u + v = v + u. (commutativity)
There must be a zero and inverses:
A4 There exists a vector 0 ∈ V such that
v + 0 = v for all v ∈ V . (existence of zero vector)
A5 For all v ∈ V , there exists a vector −v
such that v + (−v) = 0. (additive inverse)

133
Definition (Vector space continued)
For all u, v ∈ V and α, β ∈ F:

Scalar multiplication behaves well:


M1 αv ∈ V . (closure under scalar multiplication)
M2 α(βv) = (αβ)v. (associativity of scalar multiplication)
M3 1v = v. (multiplication by unit scalar)
Vector addition and scalar multiplication combine well :
D1 α(u + v) = αu + αv. (distributivity 1)
D2 (α + β)v = αv + βv. (distributivity 2)

Note:
It follows from the axioms that for all v ∈ V :
1. 0v = 0.
2. (−1)v = −v.
134
Examples of vector spaces

Definition
A vector space that has R as the scalars is called a real vector space.

1. Rn
Our definition of vector spaces was based on algebraic properties of Rn ,
so it is easy to check that Rn is a vector space with scalars R.

135
2. Vector space of matrices

Definition
Denote by Mm,n ( or Mm,n (R)) the set of all m × n matrices with real
entries.
Mm,n is a real vector space with the usual matrix addition and scalar
multiplication operations.
Example 1.
Prove axioms A1, A4, A5 and M1 for M2,2 .
Solution:
Define two vectors in V and a scalar in R

136
A1. Closure under vector addition

A4. Existence of zero vector

137
A5. Additive inverse

M1. Closure under scalar multiplication

138
3. Vector space of polynomials

Definition
For a fixed integer n ≥ 0, denote by P_n (or P_n(R)) the set of all polynomials of degree at most n with real coefficients:

P_n = {a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n | a_0, a_1, ..., a_n ∈ R}

Note:
Two polynomials are equal if and only if their coefficients are equal.

Pn is a real vector space with the usual operations of polynomial


addition and scalar multiplication. That is, if
a0 + · · · + an x n ∈ Pn , b0 + · · · + bn x n ∈ Pn and α ∈ R, then

(a0 + · · · + an x n ) + (b0 + · · · + bn x n ) = (a0 + b0 ) + · · · + (an + bn )x n

α(a0 + a1 x + · · · + an x n ) = (αa0 ) + (αa1 )x + · · · + (αan )x n

139
4. Vector space of functions

Definition
Let
F(S, R) = { all functions f : S → R }
denote the set of all functions from S to R, where S is a non-empty set.

If f , g ∈ F(S, R) and α ∈ R, then f + g and αf are the functions


defined by
(f + g )(x) = f (x) + g (x)
(αf )(x) = αf (x)
for all x ∈ S.
Then F(S, R) is a real vector space.

Note:
The zero vector is the function which is identically zero, i.e.
0 : S → R such that 0(x) = 0 for all x ∈ S.
140
Example 2.
If f , g ∈ F(R, R) are given by
f (x) = sin(x) and g (x) = x for all x ∈ R,
then
(f + g )(x) = sin(x) + x and (3f )(x) = 3 sin(x) for all x ∈ R.

[Figure: graphs of (f + g)(x) = sin(x) + x and (3f)(x) = 3 sin(x) on the interval −5 ≤ x ≤ 5.]

Note:
Function spaces are important in analysis of sound and light waves,
quantum mechanics and the study of differential equations.
141
5. Vector space Fn2
The field F2
Definition
The set containing just two elements

F2 = {0, 1},

equipped with the operations of addition and multiplication defined


below, is a field called the integers modulo 2.

Addition: 0 + 0 = 0, 1 + 0 = 1, 0 + 1 = 1, 1 + 1 = 0
Multiplication: 0 × 0 = 0, 1 × 0 = 0, 0 × 1 = 0, 1 × 1 = 1

We call the arithmetic in F2 modular or mod 2 arithmetic.


142
The vector space Fn2

Definition
Denote by Fn2 the set of vectors with integers modulo 2 as scalars:

Fn2 = {(a1 , a2 , . . . , an ) | ai ∈ F2 , i = 1, . . . , n}

For (a1 , a2 , . . . , an ) ∈ Fn2 , (b1 , b2 , . . . , bn ) ∈ Fn2 and α ∈ F2 , we define


addition and scalar multiplication by

(a1 , a2 , . . . , an ) + (b1 , b2 , . . . , bn ) = (a1 + b1 , a2 + b2 , . . . , an + bn )

α(a1 , a2 , . . . , an ) = (αa1 , αa2 , . . . , αan ),


where on the right hand side the addition and scalar multiplication are
as for mod 2 arithmetic.

Then Fn2 is a vector space over F2 .


143
6. Complex vector spaces

Definition
A vector space that has C as the scalars is called a complex vector
space.

Example 3.
C2 = {(a1 , a2 ) | a1 , a2 ∈ C}

For (a1 , a2 ) ∈ C2 , (b1 , b2 ) ∈ C2 and α ∈ C, we define addition and


scalar multiplication by

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 )


α(a1 , a2 ) = (αa1 , αa2 )
Then C2 is a complex vector space.

Note:
The real vector spaces Rn , Mm,n (R), Pn (R) and F(S, R) have complex
analogues denoted by Cn , Mm,n (C), Pn (C) and F(S, C).
144
Example 4.
Consider V = R2 . Let u = (a1 , a2 ) ∈ R2 , v = (b1 , b2 ) ∈ R2 and
α ∈ R. Define addition and scalar multiplication as follows:

(a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 )


α(a1 , a2 ) = (αa1 , 0)

Show that V is not a vector space with these operations.


Solution:
Find a specific counterexample:

145
Applications of general vector spaces

Regarding matrices and functions as vector spaces allows geometrical


notions such as distance, angles and projection to be extended to these
objects. (See Topic 7.)

Example 5.
Let A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} and B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}, so that A, B ∈ M_{2,2}.

The distance between A and B can be defined as

\sqrt{(a_{11} − b_{11})^2 + (a_{12} − b_{12})^2 + (a_{21} − b_{21})^2 + (a_{22} − b_{22})^2}

This is useful in comparing digital photographs since the data is


naturally a matrix (each pixel gives an entry). Image compression using
singular value decomposition is an application of these ideas.

146
Example 6.
The vector space Fn2 over F2 is used in coding theory applications to
reduce errors occurring in data storage and message transmission.

One way of coding data is to use a Hamming code so that errors can
not only be detected but also corrected efficiently.
1. Data is stored in the form of a string of 1s and 0s. These 1s and 0s
are called binary numbers.
2. Using modular 2 arithmetic enables checking and correction of
digital information.

147
4.2 Subspaces
Definition (Subspace)
A subspace W of a vector space V is a subset W ⊆ V that is itself a
vector space using the addition and scalar multiplication operations
defined on V .

The following theorem allows us to avoid checking all 10 vector space


axioms.

Theorem (Subspace theorem)


Let V be a vector space with scalars F. A subset W ⊆ V is a subspace
of V if and only if
0. W contains the zero vector 0 from V .
1. W is closed under vector addition:
u, v ∈ W ⇒ u + v ∈ W .
2. W is closed under scalar multiplication:
u ∈ W , α ∈ F ⇒ αu ∈ W .
148
Example 7.
Show that S = {(x, y , z) ∈ R3 | x + y + z = 0} is a subspace of R3 .

Solution:
Define two vectors in S and a scalar in R

0. S contains the zero vector from R3

149
1. Closure under vector addition

2. Closure under scalar multiplication

150
Some geometric examples
1. The only subspaces of R2 are
• {0}
• Lines through the origin
• R2

2. The only subspaces of R3 are


• {0}
• Lines through the origin
• Planes through the origin
• R3

Example 8.
Is the line {(x, y ) ∈ R2 | y = 2x + 1} a subspace of R2 ?
Solution:

151
Example 9.
Let S ⊂ P2 be the set of polynomials defined by

S = {a_1 x + a_2 x^2 | a_1, a_2 ∈ R}

Is S a subspace of P2 ?

Solution:
Define two vectors in S and a scalar in R

0. S contains the zero vector from P2

152
1. Closure under vector addition

2. Closure under scalar multiplication

153
Definition (Trace)
If A is an n × n matrix, then the trace of A is the sum of the diagonal
entries, namely:
Tr(A) = A11 + A22 + ... + Ann
Example 10.

Let W be the real 2 × 2 matrices whose trace is equal to 0. Show that


W is a subspace of M2,2 .

Solution:
Define two vectors in W and a scalar in R

154
0. W contains the zero vector

1. Closure under vector addition

155
2. Closure under scalar multiplication

156
Example 11.
Let W be the set of real 2 × 2 matrices with determinant equal to 0:

W = \{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \mid a, b, c, d ∈ R and ad − bc = 0 \}
Show that W is not a subspace of M2,2 .

Solution:
Find a specific counterexample

157
4.3 Linear combinations and spans
Linear combinations
Let V be a vector space with scalars F.

Definition (Linear combination)
A linear combination of vectors v_1, v_2, ..., v_k ∈ V is the sum of scalar multiples of the vectors:

\sum_{i=1}^{k} α_i v_i = α_1 v_1 + \cdots + α_k v_k,

where each α_i ∈ F.

Example
(5, 1) is a linear combination of (1, −1) and (1, 2) since

3(1, −1) + 2(1, 2) = (5, 1)


158
Example 12. Computer graphics application
Linear combinations are used to generate colours in computer graphics.
Three primary colours are represented as unit vectors in R3 .

Red := (1, 0, 0) Green := (0, 1, 0) Blue := (0, 0, 1)

To generate a general colour we form a linear combination

α(1, 0, 0) + β(0, 1, 0) + γ(0, 0, 1)

with α, β, γ chosen between 0 and 1.

Examples

[Colour swatch] has α = 0.8, β = 0.47, γ = 0.25

[Colour swatch] has α = 0.8, β = 0.12, γ = 0.5

Note:
These primary colours are different from the standard primary colours
red, blue, yellow.
159
Example 13.
Determine whether (1, 2, 3) ∈ R3 is a linear combination of (1, −1, 2)
and (−1, 1, 2).
Solution:

160
161
Example 14.
In P2 , determine whether q(x) = 1 − 2x − x^2 is a linear combination of
p1(x) = 1 + x + x^2 and p2(x) = 3 + x^2.
Solution:

162
163
Spans
We can use linear combinations to produce vector equations for lines
and planes.

Definition (Span of a set of vectors)


Let V be a vector space defined over the field F. The span of vectors
v1 , . . . , vk ∈ V is the set of all linear combinations of these vectors:
span{v1 , . . . , vk } = {α1 v1 + · · · + αk vk | α1 , . . . , αk ∈ F}.

Theorem
Let V be a vector space and v1 , . . . , vk ∈ V . Then span{v1 , . . . , vk } is
a subspace of V .

Proof:
Check the conditions of the subspace theorem.

164
What does the span of a set of vectors look like?
Consider 3 vectors v1 , v2 , v3 in R3 . Then span{v1 , v2 , v3 } could be:
1. {0}
2. A line through the origin
3. A plane through the origin
4. R3
Note:
A span of 4 or more vectors in R3 will also give one of the above sets
because these are the only possible subspaces of R3 .

Example
The span of the vectors u and v is a plane through the origin.
[Figure: the plane through the origin spanned by u and v.]

165
Example 15.
Let v1 = (1, 1, 1), v2 = (2, 2, 2), v3 = (3, 3, 3). In R3 , what is
span{v1 , v2 , v3 }?
Solution:

166
We can also use the word “span” as a verb.

Definition (Vectors spanning a space)


Let V be a vector space. The vectors v1 , . . . , vk ∈ V span V if

span{v1 , . . . , vk } = V .

Then we also say that {v1 , . . . , vk } is a spanning set for V .

Note:
The vectors v1 , . . . , vk span V if and only if
(1) V contains all the vectors v1 , . . . , vk , and
(2) all vectors in V are linear combinations of the vectors v1 , . . . , vk .

167
Example 16.
Show that the vectors (1, −1), (2, 4) span R2 .
Solution:

168
169
Example 17.
Do the vectors (1, 2, 0), (1, 5, 3), (0, 1, 1) span R3 ?

Solution:

170
171
General method to determine if vectors span Rn
If A = [v1 · · · vk ] where v1 , · · · , vk ∈ Rn are column vectors, then:
1. v1 , · · · , vk span Rn ⇔ the linear system with augmented matrix
[A | w] has a solution for all w ∈ Rn .

2. v1 , . . . , vk span Rn ⇔ rank(A) = n.

Theorem
If k < n, then vectors v1 , . . . , vk ∈ Rn do not span Rn .
Proof:
Since A = [v_1 ⋯ v_k] has k columns, rank(A) ≤ k.
Since k < n, this implies that rank(A) ≠ n.
Hence v_1, ..., v_k do not span R^n.
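
An illustrative rank check in NumPy, using the vectors of Example 17:

import numpy as np

A = np.column_stack([(1, 2, 0), (1, 5, 3), (0, 1, 1)])  # vectors as columns
print(np.linalg.matrix_rank(A) == 3)   # False: rank is 2, so they do not span R^3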

172
Example 18.
Do the polynomials p1(x) = 1 + x + x^2 and p2(x) = x^2 span P2 ?

Solution:

173
4.4 Linear dependence and linear independence

A vector space can have many spanning sets.


For example, R2 is spanned by any of the following sets:

1. {(1, 0), (0, 1)}


2. {(1, 1), (1, 2)}
3. {(1, 1), (1, 2), (3, 5)}
4. {(1, 0), (0, 1), (0, 0), (1, 1), (1, 2), (3, 5)}

To get a small spanning set, we want to remove “redundant” vectors.

This leads to the ideas of linear dependence and linear independence.

174
Let V be a vector space with scalars F.

Definition (Linear dependence)
The vectors v_1, ..., v_k ∈ V are linearly dependent if there exist scalars α_1, ..., α_k ∈ F, with at least one α_i non-zero, such that

\sum_{i=1}^{k} α_i v_i = α_1 v_1 + \cdots + α_k v_k = 0.

Definition (Linear independence)
The vectors v_1, ..., v_k ∈ V are linearly independent if they are not linearly dependent. Equivalently, if α_1, ..., α_k ∈ F are scalars such that

\sum_{i=1}^{k} α_i v_i = α_1 v_1 + \cdots + α_k v_k = 0,

then the only solution is α_1 = 0, α_2 = 0, ..., α_k = 0.

175
Note:
Any set of vectors containing the zero vector 0 is linearly dependent.

Theorem
The vectors v_1, ..., v_k (k ≥ 2) are linearly dependent if and only if one
vector is a linear combination of the others.
In particular:
Two vectors are linearly dependent if and only if one is a multiple of the
other.

176
Example 19.
Are the following vectors linearly dependent or linearly independent?
(a) u = (2, −1, 1), v = (−6, 3, −3)
(b) u = (2, −1, 1), v = (4, 0, 2)

Solution:

177
Example 20.
Are the vectors (2, 0, 0), (6, 1, 7), (2, −1, 2) linearly dependent or linearly
independent?

Solution:

178
Note:
The augmented matrix has the original vectors as its columns.

179
General method to determine if vectors in Rn are linearly independent

If A = [v1 · · · vk ] where v1 , · · · , vk ∈ Rn are column vectors, then:


1. v1 , · · · , vk ∈ Rn are linearly independent ⇔ the linear system with
augmented matrix [A | 0] has a unique solution.

2. v1 , . . . , vk ∈ Rn are linearly independent ⇔ rank(A) = k

3. If k = n, then

v_1, ..., v_n ∈ R^n are linearly independent ⇔ det(A) ≠ 0.
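
Both tests can be run in NumPy; an illustration with Example 20's vectors:

import numpy as np

A = np.column_stack([(2, 0, 0), (6, 1, 7), (2, -1, 2)])  # vectors as columns
print(np.linalg.matrix_rank(A) == A.shape[1])            # rank test
print(not np.isclose(np.linalg.det(A), 0))               # determinant test (k = n = 3)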

180
Theorem
Let v1 , . . . , vk be vectors in Rn . If k > n, then the vectors are linearly
dependent.

Proof:
Let A = [v1 · · · vk ] where vi ∈ Rn .
Then rank(A) ≤ n.
Since k > n, this implies that rank(A) ≠ k.
Hence the vectors are linearly dependent.

Note:
The results and methods on linear combinations and linear dependence
for vectors in Rn also apply to vectors in Fn for any field F, e.g. C, F2 .

181
Example 21.
Determine if the following polynomials

2 + 2x + 5x^2,  1 + x + x^2,  1 + 2x + 3x^2

are linearly independent.

Solution:

182
183
Example 22.
Are the following matrices linearly independent or linearly dependent?

\begin{bmatrix} 1 & 3 \\ 1 & 1 \end{bmatrix},  \begin{bmatrix} -2 & 1 \\ 1 & -1 \end{bmatrix},  \begin{bmatrix} 1 & 10 \\ 4 & 2 \end{bmatrix}.

Solution:

184
185
Linear combinations via reduced row-echelon form
Let A = [v1 · · · vk ] and B = [u1 · · · uk ] be matrices with vectors in
columns.

If A and B are row-equivalent (i.e., A ∼ B), then the columns of A


satisfy the same linear relations as the columns of B:

α1 v1 + · · · + αk vk = 0 ⇔ α1 u1 + · · · + αk uk = 0

If A ∼ B and B is in reduced row-echelon form, then the relations


between the vectors v1 , . . . , vk can be read off from B.

186
Example 23.
Consider the vectors v1 = (1, 2, 3), v2 = (3, 6, 9), v3 = (−1, 0, −2) and
v4 = (1, 4, 4) in R3 .
Given that:

A = \begin{bmatrix} 1 & 3 & -1 & 1 \\ 2 & 6 & 0 & 4 \\ 3 & 9 & -2 & 4 \end{bmatrix} ∼ \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} = B.

(a) Are the vectors linearly independent?


(b) Express v2 and v4 as a linear combination of v1 and v3 .
(c) Are the vectors v1 , v3 linearly independent?

Solution:

187
188
4.5 Bases and dimension

Sometimes spanning sets can be too big. To get an efficient description


of a vector space V , we seek a set with the minimum number of vectors
needed to span V .

Definition (Basis)
A basis for a vector space V is a set of vectors in V which:
1. spans V , and
2. is linearly independent.

Note:
A vector space V can have many different bases.
e.g. Any two linearly independent vectors in R2 will give a basis for R2 .

189
Example 24.
Is {(1, −1), (2, 4)} a basis for R2 ?

Solution:

190
Example 25.
Recall that

W = {A ∈ M_{2,2} | Tr(A) = 0}

is a subspace of M_{2,2}.

Is \{ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \} a basis for W?

Solution:
1. Are the vectors in W ?

191
2. Do the vectors span W ?

192
3. Are the vectors linearly independent?

193
Theorem (Existence of bases)
Let V be a vector space. Then
1. V has a basis.
2. Any set that spans V contains a basis for V .
3. Any linearly independent set in V can be extended to a basis of V .

Theorem
Let V be a vector space and let {v1 , . . . , vk } be a basis for V .

1. A subset of V with more than k vectors is linearly dependent.


2. A subset of V with fewer than k vectors does not span V .
3. Any two bases of V have the same number of elements, k.

194
This allows us to define the dimension of a vector space.

Definition (Dimension)
The dimension of a vector space V is the number of vectors in any basis
for V . This is denoted by dim(V ).

We call V finite dimensional if it has a basis with a finite number of


elements, and infinite dimensional otherwise.

Note:
If V = {0}, we say that dim(V ) = 0.

195
Standard Bases

◮ The standard basis for R^n is

{(1, 0, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, 0, 0, ..., 0, 1)}.

These vectors are typically denoted as {e_1, e_2, ..., e_n}. (In R^3 the standard basis is also denoted by {i, j, k}.)

The dimension of R^n is n.

◮ The standard basis for M_{m,n} is

\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}, ..., \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.

The dimension of M_{m,n} is mn.

196
◮ The standard basis for P_n is

{1, x, x^2, ..., x^n}.

The dimension of P_n is n + 1.

◮ Let P be the set of all polynomials of all possible degrees:

P = {a0 + a1 x + · · · + an x n | n ∈ N, a0 , a1 , . . . , an ∈ R}

where N denotes the set of natural numbers.


P is an infinite dimensional vector space, since for each integer
k > 1, we can find a linearly independent set {1, . . . , x k } with
more than k elements.
The standard basis for P is

{1, x, x 2 , . . .}.

197
Theorem
Let V be a vector space, and suppose that dim(V ) = n is finite.
1. If a set with exactly n elements spans V , then it is a basis for V .
2. If a linearly independent subset of V has exactly n elements, then
it is a basis for V .

198
Example 26.
Is S = \{ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \} a basis for M_{2,2}?

Solution:

199
Example 27.
Let B = {2 + 2x + 5x^2, 1 + x + x^2, 1 + 2x + 3x^2}. Is B a basis for P2 ?

Solution:

200
How to find a basis for the span of a set of vectors in Rn

Recall that if A ∼ B, then the columns of A and B satisfy the same


linear relationships.

Method 1: Using columns

To find a basis for V = span{v1 , . . . , vk } where v1 · · · vk ∈ Rn .


1. Form the matrix A with columns v1 , . . . , vk .
2. Reduce A to row-echelon form B.
3. The columns of A corresponding to the leading entries in B give a
basis for V .

Note:
This method gives a basis that is a subset of the original set of vectors.

201
Example 28.
Let S = {v1 , v2 , v3 } = {(1, 3, −1, 1), (2, 6, 0, 4), (3, 9, −2, 4)}.
(a) Find a subset of S that is a basis for span(S).
(b) What is the dimension of span(S)?

Solution:
(a) We write the vectors as columns of a matrix and reduce to row
echelon form.

202
203
How to find a basis for the span of a set of vectors in Rn

Method 2: Using rows

To find a basis for V = span{v1 , . . . , vk } where v1 · · · vk ∈ Rn .


1. Form the matrix A with rows v1 , . . . , vk .
2. Reduce A to row-echelon form B.
3. The non-zero rows of B give a basis for V .

Note:
◮ This method gives a basis that will usually not be a subset of the
original set of vectors.
◮ The method works because elementary row operations do not
change the subspace spanned by the rows, and in the row echelon
form the non-zero rows are linearly independent.

204
Example 29.
Let
S = {(1, 3, −1, 1), (2, 6, 0, 4), (3, 9, −2, 4)} ⊂ R4 .
Find a basis for W = span(S) and dim(W ) using the row method.
Solution:
We write the vectors as rows of a matrix and reduce to row echelon
form. Now
   
        [ 1 3 −1 1 ]   [ 1 3 0 2 ]
    A = [ 2 6  0 4 ] ∼ [ 0 0 1 1 ] = B
        [ 3 9 −2 4 ]   [ 0 0 0 0 ]

205
Example 30.
(a) Show that
W = {(a + b, 0, a − b, 0) | a, b ∈ R}
is a subspace of R4 and find a basis B for W .
(b) Extend B to a basis for R4 .

Solution:

206
207
208
4.6 Solution space, column space, row space

Solution space of a matrix

Theorem (Solution Space)


Let A be a real m × n matrix. Then the solution set

S = {x ∈ Rn |Ax = 0}

is a subspace of Rn , with x regarded as a column vector.


This is called the solution space or nullspace of A, and its dimension is
called the nullity of A, denoted by nullity(A).

Proof:
Check the conditions of the subspace theorem.

209
Example 31.
Find a basis for the solution space of
x1 + 2x2 + x3 + x4 = 0
3x1 + 6x2 + 4x3 + x4 = 0
What is the dimension of the solution space?
Solution:

210
211
Note:
nullity(A) is the number of columns in the row-echelon form of A
not containing a leading entry.

212
Column space and row space of a matrix

Definition (Column space)


Let A be a real m × n matrix. The subspace of Rm spanned by the
columns of A is called the column space of A.

Definition (Row Space)


Let A be a real m × n matrix. The subspace of Rn spanned by the rows
of A is called the row space of A.

Note:
◮ The column space and row space are subspaces, since the span of
any set of vectors is a subspace.
◮ To find a basis for the column space use the column method.
◮ To find a basis for the row space use the row method.

213
Example 32.
Given that
   
        [ 1 −1  2 −2 ]   [ 1 −1  2 −2 ]
    A = [ 2  0  1  0 ] ∼ [ 0  2 −3  4 ] = B,
        [ 5 −3  7 −6 ]   [ 0  0  0  1 ]
        [ 1  1 −1  3 ]   [ 0  0  0  0 ]

find a basis for the


(a) column space of A,
(b) row space of A.
What are the dimensions of the row space and the column space?
Solution:

214
215
Effect of row operations
Assume that A ∼ B by a sequence of elementary row operations. Then
◮ solution space of A = solution space of B,
◮ row space of A = row space of B,
◮ column space of A ≠ column space of B in general.

Theorem
For any m × n matrix A:

rank(A) = dim(row space of A) = dim(column space of A).

This is because in both cases the number of leading entries in


row-echelon form determines the number of basis vectors.

216
Rank-nullity theorem
Theorem
Suppose A is an m × n matrix. Then

rank(A) + nullity(A) = n = number of columns in A.

Proof:
Suppose A has row echelon form B.
The rank of A equals the number of columns in B containing a leading
entry.
The nullity of A equals the number of columns in B which do not
contain a leading entry.
Therefore, rank(A) + nullity(A) is equal to the number of columns in B,
which is equal to the number of columns in A.

217
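Note:
The rank-nullity theorem is easy to check numerically. Below is a minimal Python/NumPy sketch (NumPy is an assumption here; the MATLAB mentioned in Topic 1 works just as well) for the matrix of Example 31.

    import numpy as np

    A = np.array([[1, 2, 1, 1],
                  [3, 6, 4, 1]])        # coefficient matrix from Example 31

    rank = np.linalg.matrix_rank(A)
    nullity = A.shape[1] - rank         # rank-nullity: nullity = n - rank

    print(rank, nullity)                # expected output: 2 2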
Other fields of scalars

Our work on vector spaces applies using any field of scalars F instead of
the real numbers R. For example, we could use F = C or F = F2 .
Then
Fn = {(x1 , . . . , xn ) | x1 , . . . , xn ∈ F}
is vector space using F as scalars.

Given any m × n matrix A ∈ Mm,n (F) with entries in F, we have:


◮ the solution space of A is a subspace of Fn
◮ the column space of A is a subspace of Fm
◮ the row space of A is a subspace of Fn

The previous theorems also apply in this context.

218
Example 33.
Consider the matrix
 
        [ 1 1 0 ]
    A = [ 1 0 1 ] ∈ M2,3 (F2 ).
Determine the column space, row space and solution space for A.
Note:
Since the field is F2 all matrix entries are in {0, 1} and the operations of
vector addition and scalar multiplication are done using mod 2
arithmetic.

Solution:
Reduce A to reduced row-echelon form using mod 2 arithmetic.

219
Column space of A

220
Row space of A

221
Solution space of A
We need to solve the following system using mod 2 arithmetic
 
    [ 1 1 0 ] [ x1 ]   [ 0 ]
    [ 1 0 1 ] [ x2 ] = [ 0 ] .
              [ x3 ]

222
Note:
The rank-nullity theorem holds as rank(A) = 2, nullity(A) = 1 and A
has 3 columns.

223
4.7 Coordinates relative to a basis
Theorem
If {v1 , . . . , vn } is a basis for a vector space V , then every vector v ∈ V
can be written uniquely in the form

v = α1 v1 + . . . + αn vn ,

where α1 , . . . , αn ∈ F.
Proof:
Since {v1 , . . . , vn } is a basis, it spans V and so for some scalars
α1 , . . . , αn ∈ F
v = α1 v1 + . . . + αn vn (1)
Are these scalars unique? Suppose
v = β1 v1 + . . . + βn vn (2)
for some other scalars β1 , . . . , βn ∈ F.
224
Taking (2) − (1) gives

0 = (β1 − α1 )v1 + . . . + (βn − αn )vn .

Since {v1 , . . . , vn } is a basis, v1 , . . . , vn are linearly independent, and


the only solution is

β1 − α1 = 0, . . . , βn − αn = 0
⇒ α1 = β1 , . . . , αn = βn
showing the uniqueness of the scalars α1 , . . . , αn .

225
Definition (Coordinates)
Suppose B = {v1 , . . . , vn } is an ordered basis for a vector space V ,
i.e. a basis arranged in the order v1 first, v2 second and so on.
For v ∈ V we can write

v = α1 v1 + · · · + αn vn for some scalars α1 , . . . , αn ∈ F.

The scalars α1 , . . . , αn are uniquely determined by v and are called the


coordinates of v relative to B.
Then
           [ α1 ]
    [v]B = [ :  ]
           [ αn ]
is called the coordinate vector of v with respect to B.

226
Example 34.
Consider the vector v = (1, 5) ∈ R2 . Determine the coordinate vector of
v with respect to the ordered bases:
(a) B = {(1, 0), (0, 1)}
(b) B ′ = {a, b} = {(2, 1), (−1, 1)}

Solution:

227
[Figure: the vector v = (1, 5) drawn on the grid formed by a and b, showing v = 2a + 3b.]
228
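Note:
Finding [v]B′ amounts to solving the linear system whose coefficient matrix has the basis vectors as columns. A minimal Python/NumPy sketch for part (b) of Example 34 (NumPy is an assumption; the expected answer v = 2a + 3b can be read off the grid above):

    import numpy as np

    a = np.array([2, 1])
    b = np.array([-1, 1])
    v = np.array([1, 5])

    M = np.column_stack([a, b])      # columns of M are the basis vectors
    coords = np.linalg.solve(M, v)   # solve M [v]_B' = v

    print(coords)                    # expected: [2. 3.], i.e. v = 2a + 3b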
Example 35.
Find the coordinate vector of p(x) = 2 + 7x − 9x 2 with respect to the
ordered basis for P2 given by
B = {2, (1/2)x, −3x 2 }.

Solution:

229
Having fixed a finite ordered basis B for a vector space V , there is a
one-to-one correspondence between vectors and their coordinate
vectors.

Theorem
Let V be a finite dimensional vector space over a field of scalars F. Let
B be an ordered basis of V .
If u, v ∈ V and α ∈ F is a scalar, then

[u + v]B = [u]B + [v]B


[αv]B = α[v]B .

This theorem tells us that taking coordinates is a linear transformation


between the vector spaces V and Fn .

230
Topic 5: Linear Transformations [AR 4.9-4.11, 8.1-8.5]

5.1 General linear transformations


5.2 Geometric linear transformations from R2 to R2
5.3 Matrix representations for general linear transformations
5.4 Image, kernel, rank and nullity
5.5 Invertible linear transformations
5.6 Change of basis

231
5.1 General linear transformations
We now consider functions or mappings between general vector spaces
that preserve the vector space structure of vector addition and scalar
multiplication.

Definition (Linear transformation)


Let V and W be vector spaces over the same field of scalars F.
A linear transformation from V to W is a function T : V → W such
that for all u, v ∈ V and all α ∈ F:
1. T (u + v) = T (u) + T (v) (T preserves vector addition)

2. T (αu) = αT (u) (T preserves scalar multiplication)

Note:
Taking α = 0 in condition 2 shows that
T (0V ) = 0W
i.e. T maps the zero vector in V to the zero vector in W .
232
Example 1.
Show that the cross product T : R3 → R3 given by

T (u) = u × (1, 1, −1)


is a linear transformation.

Solution:
Define two vectors in the domain and a scalar

1. T preserves vector addition

233
2. T preserves scalar multiplication

234
Example 2.
Show that the function D : P3 → P2 defined by differentiation with
respect to x,
D(a0 + a1 x + a2 x 2 + a3 x 3 ) = a1 + 2a2 x + 3a3 x 2
is a linear transformation.
Solution:
Define two vectors in the domain and a scalar

1. D preserves vector addition

235
2. D preserves scalar multiplication

236
Example 3.
Show that the determinant T : M2,2 → R given by
 
a b a b
T = = ad − bc
c d c d
is not a linear transformation.
Solution:
Find a specific counterexample
Method 1 - vector addition

237
Method 2 - scalar multiplication

238
Example 4.
Prove that the function T : R3 → R2 given by
T (x1 , x2 , x3 ) = (x2 − 2x3 , 3x1 + x3 )
is a linear transformation.

Solution:
Define two vectors in the domain and a scalar

1. T preserves vector addition

239
2. T preserves scalar multiplication

240
Note:
◮ We can write T in matrix form using coordinate vectors:
 
                          [ x2 − 2x3 ]   [ 0 1 −2 ] [ x1 ]
    [T (x1 , x2 , x3 )] = [ 3x1 + x3 ] = [ 3 0  1 ] [ x2 ]
                                                    [ x3 ]

The matrix
           [ 0 1 −2 ]
    [T ] = [ 3 0  1 ]
is called the standard matrix representation for T .

◮ All linear transformations from Rn to Rm can be represented by


matrices.

241
Theorem
Let S = {e1 , . . . , en } and S ′ = {e1 , . . . , em } be the ordered standard
bases for Rn and Rm , and let v ∈ Rn .
Every linear transformation T : Rn → Rm has a standard matrix
representation [T ] specified by the m × n matrix

[T ] = [ [T (e1 )] [T (e2 )] · · · [T (en )] ]

where [T (ei )] denotes the coordinate vector of T applied to ei . Then

[T (v)] = [T ] [v] for all v ∈ Rn ,

where [v] is the coordinate vector for v with respect to S, and


[T (v)] is the coordinate vector for T (v) with respect to S ′

242
Proof:
Let v ∈ Rn and write
v = α1 e1 + α2 e2 + · · · + αn en .
The coordinate vector of v with respect to the standard basis is
 
          [ α1 ]
    [v] = [ :  ]
          [ αn ]
Since T is a linear transformation
    [T (v)] = [T (α1 e1 + α2 e2 + · · · + αn en )]
            = [T (α1 e1 )] + [T (α2 e2 )] + · · · + [T (αn en )]
            = α1 [T (e1 )] + α2 [T (e2 )] + · · · + αn [T (en )]
                                                      [ α1 ]
            = [ [T (e1 )] [T (e2 )] · · · [T (en )] ] [ :  ]
                                                      [ αn ]
            = [T ] [v]
243
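Note:
The theorem gives a direct recipe for computing a standard matrix: apply T to each standard basis vector and use the images as columns. A minimal Python/NumPy sketch (an assumption, not part of the notes) for the transformation of Example 4:

    import numpy as np

    def T(x):
        x1, x2, x3 = x
        return np.array([x2 - 2*x3, 3*x1 + x3])

    # The columns of [T] are the images of the standard basis vectors
    E = np.eye(3)
    std_matrix = np.column_stack([T(E[:, j]) for j in range(3)])

    print(std_matrix)   # expected: [[0, 1, -2], [3, 0, 1]]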
5.2 Geometric linear transformations from R2 to R2
Many familiar geometric transformations are linear transformations
T : R2 → R2 .

Two geometric properties of linear transformations are:


1. T maps every line in R2 to a line (or a point) in R2 .
Proof:
A line has a vector equation
{v0 + tv | t ∈ R}.
Then
T (v0 + tv) = T (v0 ) + T (tv)
= T (v0 ) + tT (v)
This is the line through T (v0 ) in the direction T (v), or a point if
T (v) = 0.
244
2. T maps every parallelogram in R2 with vertices

0, v, w, v + w

to a (possibly degenerate) parallelogram with vertices

0, T (v), T (w), T (v + w).

[Figure: the parallelogram with vertices 0, v, w, v + w and its image, the parallelogram with vertices 0, T (v), T (w), T (v + w).]

245
Example 5.
Determine the standard matrix representation in R2 for reflection across
the y -axis.
[Figure: a shape and its mirror image under reflection across the y-axis.]

Solution:

246
Example 6.
Determine the standard matrix representation in R2 for reflection in the
line y = 5x.

Solution:

247
248
249
Example 7.
Determine the standard matrix representation in R2 for rotation around
the origin anticlockwise by an angle of θ.

[Figure: the standard basis vectors e1 and e2 rotated anticlockwise by θ to T (e1 ) and T (e2 ).]

250
Solution:
Consider the effect of the transformation on the standard basis vectors.

251
Example 8.
Determine the standard matrix representation in R2 for the
compression/expansion along the x-axis by a factor c > 0.
[Figure: the unit square before and after compression/expansion along the x-axis by a factor c.]

Solution:
Consider the effect of the transformation on the standard basis vectors.

252
Example 9.
Determine the standard matrix representation in R2 for the shear by a
factor c along the x-axis.
[Figure: the unit square before and after a shear by a factor c along the x-axis; the point (0, 1) moves to (c, 1) and (1, 1) moves to (1 + c, 1).]

Solution:
Consider the effect of the transformation on the standard basis vectors.

253
Composition of linear transformations
Recall:
If S : U → V and R : V → W are functions, then the composition

R ◦S :U →W

is defined by doing S followed by R:

(R ◦ S)(u) = R(S(u)) for all u ∈ U.

Theorem
Suppose R : Rk → Rm and S : Rn → Rk are linear transformations with
standard matrices [R] and [S] respectively.

Then the composition R ◦ S : Rn → Rm is a linear transformation with


standard matrix
[R ◦ S] = [R] [S].

254
Example 10.
Find the image of (x, y ) after a shear along the x-axis with c = 1
followed by a reflection across the y -axis.

Solution:
From examples 5 and 9

255
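Note:
The composition rule [R ◦ S] = [R][S] is just matrix multiplication, so the answer to Example 10 can be checked directly. A minimal Python/NumPy sketch (NumPy is an assumption), using the matrices from Examples 5 and 9:

    import numpy as np

    S = np.array([[1, 1],
                  [0, 1]])    # shear along the x-axis with c = 1
    R = np.array([[-1, 0],
                  [0, 1]])    # reflection across the y-axis

    RS = R @ S                # shear first, then reflection

    print(RS)                        # expected: [[-1, -1], [0, 1]]
    print(RS @ np.array([1, 0]))     # image of (1, 0): (-1, 0)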
5.3 Matrix representations for general linear
transformations

Let U, V be finite-dimensional vector spaces with the same scalars F.

Suppose that
◮ T : U → V is a linear transformation.
◮ B = {b1 , . . . , bn } is an ordered basis for U.
◮ C = {c1 , . . . , cm } is an ordered basis for V .

Then for any u ∈ U, the coordinate vector for the image of u under T
with respect to basis C , is given by

[T (u)]C = [T ]C,B [u]B

where [T ]C,B is a suitable m × n matrix.

256
Theorem
There exists a unique matrix satisfying

[T (u)]C = [T ]C,B [u]B for all u ∈ U

which is given by
    [T ]C,B = [ [T (b1 )]C [T (b2 )]C · · · [T (bn )]C ].

Note:
◮ The matrix [T ]C,B is called the matrix of T with respect to the
bases B and C.
◮ For the case U = V and B = C, we write often [T ]B instead of
[T ]B,B .

257
Example 11.
Consider the linear transformation T : M2,2 → M2,2 where T is defined
by
T (Q) = Q T .
Find the matrix representation of T with respect to the ordered basis
       
        [ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ]
    B = [ 0 0 ] , [ 0 0 ] , [ 1 0 ] , [ 0 1 ] .

Solution:
Find the image of each basis vector in B under T , and express it as a
coordinate vector using the basis B.

258
259
Example 12.
Find [T ]C,B for the linear transformation T : P2 → P1 given by

T (a0 + a1 x + a2 x 2 ) = (a0 + a2 ) + a0 x

using the ordered basis B = {1, x, x 2 } for P2 and the ordered basis

(a) C = {1, x} for P1 .


(b) C = {2, 3x} for P1 .

Solution:
1. Find the image of each basis vector in B under T .

260
2. Express each image as a coordinate vector with respect to basis C.

261
262
Example 13.
A linear transformation T : R3 → R2 has matrix
 
           [ 5 1  0 ]
    [T ] = [ 1 5 −2 ]
with respect to the ordered standard bases of R3 and R2 .
What is the transformation matrix with respect to the ordered bases
B = {(1, 1, 0), (1, −1, 0), (1, −1, −2)} of R3
and
C = {(1, 1), (1, −1)} of R2 ?

Solution:
1. Find the coordinates of the image of each vector in B with respect to
the standard basis.

263
2. Express each image as a coordinate vector with respect to basis C.

264
265
5.4 Image, kernel, rank and nullity

Definition (Kernel and Image)


Let T : U → V be a linear transformation.
The kernel (or nullspace) of T is defined to be

ker(T ) = {u ∈ U | T (u) = 0}.

The image (or range) of T is defined to be

Im(T ) = {v ∈ V | v = T (u) for some u ∈ U}.

Note:
◮ The kernel is a subspace of U. Its dimension is denoted by
nullity(T ).
◮ The image is a subspace of V . Its dimension is denoted by
rank(T ).
266
Example 14.
Find the kernel and image of the linear transformation D : P3 → P2
given by differentiation with respect to x:
D(a0 + a1 x + a2 x 2 + a3 x 3 ) = a1 + 2a2 x + 3a3 x 2 .

Solution:

267
Image and kernel from matrix representations

Given a linear transformation T : U → V and choices of ordered bases


B for U and C for V , we can represent T by the matrix

[T ] = [T ]C,B .

◮ To find ker(T ) we solve equations of the form [T ][x] = [0], so the


solution space for [T ] gives the coordinate vectors of the elements
of ker(T ) with respect to the basis B.

◮ If B is a basis for U then T (B) spans T (U). The columns of [T ]


give the C-coordinates of the elements of T (B), so the column
space of [T ] gives the coordinate vectors of the elements of Im(T )
with respect to the basis C.

◮ Hence, nullity(T ) = nullity([T ]) and rank(T ) = rank([T ]).

268
Using the rank-nullity theorem for matrices gives the following:

Theorem (Rank-Nullity Theorem)


For a linear transformation T : U → V with dim(U) = n,

rank(T ) + nullity(T ) = n.

269
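Note:
In coordinates, ker(T ) and Im(T ) are the solution space and column space of [T ], so they can be computed by row reduction. A minimal Python/SymPy sketch (SymPy is an assumption; hand row-reduction gives the same result) for the transformation of Example 15, whose output can be compared with the hand computation:

    import sympy as sp

    M = sp.Matrix([[2, -1, 0],
                   [0, 1, 1]])   # standard matrix of T(x,y,z) = (2x - y, y + z)

    print(M.columnspace())       # coordinate vectors of a basis for Im(T)
    print(M.nullspace())         # coordinate vectors of a basis for ker(T)
    print(M.rank() + len(M.nullspace()))   # rank + nullity = 3 = dim(R^3)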
Example 15.
Consider the linear transformation T : R3 → R2 defined by
T (x, y , z) = (2x − y , y + z).
Find bases for Im(T ) and ker(T ). Verify the rank-nullity theorem.
Solution:
1. Image of T

270
2. Kernel of T

271
3. Verify rank-nullity theorem

272
Example 16.
Consider the linear transformation T : P2 → P1 defined by
T (a0 + a1 x + a2 x 2 ) = (a0 − a1 + a2 )(1 + 2x)
Find bases for Im(T ) and ker(T ).

Solution:
1. Image of T

273
2. Kernel of T

274
275
5.5 Invertible linear transformations

Definition (Injective, surjective)


A function T : U → V is
1. injective or one-to-one if T (x) = T (y) ⇒ x = y.
2. surjective or onto if T (U) = V .

For linear transformations, injectivity simplifies to the following:


Theorem
A linear transformation T : U → V is injective if and only if
ker(T ) = {0}.

276
Example 17.
Consider the linear transformation T : P2 → P1 defined by

T (a0 + a1 x + a2 x 2 ) = (a0 − a1 + a2 )(1 + 2x).

(a) Is T injective?
(b) Is T surjective?

Solution:
From example 16:

277
Definition (Invertible)
A function T : U → V is invertible if there exists a function S : V → U
such that
◮ (S ◦ T )(u) = u for all u ∈ U, and
◮ (T ◦ S)(v) = v for all v ∈ V .
Then S is called the inverse of T and is denoted by T −1 .

278
Theorem
Let T : U → V be a linear transformation.
1. T is invertible if and only if it is both injective and surjective.
2. If T is invertible, then its inverse T −1 is also a linear
transformation.
3. Assume that U and V have finite ordered bases B and C,
respectively. Then T is invertible if and only if [T ]C,B is invertible,
and
[T −1 ]B,C = ([T ]C,B )−1 .
This requires dim(U) = dim(V ).

Note:
An invertible linear transformation T is also called an isomorphism.

279
Example 18.
Is the rotation T : R2 → R2 around the origin anticlockwise by an angle
θ, with standard matrix
 
           [ cos θ   − sin θ ]
    [T ] = [ sin θ     cos θ ] ,

an invertible linear transformation? If it is, find [T −1 ].


Solution:

280
Example 19.
Consider the linear transformation T : R2 → R2 for orthogonal
projection onto the x-axis, with standard matrix
 
           [ 1 0 ]
    [T ] = [ 0 0 ] .

[Figure: the unit square and its image under orthogonal projection onto the x-axis.]

(a) Is T surjective?
(b) Is T injective?
(c) Is T invertible?
Solution:

281
282
5.6 Change of basis
Transition Matrices
Let B = {b1 , . . . , bn } and C = {c1 , . . . , cn } be ordered bases for the
same vector space V with field of scalars F.
Given v ∈ V , how are [v]B and [v]C related?

Theorem
There exists a unique matrix PC,B such that for any vector v ∈ V ,

[v]C = PC,B [v]B .

The matrix PC,B is given by

PC,B = [ [b1 ]C · · · [bn ]C ]

and is called the transition matrix from B to C.


283
Proof:
We want to find a matrix PC,B such that for all vectors v ∈ V ,

[v]C = PC,B [v]B .

If T : V → V is a linear transformation, then

[T (v)]C = [T ]C,B [v]B

where  
[T ]C,B = [T (b1 )]C [T (b2 )]C · · · [T (bn )]C
is the matrix representation of T with respect to bases B and C.

284
For the special case of the identity function

T (v) = v for all v ∈ V ,

this gives  
[T ]C,B = [b1 ]C [b2 ]C . . . [bn ]C
and
[v]C = [T ]C,B [v]B .
So
PC,B = [T ]C,B .

Note:
The transition matrix PC,B is invertible since T is invertible, and

(PC,B )−1 = PB,C .

285
Example 20.
Consider the following ordered bases for R2

B = {(1, 1), (−1, 1)} and S = {(1, 0), (0, 1)} .

(a) Determine the transition matrix from B to S.


 
              [ 1 ]
(b) If [v]B = [ 1 ] , compute [v]S .
Solution:

286
Example 21.
Consider the following ordered bases for R2

B = {(1, 1), (−1, 1)} and S = {(1, 0), (0, 1)} .

(a) Determine the transition matrix from S to B.


 
              [ 0 ]
(b) If [v]S = [ 2 ] , compute [v]B .
Solution:

287
Calculating a general transition matrix from basis B to basis C

We have
[v]S = PS,B [v]B and [v]C = PC,S [v]S .
Combining these, we get

[v]C = PC,S [v]S = PC,S PS,B [v]B .

Since the transition matrix is unique then

PC,B = PC,S PS,B = (PS,C )−1 PS,B .

288
Example 22.
Consider the following ordered bases for R2

B = {(1, 2), (1, 1)} and C = {(−3, 4), (1, −1)}.

Find the transition matrix from B to C.

Solution:

289
Note:
Writing vectors in B as linear combinations of vectors in C gives:

(1, 2) = 3(−3, 4) + 10(1, −1)


(1, 1) = 2(−3, 4) + 7(1, −1)

where the coefficients correspond to the coordinate vectors in the


columns of PC,B .
290
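Note:
A minimal Python/NumPy sketch of the computation PC,B = (PS,C )−1 PS,B for Example 22 (NumPy is an assumption):

    import numpy as np

    P_SB = np.array([[1, 1],
                     [2, 1]])    # columns: B = {(1,2), (1,1)}
    P_SC = np.array([[-3, 1],
                     [4, -1]])   # columns: C = {(-3,4), (1,-1)}

    P_CB = np.linalg.inv(P_SC) @ P_SB

    print(P_CB)   # expected: [[3, 2], [10, 7]], matching the note above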
How are [T ]C and [T ]B related?

Theorem
The matrix representations of T : V → V with respect to two ordered
bases C and B for V are related by the following equation:

[T ]B = PB,C [T ]C PC,B .

Proof:
We need to show that for all v ∈ V ,

[T (v)]B = PB,C [T ]C PC,B [v]B

Starting with the right-hand side we obtain

PB,C [T ]C PC,B [v]B = PB,C [T ]C [v]C (property of PC,B )


= PB,C [T (v)]C (property of [T ]C )
= [T (v)]B (property of PB,C )
291
Example 23.
Determine the standard matrix representation in R2 for reflection in the
line y = 5x

using a change of basis from the ordered standard basis


S = {(1, 0), (0, 1)} to the ordered basis B = {u, v}, where
◮ u = (1, 5) is a vector in the direction of y = 5x
◮ v = (5, −1) is a vector perpendicular to y = 5x

292
Solution:
1. Find image of the vectors in B and write in terms of B to give [T ]B .

293
2. Find the transition matrices PS,B and PB,S .

294
3. Find the standard matrix representation of T .

295
Example 24.
Consider the linear transformation T : R2 → R2 given by

T (x, y ) = (3x − y , −x + 3y ).

Let B = {(1, 1), (−1, 1)} be a basis for R2 .


(a) Find the standard matrix representation of T .
(b) Find the matrix representation of T with respect to the basis B.
 
                            [ v1 ]
(c) Find [T (v)]B if [v]B = [ v2 ] . Hence interpret T geometrically.
Solution:

296
297
[Figure: the grid formed by the basis vectors b1 and b2, with coordinates v1 and v2 along them.]

298
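Note:
The change of basis in Example 24 can be checked numerically via [T ]B = (PS,B )−1 [T ] PS,B . A minimal Python/NumPy sketch (NumPy is an assumption):

    import numpy as np

    T = np.array([[3, -1],
                  [-1, 3]])   # standard matrix of T(x,y) = (3x - y, -x + 3y)
    P = np.array([[1, -1],
                  [1, 1]])    # columns are the vectors of B

    T_B = np.linalg.inv(P) @ T @ P

    print(T_B)   # expected: [[2, 0], [0, 4]], a stretch by 2 along b1 and by 4 along b2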
Topic 6: Eigenvalues and Eigenvectors [AR Chap 5]

6.1 Definition of eigenvalues and eigenvectors


6.2 Finding eigenvalues
6.3 Finding eigenvectors
6.4 Diagonalisation
6.5 Matrix powers

299
6.1 Definition of eigenvalues and eigenvectors

We wish to identify subspaces W that are invariant under a linear


transformation T : V → V , i.e. with T (W ) ⊆ W .
Example 1.
Let T : R2 → R2 be defined by a reflection in the line y = 5x.
(a) Identify two lines through the origin that are invariant under T .
(b) Find the image of the direction vectors for each line in (a).

300
Solution:

301
Algebraically, we seek non-zero vectors v such that

T (v) = λv

for some scalar λ. Then the 1-dimensional subspace W = span{v} is


invariant under T .

Geometrically,
◮ If λ = 0, then T maps v to 0.
◮ If λ is real and positive, then T rescales v by a factor λ.
◮ If λ is real and negative, then T rescales v by a factor λ and also
reverses the direction of v.

302
Let V be a vector space over a field of scalars F.

Definition (Eigenvalues and eigenvectors of transformations)


Let T : V → V be a linear transformation.
A scalar λ ∈ F is an eigenvalue of T if there is a non-zero vector v ∈ V
such that
T (v) = λv.
Then v is called an eigenvector of T with corresponding eigenvalue λ.

For each eigenvalue λ, the set

{v ∈ V | T (v) = λv}.

is a subspace of V , called the eigenspace of λ.

Note:
Each non-zero vector in this eigenspace is an eigenvector of T with
eigenvalue λ.
303
If we represent T by its matrix [T ]B with respect to an ordered basis B
for V , then we can find eigenvalues and eigenvectors by solving

[T ]B [v]B = λ[v]B , where v ≠ 0.

This leads to the following definition.

Definition (Eigenvalues and eigenvectors of matrices)


Let A be an n × n matrix with real or complex entries. A scalar λ ∈ C is
an eigenvalue of A if there is a non-zero column vector v ∈ Cn such that

Av = λv.

Then v is called an eigenvector of A, with corresponding eigenvalue λ.

Note:
For a real matrix, eigenvalues and eigenvectors can be real or complex.
304
6.2 Finding eigenvalues

Let A be an n × n matrix and v be a column vector in Rn or Cn .


If I denotes the n × n identity matrix, then the equation

Av = λv

can be rewritten in the form

Av − λv = 0
⇒ Av − λI v = 0
⇒ (A − λI )v = 0

The values of λ for which this equation has non-zero solutions are the
eigenvalues.

305
Theorem
The homogeneous linear system

(A − λI )v = 0

has a non-zero solution if and only if

det(A − λI ) = 0.

The eigenvalues of A are the values of λ which satisfy the characteristic


equation
det(A − λI ) = 0.

306
Example 2.
                            [ 1 4 ]
Find the eigenvalues of A = [ 1 1 ] .
Solution:
Solve det(A − λI ) = 0

307
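Note:
Eigenvalue computations are easy to check numerically. A minimal Python/NumPy sketch (an assumption; MATLAB's eig does the same) for Example 2:

    import numpy as np

    A = np.array([[1, 4],
                  [1, 1]])

    print(np.linalg.eigvals(A))   # expected: 3 and -1 (in some order)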
Definition (Characteristic polynomial)
Let A be an n × n matrix. The determinant det(A − λI ) is a polynomial
in λ of degree n called the characteristic polynomial of A.

Definition (Multiplicity of eigenvalues)


The algebraic multiplicity of an eigenvalue λ0 is the power of λ − λ0 in
the characteristic polynomial.

The geometric multiplicity of an eigenvalue is the number of linearly


independent eigenvectors associated with it.

By the fundamental theorem of algebra, we can always find at least one


complex eigenvalue for any real or complex n × n matrix. Further, the
sum of the algebraic multiplicities must equal n.
308
Example 3.
                            [ 2 −3  6 ]
Find the eigenvalues of A = [ 0  5 −6 ] .
                            [ 0  1  0 ]
What is the algebraic multiplicity of each eigenvalue?
Solution:
Solve det(A − λI ) = 0

309
310
Useful properties of eigenvalues

The eigenvalues λi (i = 1, · · · , n) of an n × n matrix A satisfy the


following properties:
1. Tr(A) = λ1 + λ2 + · · · + λn = sum of the eigenvalues
2. det(A) = λ1 λ2 · · · λn = product of the eigenvalues

These properties are useful for checking that you have calculated the
eigenvalues correctly.

311
Example 4.
Check these properties for the eigenvalues λ = 2, 2, 3 of the matrix
 
        [ 2 −3  6 ]
    A = [ 0  5 −6 ] .
        [ 0  1  0 ]

Solution:

312
Example 5.
                        [  0 1 ]
Consider the matrix A = [ −1 0 ] .
(a) Find the eigenvalues of A.
(b) Interpret the matrix and eigenvalues geometrically.

Solution:

313
Note:
The matrix A also defines a linear transformation T : C2 → C2 where

T (v) = Av

for v ∈ C2 .
Then T has complex eigenvalues.

314
Eigenvalues of matrices versus eigenvalues of linear transformations

Let A be an n × n matrix with complex entries. Then the eigenvalues of


A are the eigenvalues of the linear transformation T : Cn → Cn with
standard matrix A. For each eigenvalue λ,

{v ∈ Cn | Av = λv}

is a subspace of Cn , called the (complex) eigenspace of λ.

Let A be an n × n matrix with real entries. Then the real eigenvalues of


A are the eigenvalues of the linear transformation T : Rn → Rn with
standard matrix A. For each real eigenvalue λ,

{v ∈ Rn : Av = λv}

is a subspace of Rn called the (real) eigenspace of λ.

315
6.3 Finding eigenvectors

To find the eigenvectors of a matrix


Let A be an n × n matrix and I be the n × n identity matrix.
◮ For each eigenvalue λ, solve the linear system

(A − λI )v = 0.

The solution space is the eigenspace of λ.


◮ You always obtain at least one row of zeros in row-echelon form as
rank(A − λI ) < n. So there are non-zero solutions.
◮ Choose a basis for the eigenspace. The vectors in the basis are
linearly independent eigenvectors with eigenvalue λ.
Note:
The eigenvectors of a matrix are not unique. An eigenvector v can be
multiplied by any non-zero scalar and it will still be an eigenvector.

316
Example 6.
                                        [ 1 4 ]
Find the eigenvectors of the matrix A = [ 1 1 ] .
Solution:
From example 2, the eigenvalues are λ = −1, 3.
For each eigenvalue λ, solve (A − λI )v = 0

317
318
319
Example 7.
                                        [ 2 −3  6 ]
Find the eigenvectors of the matrix A = [ 0  5 −6 ] .
                                        [ 0  1  0 ]
What is the geometric multiplicity of each eigenvalue?
Solution:
From example 3, the eigenvalues are λ = 2, 3.
For each eigenvalue λ, solve (A − λI )v = 0

320
321
322
Note:
The geometric multiplicity is always less than or equal to the algebraic
multiplicity.
323
Example 8.
                                        [  0 1 ]
Find the eigenvectors of the matrix A = [ −1 0 ] .
Solution:
From example 5, the eigenvalues are λ = ±i .
For each eigenvalue λ, solve (A − λI )v = 0

324
325
Note:
◮ The eigenvalues are complex conjugates.
◮ The eigenvectors are complex conjugates.
In general, the non-real eigenvalues and eigenvectors of a real matrix
occur in complex conjugate pairs.
326
6.4 Diagonalisation

Definition
A linear transformation T : V → V is diagonalisable if there is a basis
B = {b1 , . . . , bn } for V such that the matrix [T ]B is diagonal.

Definition
A complex n × n matrix A is diagonalisable (or diagonalisable over C) if
the linear transformation

T : Cn → Cn with T (v) = Av

is diagonalisable.
A real n × n matrix A is diagonalisable over R if the linear
transformation
T : Rn → Rn with T (v) = Av
is diagonalisable.
327
Assume V is a finite dimensional vector space over a field F = R or C.
Theorem
A linear transformation T : V → V is diagonalisable if and only if there
is a basis B for V consisting of eigenvectors of T .
A matrix A ∈ Mn,n (F) is diagonalisable over F if and only if it has n
linearly independent eigenvectors in Fn .

Proof:
Let B = {b1 , b2 , · · · , bn } be a basis for V . The matrix [T ]B is diagonal
if and only if each of the basis vectors bi satisfies

T (bi ) = λi bi ,

where λi is the corresponding diagonal entry of the matrix [T ]B .

328
How to diagonalise a matrix

Theorem
An n × n matrix A is diagonalisable if and only if there is an n × n
invertible matrix P and an n × n diagonal matrix D such that

P −1 AP = D

or, equivalently,
A = PDP −1 .
The matrix P is said to diagonalise A.
If v1 , . . . , vn are linearly independent eigenvectors of A with
corresponding eigenvalues λ1 , . . . , λn , then we can take
 
P = [ v1 · · · vn ] and D = diag(λ1 , λ2 , . . . , λn ).

Note:
If A is diagonalisable over F then P and D must have entries in F.
329
Example 9.
        [ 1 4 ]
Let A = [ 1 1 ] . Suppose T : R2 → R2 is such that [T ] = A.
(a) Write A in the form PDP −1 . Check your answer.
(b) Give a geometric description of T .

Solution:
(a) From example 6:
                                               [ −2 ]
An eigenvector corresponding to λ = −1 is v1 = [  1 ] .

                                               [ 2 ]
An eigenvector corresponding to λ = 3 is v2 =  [ 1 ] .

330
Check A = PDP −1 or AP = PD by multiplying the matrices.

331
Theorem
Eigenvectors v1 , . . . , vk corresponding to distinct eigenvalues λ1 , . . . , λk
are linearly independent.
Proof:
We give the proof for the case k = 2.
Let v1 , v2 be eigenvectors of A corresponding to λ1 and λ2 respectively,
such that λ1 ≠ λ2 . Then

Av1 = λ1 v1 and Av2 = λ2 v2 .


Suppose that
α1 v1 + α2 v2 = 0 (1)
Then
A(α1 v1 + α2 v2 ) = A0 = 0
⇒ α1 Av1 + α2 Av2 = 0
⇒ α1 λ1 v1 + α2 λ2 v2 = 0 (2)

332
Equation (2) − λ1 (1) gives

α2 (λ2 − λ1 )v2 = 0

Since λ1 ≠ λ2 and v2 ≠ 0, then α2 = 0.

Since v1 ≠ 0 and α2 = 0, then (1) gives α1 = 0.

Since the only solution of (1) is

α1 = α2 = 0

then v1 and v2 are linearly independent eigenvectors.

Note:
Similar arguments can be used for the cases k > 2.
333
Theorem
If an n × n matrix A has n distinct eigenvalues, then A has n linearly
independent eigenvectors. Hence, A will be diagonalisable.

Note:
If A has one or more repeated eigenvalues, then A may or may not be
diagonalisable depending on the number of linearly independent
eigenvectors associated with each eigenvalue.
◮ If the algebraic and geometric multiplicities for each eigenvalue are
equal, A will be diagonalisable. In this case, the dimensions of the
eigenspaces add up to n.
◮ If the geometric multiplicity is less than the algebraic multiplicity
for some eigenvalue, A will not be diagonalisable.

334
Example 10.
       [ 2 −3  6 ]
Is A = [ 0  5 −6 ] diagonalisable?
       [ 0  1  0 ]
If it is, specify the matrices P and D such that A = PDP −1 .
Solution:
From example 7:
Two linearly independent eigenvectors corresponding to λ = 2 are
   
         [ 1 ]           [ 0 ]
    v1 = [ 0 ]  and v2 = [ 2 ]
         [ 0 ]           [ 1 ]
An eigenvector corresponding to λ = 3 is
         [ −3 ]
    v3 = [  3 ]
         [  1 ]
335
336
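Note:
Diagonalisations can always be verified by checking AP = PD. A minimal Python/NumPy sketch for Example 10 (NumPy is an assumption):

    import numpy as np

    A = np.array([[2, -3, 6],
                  [0, 5, -6],
                  [0, 1, 0]])
    P = np.array([[1, 0, -3],
                  [0, 2, 3],
                  [0, 1, 1]])     # columns v1, v2, v3 from above
    D = np.diag([2, 2, 3])

    print(np.allclose(A @ P, P @ D))   # expected: True, so A = P D P^{-1}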
Example 11.
                         [ 1 2 ]
Show that the matrix A = [ 0 1 ] is not diagonalisable.
Solution:
Find the eigenvalues of A

Find the eigenvectors of A

337
Note:
The eigenvalue λ = 1 has algebraic multiplicity 2 but geometric
multiplicity of 1, so we cannot diagonalise A.
338
Example 12.
       [  0 1 ]
Is A = [ −1 0 ] diagonalisable?
If it is, specify the matrices P and D such that A = PDP −1 .

Solution:
From example 8:
 
                                               [ −i ]
An eigenvector corresponding to λ = i is v1 =  [  1 ]

                                               [ i ]
An eigenvector corresponding to λ = −i is v2 = [ 1 ]

339
6.5 Matrix powers

In applications, one often needs to apply a transformation many times.


If the transformation can be represented by a diagonal matrix D, it is
easy to compute D k and thus the action of the k-th application of the
transformation.

Example 13.
 
       [ −4 0 0 ]
If D = [  0 3 0 ] , write down D 10 .
       [  0 0 2 ]
Solution:

340
Theorem
Suppose A is diagonalisable, so that

A = PDP −1

for some invertible matrix P and a diagonal matrix D.


Then for all integers k ≥ 1, we have

Ak = PD k P −1 .

341
Example 14.
                    [ 1 4 ]
From example 9, A = [ 1 1 ] can be written in the form A = PDP −1
where
        [ −1 0 ]       [ −2 2 ]                      [ −1 2 ]
    D = [  0 3 ] , P = [  1 1 ]  and  P −1 = (1/4) · [  1 2 ] .
Calculate Ak , for all integers k ≥ 1.
Solution:

342
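Note:
A minimal Python/NumPy sketch checking the formula Ak = PD k P −1 against direct matrix powers, for the matrices of Example 14 (NumPy is an assumption):

    import numpy as np

    A = np.array([[1, 4],
                  [1, 1]])
    P = np.array([[-2, 2],
                  [1, 1]])
    D = np.diag([-1, 3])

    k = 10
    Ak = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)

    print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # expected: True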
Example 15. An application from genetics

A snapdragon plant has either red, pink or white flowers, depending on


its genotype (with alleles RR, RW or WW respectively). We want to
cross pink plants with red, pink and white plants. On average,

◮ crossing pink plants with red plants produces:
  1/2 red plants and 1/2 pink plants as offspring.
◮ crossing pink plants with pink plants produces:
  1/4 red plants, 1/2 pink plants and 1/4 white plants as offspring.
◮ crossing pink plants with white plants produces:
  1/2 pink plants and 1/2 white plants as offspring.

This is an example of incomplete dominance in genetics.

If you continue crossing with only pink plants, what fractions of the
three types of flowers would you eventually expect to see in your garden?

343
Solution:
Let
         [ rn ]
    xn = [ pn ]
         [ wn ]
where rn , pn , wn are the fractions of red, pink and white flowers after n
generations, for n ≥ 0. Note that these satisfy

rn + pn + wn = 1.

Now for each n ≥ 0 we have

344
This gives
xn+1 = T xn ,
where T is the transition matrix

Note that T has non-negative entries and the sum of each column is 1.

Then
x1 = T x0 , x2 = T x1 = T (T x0 ) = T 2 x0 ,
and in general
xn = T n x0
gives the distribution after n generations for n ≥ 1.

345
Now T has eigenvalues
    λ = 1, 1/2, 0
with corresponding eigenvectors
         [ 1 ]        [ −1 ]        [  1 ]
    v1 = [ 2 ] , v2 = [  0 ] , v3 = [ −2 ] .
         [ 1 ]        [  1 ]        [  1 ]

Hence we can diagonalise T and

    T = PDP −1

where
        [ 1 −1  1 ]       [ 1  0   0 ]          [  1/4   1/4  1/4 ]
    P = [ 2  0 −2 ] , D = [ 0  1/2 0 ] , P −1 = [ −1/2   0    1/2 ] .
        [ 1  1  1 ]       [ 0  0   0 ]          [  1/4  −1/4  1/4 ]

346
Now
T n = PD n P −1
Taking the limit as n → ∞ we get

347
So in the long run, the expected distribution of flower colours is:

Note:
◮ These fractions are proportional to the entries of the eigenvector of
T with eigenvalue 1 . This is expected since the limiting
proportions x∞ satisfy T x∞ = x∞ .
◮ This is an example of a Markov chain.
348
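Note:
The long-run behaviour can also be seen by simply iterating xn+1 = T xn . A minimal Python/NumPy sketch (NumPy is an assumption; the transition matrix below is assembled from the crossing fractions listed above):

    import numpy as np

    T = np.array([[0.50, 0.25, 0.00],
                  [0.50, 0.50, 0.50],
                  [0.00, 0.25, 0.50]])   # columns sum to 1

    x = np.array([1.0, 0.0, 0.0])        # e.g. start with all red plants
    for _ in range(50):
        x = T @ x                        # x_{n+1} = T x_n

    print(x)   # expected: approximately [0.25, 0.5, 0.25]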
Topic 7: Inner Product Spaces [AR 6.1-6.5,5.3,7.1-7.2,7.5]

7.1 Definition of inner products


7.2 Geometry from inner products
7.3 Orthogonal sets and Gram-Schmidt procedure
7.4 Least squares curve fitting
7.5 Orthogonal matrices and symmetric matrices
7.6 Unitary matrices and Hermitian matrices

349
7.1 Definition of inner products
We now generalise the dot product to define inner products, angles and
lengths on general vector spaces.

Definition (Inner product over R)


An inner product on a real vector space V is a function that associates
to every pair of vectors u, v ∈ V a real number, denoted by hu, vi,
satisfying the following properties.
For all u, v, w ∈ V and α ∈ R:
1. hu, vi = hv, ui (symmetry)
2. hαu, vi = αhu, vi (linearity of scalar multiplication)
3. hu + v, wi = hu, wi + hv, wi (linearity of vector addition)
4. a. hu, ui ≥ 0 (positivity)
b. hu, ui = 0 ⇒ u = 0

A real vector space V together with an inner product is called a real


inner product space.
350
Example
For V = Rn , the dot product

hu, vi = u · v

defines an inner product.

351
Example 1.
For u = (u1 , u2 ) ∈ R2 and v = (v1 , v2 ) ∈ R2 , define

hu, vi = u1 v1 + 2u2 v2 .

Show that this gives an inner product on R2 .


Solution:
Define a third vector and a scalar

1. Symmetry

352
2. Linearity of scalar multiplication

3. Linearity of vector addition

353
4. Positivity

354
Example 2.
For u = (u1 , u2 , u3 ) ∈ R3 and v = (v1 , v2 , v3 ) ∈ R3 , define

hu, vi = u1 v1 − u2 v2 + u3 v3 .

Show that this does not define an inner product on R3 .


Solution:
Give a specific counterexample

355
Note:
Let u = (u1 , · · · , un ) ∈ Rn and v = (v1 , · · · , vn ) ∈ Rn .

◮ All inner products defined on Rn can be written in the form

hu, vi = [u]T A [v]


where A is a real symmetric n × n matrix, so

A = AT .

◮ When written in this matrix form, axioms 1, 2 and 3 follow


immediately from properties of matrices. We only need to check
axiom 4.

356
Example 3.
For u = (u1 , u2 ) ∈ R2 and v = (v1 , v2 ) ∈ R2 , define

hu, vi = 2u1 v1 − 2u1 v2 − 2u2 v1 + 3u2 v2 .

(a) Write hu, vi in terms of a symmetric matrix.


(b) Show that hu, vi defines an inner product on R2 .

Solution:

357
358
Example 4.
For p = p(x) ∈ Pn and q = q(x) ∈ Pn , define
    hp, qi = ∫₀¹ p(x)q(x) dx.
Show that this gives an inner product on Pn .
Solution:
Define a third polynomial and a scalar

1. Symmetry

359
2. Linearity of scalar multiplication

3. Linearity of vector addition

360
4. Positivity

361
Complex vector spaces

Definition (Hermitian dot product on Cn )


The Hermitian dot product (or complex dot product) of u, v ∈ Cn is
defined by
u · v = u1 v̄1 + u2 v̄2 + · · · + un v̄n ,
where a bar indicates the complex conjugate.

Taking the complex conjugate is necessary to have real and positive


lengths:

    v · v = v1 v̄1 + v2 v̄2 + · · · + vn v̄n = |v1 |2 + |v2 |2 + · · · + |vn |2 ≥ 0.

So we can define the length of v by √(v · v).

362
Example 5.
Let u = (1 + i , 1 − i ), v = (i , 1) ∈ C2 .
Compute the Hermitian dot products u · v, v · u and u · u.
Solution:

363
Definition (Inner product over C)
A Hermitian inner product on a complex vector space V is a function
that associates to every pair of vectors u, v ∈ V a complex number,
denoted by hu, vi, satisfying the following properties.

For all u, v, w ∈ V and α ∈ C:


1. hu, vi = hv, ui (Hermitian symmetry)
2. hαu, vi = αhu, vi (linearity of scalar multiplication)
3. hu + v, wi = hu, wi + hv, wi (linearity of vector addition)
4. a. hu, ui ≥ 0 (positivity)
b. hu, ui = 0 ⇒ u = 0

A complex vector space V together with an Hermitian inner product is


called a complex inner product space.

364
Example
The Hermitian dot product on Cn satisfies these properties.

Example
The inner product on Pn (R) defined in example 4 can be generalised to
Pn (C) by replacing the second polynomial in the definition by its
complex conjugate:
    hp, qi = ∫₀¹ p(x) q̄(x) dx.

Note:
Inner products defined by integration are important in many
applications, for example in quantum mechanics, electrical engineering,
and in the study of differential equations.

365
Example 6.
For u = (u1 , u2 ) ∈ C2 and v = (v1 , v2 ) ∈ C2 , define

hu, vi = iu1 v1 − iu2 v2 .

Show that this does not define an inner product on C2 .


Solution:
Give a specific counterexample

366
7.2 Geometry from inner products

In a general vector space, how can we find “lengths” of vectors and


“angles” between vectors?

Definition (Length, distance)


For a real or complex inner product space, the length (or norm) of a
vector v is defined by
    kvk = √hv, vi.

The distance between u and v is given by

d(u, v) = kv − uk.

367
Example (Frobenius inner product and digital pictures)
Given two matrices A, B ∈ Mm,n , we define

hA, Bi = Tr(AT B).

This is an inner product on Mm,n (see tutorial 12), so defines a distance


function d on Mm,n .
If the matrix entries represent the brightness of pixels in grayscale
digital photos, then
    d(A, B) = kA − Bk = ( Tr((A − B)T (A − B)) )^(1/2)

measures how “close together” two photos are.

368
Definition (Angle)
For a real inner product space, the angle θ between two non-zero
vectors u and v is defined by

    cos θ = hu, vi / ( kuk kvk ),   0 ≤ θ ≤ π.

For a complex inner product space, the angle θ between two non-zero
vectors u and v is defined by

    cos θ = Rehu, vi / ( kuk kvk ),   0 ≤ θ ≤ π,
where Rehu, vi denotes the real part of the inner product.

369
Note:
In order for these definitions of angle to make sense, we need

    −1 ≤ hu, vi / ( kvk kuk ) ≤ 1   and   −1 ≤ Rehu, vi / ( kvk kuk ) ≤ 1.

These results follow immediately from the Cauchy-Schwarz inequality.

Theorem (Cauchy-Schwarz inequality)


Let V be an inner product space. Then for all u, v ∈ V

    |hu, vi| ≤ kuk kvk.

Equality holds if and only if one vector is a multiple of the other i.e. the
vectors are linearly dependent.

370
Example 7.
Let u = 1 and v = x. Using the inner product:
    hp, qi = ∫₀¹ p(x)q(x) dx
find the
(a) lengths of u and v
(b) distance between u and v
(c) angle between u and v
Solution:

371
372
7.3 Orthogonal sets and the Gram-Schmidt procedure

Definition (Orthogonal vectors)


Let V be an inner product space. Two vectors u, v ∈ V are orthogonal
if
hu, vi = 0.

Note:
For complex inner products this is stronger than saying the angle
between u and v is π/2 which only requires Rehu, vi = 0. It also implies
that the angle between u and i v is π/2.

Definition (Orthogonal set)


A set of vectors {v1 , . . . , vk } in an inner product space V is orthogonal
if
hvi , vj i = 0, i ≠ j.
That is, the inner product of each distinct pair of vectors is zero.
373
Example 8. (Pythagoras’ Theorem)
If V is a real inner product space and u, v are orthogonal vectors in V ,
show that
ku + vk2 = kuk2 + kvk2 .

Solution:

374
Example 9.
Show that
{v1 , v2 , v3 } = {(1, 1, 1), (1, −1, 1), (1, 0, −1)}
is an orthogonal set of vectors in R3 with respect to the inner product
hu, vi = u1 v1 + 2u2 v2 + u3 v3 .

Solution:

375
Theorem
Every orthogonal set of non-zero vectors {v1 , . . . , vk } in an inner
product space V is linearly independent.
Proof:
Since {v1 , . . . , vk } is an orthogonal set, then

hvi , vj i = 0, i ≠ j

Also

h0, vi i = 0, i = 1, · · · , k

Suppose that

α1 v1 + · · · + αk vk = 0 (*)

for some scalars αi .


376
Then, for each i = 1, · · · , k:

hα1 v1 + · · · + αk vk , vi i = 0
⇒ hα1 v1 , vi i + · · · + hαi vi , vi i + · · · + hαk vk , vi i = 0
⇒ α1 hv1 , vi i + · · · + αi hvi , vi i + · · · + αk hvk , vi i = 0
⇒ αi hvi , vi i = 0

Since hvi , vi i ≠ 0, then αi = 0.

Hence the only solution of (∗) is

αi = 0, for all i = 1, · · · , k

so {v1 , . . . , vk } are linearly independent.

377
Definition (Orthonormal set)
A set of vectors {v1 , . . . , vk } in an inner product space V is
orthonormal if it is orthogonal and each vector has length 1.
That is
    {v1 , . . . , vk } is orthonormal ⇔ hvi , vj i = { 0 if i ≠ j,
                                                    { 1 if i = j.

Note:
Any orthogonal set of non-zero vectors can be made orthonormal by
dividing each vector by its length.

378
Example 10.
Using the orthogonal set

{v1 , v2 , v3 } = {(1, 1, 1), (1, −1, 1), (1, 0, −1)}

construct a set of R3 that is orthonormal with respect to the inner


product
hu, vi = u1 v1 + 2u2 v2 + u3 v3 .

Solution:

379
Orthonormal bases

Theorem
Let V be an inner product space. If {v1 , . . . , vn } is an orthonormal
basis for V and x ∈ V , then

x = hx, v1 iv1 + · · · + hx, vn ivn .


Proof:
Since {v1 , . . . , vn } is an orthonormal set
    hvi , vj i = { 0 if i ≠ j,
                 { 1 if i = j.

Since {v1 , . . . , vn } is a basis for V , then there is a unique set of scalars


αi (i = 1, · · · , n) such that

x = α1 v1 + · · · + αn vn .
380
Then, for each i = 1, · · · , n, we have

hx, vi i = hα1 v1 + · · · + αn vn , vi i
= hα1 v1 , vi i + · · · + hαi vi , vi i + · · · + hαn vn , vi i
= α1 hv1 , vi i + · · · + αi hvi , vi i + · · · + αn hvn , vi i
= αi

by properties 2 and 3 of inner products.

Hence
x = hx, v1 iv1 + · · · + hx, vn ivn .

381
Example 11.
Recall R3 has an orthonormal basis
 
    {b1 , b2 , b3 } = { (1/2)(1, 1, 1), (1/2)(1, −1, 1), (1/√2)(1, 0, −1) }
with respect to the inner product
hu, vi = u1 v1 + 2u2 v2 + u3 v3 .
Express x = (1, 1, −1) as a linear combination of b1 , b2 , b3 .
Solution:

382
383
Orthogonal projection
Orthogonal projection in Rn can be generalised to any inner product
space.

Definition (Orthogonal projection onto a vector)


Let V be an inner product space. Let u ∈ V be a non-zero vector.
The orthogonal projection of v onto u is

    proju v = ( hv, ui / hu, ui ) u.

If u is a unit vector then


proju v = hv, uiu.

Note:
If p = proju v then v − p is orthogonal to u since:
    hv − p, ui = hv, ui − hp, ui = hv, ui − ( hv, ui / hu, ui ) hu, ui = 0.
384
We can also generalise to projection onto a subspace W of V .

Definition (Orthogonal projection onto a subspace)


Let {u1 , . . . , uk } be an orthonormal basis for the subspace W ⊆ V and
v ∈ V.
The orthogonal projection of v onto W is

projW v = hv, u1 iu1 + · · · + hv, uk iuk .

If p = projW v then:

[Figure: v, its projection p in the subspace W, and the perpendicular component v − p.]

385
Properties of orthogonal projection of v onto W
1. projW v ∈ W
2. If w ∈ W , then
projW w = w
3. v − projW v is orthogonal to W , i.e. orthogonal to every vector in
W.
4. projW v is the unique vector in W that is closest to v. Thus,

kv − wk > kv − projW vk

for all w ∈ W with w ≠ projW v.


5. projW v does not depend on the choice of orthonormal basis for W .

386
Example 12.
Let W = {(x, y , z) | x + y + z = 0}. The set
 
    {u1 , u2 } = { (1/√2)(1, −1, 0), (1/√6)(1, 1, −2) }
is an orthonormal basis for W with respect to the dot product.

(a) Find the orthogonal projection of the vector v = (1, 2, 3) onto W .


(b) Find the (shortest) distance from v to W .

Solution:

387
388
How to produce an orthonormal basis from a general basis

Gram-Schmidt procedure
Suppose {v1 , . . . , vk } is a basis for an inner product space V . Define:
1.  u1 = v1 / kv1 k

2a. w2 = v2 − hv2 , u1 iu1
2b. u2 = w2 / kw2 k

3a. w3 = v3 − hv3 , u1 iu1 − hv3 , u2 iu2
3b. u3 = w3 / kw3 k
    ...
ka. wk = vk − hvk , u1 iu1 − · · · − hvk , uk−1 iuk−1
kb. uk = wk / kwk k

Then {u1 , . . . , uk } is an orthonormal basis for V .


389
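Note:
The procedure translates directly into code. A minimal Python/NumPy sketch for the dot product on Rn (NumPy is an assumption; for other inner products, replace np.dot and the norm accordingly):

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalise linearly independent vectors w.r.t. the dot product."""
        basis = []
        for v in vectors:
            w = v - sum(np.dot(v, u) * u for u in basis)  # subtract projections
            basis.append(w / np.linalg.norm(w))           # normalise
        return basis

    vs = [np.array([1., 1., 1.]),
          np.array([0., 1., 1.]),
          np.array([0., 0., 1.])]
    for u in gram_schmidt(vs):
        print(u)
    # expected (cf. Example 13): (1,1,1)/√3, (-2,1,1)/√6, (0,-1,1)/√2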
Example 13.
Using Gram-Schmidt, construct an orthonormal basis for R3 with
respect to the dot product, starting with the ordered basis

{v1 , v2 , v3 } = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}.

Solution:
Step 1: Find u1 = v1 / kv1 k

390
Step 2a: Find w2 = v2 − hv2 , u1 iu1

Step 2b: Find u2 = w2 / kw2 k

391
Step 3a: Find w3 = v3 − hv3 , u1 iu1 − hv3 , u2 iu2

392
Step 3b: Find u3 = w3 / kw3 k

393
7.4 Least squares curve fitting

Given a set of data points (x1 , y1 ), (x2 , y2 ),. . . , (xn , yn ) we want to find
the line y = a + bx that best approximates the data.

A common approach is to minimise the least squares error E .

    E 2 = sum of the squares of the vertical errors
        = Σᵢ₌₁ⁿ δi 2
        = Σᵢ₌₁ⁿ (yi − (a + bxi ))2

[Figure: the data points and the line y = a + bx, with the vertical errors δi .]

394
This can be written as
E 2 = ky − Auk2
using the Euclidean dot product, where
   
        [ y1 ]       [ 1 x1 ]                      [ a + bx1 ]
    y = [ :  ] , A = [ : :  ] , u = [ a ] ,  Au =  [    :    ] .
        [ yn ]       [ 1 xn ]       [ b ]          [ a + bxn ]

To minimise ky − Auk2 , we want to choose u so that Au is as close as


possible to y.

To find the closest point, we project y onto


W = {Av | v ∈ R2 } = column space of A.

So we want u ∈ R2 such that Au = projW y.

We can calculate u directly (without finding an orthonormal basis for


W ) by noting that y − projW y is orthogonal to W .
395
Thus
hw, y − projW yi = 0 for all w ∈ W
⇒ hAv, y − Aui = 0 for all v ∈ R2
⇒ (Av)T (y − Au) = 0 for all v ∈ R2
⇒ vT AT (y − Au) = 0 for all v ∈ R2
⇒ AT (y − Au) = 0
⇒ AT y − AT Au = 0
⇒ AT Au = AT y
From this we can calculate u, given that we know A and y.

396
Theorem (Least squares line of best fit)
The least squares solution for the line y = a + bx of best fit for the
points (x1 , y1 ), . . . , (xn , yn ) is determined by

AT A u = AT y,

or, if AT A is invertible,

u = (AT A)−1 AT y,

where
        [ y1 ]       [ 1 x1 ]
    y = [ :  ] , A = [ : :  ] , u = [ a ] .
        [ yn ]       [ 1 xn ]       [ b ]

397
Example 14.
Find the line of best fit for the data points (−1, 1), (1, 1), (2, 3) using
the method of least squares.
Solution:

398
399
400
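Note:
A minimal Python/NumPy sketch solving the normal equations for the data of Example 14 (NumPy is an assumption; np.linalg.lstsq(A, y, rcond=None) computes the same least squares solution via the SVD):

    import numpy as np

    x = np.array([-1., 1., 2.])
    y = np.array([1., 1., 3.])

    A = np.column_stack([np.ones_like(x), x])   # columns: 1 and x_i

    u = np.linalg.solve(A.T @ A, A.T @ y)       # A^T A u = A^T y

    print(u)   # expected: [9/7, 4/7], i.e. the line y = 9/7 + (4/7) x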
7.5 Orthogonal matrices and symmetric matrices

Definition (Orthogonal matrix)


An orthogonal matrix Q is a real n × n matrix such that Q −1 = Q T .

Note:
It is enough to show that

Q T Q = In or QQ T = In .

Theorem
A real n × n matrix is orthogonal if and only if its columns form an
orthonormal basis for Rn using the dot product.

401
Example 15.
Show that the following rotation matrix is orthogonal:
 
        [ cos θ   − sin θ ]
    Q = [ sin θ     cos θ ]

Solution:

402
Geometric properties of orthogonal matrices

Theorem
If Q is an orthogonal n × n matrix then
1. Qu · Qv = u · v for all u, v ∈ Rn (with u, v regarded as column
vectors),
2. det(Q) = ±1.

Note:
◮ Property 1 says that orthogonal matrices preserve dot products.
Hence they also preserve Euclidean lengths and angles.
◮ Geometrically, orthogonal matrices represent rotations when
det(Q) = 1, and reflections composed with rotations when
det(Q) = −1.

403
Example 16.
Prove property 1: If Q is an orthogonal n × n matrix then
Qu · Qv = u · v for all u, v ∈ Rn regarded as column vectors.
Solution:

404
Real symmetric matrices
Theorem (Orthogonal Diagonalisation)
Let A be an n × n real symmetric matrix. Then using the dot product
as the inner product on Rn :
1. all eigenvalues λi (i = 1, · · · , n) of A are real,
2. eigenvectors corresponding to distinct eigenvalues are orthogonal,
3. there is an orthonormal basis of eigenvectors {v1 , · · · , vn },
4. A is diagonalisable with
A = QDQ −1 = QDQ T

where D is the diagonal matrix


diag(λ1 , . . . , λn )

and Q is the orthogonal matrix


 
    [ v1 · · · vn ].

405
Example 17.
Write the symmetric matrix
 
        [  1 −1 ]
    A = [ −1  1 ]

in the form A = QDQ T where D is a diagonal matrix.


Solution:
Step 1. Find the eigenvalues of A

406
Step 2. Find a unit eigenvector corresponding to each eigenvalue

407
408
Step 3. Form D and Q.

Note:
In the case of repeated eigenvalues, you may need to use the
Gram-Schmidt procedure in each eigenspace to find an orthonormal
basis of eigenvectors.

409
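Note:
For real symmetric matrices, NumPy's eigh routine returns real eigenvalues and an orthonormal set of eigenvectors directly. A minimal sketch for Example 17 (NumPy is an assumption):

    import numpy as np

    A = np.array([[1., -1.],
                  [-1., 1.]])

    evals, Q = np.linalg.eigh(A)   # eigh is for symmetric/Hermitian matrices

    print(evals)                                      # expected: [0. 2.]
    print(np.allclose(Q @ np.diag(evals) @ Q.T, A))   # A = Q D Q^T: True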
Singular value decomposition (SVD)

For a general real m × n matrix A, the diagonalisation formula for


symmetric matrices can be extended to find orthogonal matrices U and
V such that

A = USV T
where S is an m × n matrix with Sii ≥ 0 and Sij = 0 whenever i ≠ j.

The above equation is called the singular value decomposition of A.

The singular value decomposition is important in many applications,


including digital image compression and stable numerical computations.

410
Theorem
Every real m × n matrix A has a singular value decomposition

A = USV T

where
◮ U is an m × m real orthogonal matrix,
◮ V is an n × n real orthogonal matrix,
◮ S is an m × n matrix with Sii = σi ≥ 0 and Sij = 0, i ≠ j,
◮ σi for 1 ≤ i ≤ min(m, n) are called the singular values of A.

Further, using the dot product as the inner product,


1. the σi2 are the eigenvalues of AAT and AT A,
2. the columns of U are orthonormal eigenvectors ui of AAT ,
3. the columns of V are orthonormal eigenvectors vi of AT A,
4. Avi = σi ui .

411
Note:
◮ AAT is an m × m matrix and AT A is an n × n matrix.
◮ AAT and AT A are real symmetric matrices.
◮ Geometrically, U and V correspond to rotations (or rotations
followed by reflections), so only S changes the lengths of vectors by
stretching by factors σi ≥ 0 and orthogonal projection.

412
How to find the singular value decomposition of A ∈ Mm,n

1. Find an orthonormal basis v1 , . . . , vn of eigenvectors for AT A, with


corresponding eigenvalues λ1 , . . . , λn .
2. The singular values of A are
       σi = √λi ≥ 0.

3. Let
       ui = (1/σi ) Avi
   for σi ≠ 0, say i = 1, . . . , r = rank(A).
4. Extend u1 , . . . , ur to an orthonormal basis {u1 , . . . , um } for Rm .
5. Then
A = USV T
where V = [v1 . . . vn ], U = [u1 . . . um ] and S has diagonal entries
Sii = σi for i = 1, . . . , min(m, n).

413
Example 18.
Let
        [ 0 −1 ]           [ σ1  0 ]
    A = [ 0  0 ]  and  S = [  0 σ2 ] .

Find orthogonal matrices U and V and scalars σ1 , σ2 > 0 such that

A = USV T .

Solution:
Step 1. Find a matrix V

414
415
Step 2. Find a matrix U compatible with V using property 4 of theorem

416
417
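Note:
A minimal Python/NumPy sketch for Example 18 (NumPy is an assumption; np.linalg.svd returns U, the singular values, and V T ):

    import numpy as np

    A = np.array([[0., -1.],
                  [0., 0.]])

    U, s, Vt = np.linalg.svd(A)

    print(s)                                     # expected: [1. 0.]
    print(np.allclose(U @ np.diag(s) @ Vt, A))   # A = U S V^T: True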
7.6 Unitary matrices and Hermitian matrices
We now consider the complex analogues of orthogonal and real
symmetric matrices.

Definition (Conjugate transpose of a matrix)


Let A be a complex m × n matrix. We define its conjugate transpose
(or adjoint or Hermitian transpose) A∗ = (Ā)T to be the n × m matrix
whose entries are
    (A∗ )ij = ((Ā)T )ij = Āji .

The conjugate transpose of A is sometimes denoted by A† .

Note:
If α ∈ C is a scalar, then the following properties hold whenever the
matrix products and sums exist:
(A∗ )∗ = A, (A + B)∗ = A∗ + B ∗ ,
(AB)∗ = B ∗ A∗ , (αA)∗ = ᾱA∗ .
418
Definition (Unitary matrix)
A unitary matrix is a complex n × n matrix U such that
    U −1 = U ∗ = (Ū)T .

Note:
◮ To check if a matrix is unitary it suffices to check that U ∗ U = In or
UU ∗ = In .
◮ Any real orthogonal matrix is unitary if viewed as a complex matrix.

419
Example 19.
Show that
                 [ −i i ]
    U = (1/√2) · [  1 1 ]
is a unitary matrix.
Solution:

420
The Hermitian dot product on Cn can be written in matrix form:

hu, vi = u1 v1 + u2 v2 + · · · + un vn = v̄T u = v∗ u,

where u, v ∈ Cn are regarded as column vectors.

This leads to the following analogues of our previous results.

Theorem
If U is a unitary n × n matrix then
1. hUu, Uvi = hu, vi for all u, v ∈ Cn ,
2. | det(U)| = 1,
where hu, vi is the Hermitian dot product on Cn .

421
Theorem
A complex n × n matrix is unitary if and only if its columns form an
orthonormal basis for Cn using the Hermitian dot product.

Definition (Hermitian matrix)


A complex n × n matrix A is Hermitian or self-adjoint if

A∗ = A.

Note:
Any real symmetric matrix is Hermitian if viewed as a complex matrix.

422
Example 20.
Are the following matrices Hermitian?
 
            [  1 i ]
    (a) A = [ −i 1 ]

            [ i 0 ]
    (b) B = [ 0 i ]
Solution:

423
Theorem
Let A be an n × n Hermitian matrix. Then using the Hermitian dot
product as inner product on Cn :
1. all eigenvalues λi (i = 1, · · · , n) of A are real,
2. eigenvectors corresponding to distinct eigenvalues are orthogonal,
3. there is an orthonormal basis of eigenvectors {v1 , · · · , vn },
4. A is diagonalisable with

A = UDU −1 = UDU ∗

where D is the diagonal matrix

diag(λ1 , λ2 , . . . , λn )

and U is the unitary matrix


 
    [ v1 · · · vn ].

424
Example 21.
Write the Hermitian matrix
 
        [  1 i ]
    A = [ −i 1 ]
in the form
A = UDU −1
where D is a diagonal matrix and U is a unitary matrix.
Solution:
Step 1. Find the eigenvalues of A

425
Step 2. Find a unit eigenvector corresponding to each eigenvalue

426
427
Step 3. Form D and U.

428
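Note:
The same eigh routine used for real symmetric matrices also handles complex Hermitian matrices. A minimal Python/NumPy sketch for Example 21 (NumPy is an assumption):

    import numpy as np

    A = np.array([[1, 1j],
                  [-1j, 1]])       # the Hermitian matrix of Example 21

    evals, U = np.linalg.eigh(A)

    print(evals)    # expected: [0. 2.] -- real, as the theorem guarantees
    D = np.diag(evals)
    print(np.allclose(U @ D @ U.conj().T, A))   # A = U D U*: True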
