Lecture 1 - Linear Algebra
Linear algebra is the most fundamental pillar of linear systems and controls. A comprehensive coverage
of linear algebra can take years and is well beyond our scope here. In this lecture I cover only some of
the basic concepts and results that we will use later in the course. For a nice and more comprehensive
treatment, but still without proofs, see Chapter 3 of Chen's textbook. For a very comprehensive
treatment, you can stop by Tom Bewley's encyclopedic "Numerical Renaissance: simulation,
optimization, & control". If you want something in between, nicely readable and moderately
comprehensive, try Gilbert Strang's "Introduction to Linear Algebra".
Contents
1.1 Vectors and Matrices
  1.1.1 Definition (Scalars, vectors, and matrices)
  1.1.2 Definition (Matrix multiplication)
  1.1.3 Theorem (Breakdown of matrix multiplication)
  1.1.4 Exercise (Breakdown of matrix multiplication)
1.2 Linear (In)dependence
  1.2.1 MATLAB (Linearity of matrix multiplication)
  1.2.2 Definition (Linear combination)
  1.2.3 Definition (Linear (in)dependence)
  1.2.4 Remark (Determining linear independence)
  1.2.5 Exercise (Linear (in)dependence)
1.3 Rank and Determinant
  1.3.1 Definition (Rank)
  1.3.2 Example (Rank of matrices)
  1.3.3 Theorem (Rank)
  1.3.4 MATLAB (Rank)
  1.3.5 Definition (Full rank matrices)
  1.3.6 Definition (Determinant)
  1.3.7 Theorem (Rank and determinant)
Definition 1.1.1 (Scalars, vectors, and matrices) A “scalar”, for the purpose of this course, is either
a real (R) or a complex (C) number. A “vector”, again for the purpose of this course, is an ordered set of
numbers, depicted as a column:
x = [x1; x2; . . . ; xn]   (an n × 1 column)
Almost always, our vectors are column vectors. But occasionally, we need row vectors as well, which we may
show using the transpose T notation:
xT = [x1 x2 · · · xn]   (a 1 × n row)
As you have noticed, throughout this course, we use bold-faced small letters for vectors and bold-faced capital
letters for matrices. □
Vectors and matrices of the same size can be added together, and both vectors and matrices can be multiplied
by a scalar; neither operation is particularly interesting. What is more interesting and, as you will see, essentially the basis of linear
algebra, is matrix multiplication. You probably have seen the basic definition.
Definition 1.1.2 (Matrix multiplication) For two matrices Am×n and Br×p , their product C = AB is
only defined if n = r, in which case the entries of Cm×p are defined as
cij = Σ_{k=1}^{n} aik bkj .
□
The above definition gives little intuition about what a matrix multiplication really does. To see this, we
need to notice two facts.
Theorem 1.1.3 (Breakdown of matrix multiplication) Let ai and bi denote the i'th column of A and
B, respectively:

A = [a1 a2 · · · an],   B = [b1 b2 · · · bp]

Then the product can be broken down column by column,

AB = [Ab1 Ab2 · · · Abp],   (1.1)

and each matrix-vector product is a linear combination of the columns of A,

Ax = x1 a1 + x2 a2 + · · · + xn an   (1.2)

for any vector x with entries x1 , . . . , xn . □
Exercise 1.1.4 (Breakdown of matrix multiplication) Use the above two rules to compute AB, where

A = [0 2 1; −1 1 −3; 3 −1 3],   B = [−4 0; 2 1; −1 0]   □
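If you'd like to check your hand computation, here is a minimal MATLAB sketch (the matrices are those of the exercise; the loop is deliberately explicit to mirror Eqs. (1.1) and (1.2)):

A = [0 2 1; -1 1 -3; 3 -1 3];
B = [-4 0; 2 1; -1 0];
C1 = A * B;                          % built-in matrix product
C2 = [A * B(:, 1), A * B(:, 2)];     % column by column, as in Eq. (1.1)
C3 = zeros(3, 2);
for j = 1:2
    for k = 1:3
        % each column of the product is a linear combination of the columns
        % of A, with the entries of the j'th column of B as coefficients,
        % as in Eq. (1.2)
        C3(:, j) = C3(:, j) + B(k, j) * A(:, k);
    end
end
% C1, C2, and C3 should all be identical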
You might be wondering why I put so much emphasis on these simple properties. One reason is the intuition
that they give you about matrix multiplication. But it's more than that. The real reason is the important role
that matrix-vector multiplication plays in linear algebra. Notice that for a matrix Am×n and a vector xn×1 ,
their product

ym×1 = Ax

is also a vector. In other words, a matrix is more than a rectangular array of numbers! It defines a function
that maps vectors to vectors.

[Figure: the matrix A as a map taking a vector x ∈ Rn to the vector y = Ax ∈ Rm .]
This view of matrices as maps (or operators) is arguably the basis of linear algebra!
A crucial property of this map is that it is linear:

A(α1 x1 + α2 x2 ) = α1 Ax1 + α2 Ax2 = α1 y1 + α2 y2   (1.3)

where x1 , x2 are vectors and α1 , α2 are scalars. In other words, if you have already computed y1 = Ax1 and
y2 = Ax2 , and you want to compute A(α1 x1 + α2 x2 ), you don’t have to use matrix multiplication (which is
computationally expensive) anymore. You can simply reuse y1 and y2 and “combine” them using the same
coefficients α1 and α2 .
MATLAB 1.2.1 (Linearity of matrix multiplication) To see the advantage of using the right hand
side of Eq. (1.3) over its left hand side, compare
n = 1e4;
x1 = rand(n, 1);
x2 = rand(n, 1);
A = rand(n);
tic
for i = 1:1e3
    alpha1 = rand;
    alpha2 = rand;
    y = A * (alpha1 * x1 + alpha2 * x2);
end
toc
with
n = 1e4;
x1 = rand(n, 1);
x2 = rand(n, 1);
A = rand(n);
tic
y1 = A * x1;
y2 = A * x2;
for i = 1:1e3
    alpha1 = rand;
    alpha2 = rand;
    y = alpha1 * y1 + alpha2 * y2;
end
toc
If you haven't seen them before, rand(n, m) generates a matrix of size n × m with random entries in the
interval [0, 1], rand(n) is equivalent to rand(n, n) (not rand(n, 1)), and rand is equivalent to rand(1).
The tic, toc pair measures the elapsed time for all the commands executed in between. □
In general, terms of the form α1 x1 + α2 x2 appear over and over again in linear algebra, so they have been
given a name!

Definition 1.2.2 (Linear combination) Given vectors x1 , x2 , . . . , xn and scalars α1 , α2 , . . . , αn , the vector

xn+1 = α1 x1 + α2 x2 + · · · + αn xn   (1.4)

is called a "linear combination" of x1 , x2 , . . . , xn . For example, x4 = 2x1 + 3x2 is a linear combination of x1 and x2 . □

Definition 1.2.3 (Linear (in)dependence) The vectors x1 , x2 , . . . , xn are called "linearly dependent" if

α1 x1 + α2 x2 + · · · + αn xn = 0   (1.5)

for some scalars α1 , α2 , . . . , αn that are not all zero. In contrast, if Eq. (1.5) holds only for α1 = · · · = αn = 0
(a choice for which it clearly always holds), then x1 , x2 , . . . , xn are called "linearly independent". □
Now, if I give you a set of vectors x1 , x2 , . . . , xn , how can you tell whether they are linearly independent or not?
For example, the vectors

x1 = [2; 0; −1],   x2 = [−3; 3; 1],   x3 = [7; −3; 3]   (1.6a)

are linearly independent, whereas the vectors

x1 = [2; 0; −1],   x2 = [−3; 3; 1],   x3 = [7; −3; −3]   (1.6b)

are not. There are various ways of determining this, and we will get back to it in more detail in Section 1.3.
But for now, we can say the following.
• First, notice that if two vectors are linearly dependent, it means that

α1 x1 + α2 x2 = 0

for some α1 and α2 , at least one of which, say α1 , is nonzero. This means that

x1 = −(α2 /α1 ) x2

or, in words, two vectors are linearly dependent if and only if they are a multiple of each other.
• Second, if you have n, m-dimensional vectors and n > m, they are necessarily linearly dependent. So
in R2 , you cannot have 3 linearly independent vectors, in R3 , you cannot have 4 linearly independent
vectors, and so on. We will see why shortly, in Section 1.3.
• Also, just remember that you can always check linear independence directly from Definition 1.2.3. At
the end of the day, Eq. (1.5) is a system of linear equations with unknowns α1 , α2 , . . . , αn . Later in this
note I will discuss systems of linear equations in detail, but you can always solve them manually (for
example using the substitution method). If the only answer is α1 = α2 = · · · = αn = 0,
then the vectors are linearly independent; if there are other (nonzero) answers as well, then the vectors are
linearly dependent (see the short MATLAB sketch right after this list).
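Here is a minimal MATLAB sketch of this check for the vectors in Eq. (1.6). It uses null(), which returns a basis for the set of solutions of the homogeneous system (we will meet this set again as the "null space" in Section 1.4):

x1 = [2; 0; -1]; x2 = [-3; 3; 1];
x3a = [7; -3; 3];     % third vector from Eq. (1.6a)
x3b = [7; -3; -3];    % third vector from Eq. (1.6b)
% Eq. (1.5) in matrix form is [x1 x2 x3] * alpha = 0
null([x1 x2 x3a])     % empty: only alpha = 0 works, so the set is independent
null([x1 x2 x3b])     % one basis vector of nonzero alphas, so the set is dependent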
Exercise 1.2.5 (Linear (in)dependence) Check whether each of the following sets of vectors is linearly
dependent or independent:
Definition 1.3.1 (Rank) Consider a matrix Am×n and look at it, as before, as a collection of n column vectors
placed side by side:

A = [x1 x2 · · · xn]

So the matrix A has n columns and m rows. Notice that this same matrix can also be seen as a stack of m,
n-dimensional row vectors

A = [y1T; y2T; . . . ; ymT]

where each row vector yiT denotes the i'th row of A (or, equivalently, the column vector yi is the transpose
of the i'th row of A). Now, we are ready to define what the rank of a matrix is:
• The column-rank of A is the number of linearly independent columns of A (the number of linearly
independent x1 , x2 , . . . , xn ).
• The row-rank of A is the number of linearly independent rows of it (the number of linearly independent
y1 , y2 , . . . , ym ).
□
The notion of rank essentially translates a property of a set of vectors into a property of a matrix, and it is
fundamental to linear algebra. Let's see a couple of examples.
Example 1.3.2 (Rank of matrices) Consider again the example vectors in Eq. (1.6). The first set of
vectors were linearly independent (as you checked in Exercise 1.2.5). Therefore, when we put them side by
side in the matrix
A = [2 −3 7; 0 3 −3; −1 1 3]

it has column rank equal to 3. What about its row rank? The rows of the matrix are

y1T = [2 −3 7],   y2T = [0 3 −3],   y3T = [−1 1 3]
which are also linearly independent (check). So the row rank of the matrix is also 3.
Now, consider the vectors in Eq. (1.6b). They are not linearly independent, because x3 = 2x1 − x2 . But x1
and x2 are indeed linearly independent, because they are not a multiple of each other (remember from last
section). So putting them side by side, the matrix
B = [2 −3 7; 0 3 −3; −1 1 −3]

has column rank 2. What about its row rank? The rows of B are

y1T = [2 −3 7],   y2T = [0 3 −3],   y3T = [−1 1 −3]
which are also not linearly independent, because y3 = −(1/2)y1 − (1/6)y2 . And similar to x1 and x2 , y1 and y2
are also linearly independent because they are not a multiple of each other. So, exactly 2 of y1 , y2 , y3 are
linearly independent, and the row rank of B is also 2. □
The fact that the column rank and the row rank of both A and B were equal was not a coincidence. This is
always the case:
Theorem 1.3.3 (Rank) For any matrix Am×n , its column rank equals its row rank, which is called the
rank of the matrix. As a consequence,
rank(A) ≤ min{m, n}. (1.8)
□
Recall that in the previous section (second point after Definition 1.2.3), I told you that if you have n, m-
dimensional vectors and n > m, they are necessarily linearly dependent. Now you can see why from Eq. (1.8).
For example, consider the same A matrix in Eq. (1.7). The rank of the matrix can at most be 3, because it
has only 3 rows and its row-rank (number of independent rows) cannot be more than the number of rows!
In this case, the rank is in fact 3, which means that out of the 5 columns, no more than 3 of them can be
simultaneously independent from each other.
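As a quick numerical check (a sketch using MATLAB's rank() function, with A and B from Example 1.3.2):

A = [2 -3 7; 0 3 -3; -1 1 3];
B = [2 -3 7; 0 3 -3; -1 1 -3];
rank(A)   % returns 3: all columns (and rows) are independent
rank(B)   % returns 2: only two independent columns (and rows)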
Definition 1.3.6 (Determinant) The determinant of a square matrix An×n , denoted det(A) or |A|, can be
computed as follows:

• If n = 2,

A = [a b; c d]   ⇒   det(A) = ad − bc

• If n = 3,

A = [a b c; d e f; g h i]   ⇒   det(A) = aei + bfg + cdh − ceg − bdi − afh
which is much easier to remember and calculate using the usual diagonal picture (the "rule of Sarrus": the three "downhill" diagonals get a plus sign and the three "uphill" diagonals a minus sign).
• If n > 1 (including n = 2 and 3 above, but really used for n ≥ 4), then the determinant of A is defined
based on an expansion over any arbitrary row or column. For instance, choose row 1. Then

det(A) = |A| = Σ_{j=1}^{n} (−1)^{1+j} a1j det(A−(1,j) )

where A−(1,j) is an (n − 1) × (n − 1) matrix obtained from A by removing its 1st row and j'th column.
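If you want to see the expansion in action, here is a minimal MATLAB sketch (the function name mydet is mine; the built-in det() computes the same quantity far more efficiently):

function d = mydet(A)
% Determinant by cofactor expansion along the first row (illustration only).
% Save as mydet.m and call, e.g., mydet([1 2; 3 4]).
n = size(A, 1);
if n == 1
    d = A(1, 1);
    return
end
d = 0;
for j = 1:n
    M = A(2:end, [1:j-1, j+1:n]);               % remove row 1 and column j
    d = d + (-1)^(1 + j) * A(1, j) * mydet(M);
end
end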
The main use of determinants for us, however, is not their geometrical interpretation, but rather their relation
to independence of vectors (finally!):

Theorem 1.3.7 (Rank and determinant) A square matrix An×n is full rank (that is, has rank n) if and only if
det(A) ≠ 0. More generally, the rank of any matrix Am×n equals the size of the largest square submatrix of A
with nonzero determinant. □
MATLAB 1.3.8 (Determinant) In MATLAB, use the function det to obtain the determinant of a matrix.
□
Before finishing this section, let’s see a couple of basic properties of the determinant:
Theorem 1.3.9 (Determinant of product and transpose) For two square matrices A and B of the same size,

det(AT ) = det(A)

and

det(AB) = det(A) det(B).   □
Exercise 1.3.10 (Rank using definition and determinant) Compute the rank of

A = [−3 −3 −4 −4; 3 3 4 4; 0 1 −4 2]
in 3 ways: from Definition 1.2.3, from Theorem 1.3.7, and using MATLAB. From both of the first two
methods, determine a maximal set of linearly independent columns. □
containing m equations and n unknowns x1 , . . . , xn . Now you can easily see that this system can be written
in matrix form as
Ax = b (1.9)
So the question is: for a given A and b, find all x that solve this equation. In general, Eq. (1.9) can have 0,
1, or infinitely many solutions.
The above question is essentially composed of two sub-questions: does a solution exist at all, and if so, is it
unique? Let's start with existence. Recall from Eq. (1.2) that Ax is nothing but a linear combination of the
columns of A, so whether Eq. (1.9) has a solution is the same as asking whether b is linearly dependent on the
columns of A. To check this, you can just create a larger augmented matrix

Aaug = [A b]

and compare rank(Aaug ) with rank(A): a solution exists if and only if the two ranks are equal (what if
rank(Aaug ) < rank(A)? Can that happen at all?). You can check the ranks either using determinants via
Theorem 1.3.7, or directly using rank() in MATLAB. Note that if you are using the determinants, your job
often becomes easier if you remove any linearly dependent columns of A before appending b to it (right?).
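As a small illustration (a sketch with made-up numbers), this existence check takes one line in MATLAB:

A = [1 2; 2 4; 3 6];      % a rank-1 matrix, used only as a hypothetical example
b1 = [2; 4; 6];           % a multiple of the columns of A
b2 = [1; 0; 0];           % not a combination of the columns of A
rank([A b1]) == rank(A)   % true: Ax = b1 has a solution
rank([A b2]) == rank(A)   % false: Ax = b2 has no solution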
v1 + v2 − v3 = 0   ⇒   A z = 0,   where z = [1; 1; −1],
so all x + αz are solutions as well. Note that this only happens because there exists a nonzero vector z such
that Az = 0. □
The above examples motivate the definition of another fundamental concept in linear algebra:
Definition 1.4.4 (Null space) The null space of a matrix A is the set
{z | Az = 0}.
□
Let me emphasize that the null space is never empty because it always contains at least the zero vector.
So to determine whether the solution to Ax = b (assuming that at least one exists) is unique, we need to
determine whether the null space of A contains any nonzero vectors. Again, Eq. (1.2) comes in handy!
Az = [a1 a2 · · · an] [z1; z2; . . . ; zn] = z1 a1 + z2 a2 + · · · + zn an

So the null space of A contains a nonzero vector z if and only if

z1 a1 + z2 a2 + · · · + zn an = 0

for a set of coefficients z1 , z2 , . . . , zn that are not all zero, which is precisely the definition of linear dependence
of the columns of A! So,
The solution to Ax = b (if any) is unique ⇔ A has full column rank (1.10)
When the solution to the equation Ax = b is not unique, you essentially have a solution space rather
than a single solution. Obtaining the solution space is very easy once you have the null space: if {z1 , z2 , . . . , zk } is
a basis for the null space of A and x is one solution to the equation, then the set of all solutions is

{x + α1 z1 + α2 z2 + · · · + αk zk ,   for arbitrary scalars α1 , . . . , αk }.   (1.11)
Definition 1.4.6 (Matrix inverse) For a square and nonsingular matrix A, there exists a unique matrix
A−1 , called the inverse of A, such that

AA−1 = A−1 A = I   □

With this, we can summarize how to solve Ax = b, depending on the shape of A:

• If m = n (A is square) and A is nonsingular (full rank), then for any b the equation has the unique solution

x = A−1 b   (1.12)
• If m > n (A is a tall matrix), and A is full column rank (it cannot be full row-rank, right?), then we
have a unique solution only if b is linearly dependent on the columns of A. If so, to find that solution,
Ax = b ⇒ AT Ax = AT b ⇒ (AT A)−1 AT Ax = (AT A)−1 AT b ⇒ x = (AT A)−1 AT b
• If m < n (A is a fat matrix), the equation never has a unique solution (even if A is full row-rank),
because A cannot be full column rank (right?). Given any solution (that you can find, e.g., using
elimination of variables), you can build the entire solution space as in Eq. (1.11).
If, however, A is full row rank, then we are sure that the system of equations always has a solution
for any b (why?). In this case, we also know (from a theorem we don’t prove) that the square matrix
AAT is nonsingular, and
x = AT (AAT )−1 b
is one solution to the equation (just plug it in and check!).
If you compare the three cases above, you will notice that the matrices (AT A)−1 AT (for full column rank
A) and AT (AAT )−1 (for full row rank A) are essentially taking the place of A−1 in Eq. (1.12). In fact,
if A is square and nonsingular, both of them will become equal to A−1 (because (AT )−1 = (A−1 )T and
(AB)−1 = B−1 A−1 ). In other words, (AT A)−1 AT is an extension of A−1 for non-square, full column rank
matrices and AT (AAT )−1 is an extension of A−1 for non-square, full row rank matrices. As such, both of
them are called the "pseudo-inverse" of A, shown as A† . Therefore, whenever A has full rank (and the
equation has a solution at all),

x = A† b   (1.13)

is a solution to Ax = b. It goes beyond our course, but be aware that A† is in fact defined for any matrix
(even the zero matrix), and whenever Ax = b has any solution, Eq. (1.13) is one of them.
MATLAB 1.4.7 (Inverse and pseudo-inverse) To find the inverse of a matrix, use the inv() function.
Similarly, use pinv() to find the pseudo-inverse. However, if you want to invert a matrix only for the purpose
of solving a linear equation, as in Eq. (1.12) or Eq. (1.13), a computationally advantageous way is to use the
MATLAB's left division:

x = A \ b;
Using left division also allows you to solve systems of equations without a unique solution, or even non-square
systems of equations. If your system of equations has infinitely many solutions, A \ b returns one of them.
If your system has no solutions, then it returns an x for which the error Ax − b is smallest (in magnitude).□
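As a small illustration (a sketch with made-up numbers), for a fat, full row rank A you can combine left division with null() to describe the whole solution space of Eq. (1.11):

A = [1 2 1; 0 1 1];          % fat and full row rank (hypothetical example)
b = [3; 1];
x0 = A \ b;                  % one particular solution
N = null(A);                 % basis for the null space of A (here 1-dimensional)
norm(A * (x0 + 5 * N) - b)   % essentially zero: x0 + alpha*N is also a solution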
Example 1.5.1 (2D mappings) Consider a few simple mappings in two dimensions, starting with the matrix

A = [5 −1; −1 5]   (1.14)

What does it do to vectors? Here is how it maps a whole bunch of random points (the red dot shows the origin, the blue
dots are random x's, the red dots are the corresponding Ax, and the arrows show the mapping):

[Figure: random points x and their images Ax under the mapping A, connected by arrows pointing outwards from the origin.]
It already gives us a sense: the arrows are all pointing outwards, showing that A performs some sort of
enlargement (scaling with a scale k > 1). But this enlargement is not uniform; it is more pronounced along
a NorthWest-SouthEast axis. Notice that this NorthWest-SouthEast axis can be described by the vector

v1 = [−1; 1]

and applying A to this particular vector gives

Av1 = [5 −1; −1 5] [−1; 1] = [−6; 6] = 6v1

In other words, v1 is special because when A acts on it, the result is a multiple of v1 again! In other words,
the effect of A on v1 is a pure scaling. (If you think this is not so special, try a whole bunch of random
vectors and check whether Av becomes exactly a multiple of v.)

Note that this special property also clearly holds for any multiple of v1 :

A(αv1 ) = αAv1 = 6αv1

In other words, the effect of A on the whole NorthWest-SouthEast axis is a six-fold enlargement.
But this does not tell us all about A. What about other directions other than the NorthWest-SouthEast
axis? We can visually see that no other direction is scaled as much. To see what A does to other vectors, we
can search to see if there are any other vectors such that the effect of A on them is a pure scaling. In other
words, are there any vectors v, other than v1 and its multiples, such that
Av = λv (1.15)
for some scalar λ. Clearly, v = 0 satisfies this, but that is not what we are looking for.
Eq. (1.15) is fortunately a linear system of equations:
(A − λI)v = 0
and we are looking for nonzero vectors v that satisfy it. The difficulty compared to a usual linear system of
equations is that λ is also unknown. But notice one thing: if λ is such that A − λI is nonsingular, then we can multiply both sides by (A − λI)−1 and get v = (A − λI)−1 0 = 0.
In other words, for any λ such that A − λI is nonsingular, Eq. (1.15) has only the trivial solution v = 0,
which is of no help. This is very useful, because we know we have to restrict our attention to values of λ for
which A − λI is singular:
det(A − λI) = 0 ⇔ det([5 − λ, −1; −1, 5 − λ]) = 0
⇔ (5 − λ)² − 1 = 0
⇔ λ² − 10λ + 24 = 0
⇔ λ1 = 6,   λ2 = 4
λ1 = 6 is what we had originally found by guessing v1 . So for λ1 , there is even no need to solve Eq. (1.15),
because we already know its solution. But what about the solution to (A − λ2 I)v = 0?
(A − λ2 I)v = [1 −1; −1 1] [v1 ; v2 ] = [v1 − v2 ; v2 − v1 ]   (1.16)
So (A − λ2 I)v = 0 if and only if v1 = v2 . Not surprisingly, we did not get a unique solution v, because we
found λ2 precisely such that we get infinitely many solutions. It is not hard to see that vectors v for which
v1 = v2 are all multiples of
v2 = [1; 1]
and constitute the SouthWest-NorthEast axis in the picture. Now the picture makes even more sense: the
mapping A scales all the vectors along the NorthWest-SouthEast axis by 6 times, and all the vectors along
the SouthWest-NorthEast axis by 4 times.
What about other vectors x that lie neither on the NorthWest-SouthEast axis nor on the SouthWest-
NorthEast axis? Well, notice that v1 and v2 are linearly independent, and we can decompose any other
vector into a linear combination of them (using standard orthogonal projection you learned in geometry):
x = α1 v1 + α2 v2
[Figure: a vector x decomposed into its components α1 v1 and α2 v2 along the two eigenvector directions v1 and v2 .]
To find the unknown coefficients α1 and α2 , you can simply use your knowledge of linear algebra!
x = α1 v1 + α2 v2 = [v1 v2] [α1 ; α2 ] = V [α1 ; α2 ]

where V = [v1 v2 ], and therefore

[α1 ; α2 ] = V−1 x
Note that V is invertible precisely because v1 and v2 are independent. If they weren’t, then we wouldn’t be
able to decompose x based on them. But fortunately, eigenvectors always turn out to be linearly independent
(more on this later).
Now, we can clearly see what happens when A is applied to x:
Ax = A(α1 v1 + α2 v2 )
= α1 Av1 + α2 Av2
= 6α1 v1 + 4α2 v2
So A scales the component of any vector along the NorthWest-SouthEast axis (along v1 ) by 6 times and
its component along the SouthWest-NorthEast axis (along v2 ) by 4 times. Visually, this looks a bit like a
combination of scaling (stretching) and rotation (rotation away from the ±v2 and towards ±v1 ), but there
isn’t any rotation really, it’s just an unequal scaling.
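You can verify all of this numerically with a quick sketch (eig() is covered in MATLAB 1.5.6 below; the test vector is arbitrary):

A = [5 -1; -1 5];
[V, D] = eig(A)                  % columns of V: eigenvectors; diagonal of D: eigenvalues
x = [1; 2];                      % an arbitrary test vector
alpha = V \ x;                   % components of x along the eigenvectors
norm(A * x - V * (D * alpha))    % essentially zero: Ax scales each component by its eigenvalue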
Definition 1.5.2 (Eigenvalue and eigenvector) For any matrix A ∈ Rn×n , there exist precisely n num-
bers λ (potentially complex, and potentially repeated) for which the equation
Av = λv
has nonzero solutions. These numbers are called the eigenvalues of A, and the corresponding nonzero vectors
v that solve this equation are called the eigenvectors of A. □
Let’s see a few examples of finding eigenvalues and eigenvectors.
Example 1.5.3 (3x3 matrix with unique eigenvalues) Consider the matrix
A = [−5 3 7; −5 3 5; −4 2 6]

To find its eigenvalues, we solve

det(A − λI) = det([−5 − λ, 3, 7; −5, 3 − λ, 5; −4, 2, 6 − λ]) = 0   ⇔   −λ³ + 4λ² − 6λ + 4 = 0
This is a polynomial equation and we know, from Section 0.2, that it has exactly 3 (potentially repeated,
potentially complex) roots. Using hand calculations or MATLAB, we can see that its solutions are

λ1 = 2,   λ2 = 1 + j,   λ3 = 1 − j

or, more compactly, λ1 = 2 and λ2,3 = 1 ± j.
To find the eigenvectors associated with each eigenvalue, we simply solve the equation (A − λi I)v = 0, as
we did in Eq. (1.16). Note that this is nothing but finding the null space of A − λi I. For λ1 , this becomes

(A − 2I) [a; b; c] = 0 ⇔ { −7a + 3b + 7c = 0,   −5a + b + 5c = 0,   −4a + 2b + 4c = 0 } ⇔ b = 5a − 5c

Substituting b = 5a − 5c back into the first and third equations gives a = c, and hence b = 0. Therefore, any vector

v1 = [a; 0; a],   for any a ∈ C, a ≠ 0,

is an eigenvector corresponding to λ1 = 2. This gives an entire line in the 3D space that is scaled by 2 by A
(similar to the NorthWest-SouthEast and SouthWest-NorthEast directions for the matrix A in Eq. (1.14)).
If you prefer (or are asked to provide) a single eigenvector associated with λ1 , pick any a ∈ C that you like,
for example v1 = [2j; 0; 2j].
To find the eigenvector associated with λ2 , we proceed similarly:

(A − (1 + j)I) [a; b; c] = 0 ⇔ { −(6 + j)a + 3b + 7c = 0,   −5a + (2 − j)b + 5c = 0,   −4a + 2b + (5 − j)c = 0 } ⇔ b = 2a − ((5 − j)/2)c

After substituting b = 2a − ((5 − j)/2)c, the remaining equations may not immediately look like multiples of each other, but they
are in fact. To see this, solve one and replace it in the other:

1st eq ⇔ a = (1.5 + 0.5j)c   ⇒ (2nd eq)   −(0.5 + 3.5j)c + (0.5 + 3.5j)c = 0

which holds for any c. Substituting a = (1.5 + 0.5j)c into b = 2a − ((5 − j)/2)c also gives us b = (0.5 + 1.5j)c.
Therefore, any vector

v2 = [(1.5 + 0.5j)c; (0.5 + 1.5j)c; c],   for any c ∈ C, c ≠ 0,

is an eigenvector associated with λ2 = 1 + j. Again, if you want a single eigenvector, pick your choice of c ≠ 0,
such as c = 2 ⇒ v2 = [3 + j, 1 + 3j, 2]T.
Finally, to find the eigenvector associated with λ3 = 1 − j, you can repeat the same process as above, which
gives,
v3 = [(1.5 − 0.5j)c; (0.5 − 1.5j)c; c],   for any c ∈ C, c ≠ 0
□
Notice that v3 is the complex conjugate of v2 . This is not by chance. Whenever you have two eigenvalues
that are complex conjugates of each other, their corresponding eigenvectors are also complex conjugate of
each other (can you prove this?).
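As a numerical sanity check (a sketch; eig() is covered in MATLAB 1.5.6 below), you can see the conjugate pair directly:

A = [-5 3 7; -5 3 5; -4 2 6];
[V, D] = eig(A);
diag(D)   % the eigenvalues 2, 1+1i, and 1-1i (possibly in a different order)
V         % the columns corresponding to 1+1i and 1-1i are complex conjugates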
In the examples that we have seen so far, all the eigenvalues of A have been distinct. This is not always the
case. The following is an example.
Example 1.5.4 (3x3 matrix with repeated eigenvalues) This time consider the matrix
A = [−5 2 −6; 6 −1 6; 6 −2 7]
Its eigenvalues are the solutions of

det(A − λI) = det([−5 − λ, 2, −6; 6, −1 − λ, 6; 6, −2, 7 − λ]) = 0   ⇔   −λ³ + λ² + λ − 1 = 0

which gives

λ1 = −1,   λ2 = λ3 = 1,
two of which are repeated. This is perfectly fine. However, finding the eigenvectors becomes a bit more
complicated. For λ1 = −1 which is not repeated, everything is as before. You solve (A + I)v = 0 and will
find that any vector
v1 = [a; −a; −a],   for any a ∈ C, a ≠ 0,
is an eigenvector corresponding to λ1 .
In order to find the eigenvectors corresponding to both of λ2 and λ3 , we have to solve the same equation
(A − I) [a; b; c] = 0 ⇔ { −6a + 2b − 6c = 0,   6a − 2b + 6c = 0,   6a − 2b + 6c = 0 } ⇔ b = 3a + 3c

You can see that this time we effectively get only one equation in 3 variables, and we have two free variables to choose
arbitrarily (I chose a and c, but any two would work). Therefore, any vector

v2,3 = [a; 3a + 3c; c] = a [1; 3; 0] + c [0; 3; 1],   for any a, c ∈ C, (a, c) ≠ (0, 0),   (1.17)
is an eigenvector corresponding to λ2,3 = 1. Notice the difference with the example before, where the eigenvalues
were distinct: there, we found one line of eigenvectors corresponding to each eigenvalue. Now that we have a
repeated eigenvalue, we found a plane of eigenvectors, with the same dimension (2) as the multiplicity of the
repeated eigenvalue.
Similar to before, if you want (or are asked to) give two specific eigenvectors corresponding to λ2 and λ3
(instead of the plane of eigenvectors in Eq. (1.17)), you can pick any two linearly independent vectors from
that plane, for example,

v2 = [1; 3; 0],   v3 = [0; 3; 1]
□
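A quick way to see this two-dimensional eigenspace in MATLAB (a sketch; null() returns a basis for the null space):

A = [-5 2 -6; 6 -1 6; 6 -2 7];
null(A - eye(3))   % a 3-by-2 matrix: the null space of A - I is 2-dimensional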
The case of repeated eigenvalues can get more complex than this. In the above example, we were able to
find two linearly independent vectors that satisfy (A − I)v = 0. In other words, the dimension of the null
space of A − I was 2, equal to the multiplicity of the repeated eigenvalues. We may not always be so lucky!
Consider for example the matrix
A = [3 1; 0 3]
It is not hard to see that the two eigenvalues are λ1 = λ2 = 3. So we ideally would need two linearly
independent vectors that satisfy (A − 3I)v = 0. This is impossible, because the null space of A − 3I is only
1 dimensional. To see this, notice that
(A − 3I) [a; b] = 0 ⇔ { b = 0,   0 = 0 }
So we can only choose a freely, but b must be zero, and the only eigenvectors corresponding to λ1 = λ2 = 3 are
the vectors

v1 = [a; 0],   for any a ∈ C, a ≠ 0.

What about the other eigenvector v2 ? It doesn't exist!
For these kinds of matrices where not enough eigenvectors can be found (which can only happen if we have
repeated eigenvalues), we have to supplement the eigenvectors with additional vectors called “generalized
eigenvectors”. Good news, that’s beyond our course!
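You can also see this numerically (a sketch): eig() still returns two eigenvector columns, but they are (numerically) parallel, so there is really only one independent eigenvector.

A = [3 1; 0 3];
[V, D] = eig(A)
rank(V)   % returns 1: the two returned columns are linearly dependent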
Let’s go back to Example 1.5.4 where we were lucky and able to find two linearly independent eigenvectors
corresponding to the repeated eigenvalue. You might have noticed that not only are v2 and v3 linearly independent,
but the whole set of eigenvectors {v1 , v2 , v3 } is linearly independent. The same was true in Example 1.5.3,
where the eigenvalues were distinct. This is again not by chance:
Theorem 1.5.5 (Independence of eigenvectors) Consider a matrix A ∈ Rn×n that has distinct eigen-
values, or a matrix A ∈ Rn×n that has repeated eigenvalues but we are able to find as many linearly
independent eigenvectors as the number of repeated eigenvalues. In both cases, the set of all eigenvectors
{v1 , v2 , . . . , vn } is linearly independent. □
In the above theorem, I focused on a specific set of (square) matrices: those that either have distinct
eigenvalues, or even if they have repeated eigenvalues, we are able to find as many independent eigenvectors
as the multiplicity of each repeated eigenvalue. For reasons that we will see shortly, these matrices are called
diagonalizable.
Before closing this section, here is how to do all of these in MATLAB.
MATLAB 1.5.6 (Eigenvalues & eigenvectors) The function eig() gives you the eigenvalues and eigen-
vectors. If you only want the eigenvalues, type
lambda = eig(A);
and it will give you a column vector lambda containing the eigenvalues of A. If you want the eigenvectors as
well, type
[V, D] = eig(A);
and it will give you two matrices the same size as A. D is a diagonal matrix with the eigenvalues of A on its
diagonal, and V is a matrix with the eigenvectors of A as its columns (with the correct ordering, such that the
i'th column of V is the eigenvector corresponding to the i'th diagonal element of D). □
1.6 Diagonalization
The process of diagonalization is one in which a square matrix is “transformed” into a diagonal one using a
change of coordinates.
Consider a diagonalizable matrix A ∈ Rn×n (which you know what it means from the last section, right?).
As always, we look at A as a map from Rn to Rn :
y = Ax (1.18)
Collect the (linearly independent) eigenvectors of A into the matrix V = [v1 v2 · · · vn ] and consider the change
of coordinates x = Vx̂, y = Vŷ. Substituting into Eq. (1.18) gives Vŷ = AVx̂, or

ŷ = Âx̂,   where Â = V−1 AV.
Let’s look at  = V−1 AV more closely. First, let us look at the product
AV = A [v1 v2 · · · vn ]

Recall from Eq. (1.1) that this is equal to

AV = [Av1 Av2 · · · Avn ]

but vi are not any vectors, they are the eigenvectors of A, so this simplifies to

AV = [λ1 v1 λ2 v2 · · · λn vn ] = [v1 v2 · · · vn ] [λ1 0 · · · 0; 0 λ2 · · · 0; . . . ; 0 0 · · · λn ]
Can you convince yourself of the last equality? Notice that this is nothing but the product of two matrices,
so you can first break it into n matrix-vector products from Eq. (1.1), each of which is a linear combination
from Eq. (1.2). The matrix
Λ = [λ1 0 · · · 0; 0 λ2 · · · 0; . . . ; 0 0 · · · λn ]
is an important matrix for us, because it contains the eigenvalues of A on its diagonal. Using this new
notation, we get
AV = VΛ
or, in other words,

Â = V−1 AV = Λ

This is called "diagonalization", and Λ is called the diagonalized version of A.
If this seems like a lot to digest, notice that this is what we did at the beginning of Section 1.5, where we
were analyzing what the matrix A = [5 −1; −1 5] does to vectors in the plane. After going step by step through
the extraction of eigenvalues and eigenvectors, at the end we decomposed any vector x into its components
along v1 and v2 and used the fact that in this new “coordinate system”, A becomes a pure scaling in each
direction, which is precisely what the diagonal matrix Λ does. Diagonalization will be a very valuable tool
later in the study of linear systems.
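In MATLAB, you can verify the whole chain in a couple of lines (a sketch, reusing the matrix from the beginning of Section 1.5):

A = [5 -1; -1 5];
[V, D] = eig(A);
Ahat = V \ (A * V)   % same as inv(V)*A*V; should equal the diagonal matrix D
norm(Ahat - D)       % essentially zero, up to roundoff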
Exercise 1.6.1 (Diagonalization) Diagonalize these matrices (find the eigenvalues and eigenvectors and
check  = V−1 AV against Λ):
• A = [−4 1 −2; 0 −4 0; 1 4 −2]

• A = [−1 2 1; 2 −4 −2; 1 −2 −1]
□