Linear Algebra 1
Keshav Dogra∗
The following material is based on Chapter 1 of Sydsaeter et al., "Further Mathematics for Economic Analysis" and Sergei Treil, "Linear Algebra Done Wrong" (available at http://www.math.brown.edu/~treil/papers/LADW/LADW.html).
1 Matrices

An m × n matrix is a rectangular array of numbers with m rows and n columns,

A = (a_{ij})_{m \times n} =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
where aij denotes the element in the ith row and the jth column.
An n-vector is an ordered n-tuple of numbers. We can think of an n-vector as either a 1 × n matrix (a row vector) or as an n × 1 matrix (a column vector).
An m × n matrix A can be written as a set of column vectors or as a set of row vectors. That is,

A = (a_1 \; a_2 \; \cdots \; a_n) =
\begin{pmatrix}
\alpha_1 \\
\alpha_2 \\
\vdots \\
\alpha_m
\end{pmatrix}
∗ Department of Economics, Columbia University, kd2338@columbia.edu
where

a_i =
\begin{pmatrix}
a_{1i} \\
a_{2i} \\
\vdots \\
a_{mi}
\end{pmatrix}
The dot product (also called the inner product) of two vectors a = (a_1, a_2, ..., a_n), b = (b_1, b_2, ..., b_n) is defined as

a \cdot b = \sum_{i=1}^{n} a_i b_i
If a and b are regarded as column vectors, we can write the dot product as a′b. The dot product has the properties

a · b = b · a
a · (b + c) = a · b + a · c
(αa) · b = α(a · b) = a · (αb)
a · a ≥ 0, with a · a = 0 if and only if a = 0
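As a quick numerical illustration (a minimal NumPy sketch, not part of the original notes; the particular vectors are arbitrary choices), we can verify these properties:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 0.5])
c = np.array([2.0, 2.0, 2.0])
alpha = 3.0

print(a @ b)                                           # the sum a_1*b_1 + ... + a_n*b_n
print(np.isclose(a @ b, b @ a))                        # symmetry
print(np.isclose(a @ (b + c), a @ b + a @ c))          # additivity
print(np.isclose((alpha * a) @ b, alpha * (a @ b)))    # homogeneity
print(a @ a >= 0)                                      # non-negativity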
Let C = AB, where A is m × n and B is n × p, so that the ijth element of C is c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. This means that the columns of C are linear combinations of the columns of A. In particular, the jth column of C is

\begin{pmatrix} c_{1j} \\ c_{2j} \\ \vdots \\ c_{mj} \end{pmatrix}
= b_{1j} \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}
+ b_{2j} \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix}
+ \ldots
+ b_{nj} \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}
Also, the rows of C are linear combinations of the rows of B; the ith row of C is
(ci1 , ci2 , . . . , cip ) = ai1 (b11 , b12 , . . . , b1p ) + ai2 (b21 , b22 , . . . , b2p ) + . . . + ain (bn1 , bn2 , . . . , bnp )
The product AB is defined only if the number of columns in A equals the number of rows in B; in that case we say the matrices are conformable.
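To see the column interpretation concretely, here is a small NumPy check (an illustrative sketch, not from the original notes): the jth column of AB is a linear combination of the columns of A, with weights taken from the jth column of B.

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # 3 x 2
B = np.array([[7.0, 8.0, 9.0],
              [0.0, 1.0, 2.0]])  # 2 x 3, conformable with A

C = A @ B  # defined because A has as many columns as B has rows
j = 1
# jth column of C as a combination of the columns of A
combo = B[0, j] * A[:, 0] + B[1, j] * A[:, 1]
print(np.allclose(C[:, j], combo))  # True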
Matrix multiplication satisfies
(AB)C = A(BC)
A(B + C) = AB + AC
(A + B)C = AC + BC
Matrix multiplication is not commutative: in general, AB ≠ BA. Moreover, AB = 0 does not imply that either A or B equals the matrix of zeros, 0.
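Both points are easy to verify numerically (a minimal NumPy sketch; the particular matrices are just illustrative choices):

import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])

print(A @ B)                           # [[0. 0.], [0. 0.]] -- the product is zero...
print(np.any(A != 0), np.any(B != 0))  # ...even though A and B are both nonzero
print(np.allclose(A @ B, B @ A))       # False: AB != BA here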
A matrix is square if it has the same number of rows and columns. The nth power of a square
matrix is defined as
An = AA . . . A (n times)
A square matrix is diagonal if all its off-diagonal elements are zero. We sometimes write

\mathrm{diag}\{d_1, d_2, \ldots, d_n\} =
\begin{pmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{pmatrix}
The nth power of a diagonal matrix diag{d_1, d_2, ..., d_n} is diag{d_1^n, d_2^n, ..., d_n^n}.
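This is immediate from the multiplication rule, and easy to confirm (an illustrative NumPy check, not part of the original text):

import numpy as np

D = np.diag([2.0, 3.0, 5.0])
# matrix_power multiplies D by itself n times
print(np.allclose(np.linalg.matrix_power(D, 4),
                  np.diag([2.0**4, 3.0**4, 5.0**4])))  # True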
The identity matrix of order n, denoted by In or I, is the n × n matrix with ones on the main diagonal and zeros elsewhere,

I_n = \mathrm{diag}\{1, 1, \ldots, 1\} =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
For any matrices A and B for which the products are defined, IA = A and BI = B.
A square matrix is upper triangular if all entries below the main diagonal are zero. A square
matrix is lower triangular if all entries above the main diagonal are zero. A matrix is triangular
if it is either lower triangular or upper triangular.
The transpose of a matrix A, denoted by A′ or A^T, is obtained by interchanging rows and columns (if B = A′, aij = bji). The following rules apply:

(A′)′ = A
(A + B)′ = A′ + B′
(αA)′ = αA′
(AB)′ = B′A′

The determinant of an n × n matrix A, denoted |A| or det A, is a number with the following properties:
• If two rows (or two columns) of A are interchanged, its determinant changes sign but its
absolute value remains the same;
• The value of |A| remains unchanged if a multiple of a row is added to another row (or if a
multiple of a column is added to another column);
• The determinant of a (lower or upper) triangular matrix equals the product of its diagonal
entries, a11 a22 ...ann . In particular, this holds for diagonal matrices, and the determinant of
the identity matrix equals 1.
• If two square matrices have identical columns x1, ..., xn except for their kth column, which equals uk for one matrix and vk for the other, then the sum of these matrices' determinants equals the determinant of the matrix whose columns are x1, ..., xn except for the kth column, which equals uk + vk. That is,

|x_1 \; \cdots \; u_k + v_k \; \cdots \; x_n| = |x_1 \; \cdots \; u_k \; \cdots \; x_n| + |x_1 \; \cdots \; v_k \; \cdots \; x_n|
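These determinant properties are easy to check numerically. The following NumPy sketch (illustrative, not part of the original notes; the matrix is an arbitrary choice) verifies three of them:

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# Swapping two rows flips the sign of the determinant
B = A[[1, 0, 2], :]
print(np.isclose(np.linalg.det(B), -np.linalg.det(A)))  # True

# Adding a multiple of one row to another leaves |A| unchanged
C = A.copy()
C[2] += 5.0 * C[0]
print(np.isclose(np.linalg.det(C), np.linalg.det(A)))  # True

# The determinant of a triangular matrix is the product of its diagonal
T = np.triu(A)
print(np.isclose(np.linalg.det(T), np.prod(np.diag(T))))  # True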
The ijth cofactor Aij of A is (−1)^{i+j} times the determinant of the (n − 1) × (n − 1) matrix obtained by deleting the ith row and the jth column of A. Having defined the cofactor, we can provide an expression for the determinant. For an n × n matrix A, the determinant |A| can be defined by expanding along any row i = 1, 2, ..., n:

|A| = a_{i1} A_{i1} + a_{i2} A_{i2} + \ldots + a_{in} A_{in} = \sum_{j=1}^{n} a_{ij} A_{ij}
The inverse A−1 of an n × n square matrix A is the matrix B that satisfies

AB = In, BA = In

The inverse of a matrix A may or may not exist; if it does exist, we say A is invertible. A−1 exists if and only if |A| ≠ 0. The unique inverse of A, if it exists, is given by
A^{-1} = \frac{1}{|A|} \mathrm{adj}(A), \quad \text{where} \quad \mathrm{adj}(A) =
\begin{pmatrix}
A_{11} & A_{21} & \cdots & A_{n1} \\
A_{12} & A_{22} & \cdots & A_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
A_{1n} & A_{2n} & \cdots & A_{nn}
\end{pmatrix}
where Aij is the ijth cofactor defined above. Note that the order of subscripts in the adjoint
matrix adj(A) is the opposite of what you might expect. In practice, this formula is almost never
useful except in the special case of 2 × 2 matrices:
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
Provided the relevant inverses exist, the following rules hold:

AA−1 = A−1A = I
(A−1)−1 = A
(AB)−1 = B−1A−1
(A′)−1 = (A−1)′
(A + B)−1 = A−1(A−1 + B−1)−1B−1
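As a sanity check (a NumPy sketch, not from the original notes; the matrices are arbitrary invertible choices), we can confirm the 2 × 2 adjugate formula and the reversal rule for the inverse of a product:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[2.0, 0.0],
              [1.0, 1.0]])

# 2x2 inverse via the adjugate formula
a, b, c, d = A.ravel()
A_inv = (1.0 / (a * d - b * c)) * np.array([[d, -b],
                                            [-c, a]])
print(np.allclose(A_inv, np.linalg.inv(A)))  # True

# (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))  # True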
Consider the system of n linear equations in n unknowns

a_{11} x_1 + a_{12} x_2 + \ldots + a_{1n} x_n = b_1
a_{21} x_1 + a_{22} x_2 + \ldots + a_{2n} x_n = b_2
\vdots
a_{n1} x_1 + a_{n2} x_2 + \ldots + a_{nn} x_n = b_n

which we can write more compactly as Ax = b, where the matrix A and vector b are parameters and we want to find vectors x that solve this system of equations. The system has a unique solution if and only if |A| ≠ 0. In this case, the solution is given by Cramer's rule,

x_j = \frac{|A_j|}{|A|}, \quad j = 1, 2, \ldots, n

where Aj is the matrix formed by replacing the jth column of A with the vector b.
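The following NumPy sketch (illustrative, not part of the original notes) compares Cramer's rule with a direct solve for a small system:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

# Cramer's rule: replace the jth column of A with b
x = np.empty(2)
for j in range(2):
    Aj = A.copy()
    Aj[:, j] = b
    x[j] = np.linalg.det(Aj) / np.linalg.det(A)

print(np.allclose(x, np.linalg.solve(A, b)))  # True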
If b equals the vector of zeros 0, so we have Ax = 0, the system is homogeneous. A homogeneous system always has the trivial solution x = 0. It has nontrivial solutions if and only if |A| = 0.
2 Vectors
An n-vector is an ordered n-tuple of numbers. We can think of an n-vector as either a 1 × n matrix (a row vector) or as an n × 1 matrix (a column vector).
Let S be a set of n × 1 vectors. S is a vector space in n-dimensional space if

1. x1 + x2 ∈ S whenever x1, x2 ∈ S;
2. αx ∈ S whenever x ∈ S and α ∈ R.
Note that:

• If α = 0, we get the null vector 0 = (0, 0, ..., 0)′, so any vector space S contains the null vector;
• If x1, x2 ∈ S and α, β ∈ R, then αx1 + βx2 ∈ S. (In fact, this condition implies 1. and 2.)
We can give a more general definition of a vector space, in which the objects we call 'vectors' are not necessarily n × 1 arrays of numbers. A vector space V is a collection of objects called vectors, closed under two operations, addition of vectors and scalar multiplication, such that the following properties hold:
• v + w = w + v for all v, w ∈ V (commutativity);
• (u + v) + w = u + (v + w) for all u, v, w ∈ V (associativity);
• There exists a zero vector 0 ∈ V such that v + 0 = v for all v ∈ V ;
• For every vector v ∈ V there exists a vector w such that v + w = 0. We denote this 'additive inverse' vector by −v;
• 1v = v for all v ∈ V ;
• (αβ)v = α(βv) for all v ∈ V and all scalars α, β;
• α(v + w) = αv + αw for all v, w ∈ V and all scalars α;
• (α + β)v = αv + βv for all v ∈ V and all scalars α, β.
For example, the space Pn of all polynomials of degree at most n, consisting of all polynomials
of the form
p(t) = a0 + a1 t + . . . + an tn ,
is a vector space.
If S, T are vector spaces with S ⊂ T , then S is a vector subspace of T .
Let V be a vector space, and v1, v2, ..., vp ∈ V a collection of vectors. A linear combination of these vectors is a sum \sum_{k=1}^{p} \alpha_k v_k. A system of vectors v1, v2, ..., vp ∈ V is called a basis if any vector v ∈ V has a unique representation as a linear combination of these vectors.
For example, take the vector space V = Rn . Consider the vectors
e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad
e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}

Any vector v = (x1, x2, ..., xn)′ ∈ Rn has a unique representation

v = x1 e1 + x2 e2 + . . . + xn en

so e1, e2, ..., en is a basis for Rn, called the standard basis. If every vector in V can be represented as a linear combination of vectors v1, v2, ..., vp -
that is, if the linear span of these vectors is equal to the whole space V - then we say these vectors
span V , and we call them a spanning system (generating system ; complete system). Note
that every basis is a spanning system, but not every spanning system is a basis.
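To make the idea of coordinates in a basis concrete, here is a small NumPy sketch (not from the original notes; the basis is an arbitrary choice): finding the unique representation of a vector in a non-standard basis of R2 amounts to solving a linear system.

import numpy as np

# Columns of B are basis vectors for R^2 (they are linearly independent)
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([3.0, 4.0])

# Coordinates c satisfy B @ c = v; uniqueness follows from invertibility of B
c = np.linalg.solve(B, v)
print(c)                       # coordinates of v in this basis
print(np.allclose(B @ c, v))   # True: v = c1*b1 + c2*b2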
The dimension of a vector space V , denoted dim V , is the number of vectors in a basis of V .
(It can be shown that any basis of V has the same number of vectors.)
2.1 Inner product spaces

An inner product on a vector space S is a function ⟨·, ·⟩ : S × S → R such that for all x, y, z ∈ S and α ∈ R:

1. ⟨x, x⟩ ≥ 0, with ⟨x, x⟩ = 0 if and only if x = 0
2. ⟨x, y⟩ = ⟨y, x⟩
3. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
4. ⟨αx, y⟩ = α⟨x, y⟩
The most common example of an inner product, in a Euclidean space, is the dot product. The dot product of two vectors a = (a_1, a_2, ..., a_n), b = (b_1, b_2, ..., b_n) is defined as

a \cdot b = \sum_{i=1}^{n} a_i b_i

If a and b are regarded as column vectors, we can write the dot product as a′b. It is straightforward to verify that the dot product satisfies properties 1-4 above.
The Cauchy-Schwarz inequality states that if x and y are elements of a vector space S, and ⟨x, y⟩ is an inner product, then

⟨x, y⟩^2 ≤ ⟨x, x⟩ · ⟨y, y⟩
Two vectors a and b are orthogonal (we denote this by a ⊥ b) if their inner product is zero, a · b = 0.
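A quick numerical illustration of the Cauchy-Schwarz inequality and of orthogonality (a NumPy sketch, not part of the original notes; the vectors are arbitrary choices):

import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])

# (x . y)^2 <= (x . x)(y . y)
print((x @ y) ** 2 <= (x @ x) * (y @ y))  # True

# Orthogonal vectors have zero dot product
a = np.array([1.0, 1.0])
b = np.array([1.0, -1.0])
print(np.isclose(a @ b, 0.0))  # True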
2.2 Normed vector spaces
A normed vector space is a vector space S, together with a function (called a norm) ||·|| : S → R, such that for all x, y ∈ S and α ∈ R:

1. ||x|| ≥ 0, with ||x|| = 0 if and only if x = 0
2. ||αx|| = |α| ||x||
3. ||x + y|| ≤ ||x|| + ||y|| (the triangle inequality)

The standard example is the Euclidean norm on Rn,

||a|| = \sqrt{a \cdot a} = \sqrt{a_1^2 + a_2^2 + \ldots + a_n^2}
It is standard to view any normed vector space (S, || · ||) as a metric space where the metric is taken
to be ρ(x, y) = ||x − y||.
3 Linear Independence
The n vectors a1, a2, ..., an are linearly dependent if some nontrivial linear combination of these vectors equals zero; that is, if there exist numbers c1, c2, ..., cn, not all zero, such that
c1 a1 + c2 a2 + ... + cn an = 0
If this equation holds only in the case when c1 = c2 = ... = cn = 0, then the vectors are linearly
independent.
Equivalently, a set of vectors is linearly independent if none of them can be expressed as a linear combination of the others.
Consider the system Ax = b, which we can write as

x1 a1 + . . . + xn an = b
where aj is the jth column of A. We can prove that if the system has more than one solution, the
vectors a1 , ..., an are linearly dependent. Suppose the system has two solutions, u and v. Then
u1 a1 + . . . + un an = b and v1 a1 + . . . + vn an = b. Subtracting one from the other,

(u1 − v1)a1 + . . . + (un − vn)an = 0
If the two solutions are different, (u1 − v1 ), ..., (un − vn ) are not all equal to zero, and the vectors
are linearly dependent. Thus if the system has more than one solution, the vectors a1 , ..., an are
linearly dependent. Equivalently, if the vectors are linearly independent, the system has at most
one solution.
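The link between multiple solutions and linear dependence can be seen numerically (an illustrative NumPy sketch, not from the original notes): below, the columns of A are linearly dependent, and two different solutions of Ax = b differ by a vector that combines the columns to zero.

import numpy as np

# The third column is the sum of the first two, so the columns are dependent
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 1.0, 2.0])

u = np.array([1.0, 1.0, 0.0])   # one solution: a1 + a2 = b
v = np.array([0.0, 0.0, 1.0])   # another solution: a3 = b
print(np.allclose(A @ u, b), np.allclose(A @ v, b))  # True True

# The difference gives a nontrivial combination of the columns equal to zero
print(A @ (u - v))  # [0. 0. 0.]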
A set of vectors v1 , ..., vn in a vector space V is a basis iff it is linearly independent and spans
V.
4 Rank

The row rank of A is the dimension of the row space of A (the span of its row vectors). The column rank of A is the dimension of the column space of A (the span of its column vectors).
Theorem 4.1. The row rank and the column rank of a matrix A are equal. We call their value
the rank of A, r(A).
FMEA gives an alternative, equivalent definition: r(A) is the largest number of column vectors
in A that form a linearly independent set. This definition is equivalent because the maximum
number of linearly independent vectors in a set equals the dimension of the linear span of that set.
The rank of a matrix equals the rank of its transpose: r(A) = r(A0 ). It follows that the rank
of a matrix is less than or equal to the number of its rows or columns (whichever is smallest). A
matrix has full rank if its rank is equal to the number of rows or columns (whichever is smallest).
A minor of A of order k is obtained by deleting all but k rows and k columns, and taking the
determinant of the resulting k × k matrix.
The rank r(A) of a matrix A equals the order of the largest minor of A that does not equal
zero.
The rank of a matrix is not affected by elementary row or column operations. Elementary row (column) operations are:

• interchanging two rows (columns);
• multiplying a row (column) by a nonzero scalar;
• adding a multiple of one row (column) to another row (column).

One way to find the rank of a matrix is to perform row or column operations until the number of linearly independent row or column vectors is clear.
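In practice one can also compute the rank directly (a NumPy sketch, not from the original notes; the matrix is chosen so that the third row is the sum of the first two, making the rank 2 rather than 3):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 4.0]])  # row3 = row1 + row2

print(np.linalg.matrix_rank(A))           # 2
print(np.linalg.matrix_rank(A.T))         # 2: the rank of the transpose is the same
print(np.isclose(np.linalg.det(A), 0.0))  # True: no nonzero minor of order 3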
5 Linear Equation Systems

Theorem 5.1. Suppose x1 solves Ax = b. Then x solves Ax = b if and only if x = x1 + xh, where xh solves the homogeneous system Axh = 0.

Proof. Take x1, xh such that Ax1 = b, Axh = 0. Let x = x1 + xh. Then we have

Ax = A(x1 + xh) = Ax1 + Axh = b + 0 = b

Conversely, suppose Ax = b and let xh = x − x1. Then

Axh = A(x − x1) = Ax − Ax1 = b − b = 0

so any solution x can be written as x1 + xh, where xh solves the homogeneous system.
A corollary of this theorem is that, assuming Ax = b has some solution, this solution is unique
if and only if Ax = 0 has a unique solution (namely, x = 0).
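A numerical illustration of this solution structure (a NumPy sketch, with matrices chosen for the example):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # rank 1: the second row is twice the first
b = np.array([3.0, 6.0])

x1 = np.array([3.0, 0.0])    # a particular solution: A @ x1 = b
xh = np.array([-2.0, 1.0])   # solves the homogeneous system: A @ xh = 0

print(np.allclose(A @ x1, b))          # True
print(np.allclose(A @ xh, 0.0))        # True
print(np.allclose(A @ (x1 + xh), b))   # True: x1 + xh is another solution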
Consider again the system of m equations in n unknowns, Ax = b, where A is m × n. Let Ab denote the augmented matrix obtained by appending the column b to A.
Either r(Ab ) = r(A), or r(Ab ) = r(A) + 1. (In general, adding columns to a matrix can never
decrease the rank; and adding one column can increase the rank - the maximum number of linearly
independent columns - by at most one.)
Theorem 5.2. The system Ax = b has at least one solution if and only if the rank of A equals
the rank of Ab .
Corollary 5.3. If A is n × n and full rank, Ax = b has a unique solution. (We already knew that,
if this system had a solution, that solution would be unique; now, we know it does indeed have a
solution.)
Theorem 5.4. Suppose the system has solutions with r(A) = r(Ab ) = k.
• If k < m (the rank of these matrices is less than the number of equations) then m − k
equations are superfluous: if we choose any subsystem of equations corresponding to k linearly
independent rows, any solution to these equations also satisfies the remaining m−k equations.
• If k < n (the rank of these matrices is less than the number of unknowns) then there exist
n − k variables that can be freely chosen, with the values of the remaining k variables uniquely
determined by the choice of these n−k free variables. The system has n−k degrees of freedom.
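For instance (a NumPy sketch illustrating the second case; the system is an arbitrary choice): with m = 2 equations, n = 3 unknowns, and rank k = 2, there is n − k = 1 degree of freedom, and the free variable traces out a line of solutions.

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # 2 x 3, rank 2
b = np.array([2.0, 3.0])

# Choose the free variable x3 = t; then x1 = 2 - t and x2 = 3 - t
for t in [0.0, 1.0, -2.5]:
    x = np.array([2.0 - t, 3.0 - t, t])
    print(np.allclose(A @ x, b))  # True for every choice of t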