Combined Notes

Lecture 0

A Brief Revision of Linear Algebra

In this section we will revise some of the basic operations of linear algebra. Most of these operations
should be familiar to you from PHYS 10071. However, we will also introduce some new concepts that
will be useful in the rest of the course, for example working with n × m matrices. Make sure that you
are comfortable with these concepts and are well practiced with them as they will be used throughout
the course.
By the end of this lecture you should be able to:

• Perform basic operations on vectors and matrices, e.g. matrix multiplication and addition.

• Calculate the determinant of a matrix.

• Compute the transpose, trace and inverse of a matrix.

• Apply matrix algebra to tackle problems in physics.

0.1 Matrix algebra


Let's start by defining what a matrix is:

Definition 0.1: Matrix


A matrix is a rectangular array of numbers. For example, the following is an n × m matrix:

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix} ,    (0.1)

where a_{ij} is the element in the ith row and jth column, and can in general be complex.

A vector can be represented in terms of an n × 1 matrix,

v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} ,    (0.2)

and is often referred to as a column vector.


0.1.1 Mathematical operations on matrices


Many of the operations that we will perform on matrices are similar to those for numbers but with a
few subtle differences. In this section we will recap a few of the basic operations that will be relevant
to this course; make sure to combine reading these notes with working through the example sheets.

Addition and subtraction:


Matrices can be added together element by element if they have the same shape,

C = A + B  ⇒  c_{ij} = a_{ij} + b_{ij} ,    (0.3)

or in matrix notation, if A and B are n × m matrices, then we have:


\begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1m} \\ c_{21} & c_{22} & \cdots & c_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nm} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1m}+b_{1m} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2m}+b_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1}+b_{n1} & a_{n2}+b_{n2} & \cdots & a_{nm}+b_{nm} \end{pmatrix} .    (0.4)

Addition and subtraction of matrices is commutative and associative,

A + B = B + A, (commutative) (0.5)
(A + B) + C = A + (B + C), (associative). (0.6)

Scalar multiplication:
When multiplied by a scalar λ, each element of a matrix is multiplied by that scalar, such that:

C = λA  ⇒  c_{ij} = λ a_{ij} .    (0.7)

Multiplying a matrix by a scalar is commutative and associative,

λA = Aλ ,  (commutative),    (0.8)

(λμ)A = λ(μA) ,  (associative).    (0.9)

Matrix multiplication:
The biggest difference between matrix algebra and regular algebra is found in matrix multiplication,
which is defined as follows: For an n × m matrix A and an m × p matrix B,

C = AB  ⇒  c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj} ,    (0.10)

where C is an n × p matrix. Each element of the matrix C is given by the sum of the products of the
rows of A with the columns of B.
An important distinction to make is that matrix multiplication is not commutative, that is, AB ≠ BA in general. It is, however, distributive and associative,

A(B + C) = AB + AC, (distributive) (0.11)


(AB)C = A(BC), (associative) (0.12)

As an example, consider the following 2 × 2 matrices,

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} , \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} .    (0.13)

The product of these two matrices is given by,

C = AB = \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} \end{pmatrix} .    (0.14)
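If you want to check a product like this numerically, the following is a minimal sketch assuming NumPy is available (the numbers are placeholders of my own, not taken from the notes); it compares the element-by-element rule of Eq. (0.10) against NumPy's built-in matrix product.

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Element-by-element rule of Eq. (0.10): c_ij = sum_k a_ik b_kj
C_manual = np.array([[A[0, 0]*B[0, 0] + A[0, 1]*B[1, 0], A[0, 0]*B[0, 1] + A[0, 1]*B[1, 1]],
                     [A[1, 0]*B[0, 0] + A[1, 1]*B[1, 0], A[1, 0]*B[0, 1] + A[1, 1]*B[1, 1]]])

print(np.allclose(C_manual, A @ B))  # True: both give the same product
print(np.allclose(A @ B, B @ A))     # False: matrix multiplication is not commutative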

You should try to follow this through yourself, and practice with the following exercises:
Exercise 0.1:
Consider the following matrices,

A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} , \qquad B = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} , \qquad C = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} .    (0.15)

Find the following products:

(i) AB

(ii) BA

(iii) ABC

(iv) CBA

Exercise 0.2:
The Pauli matrices may be represented as follows:

σ_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \qquad σ_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \qquad σ_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} .    (0.16)

Prove the following commutator identity,

[σ_x, σ_y] = 2i σ_z ,    (0.17)

where [A, B] = AB − BA is the commutator of two matrices. The anti-commutator of two
matrices is defined as {A, B} = AB + BA; find the anti-commutator of the Pauli matrices σ_x
and σ_y.

0.1.2 Special operations on matrices


With the basic operations of matrix algebra defined, we can now consider a few special operations that
are useful in physics, and will crop up in our study of vector spaces:

Transpose:
The transpose of a matrix A is defined as the matrix A^T such that [A^T]_{ij} = a_{ji}, that is, the rows of
A become the columns of A^T and vice versa. It has an interesting property on a product of matrices.
For example, consider the product of two matrices A and B, C = AB. Taking the transpose of C we
find,

C^T = (AB)^T = B^T A^T  ⇒  [C^T]_{ij} = \sum_{k=1}^{m} a_{jk} b_{ki} .    (0.18)

Exercise 0.3:

Let A and B be n × n matrices, and C = AB. Prove that C^T = B^T A^T.



The Adjoint or Hermitian conjugate:


A very important operation that we will encounter in this course is the adjoint or Hermitian conjugate
of a matrix A, which we denote as A†. You have come across this operation in a variety of different
contexts, but most notably in quantum mechanics, where you applied the Hermitian conjugate to
wavefunctions and differential operators.
In the case of a complex matrix A, the adjoint is defined as the conjugate transpose,

A^† = \overline{(A^T)} ,    (0.19)

where we denote the complex conjugate by a bar, \bar{(\,\cdot\,)}. A matrix is said to be Hermitian if A† = A.

This operation probably seems quite different to the Hermitian conjugate you used in your quantum
mechanics course last semester; however, in this course we will demonstrate that it is precisely the
same operation.

Trace:
The trace of a matrix A is defined as the sum of the diagonal elements of A, that is,

Tr(A) = \sum_{i=1}^{n} a_{ii} .    (0.20)

A crucial property of the trace is that it is invariant under cyclic permutations, that is, Tr(ABC) =
Tr(BCA) = Tr(CAB). If you plan to study quantum mechanics in the future, this property comes
up repeatedly. Try your hand at proving this property for yourself.

Exercise 0.4:
Prove that the trace of a matrix is invariant under cyclic permutations,

Tr(ABC) = Tr(BCA) = Tr(CAB). (0.21)

It is also important to note that the trace of a matrix is equal to the sum of its eigenvalues, which will
come in handy later in the course.

Determinant:
The determinant of a matrix A is defined as,

det(A) = \sum_{σ \in S_n} \mathrm{sgn}(σ) \prod_{i=1}^{n} a_{iσ(i)} ,    (0.22)

where S_n is the set of all permutations of the set {1, 2, . . . , n}. The above general case is a bit
complicated, so let's consider the case of a general 3 × 3 matrix (which is the largest matrix I will ask you
to find a determinant for),

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} .    (0.23)

The determinant of A is then given by,

det(A) = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + a_{12} \begin{vmatrix} a_{23} & a_{21} \\ a_{33} & a_{31} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
       = a_{11}(a_{22} a_{33} - a_{23} a_{32}) + a_{12}(a_{23} a_{31} - a_{21} a_{33}) + a_{13}(a_{21} a_{32} - a_{22} a_{31}) .    (0.24)

Matrix inverse:
For a general matrix A, we can define the inverse matrix as the matrix A^{-1} such that AA^{-1} = 1.

It is important to note that not all matrices have an inverse, for example, the zero matrix 0 does not
have an inverse. If a matrix does have an inverse, then it is unique. The inverse of a matrix can be
found by using the formula,
A^{-1} = \frac{1}{\det(A)} C^T ,    (0.25)

where C^T is the transpose of the matrix of cofactors C corresponding to A. I don't want to go into
any detail on how to calculate the inverse with this method; later in the course we will learn how to
do this with eigenvalues and the spectral decomposition of a matrix. However, it is important to note
that if det(A) = 0, then the above formula blows up, and thus A is not invertible.
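As a quick numerical sanity check of Eqs. (0.24) and (0.25), here is a minimal NumPy sketch (my own illustration, using an arbitrarily chosen matrix, not an example from the notes):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Cofactor expansion along the first row, exactly as written in Eq. (0.24)
det_manual = (A[0, 0]*(A[1, 1]*A[2, 2] - A[1, 2]*A[2, 1])
              + A[0, 1]*(A[1, 2]*A[2, 0] - A[1, 0]*A[2, 2])
              + A[0, 2]*(A[1, 0]*A[2, 1] - A[1, 1]*A[2, 0]))

print(np.isclose(det_manual, np.linalg.det(A)))      # True: both give det(A) = 8
print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))  # True: A A^{-1} = 1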

0.1.3 Matrix algebra in Physics: The Lorentz transformation


As an example of how matrix algebra can be used in physics, we will consider the Lorentz transformation
in special relativity. In special relativity, the Lorentz transformation is a linear transformation that
acts on a given 4-vector. Consider the case of an event in spacetime; in the inertial frame S we can
represent this event as a 4-vector,

x = \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} ,    (0.26)

where c is the speed of light and t is time. If we want to transform to another inertial frame S',
moving at a velocity v in the x-direction relative to S, then we can apply the Lorentz transformation
to the 4-vector x to obtain x' = Λ(β)x,

\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma\beta & 0 & 0 \\ -\gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} ct \\ x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \gamma(ct - \beta x) \\ \gamma(x - \beta ct) \\ y \\ z \end{pmatrix} ,    (0.27)

where β = v/c and γ = 1/\sqrt{1 - β²}. To reverse this mapping between S' and S, we can notice that
the inertial frame S is moving at a velocity −v relative to an observer in S'. Therefore, we can apply
a second Lorentz transformation, this time with β → −β, to obtain:

x = Λ(−β)x' .    (0.28)

Now if we substitute Eq. 0.27 into the above equation, we find,

x = Λ(−β)Λ(β)x .    (0.29)

The above expression gives us a natural definition for the identity matrix, Λ(−β)Λ(β) = Λ(β)Λ(−β) =
1, where 1 is the identity matrix. We say that Λ(−β) is the matrix inverse of Λ(β) and vice versa.
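A short NumPy sketch (my own check, assuming NumPy is available) makes this inverse property concrete by building Λ(β) and verifying that Λ(−β)Λ(β) = 1:

import numpy as np

def boost_x(beta):
    # Lorentz boost along x with speed beta = v/c, acting on (ct, x, y, z)
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    return np.array([[ gamma,      -gamma*beta, 0.0, 0.0],
                     [-gamma*beta,  gamma,      0.0, 0.0],
                     [ 0.0,         0.0,        1.0, 0.0],
                     [ 0.0,         0.0,        0.0, 1.0]])

beta = 0.6
L, L_inv = boost_x(beta), boost_x(-beta)
print(np.allclose(L_inv @ L, np.eye(4)))  # True: Λ(-β) is the matrix inverse of Λ(β)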

Exercise 0.5:
Two spacetime events x_1 and x_2 occur in an inertial frame S, where

x_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} , \qquad x_2 = \begin{pmatrix} ct \\ u\cos(α) \\ u\sin(α) \\ 0 \end{pmatrix} .    (0.30)

Using matrix algebra, find the corresponding spacetime coordinates in a frame S', which is
moving at a velocity v_x in the x-direction and v_y in the y-direction relative to S. The Lorentz
transformation for the above motion is,

x' = \begin{pmatrix} \gamma & -\gamma\beta_x & -\gamma\beta_y & 0 \\ -\gamma\beta_x & 1 + (\gamma - 1)\frac{\beta_x^2}{\beta^2} & (\gamma - 1)\frac{\beta_x \beta_y}{\beta^2} & 0 \\ -\gamma\beta_y & (\gamma - 1)\frac{\beta_x \beta_y}{\beta^2} & 1 + (\gamma - 1)\frac{\beta_y^2}{\beta^2} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} x ,    (0.31)

where β_x = v_x/c, β_y = v_y/c and β² = β_x² + β_y².

0.2 Eigenvalues and eigenvectors


As a final point, we are going to recap the concept of eigenvalues and eigenvectors of a matrix. We will
return to this topic later in the course and build some conceptual understanding of what these objects
are. However, it is crucial that you are able to:

1. Calculate the eigenvalues and eigenvectors of a matrix.

2. Normalise the eigenvectors of a matrix.

Your life for the next 2 years will be much easier if you are able to do this. From this point onwards,
eigenvalue problems will crop up time and again. Indeed, if you are able to diagonalise a matrix then
you can solve any linear system of equations, differential or otherwise.

Definition 0.2: Eigenvalues and eigenvectors

The eigenvalues λ of a matrix A are the values that satisfy the linear equation,

A v = λ v ,    (0.32)

where v is the corresponding eigenvector. To find the eigenvalues of an n × n matrix, we must
solve the characteristic equation,

det(A − λ1) = 0 ,    (0.33)

where 1 is the identity matrix.

The procedure for calculating eigenvalues is most easily understood through examples. Consider
the 2 × 2 matrix,

σ_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} .    (0.34)

This matrix is one of the so-called Pauli matrices, which will come up time and again in this course,

and are used extensively in quantum theory. The characteristic equation for this matrix is given by,

det(σ_x − λ1) = det\begin{pmatrix} -λ & 1 \\ 1 & -λ \end{pmatrix} = λ² − 1 = 0 ,    (0.35)

which has the solutions λ = ±1. The corresponding eigenvectors are then found by solving the equations,

\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = ± \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} ,    (0.36)

which yields v_2 = ±v_1, therefore the eigenvectors are:

v_± = v_1 \begin{pmatrix} 1 \\ ±1 \end{pmatrix} .    (0.37)

Note that the eigenvectors are not unique: the prefactor v_1 can be any complex number and v_± will still
be an eigenvector of σ_x. This is a general feature of eigenvectors, they are only defined up to a
multiplicative constant. However, we quite often place constraints on eigenvectors based on physical
intuition. For example, in quantum mechanics we require eigenvectors to be normalised in order to be
valid states of a quantum system (i.e. the eigenvector corresponds to a wavefunction which has to be
normalised). The normalisation condition for an eigenvector v_± is given by the dot product of the vector
with itself,

⟨v_±|v_±⟩ = 1 ,    (0.38)

where we have used the notation ⟨·|·⟩ to denote the dot product of two vectors, where the left hand
side is conjugated. Or to put it mathematically,

⟨v_±|v_±⟩ ≡ \bar{v}_± · v_± ,

where the bar over the top of a vector or variable denotes a complex conjugate, \bar{a} ≡ a^*. This change
in notation may seem strange, but I promise, the reason for this will become clear. Using the above
expression, we find v_1 = 1/\sqrt{2}, and so the eigenvectors of the matrix σ_x are given by:

v_± = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ ±1 \end{pmatrix} .    (0.39)
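If you want to check a calculation like this numerically, here is a minimal sketch assuming NumPy is available (my own illustration, not part of the notes); np.linalg.eigh returns the eigenvalues of a Hermitian matrix together with normalised eigenvectors as the columns of its second output:

import numpy as np

sigma_x = np.array([[0.0, 1.0],
                    [1.0, 0.0]])

vals, vecs = np.linalg.eigh(sigma_x)   # eigh is for Hermitian (symmetric) matrices
print(vals)                            # [-1.  1.]
print(vecs)                            # columns are normalised eigenvectors, (1, ∓1)/sqrt(2) up to a sign
print(np.allclose(sigma_x @ vecs[:, 1], vals[1] * vecs[:, 1]))  # True: the eigenvalue equation holds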

Exercise 0.6:
Find the eigenvalues and normalised eigenvectors of the following matrices:

(i) σ_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}

(ii) σ_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

Note that in part (i) you will need to use the complex conjugate when normalising the eigenvectors.

For good measure, let’s try another example, but with a more complicated 3 × 3 matrix,

S = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} .    (0.40)

The characteristic equation for this matrix is given by,

det(S − λ1) = det\begin{pmatrix} -λ & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -λ & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -λ \end{pmatrix} = -λ(λ² − 1) = 0 ,    (0.41)

which has the solutions λ = 0, ±1. We can now find the corresponding eigenvectors by solving the
equations,

\frac{1}{\sqrt{2}} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = λ \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} .    (0.42)

Here I will show you how to find the eigenvector for λ = +1, and leave the others as an exercise. For
λ = +1, by carrying out the matrix multiplication we find the following equations,

v_2 = \sqrt{2}\, v_1 ,    (0.43)
v_1 + v_3 = \sqrt{2}\, v_2 ,    (0.44)
v_2 = \sqrt{2}\, v_3 .    (0.45)

Following the process of elimination used for the 2 × 2 case, we write the eigenvector in terms of only
v_1, finding that v_1 = v_3 and v_2 = \sqrt{2}\, v_1. Therefore, the eigenvector is given by,

v_+ = v_1 \begin{pmatrix} 1 \\ \sqrt{2} \\ 1 \end{pmatrix} .    (0.46)

Normalising this eigenvector, we find that v_1 = 1/\sqrt{4} = 1/2, and so the normalised eigenvector is given
by,

v_+ = \frac{1}{2} \begin{pmatrix} 1 \\ \sqrt{2} \\ 1 \end{pmatrix} .    (0.47)
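The same numerical check works for the 3 × 3 example; the sketch below (my own, assuming NumPy) confirms the eigenvalues 0, ±1 and the normalised λ = +1 eigenvector of Eq. (0.47):

import numpy as np

S = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]]) / np.sqrt(2)

vals, vecs = np.linalg.eigh(S)
print(np.round(vals, 10))       # [-1.  0.  1.]
print(np.round(vecs[:, 2], 6))  # proportional to (1, sqrt(2), 1)/2, possibly up to an overall sign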

Exercise 0.7:
Find the normalised eigenvectors associated to the λ = 0, −1 eigenvalues of the matrix,

S = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} .    (0.48)

0.3 Further reading


If you want to read more about the topics covered in this lecture, I would recommend the following
resources:

• For matrix algebra, Chapter 8, sections 3 and 4 in Riley, Hobson, and Bence.

• For matrix operations such as the trace and determinant, section 8.8 and 8.9 in Riley, Hobson,
and Bence.

• For eigenvalues and eigenvectors, section 8.13 in Riley, Hobson, and Bence.

There are additional exercises at the end of this chapter to practice if you need to.
Lecture 1
Vector Spaces

Welcome to PHYS20672. This course is all about vector spaces, complex numbers, and linear algebra.
It is a course that is both abstract and practical, and will form the foundation of your studies in physics
and mathematics. Particularly if you have ambitions to study quantum mechanics, the language of
vector spaces is essential.

By the end of this lecture you should be able to:

• Define a vector space in abstract algebraic terms.

• Determine whether a set of objects form a vector space.

1.1 Housekeeping
A few things to note before we begin:

• This course will start with vector spaces before moving on to complex analysis.

• Dr Jake Iles-Smith will be teaching the first half of the course, his office number is 7.26 Schuster.

• Dr Mike Godfrey will be teaching the second half of the course, his office number is 7.19 Schuster.

• Lectures will be Mondays 11-12, and Thursdays 13-14.

• Jake will be available for questions for the hour after the Monday and Thursday lecture.

• There are 10 problem sheets, including one sheet of revision surrounding linear algebra. You
should work through these sheets in your own time.

• Solutions to problem sheets will be posted on blackboard the week after the problem sheets have
been released.

• There is a Piazza message board for this course. Please engage with this and ask questions, but
as always, please keep things constructive.

• A change from previous years is that exam questions will be mixed! You will not be able to avoid
questions on vector spaces, so please keep abreast of this part of the course!

• The exam will cover definitions, proofs, as well as the application of knowledge covered in the
lectures. If it has a brightly coloured box around it, it’s probably important!


For the vector spaces portion of the course I am trialling two things this year: 1) Some new lecture
notes– there will be typos, please let me know if you find any. 2) I will be recording a few videos to go
through examples in more detail. Generally I will also repeat these examples in the lecture notes for
those who prefer. You can post feedback on the Piazza.
Disclaimer: I will assume that you are comfortable with basic operations in linear alge-
bra, such as: matrix multiplication; trace, transpose, and determinant of a matrix; and
eigenvalue problems. If you can’t remember how to do these operations, then I have
placed some revision material in the blackboard folder for week 0. Please practice these
skills!

1.1.1 Suggested reading


For the first part of the course on vector spaces, two suggested texts are

• K. F. Riley, M. P. Hobson and S. J. Bence, Mathematical Methods for Physics and Engineering,
3rd ed. (Cambridge, 2006)

• R. Shankar, Principles of Quantum Mechanics, 2nd ed. (Springer, 1994). [This is also the
recommended book for PHYS30201 Mathematical Fundamentals of Quantum Mechanics next
year.]

Riley et al give a nice introduction to vector spaces and their algebra from a general perspective, but
do not discuss function spaces. Shankar is pitched more towards the elements of linear algebra required
to study quantum mechanics. From the outset, Shankar uses Dirac notation, which we will cover in
lecture 2 and 3. eBooks for Riley and Shankar can be found through the library.
The primary recommended textbook for the second part of the course is the "Schaum’s Outline"
by Spiegel et al. The lectures follow this book fairly closely. The weekly outlines contain detailed
references to sections from it. It’s not written in a standard textbook style – it’s more like lecture
notes. The book has many examples for practice (as well as the "640 fully solved problems" advertised
on the cover). It’s been the recommended book to go to for this subject for many years. (I remember
using it as an undergraduate.)

• M. R. Spiegel, S. Lipschutz, J. Schiller and D. Spellman, Complex Variables (Schaum’s Outlines),


2nd ed. (McGraw-Hill, 2009)

If you are interested in more advanced aspects we touch upon in this course, then you can have a look
at

• Altland and von Delft, Mathematics for Physicists: Introductory Concepts and Methods, (Cam-
bridge, 2019)

This is a slightly more advanced book, but treats the mathematical concepts discussed in this course
carefully and in a rigorous manner.

1.2 What’s a vector?


Vectors and vector spaces are both concepts that you have encountered before, albeit often steeped
in physical context. For example, in your courses on dynamics you have made extensive use of three
dimensional vectors of the form
v = vx î + vy ĵ + vz k̂. (1.1)
This vector consists of three real numbers defining the projection in the x, y, and z directions,
and lives in a vector space termed Euclidean space. These vectors can be added and subtracted, they
have both length (or equivalently magnitude) and direction, all of which have a clear geometrical
interpretation in three-dimensional space.
In this course, we will consider a much broader class of vector spaces. In general our new vector
will not necessarily be real, can in general be N -dimensional, and may not even look like a vector at
all! For example, as we shall see, the set of functions f (x) can be considered as vectors in a vector
space, and as such we can apply all the tricks of linear algebra. In any case, we will quickly find that
our tried and tested geometrical interpretation of vectors breaks down.
While this may sound somewhat abstract, higher-dimensional vector spaces are the foundation of
modern physics. In fact, you have already come across numerous examples in your studies so far:
relativistic mechanics is described in terms of 4-dimensional vectors which live in a 4D vector space;
though we hide this fact from you in your second year quantum mechanics course, the wavefunction
of a particle can be expressed in terms of a vector in a high-dimensional space1 ; even solutions to
differential equations can be phrased in terms of vector spaces.
This section of the course will equip you with the tools necessary to phrase physical problems in
terms of vector spaces, and use the powerful tools of linear algebra to solve them.

¹ In most cases you have encountered, these are actually infinite dimensional spaces!

1.3 Abstract Vector spaces


To begin with, we need some mathematical jargon to fully describe vector spaces; let's start by defining
what we mean by a field:

Definition 1.1: Field


A field F is a set with two operations defined on it: addition and multiplication.

A field in mathematics is simply a set of numbers which has addition and multiplication defined
upon it. For example, the set of real numbers F = R is a field, as is the set of complex numbers F = C.
These are the two fields that we will be working with in this course.
We can now define precisely what we mean by a ‘vector space’:
Definition 1.1: Vector space

Let F be a field. A vector space V over F is a set of objects, i.e. vectors, u, v, w, · · · , that
satisfy the following properties:

(i) Addition: The set is closed under addition, such that if u, v ∈ V, then w = u + v ∈ V.
In other words, if two vectors in a vector space are added together, the resulting object is also
a vector within the same space. This addition operation is commutative:

a + b = b + a ,    (1.2)

and associative:

(a + b) + c = a + (b + c) .    (1.3)

(ii) Scalar Multiplication: The set is closed under multiplication by a scalar (i.e. any complex
number), so that for u ∈ V, then λu ∈ V for λ ∈ F, where F is the field over which the vector
space is defined. Scalar multiplication is associative:

λ(μu) = (λμ)u  for λ, μ ∈ F,    (1.4)

as well as distributive:

(λ + μ)u = λu + μu  for λ, μ ∈ F.    (1.5)

Multiplication by unity leaves a vector unchanged: 1 × u = u.

(iii) Null vector: There exists a null vector 0 defined such that u + 0 = u.

(iv) Negative vector: All vectors have a corresponding negative vector −u, which satisfies

u + (−u) = 0 .    (1.6)

There are two things to note about the above definitions. First is that subtraction is only defined
indirectly, through the definition of the negative vector, i.e. we have

u − v = u + (−v) .    (1.7)

Second, there is no definition of length. This is curious as no doubt you have heard the phrase ‘vectors
have both direction and magnitude’ countless times. As defined above, the notion of vectors is much
more general than this statement suggests. Let’s look at some examples of vector spaces.

The first example is the real vectors in three dimensions that we have already encountered. This
is typically denoted as R3 , where the superscript gives the dimensionality of the vector space, and the
R is the field over which the vector space is defined. We can check whether the above definition of a
vector space holds for R3 .

(i) First, if we add two vectors together u, v ∈ R³, then the result is still a vector in R³. So R³ is
closed under addition.

(ii) Second, if we multiply a vector in R³ by a real scalar, then the result is still a vector in R³,
therefore it is closed under scalar multiplication.

(iii) The null vector is simply the vector with all components equal to zero.

(iv) The negative vector is simply the vector with all components multiplied by −1.

Therefore, R3 is a vector space. Phew!


Note that we can make exactly the same arguments for RN , where N is any positive integer. In
fact, we can make the same arguments for CN , where C is the set of complex numbers. In this case,
the vectors are a collection of N complex numbers.
It is worth noting that we have to be careful when considering the field over which the vector space
is defined. For example, if we multiply a vector v ∈ R^N by a complex number λ ∈ C, then the result
is no longer a vector in R^N. The converse is not true, however, as multiplying a vector in C^N by a
real number still gives a vector in C^N. This is because R ⊂ C, or in other words, the real numbers are a
subset of the complex numbers.
Try and apply the above definition to the following examples:

Exercise 1.1
Consider the set of real 2 × 2 matrices of the form,

a = \begin{pmatrix} α & β \\ γ & δ \end{pmatrix} ,

where α, β, γ, δ ∈ R. Show that these matrices form a vector space over the real numbers.

Exercise 1.2
Show that the set of functions f (x), for x 2 [0, L], form a vector space.

1.4 Further reading


Questions 1-4 in problem sheet 1 cover the material in this lecture. Supporting material for these
lectures can be found in textbooks:

• Section 8.1 in Riley, Hobson and Bence. This is all written in vector notation as opposed to
Dirac notation covered in lecture 2 this week.

• Section 1.1-1.4 Shankar. Shankar uses Dirac notation from the outset, so it might be worth
waiting until the end of Lecture 3 before turning to this reference.
Lecture 2
Linear dependence and basis vectors

By the end of this lecture you should be able to:

• Define linear independence, and check whether a set of vectors are linearly independent.

• Introduce the notion of dimensionality.

• Define a basis vector, and show that any vector can be written as a linear combination of basis
vectors.

• Define and construct a subspace from a given basis.

2.1 Linear independence


Now that we have our space defined, the next important property of vector spaces we must consider
is linear independence. If two vectors are linearly independent, then neither can be written as a linear
combination of the other. This concept is best illustrated with an example: Consider the vectors,
u = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} 1 \\ -1 \end{pmatrix} .    (2.1)

These vectors are linearly independent if the only solution to the equation αu + βv = 0 is α = β = 0.
Substituting the vectors in, we obtain two equations,

α + β = 0 ,
α − β = 0 .

We therefore find that α = −β and α = β, a contradiction unless α = β = 0.

Let's add a third vector now,

w = \begin{pmatrix} 3 \\ 2 \end{pmatrix} .    (2.2)

For this set of vectors all to be linearly independent, we require αu + βv + γw = 0, or in other words,
to satisfy the equations,

α + β + 3γ = 0 ,
α − β + 2γ = 0 .

Solving this, we find α = −5γ/2 and β = −γ/2. These equations are underdetermined, and therefore there
are an infinite number of solutions parameterised by γ. More formally, we define linear independence
as follows:


Definition 2.1: Linear independence

A set of vectors {u_i for i = 1, 2, · · · , n} is linearly independent if the equation,

\sum_{j=1}^{n} λ_j u_j = 0 ,    (2.3)

has only one solution: λ_i = 0 ∀ i. Otherwise, they are said to be linearly dependent.
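A quick way to test linear independence numerically is to check the rank of the matrix whose columns are the candidate vectors; the sketch below (my own illustration, assuming NumPy) applies this to the vectors from the example above:

import numpy as np

u = np.array([1.0, 1.0])
v = np.array([1.0, -1.0])
w = np.array([3.0, 2.0])

# The vectors are linearly independent iff the column matrix has full column rank
print(np.linalg.matrix_rank(np.column_stack([u, v])))     # 2: u and v are independent
print(np.linalg.matrix_rank(np.column_stack([u, v, w])))  # 2 < 3: u, v, w are dependent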

2.2 Postulate of dimensionality and Basis vectors


So far, we have thrown around the term dimension with no proper definition. So what precisely do we
mean by the dimension of a vector space?

Definition 2.2: Dimensionality

A vector space V has dimension N if it can accommodate no more than N linearly independent
vectors u_i.

We often denote such a vector space by V N (R) if the vector space is real, and V N (C) if it is complex,
or just V N if we want to be vague!
Notice that we have already seen an example of this definition of dimensionality in action. Consider
the example 2-component vectors we used to explain linear dependence. In this case, we had the two
linearly independent vectors u and v, but when we added the third vector w, they became linearly
dependent. This is because the vector space we are considering here is 2-dimensional, so it can only
accommodate 2 linearly independent vectors. So regardless of our choice of w, the vectors would have to be
linearly dependent!
In reading around the suggested texts, you may also come across the term span. This is a term
that is intimately linked to concepts of linear independence and dimensionality. So let me define it
here for good measure:

Definition 2.3: Span

The span of a set of vectors {ui for i = 1, 2, · · · , n} is the set of all vectors that can be written
as a linear combination of the ui .

So in our previous example, the span of {u, v} is the set of all vectors that can be written as a
linear combination of u and v. In this case, we found that upon adding a third vector w, we could
write w as a linear combination of u and v, we say that w is in the span of u and v. In fact we can
go further, and write any arbitrary vector in R2 as a linear combination of u and v. Therefore, we can
say that R2 is spanned by {u, v}.
Chapter 2. Linear dependence and basis vectors 21

The above definitions naturally lead us to the following theorem:

Theorem 2.1:

In an N -dimensional vector space V N , any vector u can be written as a linear combination of


linearly independent basis vectors ej .

Proof. This follows from the definition of linear independence and dimensionality: Since there
are no more than N linearly independent vectors, there must be a relation of the form

\sum_{i=1}^{N} λ_i e_i + λ_0 u = 0 ,    (2.4)

where u ∈ V^N is an arbitrary vector, and not all λ_i are zero. In particular, the linear
independence of the e_i requires λ_0 ≠ 0. We therefore have:

u = -\frac{1}{λ_0} \sum_{i=1}^{N} λ_i e_i = \sum_{i=1}^{N} u_i e_i ,    (2.5)

with u_i = −λ_i/λ_0.

The above theorem is a very important result, and we will use it extensively throughout the course.
Its main implication is that it allows us to define a basis for a vector space:

Definition 2.2: Basis

Any set of N linearly independent vectors in V^N is called a basis, and they are said to span V^N,
or synonymously, they are complete. This allows us to write any vector v ∈ V^N as,

v = \sum_{i=1}^{N} v_i e_i ,    (2.6)

where the set {e_j}_{j=1}^{N} is a complete basis.

You have already come across basis vectors in previous courses, for example for R3 , one choice of
basis vectors are the cartesian unit vectors, e1 = i, e2 = j, and e3 = k. Notice that I wrote one choice
of basis vectors, it is important to note that the choice of ei is not unique. For example, a vector in
cartesian coordinates will remain unchanged by a rotation of the coordinate system, with the rotation
impacting only the definition of the unit vectors. Or equally we could represent our vector in terms
of spherical coordinates, in which case the basis vectors would be different again. All are completely
valid representations of the same vector. We will return to this point later in the course.

Exercise 2.4

If {e_i} is a basis of V^N, prove that for any u ∈ V^N, the coefficients u_i in the expansion
u = \sum_{i=1}^{N} u_i e_i are unique.

Exercise 2.5
For the set of real 2 × 2 matrices, show that

e_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} , \quad e_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} , \quad e_3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} , \quad e_4 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} ,    (2.7)

are linearly independent, and can therefore be used as a basis.

2.3 Linear Subspaces


We have already seen that a vector space V N can be spanned by a set of N linearly independent
vectors. However, we can also consider a vector space spanned by a set of M < N linearly independent
vectors. This is called a subspace of V N , and is denoted by V M . A subspace V M is a vector space in
its own right, and therefore must satisfy the following properties:

1. It must contain the zero vector 0.

2. It must be closed under addition and scalar multiplication.

For example, let's consider the vector space R³; this is the set of numbers (x, y, z), where x, y, z ∈ R.
A subspace of R³ is the set of vectors (x, y, 0), where x, y ∈ R, which define the xy-plane in R³. We can
check that these vectors satisfy the requirements of a subspace:

• For two vectors u = (x1 , y1 , 0) and v = (x2 , y2 , 0), we have u + v = (x1 + x2 , y1 + y2 , 0), which
is also in the subspace. Therefore it is closed under addition.

• For a vector u = (x, y, 0), and a scalar λ ∈ R, we have λu = (λx, λy, 0), which is also in the
subspace. Therefore it is closed under scalar multiplication.

• The zero vector is (0, 0, 0), which is in the subspace.

So our set of vectors is indeed a subspace of R3 .


This example is a special case of a more general result:

Theorem 2.2:

Any set of M linearly independent vectors {e_i}_{i=1}^{M} in V^N spans a subspace V^M of V^N.

It is also constructive to think about a counter example. For example, the set of vectors H =
{(x, y) : x² + y² ≤ 1}, i.e. the vectors that lie within a circle of radius 1, is not a subspace of R².
To see this consider an arbitrary point within the unit circle, say (x₁, y₁), and some scalar λ. We can
easily choose a λ such that λx₁ or λy₁ > 1, thus lying outside of the unit circle. Therefore, the set is
not closed under scalar multiplication, so it is not a subspace of R².

2.4 Further reading


Problems 3 and 6 on sheet 1 cover material from this lecture. For more information on this lecture's
material, see:

• Section 8.1 of Riley, Hobson, and Bence.

• 3blue1brown is a great youtube channel for visualising linear algebra concepts. Check out the
video on linear combinations and basis vectors, https://www.youtube.com/watch?v=k7RM-ot2NWY.
Lecture 3
Inner Product Spaces & Dirac notation

Up to now, we have established a framework for describing vectors in an arbitrary vector space.
Missing from this rather abstract construction is the notion of direction and magnitude that, prior to
this course, underpinned our understanding of what a vector is. In Euclidean space both the magnitude
of a vector and its direction (relative to another vector) is given by the familiar scalar/dot product,
defined as a · b = |a||b| cos ✓, where ✓ is the angle between a and b, and |a| is the magnitude of vector
a. In this lecture we will generalise the notion of the dot/scalar product to abstract vector spaces,
which will be referred to as the inner product.
By the end of this lecture you should be able to:

• Define an inner product space, and calculate the inner product between two vectors.

• Define the notion of an orthonormality and an orthonormal basis.

• Use Gram-Schmidt orthogonalisation to construct an orthonormal basis.

• Prove the Cauchy-Schwarz and triangle inequalities.

• Write vectors in Dirac notation, and use this to calculate inner products.

3.1 Inner product spaces


We now want to develop a generalised notion of the dot product for arbitrary vector spaces, which
takes two vectors and returns a scalar. We will call this product the inner product, which is
defined formally as follows:

Definition 3.1: Inner product

The inner product between two vectors, denoted ⟨a|b⟩, is defined as a scalar function, that is,
a function that takes two vectors and returns a scalar quantity. For a complex vector space
V^N(C), the inner product has the following properties:

(i) It is linear in the second argument, that is, if w = λu + μv, then ⟨a|w⟩ = λ⟨a|u⟩ + μ⟨a|v⟩.

(ii) Under complex conjugation we have ⟨a|w⟩ = \overline{⟨w|a⟩}. This is a trivial requirement in the case
of real vector spaces.

The above definition also allows us to naturally define the idea of a magnitude of a vector, which we
call the norm:

‖a‖² = ⟨a|a⟩ ≥ 0 .    (3.1)


By definition we have ‖a‖² = 0 if and only if a = 0.

Exercise 3.1

Confirm that the dot product a · b for a, b ∈ R³ satisfies the above properties.

A vector space which has a linear inner product is called an inner product space. Inner products are
exceptionally important in the physical sciences. Clearly from the above exercise, Euclidean space is an
inner product space, therefore Newtonian mechanics can be phrased in such language. Though it may
not be obvious just yet, quantum mechanics is also described in terms of a complex inner product
space called Hilbert space.

Exercise 3.2

Show that if w = λu + μv, then we have: ⟨w|a⟩ = λ̄⟨u|a⟩ + μ̄⟨v|a⟩. This property is called
antilinearity or conjugate linearity.

3.1.1 Orthogonality
Having now recovered the notion of the magnitude of a vector using the inner product, we can also develop
the concept of direction using the idea of orthogonality:

Definition 3.2: Orthogonality

Consider two vectors a and b that are elements of the inner product space V^N. a and b are
said to be orthogonal if they satisfy ⟨a|b⟩ = 0.

This naturally allows us to define the concept of an orthogonal and orthonormal basis:

Definition 3.3: Orthogonal basis

A basis set {e_j}_{j=1}^{N} of the inner product space is said to be orthogonal if ⟨e_i|e_j⟩ = A δ_{ij}, where
A ∈ C or R and δ_{ij} is the Kronecker-δ. If A = 1 then we call the basis orthonormal.

So, how do we actually calculate the inner product of two vectors? Consider the set of vectors
{e_j}_{j=1}^{N}, which form a complete and orthonormal basis set for the vector space V^N. Two vectors
a, b ∈ V^N with representations a = \sum_j a_j e_j and b = \sum_j b_j e_j will have an inner product of the form

⟨a|b⟩ = \sum_{i,j=1}^{N} \bar{a}_i ⟨e_i|e_j⟩ b_j ,    (3.2)

where we have used the linear and antilinear properties of the inner product. Using the definition of
an orthonormal basis, ⟨e_i|e_j⟩ = δ_{ij}, we then have:

⟨a|b⟩ = \sum_{i,j=1}^{N} \bar{a}_i δ_{ij} b_j = \sum_{i=1}^{N} \bar{a}_i b_i .    (3.3)

Note that this is exactly the same form as the scalar product used in Euclidean vector spaces.
Finally, for a given vector a which can be written in the orthonormal basis {e_j}_{j=1}^{N}, the definition
of the inner product also provides us with some understanding of what it means to write a vector
in terms of a basis. If we consider

a = \sum_{j=1}^{N} a_j e_j ,    (3.4)

and take the inner product with a particular basis vector e_k, we have:

⟨e_k|a⟩ = \sum_{j=1}^{N} a_j ⟨e_k|e_j⟩ = \sum_{j=1}^{N} a_j δ_{kj} = a_k ,    (3.5)

therefore the coefficient a_k = ⟨e_k|a⟩, often called the projection of a on e_k, tells us how much of the
vector a is in the e_k direction.
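A small NumPy sketch (my own illustration, not from the notes) shows this projection rule in action; np.vdot conjugates its first argument, matching the ⟨·|·⟩ convention used here:

import numpy as np

# Standard orthonormal basis of C^2 and an arbitrary complex vector
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
a = np.array([2.0 + 1.0j, -3.0j])

a1 = np.vdot(e1, a)   # a_1 = <e_1|a>
a2 = np.vdot(e2, a)   # a_2 = <e_2|a>
print(np.allclose(a, a1*e1 + a2*e2))  # True: a is rebuilt from its projections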

3.2 Gram-Schmidt orthogonalisation


It is quite a common situation (particularly in the numerical solution of equations) that you may be
given a set of basis states {v_j}_{j=1}^{N} for the inner-product space V^N which are not orthonormal, but we
wish to construct an orthonormal basis from it. This process is called orthogonalisation. There are
many different approaches to orthogonalisation; in this section we will present a particularly famous
method called Gram-Schmidt orthogonalisation.

Definition 3.3: Gram-Schmidt orthogonalisation

Consider a general basis {v_j}_{j=1}^{N} for the inner-product space V^N. We can construct an or-
thonormal basis {e_j}_{j=1}^{N} using the following procedure:

(1) For an arbitrary starting vector, we have e_1 = v_1 / ‖v_1‖.

(2) Then we have u_2 = v_2 − ⟨e_1|v_2⟩ e_1. Normalising, we have e_2 = u_2 / ‖u_2‖.

...

(m) We have u_m = v_m − \sum_{j=1}^{m-1} ⟨e_j|v_m⟩ e_j. Normalising, we have e_m = u_m / ‖u_m‖.

...

(N) Finally we have

u_N = v_N − \sum_{j=1}^{N-1} ⟨e_j|v_N⟩ e_j .

Normalising, we have e_N = u_N / ‖u_N‖.

Physically, what is happening in this procedure: In step 1, an initial vector is set. In step 2, we
take the second vector and subtract the component of the first vector from this vector. In step 3, we
take the third vector and subtract the components of the first two vectors from this vector. This is
repeated iteratively until we have cycled through our full set of vectors. The fact that we can write
this procedure in such general terms shows that you can always find an orthonormal basis for any
inner-product space!

3.2.1 Example: Gram-Schmidt orthogonalisation


Let’s apply this procedure to a simple example. Consider the following vectors in R³:

v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} , \quad v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} , \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} .    (3.6)

We want to use these vectors to construct an orthonormal basis using the Gram-Schmidt procedure.
Starting with step one, we have:

e_1 = \frac{v_1}{‖v_1‖} = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} .    (3.7)

For step two, we first find the projection of the second vector onto this basis vector:

⟨e_1|v_2⟩ = \frac{2}{\sqrt{3}} .    (3.8)

We then subtract this projection from the second vector, and normalise the result:

u_2 = v_2 − ⟨e_1|v_2⟩ e_1 = \frac{1}{3} \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} .    (3.9)

Normalising, we then have:

e_2 = \frac{u_2}{‖u_2‖} = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} .    (3.10)

Finally, for step three, we find the projections of the third vector onto the two new basis vectors:

⟨e_1|v_3⟩ = \frac{2}{\sqrt{3}} , \qquad ⟨e_2|v_3⟩ = -\frac{1}{\sqrt{6}} .    (3.11)

Subtracting these projections from the third vector, we have:

u_3 = v_3 − ⟨e_1|v_3⟩ e_1 − ⟨e_2|v_3⟩ e_2 = \frac{1}{2} \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} .    (3.12)

Normalising, we have:

e_3 = \frac{u_3}{‖u_3‖} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} .    (3.13)
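The whole procedure is easy to code up; below is a minimal NumPy sketch of Gram-Schmidt (my own illustration, using the ordinary dot product as the inner product), applied to the three vectors above:

import numpy as np

def gram_schmidt(vectors):
    # Orthonormalise a list of linearly independent real vectors
    basis = []
    for v in vectors:
        u = v - sum(np.dot(e, v) * e for e in basis)  # subtract projections onto earlier e_j
        basis.append(u / np.linalg.norm(u))           # normalise
    return basis

v1, v2, v3 = np.array([1., 1., 1.]), np.array([1., 0., 1.]), np.array([1., 1., 0.])
e1, e2, e3 = gram_schmidt([v1, v2, v3])
print(np.round(e2, 6))   # (1, -2, 1)/sqrt(6), matching Eq. (3.10)
print(np.round([np.dot(e1, e2), np.dot(e1, e3), np.dot(e2, e3)], 10))  # all zero: orthogonal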

3.3 Dirac notation


We will now introduce Dirac notation or Braket notation, an extremely compact and useful notation
for dealing with vectors and linear operators. This notation is almost universally used in quantum
mechanics, and will be very useful when we start dealing with linear transformations of vectors.
In Dirac notation, a vector in a vector space V^N can be written as |v⟩; this is what is called a ket¹.
Let's say we have a vector space V^N with an arbitrary vector v = \sum_{j=1}^{N} v_j e_j, where {e_j}_{j=1}^{N} is a basis
for V^N; in Dirac notation this vector looks like:

|v⟩ = \sum_{j=1}^{N} v_j |e_j⟩ .    (3.14)
¹ Notice the similarity to the inner-product notation. This is not by accident and will become clearer later.

All vectors are replaced with a ket, |·⟩, and components are included as a scalar multiple as usual. This
allows us to clearly distinguish between vector and scalar quantities without having to faff around
with bold symbols or arrows over the top!
With every ket, |ψ⟩, there is also a bra, ⟨ψ|, associated with it. This bra has both addition and
scalar multiplication defined in the same way as kets, meaning that they form their own vector space.
If our vector |ψ⟩ is an element of vector space V, then ⟨ψ| is a vector in V*, which we call the dual space
or simply the dual of V.
There exists a one-to-one mapping between the vector space V and its dual V*, which we denote
as a dagger

(|ψ⟩)† = ⟨ψ| ,  (⟨ψ|)† = |ψ⟩ ,    (3.15)

and is often referred to as the adjoint or Hermitian conjugate. This map is antilinear, such that:

(α|ψ⟩ + β|φ⟩)† = ᾱ⟨ψ| + β̄⟨φ| .    (3.16)
One of the key virtues of Dirac notation is that it naturally allows us to write inner products such that
⟨φ|ψ⟩ = ⟨φ| |ψ⟩, that is, we have an inner product by acting a bra from the left on a ket. For example,

(α⟨ψ| + β⟨φ|) |ω⟩ = α⟨ψ|ω⟩ + β⟨φ|ω⟩ ,
⟨ω| (α|ψ⟩ + β|φ⟩) = α⟨ω|ψ⟩ + β⟨ω|φ⟩ .    (3.17)

The best way to understand how useful Dirac notation can be is to see it in action: Consider a
two-dimensional complex vector space V², with orthonormal basis {|e_1⟩, |e_2⟩}. From the definition of
orthonormality, we have:

⟨e_1|e_1⟩ = ⟨e_2|e_2⟩ = 1  and  ⟨e_1|e_2⟩ = ⟨e_2|e_1⟩ = 0 .    (3.18)

As this is a basis, then by definition we can represent any vector in V²(C) in terms of it, such that

|ψ⟩ = a|e_1⟩ + b|e_2⟩  where a, b ∈ C.    (3.19)

The corresponding bra is then,

⟨ψ| = ā⟨e_1| + b̄⟨e_2| .

Using the definition of orthonormality we then have:

⟨ψ|ψ⟩ = (ā⟨e_1| + b̄⟨e_2|)(a|e_1⟩ + b|e_2⟩) = |a|² + |b|² .

Go through the following exercises to become practised in Dirac notation:
Exercise 3.3
For two vectors |ψ⟩ = a|e_1⟩ + b|e_2⟩ and |φ⟩ = c|e_1⟩ + d|e_2⟩, where a, b, c and d ∈ C, show that:

⟨φ|ψ⟩ = a c̄ + b d̄ .    (3.20)

Show that ⟨φ|ψ⟩ = \overline{⟨ψ|φ⟩} for this specific case, and generalise this to V^N(C).

Exercise 3.4
Write the column vectors,

\begin{pmatrix} 2 + i \\ i \end{pmatrix}  and  \begin{pmatrix} 5 + i \\ i \end{pmatrix} ,    (3.21)

in Dirac notation, and calculate their norms and the inner product assuming that the basis used
is orthonormal.

Exercise 3.5
A vector |a⟩ ∈ V^N can be represented in the orthonormal basis {|e_j⟩}_{j=1}^{N} as |a⟩ = \sum_{j=1}^{N} a_j |e_j⟩.
Show that a_j = ⟨e_j|a⟩.

3.4 Inequalities for inner product spaces


As a consequence of the above properties, we can derive two very important and general inequalities:
Theorem 3.1: Cauchy-Schwarz inequality

Consider an inner product space V^N. The vectors |a⟩, |b⟩ ∈ V^N satisfy,

|⟨a|b⟩| ≤ ‖a‖‖b‖ ,    (3.22)

where the equality holds if |a⟩ is a scalar multiple of |b⟩.

Proof. This result is trivially proven if either |a⟩ or |b⟩ is the null vector, |0⟩. Assuming |a⟩ and
|b⟩ are not |0⟩, the proof is found by first considering |u⟩ = |a⟩ − λ|b⟩, where λ = ⟨b|a⟩/‖b‖².
The inequality can then be proven using

‖u‖² = (⟨a| − λ̄⟨b|)(|a⟩ − λ|b⟩) = ‖a‖² + |λ|²‖b‖² − λ̄⟨b|a⟩ − λ⟨a|b⟩ ≥ 0 ,
     = ‖a‖² − \frac{|⟨b|a⟩|²}{‖b‖²} ≥ 0 .    (3.23)

Rearranging, we find |⟨b|a⟩| ≤ ‖a‖‖b‖ as required.

The above inequality holds for any inner product space, and is a very useful result. In the context of
quantum mechanics you can use the Cauchy-Schwarz inequality to prove a general form of the uncertainty
principle ΔA ΔB ≥ |⟨[A, B]⟩|/2, where A and B are two Hermitian observables.
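As a quick numerical illustration (my own, assuming NumPy; the random vectors are arbitrary), the Cauchy-Schwarz inequality can be checked directly for complex vectors:

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=4) + 1j * rng.normal(size=4)
b = rng.normal(size=4) + 1j * rng.normal(size=4)

lhs = abs(np.vdot(a, b))                     # |<a|b>|, with the first argument conjugated
rhs = np.linalg.norm(a) * np.linalg.norm(b)  # ||a|| ||b||
print(lhs <= rhs)                            # True for any pair of vectors
print(np.isclose(abs(np.vdot(a, 2.5 * a)),
                 np.linalg.norm(a) * np.linalg.norm(2.5 * a)))  # True: equality for scalar multiples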
Another useful result is the triangle inequality, which follows from the Cauchy-Schwarz inequality:

Theorem 3.2: The triangle inequality

Consider an inner product space V^N; the vectors |a⟩, |b⟩ ∈ V^N satisfy,

‖a + b‖ ≤ ‖a‖ + ‖b‖ .    (3.24)

Exercise 3.6
Prove the triangle inequality using properties of the Cauchy-Schwarz inequality.

3.5 Further reading


Inner product spaces are well discussed in most linear algebra textbooks, from which you can find
many more examples and exercises. The following are some suggestions:

• Section 8.1.2 and 8.1.3 in Riley, Hobson and Bence. This is all in vector notation.

• Section 1.2-1.5 Shankar. This is all in Dirac notation, and would act as a nice revision of the
basic principles of Vector spaces.

Please do ensure that you are comfortable with Dirac notation, as we will be using it extensively
throughout the rest of the course. Problems Q7-11 on problem sheet 1 and Q1-5 on problem sheet 2
cover the material for this lecture.
Lecture 4
Linear operators

Now that we have introduced vector spaces, alongside the notion of distance and direction provided to us
by the inner product, we can consider how a vector can be manipulated or transformed. In the
language of linear algebra, vectors are transformed by linear operators, which we will introduce in this
lecture.
By the end of the lecture, you should be able to:

• Define a linear operator

• Write linear operators in terms of Dirac notation using the outer product.

• Define projection operators and construct them using an orthonormal basis.

• Define the completeness relation for a given inner product space.

• Define the adjoint of an operator, and determine whether an operator is unitary or Hermitian.

4.1 Introducing linear operators


A linear operator or linear map associates with every vector another vector. Let’s unpack that a little
bit:

Definition 4.1: Linear operators

Consider a vector |c⟩ = μ|a⟩ + λ|b⟩, where |a⟩, |b⟩ ∈ V and λ, μ are scalars. A linear operator Â
is defined such that:

|c′⟩ = Â|c⟩ = μ(Â|a⟩) + λ(Â|b⟩) ∈ W .    (4.1)

We say that the operator Â has mapped our vector |c⟩ ∈ V to a new vector |c′⟩ in another vector
space W.

For the rest of this course we will restrict ourselves to the simplifying case W = V; this is the most
important case for applications of linear algebra in quantum mechanics, though it is not always true
in more general applications of linear operators.

We define the following properties for linear operators:

(i) The addition of linear operators is distributive, such that, for two operators  and B̂, then

(Â + B̂) |vi = Â |vi + B̂ |vi .

(ii) We also define that scalar multiplication for a linear operator is given by (λÂ)|v⟩ = λ(Â|v⟩).


(iii) The identity operator is defined as 1̂ |vi = |vi.

(iv) The null operator is given by Ô |vi = |0i, where |0i is the null vector.

Exercise 4.1
Use the above definitions to show that the set of all linear operators acting on a vector space is
itself a vector space.

We can then define the product of two operators as:

(B̂Â)|v⟩ = B̂(Â|v⟩)  ∀ |v⟩ ∈ V,

where Â operates on |v⟩ first, followed by B̂. Note that in general products of operators are not
commutative, that is, B̂Â ≠ ÂB̂.
Using this definition of the product of operators we have one final definition, the inverse of an
operator:
Definition 4.2: Inverse operator

If for an operator  there exists an operator B̂ such that:


⇣ ⌘
B̂ Â |vi = |vi 8 |vi 2 V, (4.2)

then B̂ is called the inverse of Â, which we denote B̂ = Â 1, and satisfies ÂÂ 1 = Â 1 Â = 1̂.

Note: Not all operators have an inverse!

4.2 The outer-product, projectors and the completeness relation


You will notice that we have written the above definitions of linear operators in Dirac notation, which
we introduced last lecture. This is a consequence of the natural way linear operators can be expressed
in this notation. Consider the inner product between two vectors, ⟨a|b⟩; we can multiply the vector |c⟩
by this scalar value and by definition we still have a vector:

|c⟩ ⟨a|b⟩ = (|c⟩⟨a|) |b⟩ .    (4.3)

By the above definitions, we can identify |c⟩⟨a| as a linear operator. This operator is sometimes referred
to as a dyad, or described as the outer product of the vectors |c⟩ and |a⟩,

Definition 4.3: Outer or dyadic product

Consider the complex vector space V^N(C). The outer product of vectors |a⟩, |b⟩ ∈
V^N, denoted |a⟩⟨b|, is a linear operation which constructs a linear operator on V^N. The outer
product has the properties:

• It is linear in the first argument, that is, if |c⟩ = μ|a⟩ + λ|b⟩, then |c⟩⟨d| = μ|a⟩⟨d| + λ|b⟩⟨d|.

• It is antilinear in its second argument, that is, if |d⟩ = μ|a⟩ + λ|b⟩, then |c⟩⟨d| = μ̄|c⟩⟨a| +
λ̄|c⟩⟨b|.

4.2.1 Projection operators


The definition of the outer product allows us to straightforwardly define a very special class of operators,
known as projection operators.

Definition 4.4: Projection operator

A projection operator P̂ is defined as an operator acting on a vector space V which satisfies
P̂² = P̂; this property is called idempotence.

Let us illustrate the action of projection operators with an example: Suppose an inner product space
V^N is spanned by the orthonormal basis {|e_j⟩}_{j=1}^{N}, then we may construct an operator:

P̂_j = |e_j⟩⟨e_j| .

We can see trivially that P̂_j² = |e_j⟩⟨e_j|e_j⟩⟨e_j| = P̂_j, and P̂_j is therefore a projection operator. If we
consider the action on an arbitrary vector in V^N, |b⟩ = \sum_{k=1}^{N} b_k |e_k⟩, then we have:

P̂_j |b⟩ = \sum_{k=1}^{N} b_k P̂_j |e_k⟩ = \sum_{k=1}^{N} b_k |e_j⟩⟨e_j|e_k⟩ = b_j |e_j⟩ ,    (4.4)

where we have used that ⟨e_j|e_k⟩ = δ_{jk} for an orthonormal basis. The projection operator P̂_j therefore
projects out the jth component of the vector |b⟩.
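In matrix form a projector is just the outer product of a basis vector with itself; the sketch below (my own, assuming NumPy) checks the idempotence property and the projection of Eq. (4.4) for the standard basis of R³:

import numpy as np

e1, e2, e3 = np.eye(3)            # rows of the identity: an orthonormal basis of R^3
b = np.array([4.0, -1.0, 2.0])

P2 = np.outer(e2, e2)             # P_2 = |e_2><e_2| as a matrix
print(np.allclose(P2 @ P2, P2))   # True: idempotent, P^2 = P
print(P2 @ b)                     # [ 0. -1.  0.]: only the e_2-component of b survives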
Exercise 4.2

Suppose an inner product space V^N is spanned by the orthonormal basis {|e_j⟩}_{j=1}^{N}. Show that
P̂ = |e_1⟩⟨e_1| + |e_2⟩⟨e_2| is a projection operator, and determine its action on an arbitrary state
|b⟩ = \sum_{k=1}^{N} b_k |e_k⟩.

4.2.2 Completeness relation


The definition of the projector in Eq. 4.4 allows us to construct the representation of the identity
operator. Consider an inner product space V^N spanned by the orthonormal basis {|e_j⟩}_{j=1}^{N}. If we
take the definition of the projection operator P̂_j = |e_j⟩⟨e_j|, then by summing over all such projection
operators, we have

\sum_{j=1}^{N} P̂_j |b⟩ = \sum_{j=1}^{N} \sum_{k=1}^{N} b_k P̂_j |e_k⟩ = \sum_{j=1}^{N} b_j |e_j⟩ = |b⟩ ,    (4.5)

therefore we can identify \sum_{j=1}^{N} P̂_j = 1̂ as the identity operator. This leads us to the definition of a
completeness relation or equivalently a resolution of the identity

1̂ = \sum_{j=1}^{N} |e_j⟩⟨e_j| .

It is important to recognise that there exists a completeness relation for any orthonormal basis of V^N.
We will see later that it can be used to change the representation of a vector between orthonormal basis
sets.
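A minimal NumPy sketch (my own illustration; the rotated basis is an arbitrary choice) shows the completeness relation numerically: summing the projectors onto any orthonormal basis returns the identity matrix.

import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
basis = [R[:, j] for j in range(3)]     # columns of a rotation matrix: an orthonormal basis of R^3

identity = sum(np.outer(e, e) for e in basis)  # sum_j |e_j><e_j|
print(np.allclose(identity, np.eye(3)))        # True: the projectors resolve the identity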

4.3 The Adjoint or Hermitian conjugate

Definition 4.3: Adjoint or Hermitian conjugate

For a linear operator Â, we can define its adjoint (or synonymously Hermitian conjugate) †
as:

⟨u|†|v⟩ = \overline{⟨v|Â|u⟩}  ∀ |u⟩, |v⟩ ∈ V.    (4.6)

If we choose |v⟩ = |e_j⟩ and |u⟩ = |e_i⟩, where the vectors {|e_j⟩}_{j=1}^{N} are an orthonormal basis for V,
then we find:

(†)_{ij} = ⟨e_i|†|e_j⟩ = \overline{⟨e_j|Â|e_i⟩} = Ā_{ji} ,    (4.7)

so in matrix form, the Hermitian conjugate or adjoint of an operator corresponds to its conjugate
transpose. This is the same operation that mapped a ket to a bra.

Exercise 4.3
Prove the following statements,

1. (ÂB̂)† = B̂†Â†.

2. (λÂ)† = λ̄† where λ is a scalar.

3. If Q̂ = |c⟩⟨a| then Q̂† = |a⟩⟨c|.

The definition of the adjoint allows us to define two very important classes of operators, which we
encounter frequently in both linear algebra and quantum mechanics:

Definition 4.5: Self-adjoint (Hermitian) operators

We call a linear operator  self-adjoint, or Hermitian, if  = † , that is

hu| Â |vi = hv| Â |ui 8 |ui , |vi 2 V. (4.8)

In matrix language, we find that the components satisfy Aij = Āji . For a real vector space, a
self-adjoint matrix is equivalent to a real symmetrix one.

Definition 4.6: Unitary operators

A linear operator Û is said to be unitary if

Û†Û = ÛÛ† = 1̂ ,    (4.9)

that is, Û⁻¹ = Û†.
For a real vector space the Hermitian conjugate reduces to the transpose, and we have
ÛᵀÛ = 1̂, which is the definition of an orthogonal matrix.
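Both properties are easy to test numerically; the sketch below (my own, assuming NumPy) checks them for σ_y, which happens to be both Hermitian and unitary, and for a real rotation matrix, which is orthogonal:

import numpy as np

sigma_y = np.array([[0.0, -1.0j],
                    [1.0j,  0.0]])
print(np.allclose(sigma_y, sigma_y.conj().T))              # True: Hermitian, A = A^dagger
print(np.allclose(sigma_y.conj().T @ sigma_y, np.eye(2)))  # True: unitary, A^dagger A = 1

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(R.T @ R, np.eye(2)))                     # True: orthogonal, R^T R = 1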

4.4 Further reading


For further reading on operators see:
• Shankar, Section 1.5 and 1.6.
To practise some of the concepts introduced in this lecture, try Q6, 7 & 8 of problem sheet 2.
Lecture 5
Changing basis of vectors and operators

As we discussed in lecture 2, while a decomposition of a vector into a particular basis set is unique, the
choice of basis is not. For a variety of reasons, it is often useful to change the basis of a vector or linear
operator. For example, in quantum mechanics if we wish to measure a particular property of a quantum
system, it is often convenient to write the state of the system in the eigenbasis of the measurement.
The expansion coefficients of the resulting basis will then give the probability of measuring each of the
possible outcomes. For example, if we wish to measure the spin of an electron in the Z direction, we
can write its state as,

|ψ⟩ = α|↑⟩ + β|↓⟩ ,    (5.1)

the probability of measuring spin up can then be trivially identified as |α|², and similarly |β|² for spin down.
By the end of this lecture you should be able to:
• Change the basis of a vector and linear operator.
• Write vectors and linear operators as matrices.

5.1 Changing basis of vectors and operators


Consider a vector space V^N with two orthonormal bases, {|e_j⟩}_{j=1}^{N} and {|f_j⟩}_{j=1}^{N}. As these basis sets
are complete and orthonormal they each have a completeness relation associated to them,

1̂ = \sum_{j=1}^{N} |e_j⟩⟨e_j|  and  1̂ = \sum_{j=1}^{N} |f_j⟩⟨f_j| .    (5.2)

Both completeness relations are completely valid representations of the identity, but written in terms
of a different set of basis vectors. These completeness relations will be crucial for changing the basis
of a given vector or operator.

5.1.1 Changing the basis of a vector


As shown in Lecture 4, we can use the completeness relations to write a vector |v⟩ in terms of the basis
{|e_j⟩}_{j=1}^{N}, such that,

|v⟩ = 1̂|v⟩ = \sum_{j=1}^{N} ⟨e_j|v⟩ |e_j⟩ ,    (5.3)

but equally, we could also use the other completeness relation to write the vector in terms of the basis
{|f_j⟩}_{j=1}^{N},

|v⟩ = 1̂|v⟩ = \sum_{j=1}^{N} ⟨f_j|v⟩ |f_j⟩ .    (5.4)

[Figure 5.1.1: A figure showing a change of basis for a vector |v⟩. Here we have rotated the basis vectors |x⟩ and |y⟩ by some angle α to give |x′⟩ and |y′⟩.]

We have not changed the vector, applying only the identity operator through the completeness relation,
but we now have two representations of the same vector. Effectively, we have changed the basis of our
vector. Intuitively, we can think of changing basis as a rotation of the coordinate system which leaves
the vector |vi invariant.
To make this more concrete, let's take a simple example of a two-dimensional vector space, with orthonormal basis vectors {|x⟩, |y⟩}. Since this is a complete orthonormal basis, there exists a completeness relation 1̂ = |x⟩⟨x| + |y⟩⟨y|. An arbitrary vector in this space, |v⟩, may be written as,

|v⟩ = v_x |x⟩ + v_y |y⟩ ,   (5.5)

where v_x = ⟨x|v⟩ and v_y = ⟨y|v⟩. Now, suppose we wish to rotate the coordinate system by some angle α, as shown in Fig. 5.1.1. In this case we have a new basis {|x′⟩, |y′⟩}, which can be written in terms of the unrotated basis as,

|x′⟩ = cos α |x⟩ + sin α |y⟩   and   |y′⟩ = cos α |y⟩ − sin α |x⟩ .   (5.6)

Associated with this new basis is a completeness relation, 1̂ = |x′⟩⟨x′| + |y′⟩⟨y′|. This allows us to write,

|v⟩ = v_{x′} |x′⟩ + v_{y′} |y′⟩ ,   (5.7)

with v_{x′} = ⟨x′|v⟩ and v_{y′} = ⟨y′|v⟩. We have not done anything to our vector, but simply changed the
definition of the axes we use to represent it.
Let’s do a quick example. Consider the vector space R2 , which has a orthonormal basis {|xi , |yi}.
A vector |vi in this space is given by,
|vi = |xi + 2 |yi . (5.8)
Let’s say we want to rewrite |vi in terms of the linearly independent basis,

|ui = |xi + |yi and |wi = |xi |yi . (5.9)

To do this, we first rewrite the basis vectors |xi and |yi in terms of |ui and |wi,
1 1
|xi = (|ui + |wi) and |yi = (|ui |wi). (5.10)
2 2
Subbing this into our expression for |vi, we have,
1 3 1
|vi = (|ui + |wi) + (|ui |wi) = |ui |wi . (5.11)
2 2 2
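As a quick numerical check of this decomposition, here is a short Python/NumPy sketch (our own illustration, not part of the original notes; the variable names are arbitrary):

import numpy as np

# Orthonormal basis vectors of R^2 (coordinate representation).
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# The vector |v> = |x> + 2|y>.
v = x + 2 * y

# The new (non-orthonormal but linearly independent) basis.
u = x + y
w = x - y

# Decomposition found above: |v> = 3/2 |u> - 1/2 |w>.
v_reconstructed = 1.5 * u - 0.5 * w
print(np.allclose(v, v_reconstructed))  # True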

Exercise 2.4

Consider the two-dimensional inner-product space V²(C). This space has an orthonormal basis {|0⟩, |1⟩}. We can use this basis to construct a new basis {|+⟩, |−⟩}, such that |±⟩ = (1/√2)(|0⟩ ± |1⟩).

(a) Show that {|+⟩, |−⟩} is an orthonormal basis for V².

(b) Write the vector |ψ⟩ = α |0⟩ + β |1⟩ in the {|+⟩, |−⟩} basis.

(c) Consider another vector |φ⟩ = γ |+⟩ + δ |−⟩. Show that the inner product

⟨φ|ψ⟩ = (1/√2) [ γ̄(α + β) + δ̄(α − β) ] ,

is the same in both the {|0⟩, |1⟩} and {|+⟩, |−⟩} basis sets.

(d) Show that any inner product is left invariant by a change of basis.

It is worth noting that while a vector |vi might look quite different when written in terms of
different basis vectors, the underlying vector is the same. One way to see this is to consider the inner
product of the vector with itself:
Consider a vector |v⟩ ∈ V^N, where the vector space V^N has two orthonormal bases {|e_j⟩}_{j=1}^{N} and {|f_j⟩}_{j=1}^{N}, such that |v⟩ can be written in terms of both bases as,

|v⟩ = Σ_{j=1}^{N} v_j |e_j⟩ = Σ_{j=1}^{N} c_j |f_j⟩ .   (5.12)

If we take the inner product of |v⟩ with itself, we have,

⟨v|v⟩ = ( Σ_{j=1}^{N} v_j |e_j⟩ )† ( Σ_{k=1}^{N} v_k |e_k⟩ ) = Σ_{j,k=1}^{N} v_j* v_k ⟨e_j|e_k⟩ = Σ_{j,k=1}^{N} v_j* v_k δ_{jk} = Σ_{j=1}^{N} |v_j|² .   (5.13)

If we repeat this procedure for the other basis, we have,

⟨v|v⟩ = ( Σ_{j=1}^{N} c_j |f_j⟩ )† ( Σ_{k=1}^{N} c_k |f_k⟩ ) = Σ_{j,k=1}^{N} c_j* c_k ⟨f_j|f_k⟩ = Σ_{j,k=1}^{N} c_j* c_k δ_{jk} = Σ_{j=1}^{N} |c_j|² .   (5.14)

If the inner product is left invariant by a change of basis, then we must have,

Σ_{j=1}^{N} |v_j|² = Σ_{j=1}^{N} |c_j|² .   (5.15)

Now recall that v_j = ⟨e_j|v⟩ and c_j = ⟨f_j|v⟩, therefore we have,

Σ_{j=1}^{N} ⟨v|e_j⟩ ⟨e_j|v⟩ = Σ_{j=1}^{N} ⟨v|f_j⟩ ⟨f_j|v⟩ ,   (5.16)

and using the completeness relation, both sides reduce to

⟨v| 1̂ |v⟩ = ⟨v|v⟩ .   (5.17)

Therefore the inner product/norm is left invariant by a change of basis.
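The invariance of the inner product is also easy to check numerically. The following NumPy sketch (our addition, with an arbitrarily chosen random unitary standing in for the change of basis) compares the norm of a vector in two bases:

import numpy as np

rng = np.random.default_rng(0)

# A random vector in C^4, expressed in the {|e_j>} basis.
v_e = rng.normal(size=4) + 1j * rng.normal(size=4)

# A random unitary U (its columns give a second orthonormal basis {|f_j>}),
# built from the QR decomposition of a random complex matrix.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(A)

# Components of the same vector in the {|f_j>} basis: c_j = <f_j|v>.
v_f = U.conj().T @ v_e

# The norm (inner product of the vector with itself) is basis independent.
print(np.vdot(v_e, v_e).real, np.vdot(v_f, v_f).real)  # equal up to rounding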



5.1.2 Changing the basis of a linear operator


We can also change the basis of a linear operator in a similar fashion. Consider a linear operator Â which acts on a vector space V^N with orthonormal basis {|e_j⟩}_{j=1}^{N}. We can apply the completeness relation to this operator as,

Â = 1̂Â1̂ = ( Σ_{j=1}^{N} |e_j⟩⟨e_j| ) Â ( Σ_{k=1}^{N} |e_k⟩⟨e_k| ) = Σ_{j,k=1}^{N} ⟨e_j|Â|e_k⟩ |e_j⟩⟨e_k| ,   (5.18)

where the term ⟨e_j|Â|e_k⟩ specifies how the operator acts on the basis vectors. We can also apply the completeness relation in terms of another orthonormal basis {|f_j⟩}_{j=1}^{N}, yielding a completely equivalent representation of the operator,

Â = 1̂Â1̂ = ( Σ_{j=1}^{N} |f_j⟩⟨f_j| ) Â ( Σ_{k=1}^{N} |f_k⟩⟨f_k| ) = Σ_{j,k=1}^{N} ⟨f_j|Â|f_k⟩ |f_j⟩⟨f_k| .   (5.19)

This operator would have completely the same action on the same vector, but we have simply changed
the basis with which we represent it.
As an example, let us consider a complex vector space C² with orthonormal basis {|0⟩, |1⟩}. One of the Pauli operators that act on this space can be written as,

σ̂_x = |0⟩⟨1| + |1⟩⟨0| .   (5.20)

Say we want to rewrite σ̂_x in terms of a second orthonormal basis for C², {|+⟩, |−⟩}, where |±⟩ = (1/√2)(|0⟩ ± |1⟩). We can do this in two ways. First, we can write the basis vectors {|0⟩, |1⟩} in terms of {|+⟩, |−⟩}, such that,

|0⟩ = (1/√2)(|+⟩ + |−⟩)   and   |1⟩ = (1/√2)(|+⟩ − |−⟩).   (5.21)
Substituting these expressions into σ̂_x, we have,

σ̂_x = |0⟩⟨1| + |1⟩⟨0|
    = ½(|+⟩ + |−⟩)(⟨+| − ⟨−|) + ½(|+⟩ − |−⟩)(⟨+| + ⟨−|)
    = ½(|+⟩⟨+| − |+⟩⟨−| + |−⟩⟨+| − |−⟩⟨−|) + ½(|+⟩⟨+| + |+⟩⟨−| − |−⟩⟨+| − |−⟩⟨−|)   (5.22)
    = |+⟩⟨+| − |−⟩⟨−| .

Alternatively, we could have written the completeness relation in terms of the basis {|+⟩, |−⟩}, and found the action of σ̂_x on these basis vectors. For example, we have,

⟨+| σ̂_x |+⟩ = ⟨+| (|0⟩⟨1| + |1⟩⟨0|) |+⟩ = 1
⟨+| σ̂_x |−⟩ = ⟨+| (|0⟩⟨1| + |1⟩⟨0|) |−⟩ = 0
⟨−| σ̂_x |+⟩ = ⟨−| (|0⟩⟨1| + |1⟩⟨0|) |+⟩ = 0   (5.23)
⟨−| σ̂_x |−⟩ = ⟨−| (|0⟩⟨1| + |1⟩⟨0|) |−⟩ = −1

Using Eq. 5.19, we would then have,

σ̂_x = |+⟩⟨+| − |−⟩⟨−| .   (5.24)

Both approaches are completely valid, and yield the same result. However, the latter approach is
often more convenient and systematic when dealing with high-dimensional vector spaces. The above
representation of ˆx is actually a very special representation, known as the spectral decomposition of
the operator, which we will return to in Lecture 7.
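For those who like to verify such manipulations numerically, here is a brief NumPy sketch (our addition, not from the lecture) reproducing the matrix elements in Eq. 5.23:

import numpy as np

# {|0>, |1>} basis vectors and the Pauli-x operator in that basis.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
sigma_x = np.outer(ket0, ket1) + np.outer(ket1, ket0)

# The {|+>, |->} basis.
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

# Matrix elements <±|sigma_x|±> reproduce the spectral form |+><+| - |-><-|.
print(plus @ sigma_x @ plus, plus @ sigma_x @ minus)    # 1.0  0.0
print(minus @ sigma_x @ plus, minus @ sigma_x @ minus)  # 0.0 -1.0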

5.2 Further reading


To follow on some of the ideas introduced in this lecture, you may wish to read the following:

• Shankar Pg. 10-13

Supporting questions for this lecture can be found in Problem Sheet 2 Question 9.
Lecture 6
Matrix representations of vectors and operators

The more observant of you may have noticed that there is an intimate link between matrices, operators,
and vectors. Indeed, as we shall see in this lecture, we can represent any vector or operator in terms of a matrix. When doing this it is important to specify which basis you are using to write the matrix
representation, as —unlike the underlying vector or operator— the matrix representation will change
depending on the basis used.
In this lecture we will,

• Show how to represent vectors and operators in terms of matrices.

• Show that matrix representations differ depending on the basis used.

• Show how to change the matrix representation of a vector or operator.

6.1 Vectors and Operators in matrix form


While representing vectors and operators in terms of braket notation is useful for many purposes, it
is often more convenient to represent them in terms of matrices, for example, when trying to find the
eigenvalues and eigenvectors of an operator, or simply inputting vectors/operators into a computer. In this section we will show how to represent vectors and operators as matrices.

6.1.1 Writing a vector in matrix form


Consider a vector |v⟩ ∈ V^N. As shown in the previous section, this vector can be uniquely decomposed in terms of a complete basis {|e_j⟩}_{j=1}^{N} of V^N, which may be expressed as

|v⟩ = Σ_{j=1}^{N} v_j |e_j⟩ .   (6.1)

This is what is referred to as a representation of |v⟩. Now, as this complete basis leads to a unique set of coefficients v_j, it is common to express this simply as an N × 1 matrix, or equivalently a column vector,

|v⟩ → ( v_1, v_2, · · · , v_N )ᵀ .   (6.2)


The arrow notation above should be read as 'is represented by'. With this notation, we can identify the basis vectors {|e_j⟩} as having matrix representations:

|e_1⟩ → (1, 0, 0, · · · , 0)ᵀ ,   |e_2⟩ → (0, 1, 0, · · · , 0)ᵀ ,   · · · ,   |e_N⟩ → (0, 0, · · · , 0, 1)ᵀ .   (6.3)

It is important to note that the above matrix representation is not unique. Just as we can represent a
vector in terms of a different basis, we can also write its matrix representation in terms of a different
basis. The resulting vector will look the same, but the context in which it is represented will be
different. Therefore, when you write a matrix representation of a vector, it is crucial that you state
the basis in which it is represented.
Just as there is a matrix representation of ket vectors, we can also associate a matrix representation
with bra vectors. This can be found by recognising that a bra vector is simply the adjoint of a ket
vector, or in other words, the conjugate transpose, hv| = |vi† . Therefore, we have,
⟨v| = Σ_{j=1}^{N} v_j* ⟨e_j|   →   ( v̄_1, v̄_2, · · · , v̄_N ) .   (6.4)

So a bra is simply a row vector or, equivalently, a 1 × N matrix.

6.1.2 Writing a linear operator in matrix form


In a similar way, we can write a linear operator in matrix form as well. Consider an inner-product space V^N spanned by the orthonormal basis {|e_j⟩}_{j=1}^{N}. The action of a linear operator Â on a vector |b⟩ is given by:

|c⟩ = Â |b⟩ ,   (6.5)

and if we insert the completeness relation 1̂ = Σ_{j=1}^{N} |e_j⟩⟨e_j|, we obtain

1̂ |c⟩ = 1̂Â1̂ |b⟩ ,
⇒ Σ_{j=1}^{N} ⟨e_j|c⟩ |e_j⟩ = ( Σ_{j=1}^{N} |e_j⟩⟨e_j| ) Â ( Σ_{k=1}^{N} |e_k⟩⟨e_k| ) |b⟩ ,
⇒ Σ_{j=1}^{N} c_j |e_j⟩ = Σ_{j=1}^{N} Σ_{k=1}^{N} ⟨e_j| Â |e_k⟩ ⟨e_k|b⟩ |e_j⟩ ,   (6.6)
⇒ Σ_{j=1}^{N} c_j |e_j⟩ = Σ_{j=1}^{N} Σ_{k=1}^{N} A_{jk} b_k |e_j⟩ ,

where we have used that c_j = ⟨e_j|c⟩ and b_j = ⟨e_j|b⟩. We have also introduced A_{jk} = ⟨e_j| Â |e_k⟩, which we refer to as the matrix elements of Â. Indeed, if you look back at the week 0 lecture notes (or your first year maths notes), you can see that c_j = Σ_{k=1}^{N} A_{jk} b_k is the definition of matrix multiplication! So we can represent the expression |c⟩ = Â |b⟩ in matrix form as:

( c_1 )   ( A_11  A_12  · · ·  A_1N ) ( b_1 )
( c_2 ) = ( A_21  A_22  · · ·  A_2N ) ( b_2 )
(  ⋮  )   (  ⋮     ⋮    · · ·    ⋮  ) (  ⋮  )
( c_N )   ( A_N1  A_N2  · · ·  A_NN ) ( b_N ) ,   (6.7)

where the matrices and vectors are represented in the {|e_j⟩}_{j=1}^{N} basis¹.
Another useful and equivalent representation of the operator Â is in terms of the outer product introduced in the previous section,

Â = 1̂Â1̂ = ( Σ_{j=1}^{N} |e_j⟩⟨e_j| ) Â ( Σ_{k=1}^{N} |e_k⟩⟨e_k| ) = Σ_{j,k=1}^{N} A_{jk} |e_j⟩⟨e_k| ,

where A_{jk} = ⟨e_j| Â |e_k⟩ as before. Since the above representation is equivalent to the matrix representation of Â, this allows us to write a matrix definition of the outer product, such that for two vectors |a⟩ = Σ_{j=1}^{N} a_j |e_j⟩ and |b⟩ = Σ_{j=1}^{N} b_j |e_j⟩, the matrix representation of the outer product is

          ( a_1 b̄_1  a_1 b̄_2  · · ·  a_1 b̄_N )   ( a_1 )
|a⟩⟨b| →  ( a_2 b̄_1  a_2 b̄_2  · · ·  a_2 b̄_N ) = ( a_2 ) ( b̄_1  b̄_2  · · ·  b̄_N ) .   (6.8)
          (    ⋮        ⋮     · · ·     ⋮    )   (  ⋮  )
          ( a_N b̄_1  a_N b̄_2  · · ·  a_N b̄_N )   ( a_N )

The outer product is often also referred to as the Dyadic product, where two vectors are used to
construct a matrix, and can sometimes be seen written as ab† .
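As an illustration (a NumPy sketch added here, not from the original notes), np.outer reproduces exactly this dyadic construction:

import numpy as np

# Two vectors in C^3, given by their components in an orthonormal basis.
a = np.array([1.0 + 1.0j, 0.5, 2.0])
b = np.array([0.0, 1.0j, 1.0])

# Matrix representation of the dyad |a><b|: entries a_j * conj(b_k).
dyad = np.outer(a, b.conj())

# Acting on a vector |c> it gives |a><b|c> = <b|c> |a>.
c = np.array([1.0, 2.0, 3.0j])
print(np.allclose(dyad @ c, np.vdot(b, c) * a))  # True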
In the case of the completeness relation, we can use the dyadic product to construct a particularly simple form for the identity operator. If we consider the matrix representation of the outer product |e_1⟩⟨e_1|,

|e_1⟩⟨e_1| → (1, 0, · · · , 0)ᵀ (1, 0, · · · , 0) =
( 1  0  · · ·  0 )
( 0  0  · · ·  0 )
( ⋮  ⋮  · · ·  ⋮ )
( 0  0  · · ·  0 ) .   (6.9)

Repeating this procedure for the full basis, and adding the results together, we have

1̂ = Σ_{j=1}^{N} |e_j⟩⟨e_j| →
( 1  0  · · ·  0 )
( 0  1  · · ·  0 )
( ⋮  ⋮  · · ·  ⋮ )
( 0  0  · · ·  1 ) ,   (6.10)

where we have unity on the diagonal, and zeroes everywhere else. Alternatively, one could find the matrix elements as (1̂)_{jk} = ⟨e_j| 1̂ |e_k⟩ = ⟨e_j|e_k⟩ = δ_{jk}.

6.2 Changing the representation of a vector


Although the choice of basis above does not change the vector, it will change the matrix representation. For example, for the two-dimensional vector space defined above, we can define a matrix representation with respect to the basis {|x⟩, |y⟩} as,

|v⟩ → ( v_x, v_y )ᵀ   in the {|x⟩, |y⟩} basis,   (6.11)

but equally we could write the vector with respect to the basis {|x′⟩, |y′⟩}, yielding the representation,

|v⟩ → ( v_{x′}, v_{y′} )ᵀ   in the {|x′⟩, |y′⟩} basis.   (6.12)
¹ Note that the j-th column of Â gives the components of Â |e_j⟩. If these columns are not linearly independent, then Â |b⟩ belongs to a vector space of lower dimension than V^N. Phrased another way, if the columns are not linearly independent, then det(Â) = 0, which also implies that there is no inverse of Â.

The underlying vector is left unchanged, however the matrix representations have changed. Therefore,
it is crucial to state what basis a particular matrix representation is given in. Though it is slightly
cumbersome notation, an arrow with the basis stated above it is useful for keeping track of which basis
has been used in the representation of a vector or matrix.
Let’s now consider a vector space V N with two different orthonormal basis {|ej i}N
j=1 and {|fj i}j=1 .
N

Each basis can be used to represent a vector |vi, with matrix elements aj = hej |vi and bj = hfj |vi.
How can we change between the two equivalent representations of |vi?
Since the vector is left unchanged by our choice of basis, we can write,
|v⟩ = Σ_{j=1}^{N} a_j |e_j⟩ = Σ_{k=1}^{N} b_k |f_k⟩ .   (6.13)

If we take the inner product with the basis vector |e_l⟩ and use the orthonormality of the basis, then we have:

a_l = ⟨e_l|v⟩ = Σ_{k=1}^{N} b_k ⟨e_l|f_k⟩ = Σ_{k=1}^{N} (U)_{lk} b_k .   (6.14)
Notice that the righthand side of this expression is simply the matrix product between a matrix U
with matrix elements (U)lk = hel |fk i, where I have used boldface to denote a matrix. So, if we want
to change the basis with which we represent a vector, we simply take the matrix product:
( a_1 )   ( ⟨e_1|f_1⟩  · · ·  ⟨e_1|f_N⟩ ) ( b_1 )
( a_2 ) = ( ⟨e_2|f_1⟩  · · ·  ⟨e_2|f_N⟩ ) ( b_2 )
(  ⋮  )   (     ⋮      · · ·      ⋮    ) (  ⋮  )
( a_N )   ( ⟨e_N|f_1⟩  · · ·  ⟨e_N|f_N⟩ ) ( b_N ) .   (6.15)
We can interpret the action of the matrix U as a rotation of the column vector into a new basis. This
can be shown explicitly by considering the example given in Fig. 5.1.1. For this example, we transform
the representation of a vector written in the basis {|xi , |yi} into the new basis {|x0 i , |y 0 i} using the
matrix equation
( v_{x′} )   ( ⟨x′|x⟩  ⟨x′|y⟩ ) ( v_x )   (  cos α   sin α ) ( v_x )
( v_{y′} ) = ( ⟨y′|x⟩  ⟨y′|y⟩ ) ( v_y ) = ( −sin α   cos α ) ( v_y ) ,   (6.16)

where we have used the definitions of |x′⟩ and |y′⟩ given in Eq. 5.6. Notice that the operator used to change representation has the same form as a rotation matrix in two dimensions. If you are unsure
about this result, then you can check using geometric arguments.
An important property of the matrix used to transform between representations is that it is unitary:
(U†U)_{jk} = Σ_{m=1}^{N} (U†)_{jm} (U)_{mk} = Σ_{m=1}^{N} ⟨f_j|e_m⟩ ⟨e_m|f_k⟩ = ⟨f_j| ( Σ_m |e_m⟩⟨e_m| ) |f_k⟩ = δ_{jk} ,   (6.17)

which are the components of the identity matrix. So we only need one matrix to switch between
representations of vectors.
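A short numerical illustration of Eqs. 6.16 and 6.17 (a NumPy sketch of ours; the angle 0.3 is an arbitrary choice):

import numpy as np

alpha = 0.3  # rotation angle in radians (arbitrary)

# Old basis {|x>, |y>} and rotated basis {|x'>, |y'>}, as in Eq. 5.6.
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
xp = np.cos(alpha) * x + np.sin(alpha) * y
yp = np.cos(alpha) * y - np.sin(alpha) * x

# Change-of-basis matrix with elements <x'|x>, <x'|y>, etc. (Eq. 6.16).
U = np.array([[xp @ x, xp @ y],
              [yp @ x, yp @ y]])

v_old = np.array([1.0, 2.0])   # components (v_x, v_y)
v_new = U @ v_old              # components (v_x', v_y') in the rotated basis

print(np.allclose(U @ U.T, np.eye(2)))  # True: U is unitary (orthogonal here)
print(v_new)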

6.3 Changing the representation of a matrix


As we saw in Sec. 5, we can represent a linear operator as a matrix with respect to a particular basis. As in the case of vectors, a linear operator has an equivalent matrix representation associated with each basis. So, for a vector space V^N with basis sets {|e_j⟩}_{j=1}^{N} and {|f_j⟩}_{j=1}^{N}, we can represent a linear operator as:

Â = 1̂Â1̂ = Σ_{j,k=1}^{N} A_{jk} |e_j⟩⟨e_k| = Σ_{l,m=1}^{N} Ã_{lm} |f_l⟩⟨f_m| .   (6.18)

Although the operator is equivalent in both bases, the matrix elements A_{jk} ≠ Ã_{jk} are different. We can find the relationship between the two representations in a similar fashion to Eq. 6.14:

A_{jk} = Σ_{l,m} ⟨e_j|f_l⟩ Ã_{lm} ⟨f_m|e_k⟩ ,   (6.19)

where we see that this is once again in matrix product form. If we suppose that A is the matrix representation of Â with respect to the basis {|e_j⟩}_{j=1}^{N}, and likewise Ã is its representation w.r.t. {|f_j⟩}_{j=1}^{N}, then the representations are linked through the product,

A = UÃU† ,   (6.20)

where (U)_{jk} = ⟨e_j|f_k⟩ as before.


To see this in action, let's consider the case of the Pauli matrix σ̂_x, which we encountered in the previous lecture. We can represent this operator in the {|0⟩, |1⟩} basis as:

σ̂_x = |0⟩⟨1| + |1⟩⟨0|   →   σ_x = ( 0  1 )
                                  ( 1  0 ) .   (6.21)

However, say we want to transform the representation of σ_x into the basis {|+⟩, |−⟩}. First, we find the unitary matrix U that transforms between representations,

U = ( ⟨+|0⟩  ⟨+|1⟩ ) = (1/√2) ( 1   1 )
    ( ⟨−|0⟩  ⟨−|1⟩ )          ( 1  −1 ) ,   (6.22)

and what is left is to do the matrix multiplication:

σ̃_x = U σ_x U†
    = ½ ( 1   1 ) ( 0  1 ) ( 1   1 )
        ( 1  −1 ) ( 1  0 ) ( 1  −1 ) ,
    = ½ ( 1   1 ) ( 1  −1 )
        ( 1  −1 ) ( 1   1 ) ,
    = ½ ( 2   0 )
        ( 0  −2 ) ,   (6.23)
    = ( 1   0 )
      ( 0  −1 ) ,

which is the same as the spectral representation we found in the previous lecture. An important note to
finish this lecture with is that the underlying operator has not changed, only its matrix representation.
Therefore, whenever you write a matrix for an operator (or a vector), be clear exactly which basis you are using to represent it in.
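The matrix multiplication in Eq. 6.23 is easy to reproduce numerically; the following NumPy sketch (ours, not part of the notes) does exactly that:

import numpy as np

# sigma_x in the {|0>, |1>} basis.
sigma_x = np.array([[0, 1],
                    [1, 0]], dtype=float)

# Rows of U are <+| and <-| written in the {|0>, |1>} basis (Eq. 6.22).
U = np.array([[1, 1],
              [1, -1]], dtype=float) / np.sqrt(2)

sigma_x_tilde = U @ sigma_x @ U.conj().T
print(sigma_x_tilde)  # [[ 1.  0.] [ 0. -1.]]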

6.4 Further reading


To follow on some of the ideas introduced in this lecture, you may wish to read the following:

• Shankar Pg. 10-13

Supporting questions for this lecture can be found in Problem Sheet 2 Question 8.
Lecture 7
Eigenvalue problems and Diagonalisation

It is hard to emphasise enough the importance of the eigenvalue problem. It emerges in all branches
of science and mathematics: you’ve already seen that quantum mechanics boils down to solving eigen-
value problems, and in maths of waves and fields you have seen that they are solutions to differential
equations. Here we are going to study the eigenvalue problem in the context of linear operators and
vector spaces, but keep in mind that these ideas are transferable to all manner of physical contexts!
By the end of this lecture you should be able to:

• Define eigenvalues and eigenvectors of a linear operator.

• Derive the properties of the eigenvalues and eigenvectors of Hermitian and Unitary operators.

• Define the spectral representation of a linear operator and diagonalise the matrix representation
of linear operators.

7.1 Eigenvalue problems

Definition 3.1: Eigenvalues and eigenvectors

For a linear operator Â, the eigenvalue equation is defined as,

Â |u⟩ = λ |u⟩ ,   (7.1)

where λ and |u⟩ are an eigenvalue and an eigenvector associated to Â respectively, and |u⟩ ≠ |0⟩, the null vector.

We can rearrange this equation to obtain,

(Â − λ1̂) |u⟩ = |0⟩ ,   (7.2)

where we have used |u⟩ = 1̂ |u⟩. For an N-dimensional vector space, this equation has non-trivial solutions (i.e. |u⟩ ≠ |0⟩) if

det(Â − λ1̂) = 0,   (7.3)

which generates a polynomial of degree N, with N solutions for λ ∈ C. One can then find the eigenvectors by substituting a particular value of λ into the eigenvalue equation.


Exercise 7.1

A vector space V² has an orthonormal basis {|0⟩, |1⟩}. The Pauli operators σ̂_x, σ̂_y and σ̂_z have the following matrix representations with respect to the basis {|0⟩, |1⟩}:

σ_z = ( 1   0 )     σ_x = ( 0  1 )             σ_y = ( 0  −i )
      ( 0  −1 ) ,         ( 1  0 ) ,   and           ( i   0 ) .   (7.4)

Find the eigenvalues and eigenvectors for each of the Pauli operators.
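If you want to check your answers to Exercise 7.1 numerically, a NumPy sketch such as the following (our addition) will do it; eigh is used because the Pauli matrices are Hermitian:

import numpy as np

sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])

for name, sigma in [("z", sigma_z), ("x", sigma_x), ("y", sigma_y)]:
    # eigh returns real eigenvalues and orthonormal eigenvectors (as columns).
    vals, vecs = np.linalg.eigh(sigma)
    print(f"sigma_{name}: eigenvalues {vals}")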

It is worth noting that each distinct eigenvalue gives a distinct eigenvector. If an eigenvalue is repeated m > 1 times, we say that it is degenerate, and there may be up to m linearly independent eigenvectors corresponding to this eigenvalue.

7.1.1 Eigenvalues and eigenvectors of Hermitian operators


Let us consider the eigenvalues and eigenvectors of a Hermitian operator. This is a very special case
in physics, as Hermitian operators form the backbone of contemporary quantum mechanics.
Theorem 7.2: The eigenvalues of Hermitian operators

Let Â be a Hermitian operator, that is, Â† = Â. The eigenvalues of this operator are real, and its eigenvectors (corresponding to distinct eigenvalues) are orthogonal.

Proof. The eigenvalue equation for the operator is given by,

Â |u_i⟩ = λ_i |u_i⟩ .   (7.5)

Consider the matrix element of Â with respect to two eigenvectors,

⟨u_j| Â |u_k⟩ = λ_k ⟨u_j|u_k⟩ ,   (7.6)

and, using the Hermitian property,

⟨u_j| Â |u_k⟩ = ⟨u_j| Â† |u_k⟩ = (⟨u_k| Â |u_j⟩)* = λ̄_j ⟨u_j|u_k⟩ .   (7.7)

Equating the two expressions, we obtain,

(λ_k − λ̄_j) ⟨u_j|u_k⟩ = 0.   (7.8)

There are two cases to consider. For j = k, we must have λ_j = λ̄_j ∈ R, since ⟨u_j|u_j⟩ > 0 for all |u_j⟩ ≠ |0⟩. For j ≠ k, the eigenvalues are distinct, so λ_j ≠ λ_k and we must therefore have ⟨u_j|u_k⟩ = 0.

7.1.2 Eigenvalues and eigenvectors of unitary operators


The eigenvalues of unitary operators also have a rather interesting form:

Theorem 6.2: The eigenvalues of unitary operators

A unitary operator Û, which satisfies ÛÛ† = 1̂, has eigenvalue equation,

Û |u_j⟩ = λ_j |u_j⟩ ,   (7.9)

whose solutions satisfy the following conditions:

(1) |λ_j| = 1, with λ_j = e^{iθ_j} where θ_j ∈ R.

(2) If λ_j ≠ λ_k then ⟨u_j|u_k⟩ = 0.

(3) The eigenvectors of Û can be chosen to be an orthonormal basis for V^N.

Exercise 7.3
Prove the above theorem as an exercise.

7.2 Spectral representation and diagonalising operators


If Â is Hermitian or unitary, then the set of its eigenvectors {|u_j⟩}_{j=1}^{N} is an orthonormal basis, therefore we can write a completeness relation 1̂ = Σ_{j=1}^{N} |u_j⟩⟨u_j|. From the eigenvalue equation, we have

Â |u_j⟩ = λ_j |u_j⟩ ,   (7.10)

and using the completeness relation we have,

Â = Â1̂ = Σ_{j=1}^{N} Â |u_j⟩⟨u_j| = Σ_{j=1}^{N} λ_j |u_j⟩⟨u_j| .   (7.11)

This is called the spectral representation of Â. In matrix form, this corresponds to a diagonal matrix with matrix elements ⟨u_j| Â |u_k⟩ = λ_k ⟨u_j|u_k⟩ = λ_k δ_{jk}. In the {|u_j⟩}_{j=1}^{N} basis this yields,

             ( λ_1   0    · · ·   0  )
Â → A_diag = (  0   λ_2   · · ·   0  )
             (  ⋮    ⋮    · · ·   0  )
             (  0    0    · · ·  λ_N ) .   (7.12)

Given a matrix representation of Â in terms of the orthonormal basis {|e_j⟩}_{j=1}^{N}, which we denote A, we can transform this operator into its eigenbasis following the same procedure given in Sec. 6.3,

A_diag = TAT† ,   (7.13)

where the elements of the unitary matrix are given by (T)_{jk} = ⟨u_j|e_k⟩; equivalently, the j-th column of T† contains the components of the j-th eigenvector of Â in the {|e_j⟩} basis.
Exercise 7.4
Using the eigenvalues and eigenvectors found previously, write the Pauli operators in diagonal
form.
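The diagonalisation in Eq. 7.13 can also be illustrated numerically. In this sketch (ours, using σ_x as the example operator), the rows of T are the conjugated eigenvectors returned by NumPy:

import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

# Columns of `vecs` are the orthonormal eigenvectors |u_j> in the {|0>,|1>} basis.
vals, vecs = np.linalg.eigh(sigma_x)

# With (T)_{jk} = <u_j|e_k>, the rows of T are the conjugated eigenvectors,
# i.e. T is the adjoint of the matrix of eigenvector columns.
T = vecs.conj().T

A_diag = T @ sigma_x @ T.conj().T
print(np.round(A_diag.real, 10))  # diag(-1, 1): the eigenvalues on the diagonal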

7.2.1 Diagonalisation of commuting operators


Two operators are said to commute if ÂB̂ = B̂Â. If we consider the eigenvalue equation of B̂, given by

B̂ |u⟩ = λ |u⟩ ,   (7.14)

then for commuting operators we have

ÂB̂ |u⟩ = λ Â |u⟩ ,
⇒ B̂ (Â |u⟩) = λ (Â |u⟩).   (7.15)

So Â |u⟩ is also an eigenvector of B̂ with eigenvalue λ. The implications of this statement depend on whether the eigenvalue is non-degenerate or degenerate:

(1) Non-degenerate: If λ is non-degenerate, then there is only one eigenvector |u⟩ associated to the eigenvalue λ. This means that Â |u⟩ = µ |u⟩, where µ is a scalar. Therefore |u⟩ is a simultaneous eigenvector of B̂ and Â.

(2) Degenerate: If λ has degeneracy m > 1, with corresponding eigenvectors {|u_j⟩}_{j=1}^{m}, then any linear combination of these eigenvectors is still an eigenvector of B̂. The operator Â acts within this subspace, and it is possible to find a basis that simultaneously diagonalises Â and B̂.

7.3 Further reading


For follow-up work, it is absolutely crucial that you are able to find the eigenvalues and eigenvectors
of an operator and matrix. If you would like a reminder of how to do this, see the week 0 content
folder, where there are some notes on linear algebra operations, as well as some pre-recorded videos of
eigenvalue problems. You may also want to look at:

• Riley, Hobson, and Bence, pgs 272-280.


Lecture 8
Functions of operators and the operator trace

Let’s start by doing some quantum mechanics: The time-dependent Schrödinger equation can be
written in vector form as,
@ | it
i~ = Ĥ | it . (8.1)
@t
This is precisely the same Schödinger equation you used in Introduction to QM, but we replace the
wavefunction with a ket, and Ĥ is the Hamiltonian operator. Since this is a simple first-order ODE,
the solution to this differential equation is then (which we will prove later),
iĤt/~
| it = e | i0 , (8.2)
⇣ ⌘
where we have an exponentiated the Hamiltonian. But how do we calculate exp  ? More generally,
how do we apply functions f to operators?
In this lecture we will,
• Apply a function to an operator using the spectral representation.
• Find the trace of an operator and show that it is basis independent.

8.1 Functions of operators


We know how to add and multiply operators (normally in matrix form); using these concepts, we can develop a power series to calculate a number of functions. Consider a function f(z) which has a Taylor series about the point z = 0; we would like to apply this function to the linear operator Â. By using the Taylor expansion for f, we have:

f(Â) = Σ_{n=0}^{∞} [f^{(n)}(0)/n!] Âⁿ ,   (8.3)

where f^{(n)}(0) = dⁿf/dxⁿ |_{x=0}, Âⁿ = ÂÂ···Â (n times), and Â⁰ = 1̂. If we now apply f(Â) to an eigenvector |u⟩ of Â with eigenvalue λ, then we have:

f(Â) |u⟩ = Σ_{n=0}^{∞} [f^{(n)}(0)/n!] Âⁿ |u⟩ = Σ_{n=0}^{∞} [f^{(n)}(0)/n!] λⁿ |u⟩ = f(λ) |u⟩ .   (8.4)

If we build up the matrix representation of f(Â) in the eigenbasis of Â in the usual way, then we obtain:

         ( f(λ_1)    0      · · ·    0     )
f(Â) →   (   0     f(λ_2)   · · ·    0     )
         (   ⋮       ⋮      · · ·    0     )
         (   0       0      · · ·  f(λ_N)  ) ,   (8.5)


i.e. we have a diagonal matrix with the function applied to the eigenvalues of Â along the diagonal. This is only valid if the series expansion of f(x) converges for all x = λ_j.

Let's do an example. Consider the 2D vector space C², with orthonormal basis {|0⟩, |1⟩}. The Pauli operator σ̂_x = |0⟩⟨1| + |1⟩⟨0| has a spectral decomposition,

σ̂_x = |+⟩⟨+| − |−⟩⟨−| ,   (8.6)

where |±⟩ = (|0⟩ ± |1⟩)/√2 are the eigenvectors of σ̂_x. We can define a rotation in C² as,

R̂(θ) = exp(−iσ̂_x θ),   (8.7)

where θ ∈ R is the angle of rotation. Using the spectral representation of σ̂_x, we have:

R̂(θ) = exp(−iσ̂_x θ) = exp(−iθ) |+⟩⟨+| + exp(iθ) |−⟩⟨−| .   (8.8)

Expanding |±⟩ into the {|0⟩, |1⟩} basis, we have:

R̂(θ) = cos(θ)(|0⟩⟨0| + |1⟩⟨1|) − i sin(θ)(|0⟩⟨1| + |1⟩⟨0|),   (8.9)

or in matrix form, in the {|0⟩, |1⟩} basis,

R̂(θ) → (   cos(θ)   −i sin(θ) )
        ( −i sin(θ)    cos(θ)  ) .   (8.10)

An alternative way to find the matrix representation of R̂(θ) would be to use the definition of the Taylor expansion:

R̂(θ) = Σ_{n=0}^{∞} [(−iθ)ⁿ/n!] σ̂_xⁿ ,   (8.11)

and note that σ̂_x² = 1̂, so that only two distinct operators appear in the expansion:

R̂(θ) = σ̂_x Σ_{n odd} (−iθ)ⁿ/n! + 1̂ Σ_{n even} (−iθ)ⁿ/n! = cos(θ)1̂ − i sin(θ)σ̂_x .   (8.12)

Both are valid approaches which lead to the same result. One advantage of the latter, however, is
that it does not require the knowledge of the eigenvectors of ˆx . This is useful from a computational
point of view, as finding the eigenvectors of an operator in N -dimensional space can be a compu-
tationally expensive task. Specifically, the computational complexity1 of eigen decomposition scales
as O(N 3 ). Therefore it is often advantageous to use the Taylor expansion and repeatedly multiply
matrices together to find the function of an operator.
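Both routes can be checked against a direct numerical matrix exponential. The sketch below (our addition; it assumes SciPy is available for expm) compares the closed form of Eq. 8.12 with the numerical exponential:

import numpy as np
from scipy.linalg import expm

theta = 0.7
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

# Direct matrix exponential of -i*theta*sigma_x ...
R_numeric = expm(-1j * theta * sigma_x)

# ... compared with the closed form cos(theta) 1 - i sin(theta) sigma_x.
R_analytic = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * sigma_x

print(np.allclose(R_numeric, R_analytic))  # True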
Exercise 8.1

Find the exponential f(Ĥ) = exp(−βĤ), where Ĥ is a Hermitian operator with eigenbasis {|ε_i⟩}_{i=1}^{N}, and β ∈ R.

As a final note, and a property that will be useful in your studies of quantum mechanics: if Â is a Hermitian operator, then the function

Û(θ) = exp(−iÂθ),
¹ i.e. the time taken for a computational task to run.

where θ ∈ R, is also unitary. This can easily be seen using the spectral representation of Â: writing Â = Σ_{j=1}^{N} λ_j |λ_j⟩⟨λ_j|, we have

Û(θ) = exp(−iÂθ) = Σ_{j=1}^{N} exp(−iλ_j θ) |λ_j⟩⟨λ_j| ,   (8.13)

and since the adjoint of this operator is,

Û†(θ) = Σ_{j=1}^{N} exp(iλ_j θ) |λ_j⟩⟨λ_j| ,   (8.14)

we then have that,

Û†(θ)Û(θ) = Σ_{j,k} e^{−i(λ_j − λ_k)θ} ⟨λ_k|λ_j⟩ |λ_k⟩⟨λ_j| = Σ_j |λ_j⟩⟨λ_j| = 1̂.   (8.15)

This is exactly the form of the time-evolution operator in quantum mechanics, where Â is the Hamiltonian operator and θ is the time.

8.2 The Trace


A final operation which is extremely useful in the study of linear operators is the trace:

Definition 8.1: The trace

Consider a linear operator Â which acts on the vector space V^N. For a given orthonormal basis {|e_j⟩}_{j=1}^{N}, the trace of Â is the sum of its diagonal matrix elements,

Tr(Â) = Σ_{j=1}^{N} ⟨e_j| Â |e_j⟩ .   (8.16)

The trace is independent of the choice of orthonormal basis. To see this, we consider a second orthonormal basis {|f_j⟩}_{j=1}^{N}, which allows us to write Â = Σ_{lm} Ã_{lm} |f_l⟩⟨f_m|, with Ã_{lm} = ⟨f_l| Â |f_m⟩. Taking the trace with respect to the basis |e_j⟩, we have:

Tr(Â) = Σ_{j=1}^{N} ⟨e_j| Â |e_j⟩ = Σ_{j=1}^{N} ⟨e_j| ( Σ_{lm} Ã_{lm} |f_l⟩⟨f_m| ) |e_j⟩
      = Σ_{lmj} Ã_{lm} ⟨e_j|f_l⟩ ⟨f_m|e_j⟩
      = Σ_{lmj} Ã_{lm} ⟨f_m|e_j⟩ ⟨e_j|f_l⟩
      = Σ_{lm} Ã_{lm} ⟨f_m| ( Σ_j |e_j⟩⟨e_j| ) |f_l⟩   (8.17)
      = Σ_{lm=1}^{N} Ã_{lm} ⟨f_m|f_l⟩
      = Σ_{l=1}^{N} ⟨f_l| Â |f_l⟩ .

So the trace is the same in both bases.


From the above definition we can derive several very important properties of the trace:

(i) The trace is linear, that is, for operators Â and B̂ we have

Tr(αÂ + βB̂) = αTr(Â) + βTr(B̂),   (8.18)

where α and β are scalars.

(ii) The trace of a dyad is given by the corresponding inner product,

Tr(|v⟩⟨u|) = ⟨u|v⟩ .   (8.19)

(iii) The trace of the product of operators  and B̂ is independent of the order of the product, that
is, Tr(ÂB̂) = Tr(B̂ Â). This is true even if the operators are non-commuting.

(iv) It follows that the trace of a product of three or more operators is invariant under cyclic permutations:

Tr(ÂB̂Ĉ) = Tr(B̂ĈÂ) = Tr(ĈÂB̂).   (8.20)

(v) The complex conjugate of a trace corresponds to the trace of the adjoint of its argument:

Tr(Â)* = Tr(Â†),
Tr(ÂB̂Ĉ)* = Tr(Ĉ†B̂†Â†),   (8.21)

where in the latter case, we have taken care to reverse the order of the operators in the product.

Exercise 8.2
Prove properties (i-v) of the trace.

If we have the eigenvalues and eigenvectors of an operator, then we can find a very simple expression for the trace. Consider the spectral representation of a Hermitian operator Â, with normalised eigenvectors {|λ_j⟩}_{j=1}^{N} and corresponding eigenvalues λ_j. Using the definition of the trace, we have:

Tr(Â) = Σ_{j=1}^{N} ⟨λ_j| Â |λ_j⟩ = Σ_{j=1}^{N} λ_j ⟨λ_j|λ_j⟩ = Σ_{j=1}^{N} λ_j .   (8.22)

So the trace of an operator is simply the sum of its eigenvalues.
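A small NumPy sketch (ours, with a randomly generated Hermitian matrix) illustrating that the trace is basis independent and equals the sum of the eigenvalues:

import numpy as np

rng = np.random.default_rng(1)

# A random Hermitian operator on C^4.
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (M + M.conj().T) / 2

# A random unitary Q defines a second orthonormal basis.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
A_rotated = Q @ A @ Q.conj().T

print(np.trace(A).real)
print(np.trace(A_rotated).real)        # same value: trace is basis independent
print(np.sum(np.linalg.eigvalsh(A)))   # same value: sum of the eigenvalues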



Exercise 8.3

The quantum harmonic oscillator (QHO) is described by the Hamiltonian Ĥ = ℏω(n̂ + 1/2). The Hermitian operator n̂ is often called the number operator, as its eigenvectors {|n⟩}_{n=0}^{∞} correspond to the number of excitations in the QHO, and satisfy the eigenvalue equation n̂ |n⟩ = n |n⟩.

(a) Show that Ĥ is diagonal in the eigenbasis of n̂.

(b) A QHO in thermal equilibrium can be described by the Gibbs state, ρ̂ = e^{−βĤ} / tr(e^{−βĤ}). Show that

tr(e^{−βĤ}) = e^{−βℏω/2} / (1 − e^{−βℏω}).

[Hint: You will need to make use of the formula for a geometric series, Σ_{n=0}^{∞} rⁿ = 1/(1 − r) for r < 1.]

(c) The average energy of the QHO while in thermal equilibrium can be found using the trace formula, ⟨Ĥ⟩ = tr(Ĥρ̂). Show that,

⟨Ĥ⟩ = ℏω / (e^{βℏω} − 1) + ℏω/2.

This is a formula you will come across a number of times in your statistical mechanics course. [Hint: you can use the identity −(1/ℏω) ∂/∂β Σ_{n=0}^{∞} e^{−βnℏω} = Σ_{n=0}^{∞} n e^{−βnℏω}.]

8.3 Further reading


To follow up on this section's material, check out:

• Shankar pg 54-57.

• Riley et al. pg 258.


Lecture 9
Functions as vectors

Over the past 8 lectures we have made nebulous references to functions being linked to vectors and
linear algebra. In this lecture we will make this link explicit and formalise the relationship between
functions and linear algebra.
Specifically, by the end of this lecture you should be able to:

• Appreciate that functions can be considered as vectors in a vector space.

• Write a function in terms of a set of basis functions.

• Define the inner product of function space.

• Define the concept of completeness in function space and the coordinate representation.

• Define linear operators in function space.

9.1 Handwaving to a vector space of functions


At the start of this course we showed that real valued functions form a vector space. In fact, throughout
your undergrad we have made a number of nebulous references to functions being linked to vectors and
linear algebra. For example, orthogonality, operators, eigenvalues and their associated eigenfunctions,
are all concepts you touched upon in Fourier analysis, the study of differential equations, and quantum
mechanics. In this chapter we will make the link between functions and linear algebra explicit.

[Figure 9.1.1: A string clamped between two points A and B. The continuous function of the amplitude ψ(x) pictured in (a) can be approximated by discrete points shown in (b).]

Let’s consider an example of a vibrating string clamped between two point A and B, given in
Fig. 9.1.1. At a given instant, the oscillation of this string is described by the continuous function (x)
for x 2 [A, B]. A natural way to store this function in a computer would be discretise the x axis into
N pieces, and evaluate function at each point xn = n x, where x = (A + B)/N . The result will be

53
Chapter 9. Functions as vectors 54

the N -component vector, 0 1


(x0 )
B (x1 ) C
B C
B .. C. (9.1)
@ . A
(xN )
This vector can be considered an approximation to the function . The more precisely we wish to
represent the smaller we take x, and the larger the dimension of the vector. Taking this analogy
to it’s extreme and considering infinitesimal x will result in an infinite dimensional vector. This is
the intuition that underpins the description of continuous functions as vectors.
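To make the discretisation picture concrete, here is a minimal NumPy sketch (our own illustration; the sine profile is just an example amplitude):

import numpy as np

# Discretise a "string amplitude" psi(x) on [A, B] into an N-component vector.
A, B, N = 0.0, 1.0, 100
x = np.linspace(A, B, N + 1)

def psi(x):
    # Example amplitude satisfying the clamped boundary conditions psi(A) = psi(B) = 0.
    return np.sin(np.pi * (x - A) / (B - A))

psi_vec = psi(x)          # the finite-dimensional stand-in for the function
print(psi_vec.shape)      # (101,)

# Increasing N gives a better approximation: the function is the N -> infinity limit.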
It is worth mentioning at this point that the study of functions as vectors is fraught with mathe-
matical difficulties. When replacing the discrete finite dimensional vectors which we typically consider
in linear algebra with continuous functions of infinite dimension, we have to be cautious about the con-
vergence of sums and products. We will mostly dance around this issue, and apply a physicist's natural blasé attitude towards infinities. However, for those of you who find this approach unsatisfactory, I would direct you to the branch of mathematics known as functional analysis, which deals with these issues in a rigorous fashion.

9.2 Representing functions as vectors


The loose analogy above allows us to make a concrete correspondence between vectors and functions:
If we consider a function f to be a vector, then f (x) is a component of the vector, and x plays the role
of an index. Unless otherwise specified, we will refer to vector spaces of functions as function spaces,
denoted with the symbol F, so the expression f 2 F tells us f is part of the function space F.
As in the discrete linear algebra case, to go further, we need to define a scalar/inner product:

Definition 9.1: The inner product of function space

For two functions f and g which are defined on the interval x ∈ [a, b], the inner product can be quite naturally defined as

⟨f|g⟩ = ∫_a^b f̄(x) g(x) dµ(x),   (9.2)

where dµ(x) is called the measure of the function space. The choice of this measure can vary significantly depending on the function space considered.

For example, in the case of applications in quantum mechanics, we have dµ(x) = dx, and the inner product of two wavefunctions ψ(x) and φ(x) is:

⟨ψ|φ⟩ = ∫_a^b ψ̄(x) φ(x) dx.   (9.3)

This will be the measure that we most often choose.
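Numerically, this inner product is just an integral, which we can approximate by a simple Riemann sum. The following sketch (ours, not from the notes) checks the orthogonality of two sine modes on [0, 1]:

import numpy as np

# Numerical check that sin(pi x) and sin(2 pi x) are orthogonal on [0, 1]
# with measure dmu(x) = dx.
a, b, N = 0.0, 1.0, 10_000
x = np.linspace(a, b, N, endpoint=False)
dx = (b - a) / N

def inner(f, g):
    # <f|g> = integral of conj(f(x)) * g(x) dx, approximated on the grid.
    return np.sum(np.conj(f(x)) * g(x)) * dx

f = lambda x: np.sin(np.pi * x)
g = lambda x: np.sin(2 * np.pi * x)

print(inner(f, g))  # ~0: the two functions are orthogonal
print(inner(f, f))  # ~0.5: the squared norm of sin(pi x) on [0, 1]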


However, there are alternative choices. For example, the Hermite polynomials are orthogonal with respect to an inner product with measure dµ(x) = e^{−x²} dx. If an alternative measure is used, then this will be clearly stated from the outset.
As this is an inner product, it must also satisfy

‖f‖² = ⟨f|f⟩ = ∫_a^b |f(x)|² dµ(x) ≥ 0,   (9.4)

which is 0 only if f(x) = 0 ∀ x. If ‖f‖² is finite, then we say that f is square integrable. Vector spaces of functions with an inner product defined, where all functions have finite norm, are of such importance
to physics that they have a special name, they are called Hilbert spaces. In fact, last semester you
made extensive use of Hilbert spaces in quantum mechanics!
Take the time to check for yourselves that the above definition of an inner product satisfies the same requirements as in Sec. 3.1, i.e., it is linear in the second argument, antilinear in the first, and ⟨f|g⟩ = ⟨g|f⟩*.
The above definition of an inner product allows us to naturally extend the notion of orthogonality
to functions: Two functions f and g are orthogonal if hf |gi = 0.
Note: This is the first area where we have to be cautious about convergence. If the domain of integration is infinite (i.e. a or b = ±∞) then f and g must tend to zero sufficiently quickly as |x| → ∞ for the inner product to be finite. This is a similar requirement to the one we have for the existence of Fourier transforms.

9.2.1 Basis functions and completeness


Note: For the rest of these notes, we will take dµ(x) = dx. This is mainly due to convenience. All the concepts and definitions can be straightforwardly generalised to an arbitrary measure dµ(x).
In order to do anything interesting with vectors in the previous sections, it was necessary to define a set of basis vectors. This can be extended to the case of functions: if a function f ∈ F, defined on the domain x ∈ [a, b], can be represented by the set of basis functions {u_n(x), x ∈ [a, b]}_{n=1}^{∞}, then we may write

f(x) = Σ_{n=1}^{∞} f_n u_n(x),   (9.5)

where the coefficients f_n are defined as in Sec. 3.1 through the inner product,

f_n = ⟨u_n|f⟩ = ∫_a^b ū_n(x) f(x) dx.   (9.6)

Note that Eq. 9.5 is written in terms of the components of f, since we reference some value of x. We could instead make the vector nature of f more explicit by writing the above expansion in Dirac notation, |f⟩ = Σ_{n=1}^{∞} f_n |u_n⟩.
While the language we have used to describe the expansion in Eq. 9.5 may be somewhat new to you, the idea of expanding one function in terms of a set of functions should be quite familiar. Here are two examples you should have encountered before:

Example 1. Consider the wavefunctions of the quantum harmonic oscillator, u_n(x) = A_n H_n(x) e^{−x²/2}, where H_n(x) are the Hermite polynomials and A_n is a normalisation constant. These functions form an orthonormal basis for the space of square-integrable functions on the domain x ∈ (−∞, ∞), and we can write the wavefunction for a general state of the quantum harmonic oscillator as ψ(x) = Σ_n α_n u_n(x), where α_n is a probability amplitude.

Example 2. Consider the set of periodic functions on [−π, π]. The set of plane waves u_n(x) = e^{inx}/√(2π) can be used as an orthonormal basis:

⟨u_n|u_m⟩ = (1/2π) ∫_{−π}^{π} e^{i(m−n)x} dx = { 1 for n = m ; 0 for n ≠ m } .   (9.7)

Now, a set of basis functions is said to be complete if any function in the space can be represented by the basis; this necessarily implies the existence of a completeness relation. For example, consider the functions f, g ∈ F. If the set of basis functions {u_n}_{n=1}^{∞} is a complete orthonormal basis for F, then this implies

f(x) = Σ_{n=1}^{∞} f_n u_n(x)   and   g(x) = Σ_{n=1}^{∞} g_n u_n(x),   (9.8)

with f_n = ⟨u_n|f⟩ and g_n = ⟨u_n|g⟩. Taking the inner product, we then have,

⟨f|g⟩ = Σ_{n=1}^{∞} f̄_n g_n = Σ_{n=1}^{∞} ⟨f|u_n⟩ ⟨u_n|g⟩ = ⟨f| ( Σ_{n=1}^{∞} |u_n⟩⟨u_n| ) |g⟩ ,   (9.9)

which implies that we have:

Σ_{n=1}^{∞} |u_n⟩⟨u_n| = 1̂,   (9.10)
which is the same expression we have in the discrete case.
Note: It is important to note that this is not a proof of the completeness relation, but rather a consequence of our assumption that the basis in question is complete. Proving that a basis is complete is a challenging issue for infinite-dimensional vector spaces, and we shall restrict ourselves to situations where we are gifted completeness, such as the quantum harmonic oscillator basis functions.
Aside: Notice that we have written the above completeness relation in terms of Dirac notation. This is intentional, as this form will be most useful in your studies of quantum mechanics. We can alternatively write the completeness relation in a more functional form as:

Σ_{n=1}^{∞} u_n(x) ū_n(y) = δ(x − y),   (9.11)

where δ(x) is the Dirac δ-function. Though this expression looks more intimidating with the presence of the δ-function, it is equivalent to the completeness relations we have studied previously. To see this, consider the components of the identity operator we used in the discrete case, (1̂)_{ij} = δ_{ij}, where the Kronecker-δ is 1 when i = j, and 0 for all other values. Now, recall that for function spaces u_n is a vector and u_n(x) is a specific component of u_n labelled by x. So we can understand Eq. 9.11 as giving the components of the identity operator in function space, where δ(x − y) is zero everywhere except when x = y. There is a complication here, in that when x = y the δ-function is ill-defined unless it appears under an integral sign; however, the intuition holds.

9.3 The coordinate representation


A particularly important basis that we have been using implicitly already is referred to as the coordinate representation. We regarded the values of the function f(x) as the components of the vector, with implicit reference to a basis defined by the coordinate x. Let us be more explicit: if we consider the basis vector |x⟩, which we call a position vector, then we can write the function f ∈ F as

|f⟩ = ∫_a^b dx f(x) |x⟩ ,   (9.12)

where, rather than a sum over the basis, we have used an integral since x is a continuous variable. This definition allows us to straightforwardly identify,

f(x) = ⟨x|f⟩ = ∫_a^b δ(y − x) f(y) dy,   (9.13)

where we have noticed that the δ-function satisfies the requirements to recover f(x) from the inner product. So the Dirac δ-function plays the role of the basis function for position vectors. We will generally stick with the Dirac notation |x⟩ for the basis vector, as manipulating δ-functions is fraught with problems since they are not well defined outside of integrals. Ignoring these strict mathematical considerations, we can define the overlap between two position vectors as,

⟨x|x′⟩ = ∫_a^b δ(y − x) δ(y − x′) dy = δ(x − x′),   (9.14)

if and only if x, x′ ∈ [a, b].


Since, by construction, any function in F can be written in the position basis |x⟩, this basis is complete. We can therefore write the completeness relation,

∫_a^b |x⟩⟨x| dx = 1̂.   (9.15)

Note: The basis |x⟩ is not square integrable, since ⟨x|x⟩ = δ(x − x) = δ(0) = ∞. So the basis |x⟩ is not normalisable!

Aside: We can see the equivalence with the functional form of the completeness relation given in Eq. 9.11 explicitly. If we consider the matrix elements of 1̂ with respect to the position basis, ⟨x| 1̂ |y⟩ = ⟨x|y⟩ = δ(x − y).

9.4 Operators in function space


It is also possible to define linear operators in function space: a linear operator  maps a function
|f i 2 F to another function |gi = Â |f i 2 F 0 , where in general the function spaces can be different
before and after the mapping.
A simple example that you have come across in quantum mechanics is the position operator X̂. This operator is defined as the operator whose eigenvectors are the position basis states |x⟩, such that X̂ |x⟩ = x |x⟩. This implies that for a function f we find the following equivalence:

|g⟩ = X̂ |f⟩   →   g(x) = x f(x),   (9.16)

where the arrow indicates that g(x) is the representation of |g⟩ with respect to the basis |x⟩. We can show this by considering the components of g(x),

g(x) = ⟨x|g⟩ = ⟨x| X̂ |f⟩ = ∫_a^b dx′ ⟨x| X̂ |x′⟩ ⟨x′|f⟩ = ∫_a^b dx′ x′ ⟨x|x′⟩ ⟨x′|f⟩ = x ⟨x|f⟩ ,   (9.17)

where we have used ⟨x|x′⟩ = δ(x − x′). Alternatively, since |x⟩ is an eigenstate of X̂, we could write the spectral decomposition of X̂ as,

X̂ = ∫_a^b x |x⟩⟨x| dx.   (9.18)

Exercise 9.1

Show the spectral decomposition of X̂ given in Eq. 9.18. Demonstrate that X̂ is a hermitian
operator.

Another important class of linear operators in function spaces are differential operators, e.g.

|g⟩ = K̂ |f⟩   →   g(x) = Σ_n h_n(x) dⁿf(x)/dxⁿ,   (9.19)

in the |x⟩ representation, where h_n(x) are arbitrary functions of x. Check that the above operator satisfies the definitions of linearity.
As an example, let's consider a linear operator D̂, whose action on f is to take the first derivative, that is,

|g⟩ = D̂ |f⟩   ↔   g(x) = df(x)/dx.   (9.20)

An interesting property of the operator D̂ can be seen by considering the inner product:

⟨h|g⟩ = ⟨h| D̂ |f⟩ = ∫_a^b h̄(x) g(x) dx,   (9.21)

where h is some arbitrary function. Using Eq. 9.20 and integrating by parts, we see that

⟨h|g⟩ = ∫_a^b h̄(x) (df(x)/dx) dx = [h̄(x)f(x)]_a^b − ∫_a^b f(x) (dh̄(x)/dx) dx = −⟨f| D̂ |h⟩* = −⟨h| D̂† |f⟩ ,

where we have assumed the functions vanish at the boundaries x = a and x = b, and the last step follows from the definition of the adjoint. From this we see that D̂† = −D̂, so the differential operator is anti-Hermitian, also referred to as skew-Hermitian. A Hermitian operator can be constructed from D̂ using:

K̂ = −iD̂,   (9.22)

which should look very familiar as the momentum operator you used in quantum mechanics!

9.5 Further reading


Further reading on the topic of viewing functions as vectors can be found from pg 57 onwards in
Shankar. Problem sheet 3 covers functions as vectors in detail.
Lecture 10
Changing basis and the momentum representation

In the last lecture, we showed that functions can be readily represented as vectors in a vector space,
defining the concept of basis functions and differential operators in the language of linear algebra. In
this lecture we will show how to change basis and introduce the momentum representation of functions,
which complements the coordinate representation introduced last lecture.
By the end of this lecture you should be able to:
• Change the basis used to represent a function.
• Define the momentum representation of a function, and change between coordinate and momen-
tum representation.
• Define the spectral representation of a linear operator in the momentum representation.

10.1 Changing basis


As in the case of discrete vector spaces, we can change the basis used to represent a function. Consider two sets of complete orthonormal bases {|u_n⟩}_{n=1}^{∞} and {|v_n⟩}_{n=1}^{∞} for the function space F, where we have assumed these basis functions are discrete. An arbitrary function f ∈ F can be written in terms of either of these basis sets:

|f⟩ = Σ_{n=1}^{∞} f_n |u_n⟩ = Σ_{n=1}^{∞} c_n |v_n⟩ ,   (10.1)

where f_n = ⟨u_n|f⟩ = ∫_a^b ū_n(x) f(x) dx and c_m = ⟨v_m|f⟩ = ∫_a^b v̄_m(x) f(x) dx. As in the discrete case, the function itself is left unchanged by the choice of basis; however, the representation of that function becomes basis dependent. We can transform between representations using,

c_m = Σ_n f_n ⟨v_m|u_n⟩ .   (10.2)

If the basis chosen to represent a function is a continuous variable, as in the position/coordinate


representation, then we simply replace the summations above with integrals over the relevant basis
vectors.
For example, if we are interested in a complete and orthonormal basis |ki, where k is a continuous
variable, then the resolution of the identity is defined as,
Z b
1̂ = d k |kihk| . (10.3)
a

Inserting this resolution of the identity into the coordinate representation of f , we obtain,
Z b Z b ✓Z b ◆ Z b
|f i = dxf (x)1̂ |xi = dk dxf (x) hk|xi |ki = dk f˜(k) |ki . (10.4)
a a a a


where we have defined,

f̃(k) = ∫_a^b dx f(x) ⟨k|x⟩ .   (10.5)

10.2 Momentum representation


From this point on, we will take a = −∞ and b = ∞, and assume that all f ∈ F satisfy f(x) → 0 as |x| → ∞. A very special type of basis transformation emerges when we consider the eigenstates of the momentum operator K̂ = −iD̂, defined as,

K̂ |k⟩ = k |k⟩ .   (10.6)

Recall that K̂ is Hermitian, and therefore the eigenvalues are real, k ∈ R, and the eigenvectors are orthogonal, satisfying ⟨k|k′⟩ = δ(k − k′). Applying the position-basis bra, we have:

⟨x| K̂ |k⟩ = k ⟨x|k⟩ ,   (10.7)

and, using the adjoint, we also obtain:

⟨x| K̂ |k⟩ = (⟨k| K̂ |x⟩)* = −i dφ_k(x)/dx,   (10.8)

where φ_k(x) = ⟨x|k⟩. Combining these expressions, we obtain the ordinary differential equation:

dφ_k(x)/dx = i k φ_k(x)   ⟹   ⟨x|k⟩ = φ_k(x) = N e^{ikx},   (10.9)
where N is a constant of normalisation. Substituting this expression back into Eq. 10.5, we have:

f̃(k) = N ∫_{−∞}^{∞} dx f(x) e^{−ikx} ,   (10.10)

which is simply the Fourier transform of the function f(x)!

Exercise 10.1

Use the orthonormality of the momentum eigenstates |k⟩ to show that N = 1/√(2π).

We can therefore define two completely equivalent representations of the function f as,

|f⟩ = 1̂ |f⟩ = ∫_{−∞}^{∞} dx f(x) |x⟩ ,   (10.11)
|f⟩ = 1̂ |f⟩ = ∫_{−∞}^{∞} dk f̃(k) |k⟩ .   (10.12)

In the first expression the expansion coefficients are the values of f(x); we call this the position representation. In the second, the expansion coefficients are the values of f̃(k), and we refer to this as the momentum representation.

10.2.1 Parseval's theorem

You may recall from your Maths of Waves and Fields course two important theorems. The first is the multiplication theorem, which states that for two functions f(x) and g(x), with corresponding Fourier transforms f̃(k) and g̃(k) respectively, we have:

∫_{−∞}^{∞} f*(x) g(x) dx = ∫_{−∞}^{∞} f̃*(k) g̃(k) dk.   (10.13)

On first acquaintance, this theorem probably seems slightly mysterious. However, with the machinery of vector spaces, it has a clear interpretation: the inner product of two functions is invariant under a change of basis. This becomes even clearer when we consider Parseval's theorem: for g(x) = f(x), we have

∫_{−∞}^{∞} |f(x)|² dx = ∫_{−∞}^{∞} |f̃(k)|² dk,   (10.14)

which essentially states that the norm of a vector f is independent of the choice of basis.
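Parseval's theorem is easy to verify numerically with a discrete Fourier transform. The sketch below (our addition; the Gaussian and the grid are arbitrary choices, and the FFT is only an approximation to the continuous transform) compares the two norms:

import numpy as np

# Sample a square-integrable function (a Gaussian) on a wide grid.
N = 2**12
x = np.linspace(-50, 50, N, endpoint=False)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)

# Approximate f~(k) = (1/sqrt(2 pi)) * integral f(x) e^{-ikx} dx with the FFT.
f_tilde = dx / np.sqrt(2 * np.pi) * np.fft.fft(f)
dk = 2 * np.pi / (N * dx)

# Parseval's theorem: the norm is the same in either representation.
print(np.sum(np.abs(f)**2) * dx)        # ~ sqrt(pi)
print(np.sum(np.abs(f_tilde)**2) * dk)  # same value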

Exercise 10.2

Using the definition of the inner product in function space, ⟨f|g⟩ = ∫_{−∞}^{∞} f*(x) g(x) dx, prove the multiplication theorem.

10.3 Operators in momentum representation


Now let us consider the representation of operators in the basis of momentum eigenstates. The momentum operator is trivial,

⟨k| K̂ |f⟩ = (⟨f| K̂ |k⟩)* = k ⟨k|f⟩ ,   (10.15)

and this leads to a spectral decomposition of the operator as,

K̂ = ∫_{−∞}^{∞} dk k |k⟩⟨k| .   (10.16)

Notice the similarities with the spectral decomposition of the position operator X̂ in the position
representation.
Now things are a little more interesting for the position operator:

⟨k| X̂ |f⟩ = ∫_{−∞}^{∞} ⟨k|x⟩ ⟨x| X̂ |f⟩ dx = (1/√(2π)) ∫_{−∞}^{∞} x f(x) e^{−ikx} dx = i (d/dk) f̃(k).   (10.17)

So in the momentum representation the position operator looks just like the momentum operator does in position space!

10.4 Further reading


The material in this lecture is covered at length in Shankar pg 57 onwards. For practice in manipulating
functions in the momentum representation, take a look at problem sheet 3.
Lecture 11
Complex numbers

It is hard to emphasise just how important complex numbers are to modern physics and mathematics.
Throughout your first and second year courses, you have no doubt represented classically oscillating systems in terms of complex exponentials, e.g. in simple harmonic motion. Then in quantum mechanics, complex numbers began to take a more central role, appearing in linear operators and in the wavefunction of a system.
analysis, and demonstrate just how powerful complex numbers can be. This will culminate in our
proof of Cauchy’s theorem—quite possibly the most beautiful theorem in mathematics— where real
integrals that are impossible to solve analytically with standard tools from real analysis, are evaluated
by simply counting the number of infinities in the complex plane. But before we get to this, we have
some work to do and tools to develop.
In this lecture we will:

• Revise the basic properties of complex numbers, and their polar representation.

• Introduce the fundamental theorem of algebra in terms of complex numbers.

• Define the concept of branch points and branch cuts of multivalued functions.

11.1 Complex numbers


A complex number z can be written as

z = x + iy,   where x, y ∈ R,

where we have introduced the imaginary unit, i² = −1, and we write z ∈ C if a number is complex. We will often see x = Re(z) and y = Im(z), which define the real and imaginary parts respectively. Two complex numbers can be added:

(x + iy) + (a + ib) = (x + a) + i(y + b),

as well as multiplied:

(x + iy)(a + ib) = (xa − yb) + i(ya + xb).

We also have the complex conjugate z̄ = x − iy, which allows us to define the modulus of a complex number:

|z|² = z z̄ = x² + y² ≥ 0.   (11.1)


[Figure 11.1.1: (a) An Argand diagram for the complex number z = x + iy. (b) An Argand diagram for a product of complex numbers; note that the magnitude of p has not been drawn to scale.]

11.1.1 Polar representation and the argand diagram


It is very useful to write a complex number in its polar representation,

z = r e^{iθ},    (11.2)

where r = √(x² + y²) is the magnitude of z, and θ = arg(z) = arctan(y/x) is its argument (or sometimes phase). Graphically, we can represent a complex number using an Argand diagram (sometimes simply referred to as the complex plane), as shown in Fig. 11.1.1.
Using the polar representation, multiplication of complex numbers has a simple geometric interpretation: if z = r e^{iθ} and w = ρ e^{iφ}, then the product of the two is

p = zw = (r e^{iθ})(ρ e^{iφ}) = rρ e^{i(θ+φ)},

where the magnitude of the new complex number is |p| = rρ and the argument is θ + φ. We can also write a complex exponential in terms of Euler's identity,

z = r e^{iθ} = r cos(θ) + i r sin(θ).    (11.3)

Combining the above representations with the properties of the exponential function yields de Moivre's theorem:

zⁿ = rⁿ e^{inθ} = rⁿ (cos(nθ) + i sin(nθ)).    (11.4)

It is important to note that, from the polar representation of z, the argument of z is not unique, since

e^{iθ + 2πi} = e^{2πi} e^{iθ} = e^{iθ}.

So we have arg(z) = θ + 2mπ, where m ∈ ℤ. To get around this multi-valued nature of the argument, we define the principal value of arg(z), which is restricted to the range −π < θ ≤ π. The argument restricted to this range will be denoted Arg(z).
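As a small numerical illustration (not part of the original notes), the snippet below computes the modulus and principal argument of a sample complex number and checks de Moivre's theorem; Python's cmath.phase returns exactly the principal value Arg(z) defined above.

import cmath, math

z = 1 + 1j
r, theta = abs(z), cmath.phase(z)        # r = sqrt(2), Arg(z) = pi/4
print(r, theta)

n = 6
lhs = z**n
rhs = r**n * (math.cos(n*theta) + 1j*math.sin(n*theta))   # de Moivre's theorem
print(abs(lhs - rhs) < 1e-12)            # True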

11.2 Fundamental theorem of Algebra


With some of the basics covered, we can state our first theorem in complex analysis (and it's an important one!):

Theorem 11.1: Fundamental Theorem of Algebra

Given any positive integer n ≥ 1 and any choice of complex numbers a₀, a₁, …, aₙ such that
aₙ ≠ 0, the polynomial equation

P(z) = aₙ zⁿ + ⋯ + a₁ z + a₀ = 0,    (11.5)

has at least one solution z ∈ ℂ.

Put another way, the polynomial P(z) has n (not necessarily distinct) roots, allowing us to write

P(z) = aₙ (z − z₁)(z − z₂) ⋯ (z − zₙ),    (11.6)

where {z_k ∈ ℂ}_{k=1}^{n} are the roots of P(z) = 0. Although this may appear obvious, or perhaps not
very interesting on first acquaintance, it is worth taking a moment to reflect on just how remarkable
this statement is: if we were to restrict the coefficients aₖ to the reals, then there are clearly innumerable
situations where the roots of P(z) cannot be expressed as real numbers. Similarly, if the aₖ are taken
to be rational, then it is perfectly possible to get irrational roots, e.g. for P(z) = z² − 2 the roots are
±√2. Clearly complex numbers are a rather special set of numbers, with the roots of P(z) belonging
to the same field as the polynomial. Unfortunately, we are not quite equipped to prove this theorem
at the moment; we will tackle it later, but it is sufficiently important for us to state it here without
proof.

Exercise 11.1:
Consider the polynomial P(z) = aₙ zⁿ + ⋯ + a₁ z + a₀. If the coefficients aₖ ∈ ℝ for all k, prove that
the roots are either real or occur in complex conjugate pairs (i.e. if z_k is a root then so is its complex
conjugate z̄_k).

As an example of the fundamental theorem of algebra, consider the polynomial z⁶ − 1 = 0. If we use de Moivre's theorem, then we obtain

z⁶ = r⁶ e^{i6θ} = r⁶ (cos(6θ) + i sin(6θ)) = 1.

So the imaginary part is given by r⁶ sin(6θ) = 0, so θ = mπ/6 where m ∈ ℤ. The real part can then be found as r⁶ cos(mπ) = r⁶ (−1)^m = 1, which implies that r = 1 with even m = 2p, for p ∈ ℤ. The values of the argument then read

θ = pπ/3 = 0, π/3, 2π/3, π, 4π/3, 5π/3,

and plugging this back in, we have

z_p = e^{ipπ/3} = 1, e^{iπ/3}, e^{i2π/3}, −1, e^{i4π/3}, e^{i5π/3}.

These are the six 6th roots of unity. The Argand diagram given in Fig. 11.2.1 shows the roots, which lie on the unit circle.

Figure 11.2.1: The 6th roots of unity.
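A quick numerical check (not part of the original notes): the six numbers e^{ipπ/3} all satisfy z⁶ = 1, and numpy's polynomial root finder recovers exactly the same set of points.

import numpy as np

roots = np.exp(1j * np.pi * np.arange(6) / 3)            # e^{i p pi/3}, p = 0..5
print(np.allclose(roots**6, 1.0))                        # True: each satisfies z^6 = 1

numerical = np.roots([1, 0, 0, 0, 0, 0, -1])             # coefficients of z^6 - 1
print(np.allclose(np.sort_complex(numerical), np.sort_complex(roots)))  # True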

11.3 Branches and branch points


Consider the function w = z^{1/2}. We can quite naturally express this function using the polar representation of z, yielding

w = r^{1/2} e^{iθ/2}.

If we plot this on an Argand diagram, we see that a point in the complex plane maps to half the angle, with its magnitude reduced to the square root of |z|.
More interesting things happen if we consider the action of the function z^{1/2} on all points in the complex plane. Before applying the function, z can be defined anywhere in the complex plane, as indicated by the hatched area in Fig. 11.3.1.
Now applying the function, and assuming we take the principal range for the arguments, i.e. −π < Arg(z) ≤ π, the new complex variable w has argument −π/2 < Arg(w) ≤ π/2. After applying the function, the new complex number is defined only for Re(w) ≥ 0, i.e. only half the complex plane, as shown in Fig. 11.3.1. Clearly this is not the full picture: what happened to the negative square root?
This is a consequence of not being sufficiently general with our definition of z: we need to use the definition

z = r e^{iθ + 2nπi}   where n ∈ ℤ,    (11.7)

which generalises z without changing its value, since e^{2nπi} = 1 for all integer n.
Using this definition, we then have

w(z) = z^{1/2} = r^{1/2} e^{iθ/2 + nπi}.

Since e^{nπi} = −1 for odd values of n, and e^{nπi} = 1 for even n, we find that the square root has two branches:

w_even(z) = r^{1/2} e^{iθ/2},
w_odd(z) = r^{1/2} e^{iθ/2 + iπ} = −w_even(z).    (11.8)

Figure 11.3.1: (Top) The full domain. (Bottom) The mapped domain.

So the square root function is multi-valued, i.e. every value of z ∈ ℂ can be mapped to two different points in the w-plane. The exception to this occurs at the origin z = 0. This is a special point, as w_even(z = 0) = w_odd(z = 0) = 0, known as a branch point, where the function is discontinuous.
Formally, we define a branch point as:

Definition 11.1: Branch point

A point z0 is a branch point of a multivalued function f (z) if the value of f (z) does not return
to its initial value as a closed circuit is traced out around that point in such a way that f varies
continuously along the circuit.

Let's apply this definition to the function f(z) = z^{1/2}. Take a small circuit in the complex plane, z = ε e^{iθ} for −π ≤ θ ≤ π, where ε is very small. This defines a circle of radius ε centred about the point z = 0. The function z^{1/2} has two possible values:

z^{1/2} = ε^{1/2} e^{iθ/2}   (branch 1),
z^{1/2} = ε^{1/2} e^{i(θ/2 + π)}   (branch 2).

Considering branch 1: as we cross the line separating θ = π and θ = −π, the circuit becomes discontinuous:

θ = π  ⇒  z^{1/2} = ε^{1/2} e^{iπ/2} = i ε^{1/2},
θ = −π  ⇒  z^{1/2} = ε^{1/2} e^{−iπ/2} = −i ε^{1/2}.    (11.9)

Therefore, z = 0 is a branch point for z^{1/2}. So we say that z^{1/2} has two branches and one branch point at z = 0.
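This discontinuity is easy to see numerically (an illustration not in the original notes): numpy's complex square root uses the principal branch, so its value jumps as the small circuit crosses from θ just below +π to θ just above −π.

import numpy as np

eps = 1e-3
just_above_cut = eps * np.exp(1j * (np.pi - 1e-9))     # theta slightly less than +pi
just_below_cut = eps * np.exp(-1j * (np.pi - 1e-9))    # theta slightly more than -pi
print(np.sqrt(just_above_cut))    # ~ +i * sqrt(eps)
print(np.sqrt(just_below_cut))    # ~ -i * sqrt(eps): the principal branch is discontinuous here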
Exercise 11.2:

Show that the function f(z) = [(z − a)(z − b)]^{1/2} has two branches, and branch points at z = a and z = b.

While this all seems abstract and not so useful, considering the continuity of a function becomes very important when we look at its differentiability, discussed in the next chapter. When working with multivalued functions, we need to make a definite choice as to which branch we are working with to avoid ambiguity. We must also be cautious not to make circuits around branch points, where the function becomes discontinuous. These issues can be avoided by using branch cuts.

Definition 11.2: Branch cuts
A branch cut is a line or 'cut' in the complex plane, either: (a) between a branch point and infinity, or (b) between two branch points. Across the branch cut our function is made discontinuous, allowing us to choose the branch we want.

We can think of a branch cut as a barrier in the complex plane that we must never cross, preventing us from completing a full circuit in the complex plane. This means that the function in question will remain single valued. For the example of our function z^{1/2}, we can take a branch cut in any direction from the origin to infinity, which therefore prevents a full circuit being drawn about the origin. Another example of branch cuts and points can be found for the function f(z) = [z(z − 1)]^{1/2}. This function has two branches and two branch points, at z₁ = 0 and z₂ = 1. We can define a branch cut between these two points, which a circuit cannot cross.

Figure 11.3.2: (Top) A branch cut for the function z^{1/2}, starting at the branch point z₀ and extending to ∞. (Bottom) A branch cut between the points z₁ = 0 and z₂ = 1.

11.4 Sets within the complex plane


It is often useful to classify sets of complex numbers depending on their neighbourhood of points, particularly when we begin to consider the convergence of power series and integrals of complex functions. There are three types of points that we shall consider:

Interior point: a point for which there exists a disk of radius ε ≠ 0 centred on the point which contains only points of the set.
Boundary point: a point for which a sufficiently small disk centred on it contains both members and non-members of the set considered.
Exterior point: a point whose neighbourhood contains no members of the set.

For example, if we consider the set of points for which |z| ≤ 1 (i.e. the unit disc), then any point with |z| < 1 is an interior point, any point with |z| > 1 is an exterior point, and any point with |z| = 1 is a boundary point. In this example the boundary points are also members of the set |z| ≤ 1, whereas if we considered the set |z| < 1, then the boundary points would not belong to the set. If a set contains only interior points, i.e. none of its boundary points belong to it, it is called open.

Figure 11.4.1: Interior point (IP), exterior point (EP), and boundary point (BP).
As a few final pieces of jargon that we will come across later in the course:
Connected sets: sets for which any two points can be joined by a continuous path, as shown in Fig. 11.4.2. If a set is both connected and open, then we call it a region of the complex plane.

Figure 11.4.2: An example of connected and disconnected sets.

11.5 Further Reading


On the topic of multi-valued functions and branch cuts,

• Spiegel M.R. (1974) Complex Variables. Schaum’s Outlines. McGraw-Hill.

• Riley et al Section 24.5 pg 835-837


Lecture 12
Functions and differentiability in the complex plane

In this chapter we will consider functions in the complex plane more generally, and in particular what it means for a complex function to be continuous and differentiable. This will set the scene for one of the most important results in complex analysis, the Cauchy-Riemann equations, which tell us whether a function is differentiable or not. These will be relied on to prove many of the results developed later in the course.

12.1 Functions in the complex plane


In general a complex function may be written as

f(z) = f(x + iy) = u(x, y) + i v(x, y),

where u, v ∈ ℝ. Any complex function can be written in this form with varying degrees of algebra. For example, consider the function f(z) = z̄/z. We have

f(z) = z̄/z = (x − iy)/(x + iy) × (x − iy)/(x − iy) = [(x² − y²) − 2ixy]/(x² + y²),    (12.1)

so in the above example u(x, y) = (x² − y²)/(x² + y²) and v(x, y) = −2xy/(x² + y²).


Even the complex exponential can be written in this form:

e^z = e^{x+iy} = e^x e^{iy} = e^x (cos(y) + i sin(y)) = u(x, y) + i v(x, y).    (12.2)

In this case, u(x, y) = e^x cos(y) and v(x, y) = e^x sin(y). Note that this also means that trigonometric and hyperbolic functions can be written in this form, for example

sin(z) = (1/2i)(e^{iz} − e^{−iz}) = (1/2i)(e^{ix} e^{−y} − e^{−ix} e^{y}) = sin(x) cosh(y) + i cos(x) sinh(y).    (12.3)

The more astute of you may have spotted a potential ambiguity in how we distinguish between trigonometric functions and their hyperbolic counterparts, since sin(z) contains both hyperbolic and trigonometric terms. For concreteness, we define the two families of functions using the convention

cosh(z) = (1/2)(e^z + e^{−z}),   cos(z) = (1/2)(e^{iz} + e^{−iz}),
sinh(z) = (1/2)(e^z − e^{−z}),   sin(z) = (1/2i)(e^{iz} − e^{−iz}).    (12.4)
The familiar identities still work, for example cos2 (z) + sin2 (z) = 1, but in general it is not guaranteed
that | sin(z)|  1.


12.2 Continuity and differentiability of complex functions


A complex function f(z) is said to be continuous at a point z₀ if lim_{z→z₀} f(z) exists and is equal to f(z₀). Existence of the limit means that f(z) → f(z₀) regardless of the direction that is taken to arrive at the point z₀.
If a function is continuous at a point z₀, then this allows us to define a complex derivative:

df/dz (z₀) = lim_{z→z₀} [ (f(z) − f(z₀)) / (z − z₀) ].    (12.5)

This is in direct analogy to the derivatives you have used for real functions, though normally we write these in terms of an infinitesimal quantity δx. We are being a little more cautious, specifying our definition at a particular point in the complex plane, z₀.
Now, unlike real derivatives, the complex variable z has an argument associated to it, which the
above definition of the limit does not specify. The choice of this argument is equivalent to describing
the direction from which we approach the point z0 , and in principle, we might get different results
depending on the direction of approach.
Let us see this in action by considering the example f(z) = z̄ = x − iy. Here we can define the limit

lim_{z→z₀} [ (z̄ − z̄₀) / (z − z₀) ].    (12.6)

Now parameterise the complex variable z in a small region around z₀ using z = z₀ + ε e^{iθ}, which defines a circle of radius ε about z₀. The limit is then replaced with

lim_{ε→0} [ ε e^{−iθ} / (ε e^{iθ}) ] = e^{−2iθ}.    (12.7)

The value of θ defines the point on the circle from which we approach z₀, as depicted in Fig. 12.2.1. Crucially, for this function the limit changes depending on the direction of approach we take, and this behaviour is replicated at any point z₀. We therefore say that z̄ is not analytic anywhere in ℂ.

Figure 12.2.1: The direction of approach to a point in the complex plane changes with the choice of θ.

In contrast, if a function is analytic at a point, the limit is identical no matter how we approach z₀, and we can take the derivative in a way analogous to real-valued functions.
As a final piece of jargon for this section:

Definition 12.1: Singularity

A point at which a function is not analytic is called a singularity.

There are various types of singularity that we will discuss in this course; we have already met one
family of them: Branch points are singularities since the associated function takes different values as
we approach it from different directions. Another family of singularities are those points at which a
function is ill-defined, for example f (z) = 1/z has a singularity at z = 0.

12.3 The Cauchy Riemann equations


In this section we will derive a general condition for determining whether a function is analytic, namely
the Cauchy Riemann equations. These equations will form the bedrock of our study of calculus in the
complex plane.

Figure 12.3.1: A figure showing the approach to the point z₀ along two different paths.

Let us assume that the function f(z) = u(x, y) + i v(x, y) is analytic in the complex plane. This means that the derivative exists, such that

df/dz (z₀) = lim_{z→z₀} { [(u(x, y) − u(x₀, y₀)) + i(v(x, y) − v(x₀, y₀))] / [(x − x₀) + i(y − y₀)] },    (12.8)

where we have used z₀ = x₀ + iy₀.
Now, since f is analytic, the limit is independent of the approach taken to z₀. So consider two possible paths to z₀: in path 1, we hold y = y₀ constant and approach z₀ from the x-direction, such that x = x₀ + δx. In path 2, we do the opposite and approach from the y-direction, holding x = x₀ constant such that y = y₀ + δy.
Considering path 1, we have:

df/dz (z₀) = lim_{δx→0} { [(u(x₀ + δx, y₀) − u(x₀, y₀)) + i(v(x₀ + δx, y₀) − v(x₀, y₀))] / δx } = ∂u/∂x |_{z=z₀} + i ∂v/∂x |_{z=z₀}.    (12.9)

We can do the same along path 2, to obtain

df/dz (z₀) = lim_{δy→0} { [(u(x₀, y₀ + δy) − u(x₀, y₀)) + i(v(x₀, y₀ + δy) − v(x₀, y₀))] / (i δy) } = −i ∂u/∂y |_{z=z₀} + ∂v/∂y |_{z=z₀}.    (12.10)

Using the path independence of analytic functions, we can equate the real and imaginary parts of both of these expressions to obtain the Cauchy-Riemann equations,

∂u/∂x = ∂v/∂y   and   ∂v/∂x = −∂u/∂y.    (12.11)

It is important to note that we have made no assumptions about our function f(z) other than that it is analytic. This means that the Cauchy-Riemann equations provide a necessary and sufficient condition for a complex function to be differentiable. In other words, if a function is analytic then it must satisfy the Cauchy-Riemann equations; conversely, if a function satisfies the Cauchy-Riemann equations then it must be analytic. Therefore, when we ask whether a function is differentiable, we need only check whether it satisfies the Cauchy-Riemann equations.

Theorem 12.1: Cauchy-Riemann equations

A function f(z) = u(x, y) + iv(x, y) is analytic at a point z₀ if and only if it satisfies the Cauchy-Riemann equations at that point.

Proof. Follows as above.
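As a minimal illustration (not part of the original notes), the sympy sketch below checks the Cauchy-Riemann equations symbolically for f(z) = z², which is analytic everywhere, and for f(z) = z̄, which we argued above is analytic nowhere.

import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x + sp.I * y

def cauchy_riemann(f):
    # returns (du/dx - dv/dy, dv/dx + du/dy): both should be 0 for an analytic f
    u, v = f.expand(complex=True).as_real_imag()
    return (sp.simplify(sp.diff(u, x) - sp.diff(v, y)),
            sp.simplify(sp.diff(v, x) + sp.diff(u, y)))

print(cauchy_riemann(z**2))             # (0, 0): both equations satisfied
print(cauchy_riemann(sp.conjugate(z)))  # (2, 0): the first equation fails, so zbar is not analytic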

12.3.1 Taking derivatives in the complex plane

Now, if our function is indeed analytic, then a consequence of the Cauchy-Riemann equations is that

df/dz = ∂u/∂x + i ∂v/∂x = ∂v/∂y − i ∂u/∂y.    (12.12)

So we need only find the gradient with respect to one of the variables x or y, and we obtain the full derivative of f.
It is worth noting a few useful results. L'Hôpital's rule is valid for finding limits: if f(z) and g(z) are analytic functions vanishing at z₀, then

lim_{z→z₀} f(z)/g(z) = lim_{z→z₀} f′(z)/g′(z).    (12.13)

The chain rule is also valid, such that if f(z) is analytic, and g(w) is analytic at w = f(z), then g(f(z)) is analytic at z. This yields

dg/dz = (dg/df)(df/dz).    (12.14)

Similarly, the product rule is valid: for f(z) and g(z) analytic, we have

d(fg)/dz = f (dg/dz) + (df/dz) g.    (12.15)

12.4 Harmonic functions

Definition 12.2: Harmonic functions


A harmonic function of two real variables g(x, y) is one that satisfies Laplace’s equation,

∇²g(x, y) = ∂²g(x, y)/∂x² + ∂²g(x, y)/∂y² = 0.    (12.16)

If f(z) = u(x, y) + iv(x, y) is analytic, then u(x, y) and v(x, y) are necessarily harmonic, that is

∇²u(x, y) = ∇²v(x, y) = 0.    (12.17)

This can be proven using the Cauchy-Riemann equations. Taking the partial derivative of the first equation with respect to x,

∂²u/∂x² = ∂²v/∂x∂y,    (12.18)

and similarly taking the derivative of the second equation with respect to y,

∂²u/∂y² = −∂²v/∂x∂y.    (12.19)

Equating these expressions, we obtain

∂²u/∂x² = −∂²u/∂y²,    (12.20)

which is precisely Laplace's equation for u. A similar result can be obtained for v(x, y).
It is quite common to see functions u and v satisfying the above conditions called conjugate functions: if we are given one of them, we can work out the other (up to an additive constant). To make this clear, let's do an example:

Suppose f(z) is analytic, with u(x, y) = λx³ + 3xy². Determine λ and v(x, y).

Since f(z) is analytic, u must be harmonic, such that ∇²u = 0. We therefore have

∂²u/∂x² = 6λx,    (12.21)

and

∂²u/∂y² = 6x.    (12.22)

Adding the two, we require

∂²u/∂x² + ∂²u/∂y² = 6λx + 6x = 0.    (12.23)

Therefore, we have λ = −1. From the Cauchy-Riemann equations, we have

∂u/∂x = ∂v/∂y = −3x² + 3y².    (12.24)

Integrating with respect to y, we find v(x, y) = −3x²y + y³ + g(x). Taking the derivative with respect to x and using the other Cauchy-Riemann equation,

∂v/∂x = −6xy + g′(x) = −∂u/∂y = −6xy.    (12.25)

Therefore, we find that g′(x) = 0 and g(x) = A is a constant. Putting this together, we obtain

f(z) = −x³ + 3xy² + i(−3x²y + y³ + A) = −z³ + iA.    (12.26)
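The steps of this example can be checked with sympy (a sketch, not part of the original notes, and assuming the form u = λx³ + 3xy² used above): Laplace's equation fixes λ, integrating the Cauchy-Riemann equations recovers the conjugate function, and the combination u + iv reassembles into a function of z alone.

import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
u = lam * x**3 + 3 * x * y**2

lap = sp.diff(u, x, 2) + sp.diff(u, y, 2)
print(sp.solve(lap, lam))                             # [-1]: Laplace's equation fixes lambda

u = u.subs(lam, -1)
v = sp.integrate(sp.diff(u, x), y)                    # dv/dy = du/dx, integrated in y
print(sp.simplify(sp.diff(v, x) + sp.diff(u, y)))     # 0: the second CR equation also holds
print(sp.expand(u + sp.I * v + (x + sp.I * y)**3))    # 0: so u + iv = -z^3 (up to +iA)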

12.5 Further Reading


For further discussion on the topic of differentiability and Cauchy Riemann equations see

• Riley et al, Pg 825-830.


Chapter 6

Conformal mappings

In this chapter we will build on the ideas of functions in the complex plane, where we will now consider how functions transform entire regions of the complex plane rather than single points. In this way, we consider functions as mappings. We will use these methods, alongside the ideas of analytic functions, to drastically simplify problems that are often encountered in physics. This will be done through the use of conformal mappings, which are mappings that preserve the angle between intersecting curves. Before we dive into conformal mappings, though, we will review functions and consider how they may be thought of in the context of mappings.

6.1 Functions as mappings


When considered over regions of the complex plane, we can view functions as mappings in which the complex plane is transformed into a new geometry. Consider the example of f(z) = z² = x² − y² + 2ixy. Let us first look at all the straight lines in the z-plane where y is free to vary but x = c, for constant c.

Figure 6.1.1: The function z 2 as a mapping. Lines of constant x in the z-variable (blue) are mapped to the blue
parabolas in the w-plane. Similarly lines of constant y (red curves) are mapped to parabolas. Notice that the
angle between the red and blue is preserved by the mapping.

Consider the lines with constant x = c, where c is a real constant; these curves are illustrated by the blue lines in Fig. 6.1.1. After the mapping, the lines of constant real part become w = f(c + iy) = c² − y² + 2icy, which are parabolic, as can be seen from the blue curves in the right-hand plot of Fig. 6.1.1. Similarly, we can consider the lines of constant imaginary part of z, i.e. y = c, given by the red lines in Fig. 6.1.1. In this case, these lines map to w = f(x + ic) = x² − c² + 2ixc, which are also parabolas, as shown in Fig. 6.1.1 by the red curves. The savvy reader might notice something curious about the above plots: though the red and blue curves are parabolic in nature, they remain orthogonal to one another (with the exception of the point at w = 0); the mapping has preserved the angle between them. This is a general feature of analytic functions which we shall prove later.

6.1.1 The exponential function


Another important function to consider is the exponential function, w = f(z) = e^z = e^x e^{iy}. Notice that |w| = e^x and arg(w) = y. So, the exponential function maps a vertical line of length 2π in the z-plane to a circle of radius e^x in the w-plane. Similarly, if we take a strip of thickness 2π, as given by the shaded region in the z-plane of Fig. 6.1.2, this will map to the entire w-plane. You can see this from |w| = e^x: as we move along the strip, the radius of the circle gets larger and larger, encompassing the entire w-plane.

Figure 6.1.2: The exponential function as a mapping. A line of length 2π maps to a circle in the complex plane. A strip of thickness 2π, as given by the shaded region in the z-plane, maps to the entire w-plane.

6.2 Analytic functions and mappings


In the example of the function z² we saw that, with the exception of the point at w = 0, the lines of x = const. and y = const. were perpendicular both before and after the mapping. This is actually a general feature of analytic functions. To show this, let us assume that the function f(z) = u(x, y) + iv(x, y) is analytic. A direct consequence of the analytic nature of f is that u and v are harmonic, that is, they satisfy Laplace's equation, ∇²u = ∇²v = 0, where

∇ = e_x ∂/∂x + e_y ∂/∂y

is the standard definition of the gradient operator in two dimensions.
The gradient of a function is everywhere normal to its contours, so if we can show that ∇u · ∇v = 0, this is sufficient to show that the contours of u and v are orthogonal. So, considering

∇u(x, y) = e_x ∂u/∂x + e_y ∂u/∂y,

and

∇v(x, y) = e_x ∂v/∂x + e_y ∂v/∂y = −e_x ∂u/∂y + e_y ∂u/∂x,

where the last equality follows from the Cauchy-Riemann equations, the scalar product between these vectors is

∇u(x, y) · ∇v(x, y) = −(∂u/∂x)(∂u/∂y) + (∂u/∂y)(∂u/∂x) = 0.    (6.2.0.1)

Therefore, the contours of u and v are everywhere orthogonal. Note that this does not hold at points where f is not analytic.

6.3 Conformal mappings in physics


Previously we have seen that if f(z) = u(x, y) + iv(x, y) is analytic, the contours of u and v are perpendicular. More generally, if any two curves intersect at some angle at a point z₀ in the z-plane, then they are mapped to curves in the w = f(z) plane that intersect at w₀ = f(z₀) at the same angle. Mappings like this, where angles are conserved, are known as conformal mappings¹.

Proof. Let f(z) be analytic at z = z₀ and let w₀ = f(z₀). Consider

w = f(z₀ + δz),    (6.3.0.1)

where δz is a small increment. Then,

δw = w − w₀ = f(z₀ + δz) − f(z₀) = (df/dz)|_{z₀} δz = δz f′(z₀).    (6.3.0.2)

As f′(z₀) is complex, its phase is the angle through which the mapping rotates the tangent to the curve at this point. Any other curve passing through the same point will be rotated by the same amount, so the angle between them is conserved.

Let's make this statement more explicit by supposing δz = ε e^{iθ}, where ε ≪ 1, and writing f′(z₀) = M e^{iα}. Then,

δw = (ε e^{iθ})(M e^{iα}) = (εM) e^{i(θ+α)}.    (6.3.0.3)

So locally f scales z by some factor M and rotates by a fixed angle α. Note: we require f′(z₀) ≠ 0, otherwise the angles are clearly not preserved. If a shape exists over a region where f′(z) is roughly constant in the z-plane, then it is mapped (rotated and magnified) into the w-plane.
Conformal mappings are useful for physicists because u and v are solutions to Laplace's equation, ∇²u = ∇²v = 0. We use a conformal mapping to transform a problem in a given geometry into a solvable problem, and then, with the inverse mapping, obtain the solution. If we can find a solution that satisfies the boundary conditions, then it is the only solution, as a consequence of the uniqueness theorem.
In this course we restrict the discussion to two dimensions, however this still includes many useful
problems in 3D which are symmetric around an axis. Note: exam questions will give guidance on what
mapping should be used (see the example sheets).
The general strategy here is:
1. Define the problem in the xy plane.

2. Find a mapping that takes curves of a constant quantity (e.g. electric potential) to a new, simpler problem in the plane Z = X(x, y) + iY(x, y).

3. Solve in the XY plane. Then,

u(x, y) + iv(x, y) = Φ(X(x, y), Y(x, y)) + iΨ(X(x, y), Y(x, y)),    (6.3.0.4)

where Φ(X, Y) is our solution in the XY plane and Ψ(X, Y) is chosen to make u + iv analytic. Or, more simply, Φ and Ψ must satisfy the Cauchy-Riemann equations:

∂Φ/∂X = ∂Ψ/∂Y,   ∂Ψ/∂X = −∂Φ/∂Y.    (6.3.0.5)
1
Conformal means same shape.

Figure 6.3.1: Two infinite conducting plates (black) held at different potentials. The arising electric field is drawn in blue and the equipotentials are represented in red.

6.3.1 Electrostatics
As a reminder, in electrostatics we have

E⃗ = −∇u,
∇ · E⃗ = ρ/ε₀,    (6.3.1.1)

where u is the potential. In a vacuum (ρ = 0), these yield the Laplace equation

∇²u = 0.    (6.3.1.2)

Solutions of eq. 6.3.1.2 differ according to the boundary conditions they satisfy. For instance, we can have the potential constant on a conductor, or the field constant at large distances.

Pair of parallel infinite conducting plates

Let's start with a simple problem where the mapping is trivial, just to illustrate the method. We will consider the potential and field lines arising from a pair of parallel conducting plates, as depicted in figure 6.3.1. Here, we have

u(x, y) = (V₀/L) y,    (6.3.1.3)

though generally in these types of problems we will not be able to immediately identify this. We define the complex potential to be

W = u + iv,    (6.3.1.4)

with v chosen to make u + iv analytic. Here,

W = −i (V₀/L) z = (V₀/L) y − i (V₀/L) x.    (6.3.1.5)

So we are left with

u(x, y) = (V₀/L) y,
v(x, y) = −(V₀/L) x.    (6.3.1.6)
Lines of constant u are the equipotentials, which are parallel to the plates in this case. Lines of
constant v are the field lines, which are vertical in this example. These are depicted as red and blue
lines respectively in figure 6.3.1.

Figure 6.3.2: Two semi-infinite conducting plates (black) held at different potentials, separated by an insulator at the origin. The labelled points assist understanding the mapping.

Figure 6.3.3: Mapping of the two conducting plates under Z = Ln z, where the solution is trivial. Equipotentials are drawn in red and field lines are drawn in blue. Labelled points are described in the text.

Conducting plates in the same plane separated by an insulator

Let's now consider an example where applying a suitable conformal map greatly assists solving the problem. Suppose we have two conducting plates on the x-axis, either side of an insulator located at the origin (see fig. 6.3.2). Here, we wish to transform the geometry of the problem to that of two parallel plates, as we have seen that solution is trivial. Let's use the mapping

Z = Ln z = ln |z| + i arg z.    (6.3.1.7)

This transforms the positive x-axis to the line Y = Im Z = 0 and the negative x-axis to the line Y = Im Z = π. That is, this maps the points drawn in the figure as

(x, y) → (X, Y),
(0, 0) → (−∞, 0),
A(1, 0) → A′(0, 0),    (6.3.1.8)
B(−1, 0) → B′(0, π).

This mapping is drawn in figure 6.3.3. In Z, the solution is trivial:

Φ = (V₀/π) Y,   Ψ = −(V₀/π) X,    (6.3.1.9)

i.e.,

Φ + iΨ = −i (V₀/π) Z
        = −i (V₀/π) Ln z
        = −i (V₀/π) [ ln √(x² + y²) + i arctan(y/x) ]    (6.3.1.10)
        = (V₀/π) arctan(y/x) − i (V₀/2π) ln(x² + y²).
Figure 6.3.4: Two semi-infinite conducting plates (black) held at different potentials, separated by an insulator at the origin. Field lines are drawn in blue and equipotentials are represented in red.

Figure 6.3.5: Left: two perpendicular heat conductors held at different temperatures, with an insulator of radius R closing the arc. Right: the Z = Ln z map transforms the conductors such that they are parallel.

We can now identify u(x, y) + iv(x, y) to be

u(x, y) = (V₀/π) arctan(y/x),
v(x, y) = −(V₀/2π) ln(x² + y²).    (6.3.1.11)

So our field lines have constant radius and the equipotentials have constant angle. This is drawn in figure 6.3.4.
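As a quick check of this solution (not part of the original notes), the sympy sketch below verifies that the potential is harmonic away from the plates and tends to the correct boundary values, 0 on the x > 0 plate and V₀ on the x < 0 plate; taking the argument with atan2 is an implementation choice that selects the 0-to-π branch assumed in the text.

import sympy as sp

x, y = sp.symbols('x y', real=True)
V0 = sp.symbols('V_0', positive=True)
u = (V0 / sp.pi) * sp.atan2(y, x)          # arctan(y/x) on the branch giving 0 <= arg <= pi

laplacian = sp.diff(u, x, 2) + sp.diff(u, y, 2)
print(sp.simplify(laplacian))                          # 0: u satisfies Laplace's equation
print(sp.limit(u.subs(x, 1), y, 0, '+'))               # 0: grounded plate on x > 0
print((u.subs({x: -1, y: 1e-9}) / V0).evalf())         # ~1: approaches V0 on the x < 0 plate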

6.3.2 Heat flow

We can also find solutions to heat flow problems using conformal mapping. Here, the temperature u satisfies

∂u/∂t = D ∇²u,    (6.3.2.1)

where D is the thermal diffusivity. Steady-state solutions to this equation have ∂u/∂t = 0, hence ∇²u = 0. Typical boundary conditions in heat flow are constant temperature on thermal conductors and no heat flow across insulators.
Here we will consider an example of two conductors perpendicular to one another, held at different temperatures, with an insulator across a 90° arc of radius R. To solve this problem, we again use the mapping Z = Ln z to rotate the conductors such that they are parallel. These configurations are shown in figure 6.3.5. With this mapping we see that the lines of heat flow are lines of constant X = ln |z|, which correspond to circles in the z-plane. The solution for the complex potential is

Φ(X, Y) + iΨ(X, Y) = −i (2T₀/π) Z.    (6.3.2.2)


This is analogous to the previous example where field lines are replaced by heat flow and equipotentials
by isotherms.

6.3.3 Fluid flow


The last application we will consider here is that of fluid flow. Again we find that the Laplace equation arises, for the case of an incompressible fluid. The details of how we derive this are not essential for this course², however we present them for completeness.
An incompressible fluid of density ρ(t, x, y) and velocity v⃗ = (vₓ(x, y), v_y(x, y)) must satisfy the continuity equation:

∮_S ρ v⃗ · dS⃗ = −(∂/∂t) ∫_V ρ dV.    (6.3.3.1)

Applying the divergence theorem to the left-hand side, we have

∇ · (ρ v⃗) = −∂ρ/∂t.    (6.3.3.2)

Rearranging and expanding we have

∂ρ/∂t + (∇ρ) · v⃗ + ρ (∇ · v⃗) = 0.    (6.3.3.3)
As the fluid is incompressible, its density cannot change over time, i.e.

dρ/dt = 0.    (6.3.3.4)

Expanding the total derivative we find

dρ/dt = ∂ρ/∂t + (∂ρ/∂x)(dx/dt) + (∂ρ/∂y)(dy/dt).    (6.3.3.5)

If we assume the volume of interest moves at the same rate as the fluid³, then

dρ/dt = ∂ρ/∂t + (∇ρ) · v⃗ = 0.    (6.3.3.6)

From the continuity equation (eq. 6.3.3.3), we are left with

∇ · v⃗ = 0.    (6.3.3.7)

Another assumption we make is that the flow is irrotational,

∇ × v⃗ = 0.    (6.3.3.8)

This implies v⃗ = ∇u for some scalar field u, and so

∇²u = 0.    (6.3.3.9)

This idealised fluid, which ignores friction/viscosity, is called potential flow. Lines tangential to the flow (i.e. across which there is no flow) are known as streamlines; these lines represent the trajectory a particle would follow in the flow. The lines perpendicular to the streamlines are lines of constant velocity potential, the scalar field u.
² You can study this in great detail in courses on fluid dynamics.
³ That is, (dx/dt, dy/dt) = v⃗.
Figure 6.3.6: Left: a cylinder centred at the origin of the xy-plane with various positions indicated. Right: the Z = z + 1/z map to the XY-plane with the transformed points shown. The cylinder is 'squashed' onto the X-axis.

Flow around a cylinder


Far from the cylinder we expect the flow to be parallel to the x axis. We also expect there to be no
flow across the surface of the cylinder, so we require both this surface and the x axis to be streamlines.
This problem is solved with the mapping

Z = z + 1/z.    (6.3.3.10)
This is illustrated in figure 6.3.6 along with the proposed mapping. Again, we have labelled a number of points to assist interpreting the transformation. If the cylinder has unit radius, then point A lies at (1, 0) and is mapped to A′ at (2, 0), whilst point B at (0, 1) maps to (0, 0) in Z. The other points are mapped as

(x, y) → (X, Y),
C(−1, 0) → C′(−2, 0),
D(0, −1) → D′(0, 0),    (6.3.3.11)
E(−2, 0) → E′(−2.5, 0),
F(2, 0) → F′(2.5, 0).

Writing

X + iY = (r + 1/r) cos θ + i (r − 1/r) sin θ,    (6.3.3.12)

means that Y = 0 if r = 1, or if θ = 0 or θ = π. So, lines parallel to the X-axis are streamlines in this mapping,

Ψ = V₀ Y = V₀ (r − 1/r) sin θ,    (6.3.3.13)

where V₀ is the magnitude of the flow and r ≥ 1. The velocity potentials are parallel to the Y-axis and are given by

Φ = V₀ X = V₀ (r + 1/r) cos θ.    (6.3.3.14)

This solution is drawn in figure 6.3.7. This problem is identical to finding the resultant E⃗ field when a conducting cylinder is placed in a uniform field; here the fluid streamlines represent the electric equipotentials and the fluid velocity potentials represent the electric field lines.
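A short symbolic check (not part of the original notes): the stream function Ψ = V₀(r − 1/r) sin θ should satisfy Laplace's equation in plane polar coordinates and vanish on the cylinder r = 1 and on the x-axis, so that both are streamlines as required.

import sympy as sp

r, theta, V0 = sp.symbols('r theta V_0', positive=True)
psi = V0 * (r - 1/r) * sp.sin(theta)

# Laplacian in plane polar coordinates
lap = sp.diff(psi, r, 2) + sp.diff(psi, r)/r + sp.diff(psi, theta, 2)/r**2
print(sp.simplify(lap))                               # 0: psi is harmonic
print(psi.subs(r, 1))                                 # 0 on the cylinder surface
print(psi.subs(theta, 0), psi.subs(theta, sp.pi))     # 0, 0 on the x axis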
Figure 6.3.7: Streamlines (red) and velocity potentials (blue) for fluid flowing around a cylinder.

Figure 6.3.8: Left: a channel of width 2π with various points defined. Right: the z = Z + e^Z mapping that transforms the channel such that it 'folds back' on itself.

Flow out of a channel

Inside a horizontal channel the streamlines will be horizontal and the velocity potentials will be vertical. As the fluid exits the channel, it will disperse around the opening. This problem is analogous to finding the electric field lines around the edge of a parallel-plate capacitor. It can be solved with the mapping z = Z + e^Z for X ∈ (−∞, ∞) and Y ∈ [−π, π]. An illustration of the channel and the mapping are shown in figure 6.3.8. In this mapping, we have

z = x + iy = X + iY + e^X (cos Y + i sin Y),    (6.3.3.15)

therefore

x = X + e^X cos Y,
y = Y + e^X sin Y.    (6.3.3.16)

Figure 6.3.9: Streamlines (red) and velocity potentials (blue) for fluid flowing out of a channel.

This maps the points displayed as

(x, y) → (X, Y),
A(−1 − e⁻¹, −π) → A′(−1, −π),
B(−1, −π) → B′(0, −π),
C(1 − e, −π) → C′(1, −π),    (6.3.3.17)
D(−1 − e⁻¹, π) → D′(−1, π),
E(−1, π) → E′(0, π),
F(1 − e, π) → F′(1, π).

So the mapping breaks the plot at B′ and E′ and folds back. This has the desired effect of maintaining parallel streamlines within the channel, which then diverge on exit. The inverse mapping takes the region −π ≤ Y ≤ π to the whole of the z-plane.
To proceed, we could solve for Φ + iΨ, but it is considerably easier to parametrically plot the streamlines and velocity potentials in the xy-plane by setting one of X and Y constant for each line. This is shown in figure 6.3.9. Note that the velocity potential changes discontinuously on the channel walls; these are branch cuts of the function Z = f(z).
Chapter 7

Complex (contour) integration

We have previously seen how to differentiate complex functions. We now turn our attention to integration involving complex variables. Though this appears abstract at first, there are many areas of mathematics and physics where these calculations occur. Again, we distinguish between the real and imaginary parts of the function and study how they behave on a path or contour. We can divide some curve between a(= z₀) and b(= zₙ) with n points of subdivision zᵢ, and in each interval we pick a point ξᵢ such¹ that zᵢ₋₁ < ξᵢ < zᵢ; this is drawn in figure 7.0.1. Let

Sₙ = Σ_{k=1}^{n} f(ξₖ)(zₖ − zₖ₋₁) = Σ_{k=1}^{n} f(ξₖ) Δzₖ.    (7.1)

Figure 7.0.1: A path from a to b divided into n segments with points ξ in each segment.
We define the integral of f(z) along a path C as

∫_C f(z) dz = lim_{n→∞} Sₙ,    (7.2)

where the limit is taken such that all Δzₖ → 0.


For a real function of a real variable x, we know that the limit as n → ∞ gives us the area under the curve f(x) between a and b. The case for a complex function is the natural extension of this, though the output is no longer an area. We can write eq. 7.1 as

Sₙ = Σ_{k=1}^{n} (uₖ + ivₖ)(Δxₖ + iΔyₖ) = Σ_{k=1}^{n} (uₖΔxₖ − vₖΔyₖ) + i(vₖΔxₖ + uₖΔyₖ).    (7.3)

So, in the limit of n → ∞,

∫_C f(z) dz = lim_{n→∞} Sₙ = ∫_C u(x, y) dx − v(x, y) dy + i ∫_C v(x, y) dx + u(x, y) dy,    (7.4)
1
ξ is the greek letter ‘xi’, pronounced ‘ksi’. This symbol is often used to represent a dummy complex variable. To
write it, try writing an ‘e’ with a ‘5’ underneath.


Figure 7.0.2: A curve C from a to b that can be split into two smaller paths C₁ and C₂ at some intermediate point s.

where these are now line integrals of functions defined over the real plane; dx and dy are not independent, but are related by the path C. If we define dr⃗ = (dx, dy), then

∫_C f(z) dz = ∫_C (u, −v) · dr⃗ + i ∫_C (v, u) · dr⃗.    (7.5)

A number of conventions we use in real integrals also hold for complex integrals defined on a path. For instance, if a path C runs from a to b, the reversed path will run from b to a, with Δzₖ → −Δzₖ. Therefore,

∫_a^b f(z) dz = −∫_b^a f(z) dz,    (7.6)

or

∫_C f(z) dz = −∫_{−C} f(z) dz.    (7.7)

If s is a point on C between a and b, then we are also free to split the integral:

∫_a^b f(z) dz = ∫_a^s f(z) dz + ∫_s^b f(z) dz,    (7.8)

or, equivalently,

∫_{C=C₁+C₂} f(z) dz = ∫_{C₁} f(z) dz + ∫_{C₂} f(z) dz.    (7.9)

Such a path, where these relationships hold, is drawn in figure 7.0.2. We can also define an integral as the difference of two other integrals:

∫_{C₁=C−C₂} f(z) dz = ∫_C f(z) dz − ∫_{C₂} f(z) dz.    (7.10)

If the path C is closed (a = b), the integral is written as ∮_C. If the curve is non-self-intersecting, it is known as a Jordan curve. The convention is that closed contours are traversed anticlockwise².

7.1 The estimation lemma

If M is the maximum value of |f(z)| along C (i.e. |f(z)| ≤ M on C), then

| ∫_C f(z) dz | ≤ M L,    (7.11)

2
unless specified otherwise

Figure 7.2.1: The curve y = x² from 0 to 1 + i.

where L is the length of the path.


This result is intuitive: if the integrand has maximum modulus M along a contour of length L, then the modulus of the integral along that path can be no larger than M summed over the whole path, i.e. ML.

Proof. We start by looking again at Sₙ = Σ_{k=1}^{n} f(ξₖ)Δzₖ. Taking the absolute value,

|Sₙ| ≤ Σ_{k=1}^{n} |f(ξₖ)||Δzₖ| ≤ M Σ_{k=1}^{n} |Δzₖ|,    (7.12)

where we have made use of the triangle inequality (see section REF): |z₁ + z₂| ≤ |z₁| + |z₂|. As n → ∞, |Sₙ| → |∫_C f(z) dz| and Σ_{k=1}^{n} |Δzₖ| → L. Therefore,

| ∫_C f(z) dz | ≤ M L,    (7.13)

as required.

7.2 Examples

7.2.1 ∫₀^{1+i} z² dz

To start, we will integrate along y = x², as drawn in fig. 7.2.1. On this path we have dy = 2x dx. The function is f(z) = z² = x² − y² + 2ixy, which on the path y = x² reduces to

f(z) = x² − x⁴ + 2ix³ = u + iv.    (7.14)
The integral becomes

∫_C f(z) dz = ∫_C u dx − v dy + i ∫_C u dy + v dx
            = ∫₀¹ (x² − x⁴) dx − ∫₀¹ 2x³ dy + i ∫₀¹ (x² − x⁴) dy + i ∫₀¹ 2x³ dx.    (7.15)

Changing the integration variable dy to 2x dx, we find

∫_C f(z) dz = ∫₀¹ (x² − 5x⁴) dx + i ∫₀¹ (4x³ − 2x⁵) dx
            = (1/3 − 1) + i(1 − 1/3) = (2/3)(−1 + i).    (7.16)

Figure 7.2.2: C₁′ from 0 to 1 and C₂′ from 1 to 1 + i.

We could also consider a different path C ′ = C1′ + C2′ where C1′ is y = 0 for x ∈ [0, 1] and C2′ is
x = 1 for y ∈ [0, 1], as shown in figure 7.2.2. On C1′ we have

y = 0, dy = 0, u = x2 , and v = 0. (7.17)

The integral along C₁′ is

∫_{C₁′} f(z) dz = ∫₀¹ u dx = 1/3.    (7.18)

On C₂′ we have

x = 1,   dx = 0,   u = 1 − y²,   and   v = 2y.    (7.19)

Therefore, the integral along C₂′ is

∫_{C₂′} f(z) dz = ∫₀¹ (−v + iu) dy = −∫₀¹ 2y dy + i ∫₀¹ (1 − y²) dy = −1 + i(1 − 1/3).    (7.20)

Overall, we are left with

∫_{C′} f(z) dz = ∫_{C₁′} f(z) dz + ∫_{C₂′} f(z) dz = 1/3 − 1 + i(2/3) = (2/3)(−1 + i).    (7.21)

This path yields the same result as the curve y = x². We will see in the next section why it turned out this way.
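This path independence is easy to see numerically (a sketch, not part of the original notes): discretise each path and apply the trapezoidal rule to the sum in Eq. (7.1); both routes from 0 to 1 + i give the same answer, (2/3)(−1 + i).

import numpy as np

def contour_integral(f, zs):
    # trapezoidal approximation of the contour integral of f along the points zs
    fz = f(zs)
    return np.sum(0.5 * (fz[1:] + fz[:-1]) * np.diff(zs))

f = lambda z: z**2
t = np.linspace(0.0, 1.0, 20001)

parabola = t + 1j * t**2                         # the curve y = x^2
two_legs = np.concatenate([t, 1 + 1j * t])       # along the real axis, then straight up

print(contour_integral(f, parabola))             # ~ -0.6667 + 0.6667j
print(contour_integral(f, two_legs))             # same value: (2/3)(-1 + i)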

∮_C (1/z) dz along |z| = 1

On this path we choose to write x and y in terms of polar coordinates, such that

x = cos θ,   dx = −sin θ dθ,
y = sin θ,   dy = cos θ dθ.    (7.22)

So we just integrate θ from 0 to 2π. On this path, our function becomes

1/z = z*/(z z*) = (x − iy)/(x² + y²) = cos θ − i sin θ.    (7.23)

The integral is therefore

∮_C (1/z) dz = ∮ u dx − v dy + i ∮ u dy + v dx
             = ∫₀^{2π} [cos θ(−sin θ) − (−sin θ) cos θ] dθ + i ∫₀^{2π} (cos²θ + sin²θ) dθ    (7.24)
             = i ∫₀^{2π} dθ = 2πi.

An easier approach is to use

z = e^{iθ},   dz = i e^{iθ} dθ,    (7.25)

so the integral becomes trivial:

∮_C (1/z) dz = ∫₀^{2π} (i e^{iθ}/e^{iθ}) dθ = 2πi.    (7.26)
This result is important. We will return to it numerous times throughout the course as it underpins
many future findings.
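As a quick numerical confirmation of this key result (not part of the original notes), the snippet below discretises the unit circle and evaluates ∮ dz/z with the trapezoidal rule, recovering 2πi.

import numpy as np

theta = np.linspace(0.0, 2*np.pi, 100001)
z = np.exp(1j * theta)                    # points on |z| = 1
fz = 1.0 / z
integral = np.sum(0.5 * (fz[1:] + fz[:-1]) * np.diff(z))
print(integral)                           # ~ 0 + 6.2832j = 2*pi*i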
Chapter 8

Cauchy’s theorem

If f(z) is analytic within and on a closed contour C bounding a surface S (see fig. 8.0.1), then

∮_C f(z) dz = 0.    (8.1)

Proof. To prove this relation, we will start by considering purely real variables and then extend to complex ones. Recall Stokes' theorem:

∮_C A⃗ · dr⃗ = ∫_S (∇⃗ × A⃗) · dS⃗.    (8.2)

For a path in the xy-plane, we have

dr⃗ = (dx, dy, 0)   and   dS⃗ = (0, 0, dx dy).    (8.3)

Therefore the left-hand side of Stokes' theorem is

∮_C A⃗ · dr⃗ = ∮_C Aₓ dx + A_y dy,    (8.4)

whilst the right-hand side of eq. 8.2 becomes

∫∫_S (∇⃗ × A⃗) · dS⃗ = ∫∫_S (∂A_y/∂x − ∂Aₓ/∂y) dx dy.    (8.5)

Figure 8.0.1: Some closed contour C bounding a surface S.


Figure 8.0.2: Two contours C₁ and C₂ with the same initial and final points.

Recall that, for a complex integral, we have

∮_C f(z) dz = ∮_C u dx − v dy + i ∮_C v dx + u dy
            = ∫∫_S (−∂v/∂x − ∂u/∂y) dx dy + i ∫∫_S (∂u/∂x − ∂v/∂y) dx dy    (8.6)
            = 0,

where both double integrals vanish by the Cauchy-Riemann equations (eq. 12.11). Cauchy's theorem therefore only holds provided f(z) is analytic.

Exercise:
Why does ∮_C (1/z) dz ≠ 0, as we saw in subsubsection 7.2.1?

Cauchy's theorem may appear innocuous, but it allows us to state that there is path independence when integrating between two points, provided the function is analytic. Suppose we have two open paths C₁ and C₂ with the same start and end points (see fig. 8.0.2), such that we can define a closed contour C = C₂ − C₁. If f(z) is analytic in the region contained by C, then

∮_C f(z) dz = 0 = ∫_{C₂} f(z) dz − ∫_{C₁} f(z) dz,
∴ ∫_{C₂} f(z) dz = ∫_{C₁} f(z) dz.    (8.7)

Cauchy's theorem also allows us to define the integral as a function of its endpoint, i.e. allowing the endpoint variable z to vary. Let

F(z) = ∫_a^z f(ξ) dξ,    (8.8)

Figure 8.0.3: The integral along paths C₁ and C₂ only depends on the endpoints z₁ and z₂.

for paths in the region of analyticity. The derivative of F(z) is

dF/dz = lim_{Δz→0} [F(z + Δz) − F(z)] / Δz
      = lim_{Δz→0} (1/Δz) [ ∫_a^{z+Δz} f(ξ) dξ − ∫_a^z f(ξ) dξ ]
      = lim_{Δz→0} (1/Δz) ∫_z^{z+Δz} f(ξ) dξ    (8.9)
      = lim_{Δz→0} (1/Δz) ∫_z^{z+Δz} f(z) dξ
      = f(z).

In the above we have made use of the fact that f(z) is continuous, so

∫_z^{z+Δz} f(ξ) dξ → f(z) ∫_z^{z+Δz} dξ   as Δz → 0,    (8.10)

and the path independence of contour integrals, so ∫_z^{z+Δz} dξ = Δz for any path. So, for analytic functions, ∫_a^z f(ξ) dξ is the anti-derivative of f(z).
Consider the integral of f(ξ) along some path C from z₁ to z₂, as displayed in figure 8.0.3. We could equally traverse paths C₁ and C₂ that pass through some point a. The integral along C therefore becomes

∫_{z₁}^{z₂} f(ξ) dξ = ∫_{C₁} f(ξ) dξ + ∫_{C₂} f(ξ) dξ = −F(z₁) + F(z₂) = F(z₂) − F(z₁),    (8.11)

demonstrating explicit dependence only on the end points. If we change the starting point a, F(z) only varies by an additive constant. This allows us to define indefinite complex integrals,

∫^z f(ξ) dξ,    (8.12)

Figure 8.0.4: Contours C and C′ enclosing some non-analytic region, depicted by the shaded area.

which are unique only up to an additive constant. This allows us to immediately write down indefinite integrals for a variety of common functions, including

∫ zⁿ dz = zⁿ⁺¹/(n + 1) + c,   ∫ cos(z) dz = sin(z) + c,
∫ (1/z) dz = ln z + c (for z ≠ 0),   ∫ e^z dz = e^z + c,   etc.;    (8.13)

where c ∈ ℂ. Revisiting the example we saw in subsubsection 7.2.1,

∫₀^{1+i} z² dz = (1/3)[z³]₀^{1+i} = (1/3)(1 + i)³ = (2/3)(−1 + i).    (8.14)

So we formally see the path independence of the integral ∫ z² dz.

Cauchy's theorem can also be applied to scenarios where the function fails to be analytic only at isolated points; such a function is known as a meromorphic function. For instance, we could consider f(z) = 1/z, which is not defined at z = 0 (subsubsection 7.2.1). These points are called singularities.
In these cases, if two distinct contours contain the same region of non-analyticity, as shown in figure 8.0.4, then

∮_C f(z) dz = ∮_{C′} f(z) dz ≠ 0.    (8.15)

Proof. Consider a composite contour as shown in figure 8.0.5. From Cauchy's theorem, this contour must satisfy ∮ f(z) dz = 0. We can write this contour integral in terms of the four paths displayed:

∫_A^B f(z) dz + ∫_B^E f(z) dz + ∫_E^D f(z) dz + ∫_D^A f(z) dz = 0,    (8.16)

where the second and fourth integrals run along C and C′ respectively. However, f(z) is continuous, therefore in the limit that A → B and D → E,

lim [ ∫_A^B f(z) dz + ∫_E^D f(z) dz ] = 0;    (8.17)

the two straight lines cancel. In this limit the integrals along paths C and C′ become closed:

∫_B^E f(z) dz → ∮_C f(z) dz   and   ∫_D^A f(z) dz → −∮_{C′} f(z) dz,    (8.18)

Figure 8.0.5: Path deformation of contours C and C′ such that the enclosed area no longer contains the non-analytic region. Labelled points are described in the main body.

Figure 8.0.6: The integral around a closed path that encompasses singularities is equivalent to the sum of smaller contours that circle each singularity individually.

where we introduce a minus sign for contour C′ as the path is traversed clockwise. Therefore equation 8.16 becomes

∮_C f(z) dz − ∮_{C′} f(z) dz = 0,
∮_C f(z) dz = ∮_{C′} f(z) dz.    (8.19)

This technique of deforming contours can be extended to paths that enclose multiple singularities;
such a case is shown in figure 8.0.6.
What is the result of an integral around a contour that encloses one singularity? We have already found this in the example seen in subsubsection 7.2.1:

∮_{|z|=1} (1/z) dz = 2πi.    (8.20)

By Cauchy's theorem this is true for any contour which encloses the point z = 0. Similarly,

∮_C dz/(z − a) = 0 for a outside of C,   and   ∮_C dz/(z − a) = 2πi for a inside of C.    (8.21)

Figure 8.0.7: Two contours that surround a different number of the poles of f(z) = [(z − 2)(z − 3)]⁻¹.

The result in equation 8.21 is remarkably simple, yet is vital for the rest of the course.

8.0.1 Example
Consider the integral

∮_C dz / [(z − 2)(z − 3)],    (8.22)

which clearly has singularities at z = 2 and z = 3. We will first consider a contour C₁ defined as |z| = 5/2, as depicted in figure 8.0.7. Contour C₁ only surrounds the pole at z = 2, therefore, after separating the denominator into partial fractions, the integral becomes

∮_{C₁} [ −1/(z − 2) + 1/(z − 3) ] dz = −2πi + 0.    (8.23)

As the second pole is outside of this contour, it contributes 0 to the integral. If instead we were to choose the contour C₂ defined as |z| = 4, then both poles would be inside it and we would have

∮_{C₂} dz / [(z − 2)(z − 3)] = −2πi + 2πi = 0.    (8.24)
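Both results in Eqs. (8.23) and (8.24) can be checked numerically (a sketch, not part of the original notes) by discretising the two circular contours and applying the trapezoidal rule.

import numpy as np

def circle_integral(f, centre, radius, n=20001):
    theta = np.linspace(0.0, 2*np.pi, n)
    z = centre + radius * np.exp(1j * theta)
    fz = f(z)
    return np.sum(0.5 * (fz[1:] + fz[:-1]) * np.diff(z))

f = lambda z: 1.0 / ((z - 2) * (z - 3))
print(circle_integral(f, 0, 2.5))    # ~ -2*pi*i : only the pole at z = 2 is enclosed
print(circle_integral(f, 0, 4.0))    # ~ 0       : both poles enclosed, residues cancel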

Figure 8.1.1: A circular contour C centred on z = a being shrunk to a radius ε.

8.1 Cauchy's integral formulae

If f(z) is analytic in and on a contour C which encloses the point z = a, thenᵃ

f(a) = (1/2πi) ∮_C f(z)/(z − a) dz.    (8.25)

ᵃ Often this is used the other way round.

Proof. To prove Cauchy's integral formula, we use Cauchy's theorem to shrink the contour to a small circle around z = a, as shown in figure 8.1.1.
Consider C to be a circle of radius ε centred on a, so z = a + εe^{iθ} and dz = iεe^{iθ} dθ. Therefore,

∮_C f(z)/(z − a) dz = ∫₀^{2π} [ f(a + εe^{iθ}) / (εe^{iθ}) ] iεe^{iθ} dθ = i ∫₀^{2π} f(a + εe^{iθ}) dθ,    (8.26)

which is valid for any ε > 0. Now consider what happens as we shrink ε:

∮_C f(z)/(z − a) dz = i lim_{ε→0} ∫₀^{2π} f(a + εe^{iθ}) dθ = i ∫₀^{2π} f(a) dθ = 2πi f(a),    (8.27)

∴ f(a) = (1/2πi) ∮_C f(z)/(z − a) dz,

as required.

We also have the case for the derivative of f(z):

f′(a) = (1/2πi) ∮_C f(z)/(z − a)² dz.    (8.28)

Proof. The derivative is defined as

f′(a) = lim_{δa→0} [f(a + δa) − f(a)] / δa
      = lim_{δa→0} (1/2πi δa) [ ∮_C f(z)/(z − a − δa) dz − ∮_C f(z)/(z − a) dz ],    (8.29)

where we have made use of the Cauchy integral formula (eq. 8.25). Combining the two integrals we find

f′(a) = lim_{δa→0} (1/2πi) ∮_C f(z) / [(z − a)(z − a − δa)] dz = (1/2πi) ∮_C f(z)/(z − a)² dz,    (8.30)

as desired.

We can generalise the above two results, extending to any derivative of order n, using

dⁿf(a)/daⁿ = f⁽ⁿ⁾(a) = (n!/2πi) ∮_C f(z)/(z − a)ⁿ⁺¹ dz.    (8.31)

As 0! = 1, this also includes the original Cauchy integral formula.

8.1.1 Examples

∮_C (cos z / z) dz

Consider the integral

I = ∮_C (cos z / z) dz,    (8.32)

for some contour C that encloses z = 0. Applying Cauchy's integral formula (eq. 8.25) with a = 0, such that 1/(z − a) = 1/z and f(z) = cos z, we have

I = 2πi f(a = 0) = 2πi cos 0 = 2πi.    (8.33)

So Cauchy’s results mean we only need to determine if the contour encloses a singularity. Then,
determine the order of the pole.

∮_C dz/[(z − 2)(z − 3)] (again)

Here, we revisit the integral

I = ∮_C dz / [(z − 2)(z − 3)],    (8.34)

for the contour C defined as |z| = 4. Now, we make use of Cauchy's integral formula, splitting C into two smaller contours that each just enclose one pole, as drawn in figure 8.1.2. Then I becomes

I = ∮_{C₁} dz/[(z − 2)(z − 3)] + ∮_{C₂} dz/[(z − 2)(z − 3)],

where on C₁ we take f(z) = 1/(z − 3) and on C₂ we take f(z) = 1/(z − 2), so

I = 2πi [ (1/(z − 3))|_{z=2} + (1/(z − 2))|_{z=3} ] = 2πi [ 1/(2 − 3) + 1/(3 − 2) ] = 0.    (8.35)

Figure 8.1.2: A contour C that encloses the poles of f(z) = [(z − 2)(z − 3)]⁻¹, split into two smaller contours C₁ and C₂ that enclose each pole individually.

Figure 8.1.3: A contour C that encloses the poles of f(z) = e^z/(z² + π²)², split into two smaller contours C₁ and C₂ that enclose each pole individually.

∮_C e^z/(z² + π²)² dz for |z| = 4

Consider the integral

I = ∮_C e^z/(z² + π²)² dz   for |z| = 4.    (8.36)

The denominator of this integral can be factorised to reveal two double poles,

I = ∮_C e^z / [(z − iπ)²(z + iπ)²] dz.    (8.37)

We can therefore replace contour C with two smaller contours around each of these poles at ±iπ:

I = ∮_{C₁} e^z/[(z − iπ)²(z + iπ)²] dz + ∮_{C₂} e^z/[(z − iπ)²(z + iπ)²] dz.    (8.38)

This is drawn in figure 8.1.3. We now apply Cauchy's general integral formula (eq. 8.31) for each pole. Within C₁,

'f(z)' = e^z/(z + iπ)²,    (8.39)

which is analytic in this region. So,

∮_{C₁} e^z/[(z − iπ)²(z + iπ)²] dz = 2πi (d/dz)[ e^z/(z + iπ)² ]_{z=iπ} = (1/2π²)(iπ − 1).    (8.40)

Applying the same approach to the integral around C₂ we find

∮_{C₂} e^z/[(z − iπ)²(z + iπ)²] dz = 2πi (d/dz)[ e^z/(z − iπ)² ]_{z=−iπ} = (1/2π²)(iπ + 1).    (8.41)

So the result of this integral is

∮_C e^z/[(z − iπ)²(z + iπ)²] dz = i/π.    (8.42)
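The result in Eq. (8.42) can be verified numerically (a sketch, not part of the original notes) by integrating around the circle |z| = 4 with the trapezoidal rule and comparing against i/π.

import numpy as np

theta = np.linspace(0.0, 2*np.pi, 200001)
z = 4.0 * np.exp(1j * theta)                     # the contour |z| = 4
fz = np.exp(z) / (z**2 + np.pi**2)**2
integral = np.sum(0.5 * (fz[1:] + fz[:-1]) * np.diff(z))
print(integral, 1j / np.pi)                      # both ~ 0 + 0.3183j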

8.1.2 Formal consequences


With Cauchy’s integral formula there are now a number of further statements we can make regarding
complex functions:

• We have expressions for f (n) (a) in terms of f (z) on C. Therefore, all derivatives exist.

• We have shown that if f(z) is analytic in some region R, then

F(z) = ∫_a^z f(ξ) dξ    (8.43)

is path independent and dF/dz = f(z). Hence, F(z) is analytic. Then, by Cauchy's formulae, all higher derivatives of F(z) (and f(z)) exist. We can turn Cauchy's theorem around and say that path independence of F(z) implies analyticity of f(z). Furthermore, path independence of F(z) implies ∮_C f(z) dz = 0. This is known as Morera's theorem, the converse of Cauchy's theorem: if

∮_C f(z) dz = 0    (8.44)

for any closed path C within a region R, then f(z) is analytic within R.

• We can derive Liouville's theorem, which states that every bounded entire function must be constant.

Proof. We start with Cauchy's inequality, which comes from combining the estimation lemma (eq. 7.11) and Cauchy's integral formula. Consider the integral of f(z) around a closed circular contour C of radius R centred on z = a, on which |f(z)| ≤ M:

|f⁽ⁿ⁾(a)| = | (n!/2πi) ∮_C f(z)/(z − a)ⁿ⁺¹ dz | ≤ (n!/2π)(M/Rⁿ⁺¹) 2πR,
∴ |f⁽ⁿ⁾(a)| ≤ M n!/Rⁿ.    (8.45)

This is Cauchy's inequality. If f(z) is analytic everywhere and bounded (i.e. has a maximum modulus somewhere in the plane), then in the limit R → ∞ all derivatives tend to zero; hence f(z) is constant. Mathematically, we could consider just the first derivative in this limit, such that

lim_{R→∞} |f′(a)| ≤ lim_{R→∞} M/R = 0,
⇒ lim_{R→∞} f′(a) = 0,    (8.46)
∴ f(a) = constant.

This is Liouville's theorem.

So, every interesting (non-constant) function f (z) must have non-analyticities somewhere in the
infinite complex plane.

• We can also now prove the fundamental theorem of algebra: an nth-order polynomial P(z) has n roots (not necessarily distinct).

Proof. Assume the polynomial P(z) has no roots. Then 1/P(z) is analytic everywhere and bounded by some maximum value M, so by Liouville's theorem 1/P(z) is constant. This contradicts the assumption that P(z) is a (non-constant) polynomial, so P(z) must have at least one root. Let z₁ be one such root, and write

P(z) = (z − z₁) Q(z),    (8.47)

where Q(z) is an (n − 1)th-order polynomial. We can repeat this step and show Q(z) has at least one root:

P(z) = (z − z₁)(z − z₂) R(z),    (8.48)

where R(z) is an (n − 2)th-order polynomial. Iterating this process, we show that P(z) has n roots:

P(z) = (z − z₁)(z − z₂) … (z − zₙ) c,    (8.49)

where c is a constant.

8.1.3 Argument theorem

If f(z) is analytic except for P poles, and has N zeros, within some closed curve C, then

(1/2πi) ∮_C f′(z)/f(z) dz = N − P.    (8.50)

This type of function, which is analytic except at isolated poles, is known as meromorphic.

Proof. Start with a function with one pole of order p and one zero of order n within C,

f(z) = g(z) (z − β)ⁿ/(z − α)ᵖ,    (8.51)

where g(z) is analytic with no zeros within C. The derivative of f(z) is then

f′(z) = g′(z) (z − β)ⁿ/(z − α)ᵖ + n g(z) (z − β)ⁿ⁻¹/(z − α)ᵖ − p g(z) (z − β)ⁿ/(z − α)ᵖ⁺¹.    (8.52)

So

f′(z)/f(z) = g′(z)/g(z) + n/(z − β) − p/(z − α).    (8.53)

Figure 8.1.4: The argument theorem illustrated. The integral around a closed path that encompasses singularities (dots) and zeros (crosses) is equivalent to the sum of smaller contours that circle each point individually.

The integral around a closed contour of the above expression is simply

f ′ (z)
I ′
g (z)
I
dz = dz + 2πi (n − p). (8.54)
C f (z) C g(z)

We know g ′ (z) is analytic because g(z) is analytic and g(z)


1
is analytic because g(z) has no zeros within
C, so I ′
g (z)
dz = 0. (8.55)
C g(z)
What about a function that has more zeroes and poles? We could write

    f(z) = \frac{g(z) \, (z-\beta_1)^{n_1} (z-\beta_2)^{n_2} \cdots (z-\beta_K)^{n_K}}{(z-\alpha_1)^{p_1} (z-\alpha_2)^{p_2} \cdots (z-\alpha_L)^{p_L}},    (8.56)

so the ratio of the derivative of this function to the function itself becomes


    \frac{f'(z)}{f(z)} = \frac{g'(z)}{g(z)} + \frac{n_1}{z-\beta_1} + \frac{n_2}{z-\beta_2} + \cdots + \frac{n_K}{z-\beta_K}
                         - \frac{p_1}{z-\alpha_1} - \frac{p_2}{z-\alpha_2} - \cdots - \frac{p_L}{z-\alpha_L}.    (8.57)
The integral of this ratio becomes,
    \oint_C \frac{f'(z)}{f(z)} \, dz = 2\pi i \left( \sum_i^{K} n_i - \sum_i^{L} p_i \right) = 2\pi i \, (N - P),    (8.58)

where N is the sum of orders of zeros of f (z) within C and P is the sum of orders of all poles of f (z)
within C.
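This counting formula is straightforward to check numerically by discretising the contour integral. The
sketch below is an illustration (not from the notes): it takes a function whose logarithmic derivative is
known in closed form, with three zeros and one pole inside a circle of radius 3, and recovers N − P = 2.

    import numpy as np

    # f(z) = (z - 1)^2 (z + 2) / (z - i): a double zero at 1, a simple zero at -2 and a simple pole at i,
    # so N = 3 and P = 1 inside |z| = 3. Its logarithmic derivative f'/f is:
    dlogf = lambda z: 2/(z - 1) + 1/(z + 2) - 1/(z - 1j)

    n_steps = 4000
    theta = np.linspace(0, 2*np.pi, n_steps, endpoint=False)
    z = 3*np.exp(1j*theta)                          # points on the circular contour |z| = 3
    dz = 3j*np.exp(1j*theta) * (2*np.pi/n_steps)    # corresponding contour steps

    count = np.sum(dlogf(z)*dz) / (2j*np.pi)
    print(count.real)                               # approximately 2.0 = N - P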

Another way we can understand the argument theorem is to replace the full contour C by a set of small
circular contours about each zero and pole, as shown in figure 8.1.4. Mathematically, this means

    \oint_C \frac{f'(z)}{f(z)} \, dz = \sum_i \oint_{C_i} \frac{f'(z)}{f(z)} \, dz = 2\pi i \, (N - P).    (8.59)

Why ‘argument’ theorem? Consider the definite integral


    \int_{z_i}^{z_f} \frac{f'(z)}{f(z)} \, dz = \left[ \ln f(z) \right]_{z_i}^{z_f}.    (8.60)

For a closed path that encloses the origin,

    z_f = e^{2\pi i} z_i    (8.61)

and, because ln f(z) is multi-valued, ln f(z_f) ≠ ln f(z_i). Writing f(z) = \rho e^{i\phi},

    \ln f(z_f) - \ln f(z_i) = i \, (\phi_f - \phi_i) = i \, \Delta\phi.    (8.62)

Hence for a polynomial ∆ϕ = 2πN .


Chapter 9

Taylor & Laurent series

We now move to discuss power series involving complex numbers. This will allow us to broaden what
we have seen so far in complex integration.

9.0.1 Taylor’s theorem

If f (z) is analytic within a circle of radius R centred on a (fig. 9.0.1), then for all z such that
|z − a| < R, we can express f (z) as a power series:
    f(z) = f(a) + f'(a)(z-a) + \frac{1}{2!} f''(a)(z-a)^2 + \cdots + \frac{1}{n!} f^{(n)}(a)(z-a)^n + \dots    (9.1)

Proof. Let C be a circular path within a region of convergence of f (z), and let z and a be within this
path as shown in figure 9.0.1. Using Cauchy’s integral formula, we can introduce a:

    f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{\xi - z} \, d\xi = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{(\xi - a) - (z - a)} \, d\xi.    (9.2)

Since |\xi - a| > |z - a| on C, we can expand:

    f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{\xi - a} \left( 1 - \frac{z-a}{\xi-a} \right)^{-1} d\xi,

         = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{\xi - a} \left( 1 + \frac{z-a}{\xi-a} + \left( \frac{z-a}{\xi-a} \right)^2 + \dots \right) d\xi,    (9.3)

         = \sum_{n=0}^{\infty} \frac{1}{n!} (z-a)^n f^{(n)}(a),

where the last line follows from applying Cauchy's integral formula for the nth derivative to each term.

All Taylor series you have previously encountered can still be applied for complex variables within
their radius of convergence. For instance,
    e^z = \sum_n \frac{z^n}{n!},

    \sin z = z - \frac{1}{3!} z^3 + \dots,

    \cos z = 1 - \frac{1}{2!} z^2 + \dots \quad \text{and}    (9.4)

    \ln(1 + z) = z - \frac{1}{2} z^2 + \frac{1}{3} z^3 - \dots .



Figure 9.0.1: A circular contour C centred on z = a inside the region of analyticity |z − a| < R of f(z).

Expansion of (1 + z)^{-1} about z = 1


Here we expect the radius of convergence R to be 2 as this function has a pole at z = −1 which is 2
away from z = 1. The first few derivatives of this function are:

    f'(z) = \frac{-1}{(1+z)^2}, \quad f''(z) = \frac{2}{(1+z)^3}, \quad f'''(z) = \frac{-6}{(1+z)^4}, \quad \dots, \quad f^{(n)}(z) = \frac{(-1)^n \, n!}{(1+z)^{n+1}}.    (9.5)

At z = 1
    f(1) = \frac{1}{2}, \quad f'(1) = \frac{-1}{4}, \quad f''(1) = \frac{1}{4}, \quad \dots, \quad f^{(n)}(1) = \frac{(-1)^n \, n!}{2^{n+1}}.    (9.6)
The expansion is then

    f(z) = f(1) + f'(1)(z - 1) + \frac{1}{2!} f''(1)(z - 1)^2 + \dots,

         = \frac{1}{2} - \frac{1}{4}(z - 1) + \frac{1}{8}(z - 1)^2 - \frac{1}{16}(z - 1)^3 + \cdots + \frac{(-1)^n}{2^{n+1}}(z - 1)^n + \dots,    (9.7)
provided |z − 1| < 2.
Alternatively, let w = z − 1, therefore
    f(w) = \frac{1}{w + 2} = \frac{1}{2} \left( 1 + \frac{w}{2} \right)^{-1},

         = \frac{1}{2} \left( 1 - \frac{w}{2} + \frac{w^2}{4} - \frac{w^3}{8} + \dots \right),    (9.8)

         = \frac{1}{2} - \frac{z - 1}{4} + \frac{(z - 1)^2}{8} - \dots,

provided |w/2| < 1.
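Expansions like these are easy to double-check with a computer algebra system; the following is a small
independent sketch with sympy, not part of the derivation:

    from sympy import symbols, series

    z = symbols('z')
    print(series(1/(1 + z), z, 1, 5))
    # 1/2 - (z - 1)/4 + (z - 1)**2/8 - (z - 1)**3/16 + ..., matching eq. 9.7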

Expansion of cos z about z = π/2


The first few derivatives of this function are:

f (z) = cos z, f ′ (z) = − sin z, f ′′ (z) = − cos(z), f ′′′ (z) = sin z. (9.9)

At z = π/2, these reduce to


    f\!\left(\tfrac{\pi}{2}\right) = 0, \quad f'\!\left(\tfrac{\pi}{2}\right) = -1, \quad f''\!\left(\tfrac{\pi}{2}\right) = 0, \quad f'''\!\left(\tfrac{\pi}{2}\right) = 1.    (9.10)

So the expansion of cos z is


    \cos z = \cos\frac{\pi}{2} - \sin\frac{\pi}{2} \left( z - \frac{\pi}{2} \right) + \dots,

           = 0 - \left( z - \frac{\pi}{2} \right) + \frac{1}{3!} \left( z - \frac{\pi}{2} \right)^3 - \frac{1}{5!} \left( z - \frac{\pi}{2} \right)^5 + \dots .    (9.11)
Alternatively, let w = z − π/2, then

    \cos z = \cos\!\left( w + \frac{\pi}{2} \right) = \cos w \cos\frac{\pi}{2} - \sin w \sin\frac{\pi}{2},

           = -\sin w,    (9.12)

           = -w + \frac{1}{3!} w^3 - \frac{1}{5!} w^5 + \dots .
In this case the radius of convergence is infinite as there are no singularities in the complex plane.
Note that if f(z) has a zero of order N at z = a then the Taylor series will now start with

    \frac{1}{N!} f^{(N)}(a)(z - a)^N + \dots,    (9.13)

as expected since

    f(z) = (z - a)^N g(z).    (9.14)


Figure 9.0.2: Two circular contours centered on z = a inside the region of analyticity R1 < |z − a| < R2 of
f (z).

9.0.2 Laurent’s theorem


What about functions that have poles? Let's now consider how we can extend power series to include
such functions. For instance,

    f(z) = \frac{1}{z + 1}    (9.15)
for |z| > 1. Here, we can take a factor of z out of the denominator and then expand what remains.
This leaves us with
    f(z) = \frac{1}{z} \left( \frac{1}{1 + \frac{1}{z}} \right),

         = \frac{1}{z} \left( 1 - \frac{1}{z} + \frac{1}{z^2} - \frac{1}{z^3} + \dots \right),    (9.16)

         = \frac{1}{z} - \frac{1}{z^2} + \frac{1}{z^3} - \frac{1}{z^4} + \dots,

which converges for |z| > 1 (so |1/z| < 1). This is the Laurent series of 1/(1 + z) about z = 0, valid for
|z| > 1. Note that it doesn't tell us anything about the function at z = 0 because it is not valid there.
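sympy can reproduce this large-|z| expansion directly by expanding about the point at infinity; a minimal
sketch, again just an independent check:

    from sympy import symbols, series, oo

    z = symbols('z')
    print(series(1/(1 + z), z, oo, 5))
    # 1/z - 1/z**2 + 1/z**3 - 1/z**4 + ..., i.e. eq. 9.16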
Now, consider a function f (z) which is analytic for R1 < |z − a| < R2 , taken around two different
contours, C1 and C2 which lie within this region of analyticity. This is drawn in figure 9.0.2. Consider
some contour C that traces both C1 and C2 as illustrated in figure 9.0.3. f (z) is analytic within C, so
we can write
    f(z) = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{\xi - z} \, d\xi.    (9.17)

However, the closed contour integral around C can be written as the sum of four definite integrals;

    \oint_C = \int_A^B + \int_B^C + \int_C^D + \int_D^A.    (9.18)

Shrinking the gap between lines AB and DC, A → D and B → C, such that
    \int_D^A \to \oint_{C_1}, \qquad \int_B^C \to -\oint_{C_2} \qquad \text{and} \qquad \int_A^B \to -\int_C^D.    (9.19)

This leaves eq. 9.17 as


    f(z) = \frac{1}{2\pi i} \oint_{C_1} \frac{f(\xi)}{\xi - z} \, d\xi - \frac{1}{2\pi i} \oint_{C_2} \frac{f(\xi)}{\xi - z} \, d\xi.    (9.20)


Figure 9.0.3: A contour C that connects points A, B, C & D whilst tracing contours C1 & C2 such that the
function is analytic within the region enclosed.

We will first concentrate on the first integral in equation 9.20. For this integral,
|ξ − a| > |z − a|, so we can write
    \frac{1}{\xi - z} = \frac{1}{(\xi - a) - (z - a)} = \frac{1}{\xi - a} \left( 1 - \frac{z-a}{\xi-a} \right)^{-1},

                      = \frac{1}{\xi - a} \left( 1 + \frac{z-a}{\xi-a} + \left( \frac{z-a}{\xi-a} \right)^2 + \dots \right).    (9.21)
This allows us to write the integral as a series,
    \frac{1}{2\pi i} \oint_{C_1} \frac{f(\xi)}{\xi - z} \, d\xi = \frac{1}{2\pi i} \oint_{C_1} \frac{f(\xi)}{\xi - a} \, d\xi + \frac{z-a}{2\pi i} \oint_{C_1} \frac{f(\xi)}{(\xi - a)^2} \, d\xi + \frac{(z-a)^2}{2\pi i} \oint_{C_1} \frac{f(\xi)}{(\xi - a)^3} \, d\xi

        + \cdots + \frac{(z-a)^n}{2\pi i} \oint_{C_1} \frac{f(\xi)}{(\xi - a)^{n+1}} \, d\xi + \dots,    (9.22)

        = \sum_{n=0}^{\infty} a_n (z - a)^n,

where

    a_n = \frac{1}{2\pi i} \oint_{C_1} \frac{f(\xi)}{(\xi - a)^{n+1}} \, d\xi.    (9.23)
Note that if f(z) is also analytic everywhere inside C1, Cauchy's integral formula for derivatives gives a_n = f^{(n)}(a)/n!, so these coefficients reduce to the usual Taylor coefficients.
Turning our attention to the second integral in equation 9.20: here, on C2, we have |\xi - a| < |z - a|,
so we can write

    \frac{1}{\xi - z} = \frac{1}{(\xi - a) - (z - a)} = \frac{-1}{z - a} \left( 1 - \frac{\xi - a}{z - a} \right)^{-1},

                      = \frac{-1}{z - a} \left( 1 + \frac{\xi - a}{z - a} + \left( \frac{\xi - a}{z - a} \right)^2 + \dots \right).    (9.24)
Therefore we can write the second contour integral as
    -\frac{1}{2\pi i} \oint_{C_2} \frac{f(\xi)}{\xi - z} \, d\xi = \frac{1}{(z-a)} \frac{1}{2\pi i} \oint_{C_2} f(\xi) \, d\xi + \frac{1}{(z-a)^2} \frac{1}{2\pi i} \oint_{C_2} f(\xi)(\xi - a) \, d\xi + \dots

        + \frac{1}{(z-a)^n} \frac{1}{2\pi i} \oint_{C_2} f(\xi)(\xi - a)^{n-1} \, d\xi + \dots,    (9.25)

        = \sum_{n=1}^{\infty} \frac{b_n}{(z - a)^n},

where

    b_n = \frac{1}{2\pi i} \oint_{C_2} f(\xi)(\xi - a)^{n-1} \, d\xi.    (9.26)

So, within the annulus of radii R1 and R2, f(z) is represented by the Laurent series:

    f(z) = \sum_{n=0}^{\infty} a_n (z - a)^n + \sum_{n=1}^{\infty} \frac{b_n}{(z - a)^n}.    (9.27)

As f (z) is analytic between C1 and C2 , we can replace these contours in the definitions of an and bn
with any contour C lying within the annulus of convergence.
Note: the part with positive powers of (z − a) is known as the analytic part and the part with negative
powers of (z − a) is called the principal part of the Laurent series.
Chapter 10

The residue theorem

The residue theorem is one of the most elegant theorems in mathematics. It is a powerful tool that
lets us compute line integrals for analytic functions on closed curves (which we will explore shortly),
infinite series (which will be explored after) and real integrals (the final and arguably most applicable
part of the course).
What is a residue? The residue of a function at a point z = a is the coefficient b_1 of the 1/(z − a) term
in its Laurent series about z = a. The Laurent series for some function f(z) expanded about z = a,
valid within 0 < |z − a| < R, is

    f(z) = \sum_{n=1}^{\infty} \frac{b_n}{(z - a)^n} + \sum_{n=0}^{\infty} a_n (z - a)^n.    (10.1)

Then, for a closed curve C lying within the region 0 < |z − a| < R and encircling z = a,

    \oint_C f(z) \, dz = 2\pi i \, b_1.    (10.2)

The coefficient b1 is the residue of f (z) at z = a.

The residue theorem states that for a curve C that encloses a number of poles of f (z), the closed
contour integral around C is then
    \oint_C f(z) \, dz = 2\pi i \, (\text{sum of residues at all poles inside } C).    (10.3)

Let's consider an example and compare the methods we've seen previously with the residue theorem.
Let's evaluate

    \oint_C \frac{1}{z(z - 2)} \, dz,    (10.4)

where C encloses both z = 0 and z = 2 (see fig. 10.0.1). Here, we use the contour-splitting idea from the
argument theorem (subsection 8.1.3) to split C into C1 and C2 that enclose only z = 0 and z = 2
respectively;

    \oint_C \frac{1}{z(z - 2)} \, dz = \oint_{C_1} \frac{1}{z(z - 2)} \, dz + \oint_{C_2} \frac{1}{z(z - 2)} \, dz.    (10.5)

We then evaluate each of these using Cauchy’s integral formula (eq. 8.25). For C1 , we set
    \frac{1}{z(z - 2)} = \frac{f(z)}{z},    (10.6)

so

    \oint_{C_1} \frac{1}{z(z - 2)} \, dz = 2\pi i \, f(0) = 2\pi i \left[ \frac{1}{z - 2} \right]_{z=0} = -\pi i.    (10.7)



Figure 10.0.1: A contour C that encloses both poles of f (z) at z = 0 and z = 2, and smaller contours C1 and
C2 that enclose each pole individually.

Similarly, for C2 , we set


    \frac{1}{z(z - 2)} = \frac{f(z)}{z - 2}.    (10.8)

So the integral becomes

    \oint_{C_2} \frac{1}{z(z - 2)} \, dz = 2\pi i \, f(2) = 2\pi i \left[ \frac{1}{z} \right]_{z=2} = \pi i.    (10.9)

Overall, our result is

    \oint_C \frac{1}{z(z - 2)} \, dz = -\pi i + \pi i = 0.    (10.10)
Using the residue theorem, we expand around each pole in each contour. Around C1 where there
is a pole at z = 0,
    \frac{1}{z(z - 2)} = \frac{-1}{2z} \left( 1 - \frac{z}{2} \right)^{-1},

                       = \frac{-1}{2z} \left( 1 + \frac{z}{2} + \dots \right) = -\frac{1}{2z} - \frac{1}{4} + \dots .    (10.11)

So the residue from C1 is b_1 = −1/2. Around C2 there is a pole at z = 2. We follow the same approach
as before, though the algebra is made slightly easier with the substitution w = z − 2 and then expanding
around w = 0. With this,

    \frac{1}{z(z - 2)} = \frac{1}{w(w + 2)} = \frac{1}{2w} \left( 1 + \frac{w}{2} \right)^{-1},

                       = \frac{1}{2w} \left( 1 - \frac{w}{2} + \dots \right) = \frac{1}{2w} - \frac{1}{4} + \dots .    (10.12)
So the residue from C2 is b_1 = 1/2. The integral around C is therefore

    \oint_C \frac{1}{z(z - 2)} \, dz = 2\pi i \left( -\frac{1}{2} + \frac{1}{2} \right) = 0,    (10.13)

as expected.
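Residues such as these can also be obtained directly from sympy's residue function, which gives a quick
way to check hand calculations; a sketch (an aside, not part of the notes):

    from sympy import symbols, residue, I, pi

    z = symbols('z')
    f = 1/(z*(z - 2))

    r0 = residue(f, z, 0)                # -1/2
    r2 = residue(f, z, 2)                #  1/2
    print(r0, r2, 2*pi*I*(r0 + r2))      # -1/2, 1/2, 0 -- in agreement with eq. 10.13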
As we only require the coefficient of the 1/(z − a) term, there are some tricks to suit different
functions. For instance, if f(z) only has a simple pole at z = a, then

g(z) = (z − a)f (z) (10.14)

is analytic at z = a. Therefore we can Taylor expand g(z) about z = a:

    g(z) = g_0 + (z - a) g_1 + O\!\left( (z - a)^2 \right).    (10.15)




The Laurent series of f (z) is then just


    f(z) = \frac{g_0}{z - a} + g_1 + O(z - a).    (10.16)
So, the residue of f (z) at z = a is

    \text{res}(a) \equiv b_1 \equiv g_0 = g(a) = \lim_{z \to a} \left[ (z - a) f(z) \right].    (10.17)

If f(z) is given in the form g(z)/(z − a), then the residue is simply g(a).

Example: f(z) = (z − 1)/[z(z − 2)]

Consider the function


    f(z) = \frac{z - 1}{z(z - 2)},    (10.18)

which has simple poles at z = 0 and z = 2. The residue at z = 0 is given by

    \text{res}(0) = \lim_{z \to 0} \left[ z \, \frac{(z - 1)}{z(z - 2)} \right] = \frac{0 - 1}{0 - 2} = \frac{1}{2}.    (10.19)

At z = 2, the residue is

    \text{res}(2) = \lim_{z \to 2} \left[ (z - 2) \, \frac{(z - 1)}{z(z - 2)} \right] = \frac{2 - 1}{2} = \frac{1}{2}.    (10.20)

Example: f(z) = 1/sin z

The function
    f(z) = \frac{1}{\sin z}    (10.21)

has simple poles at z = nπ. The residues at these points are

    \text{res}(n\pi) = \lim_{z \to n\pi} \frac{z - n\pi}{\sin z} = \frac{0}{0} \;\xrightarrow{\text{L'H\^opital}}\; \frac{1}{\cos n\pi} = (-1)^n.    (10.22)

Example: f(z) = g(z)/h(z)

A more general case would be a function that can be written as

    f(z) = \frac{g(z)}{h(z)},    (10.23)

where g(a) ≠ 0 and h(z) has a simple zero at z = a, i.e. h(a) = 0 with h'(a) ≠ 0. In such cases, we can
Taylor expand h(z) about z = a:

    h(z) = \underbrace{h(a)}_{=0} + (z - a) h'(a) + O\!\left( (z - a)^2 \right).    (10.24)

The residue of f (z) is therefore


 
    \text{res}(a) = \lim_{z \to a} \left[ (z - a) \, \frac{g(z)}{(z - a) h'(a) + O\!\left( (z - a)^2 \right)} \right] = \lim_{z \to a} \frac{g(z)}{h'(a) + O(z - a)} = \frac{g(a)}{h'(a)}.    (10.25)

Our previous example conforms with this result with g(z) = 1 and h(z) = sin(z).

We can extend the approach outlined above to functions with a pole of order n at z = a. Consider
such a function f (z), then g(z) = (z − a)n f (z) is analytic at z = a. Taylor expanding g(z) about
z = a,
    g(z) = g(a) + \frac{z - a}{1!} g'(a) + \dots + \frac{(z - a)^{n-1}}{(n - 1)!} g^{(n-1)}(a) + \dots .    (10.26)

This allows us to write


    f(z) = \frac{g(z)}{(z - a)^n} = \frac{g(a)}{(z - a)^n} + \dots + \frac{(z - a)^{-1}}{(n - 1)!} g^{(n-1)}(a) + \dots,    (10.27)

so we can read off the residue as

    \text{res}(a) = b_1 = \frac{g^{(n-1)}(a)}{(n - 1)!} = \frac{1}{(n - 1)!} \lim_{z \to a} \left\{ \frac{d^{\,n-1}}{dz^{\,n-1}} \left[ (z - a)^n f(z) \right] \right\}.    (10.28)

These cases are more work, unless f (z) is already in the form g(z)(z − a)−n and the Taylor series of
g(z) is known. As an example, we could find the residue of

    f(z) = \frac{\sin z}{z^8}    (10.29)

at z = 0. Taylor expanding the numerator we find

    \frac{\sin z}{z^8} = \frac{z - \frac{1}{3!} z^3 + \frac{1}{5!} z^5 - \frac{1}{7!} z^7 + \dots}{z^8}.    (10.30)

So, the coefficient of 1/z, and hence the residue, is

    \text{res}(0) = -\frac{1}{7!} = -\frac{1}{5040}.    (10.31)
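A quick check of this higher-order-pole residue with sympy (an aside, not part of the notes):

    from sympy import symbols, sin, residue

    z = symbols('z')
    print(residue(sin(z)/z**8, z, 0))    # -1/5040, in agreement with eq. 10.31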

Chapter 11

Complex methods for real integrals

We will now conclude this course by bringing together everything we have learnt about complex analysis
to evaluate real integrals. This might sound odd at first, but this will allow us to evaluate integrals
that are nearly impossible through other analytic methods. We have been building towards this for
some time, so you should expect to see a question on real integration in the exam.
Consider the integral
    I = \int_{-\infty}^{\infty} \frac{1}{x^2 + a^2} \, dx = \lim_{R \to \infty} \int_{-R}^{R} \frac{1}{x^2 + a^2} \, dx,    (11.1)

for real a > 0. The approach you learnt in first year (or before) is to make the substitution x = a tan θ.
The result of this is:

    I = \lim_{R \to \infty} \left[ \frac{1}{a} \arctan\frac{x}{a} \right]_{-R}^{R} = \lim_{R \to \infty} \frac{2}{a} \arctan\frac{R}{a} = \frac{2}{a} \, \frac{\pi}{2} = \frac{\pi}{a}.    (11.2)
We can also evaluate this integral by using complex integration and the residue theorem. On the real
axis x = Re(z), so \frac{1}{a^2 + z^2}\,dz = \frac{1}{a^2 + x^2}\,dx there. Furthermore, we can rewrite the integrand to
make the poles explicit:

    \frac{1}{a^2 + z^2} = \frac{1}{(z + ia)(z - ia)};    (11.3)
so there are simple poles at z = ±ia. The trick with all of these problems is to choose a suitable
contour that both has a segment that corresponds to the definite integral we wish to calculate and
allows us to make use of the residue theorem. In this example, such a contour traces a semicircle of
radius R with its base on the real axis where we explore the limit R → ∞; this is drawn in figure
11.0.1. We can therefore extract the integral of interest by splitting the contour integral up into each
path:
    I_C = \oint_C \frac{1}{z^2 + a^2} \, dz = I_1 + I_2,

        = \underbrace{\int_{-R}^{R} \frac{1}{z^2 + a^2} \, dz}_{\text{along real axis}} + \underbrace{\int_{R}^{-R} \frac{1}{z^2 + a^2} \, dz}_{\text{along semicircle}}.    (11.4)

So,
    I = \lim_{R \to \infty} I_1 = \lim_{R \to \infty} \left[ I_C - I_2 \right].    (11.5)



Figure 11.0.1: A contour C that consists of a semicircle of radius R in the upper-half plane and a straight line
along the real axis. C encloses the upper pole at z = ia.

The result of the closed-contour integral is


    I_C = \oint_C \frac{1}{z^2 + a^2} \, dz = 2\pi i \, \text{res}(ia) \quad \text{(by the residue theorem)}

        = 2\pi i \lim_{z \to ia} \frac{z - ia}{z^2 + a^2}

        = 2\pi i \lim_{z \to ia} \frac{z - ia}{(z - ia)(z + ia)}    (11.6)

        = 2\pi i \lim_{z \to ia} \frac{1}{z + ia}

        = \frac{\pi}{a}.
The integral around the semicircle, I2 , requires a little more work. You might already have noticed
that we expect this term to vanish for R → ∞. This follows from the estimation lemma where in this
case the maximum value is when z = iR and the contour traces a length πR; therefore
    \left| \int_{R}^{-R} \frac{1}{z^2 + a^2} \, dz \right| \le \frac{1}{R^2 - a^2} \, \pi R.    (11.7)

From the result above, we can indeed see that in the limit R → ∞ the integral goes to zero. So, we
conclude that

    \int_{-\infty}^{\infty} \frac{1}{x^2 + a^2} \, dx = \frac{\pi}{a}.    (11.8)
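As a sanity check (outside the notes), direct numerical quadrature of the real integral agrees with the
residue-theorem result; a sketch with scipy, taking a = 2 as an arbitrary value:

    import numpy as np
    from scipy.integrate import quad

    a = 2.0                                              # arbitrary value of a > 0
    val, err = quad(lambda x: 1/(x**2 + a**2), -np.inf, np.inf)
    print(val, np.pi/a)                                  # both approximately 1.5708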
That was quite a lot of effort, and probably not worth it for such a simple integral. Consider instead
a more challenging problem that cannot be solved with a standard substitution, such as
    I = \int_{-\infty}^{\infty} \frac{1}{x^6 + a^6} \, dx,    (11.9)

for real a > 0. From the fundamental theorem of algebra (proved in chapter 8), the integrand has six
simple poles:

    z^6 + a^6 = 0 \quad \text{or} \quad z^6 = -a^6

    \Rightarrow z = a(-1)^{1/6} = a \left( e^{i\pi + 2\pi i k} \right)^{1/6} = a \, e^{\frac{i\pi}{6}} e^{\frac{2\pi i k}{6}},    (11.10)

with k = 0, 1, 2, . . . , 5. We can evaluate the integral by using an identical contour to the previous
example; however, it will now enclose three poles instead of just one (see fig. 11.0.2).


Figure 11.0.2: A contour C that consists of a semicircle of radius R in the upper-half plane and a straight line
along the real axis. C encloses the upper poles at z = ae^{i\pi/6}, ia, ae^{5i\pi/6}.

Like the previous example, we write the integral we wish to evaluate as the limit as R tends to
infinity of the difference between the closed-contour integral and the semicircular part:
    I = \lim_{R \to \infty} I_1 = \lim_{R \to \infty} \left[ I_C - I_2 \right],

      = \lim_{R \to \infty} \left[ \oint_C \frac{1}{z^6 + a^6} \, dz - \underbrace{\int_{R}^{-R} \frac{1}{z^6 + a^6} \, dz}_{\text{along semicircle}} \right].    (11.11)

You should be able to use the estimation lemma to show that I_2 vanishes as R → ∞. So we just need to
find the value of I_C, which follows from the residue theorem once more:

    I_C = 2\pi i \left[ \text{res}\!\left( ae^{\frac{\pi i}{6}} \right) + \text{res}(ia) + \text{res}\!\left( ae^{\frac{5\pi i}{6}} \right) \right].    (11.12)

This particular integrand is of the form 1/h(z), so the residues are given by 1/h'(z) evaluated at each pole.
In this case, we have

    h'(z) = 6z^5,    (11.13)
so the integral around the contour is
 
    I_C = 2\pi i \left[ \left. \frac{1}{6z^5} \right|_{z = ae^{\pi i/6}} + \left. \frac{1}{6z^5} \right|_{z = ia} + \left. \frac{1}{6z^5} \right|_{z = ae^{5\pi i/6}} \right]

        = \frac{2\pi i}{6a^5} \left( e^{-\frac{5\pi i}{6}} - i + e^{-\frac{25\pi i}{6}} \right)    (11.14)

        = \frac{\pi i}{3a^5} \left[ \cos\frac{5\pi}{6} - i \sin\frac{5\pi}{6} - i + \cos\frac{\pi}{6} - i \sin\frac{\pi}{6} \right]

        = \frac{\pi}{3a^5} \left( 2 \times \frac{1}{2} + 1 \right) = \frac{2\pi}{3a^5}.
So I = \frac{2\pi}{3a^5}. This is much easier than trying to find a suitable substitution.
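The same numerical sanity check as before works here too; a sketch (a = 1.5 is arbitrary):

    import numpy as np
    from scipy.integrate import quad

    a = 1.5
    val, err = quad(lambda x: 1/(x**6 + a**6), -np.inf, np.inf)
    print(val, 2*np.pi/(3*a**5))                         # both approximately 0.276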

11.1 Fourier integrals


In the previous examples we have the freedom to choose the contour to be in either the upper half or
lower half plane. However, we do not always have this freedom. Consider the integral of the form
    I = \int_{-\infty}^{\infty} f(x) e^{ikx} \, dx,    (11.15)

Figure 11.1.1: A plot illustrating that sin θ (solid) is greater than 2θ/π (dashed) for 0 < θ < π/2.

where k > 0 (k ∈ R) and


    \lim_{z \to \infty} f(z) = 0    (11.16)
in the upper half plane of z. We also assume that f (z) is meromorphic in the upper half plane; its
only singularities are isolated poles.

Example

    I = \int_{-\infty}^{\infty} \frac{e^{ikx}}{1 + x^2} \, dx,    (11.17)
for real k > 0. Again, we write
    \oint_C \frac{e^{ikz}}{1 + z^2} \, dz = I_1 + I_2,    (11.18)
using the same contour as before (fig. 11.0.1) and

    I = \lim_{R \to \infty} I_1.    (11.19)

What about the integral around the semicircle, I2 ? Here we can make use of the method used to derive
the estimation lemma (eq. 7.11):
    \left| \int_{I_2} \frac{e^{ikz}}{1 + z^2} \, dz \right| \le \int_{I_2} \left| \frac{e^{ikz}}{1 + z^2} \right| |dz| \le \int_0^{\pi} \underbrace{\frac{e^{-kR\sin\theta}}{R^2 - 1}}_{\text{writing } y = R\sin\theta} \left| iRe^{i\theta} \right| d\theta,    (11.20)

        = \frac{R}{R^2 - 1} \int_0^{\pi} e^{-kR\sin\theta} \, d\theta.

To proceed, we note that the integrand is symmetrical about θ = π/2 and that sin θ > 2θ/π for
0 < θ < π/2, as shown in figure 11.1.1. Therefore,

    \frac{R}{R^2 - 1} \int_0^{\pi} e^{-kR\sin\theta} \, d\theta \le \frac{2R}{R^2 - 1} \int_0^{\pi/2} e^{-kR\frac{2\theta}{\pi}} \, d\theta = \frac{\pi}{k} \frac{1}{R^2 - 1} \left( 1 - e^{-kR} \right).    (11.21)
So
    \lim_{R \to \infty} I_2 = 0.    (11.22)
To evaluate the closed-contour integral, we write
    \oint_C \frac{e^{ikz}}{1 + z^2} \, dz = \oint_C \frac{e^{ikz}}{(z - i)(z + i)} \, dz,    (11.23)


Figure 11.1.2: A contour C composed of two concentric semicircles of radii R and ϵ in the upper half plane
centred over the pole at the origin.

where the pole at z = i is enclosed by the contour. The residue at this pole is

    \text{res}(i) = \left[ \frac{e^{ikz}}{z + i} \right]_{z=i} = \frac{e^{-k}}{2i}.    (11.24)

Finally, the result for the integral I is


 
    I = \lim_{R \to \infty} \left[ \pi e^{-k} - I_2 \right] = \pi e^{-k}.    (11.25)

If k had instead been negative, we would have closed the contour in the lower half plane and obtained
I = πe^k. The result that is valid for any real k ≠ 0 is πe^{-|k|}.
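This result can also be checked numerically. Since the integrand oscillates, the sketch below uses scipy's
Fourier-weighted quadrature on the even part of the integrand (k = 1.7 is an arbitrary value):

    import numpy as np
    from scipy.integrate import quad

    k = 1.7                                              # arbitrary k > 0
    # the real part of the integrand is even in x, so integrate cos(kx)/(1 + x^2) over [0, inf) and double it
    val, err = quad(lambda x: 1/(1 + x**2), 0, np.inf, weight='cos', wvar=k)
    print(2*val, np.pi*np.exp(-abs(k)))                  # both approximately 0.574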
This is an example of the use of Jordan's lemma, which states that I_2 will vanish for the integrand
e^{ikz} f(z) if:

• k > 0,

• f (z) has a finite number of poles in the upper half plane,

• f (z) → 0 as |z| → ∞ in the upper half plane.

On the other hand, if k has the opposite sign we need to close the contour in the lower half plane. For
that case, Jordan’s lemma reads:

• k < 0,

• f (z) has a finite number of poles in the lower half plane,

• f (z) → 0 as |z| → ∞ in the lower half plane.

Another example of an improper integral we can now evaluate is
    \int_{-\infty}^{\infty} \frac{\sin x}{x} \, dx.    (11.26)

Naively, we might expect this integral to be the imaginary part of


    I \overset{?}{=} \int_{-\infty}^{\infty} \frac{e^{ix}}{x} \, dx,    (11.27)

but the integrand now has a pole at x = 0, because the numerator is 1 there. Instead we redefine the
real integral I by making use of the contour drawn in figure 11.1.2. The integral around the closed
contour, I_C, is

    I_C = I_1 + I_2 + I_3 = 0,    (11.28)

as it does not enclose any poles. We now define the integral I to be

    I = \lim_{\substack{R \to \infty \\ \epsilon \to 0}} I_1.    (11.29)

Using Jordan’s lemma, we find that


    \lim_{R \to \infty} I_2 = 0.    (11.30)
Lastly, we have
ϵ
eiz
Z
I3 = dz. (11.31)
−ϵ z
Let z = ϵeiθ , therefore
0 iθ 0
eiϵe iϵeiθ
Z Z

I3 = dθ = i eiϵe dθ. (11.32)
π ϵeiθ π

In the limit ϵ → 0, e^{i\epsilon e^{i\theta}} → 1, so

    \lim_{\epsilon \to 0} I_3 = -\pi i,    (11.33)

which is the only contribution to I;

    I = -I_3 = i\pi.    (11.34)
The imaginary part of this is just π, so we have the result
    \int_{-\infty}^{\infty} \frac{\sin x}{x} \, dx = \pi.    (11.35)

From this result we have also found



    P \int_{-\infty}^{\infty} \frac{\cos x}{x} \, dx = 0,    (11.36)
by taking the real part of I instead; the symbol ‘P’ is used to indicate that we have excluded the
interval [−ϵ, ϵ] from the range of integration and taken the limit ϵ → 0.
In general, if there is a pole on the real axis, the trick of excluding ϵ on either side is a way of
defining the integral. The limit as ϵ → 0 is called the Cauchy principal value of the integral.
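For what it is worth, sympy reproduces the Dirichlet integral above directly; a one-line check (an aside,
not part of the notes):

    from sympy import symbols, sin, integrate, oo

    x = symbols('x')
    print(integrate(sin(x)/x, (x, -oo, oo)))   # pi, in agreement with eq. 11.35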

11.2 Integrand containing trig functions


We can also evaluate integrals involving rational functions of sin θ and cos θ by extending to the complex
plane and making use of the residue theorem. For example, consider
    I = \int_0^{2\pi} \frac{1}{5 + 4\cos\theta} \, d\theta.    (11.37)
To evaluate this integral using traditional (or real) techniques would require some nontrivial substitu-
tion. If we extend to the complex plane, we just need to identify the poles and make use of the residue
theorem. Here, that means
    z = e^{i\theta}, \quad dz = i e^{i\theta} \, d\theta \quad \therefore\ d\theta = \frac{dz}{iz};    (11.38)

and

    \cos\theta = \frac{1}{2} \left( z + \frac{1}{z} \right), \qquad \sin\theta = \frac{1}{2i} \left( z - \frac{1}{z} \right).    (11.39)


Figure 11.3.1: A contour C composed of two concentric circles of radii R and ϵ connected via a bridge such that
the branch cut on the positive x axis is not enclosed by C.

Applying these to the integral at hand, in which the contour C is the circle |z| = 1, gives
    I = \oint_C \frac{1}{5 + 4 \cdot \frac{1}{2}\left( z + \frac{1}{z} \right)} \, \frac{dz}{iz},

      = \frac{1}{2i} \oint_C \frac{1}{z^2 + \frac{5}{2} z + 1} \, dz,    (11.40)

      = \frac{1}{2i} \oint_C \frac{1}{\left( z + \frac{1}{2} \right)(z + 2)} \, dz.


Within C, there is only one pole, at z = −1/2. The residue at this pole is

    \text{res}\!\left( -\frac{1}{2} \right) = \frac{1}{2i} \left[ \frac{1}{z + 2} \right]_{z = -\frac{1}{2}} = \frac{1}{2i} \, \frac{2}{3}.    (11.41)

So, from the residue theorem, the value of the integral is I = 2\pi i \times \frac{1}{2i}\,\frac{2}{3} = \frac{2\pi}{3}.
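A one-line numerical check of this value (independent of the derivation):

    import numpy as np
    from scipy.integrate import quad

    val, err = quad(lambda t: 1/(5 + 4*np.cos(t)), 0, 2*np.pi)
    print(val, 2*np.pi/3)                                # both approximately 2.0944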


11.3 Integrand with branch point


We now consider the case of an integrand that has a branch point. For example, an integrand of the
form g(z)(z − z_0)^{1/2} has a branch point at z = z_0. To do the integral, we require a contour that doesn't
cross the branch cut associated with the branch point. For example,

    I = \int_0^{\infty} \frac{x^{-\alpha}}{1 + x} \, dx,    (11.42)
for 0 < α < 1. We choose the branch cut to lie just below the positive x axis. With this choice,
θ = arg z is restricted to the range [0, 2π). A suitable contour is drawn in figure 11.3.1. From the
residue theorem, the integral around the contour is
    I_C = I_1 + I_2 + I_3 + I_4 = 2\pi i \, \text{res}(-1) = 2\pi i \, \text{res}(e^{i\pi}) = 2\pi i \left( e^{i\pi} \right)^{-\alpha} = 2\pi i \, e^{-i\pi\alpha}.    (11.43)
However, we desire
    I = \lim_{\substack{R \to \infty \\ \epsilon \to 0}} I_1.    (11.44)

We continue by inspecting each integral in turn, starting with I2 . As the contour for I2 is a circle of
radius R, we change the integration variable from z to θ via

    z = R e^{i\theta} \quad \text{and} \quad dz = i R e^{i\theta} \, d\theta.    (11.45)

So,

    I_2 = \int_0^{2\pi} \frac{R^{-\alpha} e^{-i\alpha\theta}}{1 + R e^{i\theta}} \, i R e^{i\theta} \, d\theta.    (11.46)
By using the estimation lemma, we find

    |I_2| \le \frac{R^{-\alpha + 1}}{R - 1} \, 2\pi.    (11.47)
For large R this tends to 2πR−α which tends to 0 as R → ∞. I4 is also along a circular contour, so
using a similar approach we work out
    I_4 = \int_{2\pi}^{0} \frac{\epsilon^{-\alpha} e^{-i\alpha\theta}}{1 + \epsilon e^{i\theta}} \, i\epsilon e^{i\theta} \, d\theta.    (11.48)

Again, by using the estimation lemma, we find

    |I_4| \le \frac{\epsilon^{1 - \alpha}}{1 - \epsilon} \, 2\pi.    (11.49)
This also tends to 0 as ϵ → 0 provided α < 1. Finally we consider I3 where z = xe2πi . This can be
written in terms of I1 :
    I_3 = \int_{R}^{\epsilon} \frac{\left( x e^{2\pi i} \right)^{-\alpha}}{1 + x} \, dx,

        = -e^{-2\pi i \alpha} \underbrace{\int_{\epsilon}^{R} \frac{x^{-\alpha}}{1 + x} \, dx}_{I_1},    (11.50)

        = -e^{-2\pi i \alpha} \, I_1.

The result of the closed-contour integral is therefore

    I_C = I_1 + I_3 = I_1 \left( 1 - e^{-2\pi i \alpha} \right) = \underbrace{2\pi i \, e^{-i\pi\alpha}}_{\text{from residue theorem}},    (11.51)

so that

    I_1 \left( e^{i\pi\alpha} - e^{-i\pi\alpha} \right) = 2\pi i, \quad \text{or} \quad I_1 = \frac{\pi}{\sin(\pi\alpha)}.    (11.52)
So the result for I is finite and positive for 0 < α < 1, as it should be.
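Numerically, the branch-cut result can be confirmed with ordinary quadrature, since the integrable
singularity at x = 0 and the decay at infinity are both handled by the adaptive routine; a sketch
(α = 0.3 is arbitrary):

    import numpy as np
    from scipy.integrate import quad

    alpha = 0.3                                          # arbitrary value in (0, 1)
    val, err = quad(lambda x: x**(-alpha)/(1 + x), 0, np.inf)
    print(val, np.pi/np.sin(np.pi*alpha))                # both approximately 3.883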

11.4 Shifting Gaussian integrals


A type of integral we regularly encounter has the form
    \int_{-\infty}^{\infty} e^{-\alpha x^2} e^{-ikx} \, dx = e^{-\frac{k^2}{4\alpha}} \int_{-\infty}^{\infty} e^{-\alpha \left( x + \frac{ik}{2\alpha} \right)^2} \, dx.    (11.53)

The common approach here is to shift the integration variable such that it returns a Gaussian which
we know how to integrate; i.e.,
    \int_{-\infty}^{\infty} e^{-\alpha (x - x_0)^2} \, dx = \int_{-\infty}^{\infty} e^{-\alpha u^2} \, du = \sqrt{\frac{\pi}{\alpha}},    (11.54)


Figure 11.4.1: Rectangular contour C of height b and width 2R split into four integrals.

where we have made the substitution u = x − x0 . We can prove this approach is valid for imaginary
x0 = ib by again extending the integral to the complex plane. Consider the integral
    I = \oint_C e^{-z^2} \, dz    (11.55)

around the contour as displayed in figure 11.4.1 in the limit R → ∞. This contour encloses no
singularities so the integral yields zero. Therefore,

I1 + I2 + I3 + I4 = 0. (11.56)

Let’s consider I2 and I4 first; their integrand is


    e^{-z^2} = e^{-(x^2 - y^2)} e^{-2ixy},    (11.57)

the magnitude of which is

    \left| e^{-z^2} \right| = e^{-(x^2 - y^2)} = e^{-R^2 + y^2} \le e^{-R^2 + b^2},    (11.58)
where we have used y ≤ b. This will tend to zero as R tends to infinity, so I2 and I4 both vanish in
this limit. For I to be zero, it must be that I1 and I3 are equal and opposite. In the limit R → ∞,
you should verify that this results in
    \int_{-\infty}^{\infty} e^{-x^2} \, dx = \int_{-\infty}^{\infty} e^{-(x - ib)^2} \, dx = \sqrt{\pi}.    (11.59)

So, the imaginary shift x0 = ib does not change the value of the integral.
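You can also convince yourself of this numerically by evaluating the shifted Gaussian along the real axis
with a simple Riemann sum; a sketch (b = 0.8 is an arbitrary shift):

    import numpy as np

    b = 0.8                                              # arbitrary imaginary shift
    x = np.linspace(-10, 10, 20001)                      # the tails are negligible beyond |x| ~ 10
    f = np.exp(-(x - 1j*b)**2)                           # complex integrand evaluated on the real axis

    integral = np.sum(f) * (x[1] - x[0])                 # simple Riemann sum
    print(integral, np.sqrt(np.pi))                      # real part ~ 1.7725, imaginary part ~ 0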

11.5 Series summation


Another surprising application of complex integration is series summation. For instance, we can show
that

    S = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}    (11.60)

by considering an integral

    I_N = \oint_{C_N} \frac{\cot z}{z^2} \, dz.    (11.61)
The integrand has poles where cot z has poles: at z = nπ. These are simple for n ̸= 0, and of order 3
for n = 0. Starting with the simple poles, we can write

z = nπ + w, (11.62)

such that
    \cot z = \frac{\cos(n\pi + w)}{\sin(n\pi + w)} = \frac{(-1)^n \cos w}{(-1)^n \sin w} = \cot w,    (11.63)

           = \frac{1}{w} - \frac{w}{3} - \frac{w^3}{45} + O(w^5),


Figure 11.5.1: A square contour C with corners at ±L ± iL, where L = (N + 1/2)π. The simple poles on the real
axis are represented by dots and the third-order pole at the origin is shown as a circled dot.

in which we have written down a few terms of the ‘known’ (= looked-up) Laurent series for cot w.
Thus,  
    \frac{\cot z}{z^2} = \frac{1}{(n\pi + w)^2} \left( \frac{1}{w} - \frac{w}{3} + \dots \right).    (11.64)
Therefore the residues at the simple poles are
 
    \text{res}(n\pi) = \lim_{w \to 0} \frac{w}{(n\pi + w)^2} \left( \frac{1}{w} - \frac{w}{3} + \dots \right) = \frac{1}{n^2 \pi^2}, \quad \text{for } n \ne 0.    (11.65)
To find the residue from the third order pole at n = 0, we again make use of the Laurent series for the
cotangent:  
    \frac{\cot z}{z^2} = \frac{1}{z^2} \left( \frac{1}{z} - \frac{z}{3} + O(z^3) \right) = \frac{1}{z^3} - \frac{1}{3z} + O(z).    (11.66)

From the coefficient of 1/z we find the residue

    \text{res}(0) = -\frac{1}{3}.    (11.67)
The result of the contour integral is therefore
    I_N = 2\pi i \left( -\frac{1}{3} + 2 \sum_{n=1}^{N} \frac{1}{n^2 \pi^2} \right).    (11.68)

To complete our work, we use the estimation lemma to show that IN vanishes for N → ∞. To do
this, we consider a square contour centered on the origin with side length 2L, see figure 11.5.1. On
this contour,
    |f(z)| = \frac{1}{|z^2|} = \frac{1}{x^2 + y^2} \le \frac{1}{L^2}.    (11.69)
On the vertical side where z = L + iy,
    \cot z = \frac{\cos(L + iy)}{\sin(L + iy)} = \frac{-\sin L \sin(iy)}{\sin L \cos(iy)}    (11.70)

           = -\tan(iy) = -i \tanh y,

where we have used cos L = cos[(N + 1/2)π] = 0. Thus,




|cot z| = |tanh y| < 1 (11.71)



on this side, and similarly for z = −L + iy. On the horizontal side, we have z = x + iL,
    \Rightarrow \cot z = \frac{\frac{1}{2} \left( e^{i(x + iL)} + e^{-i(x + iL)} \right)}{\frac{1}{2i} \left( e^{i(x + iL)} - e^{-i(x + iL)} \right)},

                       = i \, \frac{e^{ix} e^{-L} + e^{-ix} e^{L}}{e^{ix} e^{-L} - e^{-ix} e^{L}},    (11.72)

                       \approx \frac{i e^{-ix} e^{L}}{-e^{-ix} e^{L}} = -i,

where on the last line we have assumed eL ≫ e−L for large L; the corrections to the final result are
very small, of order e−2L . Thus, on all sides, | cot z| ≲ 1 and

    \left| \frac{\cot z}{z^2} \right| \lesssim \frac{1}{L^2}.    (11.73)

So, integrated around C,


    \left| \oint_C \frac{\cot z}{z^2} \, dz \right| \lesssim \frac{8L}{L^2},    (11.74)
which tends to 0 as L → ∞. In this limit, we can rearrange eq. 11.68 to find

    \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}    (11.75)

as desired.
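A direct partial-sum check of this value (not part of the contour argument):

    import numpy as np

    n = np.arange(1, 100001)
    print(np.sum(1.0/n**2), np.pi**2/6)    # 1.64492..., 1.64493... (the tail of the sum is of order 1/N)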
Similar sums can be done using similar trigonometric multipliers in the integrands. We choose
the trig function whose poles match the series to be summed: cot z and cosec z (whose poles are at
z = nπ) can be used for sums over all integers in which the terms have the same or alternating signs,
respectively; while tan z and sec z (poles at z = (n + 1/2)π) are useful for sums over odd integers in
which the terms have the same or alternating signs, respectively.
