Fisica Matematica I
Gabriel T. Landi
University of São Paulo
1 Fourier Series
1.1 Periodic functions
1.2 Orthogonality of trigonometric functions
1.3 The Fourier recipe
1.4 Examples
1.5 Complex form
1.6 Parseval's identity
1.7 Dirac delta and Heaviside functions
1.8 Convergence of Fourier series
1.9 Integrals and derivatives of Fourier series

4 Fourier Transforms
4.1 Introduction
4.2 Characteristic function of a probability distribution (optional)
4.3 Operations involving Fourier Transforms
4.4 Cauchy problem for the heat equation
4.5 Quantum dynamics and Heisenberg's uncertainty principle
4.6 Poisson's equation
4.7 Discrete Fourier Transform and the FFT algorithm
Chapter 1
Fourier Series
Fourier series is a method for dealing with functions f (x) that are periodic, with
some period L. That is, functions which satisfy
f(x + L) = f(x).   (1.1)
Since periodic stuff is so common in physics, Fourier series turn out to be extremely
useful. They will also serve as the entry point for an even more useful idea, called
Fourier Transform, that we will study later on.
Periodic functions can have many shapes. Fig 1.1 shows some examples. The cool
thing about them is that you don’t need to consider the entire real line. All information
about them is contained in an interval of length L. The actual choice of interval is up
to you, as long as it has length L. This is illustrated in Fig. 1.2.
Figure 1.1: Examples of periodic functions. These functions were chosen to have period L = 1.
Figure 1.2: All information about periodic functions is contained in an interval of length L. The actual choice of interval is arbitrary, as long as it has length L.
The basic trigonometric functions sin(2πx/L) and cos(2πx/L) are periodic, with period L. We can explicitly verify that this is the case, e.g. by
expanding
sin((2π/L)(x + L)) = sin(2πx/L) cos(2πL/L) + cos(2πx/L) sin(2πL/L) = sin(2πx/L),
since cos(2π) = 1 and sin(2π) = 0. These functions, however, are not the only trigono-
metric functions which have period L. There is, in fact, an entire set of functions of the
form

cos(2πnx/L)   and   sin(2πnx/L),      n = 0, 1, 2, 3, . . . .
It suffices to stick to n ∈ N since n < 0 does not give us anything new (because
sin(−θ) = − sin(θ) and cos(−θ) = cos θ). One can check that these functions are peri-
odic, with period L, in the same way as above; for instance,
sin((2πn/L)(x + L)) = sin(2πnx/L) cos(2πnL/L) + cos(2πnx/L) sin(2πnL/L) = sin(2πnx/L),
because cos(2πn) = 1 and sin(2πn) = 0, when n is an integer. Some of the cosines are
illustrated in Fig. 1.3.
Since all these sines and cosines are periodic, any linear combination of them must
also be periodic. What I mean are combinations of the form
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(2πnx/L) + b_n sin(2πnx/L) },   (1.2)
where an and bn are arbitrary coefficients. I put a 1/2 on a0 for convenience (the
reason will become clear below). Eq. (1.2) is what we call a Fourier series. It is an
Figure 1.4: The two Fourier series in Eqs. (1.3) and (1.4).
infinite series of sines and cosines, with specific coefficients (an , bn ). Since each term is
periodic, the resulting series is guaranteed to be periodic for any choice of coefficients.
By appropriately choosing the coefficients, one can cook up all sorts of funny func-
tions. Here are two examples:
f(x) = sin(2πx) − (1/2) sin(4πx) + (1/3) sin(6πx),   (1.3)

f(x) = sin(2πx) + (1/2) sin(4πx) + (1/3) sin(6πx).   (1.4)
The only difference is the minus sign in the second term. These two functions are
plotted in Fig. 1.4. As can be seen, by changing only an innocent minus sign, one gets
rather different functions. The actual form of the function can thus change significantly
with the coefficients.
This is actually what I used to make the plot in Fig. 1.5. I don't need to specify it over
the other intervals, since I’m constructing it to be periodic. For instance, suppose I want
to evaluate f (x) at x = −0.9. How do I do it? First we translate it back to the interval
used in (1.5): since f (x) has period L = 1, f (−0.9) = f (−0.9 + 1) = f (0.1) = 0.1.
Figure 1.5: Imagine the periodic function you have to implement to make your nephew swing
higher and higher. Credits: illegally stolen from the internet.
The central claim of this chapter is that essentially any periodic function can be expanded in a Fourier series. This is actually a very strong statement. Even poorly
behaved functions, containing discontinuities and divergences can be expanded. We
will actually prove some theorems about this later on. But before doing so, let us first
establish a recipe for how to determine the (an , bn ) given a certain f (x). This is done
using the idea of orthogonality of the trigonometric functions, which is pretty cool.
Here is how it works.
We start by integrating both sides of Eq. (1.2) over an interval of length L. The
actual choice of interval does not matter. Common choices are from [0, L] or from
[−L/2, L/2]. We will mostly use the latter. Hence,
∫_{-L/2}^{L/2} f(x) dx = ∫_{-L/2}^{L/2} (a_0/2) dx + Σ_{n=1}^{∞} { a_n ∫_{-L/2}^{L/2} cos(2πnx/L) dx + b_n ∫_{-L/2}^{L/2} sin(2πnx/L) dx }.
Each cosine and sine integrates to zero over a full period. The only term that survives is therefore the one proportional to a_0, which leads us to
∫_{-L/2}^{L/2} f(x) dx = a_0 L/2.
Let me remind you, once again, that the choice of interval is immaterial. We could also
integrate from [0, L], for instance. In fact, I recommend you try repeating the above
calculations for [0, L] to see that it works.
Next we try to do something similar to find the other coefficients. In this case, it
is better if we first establish some integral identities for the trigonometric functions.
Consider first the integral
∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx.
We could solve this using complex exponentials. Or, what is a bit easier, using a
trigonometric identity like
cos(x) sin(y) = (1/2) sin(x + y) − (1/2) sin(x − y),

which you can look up on Wikipedia. We then get

∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx
= (1/2) ∫_{-L/2}^{L/2} sin(2π(n + m)x/L) dx − (1/2) ∫_{-L/2}^{L/2} sin(2π(n − m)x/L) dx
= −(L/(4π(n + m))) cos(2π(n + m)x/L) |_{-L/2}^{L/2} + (L/(4π(n − m))) cos(2π(n − m)x/L) |_{-L/2}^{L/2}
= 0,
since n and m are both integers. In doing these integrals, one should always be careful
with special cases (#protip). For instance, if n = m, we would be dividing by zero in
the last term. The right way to do this is to set n = m before doing the integral. Luckily,
in this case it makes no difference and we get 0 anyway. Thus, to summarize,
∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx = 0,      ∀ n, m ∈ N.
Next, consider the integral of two cosines,

∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx.
We start by assuming n ≠ m. We then get
∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx = (1/2) ∫_{-L/2}^{L/2} cos(2π(n + m)x/L) dx + (1/2) ∫_{-L/2}^{L/2} cos(2π(n − m)x/L) dx
= (L/(4π(n + m))) sin(2π(n + m)x/L) |_{-L/2}^{L/2} + (L/(4π(n − m))) sin(2π(n − m)x/L) |_{-L/2}^{L/2}
= 0.
This is starting to feel boring. We always get zero. But wait! This is only for n ≠ m. If
n = m, we obtain instead
∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πnx/L) dx = ∫_{-L/2}^{L/2} [ (1/2) cos(4πnx/L) + 1/2 ] dx = L/2.
The first integral is zero [Eq. (1.6)], but the second is not. We therefore conclude that
∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx = { 0,   n ≠ m;    L/2,   n = m. }
An identical formula also holds for a sin-sin integral. I will leave that for you as a
(fun!) exercise.
∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx = 0,   (1.10)

(2/L) ∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx = δ_{n,m},   (1.11)

(2/L) ∫_{-L/2}^{L/2} sin(2πnx/L) sin(2πmx/L) dx = δ_{n,m},   (1.12)
which hold for any n, m ∈ Z. The only exception is Eq. (1.11) when n = m = 0,
in which case we get 2 instead of 1.
These results are called the orthogonality relations of the trigonometric functions over an interval of length L. This term is used because this is actually quite similar to the orthogonality of vectors. The sines and cosines appearing in (1.10)-(1.12) form a basis for the space of periodic functions of period L. And instead of the usual scalar product, the inner product is defined here as the integral over the interval, times 2/L.
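As a sanity check, the orthogonality relations (1.10)-(1.12) can also be verified numerically. Here is a minimal Python sketch (my own illustration, not part of the original notes; it assumes numpy and scipy are available), which integrates the products of trigonometric functions over one period:

import numpy as np
from scipy.integrate import quad

L = 2.0  # any period works; the relations do not depend on the choice

def c(x, n): return np.cos(2 * np.pi * n * x / L)
def s(x, n): return np.sin(2 * np.pi * n * x / L)

for n in range(1, 4):
    for m in range(1, 4):
        cc = quad(lambda x: c(x, n) * c(x, m), -L/2, L/2)[0]
        ss = quad(lambda x: s(x, n) * s(x, m), -L/2, L/2)[0]
        cs = quad(lambda x: c(x, n) * s(x, m), -L/2, L/2)[0]
        # (2/L)*cc and (2/L)*ss should give delta_{n,m}; cs should vanish
        print(n, m, round(2 * cc / L, 6), round(2 * ss / L, 6), round(cs, 6))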
We are now ready to find the remaining coefficients. Multiply both sides of Eq. (1.2) by cos(2πmx/L) and integrate over the interval:

∫_{-L/2}^{L/2} f(x) cos(2πmx/L) dx = (a_0/2) ∫_{-L/2}^{L/2} cos(2πmx/L) dx
+ Σ_{n=1}^{∞} { a_n ∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx + b_n ∫_{-L/2}^{L/2} sin(2πnx/L) cos(2πmx/L) dx }.
Almost everything vanishes. The only term that survives is the cos-cos integral. Using
Eq. (1.11) we then find
∫_{-L/2}^{L/2} f(x) cos(2πmx/L) dx = Σ_{n=1}^{∞} a_n (L/2) δ_{n,m}.
At this point, it is worth getting used to the algebra of Kronecker deltas. The only
term in the sum over n that will not vanish will be when n = m. The Kronecker delta
therefore acts as a selector; it picks, out of all terms in the sum, the one with n = m:
∫_{-L/2}^{L/2} f(x) cos(2πmx/L) dx = a_m L/2.
We therefore now have a recipe for determining the coefficients an . Of course, now that
there is no sum anymore, it does not matter what we call m or n. We can thus write
a_n = (2/L) ∫_{-L/2}^{L/2} f(x) cos(2πnx/L) dx.
Compare this with Eq. (1.8) for a_0. We see that the formula for a_0 is just this one evaluated at n = 0. This is why I put the 1/2 in front of a_0 in the Fourier series definition (1.2). The expression for b_n is derived in exactly the same way. I will leave it for you as a fun fun fun exercise.
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(2πnx/L) + b_n sin(2πnx/L) },   (1.13)

a_n = (2/L) ∫_{-L/2}^{L/2} f(x) cos(2πnx/L) dx,   (1.14)

b_n = (2/L) ∫_{-L/2}^{L/2} f(x) sin(2πnx/L) dx.   (1.15)
The above results are written for functions of period L. Some books use period 2L,
because this makes the integration interval go from [−L, L]. You can adjust for this by
simply replacing L → 2L everywhere. Other books like to use the period L = 2π,
which is very convenient because it greatly simplifies the formulas:
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(nx) + b_n sin(nx) },   (1.16)

a_n = (1/π) ∫_{-π}^{π} f(x) cos(nx) dx,   (1.17)

b_n = (1/π) ∫_{-π}^{π} f(x) sin(nx) dx.   (1.18)
In most problems below, I will assume L = 2π. Eqs. (1.16)-(1.18) will therefore ac-
tually be used way more often than the general formulas (1.13)-(1.15). But it is useful
to have (1.13)-(1.15) in a big gray box, as this makes them convenient to adjust for
different L.
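When the integrals (1.17)-(1.18) are too awkward to do by hand, they can always be evaluated numerically. The following is a small Python sketch (the function name fourier_coeffs is mine, not part of the text; it assumes scipy is available):

import numpy as np
from scipy.integrate import quad

def fourier_coeffs(f, nmax):
    """Coefficients a_0..a_nmax and b_1..b_nmax of a 2*pi-periodic f, via (1.17)-(1.18)."""
    a = [quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0] / np.pi
         for n in range(nmax + 1)]
    b = [quad(lambda x: f(x) * np.sin(n * x), -np.pi, np.pi)[0] / np.pi
         for n in range(1, nmax + 1)]
    return a, b

# Quick test with f(x) = x**2 on [-pi, pi]: a_0/2 should be pi^2/3 and all b_n should vanish.
a, b = fourier_coeffs(lambda x: x**2, 4)
print(a[0] / 2, a[1:], b)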
It turns out, however, that as far as the Fourier coefficients are concerned, the choice
of interval does not matter at all. We can see this as follows. Suppose f (x) has period
L and Fourier series (1.13). Now define a new function g(x) = f (γx), where γ is a
constant. If f has period L, then g will have period L̃ = L/γ, since

g(x + L/γ) = f(γx + L) = f(γx) = g(x).
Thus, g(x) behaves just like f (x), but in a stretched interval. To find the Fourier series
of g(x), we use the series (1.13) for f (x), with x → γx:
g(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(2πnγx/L) + b_n sin(2πnγx/L) }.   (1.19)
In the trigonometric functions, we see that g’s period, L̃ = L/γ naturally appears. If we
now stare at this expression for a second, we will eventually conclude that it is already
in the form of a Fourier series, with the same Fourier coefficients (an , bn ) of f (x). We
thus reach a very important conclusion: stretching the interval does not affect the
Fourier coefficients; the change is only in the sines and cosines in Eq. (1.13).
Going back to generic intervals of length L, define

k_n = 2πn/L,      n = 0, 1, 2, 3, . . . .   (1.20)
Eq. (1.13) is then written more compactly as
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(k_n x) + b_n sin(k_n x) },   (1.21)
These kn will turn out to have a neat physical interpretation when we talk about the
propagation of waves in electromagnetism and quantum mechanics. We already know
that, for each term in (1.21), the value n dictates how fast the oscillation is (Fig. 1.3).
Thus, n, or kn , is somehow related to the “speed” of that oscillation mode. We call
it momentum. This is not the usual momentum in classical mechanics. You will
only really appreciate why we use this word when we discuss quantum mechanical
applications (soon!). But for now I just wanted to introduce this jargon, so you could
get used to it. So, if you want to sound cool to your friends, from now on refer to the k_n as momenta.
1.4 Examples
Square wave
Consider the piecewise periodic function
f(x) = { 1,   0 < x ≤ π;    −1,   −π < x ≤ 0. }   (1.22)
This is usually called the sign function. But we are thinking about it in a piecewise
periodic fashion, as in Fig. 1.6, so the sign function becomes the square wave.
Figure 1.6: The square wave function (1.22), with period L = 2π.
Let us compute the coefficients a_n using Eq. (1.17):

a_n = (1/π) ∫_{-π}^{π} f(x) cos(nx) dx
= −(1/π) ∫_{-π}^{0} cos(nx) dx + (1/π) ∫_{0}^{π} cos(nx) dx
= −(1/(nπ)) sin(nx) |_{-π}^{0} + (1/(nπ)) sin(nx) |_{0}^{π}
= 0.
The coefficients an in this case are all identically zero. This is something we could
have actually figured out without any calculations. Our function f is odd ( f (−x) =
− f (x)) while the cosines are even. The integrand, f (x) cos(nx), is thus odd and the
integration is symmetric with respect to zero. What this means is that any positive area
that contributes to the integral in the interval [0, π] will have a corresponding negative
area in the interval [−π, 0]. Hence the integral must vanish. In the Fourier business, it
is really worth paying attention to this even/oddness thingy. It can save you tons of time
(#protip). We will talk more about it below.
As for bn , we have from Eq. (1.15):
b_n = (1/π) ∫_{-π}^{π} f(x) sin(nx) dx
= −(1/π) ∫_{-π}^{0} sin(nx) dx + (1/π) ∫_{0}^{π} sin(nx) dx
= (1/(nπ)) cos(nx) |_{-π}^{0} − (1/(nπ)) cos(nx) |_{0}^{π}
= (1/(nπ)) [1 − cos(−nπ)] − (1/(nπ)) [cos(nπ) − 1]
= (2/(nπ)) [1 − cos(nπ)].
Since n is an integer, cos(nπ) will be either 1 or −1, depending on whether n is even or
odd:
cos(nπ) = (−1)^n = { −1,   n odd;    1,   n even. }   (1.23)
Thus, we see that the coefficients bn will only be non-zero when n is odd:
b_n = { 4/(nπ),   n odd;    0,   n even. }   (1.24)
The Fourier series for the sign function (1.22) therefore reads
f(x) = Σ_{n=1,3,5,...} (4/(nπ)) sin(nx)   (1.25)
= Σ_{k=1}^{∞} (4/((2k − 1)π)) sin((2k − 1)x)
= (4/π) { sin(x) + (1/3) sin(3x) + (1/5) sin(5x) + . . . }.
In going from the 1st to the 2nd line, I defined a new variable n = 2k−1. The restriction
n = 1, 3, 5, . . . then implies k = 1, 2, 3, 4, . . .. This is therefore just an alternative way
of writing a sum over odd terms.
In Fig. 1.7 I compare the series with the actual function, by truncating the sum at
different values nmax . That is,
f(x) = Σ_{n=1,3,5,...}^{nmax} (4/(nπ)) sin(nx).
As one would hope, the larger the value of nmax , the better the series approximates f (x).
In the limit nmax → ∞ one recovers f (x) exactly.
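If you want to reproduce plots like Fig. 1.7 yourself, the truncated series is easy to evaluate on a grid. A minimal Python sketch (my own, assuming numpy and matplotlib; np.sign(np.sin(x)) is just a convenient way of generating the periodic square wave for comparison):

import numpy as np
import matplotlib.pyplot as plt

def square_series(x, nmax):
    # partial sum of Eq. (1.25): sum over odd n of (4/(n*pi)) sin(nx)
    return sum(4 / (n * np.pi) * np.sin(n * x) for n in range(1, nmax + 1, 2))

x = np.linspace(-2 * np.pi, 2 * np.pi, 2000)
for nmax in (1, 5, 99):
    plt.plot(x, square_series(x, nmax), label=f"nmax = {nmax}")
plt.plot(x, np.sign(np.sin(x)), "k--", label="square wave")
plt.legend()
plt.show()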
Figure 1.7: Fourier series for the square wave function, Eq. (1.22) (shown in black). In each plot
the red curves corresponds to the Fourier series, Eq. (1.25), truncated at different
levels nmax .
Figure 1.8: The function f(x) = x², piecewise periodic in [−π, π] (Eq. (1.26)).
The function x²

Next consider the function

f(x) = x²,   −π < x ≤ π,   (1.26)

again defined to be piecewise periodic. Let us start with a_n, using Eq. (1.17):
a_n = (1/π) ∫_{-π}^{π} x² cos(nx) dx.
Before we do any calculations, stop and think about even vs. odd. The integrand in
this case is even (if you take x → −x it does not change). So these coefficients will be
non-zero. The same cannot be said about bn , though:
b_n = (1/π) ∫_{-π}^{π} x² sin(nx) dx = 0.
Figure 1.9: Fourier series for f (x) = x2 (black), for different truncation sizes nmax .
We have therefore cut the problem in half. Back to the an , we can compute the integrals
using integration by parts twice (with u = x2 and dv = cos(nx)dx). This yields
a_n = (1/π) { (x²/n) sin(nx) |_{-π}^{π} − (1/n) ∫_{-π}^{π} 2x sin(nx) dx }.
The first term vanishes. We now integrate the remaining term by parts, again:
a_n = −(2/(nπ)) { −(x/n) cos(nx) |_{-π}^{π} + ∫_{-π}^{π} (cos(nx)/n) dx }.
Now it is the last term that vanishes, because this is the L = 2π version of Eq. (1.6). Or
you can also see it from (1.11) by setting m = 0. Thus, all that survives for an is
a_n = (4/n²) cos(nπ) = (4/n²) (−1)^n.   (1.27)
Notice that this result is bugged for n = 0. This is again one of those problems where
particular cases must be handled before carrying out the integration. Setting n = 0 in
Eq. (1.17) yields
a_0 = (1/π) ∫_{-π}^{π} x² dx = 2π²/3.
Putting everything together, the Fourier series of x² therefore reads

f(x) = 2π²/6 + Σ_{n=1}^{∞} (4(−1)^n/n²) cos(nx).

The first term is a_0/2, which is why there is a 2π²/6 instead of 2π²/3. The results are
shown in Fig. 1.9. As can be seen, in this case the convergence is insanely fast. Much
faster than the sign function in Fig. 1.7, actually.
Figure 1.10: Even (left) and odd (right) functions. For even functions, the area for x < 0 equals
that for x > 0. Conversely, for odd functions, they exactly cancel each other.
Considerable simplifications occur when the function f(x) is either even or odd. Even functions satisfy f(−x) = f(x) and odd functions satisfy f(−x) = −f(x) (Fig. 1.10). Let us formalize the ideas used in the previous section a bit better.
First notice the following:
• even × even → even.
• odd × odd → even.
• odd × even → odd.
If you ever get confused about this, think about the functions x (odd) and x2 (even).
Thus, x · x = x2 is the product of two odd functions, which is even. Similarly, x · x2 = x3
is the product of an odd with an even function, which is odd. It is also useful to
remember that cosines are even and sines are odd.
When dealing with this even/odd issue, it is convenient to write the integration
intervals in (1.14), (1.15) to be from −L/2 to L/2. The reason is that:
f(x) even:   ∫_{-L/2}^{L/2} f(x) dx = 2 ∫_{0}^{L/2} f(x) dx,   (1.29)

f(x) odd:    ∫_{-L/2}^{L/2} f(x) dx = 0.   (1.30)
For even functions, the area below the curve for x < 0 and x > 0 are equal. For odd
functions, they have opposite signs and thus cancel each other (Fig. 1.10).
What enters Eqs. (1.14), (1.15), however, are the products of f (x) with cosines and
sines, which are respectively even and odd. Thus, if f (x) is even, f (x) sin(kn x) will be
odd and bn will all vanish. And if f (x) is odd, f (x) cos(kn x) will be odd, so an will
vanish. Summarizing our results:

• If f(x) is even, all b_n = 0 and the series contains only cosines (a cosine series).

• If f(x) is odd, all a_n = 0 and the series contains only sines (a sine series).
Figure 1.11: A zoom at one of the discontinuities of the square function (Fig. 1.7), to illustrate
the Gibbs phenomenon.
This is actually very easy to remember. The type of the series should reflect
the parity of f (x). So if f is even, it must have a cosine series because cosines
are even. And similarly if f is odd.
Gibbs phenomenon
Compare Figs. 1.7 and 1.9. The latter is continuous and we see that the convergence
of the series is extremely rapid. The square wave in Fig. 1.7, on the other hand, has
discontinuities and, as a consequence, there is a lot more wiggling around. In fact, we
see that even for nmax = 99, in Fig. 1.7, there is still some wiggling present at the point
where the jumps occur. Far away from the jumps, all is good. But at the jumps it is not.
This is called the Gibbs phenomenon, or Gibbs overshoot. As it turns out, it does
not vanish as nmax → ∞. This is illustrated for the square wave in Fig. 1.11, where
I plot essentially a zoom of Fig. 1.7 around one of the discontinuities, but for much
higher values of nmax . As you can see, the oscillations tend to squeeze around the
discontinuities, making the series increasingly better outside the jumps. Exactly at the
jump, however, the general height of the overshoot does not go down. In fact, it can be
shown that even as nmax → ∞, the overshoot remains at about 9% of the value of the
discontinuity. However, as nmax → ∞, it gets infinitely squished and thus occurs at just a single point.
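The 9% figure is easy to check numerically: evaluate the truncated series (1.25) on a fine grid just to the right of the jump at x = 0 and record its maximum. A rough Python sketch (my own; recall that the jump of the square wave has total size 2, from −1 to +1):

import numpy as np

def square_series(x, nmax):
    return sum(4 / (n * np.pi) * np.sin(n * x) for n in range(1, nmax + 1, 2))

jump = 2.0
x = np.linspace(1e-6, 0.5, 20000)           # fine grid just after the jump at x = 0
for nmax in (19, 99, 499, 1999):
    overshoot = square_series(x, nmax).max() - 1.0
    print(nmax, overshoot / jump)            # tends to about 0.089, i.e. ~9% of the jump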
Figure 1.12: The truncated Fourier series for three increasing values of nmax.
Structurally, this looks very similar to the square wave in Eq. (1.25). The crucial dif-
ference is that here the sum is over all values of n, while in (1.25) it is restricted to
n = 1, 3, 5, . . .. The results are shown in Fig. 1.12. The Gibbs phenomenon can again
be clearly observed: the Fourier series tends to converge outside the jumps and wig-
gle around at the jump locations. The Gibbs phenomenon is, in fact, universal, being
directly associated with jump discontinuities of the functions.
Here are some additional #protips for dealing with Fourier series. Suppose
the series of a certain function f (x) is given by Eq. (1.13), and let g(x) = α f (x)+
β. Then
g(x) = β + αa_0/2 + Σ_{n=1}^{∞} { αa_n cos(2πnx/L) + αb_n sin(2πnx/L) }.   (1.32)
The coefficients of g will thus be ã0 = αa0 + 2β, ãn = αan and b̃n = αbn .
This trick can save you some time, since it allows you to focus on the “core
part” of functions. For instance, suppose you are asked to compute the Fourier
coefficients of π²x + 42. Forget about π². Forget about 42. Focus only on
the Fourier series of the function x. Similarly, suppose you are interested in the
Fourier series of the square wave, but defined to range from 0 to 1. The function
f (x) in Eq. (1.22) ranges from [−1, 1], so we must study g(x) = ( f (x) + 1)/2.
The corresponding Fourier series is then readily computed from Eq. (1.25).
We can also do scales and shifts in the argument x. Suppose g(x) =
f (λx + ω). The Fourier coefficients in this case don’t change. All you need to
do is change the argument in the Fourier series:
g(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos((2πn/L)(λx + ω)) + b_n sin((2πn/L)(λx + ω)) }.   (1.33)
You can then find the actual Fourier coefficients of g(x), by expanding the sines
and cosines using trigonometric identities. This will give ãn and b̃n as linear
combinations of an and bn (problem set 1).
1.5 Complex form

A more compact way of writing Fourier series uses complex exponentials. The starting point is Euler's formula,

e^{iθ} = cos θ + i sin θ,   (1.34)

or, equivalently,

cos θ = (e^{iθ} + e^{−iθ})/2,      sin θ = (e^{iθ} − e^{−iθ})/(2i).   (1.35)
The basic idea, therefore, is to construct a Fourier series based instead on ei2πnx/L ,
which will be periodic, with period L. As we will see, this yields a very convenient and
elegant formulation. It also allows us to naturally include complex functions f (x) (so
far, we have been assuming that f (x) was real). To this end, notice first that e−i2πnx/L is
not the same as ei2πnx/L , so we will need to consider both n > 0 and n < 0 (that is, we
need n ∈ Z, the set of all integers). The basic idea, therefore, is to expand f (x) as
f(x) = Σ_{n∈Z} c_n e^{i2πnx/L},   (1.36)
where cn are a new set of coefficients, which are usually complex. The notation n ∈ Z
means n = 0, ±1, ±2, ±3, . . ..
To find the cn , we need to establish orthogonality relations for the complex expo-
nentials, very much like Eqs. (1.10)-(1.12). In this case it turns out it is even easier to
prove them. Since eiπn = e−iπn for any integer n, it follows that
∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = (L/(i2π(n − m))) e^{i2π(n−m)x/L} |_{-L/2}^{L/2} = (L/(i2π(n − m))) (e^{iπ(n−m)} − e^{−iπ(n−m)}) = 0,
But, as before, this only holds for n ≠ m. Otherwise, one simply has

∫_{-L/2}^{L/2} dx = L.

Combining the two cases, we can summarize the result as

(1/L) ∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = δ_{n,m}.   (1.37)
This result is pretty cool. If you open up these complex exponentials, you may check
that it contains the same amount of information as Eqs. (1.10)-(1.12), but in a much
more compact and elegant way.
With this, we can now find the coefficients cn in Eq. (1.36). We multiply both sides
by e^{−i2πmx/L} (with a minus sign!) and integrate over one period. We then get, using (1.37),
∫_{-L/2}^{L/2} f(x) e^{−i2πmx/L} dx = Σ_{n∈Z} c_n ∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = Σ_{n∈Z} c_n L δ_{n,m} = L c_m.

Whence,

c_n = (1/L) ∫_{-L/2}^{L/2} f(x) e^{−i2πnx/L} dx.
The coefficients c_n are found using the orthogonality of the complex exponentials,

(1/L) ∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = δ_{n,m},   (1.39)

which leads to

c_n = (1/L) ∫_{-L/2}^{L/2} f(x) e^{−i2πnx/L} dx.   (1.40)

Notice that the exponential in c_n has the opposite sign as the one in f(x).
As with real series, it is common to choose L = 2π, leading to
f(x) = Σ_{n∈Z} c_n e^{inx},   (1.41)

c_n = (1/2π) ∫_{-π}^{π} f(x) e^{−inx} dx.   (1.42)
Next, let us try to connect the cn with our previous coefficients an , bn in Eq. (1.13).
This is very easy. We simply use Eq. (1.34) in Eq. (1.42), which leads to
c_n = (1/L) ∫_{-L/2}^{L/2} f(x) [cos(2πnx/L) − i sin(2πnx/L)] dx.

Comparing with Eqs. (1.14) and (1.15), this is simply

c_n = (1/2)(a_n − i b_n).   (1.43)
This result also holds for n = 0 and n < 0. For n = 0 we get c0 = a0 /2, since b0 = 0.
For n < 0 the situation is a bit weird because we normally don’t define an and bn . But
from (1.14) it is clear that a−n = an and from (1.15), b−n = −bn . Thus, when n < 0,
c_n = (1/2)(a_{−n} + i b_{−n}).
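As a consistency check of Eq. (1.43), one can compute c_n, a_n and b_n numerically for some function and compare. A minimal Python sketch of mine, using the square wave (1.22) and assuming scipy:

import numpy as np
from scipy.integrate import quad

f = lambda x: np.sign(x)   # square wave (1.22) on [-pi, pi]
n = 3

re = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0]
im = quad(lambda x: -f(x) * np.sin(n * x), -np.pi, np.pi)[0]
cn = (re + 1j * im) / (2 * np.pi)            # Eq. (1.42): c_n = (1/2pi) int f e^{-inx} dx

an = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0] / np.pi
bn = quad(lambda x: f(x) * np.sin(n * x), -np.pi, np.pi)[0] / np.pi
print(cn, (an - 1j * bn) / 2)                # the two numbers agree, as in Eq. (1.43)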
The real and complex versions of Fourier’s series are entirely equivalent ways of
representing a periodic function. The set of functions
1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), . . .
forms a basis for the space of functions with period 2π. This is the basis we use to
expand f (x) in the real series. In the complex case, we use instead
1, eix , e−ix , e2ix , e−2ix , e3ix , e−3ix , . . . ,
which corresponds to a different basis. But the two are equivalent because each pair
(cos nx, sin nx) is a linear combination of (einx , e−inx ).
Eqs. (1.41) and (1.42) hold for complex functions f (x). That is, f : R → C. What
happens if f (x) is real? Taking the complex conjugate of Eq. (1.42) we find that, since
f (x)∗ = f (x),
c*_n = (1/L) ∫_{-L/2}^{L/2} f(x) e^{i2πnx/L} dx.
But looking at this for a second, we realize that it is nothing but c_{−n}. Thus, we conclude that if f: R → R, one must have

c_{−n} = c*_n.
This is a special symmetry of the cn . For general complex functions, cn and c−n are
generally unrelated complex numbers. But if f (x) is real, the negative c’s are related to
the conjugate of the positive c’s.
An even stronger symmetry can be uncovered when f (x) is real and has definite
parity. If f (x) is even, we must have a cosine series (an , 0 and bn = 0), so that
Eq. (1.43) shows cn must be real. Conversely, if f (x) is odd, cn will be purely imaginary.
Summarizing:
• f (x) even: cn = an /2 is real.
• f (x) odd: cn = −ibn /2 is purely imaginary.
Example: Triangle wave
Consider the function f (x) = |x|, but defined to be piecewise periodic in [−π, π].
The resulting function is called the Triangle wave. This function is even, so that its
real series should have only cosines (b_n = 0). As a consequence, from Eq. (1.43) we
should have cn = an /2; i.e., real. One could, of course, simply compute the coefficients
an from (1.14), and then reconstruct cn = an /2. But, for practice, let us compute it
directly from Eq. (1.42):
c_n = (1/2π) ∫_{-π}^{π} |x| e^{−inx} dx
= (1/2π) ( ∫_{0}^{π} x e^{−inx} dx + ∫_{-π}^{0} (−x) e^{−inx} dx ).
Carrying out the two integrals by parts and combining them yields

c_0 = π/2,      c_n = ((−1)^n − 1)/(πn²) = { −2/(πn²),  n odd;   0,  n even, n ≠ 0 }.   (1.45)

The complex Fourier series for the triangle wave therefore reads
f(x) = π/2 − (2/π) ( e^{ix} + e^{−ix} + (e^{3ix} + e^{−3ix})/9 + . . . ).   (1.46)
We could rewrite everything in terms of cosines. But I wanted to leave it like this
to emphasize that, even though we are summing complex exponentials, the result is
nonetheless real since each term is accompanied by its complex conjugate.
1.6 Parseval's identity

Let f(x) = Σ_{n∈Z} c_n e^{inx} and g(x) = Σ_{n∈Z} d_n e^{inx} be two functions of period 2π. Using the orthogonality relation (1.39), one readily finds

(1/2π) ∫_{-π}^{π} f(x)* g(x) dx = Σ_{n∈Z} c*_n d_n.   (1.47)

This property is pretty cool. It is exactly like the inner product you learn for vectors. When we expand a function in a Fourier series, like f(x) = Σ_n c_n e^{inx}, it is exactly like expanding a vector in a basis; that is to say, the e^{inx} form a basis for the space
of periodic functions. Eq. (1.47) is thus nothing but the inner product of the two
functions, which we write as the sum of the products of the components in the basis.
For the space of functions, the inner product is obtained as an integral (together with
the factor of 1/2π). In particular, if g = f Eq. (1.47) yields
(1/2π) ∫_{-π}^{π} |f(x)|² dx = Σ_{n∈Z} |c_n|²,   (1.48)
which is known as Parseval’s identity. This is just like expressing the absolute value
of a vector in terms of the squares of the coefficients in a basis. It is pretty cool that
this also holds for this (infinite dimensional) space of functions.
Parseval’s identity will be useful when we discuss applications of Fourier series.
The integral on the left-hand side of (1.48) is usually associated with some type of
energy input. The identity then decomposes this energy into the contributions from
each Fourier mode.
To give an example, consider the Fourier coefficients of f (x) = |x| in Eq. (1.45).
The left-hand side of (1.48) reads
(1/2π) ∫_{-π}^{π} |x|² dx = π²/3.
Since the summand is even in n, we can write the sum only over positive values and
double the result:
Σ_{n∈Z} |c_n|² = π²/4 + (8/π²) Σ_{n=1,3,5,...} 1/n⁴.
Using Eq. (1.48) and isolating the remaining sum over n, we then find
Σ_{n=1,3,5,...} 1/n⁴ = π⁴/96,   (1.49)
which is a funny looking sum over n, with an even funnier looking result.
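Results like (1.49) are also easy to check numerically; a two-line Python sketch (mine):

import numpy as np

odd = np.arange(1, 200001, 2)
print(np.sum(1.0 / odd**4), np.pi**4 / 96)   # both give ~1.014678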
We can also write Parseval’s identity in terms of an and bn , using (1.43). We split
the sum in 3 parts: n > 0, n = 0 and n < 0. For n > 0 we simply replace cn = 12 (an −ibn ).
For n = 0 we use c_0 = a_0/2. And for n < 0 we use c_n = (1/2)(a_{−n} + i b_{−n}). Thus
(1/2π) ∫_{-π}^{π} |f(x)|² dx = |a_0|²/4 + Σ_{n=1}^{∞} (1/4)(|a_n|² + |b_n|²) + Σ_{n=−∞}^{−1} (1/4)(|a_{−n}|² + |b_{−n}|²)

= |a_0|²/4 + (1/2) Σ_{n=1}^{∞} (|a_n|² + |b_n|²),
which is the sine-cosine analog of (1.48). All results above were derived for period 2π.
But they hold for arbitrary L. All we need to do is replace 2π → L.
Parseval’s identity
(1/L) ∫_{-L/2}^{L/2} |f(x)|² dx = Σ_{n∈Z} |c_n|² = |a_0|²/4 + (1/2) Σ_{n=1}^{∞} (|a_n|² + |b_n|²).   (1.51)
As another application, consider the odd function f(x) = x(π² − x²), taken to be piecewise periodic in [−π, π]; integrating by parts one finds b_n = 12(−1)^{n+1}/n³ (and a_n = 0). A boring calculation reveals that the integral on the left reads 8π⁶/105. Thus, the sum over n on the right reads

Σ_{n=1}^{∞} 1/n⁶ = π⁶/945.   (1.52)
This is a special case of the Riemann Zeta function
ζ(s) = Σ_{n=1}^{∞} 1/n^s.   (1.53)
This function usually cannot be represented in terms of ordinary constants. But for
some special values, like 2, 4, 6, . . ., it can.
1.7 Dirac delta and Heaviside functions

Consider the boxcar (or window) function

f(x) = { 1/a,   −a/2 ≤ x ≤ a/2;    0,   otherwise, }   (1.54)

where a > 0 is a parameter. I chose it in this way so that it has unit area:
∫_{-∞}^{∞} f(x) dx = 1.   (1.55)
The integral doesn’t have to be from −∞ to ∞. It can be over any interval which
encompasses [−a/2, a/2]. Written in this way, we therefore see that, as a gets smaller,
the function becomes thinner, but also taller. And it does so in a very precise way, such
that the area below the curve is always 1.
The reason why the boxcar can be viewed as a window is the following. Let g(x)
be another arbitrary function and consider the integral
∫_{-∞}^{∞} f(x) g(x) dx = (1/a) ∫_{-a/2}^{a/2} g(x) dx.   (1.56)
It picks out only the parts of g(x) that are within [−a/2, a/2]. In fact, the result can be
interpreted as the average of g(x) over this interval. It is a window because it selects
only one part of the function for us to see.
We can now think about what happens in the limit a → 0. This limit is somewhat
funny. The function becomes infinitely thin, but also infinitely tall. And the area below
the curve continues to be 1. We call this the Dirac delta function:
δ(x) = lim_{a→0} f(x).   (1.57)
The δ function takes the idea of a window to the limit: it throws away everything about g(x), and only picks the value of g at x = 0,

∫_{-∞}^{∞} δ(x) g(x) dx = g(0).   (1.58)

One can also place the window at a
different point y. I will leave it for you to convince yourself that this is accomplished
by δ(x − y). That is,
∫_{-∞}^{∞} δ(x − y) g(x) dx = g(y).   (1.59)
Now let’s compute the Fourier series of the Dirac delta function. We assume that
the function is piecewise periodic, with period 2π. In this case, we don’t get one δ
function, but a bunch of δ’s. This is called a Dirac comb (Fig. 1.14). We focus on
the complex series (1.41). The coefficients cn are given by Eq. (1.42). But because
of (1.59), it then follows that
c_n = (1/2π) ∫_{-π}^{π} δ(x) e^{−inx} dx = 1/2π.
Figure 1.14: The Dirac comb, obtained by making the Dirac delta function piecewise periodic,
with period 2π.
We therefore reach a quite curious result: the Fourier series for the Dirac comb is
independent of n. One simply has
δ(x) = (1/2π) Σ_{n∈Z} e^{inx},   (1.60)
for period 2π. This result will turn out to be very very useful later on. You may
naturally object that this series does not converge (we will talk more about convergence
of Fourier series in Sec. 1.8). This is one of those pathologies of distributions. But this
can be regularized by computing the Fourier series of the boxcar instead. You will do
this in problem set 1. What you will find is that the series always converges, for any
finite a. Since the δ-function is just a boxcar, with a tiny a, we can convince ourselves
that the series converges.
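To see this regularization at work, truncate the sum in (1.60) symmetrically at |n| ≤ N. The partial sums develop peaks at multiples of 2π that grow like (2N + 1)/2π, while oscillating around small values elsewhere. A quick Python sketch (mine, assuming numpy):

import numpy as np

def comb_partial(x, N):
    # (1/2pi) * sum_{n=-N}^{N} e^{inx}
    n = np.arange(-N, N + 1)
    return np.exp(1j * np.outer(x, n)).sum(axis=1).real / (2 * np.pi)

x = np.array([0.0, 0.5, np.pi])
for N in (10, 100, 1000):
    print(N, comb_partial(x, N))   # the value at x = 0 grows like (2N+1)/(2pi)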
Another very important function for this course will be the Heaviside θ function
defined as

θ(x) = { 1,  x > 0;    1/2,  x = 0;    0,  x < 0. }   (1.61)
The value at x = 0 seldom matters. The Heaviside function is therefore a step, being
zero for x < 0, and 1 for x > 0. We can also shift the step. For instance θ(x − b) will
jump from 0 to 1 when x = b. The Heaviside function can also be used to construct
the boxcar (1.54) in a more compact way. We need two steps, one up, one down. A
general boxcar between the interval [a, b] will therefore read
Boxcar_{a,b}(x) = θ(x − a) θ(b − x),   (1.62)
so that the boxcar used in Eq. (1.54) would be

f(x) = (1/a) θ(x + a/2) θ(a/2 − x).
Funny enough, because of the special structure of θ(x), Eq. (1.62) is not the only way
to write the boxcar. A completely equivalent expression would be
Boxcar_{a,b}(x) = θ(x − a) − θ(x − b).
The first θ ensures that the function is only non-zero for x > a. But then we want to cut
it out eventually at b, so we subtract another θ.
What is the derivative of the Heaviside function? Well, for x < 0 and x > 0 the
function is flat, so θ′(x) = 0. The tricky part is at x = 0. At this point the function jumps
instantaneously from 0 to 1. The slope of the curve should therefore be infinitely large.
It is easier to think that θ(x) changes quickly, but continuously, at x = 0, as represented
pictorially in Fig. 1.15(a). The typical slope will then look like that in Fig. 1.15(b). This
seems to imply that θ′(x) will be proportional to the Dirac delta function, θ′(x) = αδ(x),
for some constant α. To determine α we recall that δ(x) has unit area, so

1 = ∫_{-ε}^{ε} δ(x) dx = (1/α) ∫_{-ε}^{ε} θ′(x) dx = (1/α) [θ(ε) − θ(−ε)] = (1/α)(1 − 0) = 1/α.

Thus α = 1, and we conclude that

θ′(x) = δ(x).   (1.63)
In what follows we will often deal with functions having jumps, so let us fix some terminology. A function f is continuous at a point x if the lateral limits f(x + 0) := lim_{ε→0⁺} f(x + ε) and f(x − 0) := lim_{ε→0⁻} f(x + ε) are the same. Here ε → 0⁻ means tending to zero from the left (ε < 0) and 0⁺ means tending to zero from the right (ε > 0). Conversely, a function is piecewise continuous if it is continuous in most of the interval [a, b], except at a finite number of points where the lateral limits f(x + 0) and f(x − 0) remain finite. So, in a nutshell, piecewise continuous means the function never diverges and the number of discontinuities is not infinite.
Let us suppose that f (x) is continuous, except at one point x0 , where it jumps from
f (x − 0) to f (x + 0). We can represent this using the Heaviside function. All we need
is to do is multiply f (x) by a convenient “1”. In this case, 1 = θ(x − x0 ) + θ(x0 − x).
We then get
f(x) = f(x) θ(x₀ − x) + f(x) θ(x − x₀),   (1.64)
For x < x0 and x > x0 the function f (x) is now continuous and silky smooth. Differen-
tiating with respect to x and using (1.63), we then get

f′(x) = f′(x) θ(x₀ − x) + f′(x) θ(x − x₀) − f(x) δ(x − x₀) + f(x) δ(x − x₀),

where I also used the fact that δ(x₀ − x) = δ(x − x₀). The last two terms are only non-
zero when x → x0 . But in each term, it will tend to zero from a different side. Thus, in
the third term we can replace f (x) with f (x0 − 0) and in the fourth we can replace f (x)
by f(x₀ + 0). We then arrive at

f′(x) = f′(x) θ(x₀ − x) + f′(x) θ(x − x₀) + Δf(x₀) δ(x − x₀),   (1.65)

where Δf(x₀) = f(x₀ + 0) − f(x₀ − 0) is the size of the jump.
Figure 1.15: Left: the Heaviside function, imagined as a sharp jump at x = 0. Right: θ0 (x),
which is zero almost everywhere, except at x = 0, where it is very large.
The first two terms are the usual derivatives of f (x), outside the jump point, while the
last represents the infinite contribution from the jump.
The generalization of this to an arbitrary number of jumps is straightforward. We
assume the jumps occur at J points {x j }. Then
f′(x) = Σ_{j=1}^{J+1} f′(x) θ(x − x_{j−1}) θ(x_j − x) + Σ_{j=1}^{J} Δf(x_j) δ(x − x_j).   (1.66)
In addition to the J points x j , I also defined x0 , which is the beginning of the interval
we are interested in studying, and x J+1 , which is the end. They could be, for instance,
±π if the function is periodic or ±∞ if it is not. We will use this formula in the next
section to prove the convergence of Fourier series.
1.8 Convergence of Fourier series

There is one test of uniform convergence, which is the bread and butter of the business, and is worth knowing. It is called the Weierstrass M-test. Let Σ_{n=1}^{∞} u_n(x) denote a series of functions, defined in some interval I ⊂ R. Suppose that there exist constants M_n, such that

|u_n(x)| < M_n,   ∀x ∈ I,   (1.67)

and such that Σ_{n=1}^{∞} M_n converges. Then Σ_{n=1}^{∞} u_n(x) converges uniformly. The idea, therefore, is to majorize the series of functions by a numerical series (no x involved), which itself converges.
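For instance, the series Σ_{n=1}^{∞} cos(nx)/n² converges uniformly on the whole real line: |cos(nx)/n²| ≤ 1/n² =: M_n for every x, and Σ 1/n² converges.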
The nice thing about uniformly convergent series is that they comply with our intuition. In particular, the operations we would naively want to do are all perfectly allowed. Let f(x) = Σ_{n=1}^{∞} u_n(x) denote a uniformly convergent series. Then

• If the u_n(x) are all continuous, so is f(x).

• If the u_n(x) are integrable, so is f(x). Moreover, we can permute sums and integrals:

∫ ( Σ_{n=1}^{∞} u_n(x) ) dx = Σ_{n=1}^{∞} ∫ u_n(x) dx.

• If the u_n(x) are differentiable, then so is f(x). Moreover, we can differentiate term by term:

(d/dx) ( Σ_{n=1}^{∞} u_n(x) ) = Σ_{n=1}^{∞} du_n/dx.
You see? Uniformly convergent series of functions are awesome! This is the kind of series you want to work with. Of course, you will not be surprised to hear that Fourier series of sufficiently well-behaved (continuous, piecewise smooth) periodic functions converge uniformly. This is Fourier's theorem, which we are now going to prove.
We are going to go through the proof of Fourier’s theorem for the case of contin-
uous functions. The case of general, piecewise continuous functions, can be found in
any of the textbooks in the bibliography (e.g. Djairo, chapter 3). Although not the most
general, what I like about the proof below is that it will also give us insight into how
fast the Fourier series converges. The main ingredients we will need are the derivatives
of f (x). We begin by noticing that, even if f (x) is continuous, f 0 (x) may be piecewise
continuous. An example is shown in Fig. 1.16. A function which is continuous and
whose derivatives up to order k are also continuous is said to belong to the class C^k. The function in Fig. 1.16 would thus be of class C^0, since none of its derivatives are continuous.
When the function is of class C^k, f^(k+1) will no longer be continuous, but piecewise continuous. Consequently, f^(k+2) will involve δ-functions, like Eq. (1.66). And f^(k+3)
Figure 1.16: A continuous function f(x) whose derivative df/dx is only piecewise continuous.
will involve derivatives of δ functions, at which point things start to become messy.
We begin by considering the Fourier coefficients cn [Eq. (1.42)] of a function of
class C k . Integrating by parts, using u = f (x) and dv = e−inx dx, we get
c_n = (1/2π) { f(x) e^{−inx}/(−in) |_{-π}^{π} + (1/(in)) ∫_{-π}^{π} f′(x) e^{−inx} dx }.
The first term vanishes because e−inπ = einπ and f (−π) = f (π) (since f (x) is periodic).
We are thus left with
c_n = (1/(2πin)) ∫_{-π}^{π} f′(x) e^{−inx} dx.
Notice what happened. Because the cross term vanishes, integration by parts simply amounts to replacing f(x) → f′(x) and dividing by in. The function f′(x) is still
periodic, so we can now repeat the procedure. In fact, we can repeat it k + 1 times,
leading to
c_n = (1/(2π(in)^{k+1})) ∫_{-π}^{π} f^(k+1)(x) e^{−inx} dx.   (1.68)
We can do this k+1 times because f (k+1) is piecewise continuous, but finite; the integral
is thus still bounded.
We keep going, however, and integrate by parts once again. Since f (k+1) is piece-
wise continuous, it will have jumps at a certain set of points {x j }. Using (1.66) we then
get
c_n = (1/(2π(in)^{k+2})) Σ_{j=1}^{J+1} ∫_{x_{j−1}}^{x_j} f^(k+2)(x) e^{−inx} dx + (1/(2π(in)^{k+2})) Σ_{j=1}^{J} Δf^(k+1)(x_j) e^{−inx_j}.   (1.69)
Since we broke up f (k+2) into the intervals [x j−1 , x j ], the integrals above are all bounded.
And the same is true of the second term.
Thus, we see that
|c_n| ≤ M/n^{k+2},   (1.70)
where M is a finite number. And, what is most important, M is actually independent of
n; the only n-dependence comes from the factor of 1/nk+2 . You may protest and argue
that there is still an n in the e−inx term of the integral. But that can be eliminated further
using the Cauchy-Schwarz inequality
| ∫_{x_{j−1}}^{x_j} f^(k+2)(x) e^{−inx} dx |² ≤ ∫_{x_{j−1}}^{x_j} |f^(k+2)(x)|² dx ∫_{x_{j−1}}^{x_j} |e^{−inx}|² dx = (x_j − x_{j−1}) ∫_{x_{j−1}}^{x_j} |f^(k+2)(x)|² dx.
Thus, we get in the end a bound of the form (1.70), with M being a constant that
depends only on f (x).
To finish off with style, we now use the Weierstrass M-test. We consider the Fourier series

f(x) = Σ_{n∈Z} c_n e^{inx},

and notice that the summands c_n e^{inx} are bounded by

|c_n e^{inx}| ≤ M/n^{k+2} := M_n.
According to the test, all that is left is to ask whether the sum Σ_n M_n converges. Well, it is known that Σ_{n=1}^{∞} n^{−α} converges for all α > 1. Thus, Σ_n M_n will converge provided k + 2 > 1, which is certainly the case for any k ≥ 0. That is, the series is guaranteed to converge provided the function is at least of class C^0; i.e., continuous. Et voilà!
The convergence of the series Σ_{n=1}^{∞} n^{−α} is faster for large α. If α = 100, for instance, the series should converge super fast. The above arguments therefore show that the speed of convergence of the Fourier series is associated with how smooth the function is. A function that is of class C^k, with very large k, will be very smooth and its series will thus converge faster than that of a function of class C^0, which has kinks and stuff.
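This connection between smoothness and the decay of the Fourier coefficients can be seen numerically with the examples of this chapter: for the square wave (which has jumps) |c_n| decays like 1/n, while for the triangle wave (continuous, with kinks only) it decays like 1/n². A small Python sketch of mine (assuming numpy and scipy):

import numpy as np
from scipy.integrate import quad

def cn(f, n):
    re = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi, limit=200)[0]
    im = quad(lambda x: -f(x) * np.sin(n * x), -np.pi, np.pi, limit=200)[0]
    return (re + 1j * im) / (2 * np.pi)

square   = lambda x: np.sign(x)   # jump discontinuities
triangle = lambda x: np.abs(x)    # continuous, kinks only
for n in (1, 3, 9, 27):
    print(n, abs(cn(square, n)) * n, abs(cn(triangle, n)) * n**2)
    # n*|c_n| stays roughly constant for the square wave; n^2*|c_n| for the triangle wave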
1.9 Integrals and derivatives of Fourier series

Suppose f(x) has the Fourier series (1.41) and differentiate it term by term:

f′(x) = Σ_{n∈Z} (in) c_n e^{inx}.

The Fourier coefficients of f′(x) are thus in c_n. This can be used as a trick to obtain one set of Fourier coefficients from another.
Similarly, we can also integrate term by term. The integral of a periodic function,
however, is not necessarily periodic. Thus, consider instead the function
F(x) = ∫_{0}^{x} [ f(x′) − c_0 ] dx′.   (1.72)
By subtracting c0 = a0 /2, which is the average of the function over the interval, we
make F(x) periodic. I leave it for you to check that this is indeed true.
What are the Fourier coefficients Cn of F(x)? The Fourier series of f (x) − c0 is just
like that of f (x), except that the sum does not contain n = 0. Using this in (1.72) then
leads to
F(x) = Σ_{n≠0} c_n ∫_{0}^{x} e^{inx′} dx′ = Σ_{n≠0} c_n (e^{inx} − 1)/(in).
We assume f (x) is real, so that c−n = c∗n . This result already has the form of a Fourier
series. The coefficients C_n, with n ≠ 0, are simply the coefficients that multiply e^{inx}; that is, C_n = c_n/(in). The only special case is for n = 0. The coefficient C_0 is the constant part of F(x). It will therefore correspond to the entire sum in the last term above:

C_0 = − Σ_{n≠0} c_n/(in).
Let’s summarize these results in a pretty big box:
Let f (x) be a periodic function with Fourier coefficients cn [Eq. (1.41)]. Then
• The Fourier coefficients of f′(x) are in c_n;

• The Fourier coefficients of F(x) = ∫_{0}^{x} [f(x′) − c_0] dx′ are c_n/(in), for n ≠ 0, and i Σ_{n≠0} c_n/n, for n = 0.
The intuition should thus be clear: differentiate and you get in. Integrate and
you get 1/in (except for the annoying term with n = 0).
The above results also resonate well with the calculations of the previous section,
on the speed of convergence of a Fourier series. Integration makes functions smoother
and therefore converge faster. This is evidenced by the fact that integration multiplies
the Fourier coefficients by 1/n. Derivatives, on the other hand, make things more
irregular and the corresponding series converges more slowly.
arbitrary real number. As a side effect of that calculation, I also asked you to show that
∞
1 1 X 2x
cot(πx) − = . (1.73)
πx π n=1 x2 − n2
Let us assume x ∈ [0, 1]. Since the series converges uniformly, we can integrate each
side, term by term. One may readily check that
∫ cot(πx) dx = (1/π) ln sin(πx),
Whence,
∫_{0}^{x} [ cot(πx′) − 1/(πx′) ] dx′ = (1/π) ln( sin(πx′)/(πx′) ) |_{0}^{x} = (1/π) ln( sin(πx)/(πx) ),
where I also used the fact that lim_{x→0} sin(πx)/(πx) = 1.
On the other hand, to compute the integral of the right-hand side of (1.73), we
change variables to y = x′² − n², leading to

∫_{0}^{x} (2x′/(x′² − n²)) dx′ = ln(x′² − n²) |_{0}^{x} = ln(x² − n²) − ln(−n²) = ln(1 − x²/n²).
Finally, we get rid of the logarithm by exponentiating both sides. But to do that, we
must first change the sum on the right-hand side into a product using ln(a) + ln(b) =
ln(ab). That is, we rewrite this as
"Y∞ #
sin(πx)
ln = ln (1 − x /n ) .
2 2
πx n=1
Exponentiating on both sides finally yields the infinite product expansion of the sine
function
sin(πx)/(πx) = Π_{n=1}^{∞} (1 − x²/n²)   (1.74)
= (1 − x²)(1 − x²/4)(1 − x²/9)(1 − x²/16) · · ·
This is pretty fun. We are used to expressing sines as a Taylor series expansion. But it
can also be expressed as a product.
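A quick numerical check of the product formula (a Python sketch of mine; the product converges slowly, so many factors are kept):

import numpy as np

def sine_product(x, N):
    n = np.arange(1, N + 1)
    return np.prod(1 - x**2 / n**2)

x = 0.3
print(np.sin(np.pi * x) / (np.pi * x), sine_product(x, 100000))   # the two agree to ~6 digits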
One can also derive a similar formula for the cosine. Of course, cos(πx) = sin(πx +
π/2), so we could simply shift x in the product above. But the resulting formula is not
so fun. A nicer form can be found starting with the identity sin(2θ) = 2 sin θ cos θ and
writing
cos(πx) = sin(2πx)/(2 sin(πx)).
Using Eq. (1.74) for both the numerator and denominator we then find
cos(πx) = [ Π_{n=1}^{∞} (1 − (2x)²/n²) ] / [ Π_{n=1}^{∞} (1 − x²/n²) ].   (1.75)
We are almost done. All we need to do is realize that the two products are actually
related, so some terms will cancel out. Opening up the first few terms in the product
on the numerator, we find
Π_{n=1}^{∞} (1 − (2x)²/n²) = (1 − 4x²)(1 − x²)(1 − 4x²/9)(1 − x²/4)(1 − 4x²/25)(1 − x²/9) · · · .
Hey look! We see in every other term, exactly the product (1 − x2 /n2 ). The other terms
(those that have 4x2 in them) are like 1 − 4x2 /(2n − 1)2 , for n = 1, 2, 3, . . .. Thus, we
conclude that
Π_{n=1}^{∞} (1 − 4x²/n²) = Π_{n=1}^{∞} [ (1 − x²/n²)(1 − 4x²/(2n − 1)²) ].
The first product will cancel out the denominator in Eq. (1.75), leaving us with
cos(πx) = Π_{n=1}^{∞} ( 1 − x²/(n − 1/2)² ),
which is pretty neat. We now summarize the results and conclude the chapter.
Chapter 2
2.1 Overview
Most laws of physics are expressed in the form of differential equations. A differen-
tial equation is any type of equation involving derivatives. For instance, let N(t) denote
the number of atoms present in a radioactive sample. At each step ∆t, the number of
atoms that decay must be proportional to the number of atoms present. So the rate of
change dN/dt must depend on N. Something like
dN/dt = −λN,   (2.1)
where λ (units of 1/second) is the decay rate; λ∆t represents the probability that an atom
decays in a small time interval ∆t. This is called an ordinary differential equation
(ODE) because it involves only one variable (here t). This is to be contrasted with
partial differential equations (PDEs), which contain partial derivatives over several
variables. In this chapter we focus only on ODEs. PDEs will be studied next.
Another example of a differential equation is Newton's second law, m d²r/dt² = F.
It is a differential equation because it relates the second derivative of position with the
force acting on the particle. In this chapter we will be particularly interested in the
damped harmonic oscillator, which is described by the differential equation
m d²y/dt² = −α dy/dt − ky,   (2.2)
where y is the position, m is the mass, k is the spring constant and α is the damping
coefficient. This equation mixes y with its first and second derivatives. It is therefore
called a 2nd order ODE. Eq. (2.1), on the other hand, is a 1st order ODE, because it
contains only first derivatives.
We will also play with electric circuits. A simple series circuit containing a resis-
tor R, a capacitor C, an inductance L and a voltage source V(t), is described by the
differential equation
L d²Q/dt² + R dQ/dt + Q/C = V(t),   (2.3)
where Q(t) is the charge in the circuit. This equation is in general of 2nd order. But
sometimes it becomes of 1st order if an element is missing. For instance, if there is no
inductance present (the “RC circuit”) we get
R dQ/dt + Q/C = V(t).   (2.4)
Conversely, if the capacitance is infinite (the “LR circuit”), we get LQ̈ + RQ̇ = V(t).
This equation still looks 2nd order. But we can instead work with the current I(t) = Q̇,
in which case we get
L dI/dt + RI = V(t).   (2.5)
It is important to note how (2.4) and (2.5) are mathematically equivalent, even though
they describe different physical systems.
Another example of ODE we are going to study is Schrödinger’s equation, describ-
ing a one-dimensional quantum particle subject to a position dependent potential V(x).
The ODE reads
−(ℏ²/2m) d²ψ/dx² + V(x)ψ = Eψ,   (2.6)

where ℏ is Planck's constant, m is the mass and E is the energy. Unlike the previous
examples, here the independent variable is x, not t. In this chapter we will go back
and forth between x and t. The dependent variable, on the other hand, is ψ(x), which
is called the wavefunction. I will try to explain a bit better what it represents once we
play with some examples of (2.6).
Eqs. (2.1)-(2.6) are all examples of linear ODEs, because they depend linearly on
the dependent variable, N, or Q or I or ψ. Here are some examples of non-linear ODEs:
y′ + xy² = 1,      ẏ = cot(y),
y dy/dx = 1,      ẏ² = y.
Here I used y to denote the dependent variable. This is the notation we are going
to use whenever we talk about generic equations. In the formulas above I also used
different notations for derivatives: y0 or ẏ or dy/dx or dy/dt. I know it may seem a bit
messy, but all of these notations are used in physics, so we better get used to them. The
examples above represent non-linear ODEs, because they involve non-linear functions
of the dependent variable y and/or its derivatives. Non-linear equations are dramatically
more complicated to deal with than linear ones. But, lucky for us, the vast majority of
physical laws are linear.
Another ODE which will be quite important later on is Legendre’s equation,
which appears often in quantum mechanics and electromagnetism. It reads

(1 − x²) y″ − 2x y′ + n(n + 1) y = 0,   (2.7)

where n is an integer. This equation is linear because y, y′, y″ only appear linearly.
However, if we contrast it with (2.1) or (2.2), it certainly seems more complicated. The reason is that those examples had constant coefficients, whereas (2.7) does not.
For instance, the coefficient multiplying y00 in (2.7) is (1 − x2 ), which is a function of
the independent variable x. Conversely, in (2.2) the coefficient is just a constant, m.
ODEs with constant coefficients are much easier to deal with. But variable coefficients
are manageable as well, and we will work through some examples.
The entire picture of ODEs becomes a bit cleaner if we work with differential op-
erators. We call an object such as
L = Σ_{j=0}^{n} u_j(x) d^j/dx^j,   (2.8)
an n-th order, linear ordinary differential operator. Here is what each word means:
• This is an operator because it acts on functions to produce new functions.
• It is linear because derivatives are linear, so

L[a_1 y_1(x) + a_2 y_2(x)] = a_1 L(y_1) + a_2 L(y_2).
• The coefficients of L are the functions u j (x). In the particular case when the u j
are independent of x, we say this is a differential operator with constant coeffi-
cients.
The differential operator associated with the damped harmonic oscillator (2.2) reads
L = m d²/dt² + α d/dt + k,   (2.9)
Eq. (2.2) is then written as
Ly = 0. (2.10)
When the right-hand side is zero, we call the equation homogeneous. Conversely, an
inhomogeneous ODE has the form
Ly = f (x), (2.11)
or f (t), if you are using t as independent variable. Inhomogeneous equations are usually
associated to an external force. For instance, if we apply an external force F(t) to the
damped oscillator (2.2), we get instead

m d²y/dt² + α dy/dt + ky = F(t),   (2.12)

which is of the form (2.11). The same is also true for the RLC circuit ODE (2.3). In
this case the inhomogeneity is the external voltage V(t).
2.2 Separable equations
Before we enter into the more formal aspects on how to characterize the general
solutions of an ODE, let us practice with the simplest example possible. Consider a
differential equation of the form
dy/dx = f(x).
This equation is called separable because we can put all y’s on the left and all x’s on
the right,
dy = f (x)dx.
Integrating both sides we then find
y(x) = ∫ f(x) dx + c,   (2.13)
where c is a constant. We have just solved the differential equation. Curiously, you see
that whenever we integrate a function, we are actually solving a differential equation.
We could also have written the result as a definite integral. That is, we integrate dy =
f (x)dx from x0 to x, leading to
Zx
y(x) = y(x0 ) + f (x0 )dx0 . (2.14)
x0
As an example, consider the radioactive decay equation (2.1). It is separable, dN/N = −λ dt, so that ln N = −λt + const., or

N(t) = C e^{−λt},   (2.15)

where the constant C is fixed by an initial condition. For instance, if at t = 0 we had N(0) = N₀ then C = N₀ and thus the solution is finally

N(t) = N₀ e^{−λt},
which represents an exponential decay, from N0 at t = 0, toward 0 as t → ∞. It is
important to notice, however, that we did not have to specify N(t) exactly at time t = 0.
We could have specified N(t) at time t = 42. The general solution is still given by
Eq. (2.15).
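The same behaviour can be confirmed by integrating (2.1) numerically. A minimal sketch using scipy's general-purpose ODE solver (the numerical values of λ and N₀ below are just for illustration):

import numpy as np
from scipy.integrate import solve_ivp

lam, N0 = 0.5, 1000.0
sol = solve_ivp(lambda t, N: -lam * N,                  # dN/dt = -lambda*N, Eq. (2.1)
                t_span=(0, 10), y0=[N0],
                t_eval=np.linspace(0, 10, 5))
print(sol.y[0])                                          # numerical solution
print(N0 * np.exp(-lam * sol.t))                         # analytic solution N(t) = N0 e^{-lambda t}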
Here is another example: the ODE

y′ sin x = y ln y.

This is separable: dy/(y ln y) = dx/sin x. Integrating both sides (and calling the integration constant ln C) gives ln(ln y) = ln(tan(x/2)) + ln C, so that the solution is y = e^{C tan(x/2)}.
Please take notice of where the constant is. It is not true to say that the solution is
y = C̃ e^{tan(x/2)}, since this would be the same as y = e^{ln C̃ + tan(x/2)}. Where we put the
constant is very important.
We first consider the homogeneous equation L(y) = 0. A solution of this ODE is any
function y(x) satisfying Ly = 0. Since L is a linear operator, if y1 and y2 are solutions,
then any linear combination is also a solution:
L y_i = 0   →   L(a_1 y_1 + a_2 y_2) = 0.
This is very important since it allows us to build general solutions as linear combina-
tions of specific solutions. A set of solutions y1 , . . . , yk is said to be linearly indepen-
dent if, for all x, the only set of numbers satisfying
c_1 y_1 + · · · + c_k y_k = 0,

is the trivial set c_1 = · · · = c_k = 0. As we will prove below, homogeneous ODEs of order
n have n linearly independent solutions.
Next consider the inhomogeneous equation Ly(x) = f(x), and let y^i_1(x) and y^i_2(x) be any two of its solutions. Their difference satisfies L(y^i_1 − y^i_2) = f − f = 0, so it must be a linear combination of the homogeneous solutions,

y^i_1(x) − y^i_2(x) = Σ_{j=1}^{n} c_j y_j(x),

where c_j are coefficients (that can be adjusted, for instance, from the initial conditions). This must be true for any two solutions of Ly(x) = f(x). So suppose you found one particular solution y^i_2. This result then guarantees that any other solution can always be written as the particular solution y^i_2, plus a linear combination of the solutions to the homogeneous equation.
Linear inhomogeneous ODEs
Let y p (x) denote any particular solution of the inhomogeneous equation Ly(x) =
f (x). Then the most general solution will be of the form
y(x) = y_p(x) + Σ_{j=1}^{n} c_j y_j(x),   (2.18)
where c j are coefficients and y j are the solutions of the homogeneous equation
Ly j = 0.
This result is extremely powerful. And it is really cool how it follows from such a
simple reasoning. Notice also how, in order to solve Ly = f , one must first know the n
linearly independent solutions of Ly = 0.
Example: consider the ODE y″ + 5y′ + 4y = 2. We already saw that e^{−x} and e^{−4x} are solutions of the homogeneous equation. So now we only need one particular solution of the inhomogeneous equation. A very simple choice is y_p = 1/2. Thus, the most general solution will have the form

y(x) = 1/2 + c_1 e^{−x} + c_2 e^{−4x}.
Notice that I called this "a particular solution" instead of "the particular solution". The reason is because any particular solution works. For instance, y = 1/2 + e^{−x} is also a particular solution, so we could have equally well have written the general solution as

y(x) = 1/2 + e^{−x} + c_1 e^{−x} + c_2 e^{−4x}.
But, as you can see, this is just an unnecessary complication: since the c_i are constants
anyway, we can just absorb the 2nd term in the 3rd and call it a new constant c̃1 .
Next, consider the ODE y″ + 5y′ + 4y = 2x. This has the same differential operator, but the inhomogeneous term is different. A particular solution, which one may verify, is y_p = (4x − 5)/8 (once again, we will learn how to derive these in due time, I promise!). Thus, the most general solution is

y(x) = (4x − 5)/8 + c_1 e^{−x} + c_2 e^{−4x}.
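These claims are easy to verify with a computer algebra system. A short sympy sketch (my own check, not part of the text):

import sympy as sp

x, c1, c2 = sp.symbols('x c1 c2')
y = (4 * x - 5) / 8 + c1 * sp.exp(-x) + c2 * sp.exp(-4 * x)
lhs = sp.diff(y, x, 2) + 5 * sp.diff(y, x) + 4 * y
print(sp.simplify(lhs))   # prints 2*x for any c1, c2, so y is indeed the general solution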
Consider now a set of functions y_1(x), . . . , y_m(x) (where m is not necessarily equal to n). We define the Wronskian matrix as the m × m matrix with
entries
Y(x) = [ y_1        y_2        · · ·   y_m
         y_1′       y_2′       · · ·   y_m′
         ⋮          ⋮          ⋱      ⋮
         y_1^(m−1)  y_2^(m−1)  · · ·   y_m^(m−1) ].   (2.19)
It turns out this matrix can be used to test if the functions y j are linearly independent
or not. The reason is that linear dependence implies there exists a non-trivial set {c_j} such that

c_1 y_1 + · · · + c_m y_m = 0.   (2.20)
Differentiating this once yields
c_1 y_1′ + · · · + c_m y_m′ = 0,

and twice,

c_1 y_1″ + · · · + c_m y_m″ = 0,
and so on for any order of the derivative. To connect these equalities with the matrix
Y(x), we now define a vector c = (c1 , . . . , cm ). We can then compact all equalities into
a single matrix-vector multiplication
Yc = 0.
Try out an example to make sure this makes sense. For instance, Eq. (2.20) is the first
line of Y(x)c, and so on.
From linear algebra we know that Yc = 0 admits a non-trivial solution c ≠ 0 if and only if the
matrix Y has zero determinant, |Y| = 0. Thus, the functions are linearly dependent if |Y| = 0.
The determinant of the Wronskian matrix is often called simply the Wronskian (which sounds like the
name of a character in a Tarantino movie):
W(x) = |Y(x)| (2.21)
This provides a quick test to see if a set of functions are linearly dependent or not.
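As an aside (my own illustration, not part of the text), this test is easy to automate, e.g. with Python's sympy:

import sympy as sp

x = sp.symbols('x')

def wronskian(funcs):
    # Wronskian matrix: row i holds the i-th derivatives of the given functions
    m = len(funcs)
    Y = sp.Matrix([[sp.diff(f, x, i) for f in funcs] for i in range(m)])
    return sp.simplify(Y.det())

print(wronskian([sp.exp(-x), sp.exp(-4*x)]))   # -3*exp(-5*x): nonzero, independent
print(wronskian([x, 2*x]))                     # 0: linearly dependent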
Given a set of functions y_1(x), ..., y_m(x), we define the Wronskian (or Wron-
ski determinant) as the determinant of the Wronskian matrix,
W(x) = det Y(x),    Y(x)_{ij} = y_j^{(i−1)}(x),   i, j = 1, ..., m.    (2.22)
If W(x) ≠ 0, the functions are linearly independent.
We use the Wronskian to show that an n-th order linear operator has n linearly inde-
pendent solutions. Let's do it for n = 2. The generalization to n > 2 is straightforward.
We can always parametrize a 2nd order linear ODE in the form
y00 + P(x)y0 + Q(x)y = 0, (2.23)
where P(x) and Q(x) are generic functions of x. Suppose we found three solutions, y1 ,
y2 and y3 . Construct the Wronskian
W(x) = det of the 3 × 3 matrix with rows (y_1, y_2, y_3), (y_1', y_2', y_3') and (y_1'', y_2'', y_3'').
Example: Consider the ODE y'' + 5y' + 4y = 2x + 2. As we saw in the previous section,
a particular solution for f_1 = 2x is y_{p1} = (4x − 5)/8 and a particular solution for f_2 = 2
is y_{p2} = 1/2. Thus a particular solution for f = 2x + 2 will be y_p = (4x − 5)/8 + 1/2.
Next, consider the inhomogeneous ODE, but with f(x) being a Dirac delta:
Ly = δ(x − x_0).    (2.24)
You can imagine this as an external perturbation that acts only at a single specific
point x_0. A particular solution of this equation is called the Green's function of L:
L_x G(x, x_0) = δ(x − x_0).    (2.25)
We write it as G(x, x_0) because it depends on two parameters, the variable x, and the
position of the drive x_0. Note, however, that L is a differential operator acting only on
x, not x_0. This is why I put the subscript in L_x, just to be clear.
Green’s functions are not necessarily easy to find. But once we find them, they are
incredibly useful since they work as building blocks to study other types of inhomo-
geneities. This is associated to the window property of the Dirac δ. Consider a general
function f (x) and write it as
f(x) = ∫_{−∞}^{∞} δ(x − x_0) f(x_0) dx_0.    (2.26)
Green’s functions are therefore building blocks. The choices for f (x) that may appear
on Ly = f (x) are endless. But if we solve it for just a single drive (the δ-drive), then
we can build up the solution for any other drive. Pretty powerful, eh?
Green’s functions are a big business in physics and we will talk more about them
as the course progresses. Here I just wanted to introduce them to you, so you could say
hello.
We now turn to first-order linear ODEs, which can always be written in the form
y' + P(x) y = Q(x),    (2.29)
where P(x) and Q(x) are arbitrary functions of x. The radioactive decay equation (2.1)
and the RC and LR circuits in Eqs. (2.4) and (2.5) are all of this form. But those
equations are actually simpler since they have constant coefficients, while (2.29) has
arbitrary coefficients.
From what we learned in Sec. 2.3, the general solution will have the form y = y p +
cyh , where y p is a particular solution of (2.29) and yh is the solution to the homogeneous
equation, y0 + Py = 0 (we only need one solution because the ODE is first order). The
homogeneous equation is easy to solve since it is separable (Sec. 2.2):
dy/y = −P(x) dx   →   ln y = −∫ P(x) dx + const.
It is convenient to define
I(x) = ∫ P(x) dx,    (2.30)
in terms of which the homogeneous solution reads y_h = e^{−I(x)}.
Next we turn to the inhomogeneous equation (2.29). We will actually find not only
the particular solution, but the general one. We do this using the method of integrating
factors. The idea is as follows. From (2.30) we have that I 0 (x) = P(x). We can then
use this to rewrite the left-hand side of (2.29) as
y' + Py = e^{−I} (d/dx)(y e^{I}).
Please take a second to check that this is indeed true. We call e^{I(x)} an integrating factor
because it transformed the differential operator L = d/dx + P into something like (d/dx)(...).
Eq. (2.29) can then be rewritten as
e^{−I} (d/dx)(y e^{I}) = Q   →   (d/dx)(y e^{I}) = Q e^{I}.    (2.32)
Now it is easy to integrate on both sides, leading to
y e^{I} = ∫ Q(x) e^{I(x)} dx + c.
Multiplying both sides by e^{−I(x)} then finally yields our general solution:
y(x) = c e^{−I(x)} + e^{−I(x)} ∫ Q(x) e^{I(x)} dx,    (2.33)
where I(x) = ∫ P(x) dx and c is a constant. Please be careful not to put the e^{−I(x)}
inside the integral. Alternatively, we can also integrate Eq. (2.32) as a definite
integral, which leads to
y(x) = y(x_0) e^{−I(x)+I(x_0)} + e^{−I(x)} ∫_{x_0}^{x} Q(x') e^{I(x')} dx'.    (2.34)
Example. Consider the ODE y' + 2y = e^{−x}. This is in the form (2.29), with P = 2
and Q = e^{−x}. The integrating factor (2.30) therefore reads I(x) = ∫ 2 dx = 2x and so
Eq. (2.33) becomes
y(x) = c e^{−2x} + e^{−2x} ∫ e^{−x} e^{2x} dx = c e^{−2x} + e^{−2x} e^{x} = c e^{−2x} + e^{−x}.
Example. Consider the ODE x²y' + 3xy = 1. Dividing both sides by x² reveals that
this has the form (2.29), with P(x) = 3/x and Q(x) = 1/x². The integrating factor (2.30)
is
I(x) = ∫ (3/x) dx = 3 ln x.
Thus e^{I} = x³ and e^{−I} = 1/x³. Eq. (2.33) then yields
y = c/x³ + (1/x³) ∫ (x³/x²) dx = c/x³ + (1/x³)(x²/2).
Thus, the general solution is
y = c/x³ + 1/(2x).
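An optional symbolic check of this last result (my own sketch with Python's sympy; not part of the text):

import sympy as sp

x = sp.symbols('x', positive=True)
c = sp.symbols('c')
y = sp.Function('y')

# sympy solves x^2 y' + 3 x y = 1; the answer should be equivalent to C1/x^3 + 1/(2x)
ode = sp.Eq(x**2*y(x).diff(x) + 3*x*y(x), 1)
print(sp.dsolve(ode))

# Direct substitution of the solution found above
ysol = c/x**3 + 1/(2*x)
print(sp.simplify(x**2*ysol.diff(x) + 3*x*ysol))   # prints 1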
A very common situation is when P(x) is independent of x. That is, when the ODE
has the form
ẏ + λy = Q(t), (2.35)
where λ is a constant. Here I changed from x to t because, as we will see below, most
applications involve time as the independent variable. The integrating factor in this
case is I(t) = ∫ λ dt = λt. Thus, the general solution (2.33) simplifies to
y(t) = c e^{−λt} + e^{−λt} ∫ Q(t) e^{λt} dt,    (2.36)
or, as a definite integral [Eq. (2.34)],
y(t) = y(t_0) e^{−λ(t−t_0)} + e^{−λt} ∫_{t_0}^{t} Q(t') e^{λt'} dt'.    (2.37)
In most cases of interest, one has λ > 0. The reason is because, as we can see in the
first term, this ensures that the dynamics is stable. If λ < 0 the first term would grow
unboundedly with time and eventually explode (kabuum).
An immediate application of this result is to the LR circuit described by Eq. (2.5)
(or, similarly, to the RC circuit). Dividing by L on both sides we get that the current
I(t) will evolve according to
İ + (R/L) I = V(t)/L.
This has the same form as (2.35), with λ = R/L and Q(t) = V(t)/L. Whence, the
general solution will be
I(t) = c e^{−λt} + (e^{−λt}/L) ∫ V(t) e^{λt} dt,    (2.38)
L
with λ = R/L. You can view the RL circuit as a kind of black box, that processes the
input V(t) into an output I(t), according to this expression. This is a fun example to
work with, because we can just go to the lab and apply all sorts of weird electric signals
V(t). Eq. (2.38) then specifies how the circuit will respond. We will discuss this kind
of game further in the next section.
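Here is a small numerical sketch of that game (my own, in Python; R, L and the square-wave voltage are arbitrary choices), which simply integrates İ + (R/L)I = V(t)/L:

import numpy as np
from scipy.integrate import solve_ivp

R, L = 1.0, 0.5                      # arbitrary circuit parameters
lam = R/L

def V(t):                            # a square-wave input voltage
    return np.where(np.sin(2*np.pi*t) >= 0, 1.0, -1.0)

def rhs(t, I):                       # dI/dt = -lam*I + V(t)/L
    return -lam*I + V(t)/L

t = np.linspace(0, 10, 2000)
sol = solve_ivp(rhs, (t[0], t[-1]), [0.0], t_eval=t, max_step=0.01)
I = sol.y[0]                         # output current; plot against V(t) to see the lag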
Green’s function
Referring to the constant-P case in Eq. (2.35), let us study the Green's function.
That is, we study the system response to a δ impulse at a specific time s:
ẏ + λy = δ(t − s).    (2.39)
It is helpful to keep a physical image of what is happening. The δ-function plays the
role of a kick. It is an infinitely strong, but infinitely short kick. The picture, therefore,
is that the system was doing whatever for t < s; we then kick it at s and examine how
it evolves when t > s.
In this case, it is more convenient to use the definite-integral solution (2.37), which
becomes
y(t) = y(t_0) e^{−λ(t−t_0)} + e^{−λt} ∫_{t_0}^{t} δ(t' − s) e^{λt'} dt'.    (2.40)
Here we use the fact that
∫_a^b δ(t − s) f(t) dt = f(s) if s ∈ [a, b],   and 0 otherwise.
That is, the δ-function will only return something useful if the integration interval con-
tains the δ-peak at s.
It matters whether we specify the initial condition for t0 before or after the kick.
Physically, it is a bit weird to specify it after the kick (although mathematically, there
is nothing wrong with that). Thus, we usually assume that t0 < s. In this case, if t < s,
the integration interval [t0 , t] will not contain s and the integral in (2.40) will vanish
identically, leading to y(t) = y(t0 )e−λ(t−t0 ) . Conversely, for t > s the interval [t0 , t] will
contain the δ and we get instead y(t) = y(t0 )e−λ(t−t0 ) +e−λ(t−s) . Thus, the general solution,
assuming t0 < s is
y(t) = { y(t_0) e^{−λ(t−t_0)},                        t < s,
         y(t_0) e^{−λ(t−t_0)} + e^{−λ(t−s)},          t > s.    (2.41)
We often don’t worry too much about y(t0 ). We just assume that before the kick the
system was standing still (y(t0 ) = 0). Alternatively, we also imagine that t0 = −∞;
that is, the initial condition happened long long ago. The first term would then vanish
because of the exponential. In any case, we usually focus on
y(t) = { 0,                 t < s,
         e^{−λ(t−s)},       t > s.
We thus see that, because of the structure of the ODE, if at any point t0 < s the system
was in y(t0 ) = 0, then it must have been at zero for all t < s. We can neatly summarize
the above result using the Heaviside function.
Green’s function
This is the Green’s function associated to the differential operator L = (∂t + λ).
That is why I wrote G, instead of y. The Heaviside function clearly shows
that the kick happened at time s. Before that the system was standing still
and afterwards it relaxes exponentially. Usually, we write the Green’s function
in terms of two parameters, G(t, s), one being the independent variable and
the other the position of the kick. But in this case it turns out that the result
depends only on their difference, t − s. This happens whenever the coefficients
in the differential operator are constant. I recommend you have a look at the
Wikipedia page for Green’s functions. There is a nice table listing the Green’s
function’s associated to a bunch of differential operators.
Once we have the Green’s function, we can now use it as a building block to gener-
ate solutions of ẏ + λy = Q(t), for arbitrary Q(t). As we saw in Eq. (2.27), a particular
solution in this case will be
y_p(t) = ∫_{−∞}^{∞} G(t − s) Q(s) ds.
When we plug the solution (2.42), the Heaviside function will chop the integration
interval, from (−∞, ∞) to (−∞, t]:
y_p(t) = ∫_{−∞}^{t} e^{−λ(t−s)} Q(s) ds.    (2.43)
If we stare at this for a few seconds, we will start to see the logic behind it. This
result is actually very similar to the general solution (2.37). In fact, we can make them
equal if we consider the initial condition in (2.37) to be at t0 = −∞. Thus, in this
sense, we could have maybe even “read” what the Green’s function should be, directly
from (2.37). This is a bit frustrating: our Green’s function (2.42) definitely looks very
pretty, but once we plug it back to obtain (2.43), we are essentially back to (2.37). But
don’t let this turn you off: this only happened here because the ODE we are solving
is very simple, so that a solution like (2.37) can be written down explicitly. For most
other ODEs, finding the Green’s function is a significant effort, but which pays off big
time.
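For what it's worth, here is a quick numerical sketch (mine, in Python, with an arbitrary λ and an arbitrary drive) showing that the convolution (2.43) agrees with a direct integration of ẏ + λy = Q(t):

import numpy as np
from scipy.integrate import solve_ivp, trapezoid

lam = 2.0
Q = lambda t: np.cos(3*t)            # an arbitrary drive

def y_green(t, s_min=-30.0, n=20000):
    # Particular solution via the Green's function, Eq. (2.43):
    # y_p(t) = int_{-inf}^{t} exp(-lam (t - s)) Q(s) ds  (lower limit truncated)
    s = np.linspace(s_min, t, n)
    return trapezoid(np.exp(-lam*(t - s))*Q(s), s)

# Direct integration of y' = -lam y + Q(t), started far in the past so that
# the transient c exp(-lam t) has died out by t = 0
sol = solve_ivp(lambda t, y: -lam*y + Q(t), (-30.0, 5.0), [0.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

for t in (0.0, 1.0, 2.0):
    print(y_green(t), sol.sol(t)[0])   # the two should agree to several digits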
For a complex exponential drive, f(t) = f_0 e^{iωt}, Eq. (2.36) gives
z(t) = c e^{−λt} + f_0 e^{−λt} e^{(λ+iω)t}/(λ + iω).
The general solution of ż + λz = f0 eiωt is therefore
z(t) = c e^{−λt} + (f_0/(λ + iω)) e^{iωt}.    (2.45)
This result is now a complex number. To get the real or imaginary parts, recall that for
any complex number,
λ + iω = r e^{iφ},   with r = √(λ² + ω²) and φ = arctan(ω/λ),
and so
1/(λ + iω) = e^{−iφ}/r.
Whence, Eq. (2.45) may be written as
z(t) = c e^{−λt} + (f_0/√(λ² + ω²)) e^{i(ωt−φ)}.    (2.46)
When z(t) is written in this way, it becomes absolutely trivial to take the real or imagi-
nary parts: we simply replace ei(...) with cosine or sine. Thus, for instance, the real part
is
y(t) = c e^{−λt} + (f_0/√(λ² + ω²)) cos(ωt − φ),    (2.47)
Figure 2.1: (Green) Example solution of Eq. (2.47) as a function of ωt, with c = 1, f0 = 1 and
λ/ω = 0.3. (Black) The function cos(ωt), for comparison.
• The system oscillates with a phase lag φ. We say that it is always lagging behind
the drive (Fig. 2.1). This lag depends on the drive frequency ω. If we drive the
system very very slowly, ω ≈ 0 and thus φ ≈ 0. The faster the drive, the larger
is the lag. Here the word “faster” is used in comparison with λ, which is the
intrinsic time scale of the system.
Here I used capital C in the first term, to avoid confusion with the Fourier coefficients
cn . We can exchange the order of sums and integrals because Fourier series converge
uniformly (Sec. 1.9). Carrying out the integrals we then find
y(t) = C e^{−λt} + Σ_{n∈Z} (c_n/(λ + inω)) e^{inωt}.    (2.49)
This is the principle of superposition in its clearest form: the solution is just a sum of
the solutions for the different driving frequencies einωt . You may also wonder why I
used y(t) here, instead of z(t), since the exponentials are complex. The reason is that
even though the exponentials may be complex, the function f (t) may very well be real.
This is encoded in the Fourier coefficients, through the fact that c−n = c∗n . In other
words, if f (t) is real, the solution (2.49) will also be real, because in this case c−n = c∗n ,
so that we get, for instance,
c_1 e^{iωt}/(λ + iω) + c_{−1} e^{−iωt}/(λ − iω) = c_1 e^{iωt}/(λ + iω) + [c_1 e^{iωt}/(λ + iω)]^*,
which is manifestly real (a number plus its complex conjugate).
Figure 2.2: Steady-state response (2.50) for a square-wave input and different values of λ/ω.
At long times the first term in (2.49) dies out and we are left with the steady state, Eq. (2.50),
y(t) = Σ_{n∈Z} d_n e^{inωt},   with   d_n = c_n/(λ + inω).    (2.50)
In the steady-state the system will thus respond to all harmonics e^{inωt} of the input drive
f(t). Moreover, each response will be weighted by a factor d_n = c_n/(λ + inω), whose magnitude
is |c_n|/√(λ² + n²ω²), and will also lag behind by an angle arctan(nω/λ). The overall response
will therefore be somewhat complicated, with the system trying to follow all e^{inωt} the best it
can, but always lagging behind (poor guy!).
To illustrate the behavior of Eq. (2.50), we consider the case of a square-wave. The
Fourier coefficients were computed in Sec. 1.4, Eq. (1.24), and read
c_n = −2i/(nπ) for n odd,   c_n = 0 for n even.    (2.51)
(The result we obtained in Eq. (1.24) was for b_n; but recall that c_n = (a_n − i b_n)/2
and, in our case, a_n = 0.)
The corresponding steady-state y(t), computed using Eq. (2.50), is shown in Fig. 2.2
for different values of λ/ω. As can be seen, changing λ leads to dramatic changes.
Large λ/ω is the same as saying that the drive is very slow (ω small compared to λ).
In this case (figure (d)) we see that y(t) has a tendency to follow f (t) more closely. In
fact, if ω = 0 we simply get dn = cn /λ and the two coincide, up to a scaling factor.
Conversely, if ω is large (λ/ω small) we are in the fast drive regime, where the response
is now significantly different from the input.
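A small numerical sketch of this (mine, in Python), which evaluates Eq. (2.50) with the square-wave coefficients (2.51) and can be used to reproduce Fig. 2.2:

import numpy as np

omega = 1.0
lam = 0.5*omega                     # try different lam/omega ratios
t = np.linspace(0, 3*2*np.pi/omega, 1000)

# Steady state y(t) = sum_n c_n e^{i n omega t}/(lam + i n omega), Eq. (2.50),
# with c_n = -2i/(n pi) for odd n (square-wave drive), Eq. (2.51)
y = np.zeros_like(t, dtype=complex)
for n in range(-201, 202, 2):       # odd harmonics only, truncated
    cn = -2j/(n*np.pi)
    y += cn*np.exp(1j*n*omega*t)/(lam + 1j*n*omega)

y = y.real                          # imaginary parts cancel since c_{-n} = c_n^*
# plot omega*t versus y to reproduce one of the panels of Fig. 2.2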
There is a cool interpretation to the solution in Eq. (2.50). Recall that in Sec. 1.9 we
discussed how differentiating a function changed cn to incn , while integration changed
cn to cn /(in). There we were using period 2π. If we use arbitrary periods, we would get
inωcn and cn /(inω) instead. What we are doing now is not a simple integration. But it
is still the integration of an ODE. Since y(t) is the solution of Ly(t) = f (t), we can also
picture that y(t) = L−1 f (t): that is, y(t) is obtained by applying the inverse of L into the
input f(t). And what we just learned is that this operation takes a Fourier coefficient c_n
to c_n/(λ + inω):
c_n  --(L^{−1})-->  c_n/(λ + inω).    (2.52)
Isn’t this cool? I mean, if we had only the differential operator L = dtd , the inverse
would be the integral and we would get only cn /inω. But since we have L = dtd + λ, we
get cn /(inω + λ) instead. This type of reasoning is usually called input-output theory
and is very important in engineering and physics. The input is f (t) (the drive), which
goes through a black box L−1 to produce the output (the system response) z(t). This
black box does not mix harmonics, taking einωt into einωt . But it processes each one
with a different weight, taking cn to cn /(λ + inω).
Energy
Consider specifically the RL circuit (2.5), where y(t) = I(t) is the current, λ = R/L
and f (t) = V(t)/L is essentially the voltage. The energy stored in the inductor is
E = (1/2) L I².    (2.53)
The energy is thus seen to be a quadratic form in the output variables. Similarly, in the
RC circuit y(t) = Q(t) and the energy stored in the capacitor is
E = Q²/(2C).    (2.54)
Again, E is quadratic in y. In fact, this is very common. For most systems described by
ODEs of the form (2.44), the energy can be written as a quadratic form in the output,
E(t) = (1/2) κ y(t)²,    (2.55)
where κ is some constant to get the correct units.
Let us focus on the steady-state (long-time) regime of (2.50). The energy in this
case will oscillate periodically in time. Given any function g(t), periodic with period
T = 2π/ω, we define its time average as
ḡ = (1/T) ∫_0^T g(t) dt,    (2.56)
which is nothing but the Fourier coefficient a0 [Eq. (1.8)]. The average energy over a
period is thus
Ē = (1/T) ∫_0^T E(t) dt = (κ/2T) ∫_0^T y(t)² dt.    (2.57)
But since y(t) is given by the Fourier series (2.50), we can directly use Parseval’s iden-
tity, Sec 1.6, Eq. (1.48). In our notation, this is translated as
(1/T) ∫_0^T y(t)² dt = Σ_{n∈Z} |d_n|².    (2.58)
Whence, we conclude that the energy associated to the steady-state solution (2.50) is
Ē = (κ/2) Σ_{n∈Z} |c_n|²/(λ² + n²ω²).    (2.59)
This is one of the main practical uses of Parseval’s identity: it separates the energy
into specific contributions from each harmonic. We sometimes call this a spectral
decomposition.
Next, going back to E(t) in Eq. (2.55) and differentiating with respect to time, we
get
dE/dt = κ y dy/dt.
Plugging the original ODE (2.44) yields
dE/dt = −κλ y(t)² + κ y(t) f(t).    (2.60)
These two terms have a clear physical interpretation. In the RL circuit, for instance
(κ = L, y = I, λ = R/L and f = V/L), this becomes
dE/dt = −R I(t)² + V(t) I(t).    (2.61)
You may remember from Physics III that the first term is the power that is dissipated
in the resistor, whereas the second term is the power delivered by the voltage source.
These two terms combine to yield the net power dE/dt in the circuit.
We can compute the integrated power in the circuit over a full period. That is, the
average of dE/dt. But if we are already in the steady-state, this will give zero because
E(t) is periodic:
(1/T) ∫_0^T (dE/dt) dt = (1/T)[E(T) − E(0)] = 0.
This always happens in the steady-state: since the energy is periodic, in some parts of
the cycle dE/dt > 0 while in others dE/dt < 0, so that the area under the curve of
dE/dt yields zero when integrated. But saying that the average power is zero does not
mean that both terms in the right-hand side of (2.60) or (2.61) are individually zero. It
just means that the two must coincide. In fact, time-averaging each term yields
κλ \overline{y²} = κλ Σ_{n∈Z} |c_n|²/(λ² + n²ω²),    (2.62)
κ \overline{y(t) f(t)} = κ Σ_{n∈Z} |c_n|²/(λ + inω).    (2.63)
Let me explain what I just did. Eq. (2.62) is the same as (2.59). As for Eq. (2.63),
I actually used the more general “inner product” identity we developed in Eq. (1.47).
Essentially, I multiplied the Fourier coefficients cn of f (t), with the Fourier coefficients
dn of y(t).
The two results in Eqs. (2.62) and (2.63) don’t look equal. But they are. To see
that, we write
1/(λ + inω) = (λ − inω)/(λ² + n²ω²).
Eq. (2.63) then becomes
κ \overline{y(t) f(t)} = κλ Σ_{n∈Z} |c_n|²/(λ² + n²ω²) − iκω Σ_{n∈Z} n|c_n|²/(λ² + n²ω²).
But the last term vanishes, because the summand is an odd function of n, so that the
terms with n > 0 exactly cancel those with n < 0 (and the term with n = 0 is also zero).
Thus, we are left only with the first term, which is exactly (2.62).
The moral of the story is that in the steady-state the power dissipated equals the
power delivered:
κλ \overline{y²} = κ \overline{y(t) f(t)},    or    R \overline{I(t)²} = \overline{V(t) I(t)}.    (2.64)
Thus, even though the average energy of the system is no longer changing, stuff is
still happening: the voltage source is constantly injecting juice in the circuit, which is
constantly being burned in the resistor. And what is perhaps most fascinating, because
this is a linear circuit, the harmonics don’t mix so this balance is true for each individual
frequency n.
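A quick numerical check of this balance (my own sketch, reusing the square-wave coefficients of Eq. (2.51) and setting κ = 1):

import numpy as np

lam, omega = 0.5, 1.0
ns = np.arange(-201, 202, 2)                 # odd harmonics of the square wave
cn = -2j/(ns*np.pi)                          # Eq. (2.51)

# Average dissipated power, Eq. (2.62): lam * sum |c_n|^2/(lam^2 + n^2 omega^2)
dissipated = lam*np.sum(np.abs(cn)**2/(lam**2 + ns**2*omega**2))

# Average delivered power, Eq. (2.63): sum |c_n|^2/(lam + i n omega), which is real
delivered = np.sum(np.abs(cn)**2/(lam + 1j*ns*omega)).real

print(dissipated, delivered)                 # the two coincide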
Ly = ÿ + 2γẏ + ω²y = 0.    (2.65)
The coefficient γ in (2.65) is thus associated to damping, while ω is associated to oscil-
lations (it comes from the spring constant). In the vast majority of physical problems,
γ > 0 and ω ∈ R. But the solutions we will develop in this section also hold other-
wise. Another example of Eq. (2.65) is the LRC circuit (2.3) with zero external voltage,
V(t) = 0. In this case 2γ = R/L and ω² = 1/LC. So, once again, we can associate γ
with damping (in this case caused by the losses in the resistor) and now the oscillatory
behavior is linked with 1/LC.
It is convenient at this point to introduce the shorthand notation
D = d/dt
for the derivative operator. Then ẏ = Dy and ÿ = D²y. In this notation Eq. (2.65)
becomes
D²y + 2γDy + ω²y = 0,
which allows us to identify the differential operator
L = D² + 2γD + ω²,
a polynomial in D.
This new object, D, is a differential operator, so you cannot treat it like a number.
For instance, Dy ≠ yD since D acts on anything that is a function of t. We say D and
y(t) do not commute. On the other hand, D does commute with things which are not
functions of t, such as scalars: 2·D = D·2. It also commutes with itself. For instance,
D(D+2) = (D+2)D. Checking this kind of property can be a bit confusing at first. The
trick is to make the differential operator act on a generic function y(t). For instance:
(D + 2)Dy = D²y + 2Dy = D(Dy + 2y) = D(D + 2)y.
Since this must be true for any y(t), we then conclude that the identity must hold for
the operator itself; that is, D(D + 2) = (D + 2)D. It is convenient to use the notation
[A, B] = AB − BA,
to denote the commutator between two objects, A and B. Then our last calculation
reveals that
[D, (D + 2)] = 0.
Conversely, D(D + t) ≠ (D + t)D, since t does not commute with D. We can
again check this explicitly:
(D + t)Dy = D²y + tDy,   whereas   D(D + t)y = D(Dy + ty) = D²y + y + tDy.
The two are clearly not equal. In fact,
[D, D + t]y = y.
And, again, since this must hold true for any y(t), we conclude that
[D, D + t] = 1. (2.67)
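If you want, you can let the computer do these operator gymnastics; a minimal sketch (mine, with Python's sympy) that represents D by its action on a generic function y(t):

import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')(t)

D = lambda f: sp.diff(f, t)          # the derivative operator D = d/dt

# [D, D + 2] acting on y should give 0
comm1 = D(D(y) + 2*y) - (D(D(y)) + 2*D(y))
print(sp.simplify(comm1))            # 0

# [D, D + t] acting on y should give y back, i.e. [D, D + t] = 1
comm2 = D(D(y) + t*y) - (D(D(y)) + t*D(y))
print(sp.simplify(comm2 - y))        # 0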
Canonical quantization
I cannot resist to tell you of a neat application of differential operators to quan-
tum mechanics. Heisenberg showed that quantum mechanical experiments could be
explained if position x and momentum p were not numbers, but differential operators.
More specifically, they should be such as to satisfy
[x, p] = iℏ.    (2.68)
To see how a differential operator can do the job, note that, by the product rule,
D(xy) = y + xDy.
Thus,
[x, D]y = (xD − Dx)y = xDy − (y + xDy) = −y.
Since this must be true for all y, we then conclude that
[x, D] = −1.
This is very similar to Eq. (2.68), except we got a −1 on the right-hand side, instead of
iℏ. This therefore motivates us to define the momentum operator
p = −iℏ d/dx.    (2.69)
Back to ODEs
Sorry. I lost focus. Back to business. We can now use these properties of differ-
ential operators to establish a general solution for ODEs of the form (2.65). Consider,
for example, the differential operator L = D² + 5D + 4. This only involves scalars and
powers of D, which commute between themselves. But since everything commutes,
we can apply standard algebra. For instance, we can factor D² + 5D + 4 just like we
would factor a normal polynomial,
D² + 5D + 4 = (D + 1)(D + 4),
[which is also the same as (D + 4)(D + 1), since everything commutes]. Again, if you
ever feel insecure about this, plug a generic y and check it:
(D + 1)(D + 4)y = (D + 1)(Dy + 4y) = (D²y + 4Dy) + (Dy + 4y) = D²y + 5Dy + 4y.
Since this must hold for any y, it must then be true at the level of the differential operator
itself.
But why is this useful? The reason is simple: we know how to solve (D − a)y = 0,
for any constant a. This is just ẏ − ay = 0, whose solution is y = e^{at}:
(D − a)y = 0   →   y = e^{at}.    (2.70)
Going back to our example, we know that y = e^{−t} solves (D + 1)y = 0. So if we plug
e^{−t} into the entire ODE we get
D²y + 5Dy + 4y = (D + 4)(D + 1)e^{−t} = (D + 4) · 0 = 0,
meaning e^{−t} also solves D²y + 5Dy + 4y = 0. Similarly, y = e^{−4t} solves (D + 4)y = 0
and so will also solve D²y + 5Dy + 4y = 0, because (D + 4)(D + 1) = (D + 1)(D + 4)
(they commute). Thus, we conclude that y = e^{−t} and y = e^{−4t} are both solutions.
And since these are linearly independent and our ODE is 2nd order, we conclude that
the most general solution will be
y = c_1 e^{−t} + c_2 e^{−4t},
for constants c1 , c2 .
Unforced oscillations
The same trick applies to Eq. (2.65): we can factor L = D² + 2γD + ω² = (D − λ_+)(D − λ_−),
so the ODE becomes
(D − λ_+)(D − λ_−) y = 0,    (2.71)
where λ_± are the roots of the characteristic polynomial,
λ_± = −γ ± √(γ² − ω²).    (2.72)
One may then verify that e^{λ_+ t} and e^{λ_− t} will both be solutions of (2.71). And so,
as long as λ_+ ≠ λ_−, the general solution will be given by the linear combination
y(t) = c_1 e^{λ_+ t} + c_2 e^{λ_− t}    (2.73)
     = e^{−γt} ( c_1 e^{t√(γ²−ω²)} + c_2 e^{−t√(γ²−ω²)} ).    (2.74)
This shows that the solution is always enveloped by e^{−γt}. In most physical applications
γ > 0 and hence y(t) will decay exponentially. This is what happens, for instance, in the
damped harmonic oscillator, which wiggles around for a bit until eventually stopping.
The condition γ > 0 is thus associated with the stability of the ODE.
If γ > ω we say the solution is overdamped. In this case √(γ² − ω²) ∈ R so
all exponentials in (2.74) are real; there will be no oscillations and the system will
just relax exponentially towards equilibrium. Conversely, if γ < ω the solution is
called underdamped. In this case the square roots will be imaginary. Defining Ω =
√(ω² − γ²) > 0, we can also write it as
y(t) = e^{−γt} ( c_1 e^{iΩt} + c_2 e^{−iΩt} )    (2.75)
     = e^{−γt} ( C_1 cos Ωt + C_2 sin Ωt )    (2.76)
     = c e^{−γt} sin(Ωt − φ).    (2.77)
The three solutions are all equivalent and simply correspond to different parametriza-
tions of the constants. For instance, if we take the 3rd line and expand sin(Ωt − φ) =
sin Ωt cos φ − cos Ωt sin φ, we see that −c sin φ ≡ C_1 and c cos φ ≡ C_2.
Finally, there is the critically damped case where γ = ω. What is special about
this is that the two roots of L become equal, λ+ = λ− [Eq. (2.72)]. Let us go back to
the drawing board. What we are interested in, in this case, is finding a general solution of
(D − a)(D − a)y = 0.
We already know that y = e^{at} works. But to construct the general solution, we need
two linearly independent solutions. Let us then, instead, try a solution of the form
y = u(t)e^{at}, for some function u(t). One may verify that
(D − a)y = u̇ e^{at}.
Thus, every time we apply (D − a) to u e^{at}, we get back almost the same thing, but with
u replaced with u̇. So applying it a second time yields
(D − a)²(u e^{at}) = ü e^{at}.
Hence, we see that y = u(t)e^{at} will be a solution of (D − a)(D − a)y = 0, provided ü = 0.
The most general such u is u(t) = c_1 + c_2 t, so the general solution is of the form
y(t) = (c_1 + c_2 t) e^{at}.    (2.78)
We have just found our other linearly independent solution. One solution is e^{at} and the
other is te^{at}.
In terms of the initial conditions y(0) = y_0 and ẏ(0) = v_0, the three cases read
y(t) = e^{−γt} [ y_0 cosh(t√(γ²−ω²)) + (v_0 + γy_0)/√(γ²−ω²) · sinh(t√(γ²−ω²)) ],    γ > ω,
     = e^{−γt} [ y_0 + (v_0 + γy_0) t ],    γ = ω,    (2.79)
     = e^{−γt} [ y_0 cos(t√(ω²−γ²)) + (v_0 + γy_0)/√(ω²−γ²) · sin(t√(ω²−γ²)) ],    γ < ω.
I know this looks messy. But it is interesting to compare the 3 cases. First, notice
how γ > ω and γ < ω can be obtained by simply switching from trigonometric
to hyperbolic functions. Second, now that we wrote the solution in terms of
actual physical quantities, y0 and v0 , we can obtain the critically damped solu-
tion by taking the limit ω → γ. This does not work on Eq. (2.74) because it is
written in terms of generic constants. An illustration of the three solutions is
given in Fig. 2.3.
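For reference, a small sketch (mine, in Python) that evaluates the three branches of Eq. (2.79); the ω/γ values below are arbitrary examples, not necessarily those of Fig. 2.3:

import numpy as np

def y_free(t, gamma, omega, y0=1.0, v0=0.0):
    """Unforced damped oscillator, Eq. (2.79), covering all three regimes."""
    if np.isclose(gamma, omega):                     # critically damped
        return np.exp(-gamma*t)*(y0 + (v0 + gamma*y0)*t)
    if gamma > omega:                                # overdamped
        nu = np.sqrt(gamma**2 - omega**2)
        return np.exp(-gamma*t)*(y0*np.cosh(nu*t) + (v0 + gamma*y0)/nu*np.sinh(nu*t))
    Om = np.sqrt(omega**2 - gamma**2)                # underdamped
    return np.exp(-gamma*t)*(y0*np.cos(Om*t) + (v0 + gamma*y0)/Om*np.sin(Om*t))

t = np.linspace(0, 10, 500)                          # interpreted as gamma*t for gamma = 1
for ratio in (0.5, 1.0, 10.0):                       # omega/gamma: over-, critically, underdamped
    y = y_free(t, gamma=1.0, omega=ratio)
    # plot(t, y) to obtain curves like those of Fig. 2.3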
We now add an external drive to Eq. (2.65),
Ly = ÿ + 2γẏ + ω²y = f(t),    (2.80)
for some external drive f(t). The prototypical example is the forced, damped harmonic
oscillator described by
mÿ + αẏ + ky = F(t), (2.81)
where m is the mass, α is the damping, k the spring constant and F(t) the external force.
Dividing both sides by the mass then yields Eq. (2.80) with 2γ = α/m, ω² = k/m and
Figure 2.3: Example of the general solution (2.73) or (2.78), for different values of ω/γ, all
starting from y(0) = 1 and ẏ(0) = 0.
f (t) = F(t)/m. To be honest, the results we are going to develop also hold for ODEs of
order higher than 2. But these are seldom found in practice, so we focus on 2nd order
to gain intuition.
It helps to think about Eq. (2.80) as a probe & response problem, an idea that
encompasses many experiments in physics. Our system can be viewed as a kind of black
box, that we do not know much about. To learn something about it, we probe it with
an external perturbation f (t) and measure how it responds. This scenario happens
in particle physics, for instance: Rutherford probed gold atoms by poking them with
α particles. It also happens (a lot) in condensed matter. To characterize a magnetic
system, we probe it with a magnetic field. To characterize a Bose-Einstein condensate,
we shake it with an optical field.
Since Eq. (2.80) is inhomogeneous, the general solution will be
y(t) = y p + c1 y1 + c2 y2 , (2.82)
where y p is a particular solution and y1(2) are two linearly independent solutions of
Ly = 0, which is precisely what we studied in the last section. There is a general
method to solve this type of equation, for arbitrary right-hand side. But the method is
not very elegant and, for simple choices of f (t), more direct and insightful approaches
exist. Nonetheless, I discuss this general method below, in Eqs. (2.94) and (2.95), in
case you are curious.
For now I want to focus on the case when f(t) = f_0 e^{iΩt}, where I use Ω to avoid
confusion with the natural oscillation frequency ω in Eq. (2.80). We begin by noting the
following nice result: let L(D) = a_0 + a_1 D + a_2 D² + ... be any polynomial in D. Then
L(D) e^{iΩt} = L(iΩ) e^{iΩt}.    (2.83)
This is pretty cool: when L(D) acts on a complex exponential, we get the exact same
polynomial, but with differential operators replaced by numbers.
Let us prove this identity. Start with a simple example:
(D − 2)e^{iΩt} = d(e^{iΩt})/dt − 2e^{iΩt} = (iΩ − 2)e^{iΩt}.
Notice how the thingy on the left has the same algebraic structure as the differential
operator we started with: we simply replaced D − 2 with iΩ − 2. From this I think the
proof is pretty easy, right? I mean, we simply act with L(D) on eiΩt . Each time a D hits
the exponential, we get iΩ times the exponential again. So we are essentially replacing
D’s with iΩ’s everywhere.
Consider now the ODE
L(D)y = f_0 e^{iΩt}.
Use the ansatz y(t) = A e^{iΩt}, where A is a constant. We then get
L(D)[A e^{iΩt}] = A L(iΩ) e^{iΩt} = f_0 e^{iΩt}.
The ansatz will thus be a valid particular solution, provided the constant A is chosen as
A = f_0 / L(iΩ).    (2.84)
If it happens that L(iΩ) = 0, then this of course won’t work. In this case, the particular
solutions will usually have the form P(t)eiΩt , where P(t) is a polynomial in t. I will
leave this as an exercise for you to check. Try, for instance, L = (D − iΩ)(D − a) or
L = (D − iΩ)(D − iΩ).
For our oscillator, L(D) = D² + 2γD + ω², so L(iΩ) = ω² − Ω² + 2iγΩ and the particular
solution reads
y_p(t) = f_0 e^{iΩt} / (ω² − Ω² + 2iγΩ).    (2.87)
If we want the general solution, then we must still add to y_p the homogeneous
solutions y1 and y2 , as in Eq. (2.82). However, as we discussed in the previous section,
these solutions always have an exponential envelope and therefore decay in time (the
transient). The particular solution (2.87), on the other hand, is periodic and will thus
oscillate indefinitely (the steady-state). Thus, if we are only interested in the long-time
behavior of the system, all we need is Eq. (2.87).
The solution (2.87) is also complex and we may wish to take its real or imaginary
parts. The rationale is exactly the same as in Sec. 2.6: the ODE is linear with real coef-
ficients, so the real part of (2.87) will give us a particular solution for f (t) = f0 cos Ωt
and the imaginary part will give us a solution for f (t) = f0 sin Ωt. In this sense, it is
convenient to write
1/(ω² − Ω² + 2iγΩ) = e^{−iφ}/√((ω² − Ω²)² + 4γ²Ω²),    tan φ = 2γΩ/(ω² − Ω²).    (2.88)
Taking the real or imaginary parts is now trivial: simply replace the complex exponen-
tial by cosine or sine.
In particular, for f(t) = f_0 cos Ωt,
y_p(t) = f_0 cos(Ωt − φ) / √((ω² − Ω²)² + 4γ²Ω²).    (2.89)
The particular solution (2.89) shows that, in the steady-state, the system will oscil-
late with the same frequency Ω as the external force. The amplitude of the oscillations
is proportional to f0 (so big drives generate big responses), but modulated by a factor
that depends on Ω, ω and γ. This amplitude is illustrated in Fig. 2.4(a) as a function of
Ω/ω. Unlike what we had in first order systems [c.f. Eq. (2.47)], we see in this case the
possibility of a resonance effect: By tuning Ω we can either suppress or enhance the
amplitude. In particular, if the damping γ is very small, we can dramatically enhance
the response by tuning Ω exactly at the natural frequency ω. The magnitude of the
response depends on the damping constant γ, so the resonance is stronger for lower
dissipation.
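A minimal numerical sketch (mine, in Python) of the amplitude in Eq. (2.89) as a function of Ω/ω, which generates resonance curves like those in Fig. 2.4(a); the damping values are arbitrary:

import numpy as np

omega, f0 = 1.0, 1.0

def amplitude(Om, gamma):
    # |y_p| = f0 / sqrt((omega^2 - Om^2)^2 + 4 gamma^2 Om^2), from Eq. (2.89)
    return f0/np.sqrt((omega**2 - Om**2)**2 + 4*gamma**2*Om**2)

Om = np.linspace(0.01, 3.0, 500)
for gamma in (0.1, 0.3, 1.0):                 # arbitrary damping values
    A = amplitude(Om, gamma)
    print(gamma, Om[np.argmax(A)], A.max())   # peak position approaches omega for small gamma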
Now take a second to imagine all this from the perspective of an experimentalist. This
is extremely valuable information. You sweep through different values of Ω, until you
encounter a peak. The position of the peak indicates the natural oscillation frequency
of the system. And the height/width of the peak tells you about the magnitude of the
damping (an experimentalist might call this the quality factor, instead of damping).
Here we focused on a single harmonic oscillator, but more complicated systems tend to
behave in a similar way. The only difference is that they may have multiple resonances
and thence multiple peaks.
Fig. 2.4 also shows the phase lag φ in (b), as well as different example dynamics in
(c). In this case, the behavior is similar to that of 1st order systems, where the system
tries to follow f (t) around, but is always lagging behind.
Figure 2.4: (a) Amplitude and (b) phase φ [Eq. (2.89)], plotted as a function of Ω/ω, for differ-
ent values of γ/ω. (c) The system’s response ωy p (t)/ f0 as a function of Ωt for the
same values of γ/ω as in (a) and (b), with fixed Ω/ω = 0.8. The dotted black curve
is simply the external force, cos(Ωt).
The new and exciting feature here, as compared to the first order systems we studied in
Sec. 2.6, is the prospect of resonant effects. As we just saw, resonance occurs when the
driving frequency Ω matches the natural frequency ω. But now f (t) contains a super-
position of many driving frequencies Ω, 2Ω, 3Ω etc. This opens up more possibilities.
We can now have a resonance when any of these subharmonics gets very close to the
system’s natural frequency.
You may have heard about the Angers Bridge in France, in 1850, which collapsed
when a battalion was marching through it. Marching generates a periodic drive. It is
very far from a single harmonic drive, but it is still periodic. In fact, marching looks
somewhat like a narrow boxcar, such as that in Fig. 2.5. Let us take, for concreteness,
a boxcar of height f0 , frequency Ω and duration a:
The Fourier coefficients follow from the usual recipe, with the period (what we have
been calling T in this chapter) being 2π/Ω:
c_n = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−inΩt} dt = (f_0/T) ∫_{−a}^{a} e^{−inΩt} dt = (f_0/T)(2/(nΩ)) sin(nΩa).
This result is very nice. The soldier’s march is modeled by three ingredients: f0 , a
and Ω. The first is the overall amplitude. And, as one would intuitively expect, larger
inputs f0 lead to larger outputs y p . Then there is a and Ω. The latter reflects the
overall periodicity of the steps. It describes at which frequency the input repeats itself.
Conversely, a describes the duration of each kick. That is, the amount of time the
soldier’s boots apply a force on the ground. It is thus something one has much less
control over.
But, as can be seen, the possibility of having resonances depends on Ω, not a.
The value of a will only influence the magnitude of the resonance, through the factor
sin(nΩa). A resonance occurs when Ωn = ω. If ω < Ω there will be no integer n that
gets close. So resonances occur when ω > Ω; that is, when the marching pace is slow,
compared to the natural oscillation frequency of the bridge. In this case, a resonance
will occur whenever ω and Ω are such that ω/Ω is close to an integer. If γ is small,
this may cause one of the Fourier coefficients in (2.93) to become very large. And this,
in turn, can make the bridge go pluft. Still today, soldiers break stride (stop marching)
when they go on a bridge, precisely for this reason.
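A rough numerical illustration of this (my own sketch; the numbers are invented, not real bridge data). For a boxcar drive, the steady-state weight of harmonic n is c_n/L(inΩ), cf. Eq. (2.84):

import numpy as np

omega, gamma = 1.0, 0.02                    # "bridge": natural frequency, small damping
Omega, a, f0 = omega/3.2, 0.3, 1.0          # "marching": pace, kick duration, strength
T = 2*np.pi/Omega

ns = np.arange(1, 60)
cn = (f0/T)*(2/(ns*Omega))*np.sin(ns*Omega*a)      # boxcar Fourier coefficients

# Steady-state weight of each harmonic: |c_n / L(i n Omega)|
weights = np.abs(cn/(omega**2 - (ns*Omega)**2 + 2j*gamma*ns*Omega))

n_res = ns[np.argmax(weights)]
print(n_res, n_res*Omega/omega)             # the dominant harmonic has n*Omega close to omega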
Imagine that you are given a specific f (t) and you solved this integral. This will give
you u(t) as some function. But since u = (D − λ− )y, we can use u(t) as a right-hand
side and solve the ODE (D − λ− )y = u, whose solution is
y(t) = c_2 e^{λ_− t} + e^{λ_− t} ∫ u(t) e^{−λ_− t} dt.    (2.95)
This gives you the general solution of (2.80), for any f (t), in terms of two integrals.
Example: consider the ODE
(D − 1)(D + 2)y = t.
Defining u = (D + 2)y, we first solve (D − 1)u = t with the integrating-factor formula:
u(t) = c_1 e^{t} + e^{t} ∫ t e^{−t} dt
     = c_1 e^{t} − e^{t} (1 + t) e^{−t}
     = c_1 e^{t} − (1 + t).
Next we use this u as the right-hand side of (D + 2)y = u, whose solution is Eq. (2.95):
y(t) = c_2 e^{−2t} + e^{−2t} ∫ u(t) e^{2t} dt
     = c_2 e^{−2t} + e^{−2t} [ c_1 e^{3t}/3 − (1/2)(t + 1/2) e^{2t} ]
     = c_2 e^{−2t} + (c_1/3) e^{t} − (1/2)(t + 1/2).
Since c_1 is a constant anyway, we may redefine c_1/3 → c_1, leading finally to the general
solution
y(t) = c_2 e^{−2t} + c_1 e^{t} − (1/2)(t + 1/2).
Chapter 3
configuration u(x, y, z). This happens in all three examples above. For instance, one
may ask what is the temperature profile of some solid that is connected to multiple heat
baths. Or what is the electrostatic potential resulting from some charge configuration.
In this sense, the most important equation, by far, is Laplace’s equation
∇²u = 0,    (3.1)
where ∇²u ≡ u_xx + u_yy + u_zz is the Laplacian.
equation. Its solutions are oscillatory, even though it is first order in time. This happens
because of the complex factor in front and is something which has major consequences
for the description of quantum phenomena, as we will discuss.
We look for product solutions of the form
u(r, t) = R(r) T(t),    (3.6)
where r = (x, y, z). That is, we look for solutions where the time part is factored from
the spatial part. As we will learn, there will usually be an infinite number of linearly
independent solutions of this form, which we can label as un (r, t) = Rn (r)T n (t), with
some index n. The general solution will definitely not have this form; otherwise time
and space would evolve independently and there would be no fun at all. But these
product solutions can be used as building blocks to construct more general solutions.
This is possible because the equation is linear, so the sum of two solutions is also a
solution. That is to say, the general solution will usually have the form
X
u(r, t) = cn Rn (r)T n (t), (3.7)
n
where cn are coefficients that need to be adjusted to match the initial and boundary
conditions. This will be the basic game that we will play throughout this chapter: we
find the set of product solutions of the form (3.6) and then use them to build the general
solution (3.7) by adjusting the constants to match the initial/boundary conditions.
Wave equation
Let us see what the ansatz (3.6) brings us in the case of the wave equation (3.3).
On the left we have
(1/c²) ∂²u/∂t² = (R/c²) d²T/dt²,
since R(r) does not depend on t. Similarly, on the right we have
∇²u = T(t) ∇²R,
since T (t) does not depend on r. Rearranging a bit, we can then write Eq. (3.3) as
T̈/(c²T) = ∇²R/R.    (3.8)
Now comes the key point: We are looking for functions T and R which solve this
equation. By “solve” we mean functions which satisfy it for all values of t and all
values of r. But the left-hand side is only a function of t, by hypothesis. And the
right-hand is only a function of r. So how can it be that, as we vary t and we vary r,
which we can of course do independently, the quantity on the left remains equal to the
quantity on the right? The answer is that this can only happen if they each are equal to
a constant. That is, if
T̈/(c²T) = ∇²R/R = constant ≡ −k².    (3.9)
We label the constant as −k² for convenience. This implies no loss of generality. The
above logic is very important. So please take a second to see if you really understand.
The value of the constant does not matter right now; we will see that there are many
constants k that satisfy this, whose values are imposed by the boundary conditions.
What really matters at this point is that T̈/T and ∇²R/R must be a constant. That is,
they cannot depend on either t or r.
Eq. (3.9) therefore implies two separate equations
T̈ = −c²k²T,    (3.10)
∇²R = −k²R.    (3.11)
The equation for R is called the Helmholtz equation. Notice how it looks similar to
Laplace’s or Poisson’s equations, (3.1) or (3.2). The difference is that the right-hand
side now depends on R. We will discuss how to solve this type of equation soon.
For now, I just wanted to anticipate one result: we will find that in most problems
Eq. (3.11) only has a solution for a discrete set of real constants kn . You may appreciate
the connection with eigenvalues and eigenvectors. Eq. (3.11) has the form Ax = λx,
where instead of a matrix A, we now have a differential operator ∇² and instead of
a vector x we have a function R(r). The allowed values kn , which solve (3.11), are
therefore the eigenvalues of the Laplace operator. And the corresponding functions
Rn (r) are the eigenfunctions.
Let us then focus on the time part, Eq. (3.10). This equation is easy; it is just a
2nd order homogeneous ODE with constant coefficients. The two linearly independent
solutions are
T(t) = e^{ickt},   e^{−ickt}.    (3.12)
The solutions are thus complex exponentials, befitting of a wave equation. We could
also use sines and cosines, but I prefer to leave it like this for now. Let us denote by Rn
the solutions of (3.11). The general solution can then be written as a linear combination
of these solutions,
u(r, t) = Σ_n (a_n e^{ick_n t} + b_n e^{−ick_n t}) R_n(r).    (3.13)
Heat equation
Next let us see what happens for the heat equation (3.4). We use the same type
of ansatz as in (3.6). As a result, we get something very similar to (3.8), but with a
different time part:
Ṫ/(αT) = ∇²R/R = −k².    (3.14)
The argument for the separation of variables remain exactly the same: the left-hand
side is only a function of t and the right-hand side is only a function of r. The resulting
equation for R continues to be the Helmholtz equation (3.11). But the equation for T
now becomes
Ṫ = −αk²T.
This is a 1st order ODE with constant coefficients. There is now only one independent
solution,
T(t) = e^{−αk²t}.
Since k is real, we therefore see that the solution is a decaying exponential: Heat
dissipates, while waves propagate. The general solution will thus have the form
u(r, t) = Σ_n c_n e^{−αk_n²t} R_n(r).    (3.15)
Schrödinger equation
Finally, we consider Schrödinger’s equation (3.5). Repeating the same procedure,
we find
iℏṪ/T = −(ℏ²/2m) ∇²R/R = E.
In the case of Schrödinger’s equation we call the constant E instead of −k2 . This
is because, it will turn out, E is associated with the energy of the system (we will
understand why later on). The equation for R now reads
−(ℏ²/2m) ∇²R = ER,    (3.16)
while that for T reads
Ṫ = −i(E/ℏ)T.    (3.17)
This is a 1st order ODE, so the solution reads
T(t) = e^{−iEt/ℏ}.    (3.18)
Hence, the general solution will have the form
u(r, t) = Σ_n c_n e^{−iE_n t/ℏ} R_n(r).    (3.19)
The value of E turns out to be real, so that the solutions are seen to be complex ex-
ponentials. Thus, even though the time-derivates are first order, the solutions are still
oscillatory. This is because of the factor of i in Eq. (3.5).
Eq. (3.5) is actually only a particular case of Schrödinger’s equation. The general
equation actually reads
iℏ ∂u/∂t = −(ℏ²/2m) ∇²u + V(r)u,    (3.20)
where V(r) is an arbitrary function of r, which is called the potential energy. I will
explain the logic a bit better later on. For now I just wanted to point out that, even in
this more general case, the method of separation of variables continues to hold. But
now it yields
Ṫ = −i(E/ℏ)T,    (3.21)
−(ℏ²/2m) ∇²R + V(r)R = ER.    (3.22)
Notice how the equation for the time-part remains unchanged; the solutions continue
to be T(t) = e^{−iEt/ℏ}. The equation for R is sometimes called the time-independent
Schrödinger equation. The solutions R_n(r) will still (usually) exist only for a discrete
set of energies E_n, which will depend sensitively on the function V(r). Notwithstand-
ing, since the time-part does not change, the general solution will still be given by
Eq. (3.19), but with new functions R_n(r).
Consider some region of space Γ and let
Q(t) = ∫∫∫_Γ u(r, t) dV    (3.23)
denote the net amount of particles in that region. A continuity equation then states that
dQ/dt = −∮_{S_Γ} J · dS,    (3.24)
where J (r, t) is the current that flows through point r at time t, and the integral is over
the surface S Γ encompassing the region Γ, with dS being a surface element. Using the
divergence theorem, however, we may write
∮_{S_Γ} J · dS = ∫∫∫_Γ (∇ · J) dV,
where ∇ · J = ∂_x J_x + ∂_y J_y + ∂_z J_z is the divergence. We may combine this and (3.23)
into (3.24), leading to
∫∫∫_Γ ∂u(r, t)/∂t dV = −∫∫∫_Γ (∇ · J) dV.
And since this must be true for any region Γ, we conclude that the integrands them-
selves must be equal. That is,
∂u/∂t = −∇ · J.    (3.25)
This is the continuity equation. It relates changes in u with the divergence of a current
J through that region.
This result is nice but, in a sense, it doesn’t say much because we haven’t really
defined what J is. What really determines the physics is the form of the current.
Unless we say something specific about it, there is nothing to do. This is where our
hero, Fourier, comes in. He argued that what generates a current is precisely a variation
of u. If u were constant everywhere, there would be no current. But if there is an
imbalance between u in one point and u in another, this will cause a current to flow.
This makes a lot of sense. For instance, if u is the concentration of particles, then what
generates a current is the fact that one region has more particles than others; or, if u is
the temperature, the heat will flow because one region may have a temperature higher
than the other.
According to Fourier, therefore, the current should have the form
J = −α∇u. (3.26)
That is, it should be proportional to minus the gradient of u (which is what quantifies
how steeply u changes). Here α is just a constant, which varies from material to ma-
terial, called the diffusivity. The minus sign is there because the flow tends to be from
high to low concentration. For instance, heat flows from hot to cold. For this reason,
we also have α > 0. When we are thinking in terms of heat flow, Eq. (3.26) is called
Fourier’s law of heat conduction. Conversely, if we are thinking in terms of particle
diffusion, it is called Fick’s law of diffusion.
Eq. (3.26) is the missing ingredient that makes it possible to extract useful infor-
mation from the continuity equation (3.25). Combining both results, we finally find the
diffusion equation
∂u/∂t = α∇²u,    (3.27)
where ∇²u = ∇ · (∇u). Interestingly, Fourier series were actually invented to treat the
heat equation. Fourier was interested in solving (3.27) and developed the theory behind
Fourier series as a method to solve it.
Figure 3.1: A practically 1D bar, of length L.
a thin but long metal rod, for instance, so that the temperature may change along the x
direction, but is practically constant along the y and z directions (Fig. 3.1). The bar is
assumed to have length L. In this case u(x, t) depends only on the x position, and time.
We are going to write the heat equation more compactly as
u_t = αu_xx.    (3.28)
This equation, by itself, doesn’t tell the whole story. We still need to specify the bound-
ary and initial conditions. The initial condition is specified by providing the function
u(x, 0) = u0 (x). That is, we need to know how the temperature profile looked like
initially. Only then can we actually say something about how it will evolve in time.
The boundary condition, on the other hand, is specified by saying what happens to
u at the endpoints, x = 0 and x = L. Can heat flow from the endpoints? Or maybe, are
the end-points in contact with some heat bath kept at some temperature? This actually
defines two types of boundary conditions:
• Dirichlet boundary condition: we fix the temperature at the boundaries, at all
times: u(0, t) = T 1 and u(L, t) = T 2 , for some temperatures T 1 and T 2 . This is the
case, for instance, if the endpoints are connected to thermal baths (like a flame
or a big bucket of water).
• Neumann boundary condition: we fix a constant heat flux at the boundaries.
Recall from Eq. (3.26) that the heat flux J is essentially the gradient of u. In 1D
this reduces to J = −α ∂_x u. Hence, fixing the heat flux is the same as fixing ∂_x u.
For instance, we could set J(0, t) = J1 and J(L, t) = J2 . A particularly common
choice is when we set J = 0 (and hence u_x = 0); this describes insulating walls.
That is, we block any heat flow to the outside world.
These boundary conditions are not the only ones; they are just two very common
choices that appear frequently. We could also have mixed boundary conditions, such
as Dirichlet on the left and Neumann on the right. Or we can also mix them at the same
boundary. Something like,
au(0, t) + bJ(0, t) = c, (3.29)
for constants a, b, c. This is called a Robin, or “radiation” boundary condition. It
has this name because, if we relabel c = T a, for some parameter T , we can write it as
bJ(0, t) = a(T − u(0, t)). (3.30)
The idea is that T is some kind of temperature of the surroundings, so this equation
specifies that the heat flux at the boundary is proportional to the temperature difference
between T and u(0, t), a hypothesis known as Newton’s law of cooling. Robin bound-
ary conditions can also be solved by similar methods, but turn out to be mathematically
more complicated. We will therefore focus on Dirichlet and Neumann in this course.
Eigenvalues and eigenfunctions
We can summarize the above results by writing the following boundary prob-
lem:
X'' = −k²X,    X(0) = X(L) = 0.    (3.34)
This is the 1D version of the Helmholtz equation (3.11), with Dirichlet bound-
ary conditions. What we have just learned, therefore, is that the allowed solu-
tions are Xn (x) = sin(kn x), where kn = nπ/L and n = 1, 2, 3, . . .. The kn are the
eigenvalues and Xn are called the eigenfunctions.
You should now see a Fourier series starting to take shape. Combining X(x) with
the time part, T(t) = e^{−αk²t}, all functions of the form
u_n(x, t) = e^{−αk_n²t} sin(k_n x),   k_n = πn/L,   n = 1, 2, 3, ...    (3.35)
will be solutions of our equation. Since the PDE is linear, linear combinations of them
will also be solutions. Therefore, the general solution of u_t = αu_xx, with Dirichlet
boundary conditions u(0, t) = u(L, t) = 0, will be
u(x, t) = Σ_{n=1}^{∞} b_n e^{−αk_n²t} sin(k_n x),   k_n = πn/L,   n = 1, 2, 3, ...
for constants bn , which are determined from the initial condition u(x, 0) = u0 (x).
We find the bn by setting t = 0, leading to
Σ_{n=1}^{∞} b_n sin(k_n x) = u_0(x).    (3.36)
The b_n can be extracted using the orthogonality of the sines in [0, L],
∫_0^L sin(k_n x) sin(k_m x) dx = (L/2) δ_{nm}.    (3.37)
I know these details seem a bit confusing. To be honest, it is easier to simply check
that this is the correct formula by making a couple of tests in Mathematica. We now
use (3.37) in (3.36): we multiply both sides by sin(k_m x) and integrate from 0 to L. As
a result, we find
b_n = (2/L) ∫_0^L u_0(x) sin(k_n x) dx.    (3.38)
Looking at this, however, we realize it is exactly the Fourier sine series of chapter 1.
In fact, so we are all on the same page, let us explore this connection in more depth.
Consider the Fourier coefficients of a function f(x), periodic in [−L/2, L/2], defined in
Eq. (1.15):
b_n = (2/L) ∫_{−L/2}^{L/2} f(x) sin(2πnx/L) dx.
Now shift the period to be 2L instead of L. We can get that by simply replacing every
L we see by 2L:
b_n = (1/L) ∫_{−L}^{L} f(x) sin(πnx/L) dx.
This is already starting to look like (3.38). Finally, let us split f (x) into an even and
an odd part, f (x) = fe (x) + fo (x), where fe (−x) = fe (x) and fo (−x) = − fo (x). We can
always do this for any function.1 Then it follows that
∫_{−L}^{L} f_e(x) sin(k_n x) dx = 0,
since sine is odd and fe (x) is even. Thus, only the odd part of the function contributes
to its Fourier series,
b_n = (1/L) ∫_{−L}^{L} f_o(x) sin(πnx/L) dx.
Finally, since the integrand is now even (because it is the product of two odd functions),
we can write the integration to be only from 0 to L, and multiply the result by 2. We
then finally arrive at
b_n = (2/L) ∫_0^L f_o(x) sin(πnx/L) dx,
which is exactly (3.38). Thus, we conclude that the coefficients bn are nothing but the
Fourier sine series of u0 (x). The reason why the result depends only on the sine series,
and not the cosine, is because of the boundary conditions: we wanted u(0, t) = 0, so a
cosine would never work.
Figure 3.2: Dashed-black: The function u0 (x) = x(x2 − 3Lx + 2L2 ), with L = 1. Colors: solution
u(x, t) for different times.
To summarize: the solution of the heat equation with Dirichlet boundary conditions
u(0, t) = u(L, t) = 0 and initial condition u(x, 0) = u_0(x) is
u(x, t) = Σ_{n=1}^{∞} b_n e^{−αk_n²t} sin(k_n x),   k_n = πn/L,    (3.39)
where n = 1, 2, 3, ... and
b_n = (2/L) ∫_0^L u_0(x) sin(k_n x) dx.    (3.40)
Example: Suppose
u_0(x) = T_0 x(x² − 3Lx + 2L²),    (3.41)
which is illustrated by the dashed curve in Fig. 3.2. This mimics something that is hot
in the middle and cold at the boundaries. The Fourier coefficients (3.40), as you may
quickly verify with the help of Mathematica, are
b_n = 12T_0 L³/(π³n³) = 12T_0/k_n³.
Thus, the general solution will be
u(x, t) = 12T_0 Σ_{n=1}^{∞} e^{−αk_n²t} sin(k_n x)/k_n³,   k_n = πn/L.    (3.42)
This is illustrated for different times in Fig. 3.2 and as a pretty contour plot in Fig. 3.3.
As can be seen in both figures, due to the negative exponentials in Eq. (3.39), the
solution tends to zero in the long time limit: u(x, t) → 0 when t → ∞. This makes
sense since the boundaries are kept at zero temperature. Thus, the initial temperature
concentration at the middle simply flows out to the boundaries and eventually dies out.
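A compact numerical sketch (mine, in Python) that evaluates the series (3.42) and reproduces the curves of Figs. 3.2 and 3.3:

import numpy as np

L, T0, alpha = 1.0, 1.0, 1.0
x = np.linspace(0, L, 200)

def u(x, t, nmax=200):
    # Dirichlet solution (3.42): u = 12 T0 sum_n exp(-alpha k_n^2 t) sin(k_n x)/k_n^3
    n = np.arange(1, nmax + 1)
    kn = np.pi*n/L
    terms = np.exp(-alpha*kn**2*t)[None, :]*np.sin(np.outer(x, kn))/kn**3
    return 12*T0*terms.sum(axis=1)

for t in (0.0, 0.01, 0.05, 0.1):
    profile = u(x, t)      # at t = 0 this reproduces u0(x) = T0 x (x^2 - 3 L x + 2 L^2)
    # plot(x, profile) to obtain curves like those of Fig. 3.2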
Example: Boxcar Suppose that the initial temperature profile is a boxcar function
somewhere in the middle of [0, L]:
Figure 3.3: Same as Fig. 3.2, but as a contour plot in (x, t).
Figure 3.4: Evolution of u(x, t) for u0 (x) being the boxcar function (3.43).
This represents a boxcar of width 2 centered around L/2. The result is presented in
Fig. 3.4. As can be seen, the initially hot part in the middle sort of dissipates in time
and eventually the entire bar reaches zero temperature when t → ∞.
Mismatch between boundary and initial conditions: Suppose the temperature pro-
file is initially given by u0 (x) = x. At x = 0 this will match the boundary condition
u(0, t) = 0, which is supposed to hold for all t. But at x = L the two will never match
because u0 (L) = L, while we are supposed to have u(L, t) = 0. What happens then?
The short answer is that this is an ill-posed problem because it implies that u(L, t) has
to change discontinuously from the value u(L, 0) = L to u(L, t) = 0 for any t > 0. You
are thus trying to solve for something that is unsolvable. The longer answer is that,
surprisingly, the heat equation kind of “adapts” to this situation.
To see what happens, let us look at what the Fourier series of the initial condition
is doing. The Fourier coefficients b_n in Eq. (3.40) become b_n = −2(−1)^n/k_n. Let us
then look at the resulting series at t = 0,
Σ_{n=1}^{∞} [−2(−1)^n/k_n] sin(k_n x).    (3.44)
Figure 3.5: The Fourier series (3.44), with L = 1, which is trying to simulate a straight line
u0 (x) = x.
This was supposed to be the initial condition u0 (x). But is it? After all, the bn are
chosen precisely so that u(x, 0) = u_0(x) [Eq. (3.39)]. The result for this function, with
the sum involving a finite number of terms, is shown in Fig. 3.5. The series is
trying to approximate the straight line u0 (x) = x. And it does a fairly good job at that,
except at x ∼ L.
At x ∼ L the function oscillates violently. This is the Gibbs phenomenon discussed
in Sec. 1.4. And, what is most important, the series always tends to zero when x → L.
This is seen in the smaller panels of Fig. 3.5, where I make a zoom around this region
and plot the result for different maximum values nmax . As can be seen, the function
wiggles around 1, but when x → L it always eventually falls down and touches zero.
It must do that because Eq. (3.44) is a sum of functions sin(kn x), which are identically
zero at x = L.
The moral of the story is that the Fourier expansion (3.44) cannot describe a func-
tion such as u0 (x) = x. If we wish to do that, we would need a Fourier series involving
sines and cosines. Then why not use cosines? Well... because cosines do not satisfy
the boundary conditions. Cosines fix the initial conditions, but break the boundary
conditions. This is the incompatibility I was talking about: it is impossible to solve the
Dirichlet problem if the initial conditions are not consistent with the boundary condi-
tions.
Well, what if we do it anyway? That is, what if we plug the bn found in (3.44)
into our general solution (3.39)? The result is shown in Fig. 3.6. As can be seen, for
any t > 0 the solution picks up the boundary conditions perfectly and the evolution is
smooth. To summarize, therefore, strictly speaking, it makes no sense to try to solve a
problem where the boundary and initial conditions do not match. That would require
a discontinuity in u(x, t). But still, if you are stubborn and try to do that anyway, you
Figure 3.6: Solution of the Dirichlet heat equation with initial condition u_0(x) = x. For any
t > 0, the function satisfies the boundary conditions.
will still find an answer. The heat equation tends to smooth things out and produce, for
all t > 0, a solution which is smooth and well behaved.
u(0, t) = T 1 , u(L, t) = T 2 .
Instead of looking at the time-dependence, let us first analyze the steady-state. That is,
the solutions u ss after a long-time has passed. In this case ∂u ss /∂t = 0 and so Eq. (3.28)
reduces to
d²u_ss/dx² = 0.    (3.45)
This is a 2nd order linear ODE. The solution is trivial:
u_ss = c_0 + c_1 x,
and imposing the boundary conditions,
u_ss(0) = c_0 = T_1,
u_ss(L) = c_0 + c_1 L = T_2,
gives u_ss(x) = T_1 + (T_2 − T_1) x/L. The corresponding steady-state heat flux (3.26) is
J_ss = −α du_ss/dx = −α(T_2 − T_1)/L.
Suppose T 1 > T 2 (left side hot). Then T 2 − T 1 < 0 and thence J ss > 0. That is, heat
flows from hot to cold. This is a manifestation of the 2nd law of thermodynamics.
Great. Now let’s go back to the full time-dependent problem. We already know
the steady-state solution. To find the full time-dependent solution, we use the magic of
linearity: the function uss (x) solves (3.28). And so does u(x, t) in Eq. (3.39). Hence,
their sum must also be a solution. The general solution will thus be
u(x, t) = u_ss(x) + Σ_{n=1}^∞ b_n e^{−α k_n² t} sin(k_n x). (3.48)
This also matches the boundary conditions. The values of the coefficients b_n are again determined from the initial condition, u(x, 0) = u_0(x). Setting t = 0 yields
u_ss(x) + Σ_{n=1}^∞ b_n sin(k_n x) = u_0(x).
Hence, by the usual Fourier recipe,
b_n = (2/L) ∫_0^L [u_0(x) − u_ss(x)] sin(k_n x) dx.
In summary, the solution of the problem
u_t = α u_xx, (3.49)
u(0, t) = T_1, u(L, t) = T_2,
u(x, 0) = u_0(x),
is
u(x, t) = u_ss(x) + Σ_{n=1}^∞ b_n e^{−α k_n² t} sin(k_n x), u_ss(x) = T_1 + (T_2 − T_1) x/L,
where
b_n = (2/L) ∫_0^L [u_0(x) − u_ss(x)] sin(k_n x) dx. (3.52)
As you can see, adding non-zero temperatures at the boundaries does not introduce
many changes from a mathematical point of view (although, of course, it completely
changes the physics). For this reason, most textbooks focus on T 1 = T 2 = 0.
As an example, consider the problem (with L = 1)
u_t = α u_xx, (3.53)
u(0, t) = 2, u(1, t) = 1,
u(x, 0) = 2 − x²,
whose steady-state, following the recipe above, is u_ss(x) = 2 − x. This result is plotted in Fig. 3.7. As can be seen, the initial parabolic temperature profile is slowly damped until it becomes the straight line u_ss(x) = 2 − x.
Let us now consider the heat equation with Neumann boundary conditions:
u_t = α u_xx, (3.55)
u_x(0, t) = 0, u_x(L, t) = 0,
u(x, 0) = u_0(x).
The difference here is in the second line, where we are fixing u x = ∂u/∂x, instead of u.
Since we set both u x (0, t) = u x (L, t) = 0, we are insulating the system, so that no heat
can flow to the outside.
To find the solution, we go back to the separation of variables. The basic solution (3.31) continues to be valid, with a, b and especially k still to be determined. We
Figure 3.7: u(x, t) from Eq. (3.54), describing the heat equation with Dirichlet boundary condi-
tions u(0, t) = 2 and u(1, t) = 1. Left: as a function of x, for different t; Right: as a
density plot in the (x, t) plane.
now try to use it to impose the boundary conditions. Differentiating with respect to x,
we get
u_x(x, t) = e^{−αk² t} (−a k sin kx + b k cos kx).
Imposing u x (0, t) = 0 yields b = 0 so the solutions will be of the form cos kx. Next,
imposing u x (L, t) = 0 implies that
ak sin kL = 0.
Once again, this will be satisfied when k = nπ/L, with n = 1, 2, 3, . . .. Now, however,
there is one extra possibility: we can also have k = 0. When the solutions were sines,
k = 0 was not important since sin(0) = 0. But now that the solutions are cosines, k = 0
is meaningful. Thus, in this case, the set of allowed values of k becomes
k_n = nπ/L, n = 0, 1, 2, 3, ....
The general solution will thus be a linear combination of the form
u(x, t) = Σ_{n=0}^∞ a_n e^{−α k_n² t} cos(k_n x).
Finally, we fix an from the initial conditions u(x, 0) = u0 (x). Again, we have exactly
the Fourier recipe:
a_n = (2/L) ∫_0^L u_0(x) cos(k_n x) dx,
which holds also for n = 0 (which is why we parametrize the coefficient as a0 /2).
To summarize, the solution of the Neumann problem
u_t = α u_xx, (3.56)
u_x(0, t) = 0, u_x(L, t) = 0,
u(x, 0) = u_0(x),
is
u(x, t) = a_0/2 + Σ_{n=1}^∞ a_n e^{−α k_n² t} cos(k_n x), (3.57)
where
a_n = (2/L) ∫_0^L u_0(x) cos(k_n x) dx. (3.58)
Figure 3.8: u(x, t) for the Neumann boundary condition.
For t → ∞ all terms with n ≥ 1 are exponentially damped, so u(x, t) → a_0/2 = (1/L) ∫_0^L u_0(x) dx, meaning the bar will tend to have a homogeneous temperature profile, given by the
average of the initial temperature distribution u0 (x). This happens because we insulated
the end-points, so that no heat can flow to the outside world. What the heat equation
predicts, therefore, is that the initial temperature profile u0 (x) will simply be distributed
uniformly through the bar.
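A quick way to play with the Neumann solution (3.57)–(3.58) is to compute the coefficients a_n by numerical integration and sum the series. Below is a minimal Python/SciPy sketch; the Gaussian-bump initial condition is my own choice, not from the text:

```python
import numpy as np
from scipy.integrate import quad

alpha, L = 1.0, 1.0
u0 = lambda x: np.exp(-20 * (x - 0.3) ** 2)   # arbitrary initial profile

def a_coef(n):
    """a_n = (2/L) * integral of u0(x) cos(k_n x), with k_n = n*pi/L."""
    kn = n * np.pi / L
    return (2 / L) * quad(lambda x: u0(x) * np.cos(kn * x), 0, L, limit=200)[0]

def u(x, t, nmax=50):
    """Partial sum of Eq. (3.57)."""
    res = a_coef(0) / 2 * np.ones_like(x)
    for n in range(1, nmax + 1):
        kn = n * np.pi / L
        res += a_coef(n) * np.exp(-alpha * kn**2 * t) * np.cos(kn * x)
    return res

x = np.linspace(0, L, 200)
print(u(x, 2.0).std())   # nearly zero: the profile flattens to the average of u0
```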
Conserved quantity
Let
Q(t) = ∫_0^L u(x, t) dx. (3.60)
In the case of particle diffusion, this represents the total number of particles in the
region [0, L]. In the case of heat, it represents the integrated temperature along the bar.
Now we start from the heat equation ut = αu xx and integrate from 0 to L:
∫_0^L u_t(x, t) dx = α ∫_0^L u_xx(x, t) dx.
The right-hand side is easy: α ∫_0^L u_xx(x, t) dx = α[u_x(L, t) − u_x(0, t)]. What I did here was to “remember” that u_xx is the derivative of u_x, and so the integral of a derivative must be the function evaluated at the end points. But now we can relate this to the heat current J(x, t) = −α u_x(x, t). We therefore see that
α ∫_0^L u_xx(x, t) dx = J(0, t) − J(L, t).
The left-hand side, on the other hand, is just dQ/dt. We therefore conclude that
dQ/dt = J(0, t) − J(L, t). (3.61)
This is a really nice result. It says that the net number of particles Q(t) in the system
can only change due to a flow of particles at the endpoints. Or that the net temperature
of the bar will only change if there is a flow of heat at the endpoints. To analyze the
consequences of this result, we look separately at the Neumann and Dirichlet cases.
Neumann BCs: In this case we explicitly set J(0, t) = J(L, t) = 0. Hence dQ/dt = 0,
so Q is a conserved quantity: Q(t) = Q(0) for all times.
We actually saw this in Fig. 3.8: the initial temperature profile simply spread out
through the system and eventually became uniform. What was not obvious from the
example was that this was indeed a conservation law. That is, that there was indeed a
certain quantity Q, which for Neumann problems do not change in time.
Dirichlet BCs: In this case J(0, t) and J(L, t) are not zero and so Q(t) is not a con-
served quantity. This is because we are keeping the end-points at a fixed temperature,
which can only be done if heat flows from the outside world. Let us take the exam-
ple analyzed in Fig. 3.7. The quantities Q(t), dQ/dt, J(0, t) and J(1, t) are plotted in
Fig. 3.9. What we see is that initially Q(t) changes with time. This is because we start
with a certain number of particles, Q(0) = ∫_0^L u_0(x) dx, which is allowed to change in
time. In this particular case, it decreases, but this is merely due to the choice of ini-
tial conditions. Notwithstanding, what actually matters is that as time progresses, Q(t)
tends to a constant, in general different from zero. Consequently, dQ/dt → 0. But hav-
ing dQ/dt = 0 does not mean that the currents J ss (0) and J ss (1) are themselves zero.
What it means, according to Eq. (3.61), is that they should be equal: J ss (0) = J ss (1).
This is in fact what we see in the left-most plot of Fig. 3.9.
In the Dirichlet case we must therefore distinguish between the transient and the
steady-state. During the transient Q(t) changes and so all 3 terms in Eq. (3.61) will
be non-zero. But in the steady-state the system will no longer change, so dQ/dt = 0,
meaning that J ss (0) = J ss (L). This type of state is called a non-equilibrium steady-
state (NESS). It is a steady-state because stuff is no longer changing. But it is not in
Figure 3.9: The quantities Q(t), dQ/dt, J(0, t) and J(1, t) for the same example as in Fig. 3.7.
equilibrium because there are still currents flowing. Conversely, if we happen to have
J ss (0) = J ss (L) ≡ 0, then we say the system is in an equilibrium steady-state (or simply
“equilibrium state” for short). This happens, for instance, if T 1 = T 2 .
Let us now go back to the Dirichlet problem,
u_t = α u_xx, (3.62)
u(0, t) = 0, u(L, t) = 0,
u(x, 0) = u_0(x),
or the Neumann problem,
u_t = α u_xx,
u_x(0, t) = 0, u_x(L, t) = 0,
u(x, 0) = u_0(x).
We are going to show that the solution u(x, t), in both cases, is unique. To do that, we
introduce the notion of “energy”. This is a concept that appears in many PDEs. In
the wave and Schrödinger equations, energy will actually be an “energy”, with a clear
physical interpretation. In the heat equation, not so much: energy for us will only be a
quantity with convenient properties.
Start with ut = αu xx , multiply both sides by u(x, t) and integrate from 0 to L:
∫_0^L u u_t dx = α ∫_0^L u u_xx dx.
Thus,
∫_0^L u u_t dx = (d/dt) ∫_0^L (u²/2) dx.
The term on the right, on the other hand, can be integrated by parts:
∫_0^L u u_xx dx = [u u_x]_0^L − ∫_0^L u_x² dx.
The boundary term vanishes, in the Dirichlet case because u(0, t) = u(L, t) = 0, and in the Neumann case because u_x(0, t) = u_x(L, t) = 0. Thus,
α ∫_0^L u u_xx dx = −α ∫_0^L u_x² dx.
Combining the two results then yields
(d/dt) ∫_0^L (u²/2) dx = −α ∫_0^L u_x² dx.
We now define the “energy”
E(t) = (1/2) ∫_0^L u² dx. (3.63)
The previous result then reads dE/dt = −α ∫_0^L u_x² dx ≤ 0; that is, the energy can only decrease (or stay constant) in time, so that E(t) ≤ E(0).
We now use this to show that the solution is unique. We do it for the Dirichlet prob-
lem (3.62). The reasoning for the Neumann problem is absolutely analogous. Suppose
u1 and u2 are two solutions of this problem and define v = u1 − u2 . We are going to
show that v(x, t) = 0 for all x, t, and so u1 = u2 .
By linearity, v will satisfy the Dirichlet problem
vt = αv xx , (3.66)
v(0, t) = 0 v(L, t) = 0,
v(x, 0) = 0.
Please take a second to convince yourself that this is true. The important part is the
initial condition, which now reads v(x, 0) = 0. The energy associated to v will thus be
E(t) = (1/2) ∫_0^L v(x, t)² dx ≤ E(0) = (1/2) ∫_0^L v(x, 0)² dx = 0.
But E(t), being the integral of a non-negative quantity, also satisfies E(t) ≥ 0. The only possibility is then E(t) = 0 for all t, which requires v(x, t) = 0. Hence u_1 = u_2, and the solution is unique.
I know this may seem a bit disappointing and I am pretty sure some of you may be
thinking “this guy is an idiot; he doesn’t know what he is talking about” (maybe you
were already thinking that before). But what you should keep in mind is the difference
between a mathematical blackbox and a physical blackbox. Embrace the former and
avoid the latter. I use the term blackbox here to mean a computer code that performs some operation, but whose inner workings you don't exactly know. Some of these
blackboxes may very well be open source (that is, they are not really black), but the
code inside can be so complicated, that it would take too much time to fully understand
the algorithm. So, for all intents and purposes, they behave like blackboxes.
A mathematical blackbox is a computer code that performs a well defined mathe-
matical operation. For instance, finding the roots of a polynomial, or the eigenvalues
of a matrix, or solving an ODE. Codes for this are available in all scientific computing
libraries, such as Mathematica, Scipy/Numpy, Matlab, Julia etc. Physical blackboxes,
on the other hand, solve some specific physical problem. For instance, a physical black-
box may solve Newton’s law for a bunch of particles. You input the mass and initial
coordinates, as well as the interaction potentials between the particles. The code then
integrates Newton’s 2nd law. My general advice is to avoid blackbox libraries of this
type. Build your own. The reason is simple: physical simulations always involve as-
sumptions and approximations. But physical blackboxes don’t make that clear, so you
never know exactly what you are doing. In other words, physical blackboxes make you
dumb. Mathematical blackboxes, on the other hand, are fine: they are performing well
defined operations, not prone to ambiguities.
Even though you don’t need to understand the details of a mathematical blackbox
implementation, you still need to understand the basic idea. This way you can know
when the code is performing well or not. Or when your specific problem may require
some additional ingredient. In this section we are going to discuss the basic ideas
behind the numerical solution of differential equations. PDEs are usually solved by
mapping them into ODEs. And, at the end of the day, both are solved by using discrete
versions of derivatives. So this is where we start.
Finite differences
Consider a function f (x). The derivative of f (x) is defined in a Calculus textbook
as
f′(x) = lim_{∆x→0} [f(x + ∆x) − f(x)]/∆x. (3.67)
In a computer there is no such thing as “∆x → 0”; we can choose ∆x to be small, but
it is always finite. The recipes on how to approximate derivatives, using a finite ∆x, go
by the name of finite differences.
The basic idea is to keep track of the error we make when we choose ∆x finite. We
can do that using a Taylor expansion2
f(x + ∆x) = f(x) + f′(x)∆x + (1/2) f′′(x)∆x² + (1/3!) f′′′(x)∆x³ + .... (3.68)
The naı̈ve definition of the derivative, which we call forward difference, is
∆_F[f(x)] := [f(x + ∆x) − f(x)]/∆x = f′(x) + (1/2) f′′(x)∆x. (3.69)
² Maybe you learned to do Taylor expansions around a fixed point x0:
f(x) = f(x0) + f′(x0)(x − x0) + (1/2) f′′(x0)(x − x0)² + ....
They are both the same thing. You can get (3.68) by replacing x → x + ∆x and x0 → x.
As can be seen, the error we make is proportional to ∆x. We sometimes write this using
a notation introduced by Landau:
∆_F[f(x)] = f′(x) + O(∆x). (3.70)
The “big-O notation” means the error we are making is of the order of ∆x. The
quantity in front, in this case f 00 /2, may be big or small. But when we say that the error
is of the order of ∆x, it means that if we halve ∆x, we halve the error.
We could also have defined a backward difference:
∆_B[f(x)] := [f(x) − f(x − ∆x)]/∆x. (3.71)
In the idealized world of calculus, this is entirely equivalent to (3.67) when ∆x is in-
finitesimal. But for finite ∆x it is not. To find the error, we do a Taylor expansion of
f (x − ∆x). This is exactly like Eq. (3.68), but with ∆x → −∆x:
f(x − ∆x) = f(x) − f′(x)∆x + (1/2) f′′(x)∆x² − (1/3!) f′′′(x)∆x³ + .... (3.72)
Thus, we see that
∆_B[f(x)] = f′(x) − (1/2) f′′(x)∆x.
The error is once again O(∆x). Although the prefactor differs a little, the order of the
error is the same.
With a bit of tweaking, however, one can come up with a much better differentiation
scheme, with error O(∆x)2 . This is done using centered differences:
∆_C[f(x)] = [f(x + ∆x) − f(x − ∆x)]/(2∆x). (3.73)
How do we know that this is good? We use the Taylor expansions (3.68) and (3.72).
Let me write them side by side, so it is easier for you to see what is happening:
f(x + ∆x) = f(x) + f′(x)∆x + (1/2) f′′(x)∆x² + (1/3!) f′′′(x)∆x³ + ...,
f(x − ∆x) = f(x) − f′(x)∆x + (1/2) f′′(x)∆x² − (1/3!) f′′′(x)∆x³ + ....
If we now subtract the two, the terms proportional to f′′ will cancel and we will be left with
f(x + ∆x) − f(x − ∆x) = 2 f′(x)∆x + (2/3!) f′′′(x)∆x³.
Thus, we see that the centered difference has an error
∆_C[f(x)] = f′(x) + (1/3!) f′′′(x)∆x², (3.74)
which is O(∆x)2 , one order higher than the forward and backward differences. Now if
we halve ∆x, we improve the precision by a factor of 4. This is quite dramatic.
What about f 00 (x)? The naive approach would be to apply the forward difference
twice. I will spare you the details of these calculations, as they start to get clumsy real
quick. But if you do it, either by hand or using Mathematica, you will find
∆_F[∆_F[f(x)]] = f′′(x) + f′′′(x)∆x.
The error is thus O(∆x). We can definitely do better. One alternative would be to apply
the centered difference twice:
∆_C[∆_C[f(x)]] = [f(x + 2∆x) − 2 f(x) + f(x − 2∆x)]/(4∆x²)
 = f′′(x) + (1/3) f^{(iv)}(x)∆x².
The error is O(∆x)2 , which is great. But this now introduces another detail, which we
haven’t looked at so far: This derivative does not use f (x ± ∆x), but further away points
f (x ± 2∆x). We are sort of drifting away fast from the center point x.
It turns out we can get the best of both worlds. We can get an error O(∆x)2 and
still keep the derivative compact, involving only x and x ± ∆x. The trick is to do one
forward and one backward (or vice-versa):
∆_F[∆_B[f(x)]] = ∆_F[(f(x) − f(x − ∆x))/∆x]
 = { [f(x + ∆x) − f(x)]/∆x − [f(x) − f(x − ∆x)]/∆x } / ∆x. (3.75)
Simplifying the formula, we then find
∆_F[∆_B[f(x)]] = [f(x + ∆x) − 2 f(x) + f(x − ∆x)]/∆x² (3.76)
 = f′′(x) + (f^{(iv)}(x)/12) ∆x²,
where in the last line I used the Taylor expansions of f (x ± ∆x). This is a very famous
formula for the second derivative of a function: left plus right minus twice the center.
And, as we can see, the error is O(∆x)2 and it involves only nearest neighbor points.
Finite differences
A good approximation for the first and second derivatives of a function f (x)
are
f′(x) ≈ [f(x + ∆x) − f(x − ∆x)]/(2∆x), (3.77)
f′′(x) ≈ [f(x + ∆x) − 2 f(x) + f(x − ∆x)]/∆x². (3.78)
Both have errors O(∆x)2 .
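You can check these error scalings yourself. The following sketch (my own, using f(x) = sin x as a test function) prints the errors of the forward, centered, and second-derivative formulas as ∆x is halved:

```python
import numpy as np

f   = np.sin                      # test function
df  = np.cos                      # exact first derivative
d2f = lambda x: -np.sin(x)        # exact second derivative

x0 = 0.7
for dx in (0.1, 0.05, 0.025):
    fwd = (f(x0 + dx) - f(x0)) / dx
    cen = (f(x0 + dx) - f(x0 - dx)) / (2 * dx)
    sec = (f(x0 + dx) - 2 * f(x0) + f(x0 - dx)) / dx**2
    print(dx,
          abs(fwd - df(x0)),     # shrinks roughly linearly in dx
          abs(cen - df(x0)),     # shrinks roughly quadratically in dx
          abs(sec - d2f(x0)))    # shrinks roughly quadratically in dx
```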
Systems of ODEs
Consider a 1st order ODE of the form
dy/dt = g(t, y), (3.79)
where g is an arbitrary function of both t and y (it can be arbitrarily non-linear). It
turns out that most ODEs, even those of order higher than 1, can always be written as
a system of ODEs of the form (3.79). For instance, consider the ODE
ÿ + 2γẏ + ω2 y = f (t),
that was studied in detail last chapter. Define r = ẏ. Then ÿ = ṙ and we can write this
as the system
ẏ = r,
ṙ = −2γr − ω²y + f(t),
which has exactly the form (3.79), with y now viewed as the vector (y, r). To solve (3.79) numerically, we discretize time into steps of size ∆t,
t_n = n∆t, n = 0, 1, 2, 3, ....
We also denote by yn = y(tn ), the variable y at the discrete times. Then the discrete
version of (3.79) would read
(y_{n+1} − y_n)/∆t = g(t_n, y_n).
This can be used to get an update rule, telling us how to build y_{n+1} from y_n:
y_{n+1} = y_n + g(t_n, y_n) ∆t,
which is known as Euler's method.
This method sucks, however. You should never use it. The problem is that we are
using the forward difference, which has a very low precision O(∆t). If you want a much
much better method, with almost minimal effort, use the midpoint method. Here is the
idea. Instead of looking only at times tn and tn+1 = tn + ∆t, we focus on the midpoint,
t_{n+1/2} = t_n + ∆t/2. Of course, that by itself would not give higher precision, because if we simply halve the interval, the error would be ∆t/2, which is still linear in ∆t. Instead, the idea is to use a centered difference (3.73) on this midpoint, which we
know has an error O(∆t2 ):
ẏ(t_{n+1/2}) ≈ (y_{n+1} − y_n)/∆t. (3.82)
The denominator is ∆t, instead of 2∆t, because in this case the step is ∆t/2. The
ODE (3.79) establishes that ẏ = g(t, y). Thus, the left-hand side of (3.82) must be
ẏ(t_{n+1/2}) = g(t_{n+1/2}, y_{n+1/2}), so that Eq. (3.82) gives
y_{n+1} = y_n + g(t_{n+1/2}, y_{n+1/2}) ∆t. (3.83)
This is not the end of the story, because we still need to know y(tn+1/2 ). We can
estimate it using a Taylor series expansion:
y_{n+1/2} ≈ y(t_n) + ẏ(t_n) ∆t/2 = y_n + g(t_n, y_n) ∆t/2.
This is the missing piece to finish the algorithm. Feeding this in the right-hand side
of (3.83) then leads to
y_{n+1} = y_n + g(t_{n+1/2}, y_n + g(t_n, y_n)∆t/2) ∆t.
Midpoint method
A simple but good method for solving the ODE (3.79) is the midpoint algo-
rithm, based on the following update rule:
y_{n+1} = y_n + g(t_{n+1/2}, y_n + g(t_n, y_n)∆t/2) ∆t,
which shows how to update directly from y_n to y_{n+1}.
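Here is a minimal implementation of this update rule (a sketch of my own; the damped-oscillator test problem is just an illustration, not from the text):

```python
import numpy as np

def midpoint_solve(g, y0, t0, tf, dt):
    """Integrate dy/dt = g(t, y) with the midpoint rule:
    y_{n+1} = y_n + g(t_n + dt/2, y_n + g(t_n, y_n)*dt/2) * dt."""
    ts = np.arange(t0, tf + dt, dt)
    ys = [np.asarray(y0, dtype=float)]
    for t in ts[:-1]:
        y = ys[-1]
        y_half = y + g(t, y) * dt / 2
        ys.append(y + g(t + dt / 2, y_half) * dt)
    return ts, np.array(ys)

# damped oscillator written as a first-order system: y = (position, velocity)
gamma, omega = 0.1, 2.0
g = lambda t, y: np.array([y[1], -2 * gamma * y[1] - omega**2 * y[0]])
ts, ys = midpoint_solve(g, [1.0, 0.0], 0.0, 10.0, 0.01)
print(ys[-1])
```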
The midpoint method serves to illustrate what is one of the basic principles in the
numerical solution of ODEs: To update from yn to yn+1 , we first find intermediate
points, at times t ∈ [tn , tn+1 ] where to evaluate the function (in this case the point
tn+1/2 ). The update then combines yn with function evaluations at these intermediate
points to update to yn+1 . The relative contribution of each midpoint, however, comes
with different weights, for instance in (3.84) there is a factor of 1/2 and in (3.85) there is
none. This is due to the clever use of finite differences to approximate the derivatives.
Ultimately, the factors are carefully chosen to build an algorithm whose error scales much better with ∆t than the simple Euler step.
This introduces two ideas: the algorithm’s order p and the number of stages s.
An algorithm of order p has an accumulated error that scales as O(∆t) p . Stages, on
the other hand, refer to the number of intermediate function evaluations required for
updating yn to yn+1 . The midpoint algorithm we just presented is therefore of order
p = 2 and also has s = 2 stages. It can be shown that for p = 1, 2, 3, 4, one must always have s ≥ p. That is, if you want an algorithm of order p = 4, you need at least
s = 4 stages. For p = 5, 6, 7, . . ., things become trickier and it is not known whether
the bound s = p can ever be reached. For instance, all currently known methods of
order p = 8 have at least s = 11 stages.
A very popular 4th order method is the Runge-Kutta 4 (RK4). (It would be
funny to have a 7th order method invented by scientists with initials C and R).
It reads
y_{n+1} = y_n + (k_1 + 2k_2 + 2k_3 + k_4) ∆t/6, (3.87)
where
k_1 = g(t_n, y_n), (3.88)
k_2 = g(t_{n+1/2}, y_n + k_1 ∆t/2), (3.89)
k_3 = g(t_{n+1/2}, y_n + k_2 ∆t/2), (3.90)
k_4 = g(t_{n+1}, y_n + k_3 ∆t). (3.91)
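For concreteness, here is a bare-bones RK4 stepper following the four stages above (a sketch of my own, applied to the same ODE as in Fig. 3.10):

```python
import numpy as np

def rk4_step(g, t, y, dt):
    """One RK4 step for dy/dt = g(t, y)."""
    k1 = g(t, y)
    k2 = g(t + dt / 2, y + k1 * dt / 2)
    k3 = g(t + dt / 2, y + k2 * dt / 2)
    k4 = g(t + dt, y + k3 * dt)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) * dt / 6

g = lambda t, y: y * np.cos(t + y)   # same ODE as in Fig. 3.10
t, y, dt = 0.0, 0.2, 0.05
while t < 30.0:
    y = rk4_step(g, t, y, dt)
    t += dt
print(y)
```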
For some people, RK4 is the final word in ODE solving. But modern algorithms
show that one can do better. Much better.
Figure 3.10: Red: numerical solution, using Mathematica's NDSolve, of the ODE y′ = y cos(x + y), starting at y(0) = 0.2. Black: the step size taken at each step of the numerical solution, which uses adaptive step sizes.
The first major feature of modern algorithms is the use of adaptive step sizes: in some regions the solution barely changes, while in other regions it may change extremely fast. A good algorithm should be able to adapt and
take big step sizes when nothing is happening and small step sizes when the action
really starts. This is not so trivial, because it requires some way of keeping track of
the errors that are being made. But practically all numerical libraries use adaptive step
sizes nowadays. An example is shown in Fig. 3.10. The solution is plotted in red, while
in black I show the size of the step taken at each point. As can be seen, in many regions
the step taken can be very small. And in others very big.
The second major feature of modern algorithms is “multistep”. The RK4 algo-
rithm uses the function evaluated at tn and tn+1/2 to update yn+1 . But then, in the next
step, we don’t use these points again to update from yn+1 to yn+2 . Multistep methods
recycle previous calculations. It helps to think in terms of cost; i.e., computer time.
The costly part of an ODE solver is the evaluation of g(t, y). A method with s stages
will therefore evaluate it s times for each step. Modern algorithms may very well use
s = 10, so that can be a lot. It therefore becomes essential that we recycle evaluations
of g(t, y) from previous steps in the next one. This is the idea behind multistep meth-
ods. I will not write down the explicit algorithm, as they can get a bit ugly. But if you
want, you can check them out here. There are two main multistep methods, Adams
and BDF (backward differentiation formulas, not BTS!). Both methods have varying
orders, but try to use previous steps in future calculations.
The difference between Adams and BDF is that the latter handles stiffness. This
term, although widely used in the literature, does not have a precise definition. But loosely speaking, a stiff equation is one for which very small ∆t's are essential.
What I mean is the following. In some systems, big ∆t’s simply mean low precision,
but the general shape of the solution remains the same. But in other systems, big ∆t’s
can lead to completely crazy behavior (like the function diverging, for instance), so
very small ∆t's need to be used. These systems are said to be stiff, and BDF methods are better at handling them (for somewhat technical reasons that are better not to get
into). Mathematica and Scipy/Numpy use Adams and BDF side by side. Adams is
faster, but BDF is better for stiff systems. Most solvers are in fact smart: they can
detect stiffness and automatically switch from Adams to BDF.
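In Scipy, for instance, the choice of method is just an argument of solve_ivp. The toy stiff equation below is my own choice, meant only to illustrate how a stiff-aware method gets away with far fewer right-hand-side evaluations:

```python
import numpy as np
from scipy.integrate import solve_ivp

# a classic stiff linear test problem (my own toy choice)
g = lambda t, y: -1000.0 * (y - np.cos(t)) - np.sin(t)

for method in ("RK45", "LSODA", "BDF"):
    sol = solve_ivp(g, (0, 2), [0.0], method=method, rtol=1e-6, atol=1e-9)
    # nfev = number of evaluations of g; stiff-aware methods need far fewer
    print(method, sol.nfev, sol.y[0, -1])
```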
To solve PDEs, convert them into ODEs
Let us now talk about PDEs. To be concrete, I will choose the heat equation with
Dirichlet boundary condition as an example:
u_t = α u_xx, (3.92)
u(0, t) = 0, u(L, t) = 0,
u(x, 0) = u_0(x).
There are many libraries for solving PDEs of this form. They use either one of two
methods: finite differences or finite elements. Both methods convert (3.92) into a
system of coupled ODEs. Finite differences accomplish that by discretizing the deriva-
tives, using the Taylor formulas we developed earlier. Finite elements, on the other
hand, expand u in a certain basis of functions, whose properties are well known. Finite
elements is generally superior. But it is also more sophisticated and takes some effort
to implement. For this reason, I will only talk about finite differences here.
We start by discretizing space into steps of some small number ∆x:
x j = j∆x, j = 0, 1, 2, . . . , N,
where N = L/∆x is a large integer defining the number of points in the spatial grid.
Each grid position x_j will have associated to it a temperature u_j(t) := u(x_j, t).
In Eq. (3.92) we have a 2nd derivative, so it is convenient to use Eq. (3.78) to discretize
it
u_xx = (u_{j+1} − 2u_j + u_{j−1})/∆x²,
which we know is O(∆x)2 . This converts the heat equation into a system of coupled
ODEs:
u̇_j = (α/∆x²)(u_{j+1} − 2u_j + u_{j−1}), (3.93)
where I wrote u̇ j for ut (x j , t). A nice feature of this system is that the right-hand side is
linear. We therefore have in our hands a system of linear coupled ODEs, which can be
solved quite easily. Other PDEs, like the Navier-Stokes equation of fluid mechanics,
will in general be non-linear.
In principle there are N + 1 coupled equations in (3.93), for u0 , u1 , . . . , uN . But
we also need to treat the boundary conditions. They read u(0, t) = u0 (t) = 0 and
u(L, t) = uN (t) = 0 where, recall, N was defined so that L = N∆x. Thus, out of the
N +1 points x0 , x1 , . . . , xN in our grid, the two on the boundaries are trivially fixed. That
is, we have in practice only a system of N − 1 coupled ODEs. Moreover, for this same
reason, we have to be a bit careful when we use Eq. (3.93) with j = 1 or j = N − 1. In
the former, since u0 = 0, we should have
u̇_1 = (α/∆x²)(u_2 − 2u_1),
Figure 3.11: Numerical solution of the Dirichlet problem (3.92) with u0 (x) = x(1 − x), for t =
0.05, 0.1 and 0.3. Left: The solid lines were obtained using Mathematica’s built-in
NDSolve, while the dots refer to the discretized system of ODEs (3.93), with the
interval [0, 1] discretized with N = 20 points. Right: the difference between the
two solutions.
and for j = N − 1,
u̇_{N−1} = (α/∆x²)(−2u_{N−1} + u_{N−2}).
To be compact, we usually write only Eq. (3.93), but with the additional caveat that
u0 = uN = 0.
Now that we have converted this into a system of ODEs, we can use the algorithms
discussed previously to solve them. In libraries such as Mathematica, a PDE such as
the heat equation can be solved automatically using NDSolve[]. That is, you don’t
need to do this type of discretization. Mathematica will do it for you, under the hood.
But knowing that this is what is happening is useful. For instance, particularly in higher
dimensions, handling less conventional boundary conditions can be hard. That is, if
you want to solve for instance the heat equation for a bar that does not have a simple
shape.
An example of how all this works is shown in Fig. 3.11. See also the accom-
panying Mathematica notebook. What I did was to solve numerically the Dirichlet
problem (3.92) with u0 (x) = x(1 − x). The solid lines are obtained using NDSolve[]
directly. The points, on the other hand, refer to the solution of the discretized sys-
tem (3.93), with N = 20 points. As can be seen, even such a small discretization grid, of
N = 20, is already enough to produce a decent answer. The error between the two is
shown in the right-hand plot and is around 10−4 for initial times.
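If you want to reproduce this kind of calculation without Mathematica, here is a sketch of the same discretization (3.93) in Python, using scipy's solve_ivp as the ODE blackbox (N = 20 and u0(x) = x(1 − x), as in the example):

```python
import numpy as np
from scipy.integrate import solve_ivp

alpha, L, N = 1.0, 1.0, 20
dx = L / N
x_in = np.linspace(dx, L - dx, N - 1)        # interior points x_1 ... x_{N-1}

def rhs(t, u):
    """Right-hand side of Eq. (3.93), with u_0 = u_N = 0 at the boundaries."""
    full = np.concatenate(([0.0], u, [0.0]))
    return alpha * (full[2:] - 2 * full[1:-1] + full[:-2]) / dx**2

u0 = x_in * (1 - x_in)                       # initial condition u0(x) = x(1 - x)
sol = solve_ivp(rhs, (0, 0.3), u0, t_eval=[0.05, 0.1, 0.3])
print(sol.y[:, -1])                          # temperature profile at t = 0.3
```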
Let us now go back to the heat equation, this time written in arbitrary spatial dimension,
∂u/∂t = α ∇²u.
As we have seen in the previous section, the solutions of this equation tend to relax,
in the long-time limit, to a steady-state, which is time-independent. This steady-state
must therefore satisfy ∂u/∂t = 0. Or, what is equivalent, Laplace’s equation
∇2 u = 0, (3.94)
subject to the same boundary conditions as the time-dependent problem (but no ini-
tial conditions to worry about). Laplace’s equation therefore describes the steady-state
temperature profile of a material, in any dimension. It also appears in electromag-
netism, with u being now the electrostatic potential.
We already solved Laplace’s equation in 1D and the result was not a lot of fun. In
that case the equation reduced to u xx = 0, so the solution had to be of the form
u(x) = c1 + c2 x,
for constants c1 , c2 . In 2D and 3D, on the other hand, the set of allowed solutions turns
out to be much richer. In this section we are going to study Laplace’s equation in 2D:
∇2 u = u xx + uyy = 0. (3.95)
The first thing to consider in this case, is that the geometry of the system becomes
important. We are going to focus on rectangular systems. That is, some piece of
material with sides L x and Ly , as depicted in Fig. 3.12. This choice really makes a
difference. We could’ve, for instance, solved it for a disk, or some other irregular
shape. The mathematical structure of the solutions would be completely different.
Once the geometry is fixed, the next step is to specify the boundary conditions. For
the rectangle in Fig. 3.12, we have 4 boundaries to specify: bottom, top, left, right. And
the BCs can be either Dirichlet or Neumann. Dirichlet conditions, as before, specify
the value of the function at the boundaries. Something like
u(x, 0) = f_0(x), u(x, L_y) = f_1(x), u(0, y) = g_0(y), u(L_x, y) = g_1(y).
The temperature does not have to be constant at the boundaries. Thus, at the bottom and
top walls, the BC can be a function of x, and at the left and right walls, it can be a
function of y.
We can also work with Neumann conditions, where instead of specifying the value
of u at the boundaries, we specify its derivative. But now things get trickier because we
have two variables, so which derivative do we specify? Recall that Neumann conditions
are meant to specify the heat flux J = −α∇u. Thus, Neumann conditions usually focus
on the component of the current that is normal to the surface. At the bottom and top
walls, the relevant normal component is uy and at the left and right walls the relevant
Figure 3.12: Rectangular geometry we are going to use to study Laplace’s equation (3.95).
To organize the problem, let u_b(x, y) denote the solution of the Dirichlet problem with
u_b(x, 0) = f_0(x), u_b(x, L_y) = 0, u_b(0, y) = u_b(L_x, y) = 0.
That is, with the bottom wall kept at f_0(x) and the other 3 walls at zero. Similarly, let u_t(x, y) denote the solution of the Dirichlet problem with the top wall being non-trivial and the other 3 kept at zero:
u_t(x, 0) = 0, u_t(x, L_y) = f_1(x), u_t(0, y) = u_t(L_x, y) = 0.
Now consider the sum of the two, u = u_b + u_t. This satisfies Laplace's equation, due to linearity. Moreover, it satisfies the Dirichlet boundary conditions
u(x, 0) = f_0(x), u(x, L_y) = f_1(x), u(0, y) = u(L_x, y) = 0.
Moral of the story: if you want to solve the Dirichlet problem with each wall kept
in a different configuration, first separately solve the problems in which one wall is
non-trivial and all others are kept at zero. Then add up the result.
Separation of variables
The starting point to solve any such problems is, once again, the method of sep-
aration of variables. That is, we look for solutions of the form u(x, y) = X(x)Y(y).
Plugging this in (3.95) then yields
X′′/X = −Y′′/Y = a, (3.104)
where a is a constant, since the left-hand side depends only on x and the right-hand
side only on y. Thus, we find two equations:
X′′ = aX, (3.105)
Y′′ = −aY. (3.106)
I am going to be quite careful about the nature of this constant a. What we know is that
a must be real, since we are interested in real solutions. But it can be either positive or
negative. The goal is to figure out precisely what values of a lead to valid solutions. The
fact that the signs of Eqs. (3.105) and√(3.106) must
√ be different has deep consequences:
√ of X = aX are either e or e− ax , while the solutions of Y 00 = −aY are
00 ax
the√ solutions
ei ay or e−i ay . Thus, the solutions will always be either real or complex exponentials,
depending on whether a itself is positive or negative: if a > 0 the X solution will be
a real exponential and the Y solution will be oscillatory. But if a < 0 then the roles
are inverted. This is the new feature which makes the 2D Laplace equation richer than
its 1D counterpart: now we can construct non-trivial solutions which oscillate in one
direction, but are damped in another. We will see exactly how this unfolds, by looking
at a specific example.
Let us then solve the problem where only the bottom wall is non-trivial:
u_xx + u_yy = 0, (3.107)
u(x, 0) = f_0(x), u(x, L_y) = 0,
u(0, y) = u(L_x, y) = 0.
We start looking at Eq. (3.105) for X(x). Since u(0, y) = u(L x , y) = 0, this equation
must be subject to X(0) = X(L_x) = 0. The X problem thus reduces to
X′′ = aX, X(0) = X(L_x) = 0.
This is the same problem found in the 1D heat equation. The solution demands some-
thing that goes up and then goes down. So it cannot be a real exponential, since ex-
ponentials are either monotonically increasing or decreasing. The solution must thus
be oscillatory. That is, a < 0. In fact, we already know from our work on the heat
equation that the solutions only exist provided
a = −(nπ/L_x)² := −k_n², n = 1, 2, 3, .... (3.110)
and they are of the form X(x) = sin(kn x).
Next we turn to Eq. (3.106) for Y(y), which is now modified to
Y′′ = k_n² Y.
Hence, the Y solutions must be real exponentials, either ekn y or e−kn y . We assume linear
combinations of the form
Y_n(y) = a_n e^{k_n y} + b_n e^{−k_n y},
where an and bn are constants that will be used to impose the remaining boundary
conditions, u(x, 0) = f_0(x) and u(x, L_y) = 0. The latter, in particular, forces a_n e^{k_n L_y} = −b_n e^{−k_n L_y}, so the solution must have the form
Y_n(y) = a_n (e^{k_n y} − e^{−k_n y} e^{2k_n L_y}).
We can make things a bit more symmetrical if we define an = An e−kn Ly /2, which we
can do, of course, since these are just constants anyway. This allows us to write the
solution as
Y_n(y) = (A_n/2)(e^{k_n(y−L_y)} − e^{−k_n(y−L_y)}) = A_n sinh(k_n(y − L_y)).
Combining everything, the general solution will then have the form
u(x, y) = Σ_{n=1}^∞ A_n sinh(k_n(L_y − y)) sin(k_n x).
The factor sinh(kn Ly ) is a bit ugly, but it is just a constant which can be absorbed into
An . It would, in fact, be more convenient if we relabel An → An / sinh(kn Ly ). That is,
write the solution as
u(x, y) = Σ_{n=1}^∞ A_n [sinh(k_n(L_y − y)) / sinh(k_n L_y)] sin(k_n x).
We have the freedom to do this because A_n is a constant. And the only reason we do it like this is because now the boundary condition at the bottom wall reads
u(x, 0) = Σ_{n=1}^∞ A_n sin(k_n x) = f_0(x),
which is nothing but a Fourier sine series for f_0(x).
Figure 3.13: Example of the solution (3.114) for u(x, 0) = x(x² − 3x + 2), with L_x = 1, L_y = 1/2.
To summarize, the solution of the Dirichlet problem
u_xx + u_yy = 0, (3.111)
u(x, 0) = f_0(x), u(x, L_y) = 0,
u(0, y) = u(L_x, y) = 0,
is
u(x, y) = Σ_{n=1}^∞ A_n [sinh(k_n(L_y − y)) / sinh(k_n L_y)] sin(k_n x), k_n = nπ/L_x, (3.114)
where
A_n = (2/L_x) ∫_0^{L_x} f_0(x) sin(k_n x) dx. (3.115)
Example: suppose f_0(x) = x(x² − 3x + 2) with L_x = 1 and L_y = 1/2. Then Eq. (3.115) yields A_n = 12/k_n³. The solution is shown in Fig. 3.13. As can be seen, since the other
3 walls (left, right, top) are all kept at 0, the temperature profile of the bottom wall is
attenuated and damped out toward the middle of the slab.
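The partial sums of (3.114) are straightforward to evaluate numerically. A minimal sketch (my own) for this particular example, with A_n = 12/k_n³, would be:

```python
import numpy as np

Lx, Ly, nmax = 1.0, 0.5, 60

def u(x, y):
    """Partial sum of Eq. (3.114) with A_n = 12/k_n^3, i.e. f0(x) = x(x^2 - 3x + 2)."""
    total = np.zeros_like(x, dtype=float)
    for n in range(1, nmax + 1):
        kn = n * np.pi / Lx
        An = 12 / kn**3
        total += An * np.sinh(kn * (Ly - y)) / np.sinh(kn * Ly) * np.sin(kn * x)
    return total

x = np.linspace(0, Lx, 101)
# the bottom-wall profile is progressively attenuated as y increases
print(u(x, 0.0)[50], u(x, 0.25)[50], u(x, Ly)[50])
```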
Semi-infinite slab
A particular case of the above solution is when the slab of material is semi-infinite.
That is, when Ly → ∞. Recall that sinh(x) = (e x − e−x )/2. When x is large, the 2nd
exponential becomes negligible and we can thus approximate sinh(x) ' e x /2. Hence,
in the limit where L_y is large, the hyperbolic sines in Eq. (3.114) become
sinh(k_n(L_y − y)) / sinh(k_n L_y) ≈ e^{k_n(L_y − y)} / e^{k_n L_y} = e^{−k_n y}.
Thus, in this case the general solution (3.114) is simplified to
u(x, y) = Σ_{n=1}^∞ A_n e^{−k_n y} sin(k_n x), k_n = nπ/L_x, (3.116)
whereas the expression for A_n remains unchanged [Eq. (3.115)]. This result nicely illustrates how we have oscillations in one direction and exponential damping in the other. This is the cool mechanism that allows us to find such a rich set of solutions to
Laplace’s equation in 2D.
For the record, the Fourier coefficients for each of the four single-wall problems are obtained from the corresponding wall function. For the bottom wall (f_0):
A_n = (2/L_x) ∫_0^{L_x} f_0(x) sin(k_n x) dx.
For the top wall (f_1):
A_n = (2/L_x) ∫_0^{L_x} f_1(x) sin(k_n x) dx.
For the left wall (g_0):
A_n = (2/L_y) ∫_0^{L_y} g_0(y) sin(k_n y) dy.
Figure 3.14: Solution of BVP0000(f_0, f_1, g_0, g_1) by adding up the solutions for each wall. Each wall is taken to be a boxcar profile, centered around the middle of the slab, of width 0.3, and of temperature 1 for bottom and top and 2 for left and right. The big plot on the right is what happens when we sum the solutions. I also set L_x = 1 and L_y = 1.5.
And for the right wall (g_1):
A_n = (2/L_y) ∫_0^{L_y} g_1(y) sin(k_n y) dy.
Notice that for the bottom and top walls, the quantization values of the kn depend on L x ,
while in the left and right walls it depends on Ly . An example of how to add up these
solutions to consider more complicated boundary conditions is shown in Fig. 3.14.
Insulated side walls: suppose now that the left and right walls are insulated, i.e. subject to the Neumann conditions
u_x(0, y) = u_x(L_x, y) = 0.
This means heat can only flow through the bottom or top walls. To obtain the full
solution, we of course still need to specify two more boundary conditions, at bottom
and top. But let us see what general conclusions we can draw by assuming only that
the left and right walls are insulated.
The X part continues to satisfy X′′ = aX, but is now subject to Neumann conditions X′(0) = X′(L_x) = 0. An identical problem was studied in the 1D heat equation. The
fact that the derivative is zero at the border can only happen for oscillatory solutions
(that is, a < 0). Hence, we redefine a := −k². The solutions will then be X(x) = a cos(kx) + b sin(kx). But X′(0) = bk = 0, so we must have b = 0, meaning the solution in this case has to contain only cosines, X(x) = a cos(kx). Next we impose X′(L_x) = −ak sin(kL_x) = 0, which means we must have
k_n = nπ/L_x, n = 0, 1, 2, 3, ....
In this case n = 0 is also included, since it leads to a non-trivial solution X0 (x) = const.
The solutions are thus either X0 (x) = const or Xn (x) = cos(kn x).
Next we turn to Y′′ = −aY = k_n² Y. Now something quite interesting happens: we need to separately treat n = 0 and n ≠ 0. The reason is that, when n = 0, the ODE for Y becomes Y′′ = 0, whose solution is Y_0(y) = c_1 + c_2 y. This is fundamentally different from the n ≠ 0 case, where the ODE is Y′′ = k_n² Y and thus admits real exponentials a_n e^{k_n y} + b_n e^{−k_n y}. Thus, summarizing, the solutions are X_0(x)Y_0(y) = c_1 + c_2 y for n = 0, and X_n(x)Y_n(y) = (a_n e^{k_n y} + b_n e^{−k_n y}) cos(k_n x) for n ≥ 1. And the general solution will have to be a linear combination of these solutions.
The general solution of
u_xx + u_yy = 0, (3.121)
u_x(0, y) = u_x(L_x, y) = 0, (3.122)
is therefore
u(x, y) = c_1 + c_2 y + Σ_{n=1}^∞ (a_n e^{k_n y} + b_n e^{−k_n y}) cos(k_n x), k_n = nπ/L_x. (3.123)
Dirichlet on bottom and top: suppose the bottom and top walls are fixed at u(x, 0) =
T 1 and u(x, Ly ) = T 2 . Imposing this in Eq. (3.123) yields
u(x, 0) = c_1 + Σ_{n=1}^∞ (a_n + b_n) cos(k_n x) = T_1, (3.124)
u(x, L_y) = c_1 + c_2 L_y + Σ_{n=1}^∞ (a_n e^{k_n L_y} + b_n e^{−k_n L_y}) cos(k_n x) = T_2. (3.125)
Since T_1 and T_2 are constants, all the cosine coefficients must vanish, a_n = b_n = 0, leaving c_1 = T_1 and c_2 = (T_2 − T_1)/L_y. The solution is thus simply
u(x, y) = T_1 + (T_2 − T_1) y / L_y.
Dirichlet on bottom and top (inhomogeneous): The last example is a very special
case, where the dependence on x vanishes entirely. Let us consider a more interesting
Dirichlet example. Suppose that u(x, 0) = 0 but u(x, Ly ) = f1 (x). Imposing u(x, 0) = 0
in Eq. (3.123) leads to
u(x, 0) = c_1 + Σ_{n=1}^∞ (a_n + b_n) cos(k_n x) = 0,
which implies that c1 = 0 and bn = −an . Thus, the solution must have the form
u(x, y) = c_2 y + Σ_{n=1}^∞ a_n sinh(k_n y) cos(k_n x), (3.126)
where I relabeled an → an /2 to make the sinh appear. Next we impose u(x, Ly ) = f1 (x).
This means
u(x, L_y) = c_2 L_y + Σ_{n=1}^∞ a_n sinh(k_n L_y) cos(k_n x) = f_1(x).
This is nothing but a Fourier cosine series for f_1(x). Recall the Fourier recipe:
a_0/2 + Σ_{n=1}^∞ a_n cos(k_n x) = f(x)  ⟹  a_n = (2/L_x) ∫_0^{L_x} f(x) cos(k_n x) dx.
Thus,
c_2 L_y = (1/L_x) ∫_0^{L_x} f_1(x) dx,
Figure 3.15: Example of the solution (3.126), with the left and right walls insulated, with
u(x, 0) = 0 and u(x, Ly ) given by Eq. (3.127).
and
a_n sinh(k_n L_y) = (2/L_x) ∫_0^{L_x} f_1(x) cos(k_n x) dx.
We now move on to the wave equation, which appears in many different physical contexts, for instance:
• Electromagnetic waves;
• Vibrations of a string;
• Vibrations of a solid.
The solutions of the wave equation itself will be explored in the next section. First, let us see how the equation arises in each of these contexts.
Electromagnetic waves
Maxwell's equations for the electric and magnetic fields, E and B, in the absence of charges or currents, read
∇·E = 0, (Gauss) (3.128)
∇×E = −∂B/∂t, (Faraday) (3.129)
∇·B = 0, (No magnetic monopoles) (3.130)
∇×B = (1/c²) ∂E/∂t, (Ampère–Maxwell) (3.131)
where c = 1/√(µ₀ε₀), and µ₀ and ε₀ are the permeability and permittivity of free space, respectively. In the presence of charges or currents, we would have to modify Eqs. (3.128) and (3.131), respectively.
The following derivation is a bit basic, and you may have seen it before. But I think
it is one of those things we should all do at least once in our lives, so I put it here for
completeness. We start by taking the curl of Eq. (3.129):
∇×(∇×E) = ∇×(−∂B/∂t) = −(∂/∂t)(∇×B) = −(1/c²) ∂²E/∂t².
We now use the general vector identity
∇×(∇×E) = ∇(∇·E) − ∇²E,
where ∇²E is the vector whose i-th component is ∇²E_i. But because of (3.128), the first term vanishes and we are thus left only with
−∇²E = −(1/c²) ∂²E/∂t².
Thus, we conclude that each component of the electric field satisfies the wave equation:
(1/c²) ∂²E/∂t² − ∇²E = 0. (3.132)
The same is also true for the magnetic field. You can check it by taking instead the curl
of Eq. (3.131), and then repeating the same procedure.
We therefore conclude that each component of the electric and magnetic fields sat-
isfy the wave equation.
Vibrating string
Next we consider a vibrating string. Imagine a guitar string in one dimension,
along direction x. It stays perfectly flat if we don't play it, and vibrates in the direction u perpendicular to x when we do (Fig. 3.16). The motion of the string can thus be
described by the function u(x, t), which says how much the string is pushed away from
[Figure 3.16: a string displaced by u(x, t) in the direction perpendicular to x.]
[Figure 3.17: force diagram for a small piece of string, with tensions T_1 and T_2 pulling at angles θ_1 and θ_2.]
equilibrium at point x in space and time t. We can prove that if the displacement is
small, u(x, t) will satisfy the wave equation. The basic idea is to assume that each atom
in the string can only move up or down (i.e. along the u direction) and never left or
right (along the x direction). This is called the transverse vibration approximation. It
essentially means the vibrating string cannot transport matter from left to right. It can
only wiggle up and down. What is fun about waves is that, notwithstanding, they can still transport energy, as we will see.
We discretize the string by considering small pieces of length ∆x. The interval
[0, L] is discretized as
x j = j∆x, j = 0, 1, 2, . . . , N,
where N = L/∆x. We let u j (t) = u(x j , t) denote the displacement of the small piece of
string located at position x_j. The mass of this small piece is m ≈ ρ∆x, where ρ is the (linear) density of the string. At the end of the calculation, we will take the limit ∆x → 0.
The basic force diagram is illustrated in Fig. 3.17. The only forces acting on the
piece of string u j are the tensions associated to the piece to its left (u j−1 ) and to its right
(u j+1 ). Other forces, such as gravity, can be neglected. This makes sense if you think
about a guitar string, which is highly tensioned. The horizontal and vertical forces
acting on this small piece will thus be
Fh = T 2 cos θ2 − T 1 cos θ1 ,
Fv = T 2 sin θ2 − T 1 sin θ1 ,
where T i = |T~i | is the magnitude of each tension and θ1 and θ2 are the angles that the
string makes to the left and right.
The horizontal force would cause the atoms in the string to move to the left or
right. Since we are assuming the vibrations are transverse, this cannot happen so the
horizontal components must vanish,
T_2 cos θ_2 = T_1 cos θ_1 := T. (3.133)
This is one of the basic assumptions in the derivation and is a very special property of
transverse vibrations. It means that the horizontal tension is the same on the left and
right sides and, hence, must be the same at all points x j of the string. This quantity T
is thus just a property of the string. It is related, for instance, to how hard you stretched
the string of your guitar when you first set it up.
We can use Eq. (3.133) to write T 2 = T/ cos θ2 and T 1 = T/ cos θ1 . This allows us
to express the vertical component of the force as
F_v = T [tan θ_2 − tan θ_1]. (3.134)
The reason why this is convenient is because, as the drawing in Fig. 3.17 indicates, the
angles θ1 and θ2 are related to the slope between u j and its neighbors u j±1 :
tan θ_2 = (u_{j+1} − u_j)/∆x,
tan θ_1 = (u_j − u_{j−1})/∆x.
This part is a bit confusing, I know. Please take a second to make sure you understand.
Looking at Fig. 3.16 helps. In any case, this allows us to finally conclude that
" #
u j+1 − u j u j − u j−1
Fv = T −
∆x ∆x
h u j+1 − 2u j + u j−1 i
= (T ∆x)
∆x2
Here I also multiplied and divided by ∆x because the quantity in square brackets then
becomes exactly the second derivative of u in Eq. (3.75). Whence
h u j+1 − 2u j + u j−1 i
Fv = (T ∆x) ' (T ∆x)u xx . (3.135)
∆x2
This vertical force will cause u(x, t) to move up and down according to Newton’s
2nd law,
m d²u_j/dt² = F_v,
where m = ρ∆x. Using our just obtained formula for Fv , the factors of ∆x cancel out
and we are left with
ρ d²u_j/dt² = T [(u_{j+1} − 2u_j + u_{j−1})/∆x²]. (3.136)
Or, in the limit ∆x → 0,
u_tt = c² u_xx, c² := T/ρ. (3.137)
The string thus obeys the 1D wave equation. And, what is more, we have found a neat
microscopic interpretation for the constant c. As we saw in the electromagnetic case,
c is the velocity of propagation of the waves. The relation c2 = T/ρ thus makes sense:
the waves propagate faster if the tension in the string is higher (T large) and if the string
is light (ρ small).
Pragmatic derivation: Start with Eq. (3.137), multiply by ut on both sides and inte-
grate from 0 to L:
∫_0^L u_tt u_t dx = c² ∫_0^L u_xx u_t dx.
The term on the left is just ∫_0^L u_tt u_t dx = (d/dt) ∫_0^L (u_t²/2) dx. I can put the d/dt outside since the integral is over x and I am assuming that u is smooth.
We now integrate the term on the right by parts. Integration by parts transfers the
derivative, from one function to another, plus a boundary term:
∫_0^L u_xx u_t dx = [u_x u_t]_0^L − ∫_0^L u_x u_tx dx,
where the last term involves utx because we transferred the derivative from u xx to ut .
The first term is what we call a boundary term. It involves the function and its deriva-
tives, evaluated at the boundaries. Since the strings are clamped at the boundaries
(u(0, t) = u(L, t) = 0), this term will be zero, leading to
∫_0^L u_xx u_t dx = −∫_0^L u_x u_tx dx = −(1/2)(d/dt) ∫_0^L (u_x)² dx.
Combining the two sides, we conclude that
(d/dt) ∫_0^L [u_t²/2 + c² u_x²/2] dx = 0. (3.139)
We therefore define the energy
E = (1/2) ∫_0^L (u_t² + c² u_x²) dx, (3.140)
so that dE/dt = 0.
In the case of the heat equation, energy was in general not conserved. But for the wave
equation it always is. This is a consequence of the 2nd derivative in time. Eq. (3.140) is
extremely useful, as it says that during the evolution there is this thing which is always
constant. But the downside of the above derivation is that it does not clarify what this
energy really means. For that, we move on to the 2nd derivation.
Mechanical derivation: We go back to the discretized version of the problem, Eq. (3.136). This can be viewed, from a classical mechanics perspective, as a system of N − 1 particles u_1, u_2, ..., u_{N−1} interacting with each other through conservative forces. The total energy will then be the sum of kinetic and potential energies:
E = (1/2) Σ_{j=0}^N ρ∆x u̇_j² + V(u_1, ..., u_{N−1}),
where V(u1 , . . . , uN−1 ) is the potential energy function, which is such that
−∂V/∂u_j = F_{v,j} = T∆x [(u_{j+1} − 2u_j + u_{j−1})/∆x²].
We now need a bit of reverse engineering: what is the function V(u1 , . . . , uN−1 ), which
is such that when we differentiate with respect to a given u j , yields the expression
above? I will tell you the answer and then we can check:
V(u_1, ..., u_{N−1}) = (T/2∆x) Σ_{j=0}^{N−1} (u_j − u_{j+1})², (3.141)
with the proviso that u_0 = u_N = 0 (to fix the boundary conditions). To see why, choose a certain j to differentiate. Say j = 42. There are only two terms in this sum that contain u_42: the term (u_42 − u_43)² and the term (u_41 − u_42)². Thus,
∂V/∂u_42 = (T/∆x) [(u_42 − u_43) − (u_41 − u_42)],
which is exactly what we are looking for.
The energy of the system will therefore be
E = (1/2) Σ_{j=0}^N ρ∆x u̇_j² + (T/2∆x) Σ_{j=0}^{N−1} (u_j − u_{j+1})².
This is now exactly in the form of the Riemann sum you learned in Introductory Cal-
culus:
lim_{∆x→0} Σ_{x_j} f(x_j)∆x = ∫ f(x) dx.
Thus, in the limit ∆x → 0 we will simply get the integral of whatever is inside the sum.
The first term is simply u̇ j → ut , the first time derivative. And the second term contains
(u j+1 − u j )/∆x → u x . Thus, we finally arrive at
E = (1/2) ∫_0^L (ρ u_t² + T u_x²) dx. (3.142)
This is now almost the same as (3.140), differing only by the pre-factor ρ. The reason for this discrepancy is that Eq. (3.140), actually, is not really an energy (it does not have energy units), while (3.142) is. This happened because the pragmatic derivation used only the wave equation u_tt = c² u_xx, where ρ is hidden in c². Usually we don't really care about this discrepancy, though: Eq. (3.140) is still the energy, just rescaled by ρ.
Figure 3.18: A one dimensional bar depicted as a system of particles coupled by springs.
Vibrations of a solid
Another major application of the wave equation is in the description of the vibra-
tions of a solid. Imagine a long and thin bar, made e.g. of steel or something. If you
now go on one side and hit it with a hammer, the bar will vibrate and this vibration will
propagate through the solid. It can even be felt from the other side.
These vibrations are different from the ones of a string that we just studied. The
reason is that they are longitudinal, instead of transverse. We can imagine the solid as a
bunch of atoms, each sitting in a certain equilibrium position labeled as x1 , x2 , x3 , . . . , xN .
For simplicity we are going to think in 1D, as in Fig. 3.18. If nothing is vibrating, the atoms are just standing still. But when they start to vibrate, they will be displaced, very slightly, from their equilibrium positions. We call u(x_n, t) the displacement of atom
n from its equilibrium position xn . Notice how everything is 1D. This u(xn , t) refers
to the displacement to the left or right. This is different from Fig. 3.16, where the
displacement was up and down.
The atoms are bound together by chemical forces, which in general can be quite
complicated. But lucky for us, that does not matter. What matters is that the positions
x1 , x2 , . . . correspond to the equilibrium configuration of these forces. That is, they are
the positions where everything balances out. If an atom now moves a bit, its neighbors
to the left and right will tend to push it back. Try to picture this like people on a
crowded subway (wearing masks because of Covid). For instance if u(xn ) > 0, then the
atom at xn+1 will tend to push the atom at xn to the left and the atom at xn−1 will push
it to the right. If u(xn ) < 0 the situation reverses. The force acting on atom xn by the
atom at xn+1 can be approximated by a harmonic force, with spring constant k. That is,
something like
F_{n+1→n} = k [u(x_{n+1}, t) − u(x_n, t)].
The rationale is as follows: the force is zero only if both are in their equilibrium positions, u(x_n) = u(x_{n+1}) = 0. If u(x_{n+1}) = 0, then the force must have the opposite sign of u(x_n), because it is supposed to be a restoring force (it pushes in the direction opposite
of the motion). Conversely, if u(xn ) = 0, the force should have the same sign as u(xn+1 )
because if u(xn+1 ) < 0, it should push u(xn ) to the left and vice-versa. Just like people
in a crowded subway.
Similarly, the force of atom n − 1 on n will be
F_{n−1→n} = k [u(x_{n−1}, t) − u(x_n, t)].
The same logic applies. Thus, Newton’s 2nd law for atom n reads
m u_tt(x_n, t) = k [u(x_{n+1}, t) − 2u(x_n, t) + u(x_{n−1}, t)],
where m is the mass of the atom. The thing on the right almost looks like a 2nd
derivative. But here things are still discrete. We will only get a wave equation if we
look at the dynamics from farther away. This is called a coarse graining. We can make
a 2nd derivative appear by dividing on both sides by ∆x2 , where ∆x is the equilibrium
spacing of the atoms. Then we can approximate
[u(x_{n+1}, t) − 2u(x_n, t) + u(x_{n−1}, t)]/∆x² ≈ u_xx(x, t),
and so our equation of motion becomes
(1/c²) u_tt = u_xx, c² = k∆x²/m := E/ρ, (3.144)
where ρ := m/∆x is the density of the bar, just like in Eq. (3.137). The quantity E :=
k∆x, on the other hand, is called the Young modulus and is a property characterizing
the elasticity of the material. For instance, steel has a Young modulus which is 3 times
that of aluminum. But the density of steel is also almost 3 times higher, so that the
speed of propagation of waves in steel and aluminum are somewhat close.
Let us now actually solve the wave equation on a string of length L, clamped at the two ends:
u_tt = c² u_xx, (3.145)
u(0, t) = 0, u(L, t) = 0,
u(x, 0) = f(x), u_t(x, 0) = g(x),
for given functions f and g. As always, we solve this using separation of variables, by
setting u(x, t) = X(x)T (t) (Sec. 3.2). This yields
T̈/(c²T) = X′′/X = −k²,
where k is a constant to be determined. This results in the pair of equations
X′′ = −k²X, (3.146)
T̈ = −c²k²T. (3.147)
Since the spatial part is subject to the boundary conditions X(0) = X(L) = 0, we obtain the familiar family of solutions
X_n(x) = sin(k_n x), k_n = nπ/L, n = 1, 2, 3, .... (3.148)
The time part, on the other hand, will have solutions
T_n(t) = a_n cos(ck_n t) + b_n sin(ck_n t),
for constants a_n and b_n. You can also use complex exponentials if you prefer. For the
present problem, sines and cosines are a bit more convenient.
The general solution will thus be a linear superposition of these basic solutions:
u(x, t) = Σ_{n=1}^∞ [a_n cos(ck_n t) + b_n sin(ck_n t)] sin(k_n x). (3.149)
Finally, the values of an and bn are determined from the initial conditions:
u(x, 0) = Σ_{n=1}^∞ a_n sin(k_n x) = f(x), (3.150)
u_t(x, 0) = Σ_{n=1}^∞ ck_n b_n sin(k_n x) = g(x). (3.151)
These are Fourier sine series, whence
a_n = (2/L) ∫_0^L f(x) sin(k_n x) dx,
b_n = (2/Lck_n) ∫_0^L g(x) sin(k_n x) dx,
which are nothing but the Fourier sine coefficients of f(x) and g(x) on [0, L]. Actually, to be precise, the Fourier sine coefficient of g(x) is ck_n b_n, not b_n. With this we arrive at the following general solution:
u(x, t) = Σ_{n=1}^∞ [a_n cos(ck_n t) + b_n sin(ck_n t)] sin(k_n x), (3.152)
Figure 3.19: Example solution of the wave equation, Eq. (3.152), for f(x) = x(x³ − 2Lx² + L³) and g(x) = x(L − x). The plot was made with L = c = 1.
where
a_n = (2/L) ∫_0^L f(x) sin(k_n x) dx, (3.153)
b_n = (2/Lck_n) ∫_0^L g(x) sin(k_n x) dx, (3.154)
are the Fourier sine coefficients of f (x) and g(x) on [0, L]. Some conditions
must be met by f (x) and g(x) for this solution to actually be valid. These will
be discussed below.
Example: We take f(x) = x(x³ − 2Lx² + L³) and g(x) = x(L − x). The solution is
shown in Fig. 3.19. Since (3.152) is composed only of oscillatory functions, there is no
damping and the solution just keeps repeating itself over and over. See also this GIF.
This is very different from the heat equation in Sec. 3.4, which was damped in time.
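To reproduce this kind of plot, one can compute a_n and b_n by numerical integration and then sum the series (3.152). Here is a minimal sketch of my own:

```python
import numpy as np
from scipy.integrate import quad

L, c, nmax = 1.0, 1.0, 40
f = lambda x: x * (x**3 - 2 * L * x**2 + L**3)
g = lambda x: x * (L - x)

def coeffs(n):
    """a_n and b_n from Eqs. (3.153)-(3.154), computed numerically."""
    kn = n * np.pi / L
    an = (2 / L) * quad(lambda x: f(x) * np.sin(kn * x), 0, L, limit=200)[0]
    bn = (2 / (L * c * kn)) * quad(lambda x: g(x) * np.sin(kn * x), 0, L, limit=200)[0]
    return kn, an, bn

def u(x, t):
    """Partial sum of Eq. (3.152)."""
    total = np.zeros_like(x, dtype=float)
    for n in range(1, nmax + 1):
        kn, an, bn = coeffs(n)
        total += (an * np.cos(c * kn * t) + bn * np.sin(c * kn * t)) * np.sin(kn * x)
    return total

x = np.linspace(0, L, 200)
print(u(x, 0.0).max(), u(x, 0.37).max())   # no damping: the profile keeps oscillating
```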
Let us now use the energy to show that the solution is unique. Recall that
E = (1/2) ∫_0^L (u_t² + c² u_x²) dx. (3.155)
We showed that the energy is a constant of motion, dE/dt = 0. Moreover, from its
definition, we can see that E ≥ 0.
[Figure 3.20: the first few harmonics, n = 1, ..., 5, as functions of x/L, showing their nodes.]
Now suppose we found two solutions, u1 and u2 , of the boundary value prob-
lem (3.145), and define u = u1 − u2 . Since the wave equation is linear, u is also
a solution of utt = c2 u xx . Moreover, it is subject to the same boundary conditions
u(0, t) = u(L, t) = 0. But, since it is the difference between u1 and u2 , it is now subject
to the initial conditions u(x, 0) = ut (x, 0) = 0. Thus, at time t = 0 the energy contained
in u will be E = 0. And since energy is a constant of motion, it has to remain zero for
all times. But E is an integral of positive quantities and so the only way it can be zero
is if the integrand itself is zero. Hence, we conclude that u(x, t) = 0 for all times, which
implies u1 = u2 . That is, the solution is unique.
The above reasoning is actually a very powerful method for establishing the unique-
ness of solutions. Please take a second to make sure you understand it.
The solution (3.152) is given by an infinite sum of terms. Each term,
u_n(x, t) = [a_n cos(ck_n t) + b_n sin(ck_n t)] sin(k_n x), (3.156)
is called the n-th harmonic of the string. For each harmonic, there are certain special
positions x where un = 0 at all times. This always happens at the boundaries, un (0, t) =
un (L, t) = 0, of course. But it also happens if kn x = `π, for some integer `. These points
are called nodes. Since kn = nπ/L, this implies x = (`/n)L. Of course, we must also
have x ∈ [0, L], so the number of integers ` which satisfy this is limited by n. That is,
the n-th harmonic will have n − 1 nodes, located at
x_ℓ = (ℓ/n) L, ℓ = 1, 2, ..., n − 1.
The first harmonic, n = 1, has no nodes. The second harmonic has 1 node and the third
harmonic has 2. This is illustrated in Fig. 3.20.
Each harmonic also evolves in time. The oscillation frequency is seen to be
ω_n = ck_n. (3.158)
We can also compute the energy contained in each harmonic, by plugging (3.156) into Eq. (3.155). I will leave it for you as an exercise to check that, if we carry out the integral, we are left with
E_n = L (a_n² + b_n²) ω_n². (3.159)
The energy density, E_n/L, therefore grows with the square of the frequency, ω_n² (and with the constants a_n, b_n that determine the amplitude of that harmonic). Higher harmonics are thus more energetic. The total energy in the string is then simply the sum of the
energy contained in each harmonic,
E = Σ_{n=1}^∞ E_n = L Σ_{n=1}^∞ (a_n² + b_n²) ω_n², (3.160)
which is nothing but a slightly modified version of Parseval’s identity (Sec. 1.6).
The solution (3.152) requires some restrictions on f(x) and g(x) (essentially, that they be sufficiently smooth and compatible with the boundary conditions). These restrictions are required for essentially two reasons: (i) to ensure that u satisfies the boundary conditions u(0, t) = u(L, t) = 0 at all times and (ii) to ensure that the second derivatives u_tt and u_xx actually make sense. For instance, if f(x) is not twice differentiable, then at the initial time, how can we compute u_xx(x, 0)? It can be shown (see
Trench, Sec. 12.3) that if these restrictions apply, then (3.152) will be a valid solution
of (3.145) (we have already shown that it is unique from the energy argument).
But what if f and g violate some of these conditions? There are, in fact, physically
important problems which do. One is the so-called plucked string, given by g(x) = 0
and
f(x) = { x/a,              0 ≤ x ≤ a,
         (L − x)/(L − a),  a ≤ x ≤ L, }      (3.166)
(see Fig. 3.21). This is what happens if you start the string at rest (g(x) = 0), but stretch it at a certain point a, as we would do in a guitar or harp. This f(x) is not twice
differentiable, so strictly speaking a solution of the boundary value problem (3.145)
with this initial condition does not exist.
This is a bit frustrating. What is going on? A pragmatist would argue that this is
a mathematical pathology. When we pluck a string, the profile f (x) does not actually
[Figure 3.21: the plucked-string initial condition f(x) of Eq. (3.166).]
have a kink, like Fig. 3.21, but curves smoothly. Thus, a “real” f (x) would be twice
differentiable.
The naive approach, on the other hand, would be to simply say “I don’t care”: I’ll
just plug this f (x) in Eqs. (3.152) and (3.153) and see what happens. Doing that we
find that the $a_n$ are given by
$$a_n = \frac{2}{a(L-a)\,k_n^2}\,\sin(k_n a), \qquad (3.167)$$
while bn = 0 since g(x) = 0. Thus Eq. (3.152) becomes
$$u(x, t) = \frac{2}{a(L-a)}\sum_{n=1}^{\infty}\frac{1}{k_n^2}\,\sin(k_n a)\cos(ck_n t)\sin(k_n x). \qquad (3.168)$$
This doesn’t look too bad. See, for instance, the following GIF. The solution is kind
of weird, but it does evolve in an oscillatory fashion, as one would expect from the
wave-equation.
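If you want to play with this solution yourself, here is a minimal numerical sketch (Python/NumPy; the tooling and the parameter values $L$, $c$, $a$ and the truncation $n_{\max}$ are my own illustrative choices, not part of the notes). It evaluates a truncated version of Eq. (3.168) on a grid of $x$ for a given $t$; animating over $t$ reproduces the behaviour of the GIF.

```python
import numpy as np

# Hypothetical parameters of the plucked string (illustration only).
L, c, a = 1.0, 1.0, 0.3     # string length, wave speed, plucking point
nmax = 200                  # number of harmonics kept in the truncated sum

n = np.arange(1, nmax + 1)
kn = n * np.pi / L          # allowed wave numbers k_n = n*pi/L

def u(x, t):
    """Truncated version of Eq. (3.168) for the plucked string."""
    modes = (np.sin(kn[:, None] * a) / kn[:, None]**2
             * np.cos(c * kn[:, None] * t) * np.sin(kn[:, None] * x[None, :]))
    return 2.0 / (a * (L - a)) * modes.sum(axis=0)

x = np.linspace(0, L, 201)
print(u(x, 0.0).max())      # at t = 0 the profile peaks near 1, at x ~ a
```

At $t = 0$ the truncated sum reproduces the triangular profile (3.166), and for later $t$ it shows the somewhat "kinky" oscillatory evolution mentioned above.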
It turns out that the wave equation admits something called “Sobolev generalized
solutions”. And the plucked string is one such example. These are not actual solutions
of the PDE, but they are solutions of another integral equation associated with it. This is why, when we make a plot, they still look like they are working just fine. For more information, see Djairo Sec. 5.11.
denote the Fourier series of the initial conditions f (x) and g(x) [c.f. Eqs. (3.153)
and (3.154)]. The reason why I don't call these $f(x)$ and $g(x)$ is because, strictly speaking, these two functions only need to be defined in $[0, L]$, while $S_f$ and $S_g$ are defined
for any x (being, of course, periodic with period L).
Now let us go back to the general solution (3.152) and use the trigonometric iden-
tities:
$$\sin(x)\cos(y) = \tfrac{1}{2}\big[\sin(x-y) + \sin(x+y)\big], \qquad (3.171)$$
$$\sin(x)\sin(y) = \tfrac{1}{2}\big[\cos(x-y) - \cos(x+y)\big]. \qquad (3.172)$$
We then get
$$\sum_{n=1}^{\infty} a_n \cos(ck_n t)\sin(k_n x) = \frac{1}{2}\sum_{n=1}^{\infty} a_n\big[\sin(k_n(x-ct)) + \sin(k_n(x+ct))\big] = \frac{1}{2}\big[S_f(x-ct) + S_f(x+ct)\big].$$
Similarly,
$$\sum_{n=1}^{\infty} b_n \sin(ck_n t)\sin(k_n x) = \frac{1}{2}\sum_{n=1}^{\infty} b_n\big[\cos(k_n(x-ct)) - \cos(k_n(x+ct))\big].$$
This doesn’t look at all like S g . To make it appear, we use the following trick:
$$\int_{x-ct}^{x+ct}\sin(k_n y)\,dy = \frac{1}{k_n}\big[\cos(k_n(x-ct)) - \cos(k_n(x+ct))\big].$$
Combining everything, we see that the general solution (3.152) can also be written as
$$u(x, t) = \frac{1}{2}\big[S_f(x-ct) + S_f(x+ct)\big] + \frac{1}{2c}\int_{x-ct}^{x+ct} S_g(y)\,dy. \qquad (3.173)$$
Figure 3.22: (Left) A localized initial condition f (x) propagates to the left and right according
to Eq. (3.173). (Right) The light cone, described by the straight lines x − ct and
x + ct.
where
$$S_f(x) = \sum_{n=1}^{\infty} a_n \sin(k_n x), \qquad a_n = \frac{2}{L}\int_0^L f(x)\sin(k_n x)\,dx, \qquad (3.174)$$
$$S_g(x) = \sum_{n=1}^{\infty} c k_n b_n \sin(k_n x), \qquad b_n = \frac{2}{Lck_n}\int_0^L g(x)\sin(k_n x)\,dx. \qquad (3.175)$$
At any given time t, the solution $u(x, t)$ at a certain point $x$ will have been influenced only by the parts of
the initial conditions f (x0 ) and g(x0 ) if that point x0 is inside the cone defined by x − ct
and $x + ct$. Or, to put it differently, the effects of the initial conditions $f(x_0)$ and $g(x_0)$ at a given point $x_0$ will only start to influence $u(x, t)$ after at least a time $t = |x - x_0|/c$. The influence of the initial conditions therefore travels at a speed $c$, which is why we call $c$
the speed of propagation of the waves (mind-blowing!). We use the term “light-cone”
because light satisfies the wave equation. But, of course, this also holds for strings,
where c is the speed of propagation, which depends on the properties of the string (and
has nothing to do with light).
Consider now the wave equation defined on the entire real line, subject only to initial conditions:
$$u_{tt} = c^2 u_{xx}, \qquad u(x, 0) = f(x), \qquad u_t(x, 0) = g(x). \qquad (3.176)$$
Or we could study the heat (diffusion) equation,
$$u_t = \alpha u_{xx}, \qquad u(x, 0) = f(x).$$
Or we can study a free quantum particle in 1D, which obeys Schrödinger’s equation,
$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial x^2}, \qquad \psi(x, 0) = f(x), \qquad (3.177)$$
where m is the mass and ~ is Planck’s constant. In any case, we still have to specify the
initial conditions (2 in the case of waves, 1 in the case of diffusion and Schrödinger).
But we do not impose any boundary conditions. Problems of this form are called
Cauchy problems.
The best tool for handling Cauchy problems is the Fourier transform, which will
be the subject of the next chapter. Here, I just want to prepare the terrain. More specif-
ically, what I would like to do is to compare the wave Eq. (3.176) with Schrödinger’s
Eq. (3.177). In fact, Schrödinger himself talked about the solutions of his equation as
describing matter waves. So it is only fair we compare them.
Wave equation and plane waves
Let us start with the Cauchy problem for the wave equation (3.176). We search for
solutions of the form u(x, t) = Aei(kx−ωt) , where k and ω are to be determined (and A is
just a silly constant). Plugging this in (3.176) yields
$$A(-\omega^2) = c^2 A(-k^2).$$
We therefore see that this will indeed be a solution, but only if $k$ and $\omega$ are related by
$$\omega = c|k|. \qquad (3.178)$$
This is very similar to what we found last section, Eq. (3.158). But in that case, due
to the boundary conditions, the allowed values of k were discrete. Here, because there
are no boundary conditions, any k ∈ R provides a valid solution.
The fact that now k can vary continuously is a bit of a complication, because we
cannot write the general solution as a sum of these basic building blocks. Instead, it
must be an integral. This will be the main motivation to define a Fourier Transform in
the next chapter, which is the continuous analog of a Fourier series.
A solution like ei(kx−ωt) is called a plane wave. The quantity k is called the wave
vector and a relation like (3.178) is called a dispersion relation. That is, the disper-
sion relation establishes how frequency is related to wave vector. In high school you
probably learned that for light the wavelength λ was related to the frequency ν accord-
ing to ν = c/λ. This is actually the same as (3.178) because now we are using angular
frequency, ω = 2πν. Moreover, the wave vector is related to the wavelength according
to k = 2π/λ.
Schrödinger equation
Next let us try a plane wave solution ψ = Aei(kx−ωt) for Schrödinger’s equation. This
yields
$$A(i\hbar)(-i\omega) = -\frac{\hbar^2}{2m}A(-k^2).$$
It therefore also works, but only if $\omega$ and $k$ are related via
$$\omega = \frac{\hbar k^2}{2m}. \qquad (3.179)$$
To understand what this means, let us recall that Planck's constant has the value
$$\hbar \simeq 1.05 \times 10^{-34}\ \text{J s}. \qquad (3.180)$$
It thus has units of energy times seconds. Hence, $E = \hbar\omega$ will have units of energy.
Moreover, [k] = 1/m and so [~k] = kg m/s, which are units of momentum. This
suggests the association
$$E = \hbar\omega, \qquad p = \hbar k. \qquad (3.181)$$
Using this, Eq. (3.179) becomes
$$E = \frac{p^2}{2m}, \qquad (3.182)$$
which is nothing but the usual energy-momentum relation of classical mechanics.
Dispersion relations are therefore a synonym of energy-momentum relations. Fre-
quency is just an ~ away from energy. And wave vector is just an ~ away from momen-
tum. This finally explains why, all the way back in Chapter 1, I called the quantities $k_n = n\pi/L$ appearing in Fourier series "momenta".
This also motivates us to do the same for the wave equation dispersion relation (3.178).
This yields
E = c|p|, (3.183)
which is the energy-momentum relation for photons. As you will learn when you study
Quantum Physics, a photon of frequency ω has energy E = ~ω. Moreover, even though
photons have no mass, they still carry momentum. I know this may seem weird since
we always think of momentum as mass times velocity. But momentum is actually more
general and you can have momentum even if m = 0.
A key distinction between waves and matter waves, therefore, is in the energy-
momentum dispersion relation. Photons have a linear dispersion E ∝ |p|, while for
matter waves this is quadratic, E ∝ p2 . It turns out that these are both limiting cases of
a more general dispersion relation
$$E = \sqrt{m^2c^4 + p^2c^2}, \qquad (3.184)$$
called the relativistic dispersion relation. If m = 0, this yields the photon case (3.183).
Conversely, if the momentum is small, $pc \ll mc^2$, we can write
$$E = mc^2\sqrt{1 + p^2/(m^2c^2)}$$
and use the series expansion $\sqrt{1+x} \simeq 1 + x/2$, which yields
$$E \simeq mc^2 + \frac{p^2}{2m}. \qquad (3.185)$$
The first term is called the self-energy: it is the energy that the particle has even if p =
0. And the second term is the Schrödinger dispersion relation (3.179). Schrödinger’s
equation is thus valid in the non-relativistic limit. That is, it is valid for motion with $pc \ll mc^2$.
Klein-Gordon equation
You may also be wondering, which kind of PDE yields the general relativistic dis-
persion relation (3.184). The answer is Klein-Gordon’s equation
$$u_{tt} = c^2 u_{xx} - \mu^2 u, \qquad (3.186)$$
where $\mu = mc^2/\hbar$. I will leave it for you to check that this admits plane wave solutions $e^{i(kx-\omega t)}$, but with
$$\omega^2 = c^2k^2 + m^2c^4/\hbar^2. \qquad (3.187)$$
Multiplying by $\hbar^2$ and using $E = \hbar\omega$, $p = \hbar k$, this yields (3.184). Eq. (3.186), or some variation of it, is in fact one of the basic equations obeyed by many elementary particles.
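If you prefer not to grind through the algebra by hand, the check can also be done symbolically. The sketch below (Python/SymPy, a tool I am assuming here; it is not used in the notes) plugs the plane wave into (3.186) and confirms that it is a solution precisely when (3.187) holds.

```python
import sympy as sp

x, t, k, w, c, mu = sp.symbols('x t k omega c mu', real=True)
psi = sp.exp(sp.I * (k * x - w * t))

# Klein-Gordon equation (3.186): u_tt - c^2 u_xx + mu^2 u = 0
kg = sp.diff(psi, t, 2) - c**2 * sp.diff(psi, x, 2) + mu**2 * psi

# Dividing by psi shows the plane wave solves (3.186) iff this bracket vanishes,
# i.e. omega^2 = c^2 k^2 + mu^2, which is Eq. (3.187) once mu = m c^2 / hbar.
print(sp.simplify(kg / psi))     # -> -omega**2 + c**2*k**2 + mu**2
```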
When Schrödinger was trying to derive his equation, he actually first considered
the Klein-Gordon equation (3.186). But he found a problem: this equation does not
admit a continuity equation. We discussed continuity equations in Sec. 3.4. They have
the form
$$\frac{\partial q}{\partial t} = -\nabla\cdot \boldsymbol{J}, \qquad (3.188)$$
where q is a “charge” and J is a current. Continuity equations happen naturally when
the PDE is 1st order in time. But they don’t hold when it is 2nd order. Schrödinger
knew that.
Continuity equations ensure that particles are not spontaneously created or de-
stroyed. Schrödinger was interested in describing electrons, protons and atoms, which
cannot simply disappear out of nowhere. The wave equation can only describe pho-
tons, which can be spontaneously destroyed or created. Schrödinger therefore realized
that his equation had to be 1st order in time. But it was also supposed to describe wave
motion. And we know that a 1st order PDE like the heat equation cannot describe
waves. So what to do? This was Schrödinger’s big eureka moment: introduce complex
numbers. Write down a 1st order PDE, but put an i in front to turn real exponentials
into complex exponentials.
Despite its enormous success, Schrödinger’s equation only describes non-relativistic
motion. How to describe relativistic particles? This only came a few years later with
Dirac. He realized that the only way of doing this was to introduce an additional level
of complexity: the function ψ could not be a number; it had to be a vector. And what is
funnier, it has to be a vector of dimension 4. Dirac’s equation for a free particle reads
$$i\hbar\,\partial_t\!\begin{pmatrix}\psi_1\\ \psi_2\\ \psi_3\\ \psi_4\end{pmatrix} = -i\,\partial_x\!\begin{pmatrix}\psi_4\\ \psi_3\\ \psi_2\\ \psi_1\end{pmatrix} + \partial_y\!\begin{pmatrix}-\psi_4\\ \psi_3\\ -\psi_2\\ \psi_1\end{pmatrix} - i\,\partial_z\!\begin{pmatrix}\psi_3\\ -\psi_4\\ \psi_1\\ -\psi_2\end{pmatrix} + m\begin{pmatrix}\psi_1\\ \psi_2\\ -\psi_3\\ -\psi_4\end{pmatrix}. \qquad (3.189)$$
Pretty funny eh? Of course, I will not have the time for us to go into more detail about
Dirac’s equation. But I wanted you to see how a deep understanding of PDEs can aid us
in formulating fundamental laws of physics. I think this is really beautiful and shows
how something seemingly simple, like the order of a derivative, can have a profound
influence on how we describe physical systems.
Chapter 4
Fourier Transforms
4.1 Introduction
Let f (x) be an arbitrary function of x. It doesn’t have to be periodic or even real.
We define its Fourier Transform (FT) as
$$\tilde f(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-ikx}\,dx. \qquad (4.1)$$
The FT integrates over x, but introduces this new parameter k, so that the result is a
function of k. When we take the Fourier transform of a function, we like to say we
moved to Fourier space. Real space has x. Fourier space has k. As we will see
soon, when x is position, this k has the interpretation of momentum or wave vector.
Conversely, when dealing with functions of time, f (t), we usually use ω instead of k:
$$\tilde f(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt. \qquad (4.2)$$
Mathematically speaking, this is silly: we can call this parameter whatever we want.
But physically, we like to keep the correspondence x ↔ k and t ↔ ω.
The Fourier Transform is an invertible transformation. That is, we can always go
back and find f (x) from f˜(k). This is called the Inverse Fourier Transform and is
given by
$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \tilde f(k)\,e^{ikx}\,dk. \qquad (4.3)$$
The integral is now in dk and all that changes is the sign in the exponential. It is not at
all obvious why Eq. (4.3) works; but we are going to prove it below.
The Fourier transform generalizes the notion of Fourier series to functions which
are not necessarily periodic. When dealing with Fourier series, we had a set of param-
eters cn . In the Fourier transform this becomes a continuous function f˜(k). We can
Figure 4.1: (a) The boxcar functions (4.4) for different values of a. (b) The corresponding
Fourier transform (4.5). When f (x) is wide, f˜(k) tends to be thin and vice-versa.
move from series to transform if we take f (x) to be a periodic function, but with period
L → ∞. We will do this in a second. But before, let us work out some examples.
Boxcar
Consider a boxcar centered around 0, with width 2a and height 1/2a (so that the
area under the curve is 1):
$$B_a(x) = \frac{1}{2a}\,\theta(x+a)\,\theta(a-x), \qquad (4.4)$$
where θ(x) is the Heaviside theta function. See Fig. 4.1(a). The Fourier transform (4.1)
yields
$$\tilde B_a(k) = \frac{1}{\sqrt{2\pi}}\,\frac{1}{2a}\int_{-a}^{a} e^{-ikx}\,dx = \frac{1}{2a\sqrt{2\pi}}\left(\frac{e^{-ika} - e^{ika}}{-ik}\right) = \frac{\sin(ka)}{\sqrt{2\pi}\,ka}. \qquad (4.5)$$
Dirac δ-function
The Fourier transform of a Dirac δ function is
$$\tilde\delta(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\delta(x)\,e^{-ikx}\,dx = \frac{1}{\sqrt{2\pi}}. \qquad (4.6)$$
It is thus a constant, independent of k. The δ function is the sharpest function one can
have. And we see that its FT is the widest function one can have: that is, a constant
on the entire real line. We can also see Eq. (4.6) as a particular case of the boxcar
example (4.4). Recall that $\delta(x)$ is the limit of $B_a(x)$ when $a \to 0$. And indeed, if we take the limit $a \to 0$ of Eq. (4.5) and use the famous limit $\sin x/x \to 1$, we also get $1/\sqrt{2\pi}$.
Now that we have δ̃(k), we can also recover δ(x) using the Inverse FT (4.3):
$$\delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ikx}\,dk, \qquad (4.7)$$
which looks super weird. This thing is supposed to mimic the δ-function. That is, it
should be infinite when x = 0 and zero when x , 0. The case x = 0 is consistent
since integrating 1 from −∞ to ∞ will definitely give infinity. But if x , 0, it is less
clear because eikx just oscillates indefinitely, so how can we evaluate it at k = ±∞? The
intuition is that ei10000x is a very very fast oscillatory function in x, which oscillates
symmetrically between positive and negative values. Thus, “on average”, it should be
zero. But this, of course, is not at all rigorous. A rigorous derivation can be done if
we regularize the integral. That is, we introduce some term which makes eikx slowly
decay as k → ±∞. An example of how to do this will be discussed below.
There is also another way of understanding (4.7). Let f (x) be some generic function
and let us try to combine the FT with the Inverse FT; that is, we insert Eq. (4.1) in
Eq. (4.3):
$$f(x) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,e^{ikx}\,\tilde f(k) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,e^{ikx}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(y)\,e^{-iky}.$$
Here I used a different letter y because this variable is being integrated, so we should
not confuse it with x. We now exchange the integrals over dk and dy (assuming this is
allowed) and write this as
$$f(x) = \int_{-\infty}^{\infty}dy\, f(y)\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(x-y)}.$$
The integral in $k$ results in a function of $x - y$ only. That is, this must have the form
$$f(x) = \int_{-\infty}^{\infty}dy\, f(y) \times (\text{function of } x - y),$$
which should be compared with the definition of the Dirac delta function:
$$f(x) = \int_{-\infty}^{\infty}dy\, f(y)\,\delta(x-y).$$
Thus, we conclude that
$$\delta(x-y) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(x-y)}. \qquad (4.8)$$
This reflects the fact that if we apply first the FT, and then the Inverse FT, we
must get back to where we started.
If we change the integration variable in Eq. (4.10) from k to k0 = −k, we get a minus
sign in dk0 = −dk. But the integration limits now go from +∞ to −∞. So inverting
them back to the natural order, gets rid of the other minus sign. As a consequence, we
may also write (4.10) as
$$\delta(x-y) = \int_{-\infty}^{\infty}\frac{dk'}{2\pi}\,e^{-ik'(x-y)}.$$
That is, all that changes is the sign of the exponent. This shows that δ(x − y) = δ(y − x).
Changing integration variables in this way, from k → −k, is a common trick, which is
worth getting used to.
Gaussian
Consider a Gaussian function
$$f(x) = \frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}, \qquad (4.11)$$
where σ measures the width of the Gaussian. The pre-factor was chosen so that f (x)
has unit area:
$$\int_{-\infty}^{\infty}\frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\,dx = 1. \qquad (4.12)$$
This is shown in Fig. 4.2(a). In the limit σ → 0, the Gaussian tends to a δ function.
The Fourier transform (4.1) will be
$$\tilde f(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\,e^{-ikx}.$$
To do the integral we complete the square in the exponent, $-\frac{x^2}{2\sigma^2} - ikx = -\frac{(x + ik\sigma^2)^2}{2\sigma^2} - \frac{k^2\sigma^2}{2}$, which gives
$$\tilde f(k) = \frac{e^{-k^2\sigma^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi\sigma^2}}\,e^{-(x+ik\sigma^2)^2/2\sigma^2}.$$
Changing variables to $y = x + ik\sigma^2$,$^1$
$$\tilde f(k) = \frac{e^{-k^2\sigma^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi\sigma^2}}\,e^{-y^2/2\sigma^2}.$$
The remaining integral is nothing but Eq. (4.12) again. Hence, we finally obtain
$$\tilde f(k) = \frac{e^{-k^2\sigma^2/2}}{\sqrt{2\pi}}. \qquad (4.13)$$
The Fourier Transform of a Gaussian is thus also a Gaussian, but with width 1/σ in-
stead of σ. Once again, we have this interplay where wide in real space implies thin in
Fourier space and vice-versa. This is shown in Fig. 4.2(b).
The inverse Fourier Transform (4.3) now states that
$$\frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}} = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-k^2\sigma^2/2}\,e^{ikx}. \qquad (4.14)$$
This is precisely a regularized version of Eq. (4.7): the factor $e^{-k^2\sigma^2/2}$ makes the integrand decay as $k \to \pm\infty$, and the result is a Gaussian, essentially bounded within a finite region of space. And this will be true no matter how small $\sigma$ is, as long as $\sigma \ne 0$.
1 This step is actually a bit subtle since y is complex, so that the integral is now in the complex plane,
instead of the real line. But using complex integration methods, one can show that this turns out to be
unimportant.
Figure 4.2: (a) The Gaussian (4.11) for different values of σ. (b) The corresponding Fourier
transform (4.13).
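As a quick sanity check, Eq. (4.13) is easy to verify numerically by discretizing the integral (4.1) directly. The sketch below is in Python/NumPy (my choice of tooling; the grid and the value of σ are arbitrary illustrative choices).

```python
import numpy as np

sigma = 0.5
x = np.linspace(-20, 20, 4001)      # wide grid, so the Gaussian has fully decayed
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)   # Eq. (4.11)

def ft(k):
    """Direct Riemann-sum approximation of the transform (4.1)."""
    return np.sum(f * np.exp(-1j * k * x)) * dx / np.sqrt(2 * np.pi)

for k in [0.0, 1.0, 3.0]:
    exact = np.exp(-k**2 * sigma**2 / 2) / np.sqrt(2 * np.pi)        # Eq. (4.13)
    print(k, abs(ft(k) - exact))    # differences are tiny
```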
All that matters is that in a full loop, you get an overall factor of 1/2π.
In the FT business, you have to be clear about which notation you are using. And
you have to be consistent and use the same notation throughout. If you are both clear
and consistent, then feel absolutely free to use whatever notation you prefer. Mathematica, for instance, uses $1/\sqrt{2\pi}$ and an $e^{+ikx}$ in the FT (opposite sign of (4.1)).
We will now see how the FT emerges from this series when L → ∞. Instead of labeling
the coefficients by n, let us label them by
$$k_n = \frac{2\pi n}{L}.$$
These kn are in one-to-one correspondence with n. Moreover, they are spaced by ∆kn =
2π/L. Thus, as L → ∞ the spacing between the kn becomes smaller and smaller,
suggesting we can use it to take the limit of a continuum.
In terms of $k_n$, we write (4.16) as
$$f(x) = \sum_{n\in\mathbb{Z}} c_n\, e^{ik_n x}.$$
To make this sum look like an integral, we introduce a “convenient 1”: since ∆kn =
$2\pi/L$, we can multiply $f(x)$ by
$$1 = \frac{L}{2\pi}\,\Delta k_n.$$
This leads to
$$f(x) = \frac{L}{2\pi}\sum_{n\in\mathbb{Z}} c_n\, e^{ik_n x}\,\Delta k_n.$$
Now the remaining sum looks exactly like the Riemann sum we learn in calculus: when
L → ∞ the increment ∆kn becomes infinitesimal and the kn tend to vary continuously.
Thus, defining
$$\tilde f(k_n) = \frac{L}{\sqrt{2\pi}}\,c_n, \qquad (4.17)$$
leads to
$$f(x) = \sum_{n}\frac{\Delta k_n}{\sqrt{2\pi}}\,\tilde f(k_n)\,e^{ik_n x} \;\to\; \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde f(k)\,e^{ikx},$$
which is exactly Eq. (4.3).
We can also do the same for the coefficients cn . From Eq. (4.16),
$$\tilde f(k_n) = \frac{L}{\sqrt{2\pi}}\,c_n = \frac{1}{\sqrt{2\pi}}\int_{-L/2}^{L/2} f(x)\,e^{-ik_n x}\,dx.$$
Taking the limit L → ∞ then yields exactly Eq. (4.1). Incidentally, we have also proven
that Eq. (4.3) is indeed the inverse operation of (4.1).
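Here is a small numerical illustration of this limit (Python/NumPy sketch; my own addition, with arbitrary grid choices): truncate a Gaussian to $[-L/2, L/2]$, compute a few coefficients $c_n$, and watch $L c_n/\sqrt{2\pi}$ approach the transform (4.13) as $L$ grows.

```python
import numpy as np

def coeff(nvals, L, npts=20001):
    """Fourier coefficients c_n of the Gaussian restricted to [-L/2, L/2]."""
    x = np.linspace(-L / 2, L / 2, npts)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)       # Gaussian with sigma = 1
    kn = 2 * np.pi * np.asarray(nvals) / L
    cn = np.array([np.sum(f * np.exp(-1j * k * x)) * dx / L for k in kn])
    return cn, kn

for L in [4.0, 8.0, 16.0]:
    cn, kn = coeff(range(0, 5), L)
    approx = L * cn / np.sqrt(2 * np.pi)             # Eq. (4.17)
    exact = np.exp(-kn**2 / 2) / np.sqrt(2 * np.pi)  # Eq. (4.13) with sigma = 1
    print(L, np.max(np.abs(approx - exact)))         # error shrinks as L grows
```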
Example: Consider the boxcar (4.4), but let us assume it is repeated periodically with period $L$ (we assume $a < L/2$). The Fourier coefficients are
$$c_n = \frac{1}{L}\int_{-L/2}^{L/2} B_a(x)\,e^{-ik_n x}\,dx = \frac{1}{2aL}\int_{-a}^{a} e^{-ik_n x}\,dx = \frac{\sin(k_n a)}{aLk_n},$$
so that, by Eq. (4.17), $\tilde f(k_n) = Lc_n/\sqrt{2\pi}$ coincides with the transform (4.5) evaluated at the discrete points $k_n$.
Real functions and other symmetries
The variable x is real, but the function f (x) can in principle be complex. Let us
see what happens if f (x) is real. We start with the Fourier transform (4.1) and take the
complex conjugate. But since f (x)∗ = f (x), we get
$$\tilde f(k)^* = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{ikx}\,dx. \qquad (4.18)$$
The only thing that changed is that e−ikx → eikx . So this would be the same as comput-
ing f˜(−k). Thus, we conclude that
$$\tilde f(k)^* = \tilde f(-k) \quad\text{when } f(x) \in \mathbb{R}. \qquad (4.19)$$
When the function is real, the Fourier coefficients may still be complex, in general. But they are not entirely arbitrary; instead, they satisfy this special symmetry.
What if f (x) is both real and even? This was the case of the Boxcar and Gaussian
examples we studied before. And, coincidentally, in both cases we found that f˜(k)
happened to be real. Is this general? We start with Eq. (4.18) and use the fact that
f (x) = f (−x):
$$\tilde f(k)^* = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(-x)\,e^{ikx}\,dx.$$
We now change variables to x0 = −x. There is a minus sign in the differential, dx0 =
−dx. But the integral is now from +∞ to −∞. So in the end we get a double sign
change, leading to
$$\tilde f(k)^* = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x')\,e^{-ikx'}\,dx' = \tilde f(k),$$
which is nothing but the original FT (4.1). Thus, if $f(x)$ is real and even, the FT will be real: $\tilde f(k)^* = \tilde f(k)$. Similarly, if $f(x)$ is real and odd, we use $f(-x) = -f(x)$ to get
$$\tilde f(k)^* = -\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(-x)\,e^{ikx}\,dx,$$
and repeating the same change of variables now yields $\tilde f(k)^* = -\tilde f(k)$.
Symmetries of the FT
• If f (x) is real, then f˜(k) will still be complex, but will satisfy f˜(k)∗ =
f˜(−k).
• If f (x) is real and even, the FT will be real, f˜(k)∗ = f˜(k).
• If f (x) is real and odd, the FT will be purely imaginary, f˜(k)∗ = − f˜(k).
$$\tilde f(\boldsymbol{k}, t) = \int\frac{d^3r}{(2\pi)^{3/2}}\,f(\boldsymbol{r}, t)\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}. \qquad (4.23)$$
Conversely, taking the FT in both $\boldsymbol{r}$ and $t$ would lead to a function $\tilde f(\boldsymbol{k}, \omega)$,
$$\tilde f(\boldsymbol{k}, \omega) = \int\frac{d^3r\,dt}{(2\pi)^{3/2}\sqrt{2\pi}}\,f(\boldsymbol{r}, t)\,e^{-i(\omega t + \boldsymbol{k}\cdot\boldsymbol{r})}, \qquad (4.24)$$
where ω is the Fourier variable associated to t, as in Eq. (4.2). We could have called it
a 4D vector k = (k1 , k2 , k3 , k4 ), but as I mentioned above, we like to distinguish time
and space.
In this case, it is also customary to define the time part with an opposite sign: as
we discussed around Eq. (4.15), it does not matter if we use eiωt or e−iωt in the FT. It is
just a matter of convention. So usually, when we have to take the FT in both time and
space, we would define it as
$$\tilde f(\boldsymbol{k}, \omega) = \int\frac{d^3r\,dt}{(2\pi)^{3/2}\sqrt{2\pi}}\,f(\boldsymbol{r}, t)\,e^{i(\omega t - \boldsymbol{k}\cdot\boldsymbol{r})}. \qquad (4.25)$$
That is, with e+iωt but e−ik·r . We don’t actually have to do this. We only do it be-
cause, as we saw in Sec. 3.9, plane waves usually have the form ei(ωt−k·r) , so Eq. (4.25)
naturally looks like an expansion in plane waves.
$$e^{ikx} = 1 + (ik)x + \frac{(ik)^2}{2!}x^2 + \frac{(ik)^3}{3!}x^3 + \dots$$
Plugging this in Eq. (4.27) and using the fact that the average is a linear operation, $\langle A + B\rangle = \langle A\rangle + \langle B\rangle$, we get
$$G(k) = 1 + (ik)\langle x\rangle + \frac{(ik)^2}{2!}\langle x^2\rangle + \frac{(ik)^3}{3!}\langle x^3\rangle + \dots = \sum_{n=0}^{\infty}\frac{(ik)^n}{n!}\,\langle x^n\rangle. \qquad (4.28)$$
Thus, we see that if we do a series expansion of G(k), the coefficient multiplying kn
will be proportional to hxn i.
So suppose you started with a certain P(x) and succeeded in finding G(k). Now we
expand it in a power series as
$$G(k) = \sum_{n=0}^{\infty}\frac{c_n k^n}{n!}, \qquad (4.29)$$
where
$$c_n = \frac{d^nG}{dk^n}\bigg|_{k=0}. \qquad (4.30)$$
Comparing (4.28) and (4.29) we can then conclude that $c_n = i^n\langle x^n\rangle$. Or, what is equivalent:
$$\langle x^n\rangle = \frac{1}{i^n}\,\frac{d^nG}{dk^n}\bigg|_{k=0}. \qquad (4.31)$$
We therefore obtain the moments by differentiating G(k). This is infinitely easier than
the integral (4.26).
Example: For the exponential distribution, $P(x) = \lambda e^{-\lambda x}$ with $x \ge 0$, the characteristic function is
$$G(k) = \lambda\int_0^{\infty} e^{-(\lambda - ik)x}\,dx = \lambda\,\frac{e^{-(\lambda-ik)x}}{-(\lambda-ik)}\bigg|_0^{\infty} = \frac{\lambda}{\lambda - ik},$$
where the term evaluated at ∞ vanished since λ > 0. We now use the Geometric series
$$\frac{1}{1-a} = 1 + a + a^2 + a^3 + a^4 + \dots,$$
which leads to
$$G(k) = \frac{1}{1 - ik/\lambda} = 1 + (ik/\lambda) + (ik/\lambda)^2 + (ik/\lambda)^3 + \dots$$
Comparing this with (4.28) we then find that
$$\frac{\langle x^n\rangle}{n!} = \frac{1}{\lambda^n} \quad\Longrightarrow\quad \langle x^n\rangle = \frac{n!}{\lambda^n}.$$
This therefore determines a neat and compact formula for all moments of the exponen-
tial distribution.
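The same computation can be done symbolically in a couple of lines (Python/SymPy sketch; assumed tooling, not part of the notes): differentiate $G(k)$ as in Eq. (4.31) and compare with $n!/\lambda^n$.

```python
import sympy as sp

k = sp.Symbol('k', real=True)
lam = sp.Symbol('lambda', positive=True)
G = lam / (lam - sp.I * k)        # characteristic function of the exponential

for n in range(1, 5):
    moment = sp.simplify(sp.diff(G, k, n).subs(k, 0) / sp.I**n)   # Eq. (4.31)
    print(n, moment, sp.factorial(n) / lam**n)                    # the two agree
```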
Example: Consider the Gaussian distribution
$$P(x) = \frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}. \qquad (4.33)$$
The Fourier transform √ was already computed in Eq. (4.13). We just need to adjust it
slightly because
√ of the 2π. In that occasion, we were using the definition (4.1), which
divided by 2π. The characteristic function (4.27) does not. Hence,
$$G(k) = e^{-k^2\sigma^2/2} = \sum_{n=0}^{\infty}\frac{(-1)^n k^{2n}\sigma^{2n}}{2^n\, n!} = \sum_{n=0}^{\infty}\frac{(ik)^{2n}\sigma^{2n}}{2^n\, n!}.$$
To make this look like (4.28), I introduced in the second equality a fake “i”, by writing
(−1)n k2n = (ik)2n . The resulting series contain only even powers in k. Hence, we can
immediately conclude that all odd moments must vanish, hx2n+1 i = 0. As for the even
moments, if we multiply and divide by (2n)! we get
$$G(k) = \sum_{n=0}^{\infty}\frac{(2n)!\,\sigma^{2n}}{2^n\, n!}\,\frac{(ik)^{2n}}{(2n)!}.$$
This is now exactly the even part of the series (4.28), so that we can recognize
$$\langle x^{2n}\rangle = \frac{(2n)!}{2^n\, n!}\,\sigma^{2n}. \qquad (4.34)$$
In particular, when n = 1 we obtain the second moment hx2 i = σ2 . Usually σ2 is the
variance hx2 i − hxi2 , but in this case hxi = 0.
This is the analog of the dot product, but for functions instead of vectors. In particular,
the inner product of f with itself is
$$(f, f) = \int_{-\infty}^{\infty}|f(x)|^2\,dx. \qquad (4.36)$$
Clearly, this quantity is non-negative and will be zero if and only if f (x) is zero. Func-
tions which are such that ( f, f ) is finite are said to be square integrable. Intuitively
speaking, square integrable functions are those that decay sufficiently fast as x → ±∞;
that is, that are essentially “confined” within some finite region of space (and hence
vanish at x → ±∞).
Using the definition of the Inverse FT, Eq. (4.3), we can also write (4.35) as
$$(f, g) = \int_{-\infty}^{\infty}dx\int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde f^*(k)\,e^{-ikx}\int_{-\infty}^{\infty}\frac{dk'}{\sqrt{2\pi}}\,\tilde g(k')\,e^{ik'x} = \int dk\,dk'\;\tilde f^*(k)\,\tilde g(k')\int_{-\infty}^{\infty}\frac{dx}{2\pi}\,e^{i(k'-k)x}.$$
All I did here was to change the order of the integrals, so that we can first integrate
over x. This is convenient since the resulting integral is nothing but the δ-function
representation (4.10):
$$\int_{-\infty}^{\infty}\frac{dx}{2\pi}\,e^{i(k'-k)x} = \delta(k - k').$$
The $\delta$ function then kills the $k'$ integral, and we are left with
$$(f, g) = \int_{-\infty}^{\infty} f^*(x)\,g(x)\,dx = \int_{-\infty}^{\infty}\tilde f^*(k)\,\tilde g(k)\,dk. \qquad (4.37)$$
This is known as Parseval’s relation. It means we can take the inner product in real
space or in Fourier space; it does not matter. In particular, if we take the inner product
of f (x) with itself, we are essentially computing the norm of the function. In this case
Eq. (4.37) yields
$$\int_{-\infty}^{\infty}|f(x)|^2\,dx = \int_{-\infty}^{\infty}|\tilde f(k)|^2\,dk. \qquad (4.38)$$
One may, in fact, show that the Fourier Transform maps the space of square integrable
functions onto itself, in a one-to-one manner. Eq. (4.38) corroborates this idea.
Figure 4.3: The Fourier Transform as a machine/blackbox. The input function f (x) is processed
by the transform F to output a new function f˜(k).
This is also an expansion in a basis: now the basis elements are $\delta(x-y)$ and the coefficients are $f(y)$. Both $e^{ikx}/\sqrt{2\pi}$ and $\delta(x-y)$ are valid choices of basis for the space of square integrable functions. And Parseval's identity (4.38) is
essentially saying that the choice of basis does not affect the inner product (just
like in the usual case of vectors).
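Here is a quick numerical illustration of Parseval's identity (4.38) (Python/NumPy sketch; my own addition, with arbitrary parameters): for the Gaussian (4.11) and its transform (4.13), the two integrals agree.

```python
import numpy as np

sigma = 0.7
x = np.linspace(-30, 30, 6001)
k = np.linspace(-30, 30, 6001)
dx, dk = x[1] - x[0], k[1] - k[0]

f  = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)   # Eq. (4.11)
ft = np.exp(-k**2 * sigma**2 / 2) / np.sqrt(2 * np.pi)                # Eq. (4.13)

# Both sides of (4.38); they agree (and equal 1/(2*sigma*sqrt(pi))).
print(np.sum(np.abs(f)**2) * dx, np.sum(np.abs(ft)**2) * dk)
```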
Indeed, F is a linear operator from the space of square integrable functions to itself.
Similarly, I will leave it for you as an exercise to show that
Or that
$$\mathcal{F}\big[f(x/a)\big] = a\,\tilde f(ak). \qquad (4.41)$$
Most importantly for us will be what happens when we take derivatives or integrals
of Fourier Transforms. Start with
$$f(x) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde f(k)\,e^{ikx}.$$
Differentiating both sides with respect to $x$ simply brings down a factor of $ik$ inside the integral:
$$f'(x) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\big(ik\,\tilde f(k)\big)\,e^{ikx}.$$
Thus, we see that if the FT of $f(x)$ is $\tilde f(k)$, then the FT of $f'(x)$ will be $ik\tilde f(k)$. In
Fourier space, derivatives are mapped into multiplication by ik:
$$\frac{d}{dx} \;\Longleftrightarrow\; ik. \qquad (4.42)$$
We can also see this in another way. Start with the FT of f 0 (x):
$$\mathcal{F}\big[f'(x)\big] = \int\frac{dx}{\sqrt{2\pi}}\,f'(x)\,e^{-ikx}.$$
Now we integrate by parts. This just transfers the derivative from $f'$ to $e^{-ikx}$, plus a boundary term:
$$\mathcal{F}\big[f'(x)\big] = \frac{1}{\sqrt{2\pi}}\,f(x)\,e^{-ikx}\Big|_{-\infty}^{\infty} - \int\frac{dx}{\sqrt{2\pi}}\,f(x)\,\frac{d}{dx}e^{-ikx}.$$
But if the function is square integrable, it must vanish at ±∞, so the boundary term
vanishes. Moreover, the d/dx in the last term simply produces a factor of −ik, so that
we are left with
$$\mathcal{F}\big[f'(x)\big] = ik\int\frac{dx}{\sqrt{2\pi}}\,f(x)\,e^{-ikx} = ik\,\tilde f(k),$$
which again shows that the FT of f 0 (x) is just ik f˜(k).
We may also repeat the process as many times as we want. For instance,
$$\mathcal{F}\big[f''(x)\big] = (ik)^2\,\tilde f(k).$$
Each time we differentiate, we simply get an extra ik. And the reverse logic holds for
the indefinite integral of f (x):
$$\mathcal{F}\Big[\int^x dx'\,f(x')\Big] = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\frac{\tilde f(k)}{ik}\,e^{ikx}.$$
When we integrate, we get instead 1/ik. These properties, as we will see in the next
section, will be the key for solving PDEs using Fourier Transforms.
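The rule (4.42) is also easy to test on a computer with the discrete FFT (Python/NumPy sketch below; my own addition). The DFT normalization differs from (4.1), but that does not matter here, since we only multiply by $ik$ and transform back.

```python
import numpy as np

N, Lbox = 1024, 40.0
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])   # wave numbers of the DFT grid

f = np.exp(-x**2)                                  # a smooth, localized test function
df_exact = -2 * x * np.exp(-x**2)

# differentiate in Fourier space: multiply the transform by ik, then invert
df_fft = np.fft.ifft(1j * k * np.fft.fft(f)).real

print(np.max(np.abs(df_fft - df_exact)))           # tiny (spectral accuracy)
```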
Convolutions
The convolution between two functions f (x) and g(x) is another function
$$(f * g)(x) = \int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y). \qquad (4.43)$$
This may seem like a weird way of combining two functions. But it turns out convolutions appear often in physics, especially in connection with differential equations and Green's functions. For instance, in Sec. 2.4 we saw how the particular solution of an inhomogeneous ODE $Ly = f(t)$ could be written as
$$y_p(t) = \int_{-\infty}^{\infty} G(t-t')\,f(t')\,dt',$$
where the Green’s function G(t) was the solution of LG(t) = δ(t). As you can see, this
is precisely a convolution.
The convolution (4.43) is actually symmetric in f and g:
$$(f * g)(x) = \int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y) = \int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,g(x-y)\,f(y) = (g * f)(x). \qquad (4.44)$$
To see that, we need to change variables in (4.43) to y0 = x − y. This makes dy0 = −dy,
but also flips the integration limits, so that in the end both changes cancel each other:
$$\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y) = -\int_{\infty}^{-\infty}\frac{dy'}{\sqrt{2\pi}}\,f(y')\,g(x-y') = \int_{-\infty}^{\infty}\frac{dy'}{\sqrt{2\pi}}\,g(x-y')\,f(y').$$
The most important property of convolutions is the convolution theorem:
$$\mathcal{F}\big[f * g\big] = \tilde f(k)\,\tilde g(k). \qquad (4.45)$$
To see why, we simply take the Fourier Transform (4.1) of f ∗g, using also the definition
of the convolution in (4.43)
$$\mathcal{F}\big[f * g\big] = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,e^{-ikx}\,(f * g)(x) = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,e^{-ikx}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y).$$
We now do the following sorcery: we write e−ikx = e−ik(x−y) e−iky (woooow! Ninja!).
This allows us to rearrange the integrals and put the one over y to the left:
$$\mathcal{F}\big[f * g\big] = \int\frac{dy}{\sqrt{2\pi}}\,e^{-iky}\,g(y)\int\frac{dx}{\sqrt{2\pi}}\,e^{-ik(x-y)}\,f(x-y).$$
The reason why this is useful is because, if we now change variables to x0 = x − y in
the x integral, while keeping the y integral intact, we get
$$\mathcal{F}\big[f * g\big] = \int\frac{dy}{\sqrt{2\pi}}\,e^{-iky}\,g(y)\int\frac{dx'}{\sqrt{2\pi}}\,e^{-ikx'}\,f(x').$$
The two integrals are now completely factored. And, what is more, each integral is
nothing but the Fourier Transforms of g(x) and f (x). We therefore arrive at Eq. (4.45).
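Below is a small numerical check of the convolution theorem (4.45), with the $1/\sqrt{2\pi}$ convention of (4.43) (Python/NumPy sketch; my own addition, using two arbitrary test functions).

```python
import numpy as np

x = np.linspace(-25, 25, 2001)
dx = x[1] - x[0]
f = np.exp(-x**2)                      # two test functions
g = np.exp(-(x - 1)**2 / 2)

def ft(h, k):
    """Fourier transform (4.1) by direct Riemann sum."""
    return np.sum(h * np.exp(-1j * k * x)) * dx / np.sqrt(2 * np.pi)

def conv_at(x0):
    """Convolution (4.43), with its 1/sqrt(2 pi), evaluated at x0."""
    return np.sum(np.exp(-(x0 - x)**2) * g) * dx / np.sqrt(2 * np.pi)

k0 = 0.7                               # check Eq. (4.45) at one value of k
lhs = np.sum([conv_at(xi) * np.exp(-1j * k0 * xi) for xi in x]) * dx / np.sqrt(2 * np.pi)
rhs = ft(f, k0) * ft(g, k0)
print(abs(lhs - rhs))                  # should be very small
```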
defined for all x ∈ R. This is a Cauchy problem: we impose initial conditions, but
no boundary conditions. The best way of solving this is via Fourier Transforms. We
define the FT with respect only to the position x as
$$\tilde u(k, t) = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u(x, t)\,e^{-ikx}, \qquad (4.47)$$
which is therefore still a function of $t$. To take the FT of (4.46), we multiply both sides by $e^{-ikx}/\sqrt{2\pi}$ and integrate from $-\infty$ to $\infty$:
$$\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u_t(x, t)\,e^{-ikx} = \alpha\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u_{xx}(x, t)\,e^{-ikx}. \qquad (4.48)$$
Heat in Fourier space
Combining Eqs. (4.49) and (4.50) in Eq. (4.48) we then finally find that
$$\partial_t\,\tilde u(k, t) = -\alpha k^2\,\tilde u(k, t) \quad\Longrightarrow\quad \tilde u(k, t) = e^{-\alpha k^2 t}\,\tilde u(k, 0), \qquad (4.51)$$
where ũ(k, 0) is the initial condition, which can be found directly from Eq. (4.47):
$$\tilde u(k, 0) = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u_0(x)\,e^{-ikx}. \qquad (4.52)$$
Moving to Fourier space makes solving the PDE very easy. But now we have to go
back to real space, by taking the inverse FT:
$$u(x, t) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde u(k, t)\,e^{ikx} = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde u(k, 0)\,e^{ikx - \alpha k^2 t}. \qquad (4.53)$$
To actually compute this integral, we need to know ũ(k, 0) or, what is equivalent, the
initial condition u0 (x).
Consider first the initial condition $u_0(x) = \delta(x - x_0)$.
If we think of u as a concentration, this then means that the problem started with u
being sharply concentrated on a certain point x0 . The corresponding initial condition
in Fourier space, Eq. (4.52), will be simply
$$\tilde u(k, 0) = \frac{e^{-ikx_0}}{\sqrt{2\pi}}.$$
Plugging this in Eq. (4.53) leads to
$$u(x, t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-\alpha k^2 t}\,e^{ik(x - x_0)}. \qquad (4.54)$$
Figure 4.4: The solution (4.55) of the 1D heat equation for a δ initial condition, with x0 = 0 and
different values of αt.
This is exactly the Fourier transform of the Gaussian; see, for instance, Eq. (4.14) with
$x$ replaced by $x - x_0$ and $\sigma^2 = 2\alpha t$. The result will thus be Eq. (4.11):
$$u(x, t) = \frac{1}{\sqrt{4\pi\alpha t}}\,e^{-\frac{(x-x_0)^2}{4\alpha t}}, \qquad t > 0. \qquad (4.55)$$
The initial δ-peak therefore evolves as a Gaussian, centered at position x0 but with a
growing variance σ2 = 2αt. That is, as time evolves the Gaussian spreads out. This
makes a lot of sense, when we think of u as a concentration. The variance of the
Gaussian is $\sigma^2 = 2\alpha t$, so that the standard deviation scales as
$$\sigma \sim \sqrt{t}. \qquad (4.56)$$
This type of spreading is usually taken as a trademark of diffusion. That is, when dealing with more general or complicated problems, we say that a problem is "diffusive" if $\sigma \sim \sqrt{t}$.
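The whole procedure (4.51)–(4.55) can be mimicked on a computer with the discrete FFT. The sketch below (Python/NumPy; my own addition, with arbitrary parameters) evolves a narrow Gaussian initial condition and compares with the exact answer: a Gaussian whose variance grows by $2\alpha t$.

```python
import numpy as np

alpha, t = 0.5, 2.0
N, Lbox = 2048, 80.0
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])

s0 = 0.4                                           # width of the initial Gaussian
u0 = np.exp(-x**2 / (2 * s0**2)) / np.sqrt(2 * np.pi * s0**2)

# heat equation in Fourier space: u~(k,t) = exp(-alpha k^2 t) u~(k,0), cf. (4.51)
u = np.fft.ifft(np.exp(-alpha * k**2 * t) * np.fft.fft(u0)).real

# analytic answer: a Gaussian whose variance grew from s0^2 to s0^2 + 2*alpha*t
s2 = s0**2 + 2 * alpha * t
u_exact = np.exp(-x**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

print(np.max(np.abs(u - u_exact)))                 # very small
```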
For a generic initial condition, we can plug (4.52) in (4.53), leading to
$$u(x, t) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,e^{ikx - \alpha k^2 t}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,u_0(y)\,e^{-iky}.$$
Exchanging the order of the integrals, the integral over $k$ is exactly the $\delta$-function solution (4.54)–(4.55), with $x_0$ replaced by $y$. We thus find
$$u(x, t) = \int_{-\infty}^{\infty}dy\;u_0(y)\,\frac{1}{\sqrt{4\pi\alpha t}}\,e^{-\frac{(x-y)^2}{4\alpha t}}. \qquad (4.57)$$
The solution (4.55) of the δ initial condition is also called the Green's function of the heat equation:
$$G(x, t) = \frac{1}{\sqrt{4\pi\alpha t}}\,e^{-\frac{x^2}{4\alpha t}}\,\theta(t). \qquad (4.58)$$
With this definition, we see that the general solution (4.57) can be written as
$$u(x, t) = \int_{-\infty}^{\infty}dy\;u_0(y)\,G(x - y, t). \qquad (4.59)$$
We therefore see that from the Green’s function, we generate the solution to
any initial condition by taking the convolution of u0 (y) with G(x − y, t).
Green's functions and inhomogeneous PDEs
Define $\tilde G(k, t) = e^{-\alpha k^2 t}$. This is the FT of the Green's function (4.58) [c.f.
Eq. (4.54)]. With this we can then write the solution as
$$\tilde u(k, t) = \int_0^t dt'\;\tilde G(k, t - t')\,\tilde f(k, t'). \qquad (4.63)$$
This result is fairly cool: we have a PDE in x and t, but we only moved to
Fourier space with respect to x. As a result, we find that ũ(k, t) is a convolution
in time between G̃ and f˜. But as far as x and k are concerned, this is just a
product of the two Fourier transforms. Thus, according to the convolution the-
orem (4.45), if we now move back to real space, the result will be a convolution
in position between G and f (the integral in t0 does not interfere at all):
$$u(x, t) = \int_0^t dt'\int_{-\infty}^{\infty}dy\;G(x - y, t - t')\,f(y, t'). \qquad (4.64)$$
Isn’t this awesome?! The Green’s function propagates the solution. It says how
u(x, t) responds to the forcing f (y, t0 ), that occurred at previous times t0 and at
different positions y.
We haven’t yet proven our claim about the initial conditions, though. So far this is
general and holds for any inhomogeneity f . We now specialize it to f (x, t) = δ(t)δ(x −
x0 ). This kills both integrals in Eq. (4.64), leaving us precisely with u(x, t) = G(x −
x0 , t). Thus, indeed, what we are calling a Green’s function is actually a solution of two
different, but related problems:
• G(x, t) is the solution of ut − αu xx = 0, with u0 (x) = δ(x).
• G(x, t) is the solution of ut − αu xx = δ(t)δ(x), with u0 (x) = 0.
Notice also how, in both cases, we use G(x, t) to build more general solutions:
• The solution of ut − αu xx = 0, with arbitrary u0 (x) can be computed from
Eq. (4.59).
• The solution of ut − αu xx = f (x, t) with u0 (x) = 0 but arbitrary f (x, t) can be
computed from Eq. (4.64).
where r = (x1 , x2 , . . . , xd ), with d = 1, 2, 3, . . . being the dimension of the system (if
this confuses, imagine that d = 3 and (x1 , x2 , x3 ) = (x, y, z)). We now take the Fourier
Transform of the spatial coordinates, by defining
$$\tilde u(\boldsymbol{k}, t) = \int\frac{d^dr}{(2\pi)^{d/2}}\,u(\boldsymbol{r}, t)\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}, \qquad (4.66)$$
where dd r = dx1 dx2 . . . dxd . Taking the FT on both sides of (4.65) then leads to
$$\partial_t\,\tilde u(\boldsymbol{k}, t) = \alpha\int\frac{d^dr}{(2\pi)^{d/2}}\,(\nabla^2 u)\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}. \qquad (4.67)$$
Since $\nabla^2 u = \partial_{x_1}^2 u + \dots + \partial_{x_d}^2 u$, the right-hand side will involve a sum of terms. In each term, the same logic of the 1D case means we should replace $\partial_{x_j}$ by $ik_j$. Put differently, the substitution logic is now generalized to
$$\partial_{x_j} \;\Longleftrightarrow\; ik_j,$$
so that Eq. (4.67) becomes $\partial_t\,\tilde u(\boldsymbol{k}, t) = -\alpha k^2\,\tilde u(\boldsymbol{k}, t)$, with $k^2 = k_1^2 + \dots + k_d^2$,
whose solution is
$$\tilde u(\boldsymbol{k}, t) = e^{-\alpha k^2 t}\,\tilde u(\boldsymbol{k}, 0).$$
Things will start to change once we plug this back in the inverse FT:
$$u(\boldsymbol{r}, t) = \int\frac{d^dk}{(2\pi)^{d/2}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r} - \alpha k^2 t}\,\tilde u(\boldsymbol{k}, 0).$$
This integral is now more complicated because it is multidimensional. To learn how to
do it, we focus on the Green’s function case, where ũ(k, 0) = 1/(2π)d/2 . In this case
the solution will read
$$G(\boldsymbol{r}, t) = \int\frac{d^dk}{(2\pi)^{d}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r} - \alpha k^2 t}. \qquad (4.70)$$
If we can compute this integral, then the solution for a generic initial condition will be,
in analogy with Eq. (4.59),
$$u(\boldsymbol{r}, t) = \int d^dr'\;u_0(\boldsymbol{r}')\,G(\boldsymbol{r} - \boldsymbol{r}', t). \qquad (4.71)$$
Integrals of the form (4.70) can be very difficult. But today is actually our lucky
day, because this one in particular is super easy! What we have to realize is that k2 =
k12 + . . . + kd2 , so that the integral factors as a product:
$$G(\boldsymbol{r}, t) = \int\frac{dk_1}{2\pi}\,e^{ik_1x_1 - \alpha k_1^2 t}\int\frac{dk_2}{2\pi}\,e^{ik_2x_2 - \alpha k_2^2 t}\cdots\int\frac{dk_d}{2\pi}\,e^{ik_dx_d - \alpha k_d^2 t}.$$
Each integral is just a copy of the 1D Gaussian integral that led us from (4.54) to (4.55).
This is really an exercise in pattern matching: we don’t have to redo anything; just re-
cycle previous results. We therefore conclude that G(r, t) will be a product of solutions
of the form (4.55):
$$G(\boldsymbol{r}, t) = \frac{1}{(4\pi\alpha t)^{d/2}}\,e^{-x_1^2/4\alpha t}\,e^{-x_2^2/4\alpha t}\cdots e^{-x_d^2/4\alpha t}.$$
Finally, we can combine everything and write G solely in terms of r2 = x12 + . . . + xd2 :
$$G(\boldsymbol{r}, t) = \frac{1}{(4\pi\alpha t)^{d/2}}\,e^{-r^2/4\alpha t}. \qquad (4.72)$$
Diffusion in arbitrary dimensions is therefore also Gaussian. The concentration of a
particle in a river, or in the air, diffuses in all directions.
One interesting thing to notice about Eq. (4.72) is that the Green’s function depends
only on the magnitude of the position r = |r|. This is a consequence of the fact that
Eq. (4.65) is isotropic; that is, it is symmetric under rotations. We could also study
anisotropic diffusion: it would read something like $u_t = \alpha_1\partial_{x_1}^2 u + \dots + \alpha_d\partial_{x_d}^2 u$. That is, with different diffusion constants for each direction. I will leave it for you as
an exercise to think about how the Green’s function would change in this case.
is
$$u(\boldsymbol{r}, t) = \int d^dr'\;u_0(\boldsymbol{r}')\,G(\boldsymbol{r} - \boldsymbol{r}', t),$$
where
$$G(\boldsymbol{r}, t) = \frac{1}{(4\pi\alpha t)^{d/2}}\,e^{-r^2/4\alpha t}.$$
where Ĥ is called the Hamiltonian operator. In this section, to be more careful, I will
put hats on operators, so that we know they are not just plain numbers. In the simplest
case of a free particle in 1D, we have
$$\hat H = \frac{\hat p^2}{2m}, \qquad (4.74)$$
where m is the mass and
$$\hat p = -i\hbar\frac{\partial}{\partial x}, \qquad (4.75)$$
is the momentum operator. Eq. (4.73) then becomes
$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial x^2}, \qquad (4.76)$$
which is similar to the heat equation (4.46), but with a complex left-hand side.
There is, however, a fundamental conceptual difference. The wavefunction ψ(x, t)
represents a probability amplitude (instead of a probability). The actual probability is $|\psi(x, t)|^2$. Thus, for instance, the average position of the particle is given by
$$\langle x\rangle = \int dx\;|\psi(x, t)|^2\,x. \qquad (4.77)$$
The order of terms inside the integral matters: Ô is a differential operator and therefore
acts on anything to its right. So the integrand in (4.80) means that we must first act
with Ô on ψ and then multiply the result by ψ∗ . For instance, the average momentum
is
$$\langle\hat p\rangle = -i\hbar\int dx\;\psi^*\,\frac{\partial\psi}{\partial x}, \qquad (4.81)$$
and the average momentum squared is
$$\langle\hat p^2\rangle = -\hbar^2\int dx\;\psi^*\,\frac{\partial^2\psi}{\partial x^2}. \qquad (4.82)$$
Again, we use these to compute the variance of the momentum,
$$\sigma_p^2 = \langle\hat p^2\rangle - \langle\hat p\rangle^2, \qquad (4.83)$$
which measures overall fluctuations of the momentum. Similarly, we can compute the
average kinetic energy
$$\langle\hat H\rangle = \frac{\langle\hat p^2\rangle}{2m}. \qquad (4.84)$$
The position is also an operator x̂, but is one whose effect is kind of trivial: applying x̂
on ψ is the same as multiplying ψ by the number x.
In classical mechanics we describe systems in terms of both position and momen-
tum. In quantum mechanics the wavefunction is only a function of x, while momentum
is upgraded to an operator. We then extract information about the momentum by taking
averages such as (4.80).
Let us now insert the inverse FT of $\psi$ into Eq. (4.81):
$$\langle\hat p\rangle = -i\hbar\int dx\left(\int\frac{dk'}{\sqrt{2\pi}}\,e^{-ik'x}\,\tilde\psi^*(k', t)\right)\frac{\partial}{\partial x}\left(\int\frac{dk}{\sqrt{2\pi}}\,e^{ikx}\,\tilde\psi(k, t)\right).$$
The derivative ∂/∂x acts only on eikx , leaving us with
$$\langle\hat p\rangle = -i\hbar\int\frac{dx\,dk'\,dk}{2\pi}\,e^{-ik'x}\,\tilde\psi^*(k', t)\,(ik)\,e^{ikx}\,\tilde\psi(k, t) = \int dk\,dk'\;(\hbar k)\,\tilde\psi^*(k', t)\,\tilde\psi(k, t)\int\frac{dx}{2\pi}\,e^{i(k-k')x}$$
$$= \int dk\,dk'\;(\hbar k)\,\tilde\psi^*(k', t)\,\tilde\psi(k, t)\,\delta(k - k') = \int dk\;(\hbar k)\,|\tilde\psi(k, t)|^2.$$
We therefore reach the important conclusion that, if we work in Fourier space, the
average momentum (4.81) becomes simply
$$\langle\hat p\rangle = \int dk\;(\hbar k)\,|\tilde\psi(k, t)|^2. \qquad (4.86)$$
This looks exactly like the average position (4.77). There are no derivatives. We simply average $\hbar k$ over all $\tilde\psi$. Thus, $|\tilde\psi(k, t)|^2$ can be directly recognized as the probability for finding the particle with momentum $\hbar k$, just like $|\psi(x, t)|^2$ is the probability of finding it
at position x. Moving to Fourier space is therefore the same as moving to momentum
space: k and momentum are just an ~ apart.
Gaussian wavepacket
Consider a Gaussian wavefunction
$$\psi(x) = \frac{1}{(2\pi\sigma^2)^{1/4}}\,e^{iqx - (x-x_0)^2/4\sigma^2}, \qquad (4.87)$$
where x0 , σ and q are parameters. We are not worrying about any possible time depen-
dence. You can think of this as being the state of the system at some fixed time. We
could use this as an initial condition for Eq. (4.76) and it would then start to evolve
as time goes on. But for now let us just think about the properties of this state, at one
given instant of time.
The constant in front of (4.87) is chosen so that the wavefunction is properly nor-
malized [compare with Eq. (4.12)],
$$\int dx\;|\psi|^2 = \frac{1}{(2\pi\sigma^2)^{1/2}}\int dx\;e^{-(x-x_0)^2/2\sigma^2} = 1. \qquad (4.88)$$
A repetition of the Gaussian integrals we have already done quite a few times will show
that
$$\langle x\rangle = x_0, \qquad \langle x^2\rangle = x_0^2 + \sigma^2, \qquad (4.89)$$
so that the x0 and σ2 are directly interpreted as the average position and the variance
of the Gaussian wavepacket: σ x = σ.
But what about the parameter q? It turns out it is related to momentum. Applying
the momentum operator to ψ, we find, with some small simplifications,
$$\hat p\,\psi = -i\hbar\left(iq - \frac{x - x_0}{2\sigma^2}\right)\psi. \qquad (4.90)$$
We now use this in Eq. (4.81); that is, we multiply by ψ∗ and integrate, which leads to
$$\langle\hat p\rangle = -i\hbar\int dx\left(iq - \frac{x - x_0}{2\sigma^2}\right)|\psi|^2.$$
The second term will be an average of $x - x_0$, which is zero since $\langle x\rangle = x_0$. In the first term, on the other hand, we can put $iq$ outside the integral. All that remains is therefore $|\psi|^2$, which is normalized to 1. Whence, we finally arrive at
$$\langle\hat p\rangle = \hbar q. \qquad (4.91)$$
The parameter q in Eq. (4.87) is therefore directly related to the average momentum.
What about the variance of p̂? To find that, we first compute the second moment.
Applying p̂2 to ψ is a little bit messier, but I will just tell you the result:
$$\hat p^2\psi = -\hbar^2\left\{\frac{(x - x_0)^2}{4\sigma^4} - q^2 - \frac{iq}{\sigma^2}(x - x_0) - \frac{1}{2\sigma^2}\right\}\psi.$$
We insert this in Eq. (4.80). The term which is quadratic in (x − x0 )2 , when averaged,
yields exactly σ2 . And the term linear in x − x0 integrates to zero. Whence, we find
$$\langle\hat p^2\rangle = -\hbar^2\left(\frac{1}{4\sigma^2} - q^2 - \frac{1}{2\sigma^2}\right),$$
or, simplifying a bit,
$$\langle\hat p^2\rangle = \hbar^2 q^2 + \frac{\hbar^2}{4\sigma^2}.$$
The variance in momentum, $\sigma_p^2 = \langle\hat p^2\rangle - \langle\hat p\rangle^2$, will thus be
$$\sigma_p^2 = \frac{\hbar^2}{4\sigma^2}. \qquad (4.92)$$
As one might expect, we see here the same kind of trade-off we saw in the Fourier
business: a large variance σ2 in position implies a small variance ~2 /4σ2 in momen-
tum, and vice-versa. This trade-off is neatly summarized by computing the uncertainty
product:
$$\sigma_x\,\sigma_p = \frac{\hbar}{2}. \qquad (4.93)$$
The product of the two uncertainties is constant. So if we want to decrease one, we must increase the other.
Gaussian wavepackets
For the Gaussian wavepacket (4.87),
$$\langle x\rangle = x_0, \qquad \sigma_x = \sigma, \qquad \langle\hat p\rangle = \hbar q, \qquad \sigma_p = \frac{\hbar}{2\sigma},$$
so that $\sigma_x\sigma_p = \hbar/2$. I will leave it for you as an exercise to show that the Fourier Transform is
$$\tilde\psi(k) = \left(\frac{2\sigma^2}{\pi}\right)^{1/4} e^{-i(k-q)x_0 - (k-q)^2\sigma^2},$$
which is itself a Gaussian in $k$, centered at $q$. For a general wavefunction $\psi$, the uncertainty product is instead bounded from below by Heisenberg's uncertainty principle,
$$\sigma_x\,\sigma_p \ge \frac{\hbar}{2}. \qquad (4.94)$$
The principle states that there is a lower bound to the uncertainty product. We can never
know both x and p with infinite precision. The Gaussian wavepacket is a limiting case,
where the bound is saturated. And even in this case, infinite precision on x would imply
infinite ignorance on p, and vice-versa. For other states ψ, we always get something
larger.
We are going to prove Eq. (4.94). But the first thing to realize is that Heisenberg’s
uncertainty is not only a quantum thing: it is actually a direct consequence of Fourier
analysis and even has important applications in, e.g., signal processing. In fact, it can
be essentially summarized by the statement that a function and its Fourier transform
cannot both be sharply localized. Our proof emphasizes this connection. Actually, to
really emphasize it, I am going to call our function f (x) and its Fourier transform f˜(k).
That way we can be sure that it holds for any square-integrable function. We assume, without loss of generality, that $f(x)$ is normalized, $\int|f(x)|^2dx = 1$, and we will denote by $\langle x^2\rangle = \int x^2|f(x)|^2dx$ and $\langle k^2\rangle = \int k^2|\tilde f(k)|^2dk$ the corresponding second moments.
We can also suppose, again without loss of generality, that the means vanish, hxi =
hki = 0.2 The basic idea, therefore, is to compare the width of f (x) in real space, with
the width of f˜(k) in Fourier space. This is the essence of Heisenberg’s principle.
To start, consider the integral
$$I(\lambda) = \int\Big|\lambda x f(x) + \partial_x f(x)\Big|^2\,dx \;\ge\; 0. \qquad (4.95)$$
Expanding the square, we get
$$I(\lambda) = \lambda^2\int x^2|f|^2\,dx + \lambda\int x\big(f^*\partial_x f + (\partial_x f^*)f\big)\,dx + \int|\partial_x f|^2\,dx. \qquad (4.96)$$
The first integral is simply hx2 i. In the second integral, we integrate by parts one of the
terms. Assuming all boundary contributions vanish, we get only
$$\int(\partial_x f^*)\,x f\,dx = -\int f^*\,\partial_x(x f)\,dx = -\int|f|^2\,dx - \int x f^*\,\partial_x f\,dx.$$
The last term is going to cancel a corresponding term in Eq. (4.96), while the first is 1
by normalization. Thus, we are left with
$$I(\lambda) = \lambda^2\langle x^2\rangle - \lambda + \int|\partial_x f|^2\,dx.$$
$^2$ If the means don't vanish, define a new function $F(x) = f(x + x_0)\,e^{-ik_0x}$, where $x_0 = \langle x\rangle$ and $k_0 = \langle k\rangle$. Then one may verify that $\langle x^2\rangle_F = \langle(x - x_0)^2\rangle_f$ and $\langle k^2\rangle_F = \langle(k - k_0)^2\rangle_f$, where $\langle\dots\rangle_F$ means an average over $F$ (and $\tilde F$) instead of $f$.
Finally, we play with the last term. Recall Parseval's identity:
$$\int|f(x)|^2\,dx = \int|\tilde f(k)|^2\,dk.$$
Moreover, recall that if f˜ is the FT of f , then the FT of ∂ x f will be ik f˜. Thus, Parseval’s
identity for f 0 must yield
$$\int|\partial_x f|^2\,dx = \int\big|ik\tilde f(k)\big|^2\,dk = \langle k^2\rangle.$$
If this
R Parseval argument did not convince you, try plugging the definition of the FT
into |∂ x f |2 dx and show that it is indeed hk2 i. In any case, with this last result, we
finally get
$$I(\lambda) = \lambda^2\langle x^2\rangle - \lambda + \langle k^2\rangle. \qquad (4.97)$$
We started with I(λ) > 0, so this must of course continue to be true. But we can also
look at I(λ) as being a quadratic polynomial in λ. The condition to have I(λ) > 0 is for
this polynomial to have no real roots. The discriminant must thus be non-positive:$^3$
$$1 - 4\langle x^2\rangle\langle k^2\rangle \le 0. \qquad (4.98)$$
Whence,
$$\langle x^2\rangle\langle k^2\rangle \ge \frac{1}{4}. \qquad (4.99)$$
This is Heisenberg’s uncertainty solely from Fourier analysis. We can recover the quan-
tum result (4.94) by noticing that h p̂2 i = ~2 hk2 i.
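Equation (4.99) is easy to probe numerically (Python/NumPy sketch; my own addition): compute $\langle x^2\rangle$ and $\langle k^2\rangle = \int|\partial_x f|^2dx$ for a normalized $f$. A Gaussian saturates the bound, while other functions exceed it.

```python
import numpy as np

x = np.linspace(-40, 40, 16001)
dx = x[1] - x[0]

def uncertainty_product(f):
    f = f / np.sqrt(np.sum(np.abs(f)**2) * dx)     # normalize f
    x2 = np.sum(x**2 * np.abs(f)**2) * dx          # <x^2>
    df = np.gradient(f, dx)
    k2 = np.sum(np.abs(df)**2) * dx                # <k^2> = int |f'|^2 dx
    return x2 * k2

print(uncertainty_product(np.exp(-x**2 / 4)))      # Gaussian: ~0.25 (bound saturated)
print(uncertainty_product(1 / np.cosh(x)))         # larger than 0.25
```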
$$i\hbar\,\partial_t\tilde\psi = E_k\,\tilde\psi, \qquad E_k := \frac{\hbar^2k^2}{2m}. \qquad (4.100)$$
The solution is just ψ̃(k, t) = e−iEk t/~ ψ̃(k, 0). This motivates us to define the Fourier-
space Green’s function
$$\tilde G(k, t) = e^{-iE_k t/\hbar}. \qquad (4.101)$$
The Fourier space solution then becomes
$$\tilde\psi(k, t) = \tilde G(k, t)\,\tilde\psi(k, 0). \qquad (4.102)$$
Figure 4.5: Solution of Eq. (4.105) for different times, with an initial wavepacket ψ(x, 0) taken
as a boxcar between [−1, 1] (dashed curve). The curves are plotted with m = ~ = 1.
This is therefore the product of two Fourier Transforms, G̃(k, t) and ψ̃(k, 0).
To go back to real space, we now take the inverse FT. We can skip a bit of work
here using the convolution theorem (4.45). Since $\tilde\psi(k, t)$ is the product of two FTs, then
ψ(x, t) must be the convolution of ψ(x, 0) with4
$$G(x, t) = \int\frac{dk}{2\pi}\,e^{i(kx - E_k t/\hbar)}, \qquad (4.103)$$
which is the FT of $\tilde G(k, t)$. This is the same Gaussian integral that led us to (4.58), but with $\alpha = i\hbar/2m$. We thus find
$$G(x, t) = \sqrt{\frac{m}{2\pi i\hbar t}}\;e^{-mx^2/2i\hbar t}\,\theta(t). \qquad (4.104)$$
2πi~t
The final solution is thus
$$\psi(x, t) = \int_{-\infty}^{\infty}dy\;G(x - y, t)\,\psi(y, 0). \qquad (4.105)$$
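The evolution in Fig. 4.5 can be reproduced with the discrete FFT, which implements Eqs. (4.100)–(4.102) directly (Python/NumPy sketch; my own addition, with $\hbar = m = 1$ and a boxcar initial wavepacket).

```python
import numpy as np

hbar, m = 1.0, 1.0
N, Lbox = 4096, 60.0
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
Ek = hbar**2 * k**2 / (2 * m)                  # dispersion relation (4.100)

# boxcar wavepacket between [-1, 1], normalized to 1
psi0 = np.where(np.abs(x) < 1.0, 1 / np.sqrt(2), 0.0).astype(complex)

def evolve(t):
    """psi(x,t): multiply psi~(k,0) by exp(-i E_k t / hbar), then invert; cf. (4.102)."""
    return np.fft.ifft(np.exp(-1j * Ek * t / hbar) * np.fft.fft(psi0))

for t in [0.0, 0.5, 1.0]:
    psi = evolve(t)
    print(t, np.sum(np.abs(psi)**2) * dx)      # the norm is conserved (~1)
```

Plotting $|\psi(x,t)|^2$ for these times gives exactly the kind of spreading, rippled profiles shown in Fig. 4.5.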
where ρ(r) is the inhomogeneous term, sometimes also called the external source. This
equation can be defined in any dimension, and the minus sign is placed only for convenience (it could be absorbed into ρ).
Poisson’s equation appears often in electrostatics, for instance. Consider a region
of space containing a certain charge density ρ(r). Gauss’ law states that the electric
field E(r) generated by this density will satisfy
$$\nabla\cdot\boldsymbol{E} = \rho/\epsilon_0. \qquad (4.107)$$
Writing $\boldsymbol{E} = -\nabla\phi$ in terms of the electrostatic potential $\phi$, this becomes
$$-\nabla^2\phi = \rho/\epsilon_0. \qquad (4.108)$$
To solve Poisson's equation, we take the FT of $\phi(\boldsymbol{r})$,
$$\tilde\phi(\boldsymbol{k}) = \int\frac{d^3r}{(2\pi)^{3/2}}\,\phi(\boldsymbol{r})\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}. \qquad (4.109)$$
Each ∇ is converted into ik, so that −∇2 is then converted into k2 = |k|2 . Thus,
Eq. (4.106) in Fourier space becomes
$$k^2\,\tilde\phi = \tilde\rho, \qquad (4.110)$$
where ρ̃ is the FT of ρ(r), and is defined exactly as in (4.109). In Fourier space the
solution is thus trivial,
$$\tilde\phi = \frac{\tilde\rho}{k^2}. \qquad (4.111)$$
We now go back to real space and write
$$\phi(\boldsymbol{r}) = \int\frac{d^3k}{(2\pi)^{3/2}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r}}\,\tilde\phi(\boldsymbol{k}) = \int\frac{d^3k}{(2\pi)^{3/2}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r}}\,\frac{\tilde\rho(\boldsymbol{k})}{k^2} = \int\frac{d^3k}{(2\pi)^{3/2}}\,\frac{e^{i\boldsymbol{k}\cdot\boldsymbol{r}}}{k^2}\int\frac{d^3r'}{(2\pi)^{3/2}}\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}'}\,\rho(\boldsymbol{r}').$$
Rearranging the integrals allows us to identify the Green's function, just like before:
$$\phi(\boldsymbol{r}) = \int d^3r'\,\rho(\boldsymbol{r}')\int\frac{d^3k}{(2\pi)^{3}}\,\frac{e^{i\boldsymbol{k}\cdot(\boldsymbol{r}-\boldsymbol{r}')}}{k^2} = \int d^3r'\,\rho(\boldsymbol{r}')\,G(\boldsymbol{r} - \boldsymbol{r}'), \qquad (4.112)$$
where
$$G(\boldsymbol{r}) = \int\frac{d^3k}{(2\pi)^{3}}\,\frac{e^{i\boldsymbol{k}\cdot\boldsymbol{r}}}{k^2}. \qquad (4.113)$$
Figure 4.6: The convenient choice of axes for carrying out the k-integral in Eq. (4.113).
Eq. (4.112) is a convolution of the Green’s function with the external source. Ac-
cording to the convolution theorem, in Fourier space φ̃(k) should then simply be a
product of the Fourier Transforms of ρ and G. And, indeed, this is exactly what we see
in Eq. (4.111): The Fourier Transform of G(r) is (being sloppy with 2π’s) G̃(k) = 1/k2 ,
which can be read off directly from (4.113). So Eq. (4.111) is indeed nothing but
φ̃(k) = ρ̃(k)G̃(k).
Before we compute G(r), it is fun to realize that you actually already know the
answer; you saw it in your introductory electromagnetism lectures. Eq. (4.114) must
describe (except for a factor of 1/0 ) the electrostatic potential produced by a point
charge (of magnitude $q = 1$) at position $\boldsymbol{r} = 0$. Thus, $G(r)$ must be given by Coulomb's law:
$$G(r) = \frac{1}{4\pi r}. \qquad (4.115)$$
Of course, numerical factors like 4π are harder to predict. But what is important is that
G ∝ 1/r. Let us now compute the integral in Eq. (4.113) and indeed show that this is
the case.
Integrals of the form (4.114) appear often in this Fourier business. This is a volume
integral, over d3 k = dk x dky dkz . You are probably used to doing volume integrals in real
space. This is the same thing, but in Fourier space. And, in fact, the real space position
r is just a parameter (as far as the integral is concerned), so we must, in principle,
compute a new integral for each value of r we plug in. The trick to evaluate G is to
realize that we have some freedom in how we choose the orientation of the k reference
frame, for a given r. This is illustrated in Fig. 4.6. For a given r, we can always
reorient the k axis and choose it so that kz is parallel to r.
We then move to spherical coordinates (in k-space), by defining d3 k = k2 sin θdkdθdϕ.
Since we chose r to be parallel to kz , it then follows that k · r = kr cos θ. The integral
in (4.113) then becomes
$$G(\boldsymbol{r}) = \frac{1}{(2\pi)^3}\int_0^{\infty}dk\,k^2\int_0^{\pi}d\theta\,\sin\theta\int_0^{2\pi}d\varphi\;\frac{e^{ikr\cos\theta}}{k^2}. \qquad (4.116)$$
The integral in ϕ is trivial and gives 2π. For the other 2, we first compute the integral
in θ and then the one in k. That is, we write this as
$$G(\boldsymbol{r}) = \frac{2\pi}{(2\pi)^3}\int_0^{\infty}dk\int_0^{\pi}d\theta\,\sin\theta\,e^{ikr\cos\theta}.$$
To compute the θ integral, we define z = cos θ. This yields dz = − sin θ dθ. Moreover,
the integration limits are now z = 1 when θ = 0 and z = −1 when θ = π. In fact, the
following transformation rule is useful to know:
$$\int_0^{\pi}d\theta\,\sin\theta\,f(\cos\theta) = \int_{-1}^{1}dz\,f(z). \qquad (4.117)$$
As a consequence, we get
$$\int_0^{\pi}d\theta\,\sin\theta\,e^{ikr\cos\theta} = \int_{-1}^{1}dz\,e^{ikrz} = \frac{e^{ikr} - e^{-ikr}}{ikr}.$$
Whence,
$$G(\boldsymbol{r}) = \frac{1}{(2\pi)^2 r}\int_0^{\infty}dk\,\frac{e^{ikr} - e^{-ikr}}{ik} = \frac{1}{2\pi^2 r}\int_0^{\infty}dk\,\frac{\sin kr}{k}.$$
Computing this integral is slightly tricky. One way to do it is using residues and com-
plex integration. Alternatively, we can get it as a particular case of an integral that will
appear in the problem set. In any case, the result turns out to be
$$\int_0^{\infty}dk\,\frac{\sin kr}{k} = \frac{\pi}{2}. \qquad (4.118)$$
Green's function for Poisson's equation
Combining the last two results, we find $G(r) = \frac{1}{2\pi^2 r}\cdot\frac{\pi}{2} = \frac{1}{4\pi r}$, exactly as anticipated in (4.115). The solution (4.112) of Poisson's equation can therefore be written as
$$\phi(\boldsymbol{r}) = \int d^3r'\,\frac{\rho(\boldsymbol{r}')}{4\pi|\boldsymbol{r} - \boldsymbol{r}'|}. \qquad (4.119)$$
Notice also how the Green’s function depends only on the magnitude r. This is a
consequence of the fact that −∇2 is an isotropic differential operator.
Example: a thin rod of length 2L. Suppose the external source is generated by a
thin rod of length 2L, displaced along the x axis, from −L to L. In this case we get
$$\rho(\boldsymbol{r}') = \rho_0\,\delta(y')\,\delta(z')\,\theta(x' + L)\,\theta(L - x'), \qquad (4.121)$$
where $\rho_0$ is the magnitude of the source density. That is, two deltas in $y$ and $z$, plus a boxcar in $x$. When plugging this in Eq. (4.119), it is convenient to write $|\boldsymbol{r} - \boldsymbol{r}'| = \sqrt{(x-x')^2 + (y-y')^2 + (z-z')^2}$. Since the δ's kill the integrals in $y'$ and $z'$, we are then left only with
$$\phi(\boldsymbol{r}) = \frac{\rho_0}{4\pi}\int_{-L}^{L}dx'\,\frac{1}{\sqrt{x'^2 - 2xx' + r^2}},$$
where I wrote $(x - x')^2 + y^2 + z^2$ as $x'^2 - 2xx' + r^2$. Using that
$$\int\frac{dx'}{\sqrt{x'^2 - 2xx' + r^2}} = \ln\Big(x' - x + \sqrt{x'^2 - 2xx' + r^2}\Big),$$
we then arrive at
$$\phi(\boldsymbol{r}) = \frac{\rho_0}{4\pi}\ln\left\{\frac{L - x + \sqrt{L^2 - 2xL + r^2}}{-L - x + \sqrt{L^2 + 2xL + r^2}}\right\}. \qquad (4.122)$$
In particular, if we are at x = 0, this simplifies to
$$\phi(0, y, z) = \frac{\rho_0}{4\pi}\ln\left\{\frac{L + \sqrt{L^2 + r^2}}{-L + \sqrt{L^2 + r^2}}\right\}.$$
We can also analyze what happens if the rod is very large. To do that, we first rewrite
this as
$$\phi(0, y, z) = \frac{\rho_0}{4\pi}\ln\left\{\frac{1 + \sqrt{1 + (r/L)^2}}{-1 + \sqrt{1 + (r/L)^2}}\right\}.$$
Figure 4.7: A function $f(t)$ of period $T$ (infrared), discretized in small steps $\Delta t$ (ultraviolet).
√
When $L \gg r$, we can then expand $\sqrt{1 + \epsilon} \simeq 1 + \epsilon/2$ with $\epsilon = (r/L)^2$, to find
$$\phi(0, y, z) \simeq \frac{\rho_0}{4\pi}\ln\left\{\frac{2 + (r/L)^2/2}{(r/L)^2/2}\right\} = \frac{\rho_0}{4\pi}\ln\left(1 + \frac{4L^2}{r^2}\right).$$
The last term is now much larger than 1, so that we can also write
$$\phi(0, y, z) \simeq \frac{\rho_0}{4\pi}\ln\big(4L^2/r^2\big) = \frac{\rho_0}{2\pi}\ln\big(2L/r\big). \qquad (4.123)$$
We could also split the log into ln(2L) and − ln(r). The first term is in principle infinite
if L → ∞. But this is not unphysical since this is a constant and electrostatic potentials
are only defined up to a constant (this term, for instance, does not affect at all the
electric field).
$$c_n = \frac{1}{T}\int_0^T dt\;e^{i2\pi nt/T}\,f(t). \qquad (4.125)$$
The period T is what we are going to call the infrared cut-off of f (t). This name is
borrowed from high energy physics, and is just a mnemonic, to give you some intuition.
The logic is that infrared means something that has a low energy, or long wavelengths.
And the periodicity T is the longest length there is, and also what defines the lowest
energy harmonic $e^{i2\pi t/T}$. As we saw in Sec. 4.1, if we want to obtain a Fourier Transform, we can simply take $T \to \infty$. That is, we eliminate the infrared cut-off and this
converts the sum to an integral. The term “infrared” is thus directly related with the
fact that the series is a sum, instead of an integral.
But now suppose we cannot sample f (t) for all t ∈ [0, T ], but only on a discrete set
of points
$$t_j = j\,\Delta t, \qquad j = 0, 1, 2, \ldots, N-1, \qquad \Delta t = T/N. \qquad (4.126)$$
That is, we assume [0, T ) is divided into N equally spaced points and we are only able
to determine f at these points (Fig. 4.8). The function evaluated at these points then
generates a sequence of N points f j = f (t j ). Since the function is periodic, we can
of course also extend the sequence to points outside $[0, T)$. And the corresponding sequence will thus also be periodic. That is, $f_{j+N} = f_j$. This is illustrated in Fig. 4.8.
The discretization step ∆t will be called the ultraviolet cut-off. We often like to
think that $\Delta t$ is very small. So just like the infrared cut-off fixed the lowest energy harmonic, we will now see that the ultraviolet cut-off fixes the highest energy harmonic. This is super cool: the series (4.124) has no ultraviolet cut-off; the harmonics are summed all the way to infinity. It has only an infrared cut-off due to the periodicity $T$. Conversely, discretizing the function introduces an ultraviolet cut-off. That is, it truncates the sum
in n to a maximum value nmax . This will be the main result of the next section.
To contrast, Fourier Transforms have neither infrared nor ultraviolet cut-offs; the
magnitude of the energy, |ω|, is allowed to range from 0 (lowest energy) to ∞ (highest
energy). The hierarchy of cut-offs is therefore as follows:
• Generic continuous function f (t): has a Fourier Transform (no cut-offs).
• Continuous, but periodic with period T : has an infrared cut-off, yielding a series
instead of an integral.
• Discrete and periodic f j : has both infrared and ultraviolet cut-offs. Infrared
means the Fourier representation is a series, while ultraviolet truncates the series
to a finite number of terms.
since e−i2π j = 1 for any integer j. Thus, we see that shifting n → n + N takes e−i2πn j/N
to itself. This means that, as we vary n, there are actually only N distinct numbers
e−i2πn j/N . Usually we take them to be those with n = 0, 1, 2, . . . , N − 1. But we are free
to take any other choice. For instance, a popular one is
h N N i
n∈ − , −1 , N even,
2 2
h N − 1 N − 1i
n∈ − , , N odd.
2 2
This produces the same N points e−i2πn j/N . It is just a bit more annoying to use since
we need to keep track of whether N is even or odd. This is illustrated in Table 4.1.
We can therefore regroup the sum in (4.127) as
$$f_j = \sum_{n=0}^{N-1} e^{-i2\pi n j/N}\sum_{\nu=-\infty}^{\infty} c_{n+\nu N}.$$
That is, for each n ∈ [0, N − 1], we group all terms cn , cn±N , cn±2N , . . . which will be
multiplied by the same exponentials. Recall that the exponentials e−i2πn j/N form a basis
for the set of functions (in this case these are discrete functions; i.e., sequences). We
may therefore define new coefficients
$$\tilde f_n = \sum_{\nu=-\infty}^{\infty} c_{n+\nu N}, \qquad (4.128)$$
in terms of which
$$f_j = \sum_{n=0}^{N-1}\tilde f_n\,e^{-i2\pi n j/N}. \qquad (4.129)$$
Table 4.1: The basic exponentials e−i2πn j/N for n ∈ [0, N − 1] (upper table) and n ∈ [−N/2, N/2 − 1]
(lower table), with N = 10 and j = 3.
n 0 1 2 3 4 5 6 7 8 9
e−i2πn j/N 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5
n -5 -4 -3 -2 -1 0 1 2 3 4
e−i2πn j/N 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5
The series now has only N terms. The assumption that f is only evaluated at a discrete
set of points therefore reduces an infinite sum to a finite one. If we want to recover a
continuous function, we take N → ∞. That is, we evaluate f at an infinite number of
points.
I want to convince you that when j, j0 are integers, this also satisfies an orthogonality
relation. In fact, we are going to prove that
$$\frac{1}{N}\sum_{n=0}^{N-1} e^{i2\pi n(j - j')/N} = \delta_{jj'}, \qquad (4.130)$$
where $\delta_{jj'}$ is the Kronecker delta. It is easy to accept that the case $j = j'$ is correct, since in this case we get $\sum_n 1$, and the sum has $N$ terms. It is much less obvious what
happens if j , j0 (both still integers, of course). The idea is that in this case the complex
exponentials all cancel out. I will try to give two arguments to convince you of this.
The first is Fig. 4.9: I plot the position of $e^{i2\pi n j/N}$ in the complex plane. Each plot is for a fixed $j$, with $N = 6$. The message is that, except for $j = 0$, in all other cases the horizontal and vertical components of each red dot cancel identically.
Another, more rigorous way of seeing this is by defining $x = e^{i2\pi(j-j')/N}$, so that our sum becomes
$\sum_{n=0}^{N-1} e^{i2\pi n(j-j')/N} = \sum_{n=0}^{N-1} x^n$.
We assume $j \neq j'$. This is then a truncated geometric series, so we may use the standard result
$\sum_{n=0}^{N-1} x^n = \frac{1 - x^N}{1 - x}$.   (4.131)
That is,
$\sum_{n=0}^{N-1} e^{i2\pi n(j-j')/N} = \frac{1 - e^{i2\pi(j-j')}}{1 - e^{i2\pi(j-j')/N}}$.
But this, in turn, will be identically zero since $e^{i2\pi(j-j')} = 1$. Thus, indeed, if $j \neq j'$ the sum yields zero. And if $j = j'$, it gives $N$. We have thus proved (4.130).
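If you want to see Eq. (4.130) in action, here is a small numerical sketch in Python (the choice N = 6 is just an illustrative assumption of mine):

import numpy as np

def ortho(j, jp, N):
    """(1/N) * sum_{n=0}^{N-1} exp(i*2*pi*(j - jp)*n/N), cf. Eq. (4.130)."""
    n = np.arange(N)
    return np.sum(np.exp(1j * 2 * np.pi * (j - jp) * n / N)) / N

N = 6
for j in range(N):
    for jp in range(N):
        expected = 1.0 if j == jp else 0.0
        assert abs(ortho(j, jp, N) - expected) < 1e-12
print("Orthogonality (4.130) verified for N =", N)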
Armed with Eq. (4.130) we can now finally compute the Fourier coefficients f˜n in
Eq. (4.129). We multiply both sides by $e^{i2\pi m j/N}$ and sum over $j$, from 0 to $N-1$:
$\sum_{j=0}^{N-1} f_j\, e^{i2\pi m j/N} = \sum_{n=0}^{N-1} \tilde{f}_n \sum_{j=0}^{N-1} e^{i2\pi(m-n)j/N}$.
The sum over $j$ yields $N\delta_{n,m}$, which in turn kills the sum over $n$, leaving us with
$\sum_{j=0}^{N-1} f_j\, e^{i2\pi m j/N} = N \tilde{f}_m$,
Figure 4.10: Left: the function f j = sin(Ω j) + R j , where R j is a random number between [−2, 2]
representing some noise that has been added to the data. The plot was produced
using Ω = 1. Right: Corresponding Fourier Transform | f˜n |.
which is our desired formula. It determines the Fourier coefficients directly from the original function $f_j$. Notice the beautiful symmetry with respect to (4.129).
The move from one to the other and back is a consequence of the orthogonality
relation
$\frac{1}{N} \sum_{n=0}^{N-1} e^{i2\pi (j-j')n/N} = \delta_{jj'}$,   (4.133)
which holds for any integers $j$, $j'$, $N$. The original function is evaluated at a set of points $t_j = j\Delta t$, with $j = 0, 1, \dots, N-1$. And since
Figure 4.11: Left: ratio between Euro and Brazilian Real as a function of time. Right: Corre-
sponding Fourier Transform | f˜n |.
some noise to it. That is, we consider the sequence f j = sin(Ω j) + R j , where R j is a
random number between [−2, 2]. The result is the left panel in Fig. 4.10. Note that sin()
varies between −1 and 1, while the noise varies from -2 to 2. We are therefore adding a
huge noise on top of it. In fact, the original sinusoidal behavior is barely recognizable.
But we can do a spectral analysis. That is, we take the DFT (4.132) (more on
how to do that on a computer below). The result for | f˜n | is shown in the right panel
of Fig. 4.10. Remarkably, the identification of the frequencies is still quite clear. On
top of all the background noise, two peaks clearly stand out. One peak corresponds to
$\omega_n \sim \Omega$. And since $\sin a = (e^{ia} - e^{-ia})/2i$, there will also be a peak at $\omega_n \sim 2\pi - \Omega$.
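For the curious, here is a rough Python sketch of the kind of spectral analysis behind Fig. 4.10. The number of samples and the random seed are my own illustrative choices; note also that numpy's FFT uses the opposite sign convention in the exponent, which does not affect the magnitudes $|\tilde{f}_n|$:

import numpy as np

Omega = 1.0
N = 400                                   # number of samples (illustrative)
j = np.arange(N)
rng = np.random.default_rng(0)
f = np.sin(Omega * j) + rng.uniform(-2, 2, N)   # sine buried in noise

f_tilde = np.fft.fft(f) / N               # DFT coefficients (up to conjugation)
omega = 2 * np.pi * j / N                 # frequencies omega_n = 2*pi*n/N

# |f_tilde| shows peaks near omega ~ Omega and omega ~ 2*pi - Omega.
peak = omega[np.argmax(np.abs(f_tilde[1:N // 2])) + 1]
print("dominant frequency ~", peak)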
In the real world, the examples are not always this gentle. In Fig. 4.11 I plot the
time series for the ratio between the Euro and the Brazilian real, over the last 12 years.
As can be seen, on top of the overall trend, there is a bunch of noise. This noise, how-
ever, has several forms. Some noises are low frequency, which can mean for instance
seasonal fluctuations and so on. Conversely, there are noises which have very high
frequency, like the fact that the market generally behaves differently on Mondays or
Fridays. Each point in Fig. 4.11 is for one day. So ∆t = 1 day. This defines the ultra-
violet cut-off of this series. The highest frequency is therefore 2π/(1 day). However,
I plot only up to π, since, as we saw in Fig. 4.10, the interval [π, 2π] will just be a
repetition of the previous one.
The Fourier analysis presented in Fig. 4.11 is quite crude and there are many other,
more sophisticated ways of analyzing a time series in Fourier space. This is related to
terms such as “Power spectral density” and “Autocorrelation function”. Unfortunately
I will not have time to go into these details. But you can learn about it in any data
analysis or signal processing book.
The Fast Fourier Transform (FFT) algorithm
This section is based on the book “Linear Algebra”, by Gilbert Strang, which, by
the way, is the best book ever written in the entire world.
The reason why the DFT is so useful in practice is because it can be computed
insanely fast. The algorithm that does this is called the Fast Fourier Transform, or FFT.
It was invented by Cooley and Tukey in 1965, although there are bits and pieces of it
already present in Gauss’ work in 1805. For a sequence $\{f_j\}$ of $N$ points, a naive calculation of $\{\tilde{f}_n\}$ using Eq. (4.132) would require $N^2$ operations: there are $N$ coefficients $\tilde{f}_n$ to compute, and for each one we need to add up $N$ terms $f_j e^{-i2\pi n j/N}$. The FFT does it with $\tfrac{1}{2} N \log_2 N$. This is a huge improvement. It is the difference between 1 million and 5000. Imagine if this was money in a bank account we were talking about. In fact, the dependence on $\log N$ is not very relevant [remember that $\log_{10}(10^{23}) = 23$], so the algorithm scales roughly linearly with the size of the list.
The first step in constructing the FFT algorithm is to realize that the DFT (4.132) is actually nothing but a matrix multiplication. Define the $N \times N$ matrix $W_N$, with entries
$(W_N)_{jn} = \omega_N^{jn}$,   (4.134)
where $\omega_N = e^{-i2\pi/N}$ is introduced for convenience. Note that the indices of the matrix go from $0, \dots, N-1$, like in many programming languages. With this definition, we
can now write the DFT (4.132) as
$f_j = \sum_{n=0}^{N-1} (W_N)_{jn}\, \tilde{f}_n = \sum_{n=0}^{N-1} \omega_N^{jn}\, \tilde{f}_n$.   (4.135)
It is exactly the same thing. We can thus store the sequences $\{f_j\}$ and $\{\tilde{f}_n\}$ in vectors $f = (f_0, f_1, \dots, f_{N-1})$ and $\tilde{f} = (\tilde{f}_0, \tilde{f}_1, \dots, \tilde{f}_{N-1})$. Then Eq. (4.135) can be written as
$f = W_N\, \tilde{f}$,   (4.136)
For instance, for $N = 4$ we have $\omega_4 = e^{-i\pi/2} = -i$, and Eq. (4.136) would then become
$\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix} \begin{pmatrix} \tilde{f}_0 \\ \tilde{f}_1 \\ \tilde{f}_2 \\ \tilde{f}_3 \end{pmatrix}$.
This matrix $W_N$ is very special. Its entries are complex and it is symmetric: $W^T = W$. But most importantly, it is (up to a normalization) what we call a unitary matrix. To explain what
that means, we first define an operation called the Hermitian conjugate, according to
A† = (AT )∗ .
That is, you first transpose the matrix and then take the complex conjugate. The symbol
A† reads “A-dagger”. A matrix U is said to be unitary when U † U = I, where I is the
identity matrix. Unitary matrices are the complex generalization of rotation matrices.
The reason why they are special is because $U^\dagger$ is the inverse of $U$. Recall that the inverse of a matrix is defined so that $A^{-1}A = I$. So, since $U^\dagger U = I$, it follows that $U^{-1} = U^\dagger$. Inverting (which is usually a difficult operation) becomes trivial for unitary matrices.
The matrix $W_N$ itself is not unitary. But $\frac{1}{\sqrt{N}}\, W_N$ is:
$W_N^\dagger W_N = N I_N \quad\Rightarrow\quad W_N^{-1} = \frac{1}{N}\, W_N^\dagger$.   (4.139)
This gives us an easy method to invert Eq. (4.136): $\tilde{f} = \frac{1}{N} W_N^\dagger f$, which is nothing but the second equation in (4.132). The journey back is therefore as easy as the journey forward (I wish it had been this easy for Frodo and Sam). Note also that even though $W^T = W$, it is not true that $W^\dagger = W$. That is, the matrix is not Hermitian.
We are now ready to describe the main idea behind the FFT algorithm. All the
algorithm does is compute WN f˜. It multiplies a vector by a matrix. This takes N 2
operations and, in principle, one would think that it is hard to do any better than that.
But the FFT can. The trick is recursion. The matrix WN is expressed in terms of
WN/2 . Then WN/2 is expressed in terms of WN/4 . And so on. This is possible because
WN is a very special matrix. First, since its entries are $\omega_N^{nj}$, they are arranged in a very special way along the matrix. And second, since $\omega_N = e^{-i2\pi/N}$, we have the very special relation $\omega_{N/2} = \omega_N^2$.
Let us see how this factorization works. What we are interested in is computing the
matrix vector product
$y = W_N\, x$, or $y_j = \sum_{n=0}^{N-1} \omega_N^{nj}\, x_n$.   (4.140)
The trick is to split the sum into even and odd terms:
$y_j = \sum_{n=0,2,4,\dots} \omega_N^{nj}\, x_n + \sum_{n=1,3,5,\dots} \omega_N^{nj}\, x_n = \sum_{k=0}^{N/2-1} \omega_N^{2kj}\, x_{2k} + \sum_{k=0}^{N/2-1} \omega_N^{(2k+1)j}\, x_{2k+1}$.
If we stare at this for a second, we will realize that the resulting sums are nothing but the application of $W_{N/2}$ to the two smaller vectors, $x_e = (x_0, x_2, x_4, \dots)$ and $x_o = (x_1, x_3, x_5, \dots)$. That is, using $\omega_N^{2kj} = \omega_{N/2}^{kj}$,
$y_j = (W_{N/2}\, x_e)_j + \omega_N^j\, (W_{N/2}\, x_o)_j$,   (4.141)
where the index $j$ on the right-hand side is understood mod $N/2$.
This is the essence of the FFT: instead of computing the big product WN x, we compute
the smaller products WN/2 xe/o , associated with the even and odd components of x.
And, as you can probably imagine, we don’t stop there, but keep going recursively.
That is, instead of computing WN/2 xe directly, we split this further into two other
vectors $x_{ee} = (x_0, x_4, x_8, \dots)$ and $x_{eo} = (x_2, x_6, x_{10}, \dots)$, and apply $W_{N/4}$ to them,
following the recipe in Eq. (4.141). And similarly for WN/2 xo .
The FFT always uses vectors whose lengths are powers of 2. That is, $N = 2^\ell$. If your vector has a different length, the algorithm automatically increases it to the closest power of 2 by padding with zeros. The first step is then to rearrange the vector $x$
into even/odd components, several times. For instance, it may produce something like
x = (xee , xeo , xoe , xoo ). Or it can go further; it depends on how many recursions
you want. The idea is to go down to a subvector which is small enough, so that the
application of W becomes very fast. This ordering operation is very fast. Once x
is reordered, the algorithm applies the small W to each, saves the result, and then
uses (4.141) recursively to reconstruct back the result. There are $\ell = \log_2 N$ levels to move through, and reconstructing each level requires $N/2$ multiplications/additions like the one in (4.141). The total cost is thus $\frac{N}{2}\,\ell = \frac{N}{2}\log_2 N$.
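To make the recursion concrete, here is a minimal Python sketch of a radix-2 FFT along the lines described above. It is mine, assumes the input length is a power of 2, and is meant for illustration rather than performance:

import numpy as np

def fft_rec(x):
    """Compute y_j = sum_n omega_N^{nj} x_n with omega_N = exp(-2i*pi/N),
    i.e. the product W_N x of Eq. (4.140), by even/odd recursion."""
    N = len(x)
    if N == 1:
        return np.array(x, dtype=complex)
    even = fft_rec(x[0::2])           # W_{N/2} applied to (x0, x2, x4, ...)
    odd = fft_rec(x[1::2])            # W_{N/2} applied to (x1, x3, x5, ...)
    j = np.arange(N // 2)
    twiddle = np.exp(-2j * np.pi * j / N)   # omega_N^j
    # Reconstruction: y_j = even_j + omega_N^j odd_j, and
    # y_{j+N/2} = even_j - omega_N^j odd_j, since omega_N^{N/2} = -1.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

x = np.random.rand(8)
assert np.allclose(fft_rec(x), np.fft.fft(x))   # agrees with numpy's FFT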
Chapter 5
Legendre Polynomials
This chapter discusses two related concepts: series solutions of ODEs and orthog-
onal polynomials. The prototypical example we are going to analyze is Legendre’s
differential equation
$(1 - x^2)\, y'' - 2x\, y' + n(n+1)\, y = 0$,   (5.1)
where $n$ is a constant. This is a 2nd order, linear ODE but with non-constant coefficients. It is therefore more difficult to handle than the equations we solved in chapter 2.
This type of equation appears, as we will see, when dealing with Poisson or Laplace’s
equation in spherical coordinates. It therefore appears in electromagnetism and quan-
tum mechanics. For instance, do you remember those funny-looking atomic orbitals
that you learned in chemistry? It will turn out that they are directly related with the
solutions of Eq. (5.1).
The usual method for solving these equations is to try out a series solution; that is,
a solution of the form
$y(x) = \sum_{j=0}^{\infty} a_j x^j = a_0 + a_1 x + a_2 x^2 + \dots$,   (5.2)
where a j are coefficients that we try to adjust. In general the sum will be infinite, so the
solution y(x) may be any kind of exotic function. Sometimes, however, the series trun-
cates, yielding a solution y(x) that is a polynomial in x. In the Legendre equation (5.1), this will happen when the constant $n$ is an integer, $n = 0, 1, 2, 3, \dots$. The resulting
solutions are called Legendre polynomials, and the first few such polynomials are
$P_0(x) = 1$,
$P_1(x) = x$,
$P_2(x) = \frac{1}{2}(3x^2 - 1)$,
$P_3(x) = \frac{1}{2}(5x^3 - 3x)$,   (5.3)
$P_4(x) = \frac{1}{8}(35x^4 - 30x^2 + 3)$,
$P_5(x) = \frac{1}{8}(63x^5 - 70x^3 + 15x)$,
$P_6(x) = \frac{1}{16}(231x^6 - 315x^4 + 105x^2 - 5)$.
For instance, if you are bored, you can check by hand that P6 (x) is a solution of
Eq. (5.1) when n = 6, and so on. We will learn below a more sophisticated and general
way of doing this.
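If checking by hand sounds too boring, a computer algebra system can do it for you. A small sympy sketch of my own (not part of the text):

import sympy as sp

x = sp.symbols('x')
n = 6
P6 = sp.Rational(1, 16) * (231*x**6 - 315*x**4 + 105*x**2 - 5)

# Plug P6 into the left-hand side of Legendre's equation (5.1) with n = 6.
lhs = (1 - x**2)*sp.diff(P6, x, 2) - 2*x*sp.diff(P6, x) + n*(n + 1)*P6
print(sp.simplify(lhs))                       # prints 0: P6 solves Eq. (5.1)
print(sp.simplify(sp.legendre(6, x) - P6))    # sympy's built-in P6 agrees: 0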
The Legendre polynomials satisfy a remarkable property. Namely, they form a set
of orthogonal functions in the interval [−1, 1]. That is, they satisfy
$\int_{-1}^{1} P_n(x)\, P_m(x)\, dx = \frac{2}{2n+1}\, \delta_{n,m}$.   (5.4)
If $n \neq m$, they integrate to zero. If $n = m$, they give this silly constant $2/(2n+1)$. It is easy to check this for, say, $P_1$ and $P_2$. I recommend you do it, to get a feeling of what is happening.
Eq. (5.4) should remind you of the orthogonality relations of sines, cosines and
complex exponentials that we learned in chapter 1. In that occasion, it was precisely the
orthogonality relations which allowed us to construct the Fourier series. That is, which
allowed us to express an arbitrary periodic function in terms of a linear combination of
sines and cosines. Here a similar logic will apply. That is, any function f (x) defined in
the interval x ∈ [−1, 1] can be expanded in Legendre polynomials as
$f(x) = \sum_{n=0}^{\infty} c_n\, P_n(x)$,   (5.5)
where cn are coefficients that are determined just like in the Fourier case: we multiply
both sides of (5.5) by Pm (x) and integrate from -1 to 1. Due to the orthogonality (5.4),
we are then left with
$\int_{-1}^{1} dx\, f(x)\, P_m(x) = \sum_{n=0}^{\infty} c_n \int_{-1}^{1} dx\, P_n(x)\, P_m(x) = \frac{2}{2m+1}\, c_m$,
or,
$c_n = \frac{2n+1}{2} \int_{-1}^{1} P_n(x)\, f(x)\, dx$.   (5.6)
This kind of procedure is identical to the one we used in the Fourier business, many
times. And this example serves to show that it is actually more general. It works
whenever we want to expand a function as a linear combination of a set of orthogonal
functions. As we will learn, there are in fact quite a few such sets, depending on the
interval in question and the types of properties one is interested in.
This chapter will therefore be centered on these two new concepts: series solutions
of ODEs and orthogonal polynomials. They are, to a great extent, generalizations of the
ideas that we already treated in previous chapters. First, series solutions offer a more
sophisticated method for solving harder ODEs. And second, orthogonal polynomials
generalize the idea of orthogonality of functions beyond sines and cosines. I think this
is a nice way of concluding the course: we spend the entire semester learning about
new ideas and methods. And now we finish it by realizing that these methods are just
the tip of the iceberg, and these core ideas can actually be extended much much further.
where {. . .} will be coefficients that depend on the a j . Since the functions x j are lin-
early independent, for y to be a solution for all x, each coefficient {. . .} must vanish
independently. This will then give us a relation between the a j ’s.
To execute this idea, we need to compute $-2xy'$ and $(1-x^2)y''$, with $y$ given by (5.2).
First,
$y' = \sum_{j=1}^{\infty} j\, a_j x^{j-1}$.
The sum in principle starts at 1 because the derivative of a0 is zero. But notice we
could also write it as starting at 0 since there is a factor j in there, which is zero when
j = 0. Playing around with the index of the sum is an important trick in this business.
So please make sure you understand this point. Now, what we actually want is $-2xy'$, so we get
$-2xy' = -2 \sum_{j=0}^{\infty} j\, a_j x^j$,   (5.7)
where I already wrote the sum starting from 0. We see that multiplying by −2x replen-
ishes the missing x from the derivative. That is, the result is now already proportional
to x j . We will try to write all our sums as being proportional to x j .
Next we turn to the second derivative:
$y'' = \sum_{j=2}^{\infty} j(j-1)\, a_j x^{j-2}$.
The sum now starts at 2, but we could also have written it starting from 0 if we wanted, since $j(j-1)$ is zero when $j = 0$ or $j = 1$. From this we then compute
$(1 - x^2)\, y'' = \sum_{j=2}^{\infty} j(j-1)\, a_j x^{j-2} - \sum_{j=0}^{\infty} j(j-1)\, a_j x^j$.
The last term is already proportional to $x^j$, so I wrote the sum starting from 0. But the first sum still needs some adjustments. Since we want everything in powers of $x^j$, we change variables from $j$ to $j' = j - 2$, only in this first sum:
$\sum_{j=2}^{\infty} j(j-1)\, a_j x^{j-2} = \sum_{j'=0}^{\infty} (j'+2)(j'+1)\, a_{j'+2}\, x^{j'}$.
We can now call $j'$ as $j$ again, since it is a dummy variable (i.e., it is being summed over). This allows us to combine the two sums as
$(1 - x^2)\, y'' = \sum_{j=0}^{\infty} \left[ (j+2)(j+1)\, a_{j+2} - j(j-1)\, a_j \right] x^j$.   (5.8)
Plugging Eqs. (5.7) and (5.8) back into Eq. (5.1) finally yields
$\sum_{j=0}^{\infty} \left\{ \left[ (j+2)(j+1)\, a_{j+2} - j(j-1)\, a_j \right] - 2j\, a_j + n(n+1)\, a_j \right\} x^j = 0$.
Equating each term to zero gives a recursion relation, specifying $a_{j+2}$ in terms of $a_j$:
$a_{j+2} = \frac{j(j+1) - n(n+1)}{(j+2)(j+1)}\, a_j, \qquad j = 0, 1, 2, 3, \dots$   (5.9)
This therefore fixes the even coefficients in terms solely of a0 , and the odd ones in
terms solely a1 . That is, given a0 , we use this to determine a2 , then a4 , then a6 , etc. For
instance,
1
a2 = − n(n + 1)a0 ,
2
1 1
a4 = − (n + 3)(n − 2)a2 = n(n + 1)(n + 3)(n − 2)a0 ,
12 24
177
etc. Similarly, given a1 , this determines a3 , then a5 and so on. The general solution
will thus have the form
" #
1 1
y(x) = 1 − n(n + 1)x2 + n(n + 1)(n − 2)(n + 3)x4 − . . . a0 (5.10)
2 4!
" #
1 1
+ x − (n − 1)(n + 2)x + (n − 1)(n + 2)(n − 3)(n + 4)x − . . . a1 .
3 5
3! 5!
The values of a0 and a1 are undetermined, leaving us with two constants to adjust, as
befits a 2nd order ODE. We have therefore found the two independent solutions. One
is even in x, while the other is odd. I know they may look a bit messy. But it is quite
remarkable that we can actually write them down explicitly. After all, it is not an easy
ODE we are dealing with.
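If you want to see this solution (and its behavior outside $[-1,1]$, discussed next) for yourself, here is a small Python sketch that builds the series (5.10) from the recursion (5.9). The values of n and jmax below are arbitrary illustrative choices:

import numpy as np

def legendre_series(x, n, a0=1.0, a1=0.0, jmax=200):
    """Evaluate the series solution (5.10), using the recursion (5.9),
    truncated at j = jmax."""
    a = np.zeros(jmax + 3)
    a[0], a[1] = a0, a1
    for j in range(jmax + 1):
        a[j + 2] = (j*(j + 1) - n*(n + 1)) / ((j + 2)*(j + 1)) * a[j]
    return sum(a[j] * x**j for j in range(jmax + 3))

# Inside (-1, 1) the value settles as more terms are kept; outside it blows up.
print(legendre_series(0.90, n=4.5, jmax=50), legendre_series(0.90, n=4.5, jmax=200))
print(legendre_series(1.05, n=4.5, jmax=50), legendre_series(1.05, n=4.5, jmax=200))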
Several questions still remain, though. First, is the solution (5.10) unique? Yes.
We have found two linearly independent solutions to a linear 2nd order ODE. It must
therefore be unique. Second, does the solution converge? We can analyze this using
the ratio test. A series $\sum_{j=0}^{\infty} u_j$ converges absolutely when $\lim_{j\to\infty} |u_{j+1}/u_j| < 1$. In our case, we have to test the series $\sum_{j=0}^{\infty} a_j x^j$ (that is, $u_j = a_j x^j$); since the recursion (5.9) relates $a_{j+2}$ to $a_j$, we look at the ratio of successive terms within each (even or odd) subseries. From Eq. (5.9), we have that
$\lim_{j\to\infty} \left| \frac{a_{j+2}\, x^{j+2}}{a_j\, x^j} \right| = \lim_{j\to\infty} \left| \frac{j(j+1) - n(n+1)}{(j+2)(j+1)} \right| x^2 = x^2$,
since the limit of the $j$-dependent factor is 1. Thus, we see that the series converges only for $x^2 < 1$. For $x^2 = 1$, in general, the series does not converge. The limits of validity of the solution are thus
$-1 < x < 1$.   (5.11)
Quite often, in applications, we will see that Legendre's equation appears with $x$ being the cosine of something. That is, $x = \cos\theta$. The restriction to $x^2 < 1$ thus turns out to be ok.
Fig. 5.1 shows a plot of the solution (5.10) for (a0 , a1 ) = (1, 0) and (a0 , a1 ) = (0, 1),
with fixed n = 4.5. In the interval x ∈ [−1, 1] the function is well behaved, and wiggles
around happily. But as soon as we leave this interval, it diverges quickly. The table in panel (b) illustrates this divergence, by showing the values of $y(x)$ at $x = 1.05$ for the solution (5.10), summed only up to a maximum value $j_{\max}$.
Legendre polynomials
The solution (5.10) is not a polynomial. Even though it is a power series in x, the
series is infinite and therefore it may sum up to look like any kind of function. But
something quite special happens if n is an integer. Looking at Eq. (5.9), we see that in
this case one of the series truncates. For instance, suppose n = 4. Then, when j = 4,
the numerator will cancel, leading to a6 = 0. But since a6 = 0, we will also have a8 = 0
and then a10 = 0 and so on. The only even coefficients for n = 4 will thus be a0 , a2 and
a4 . Conversely, the odd series does not truncate, because the numerator never vanishes.
Thus, if n is even, the solution proportional to a0 in (5.10) will be a finite polyno-
mial and the other will be an infinite series. And if n is odd, the solution proportional
Figure 5.1: Solution (5.10) of Legendre’s equation for n = 4.5. (a) (a0 , a1 ) = (1, 0) and (b)
(a0 , a1 ) = (0, 1). The table in (b) illustrates the values of the solution at x = 1.05,
but summing only up to a certain maximum value jmax . If we were to sum (5.10) to
infinity, outside [−1, 1], we would simply get infinity.
Figure 5.2: The first few Legendre polynomials in the interval [−1, 1].
to a1 will be a polynomial, and the other will not. The family of polynomials this
generates are the Legendre polynomials.
For instance, if $n = 4$ the polynomial will have the form
$P_4(x) = a_0 \left( 1 - 10x^2 + \tfrac{35}{3} x^4 \right)$.
The value of $a_0$ is arbitrary and depends on the convention that is adopted. Usually, we choose $a_0$ so that $P_4(1) = 1$. This then gives $a_0 = 3/8$, which reproduces the $P_4(x)$ listed in Eq. (5.3).
Sturm-Liouville theory
Looking back at Eq. (5.1), we can also write the first two terms as
$(1 - x^2)\, y'' - 2x\, y' = \frac{d}{dx}\left[ (1 - x^2)\, \frac{d}{dx} \right] y$.
Let us then define the differential operator
$L = \frac{d}{dx}\left[ (1 - x^2)\, \frac{d}{dx} \right]$,   (5.12)
so that (5.1) is converted into
L(y) = −n(n + 1)y. (5.13)
This is now starting to look like eigenstuff business: an operator times a function is supposed to be a number times the same function. In fact, it can be shown that the
eigenvalue/eigenfunction equation
L(y) = −λy, (5.14)
together with the added assumption that the solution should be regular at x = ±1, has
eigenfunctions which are precisely the Legendre polynomials Pn (x), and eigenvalues
λ = n(n + 1). This type of scenario is usually called a Sturm-Liouville problem. A
generic Sturm-Liouville problem is of the form
$\frac{d}{dx}\left[ p(x)\, \frac{dy}{dx} \right] + q(x)\, y = -\lambda\, w(x)\, y$,   (5.15)
for given functions p(x), q(x) and w(x). One must also specify the interval of interest,
x ∈ [a, b], and boundary conditions have to be specified at a and b. The problem is
then to find the allowed solutions y(x), together with the eigenvalues λ.
For instance, consider the ODE
y00 = −λy, y(0) = y(L) = 0, (5.16)
which we studied exhaustively in Chapter 3. This is a Sturm-Liouville problem, with
$p(x) = w(x) = 1$ and $q(x) = 0$. Indeed, we saw that the eigenfunctions were $y(x) = \sin(kx)$, with eigenvalues $\lambda = k^2$, where $k = \pi\ell/L$. Sturm-Liouville problems form an
important part in the Mathematics of differential equations, as there are many applica-
tions which can be placed under this category. Unfortunately, we will not have the time
to discuss the general theory in much detail, but will have to focus only on some particular examples.
Leibniz’ rule for differentiating products
Before we start, we must first briefly discuss a formula that will be quite useful in
what follows. Let u(x) and v(x) be two arbitrary functions. Then
$\frac{d}{dx}(uv) = (\partial_x u)\, v + u\, (\partial_x v)$.
Similarly, for the second derivative (writing $\partial_x$ for the derivatives, which makes the pattern clearer),
$\frac{d^2}{dx^2}(uv) = (\partial_x^2 u)\, v + 2(\partial_x u)(\partial_x v) + u\, (\partial_x^2 v)$.
You can see here a kind of binomial shape taking place:
(a + b)2 = a2 + 2ab + b2 .
But instead of powers, we have derivatives. We can continue this and prove, by induc-
tion, that
$\frac{d^n}{dx^n}(uv) = \sum_{j=0}^{n} \binom{n}{j}\, (\partial_x^j u)(\partial_x^{n-j} v)$,   (5.17)
which is called Leibniz' rule for the derivative of a product. Here $\binom{n}{j} = \frac{n!}{j!(n-j)!}$ is the binomial coefficient.
Rodrigues formula
We will now use Leibniz’ rule to find a more systematic way of generating the Leg-
endre polynomials, which will make their analytical properties much more transparent.
Namely, we will now show that
1 dn h 2 i
Pn (x) = n n
(x − 1)n .
2 n! dx
We are going to prove this in two ways. First, we will simply check that it indeed
matches the polynomials in Eq. (5.3). And second, we will show that this indeed
satisfies Legendre’s differential equation (5.1).
To compare with (5.3), we use Leibniz' rule (5.17), by first factoring $x^2 - 1 = (x-1)(x+1)$. This then leads to
$P_n(x) = \frac{1}{2^n n!} \sum_{j=0}^{n} \binom{n}{j}\, \partial_x^j (x-1)^n\, \partial_x^{n-j} (x+1)^n$.
The first of these derivatives gives $\partial_x^j (x-1)^n = \frac{n!}{(n-j)!}\,(x-1)^{n-j}$. And the second is
$\partial_x^{n-j} (x+1)^n = n(n-1)\cdots\big(n - (n-j) + 1\big)\,(x+1)^{n-(n-j)} = \frac{n!}{j!}\,(x+1)^j$.
I know these two formulas are a bit nasty. It helps if you think about specific examples, like $n = 4$ and $j = 2$ or something. In any case, with these two results, and using also $\binom{n}{j} = \frac{n!}{j!(n-j)!}$, we finally get
$P_n(x) = \frac{1}{2^n} \sum_{j=0}^{n} \binom{n}{j}^2 (x-1)^{n-j}\, (x+1)^j$.
This provides an explicit and very useful formula for computing any Legendre polynomial. I leave it to you to verify that if we plug in specific values of $n$, we get, for instance, the polynomials in Eq. (5.3).
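Rodrigues' formula is also very convenient on a computer. Here is a small sympy sketch of my own that generates the polynomials from Eq. (5.18) and checks them against sympy's built-in Legendre polynomials:

import sympy as sp

x = sp.symbols('x')

def P_rodrigues(n):
    """Legendre polynomial from Rodrigues' formula (5.18)."""
    return sp.expand(sp.diff((x**2 - 1)**n, x, n) / (2**n * sp.factorial(n)))

for n in range(7):
    assert sp.simplify(P_rodrigues(n) - sp.legendre(n, x)) == 0
print(P_rodrigues(4))   # 35*x**4/8 - 15*x**2/4 + 3/8, matching Eq. (5.3)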
Next, we verify that Eq. (5.18) is indeed a solution of Legendre’s differential equa-
tion (5.1). Except for a numerical factor, the Legendre polynomials in Eq. (5.18) are
essentially the n-th derivative of the function ψ(x) := (x2 − 1)n . That is,
$P_n(x) = \frac{1}{2^n n!}\, \frac{d^n \psi}{dx^n}$.   (5.20)
But since $\psi' = 2nx\,(x^2-1)^{n-1}$, this function satisfies the cute property
$(x^2 - 1)\,\psi' = 2nx\,\psi$.   (5.21)
We will now obtain Legendre’s equation by differentiating both sides of this expression
n + 1 times:
$\frac{d^{n+1}}{dx^{n+1}}\left[ (x^2 - 1)\,\psi' \right] = 2n\, \frac{d^{n+1}}{dx^{n+1}}(x\psi)$.   (5.22)
We differentiate n + 1 times because Pn is the n-th derivative of ψ, plus Eq. (5.1) is
a second order ODE, while (5.21) still has only one derivative. Differentiating n + 1
times must, therefore, give us a 2nd order ODE for Pn . And then we can verify that
this ODE is indeed (5.1).
To actually carry out the computation, we use Leibniz’ rule (5.17) again. We start
with the right-hand side of (5.22), using u = x and v = ψ:
$\frac{d^{n+1}}{dx^{n+1}}(x\psi) = \sum_{j=0}^{n+1} \binom{n+1}{j}\, (\partial_x^j x)(\partial_x^{n+1-j}\psi)$.
The sum goes now up to $n+1$, instead of $n$, since we want the $(n+1)$-th derivative. But lucky for us, the derivative $\partial_x^j x$ will be non-zero only for $j = 0$ or $j = 1$. Whence
$\frac{d^{n+1}}{dx^{n+1}}(x\psi) = \frac{(n+1)!}{0!\,(n+1)!}\, x\, (\partial_x^{n+1}\psi) + \frac{(n+1)!}{1!\,n!}\, (\partial_x^{n}\psi)$.
Or, simplifying,
$\frac{d^{n+1}}{dx^{n+1}}(x\psi) = x\,\psi^{(n+1)} + (n+1)\,\psi^{(n)}$.   (5.23)
Next we do the same for the left-hand side of (5.22), with $u = (x^2 - 1)$ and $v = \psi'$:
$\frac{d^{n+1}}{dx^{n+1}}\left[ (x^2 - 1)\,\psi' \right] = \sum_{j=0}^{n+1} \binom{n+1}{j}\, \partial_x^j (x^2 - 1)\, (\partial_x^{n+1-j}\psi')$.
Orthogonality of the Legendre polynomials
A major application of Rodrigues’ formula is to prove that the Legendre poly-
nomials are orthogonal. In fact, it turns out, the Legendre polynomials are not only
orthogonal among themselves. They are also orthogonal with respect to any monomial
xm , with m < n. That is, we will now show that
$\int_{-1}^{1} P_n(x)\, x^m\, dx = 0, \qquad m < n.$   (5.25)
This derivative will now be identically zero whenever m < n. Hence, we just proved (5.25).
From this, we then know that Pn and Pm are orthogonal [Eq. (5.26)] when m , n.
The last thing we need to do is to compute the value of the integral when m = n; that
is,
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{1}{(2^n n!)^2} \int_{-1}^{1} \frac{d^n\psi}{dx^n}\, \frac{d^n\psi}{dx^n}\, dx$.
Integrating by parts, as before, we can write this as
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{(-1)^n}{(2^n n!)^2} \int_{-1}^{1} \psi\, \frac{d^{2n}\psi}{dx^{2n}}\, dx$.
Now, $\psi = (x^2-1)^n$ is a polynomial of degree $2n$, so when we differentiate it $2n$ times all terms vanish, except the first term $x^{2n}$ in the binomial expansion, which will give a factor of $(2n)!$. Thus
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{(-1)^n (2n)!}{(2^n n!)^2} \int_{-1}^{1} (x^2 - 1)^n\, dx$.
Finally, we compute this remaining integral, again using integration by parts. We first write $(x^2 - 1)^n = (x-1)^n (x+1)^n$ and identify $u = (x-1)^n$ and $dv = (x+1)^n dx$. The boundary terms again vanish and we get
$\int_{-1}^{1} (x-1)^n (x+1)^n\, dx = -\frac{n}{n+1} \int_{-1}^{1} (x-1)^{n-1}(x+1)^{n+1}\, dx$.
Repeating this procedure $n$ times leads to
$\int_{-1}^{1} (x-1)^n (x+1)^n\, dx = (-1)^n \frac{(n!)^2}{(2n)!} \int_{-1}^{1} (x+1)^{2n}\, dx = (-1)^n \frac{(n!)^2}{(2n)!}\, \frac{2^{2n+1}}{2n+1}$.
Whence
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{(-1)^n (2n)!}{(2^n n!)^2}\, (-1)^n\, \frac{(n!)^2}{(2n)!}\, \frac{2^{2n+1}}{2n+1}$.
Or, simplifying,
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{2}{2n+1}$.   (5.27)
$G(x, \lambda) = \frac{1}{\sqrt{1 - 2x\lambda + \lambda^2}}$.   (5.28)
Expanding this function in a power series in $\lambda$,
$G(x, \lambda) = \sum_{n=0}^{\infty} q_n(x)\, \lambda^n$,   (5.29)
defines a set of coefficient functions $q_n(x)$. We are going to show that these coefficients turn out to be exactly the Legendre polynomials; $q_n(x) \equiv P_n(x)$. For this reason $G$ is called the generating function of the Legendre polynomials. If you think about it, this is quite impressive: all polynomials can be generated from this very simple formula, by a simple Taylor expansion. The
variable λ does not need to have a particular physical meaning (although sometimes it
does). It is simply an auxiliary variable for obtaining the polynomials.
The way to prove this is quite fun. We are going to first obtain a partial differential equation for $G$ in terms of $x$ and $\lambda$. We then try to solve this PDE by the series
method, expanding G as in Eq. (5.29). What we are going to find is that each coefficient
qn (x) will have to be a solution of Legendre’s equation (5.1) for a different value of n.
To cook up our PDE, let us first play with the partial derivatives of G. From (5.28)
we have that
$\frac{\partial G}{\partial x} = \frac{\lambda}{(1 - 2x\lambda + \lambda^2)^{3/2}} = \lambda G^3$,   (5.30)
$\frac{\partial^2 G}{\partial x^2} = \frac{3\lambda^2}{(1 - 2x\lambda + \lambda^2)^{5/2}} = 3\lambda^2 G^5$,   (5.31)
and
$\frac{\partial G}{\partial \lambda} = \frac{x - \lambda}{(1 - 2x\lambda + \lambda^2)^{3/2}} = (x - \lambda)\, G^3$,   (5.32)
$\frac{\partial^2 G}{\partial \lambda^2} = -\frac{1}{(1 - 2x\lambda + \lambda^2)^{3/2}} + \frac{3(x - \lambda)^2}{(1 - 2x\lambda + \lambda^2)^{5/2}} = -G^3 + 3(x - \lambda)^2 G^5$.   (5.33)
In this last term, we may also write
(x − λ)2 = x2 − 2xλ + λ2 = (x2 − 1) + (1 − 2xλ + λ2 ) = (x2 − 1) + G−2 .
Thus,
$\frac{\partial^2 G}{\partial \lambda^2} = 3(x^2 - 1)\, G^5 + 2 G^3$.   (5.34)
We now try to combine all these derivatives in a clever way.
For instance, we can start by trying to match the coefficients that involve G5 .
From (5.31) and (5.34) we see that
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} + \lambda^2\, \frac{\partial^2 G}{\partial \lambda^2} = 2\lambda^2 G^3$,
so we kill the terms with $G^5$. We now only need to combine $\partial G/\partial x$ and $\partial G/\partial \lambda$ in a way that yields $-2\lambda^2 G^3$. The way to do so is
$-2x\, \frac{\partial G}{\partial x} + 2\lambda\, \frac{\partial G}{\partial \lambda} = -2\lambda^2 G^3$.
Thus, adding the two will finally give zero:
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} - 2x\, \frac{\partial G}{\partial x} + 2\lambda\, \frac{\partial G}{\partial \lambda} + \lambda^2\, \frac{\partial^2 G}{\partial \lambda^2} = 0$.
This is a PDE satisfied by G, in terms of x and λ. The last two terms, in particular, can
also be written in a more compact way, leading to
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} - 2x\, \frac{\partial G}{\partial x} + \lambda\, \frac{\partial^2}{\partial \lambda^2}(\lambda G) = 0$.   (5.35)
We now try to solve this using the power series method, similar to what we did in
Sec. 5.1. But since we have two variables, we attempt a series solution only in λ, just
like in Eq. (5.29). And then see what happens for the coefficients qn (x). The first two
terms in (5.35) are easy to handle. Since the derivatives are with respect to x, they act
only on the qn ’s. That is,
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} - 2x\, \frac{\partial G}{\partial x} = \sum_{n=0}^{\infty} \left\{ (1 - x^2)\, q_n'' - 2x\, q_n' \right\} \lambda^n$.
As for the last term in (5.35), since $\lambda G = \sum_n q_n \lambda^{n+1}$, we get $\lambda\,\partial_\lambda^2(\lambda G) = \sum_{n=0}^{\infty} n(n+1)\, q_n \lambda^n$. Collecting everything, the coefficient of each power $\lambda^n$ in (5.35) must vanish separately. This is pretty cool, eh? We see that the coefficients are each given precisely by Legendre's equation (5.1), for integer values of $n$. Hence, the solutions must be the Legendre polynomials, $q_n(x) = P_n(x)$.
There is one final detail we must address. The Legendre polynomials are normal-
ized so that Pn (1) = 1. And we must check that G is indeed giving this. On the one
hand,
G(1, λ) = q0 (1) + q1 (1)λ + q2 (1)λ2 + . . . .
On the other, if we use (5.28) we get that
$G(1, \lambda) = \frac{1}{1 - \lambda} = 1 + \lambda + \lambda^2 + \dots$,
which is nothing but the geometric series. Comparing the two formulas we then find, indeed, that $q_n(1) = 1$. Thus, in all details, the $q_n$ are exactly the Legendre polynomials.
$\sum_{n=0}^{\infty} \left[ (1 - x^2)\, q_n'' - 2x\, q_n' + n(n+1)\, q_n \right] \lambda^n = 0$.   (5.38)
Each term in the sum is given exactly by Legendre's equation (5.1) for integer $n$.
Recurrence relations
One of the major applications of the generating function is in finding recurrence
relations between the Legendre polynomials. Recurrence relations are a little bit like
trigonometric identities. They are relations between the Pn which can be used to sim-
plify the calculations. For instance, one such recursion relation is
$(n+1)\, P_{n+1}(x) = (2n+1)\, x\, P_n(x) - n\, P_{n-1}(x)$,   (5.39)
known as Bonnet's recursion formula. This is not at all obvious, right? And it is also
super useful because it provides a very easy method for constructing the polynomials
systematically: if we know P0 = 1 and P1 = x, we can use this to construct P2 , then P3
and so on.
There is no single general recipe for deriving relations like (5.39). But usually,
the idea is to play around with the generating function and see if you can find useful
relations. For instance, we can prove (5.39) starting with Eq. (5.32) and rewriting it as
$(1 - 2x\lambda + \lambda^2)\, \frac{\partial G}{\partial \lambda} = (x - \lambda)\, G$.   (5.40)
We now plug the expansion (5.37) on both sides, which leads to
$(1 - 2x\lambda + \lambda^2) \sum_{n=0}^{\infty} n\, P_n\, \lambda^{n-1} = (x - \lambda) \sum_{n=0}^{\infty} P_n\, \lambda^n$.
To get the recursion relation (5.39), we now just need to write everything under the
same power of λn . This can be done by manipulating the indices of the sums, just like
we did in Sec. 5.1. I will leave it for you as an exercise.
5.4 Multipole expansion of Poisson’s equation
We now finally arrive at the first big application of Legendre polynomials. In
Sec. 4.6 we discussed the solution of Poisson’s equation
− ∇2 φ = ρ(r), (5.45)
where ρ is the external source and r = (x, y, z). We saw that the solution could be
written as
$\phi(R) = \int d^3r\; G(R - r)\, \rho(r)$,   (5.46)
where r = |r| and R = |R|. Moreover, let θ denote the angle between the vectors r and
R, so that R · r = rR cos θ. We may then write
$|R - r| = \sqrt{R^2 - 2Rr\cos\theta + r^2} = R\,\sqrt{1 - 2(r/R)\cos\theta + (r/R)^2}$.
If we now stare at this for a second, we realize that the inverse of this square root is nothing but the generating function of the Legendre polynomials in Eq. (5.37), provided we identify $\lambda = r/R$ and $x = \cos\theta$. That is,
$G(R - r) = \frac{1}{4\pi R}\, G(\cos\theta,\, r/R)$.
A series expansion in powers of $r/R$, as in Eq. (5.48), will then lead to
$G(R - r) = \frac{1}{4\pi R} \sum_{n=0}^{\infty} (r/R)^n\, P_n(\cos\theta)$.   (5.49)
Figure 5.3: Very often one is interested in the potential generated by a localized source ρ(r), at
a distant point R.
This should then be plugged back into the solution (5.46), leading to
$\phi(R) = \frac{1}{4\pi R} \int d^3r\; \frac{\rho(r)}{\sqrt{1 - 2(r/R)\cos\theta + (r/R)^2}} = \frac{1}{4\pi R} \sum_{n=0}^{\infty} \int d^3r\; \rho(r)\, (r/R)^n\, P_n(\cos\theta)$,   (5.50)
Consider the first term, $n = 0$, for which $P_0 = 1$. This integral is nothing but the total charge $Q = \int d^3r\, \rho(r)$. Thus, infinitely far away,
where, to make the dependence on r more explicit, I wrote cos θ = (r · R̂)/r, with
R̂ = R/R being the unit vector in the direction of R. The quantity
$p = \int d^3r\; \rho(r)\, r$,   (5.51)
is called the dipole moment of the charge distribution.¹ Up to first order in $(r/R)$, the potential (5.50) will thus be
$\phi(R) = \frac{Q}{4\pi R} + \frac{p \cdot \hat{R}}{4\pi R^2} + \dots$   (5.52)
Quite often, we can have distributions ρ(r) where the net charge Q is zero. This hap-
pens if the distribution is neutral, but not necessarily uniform. For instance, when it is
composed of two charges +q and −q. The first term in (5.52) will vanish in this case.
But, as we can see, there will still be an electrostatic potential, although it will now fall
as 1/R2 instead of 1/R (the field will thus fall as 1/R3 ). A non-uniform distribution
will hence still generate an electrostatic potential. But it falls faster with the distance.
Eq. (5.50) goes by the name of multipole expansion. The first term is the monopole and the second is the dipole. The next term, $n = 2$, is associated with the quadrupole moment. In this case $P_2 = (3x^2 - 1)/2$, so this term will have the form
$\frac{1}{4\pi R} \int d^3r\; \rho(r)\, (r/R)^2\, \frac{3\cos^2\theta - 1}{2} = \frac{1}{4\pi R^3} \int d^3r\; \rho(r)\, r^2\, \frac{1}{2}\left[ 3\,\frac{(r\cdot\hat{R})^2}{r^2} - 1 \right]$.
The quadrupole cannot be associated to a single vector, like the dipole in Eq. (5.51).
Instead, it is associated with a tensor,
$T_{ij} = \int d^3r\; \rho(r)\, \big( r_i r_j - r^2 \delta_{ij} \big)$.   (5.53)
In fact, I will leave it for you as an exercise to check that the $n = 2$ contribution is written as
$\frac{1}{4\pi R^3}\, \frac{1}{2} \sum_{ij} T_{ij}\, R_i R_j$.   (5.54)
The solution, up to $n = 2$, will thus be
$\phi(R) = \frac{Q}{4\pi R} + \frac{1}{4\pi R^2} \sum_i p_i R_i + \frac{1}{4\pi R^3}\, \frac{1}{2}\sum_{ij} T_{ij}\, R_i R_j + \dots$   (5.55)
You can probably see a pattern emerging: The first term is a scalar operation in R. The
second is the contraction of R with a vector (which is a rank 1 tensor). The next is a
contraction with a rank 2 tensor and so on. There are some systems for which both the
monopole and dipole terms vanish, and the first contribution is the quadrupole. This
happens, for instance, in some types of liquid crystals and is associated with their
shapes and composition.
¹ You may have seen the dipole moment in the case of a system with only two charges, $+q$ and $-q$. To obtain that, we can simply write $\rho(r) = q\,\delta(r - r_1) - q\,\delta(r - r_2)$. The Dirac deltas kill the integral, leaving us with $p = q(r_1 - r_2)$, which is the more familiar formula for the dipole moment.
5.5 Associated Legendre functions
Next section, when we study the Laplacian in spherical coordinates, the following
ODE is going to show up:
$(1 - x^2)\, y'' - 2x\, y' + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] y = 0$,   (5.56)
where $m$ is an integer. This is called the associated Legendre equation. And, as the very name hints, it turns out to be closely related to Legendre's equation (5.1). First, clearly if $m = 0$ we recover (5.1). Next, for $m \neq 0$, define a new variable $u$ according to
$y = (1 - x^2)^{m/2}\, u$.   (5.57)
Plugging this in Eq. (5.56) then leads, after a bit of algebra/patience, to
$(1 - x^2)\, u'' - 2(m+1)x\, u' + \left[ \ell(\ell+1) - m(m+1) \right] u = 0$.   (5.58)
This is looking even more like Legendre’s equation (5.1). In fact, again, if m = 0, we
get exactly that.
The solutions of (5.62) are not Legendre polynomials. Instead, as we now show,
they are derivatives of the Legendre polynomials:
$u = \frac{d^m}{dx^m}\, P_\ell(x)$.
To verify that, we start with Legendre’s equation
which is exactly Eq. (5.62). This therefore proves that the solutions are, indeed, deriva-
tives of the Legendre polynomials.
Associated Legendre functions
The solution of the associated Legendre equation
$(1 - x^2)\, y'' - 2x\, y' + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] y = 0$   (5.62)
is
$P_\ell^m(x) := (1 - x^2)^{m/2}\, \frac{d^m}{dx^m} P_\ell(x)$,   (5.63)
which is called an associated Legendre function. Since $P_\ell$ is a polynomial of degree $\ell$, these solutions only make sense for $|m| \leq \ell$. Moreover, since Eq. (5.62) involves only $m^2$, Eq. (5.63) is also a solution for $m < 0$. That is, $P_\ell^{-m} = P_\ell^m$. But please note that some sources, including Mathematica, adopt a different convention for negative $m$.
Many people call these functions the associated Legendre polynomials; but, as you may have noticed, if $m$ is odd, these will not be polynomials. So the term “functions” is a bit more precise.
Here is a list of the first few associated Legendre functions:
$P_0^0 = 1$,
$P_1^0 = x$,  $P_1^1 = -\sqrt{1 - x^2}$,
$P_2^0 = \frac{3x^2 - 1}{2}$,  $P_2^1 = -3x\sqrt{1 - x^2}$,  $P_2^2 = 3(1 - x^2)$,
$P_3^0 = \frac{5x^3 - 3x}{2}$,  $P_3^1 = \frac{3}{2}\sqrt{1 - x^2}\,(1 - 5x^2)$,  $P_3^2 = 15x(1 - x^2)$,  $P_3^3 = -15(1 - x^2)^{3/2}$.
The associated Legendre functions are not all mutually orthogonal. For instance, $P_1^1$ is not orthogonal to $P_2^2$. But some subsets turn out to be. For instance, one may show that
$\int_{-1}^{1} dx\; P_\ell^m(x)\, P_n^m(x) = \frac{2\,(\ell + |m|)!}{(2\ell+1)\,(\ell - |m|)!}\, \delta_{\ell,n}$,   (5.64)
with the same m in both. I put |m| on the right-hand side to also contemplate the case
m < 0. Similarly, one may show that they also satisfy
$\int_{-1}^{1} dx\; \frac{P_\ell^m(x)\, P_\ell^k(x)}{1 - x^2} = \frac{(\ell + |m|)!}{|m|\,(\ell - |m|)!}\, \delta_{m,k}$.   (5.65)
That is, now with the same $\ell$ in both, but varying the upper index. This result only holds for $m, k \neq 0$. If $m = k = 0$, the integral diverges.
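A quick numerical check of (5.64) can be done with scipy's built-in associated Legendre functions. This is just a sketch of mine; any overall sign convention for $P_\ell^m$ drops out of (5.64), since both factors carry it:

import numpy as np
from scipy.integrate import quad
from scipy.special import lpmv, factorial

m = 1
for l in range(m, 4):
    for n in range(m, 4):
        # lpmv(m, l, x) evaluates the associated Legendre function P_l^m(x).
        val, _ = quad(lambda x: lpmv(m, l, x) * lpmv(m, n, x), -1, 1)
        expected = 2 * factorial(l + m) / ((2*l + 1) * factorial(l - m)) if l == n else 0.0
        assert abs(val - expected) < 1e-8
print("Eq. (5.64) checked for m = 1 and l, n < 4")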
Figure 5.4: Spherical coordinates.
And one has the following formulas for the gradient, divergence and Laplacian in
spherical coordinates:
$\nabla f = \frac{\partial f}{\partial r}\, e_r + \frac{1}{r}\frac{\partial f}{\partial \theta}\, e_\theta + \frac{1}{r\sin\theta}\frac{\partial f}{\partial \phi}\, e_\phi$,   (5.68)
$\nabla \cdot v = \frac{1}{r^2}\frac{\partial (r^2 v_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial (v_\theta \sin\theta)}{\partial \theta} + \frac{1}{r\sin\theta}\frac{\partial v_\phi}{\partial \phi}$,   (5.69)
$\nabla^2 f = \frac{1}{r^2}\frac{\partial}{\partial r}\left( r^2 \frac{\partial f}{\partial r} \right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial f}{\partial \theta} \right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 f}{\partial \phi^2}$.   (5.70)
Unfortunately I will not have time to properly derive these formulas. You may look,
for instance, at Cahill, Sec. 6.4.
The angular part of the Laplacian is very important and it is worth defining the
differential operator
$\hat{L}^2 = -\left[ \frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial}{\partial \theta} \right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2} \right]$,   (5.71)
so that the Laplacian (5.70) can be written as
$\nabla^2 f = \frac{1}{r^2}\frac{\partial}{\partial r}\left( r^2 \frac{\partial f}{\partial r} \right) - \frac{1}{r^2}\, \hat{L}^2 f$.   (5.72)
In quantum theory, L̂2 turns out to be associated with the angular momentum operator.
Proving this is a bit nasty; you will do it when you study Quantum Mechanics properly.
The idea is to start with the definition of the angular momentum vector, L = r × p, and
upgrade p to an operator, p = −i~∇. One may then verify that L · L = ~2 L̂2 , with L̂2
being the operator in (5.71). You can give it a go. It is a fun exercise. What it means is
that, except for a factor of ~2 , this operator L̂2 is nothing but the square of the angular
momentum. Notwithstanding this interpretation, introducing this separation between
an angular and a radial part turns out to also be quite convenient, as we will see.
Consider now Helmholtz' equation,
$\nabla^2 f = -k^2 f$,   (5.73)
and let us look for separable solutions of the form $f(r, \theta, \phi) = R(r)\, Y(\theta, \phi)$, for unknown functions $R$ and $Y$. The important thing to notice is that $\hat{L}^2$ acts only on the angular part, since it only contains derivatives with respect to $\theta$ and $\phi$. Hence, we find
$\frac{Y}{r^2}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) - \frac{R}{r^2}\,\hat{L}^2 Y = -k^2\, R\, Y$.
Multiplying both sides by $r^2/(RY)$ leads to
$\frac{1}{R}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) + k^2 r^2 = \frac{1}{Y}\,\hat{L}^2 Y$.
The left-hand side is now only a function of r, while the right-hand side is only a
function of θ and φ. This can thus only happen if both quantities are a constant. For
reasons that will become clear in the next section, we label this constant as $\ell(\ell+1)$. That is, we write
$\hat{L}^2 Y = \ell(\ell+1)\, Y$,   (5.75)
$\frac{1}{R}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) + k^2 r^2 = \ell(\ell+1)$.   (5.76)
Radial equation
Consider first the particular case where k = 0. That is, where Helmholtz’ equation
reduces to Laplace’s equation, ∇2 f = 0. The radial equation (5.76) becomes, in this
case,
$\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) = \ell(\ell+1)\, R$.
Let us try a solution of the form $R(r) = r^n$, for some constant $n$. We then get $R' = n r^{n-1}$ and so
$\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) = \frac{d}{dr}\left( n\, r^{n+1} \right) = n(n+1)\, r^n$.
This will therefore be a solution, but only provided $n(n+1) = \ell(\ell+1)$. There are two possible solutions, befitting a second order ODE: either $n = \ell$ or $n = -(\ell+1)$. Thus, for each value of $\ell$, the general solution will be
$R_\ell(r) = a\, r^\ell + \frac{b}{r^{\ell+1}}$,   (5.77)
where a and b are constants. Since ` is a non-negative integer, the solution r` will be
regular at all points, while 1/r`+1 will diverge at r = 0.
In the case where k > 0, Eq. (5.76) becomes more complicated. The solution, in this
case, turns out to be a special set of functions, called the spherical Bessel functions.
These are a bit outside the scope of this course (you will learn about Bessel functions
next semester). So I won’t discuss it any further. Instead, we now turn to the angular
equation.
$\hat{L}^2 Y = -\left[ \frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial Y}{\partial \theta} \right) + \frac{1}{\sin^2\theta}\frac{\partial^2 Y}{\partial \phi^2} \right] = \ell(\ell+1)\, Y$.
We then attempt a separation of variables again, now writing Y(θ, φ) = Θ(θ)Φ(φ). This
yields, after multiplying both sides by sin2 θ/Y,
$\frac{\sin\theta}{\Theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial \Theta}{\partial \theta} \right) + \ell(\ell+1)\sin^2\theta = -\frac{1}{\Phi}\frac{\partial^2 \Phi}{\partial \phi^2}$.
The left-hand side depends only on θ and the right-hand side only on φ. As a conse-
quence, each term must be a constant. Traditionally, we call this constant m2 . That is,
we write
$\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left( \sin\theta\, \frac{d\Theta}{d\theta} \right) + \ell(\ell+1)\sin^2\theta = m^2$,   (5.78)
$\frac{1}{\Phi}\frac{d^2\Phi}{d\phi^2} = -m^2$.   (5.79)
Things are starting to improve. Eq. (5.79) can now be easily solved:
$\frac{d^2\Phi}{d\phi^2} = -m^2\, \Phi \quad\Rightarrow\quad \Phi = e^{\pm i m \phi}$.
This looks just like the usual harmonic oscillator equation $y'' = -\omega^2 y$. There is, however, one fundamental difference: $\phi$ is an angle so, for physical reasons, the solution must be periodic in $\phi$, with period $2\pi$. That is,
$e^{im(\phi + 2\pi)} = e^{im\phi}$.
Consequently, we see that not all values of $m$ solve Eq. (5.79); a solution will exist only if $m = 0, \pm 1, \pm 2, \pm 3, \dots$. Thus, to summarize, the solution of Eq. (5.79) is
$\Phi(\phi) = e^{im\phi}, \qquad m = 0, \pm 1, \pm 2, \pm 3, \dots$   (5.80)
There should be two solutions, $e^{im\phi}$ and $e^{-im\phi}$, since the ODE is 2nd order. But we are already taking care of both by allowing $m$ to run over the negatives as well.
Next we turn to Eq. (5.78) for Θ, which we rewrite as
$\sin\theta\, \frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial \Theta}{\partial \theta} \right) + \left[ \ell(\ell+1)\sin^2\theta - m^2 \right] \Theta = 0$.   (5.81)
We now change variables to $x = \cos\theta$, for which $\sin\theta\, \frac{d}{d\theta} = -\sin^2\theta\, \frac{d}{dx}$, so that Eq. (5.81) becomes
$\sin^2\theta\, \frac{\partial}{\partial x}\left( \sin^2\theta\, \frac{\partial \Theta}{\partial x} \right) + \left[ \ell(\ell+1)\sin^2\theta - m^2 \right] \Theta = 0$.
Dividing by $\sin^2\theta = 1 - x^2$ then gives
$\frac{\partial}{\partial x}\left[ (1 - x^2)\, \frac{\partial \Theta}{\partial x} \right] + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] \Theta = 0$.
Finally, expanding the first term,
$\frac{\partial}{\partial x}\left[ (1 - x^2)\, \frac{\partial \Theta}{\partial x} \right] = (1 - x^2)\,\Theta'' - 2x\,\Theta'$.
Thus, what we find is, ta-da!, exactly the associated Legendre equation (5.62):
$(1 - x^2)\,\Theta'' - 2x\,\Theta' + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] \Theta = 0$.   (5.82)
As we saw all the way back in Sec. 5.1, these solutions will be well behaved at $x = \pm 1$ only when $\ell$ is an integer,
$\ell = 0, 1, 2, \dots$
Moreover, as we saw in Sec. 5.5, for each `, the only allowed values of m are integers
satisfying $|m| \leq \ell$. Hence, the solutions will be the associated Legendre functions $P_\ell^m(x)$, defined in Eq. (5.63), with $x = \cos\theta$:
$\Theta(\theta) = P_\ell^m(\cos\theta), \qquad \ell = 0, 1, 2, 3, \dots$   (5.83)
Combining this with the $\Phi$ solution, Eq. (5.80), we then finally obtain the general family of linearly independent solutions of the angular equation, $Y(\theta, \phi) = A\, e^{im\phi}\, P_\ell^m(\cos\theta)$, where $A$ is a normalization constant. It is customary to fix $A$ by imposing
$\int_0^{\pi} d\theta\, \sin\theta \int_0^{2\pi} d\phi\; |Y(\theta, \phi)|^2 = 1$.   (5.86)
Using the orthogonality relation (5.64), this gives
$\int_0^{\pi} d\theta\, \sin\theta \int_0^{2\pi} d\phi\; |Y(\theta, \phi)|^2 = |A|^2\, (2\pi)\, \frac{2(\ell + m)!}{(2\ell+1)(\ell - m)!} = 1$.
The sign of A is arbitrary and different sources use different conventions. I will adopt
here the “quantum convention” (because it is the convention used in quantum mechan-
ical applications):
$A = (-1)^m \sqrt{\frac{2\ell+1}{4\pi}\, \frac{(\ell - |m|)!}{(\ell + |m|)!}}$.
This is a good time to summarize what we have learned.
Spherical Harmonics
The general solutions of the angular equation (5.75) are the Spherical Harmonics
$Y_\ell^m(\theta, \phi) = (-1)^m \sqrt{\frac{2\ell+1}{4\pi}\, \frac{(\ell - |m|)!}{(\ell + |m|)!}}\; e^{im\phi}\, P_\ell^m(\cos\theta)$,   (5.87)
where
$\ell = 0, 1, 2, 3, \dots$   (5.88)
and $m = -\ell, -\ell+1, \dots, \ell$. They are eigenfunctions of $\hat{L}^2$, with eigenvalue $\ell(\ell+1)$. The shape of the first few harmonics is illustrated in Fig. 5.5.
It is interesting to note how Y`m is defined by two indices, ` and m, while the eigenvalues
depend only on `. We say the eigenvalues of L̂2 are degenerate, which means each
eigenvalue is associated with more than one eigenfunction. For each given `, there
are in fact $2\ell + 1$ allowed values of $m$: $-\ell, -\ell+1, \dots, \ell-1, \ell$. Hence, we say that the degeneracy of the eigenvalue $\ell(\ell+1)$ is $2\ell+1$ (“degeneracy” is the number of eigenfunctions associated to a certain eigenvalue).
The Spherical Harmonics form a basis for functions living on the unit sphere.
That is, functions f (θ, φ) which depend only on the angles in spherical coordinates.
The functions $e^{im\phi}$ are orthogonal in the sense that
$\int_0^{2\pi} e^{i(m - m')\phi}\, d\phi = 2\pi\, \delta_{m,m'}$.
If we combine this with the orthogonality (5.64) of the associated Legendre functions
and our choice of normalization (5.86), we then see that the Spherical Harmonics are
naturally orthonormal,
$\int d\Omega\; (Y_\ell^m)^*\, Y_{\ell'}^{m'} = \delta_{\ell,\ell'}\, \delta_{m,m'}$,   (5.91)
where dΩ = sin θdθdφ is an element of solid angle. Hence, any function f (θ, φ) can be
expanded as
$f(\theta, \phi) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} c_{\ell,m}\, Y_\ell^m(\theta, \phi)$,   (5.92)
with coefficients
$c_{\ell,m} = \int d\Omega\; f(\theta, \phi)\, (Y_\ell^m)^*$.   (5.93)
This is exactly the same logic as the Fourier business we started our course with.
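As a sanity check of the orthonormality (5.91), here is a Python sketch using scipy. Beware the slightly confusing argument order of scipy.special.sph_harm, which takes the azimuthal angle before the polar one; the pairs of (l, m) below are arbitrary choices of mine:

import numpy as np
from scipy.integrate import dblquad
from scipy.special import sph_harm

def overlap(l1, m1, l2, m2):
    """Integral of conj(Y_{l1}^{m1}) Y_{l2}^{m2} over the unit sphere."""
    return dblquad(lambda pol, az: np.real(np.conj(sph_harm(m1, l1, az, pol))
                                           * sph_harm(m2, l2, az, pol)) * np.sin(pol),
                   0, 2*np.pi, 0, np.pi)[0]

print(overlap(2, 1, 2, 1))   # ~ 1, as in Eq. (5.91)
print(overlap(2, 1, 3, 1))   # ~ 0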
Figure 5.5: Plots of the first few spherical harmonics. The colors distinguish the regions where
Re(Y`m ) > 0 (orange) and < 0 (blue).
5.8 The hydrogen atom
Lastly, we turn to what is one of the major applications of orthogonal polynomials:
the quantum properties of a hydrogen atom. We assume that the proton is very heavy
and therefore does not participate in the dynamics. The problem is then reduced to
modeling an electron, of mass m, subject to the Coulomb potential
$V(r) = -\frac{e^2}{4\pi\epsilon_0\, r}$,   (5.94)
where $e$ is the electron charge, $\epsilon_0$ is the vacuum permittivity and $r = |r|$ is the position, measured with respect to the nucleus. Schrödinger's equation for the electron
wavefunction is given by
$i\hbar\, \frac{\partial \Psi}{\partial t} = \hat{H}\, \Psi$,   (5.95)
where
$\hat{H} = \frac{\hat{p}^2}{2m} + V(r)$,   (5.96)
is the Hamiltonian. All quantum problems we treated before were in 1D, while here one must naturally work in 3D. In this case the momentum operator $\hat{p}$ will be a vector, $\hat{p} = -i\hbar\nabla$, so that Schrödinger's equation becomes
$i\hbar\, \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2 \Psi + V(r)\,\Psi$,   (5.98)
which is a partial differential equation for Ψ(r, t).
As always, we start by solving it using separation of variables in time and space:
Ψ(r, t) = e−iEt/~ ψ(r). This transforms (5.98) into an eigenstuff equation Ĥψ = Eψ. Or,
more explicitly,
$-\frac{\hbar^2}{2m}\nabla^2 \psi + V(r)\,\psi = E\,\psi$.   (5.99)
The goal is then to simultaneously solve this for the eigenfunctions ψ(r) and the eigen-
values E.
The situation here is very similar to that of Sec. 5.6. We first write the Laplacian as
in Eq. (5.72), with the operator L̂2 defined in Eq. (5.71). Eq. (5.99) then becomes
$-\frac{\hbar^2}{2m}\left\{ \frac{1}{r^2}\frac{\partial}{\partial r}\left( r^2 \frac{\partial \psi}{\partial r} \right) - \frac{1}{r^2}\,\hat{L}^2 \psi \right\} + V(r)\,\psi = E\,\psi$.   (5.100)
We multiply both sides by −2mr2 /~2 and write ψ(r, θ, φ) = R(r)Y(θ, φ):
$Y\, \frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) - R\, \hat{L}^2 Y - \frac{2mr^2}{\hbar^2}\left[ V(r) - E \right] R\, Y = 0$.
Finally, we divide both sides by RY:
$\frac{1}{R}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) - \frac{2mr^2}{\hbar^2}\left[ V(r) - E \right] = \frac{1}{Y}\,\hat{L}^2 Y$.   (5.101)
The left-hand side is now only a function of r, while the right-hand side is only a
function of θ and φ. Whence, they must each be a constant. Just like we did in (5.75),
we write this constant as $\ell(\ell+1)$. The angular part then gives $\hat{L}^2 Y = \ell(\ell+1)\, Y$, whose solutions are the spherical harmonics. The radial part, in turn, can be rearranged by defining the effective potential
$V_{\rm eff}(r) = V(r) + \frac{\hbar^2\, \ell(\ell+1)}{2m\, r^2}$,   (5.104)
in terms of which it reads
$-\frac{\hbar^2}{2mr^2}\, \frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) + \left[ V_{\rm eff}(r) - E \right] R = 0$.   (5.105)
2mr dr dr
This means that, in practice, what the electron feels is not the Coulomb potential, but
Veff . This new term is called a centrifugal potential. It is positive and therefore rep-
resents a repulsion: it tends to throw the particle outward, away from the center. This
Figure 5.6: Pictorial illustration of the effective potential (5.104). The Coulomb potential is negative (blue-dashed) while the centrifugal term is positive (green-dotted). One is thus attractive and the other repulsive. As a consequence, we get the red curve, which has a minimum at $r_{\min} = a_0\, \ell(\ell+1)$, where $a_0 = \frac{4\pi\epsilon_0 \hbar^2}{m e^2}$ is Bohr's radius.
is the strange way in which angular momentum manifests itself in quantum theory: the eigenvalue $\ell$ represents the possible eigenmodes of the angular momentum and, the higher the value of $\ell$, the stronger the push outward.
This centrifugal term competes with the Coulomb potential (5.94), which is nega-
tive and hence always attractive. The combination leads to something like the potential
shown in Fig. 5.6. One curve goes down, the other goes up. When we mix the two,
we get a potential that eventually has a minimum somewhere. The position of the minimum can be found by computing $dV_{\rm eff}/dr = 0$ and reads
$r_{\min} = a_0\, \ell(\ell+1)$,   (5.106)
where
$a_0 = \frac{4\pi\epsilon_0 \hbar^2}{m e^2} \simeq 5.29 \times 10^{-11}\ \mathrm{m}$,   (5.107)
is known as Bohr’s radius. It can be viewed as a kind of typical distance scale for atomic
stuff. We thus see that the minimum occurs further from the nucleus, the larger is the
angular momentum `. The existence of a minimum is important because it suggests that
there may be stable configurations where the smallest possible energy is “somewhere
in between”. That is, where the electron neither collapses toward the proton nor flies
away to infinity. We call these configurations bound states.
Returning to Eq. (5.105), define u = rR. One may verify that
$\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) = r\, \frac{d^2 u}{dr^2}$,
so that Eq. (5.105) becomes
$-\frac{\hbar^2}{2m}\frac{d^2 u}{dr^2} + \left[ -\frac{e^2}{4\pi\epsilon_0 r} + \frac{\hbar^2\,\ell(\ell+1)}{2m\, r^2} - E \right] u = 0$.   (5.108)
We now have to start cleaning this up. Define
$\kappa = \frac{\sqrt{-2mE}}{\hbar}$,   (5.109)
$\eta = \frac{1}{\kappa a_0} = \frac{m e^2}{4\pi\epsilon_0 \hbar^2 \kappa}$,   (5.110)
$\rho = 2\kappa r$.   (5.111)
The constant κ has units of 1/length (wavenumber), so the new variable ρ is dimension-
less (as is η). I will skip a bit of annoying calculations here. But you may check that,
in terms of these new quantities, Eq. (5.108) becomes
$\frac{d^2 u}{d\rho^2} = \left[ \frac{1}{4} - \frac{\eta}{\rho} + \frac{\ell(\ell+1)}{\rho^2} \right] u$.   (5.112)
This is very nice because now everything is dimensionless. Note that this is still an
eigenequation, so the energies E are also unknowns. In this new notation, they are
hidden away in η. That is to say, Eq. (5.112) is really an eigenequation for determining
both u and η.
At this point things start to get a little nasty. We are interested in bound solutions. That is, solutions which remain finite as $\rho \to \infty$ and $\rho \to 0$. This is where quantization comes from. Not all values of $\eta$ will lead to regular solutions. Most, actually, will not.
The situation is very similar to what happened with the Legendre polynomials. We
started the chapter with Eq. (5.1), where n could be any constant. But we then saw
that most solutions were not well behaved and only if n was an integer would we get
something regular. The same thing will also happen here, although seeing it is a bit
trickier. What one has to do is first make another change of variables to²
$u(\rho) = \rho^{\ell+1}\, e^{-\rho/2}\, v(\rho)$,   (5.113)
for a new unknown function $v(\rho)$. I will leave for you then the boring task of checking that $v(\rho)$ satisfies
$\rho\, v'' + (2\ell + 2 - \rho)\, v' + (\eta - \ell - 1)\, v = 0$.   (5.114)
This equation will be associated with the Laguerre polynomials. In fact, in the problem
set, you saw that the associated Laguerre polynomials, defined by the Rodrigues
formula,
$L_j^k(x) = (-1)^k \frac{d^k}{dx^k} L_{j+k}(x), \qquad L_j(x) = \frac{e^x}{j!} \frac{d^j}{dx^j}\left( x^j e^{-x} \right)$,   (5.115)
² This may seem quite magical. But there is actually a logic to it. The idea is to strip away the asymptotic behavior of the solution, when $\rho \to \infty$ and $\rho \to 0$. For instance, if $\rho$ is very large Eq. (5.112) is approximated by $u'' = u/4$, whose solutions are $e^{\rho/2}$ or $e^{-\rho/2}$. Since $e^{\rho/2}$ would be divergent, we can then conclude that at very large distances the solution must decay as $u \sim e^{-\rho/2}$. Similarly, when $\rho \to 0$ the dominant term in (5.112) will be the centrifugal one, and the solution which does not explode will be $u \sim \rho^{\ell+1}$. What we are doing in Eq. (5.113) is essentially stripping away these two asymptotic behaviors, in the hope that the equation for $v$ will turn out to be more manageable.
were a solution of the ODE
$x\, y'' + (k + 1 - x)\, y' + j\, y = 0$,   (5.116)
but only provided k and j were integers.
Eq. (5.114) is of this form, with
$k = 2\ell + 1, \qquad j = \eta - \ell - 1$.
Thus, regular solutions will only exist provided
$j = 0, 1, 2, 3, \dots$
This quantizes the allowed values of $\eta$. It says, essentially, that
$\eta \equiv n = 1, 2, 3, \dots$,
i.e., any positive integer $n$. Moreover, for each given $n$, it also imposes that $\ell$ must be constrained to
$\ell = 0, 1, 2, \dots, n - 1$.
The regular solutions of (5.112) will thus be, up to a constant,
$u(\rho) = \rho^{\ell+1}\, e^{-\rho/2}\, L^{2\ell+1}_{n-\ell-1}(\rho)$.   (5.117)
Please don’t get me wrong. I am not being rigorous at all here. To really show that
these are the only regular solutions and so on, is a bit harder and, unfortunately, we
will not have time to do it. What I am doing here is merely to try to show you how
these orthogonal polynomials may emerge in this quantum business.
Bohr’s formula
The condition that $\eta$, defined in Eq. (5.110), must be a positive integer, implies that
$\kappa = \frac{\sqrt{-2mE}}{\hbar} = \frac{1}{n a_0}$,
which therefore determines the allowed energies:
$E_n = -\frac{\hbar^2}{2m a_0^2\, n^2}, \qquad n = 1, 2, 3, \dots$   (5.118)
This is called Bohr’s formula for the energy levels of the Hydrogen atom.
From a historical perspective, it is perhaps the most important result in quan-
tum mechanics. It was figured out by Bohr in 1913, before quantum theory, and
matches very well with experiments in spectroscopy. When Schrödinger first
proposed his equation in 1926, the first thing he did was to apply it to the Hy-
drogen atom, exactly like we did above. And to find Bohr’s formula naturally
emerge from the formalism was seen, by him, as a strong confirmation that his
ideas were correct.
The smallest (most negative) energy is called the ground state and has the
Figure 5.7: First few energy levels of the Hydrogen atom, in electron-Volt.
value
$E_1 = -\frac{\hbar^2}{2m a_0^2} \simeq -13.6\ \mathrm{eV}$.   (5.119)
The other levels, n = 2, 3, . . . are called excited states and all have energies
larger than E1 , but still negative (Fig. 5.7). States with energy above zero are
not bound states. That is, they do not describe the electron being bound to the
proton.
Here the factor of $r^2$ appears because we are working in spherical coordinates, so that the element of integration is $d^3r = r^2 \sin\theta\, dr\, d\theta\, d\phi$. It is a little bit nasty to find the properly normalized radial function explicitly, but the final result is
Figure 5.8: Pretty plots of the Hydrogen atom wavefunctions |ψn`m |2 defined in Eq. (5.122), for
a bunch of values of (n, `, m).
$R_{n\ell}(r) = \sqrt{\left( \frac{2}{n a_0} \right)^3 \frac{(n - \ell - 1)!}{2n\, (n + \ell)!}}\; e^{-r/n a_0} \left( \frac{2r}{n a_0} \right)^{\ell} L^{2\ell+1}_{n-\ell-1}(2r/n a_0)$.   (5.121)
Yeah, I told you: it's nasty. But nasty as it may be, we now have the full solution of Schrödinger's equation for the Hydrogen atom:
$\psi_{n\ell m}(r, \theta, \phi) = R_{n\ell}(r)\, Y_\ell^m(\theta, \phi)$,   (5.122)
where $Y_\ell^m$ are the spherical harmonics in Eq. (5.87). Some pretty plots of $|\psi_{n\ell m}|^2$ are
shown in Fig. 5.8.
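If you want to play with these wavefunctions numerically, here is a Python sketch of mine for the radial part (5.121), in units where $a_0 = 1$, together with a check of its normalization. It assumes scipy's genlaguerre follows the same Laguerre convention as Eq. (5.115), which is the standard one:

import numpy as np
from scipy.special import genlaguerre, factorial
from scipy.integrate import quad

a0 = 1.0   # work in units of the Bohr radius

def R(n, l, r):
    """Radial wavefunction (5.121) with a0 = 1."""
    rho = 2 * r / (n * a0)
    norm = np.sqrt((2 / (n * a0))**3 * factorial(n - l - 1) / (2 * n * factorial(n + l)))
    return norm * np.exp(-r / (n * a0)) * rho**l * genlaguerre(n - l - 1, 2*l + 1)(rho)

# The integral of |R_nl|^2 r^2 dr should be 1 for every (n, l).
for n, l in [(1, 0), (2, 0), (2, 1), (3, 2)]:
    val, _ = quad(lambda r: R(n, l, r)**2 * r**2, 0, np.inf)
    print(n, l, val)   # ~ 1 in each case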
The states with different values of ` usually receive funny names, which are due to
historical reasons. For instance, ` = 0 is called s, ` = 1 is called p, ` = 2 is called d
and ` = 3 is called f . Thus, for instance, all eigenstates with n = 2, ` = 1 are called 2p
states. This may be reminding you of chemistry class, and that’s exactly the point. In
chemistry you probably learned how to build the periodic table by filling the electrons
into orbitals, which were labeled as