Fisica Matematica I
Gabriel T. Landi
University of São Paulo
1 Fourier Series
1.1 Periodic functions
1.2 Orthogonality of trigonometric functions
1.3 The Fourier recipe
1.4 Examples
1.5 Complex form
1.6 Parseval's identity
1.7 Dirac delta and Heaviside functions
1.8 Convergence of Fourier series
1.9 Integrals and derivatives of Fourier series

4 Fourier Transforms
4.1 Introduction
4.2 Characteristic function of a probability distribution (optional)
4.3 Operations involving Fourier Transforms
4.4 Cauchy problem for the heat equation
4.5 Quantum dynamics and Heisenberg's uncertainty principle
4.6 Poisson's equation
4.7 Discrete Fourier Transform and the FFT algorithm
Chapter 1
Fourier Series
Fourier series is a method for dealing with functions f (x) that are periodic, with
some period L. That is, functions which satisfy
f(x + L) = f(x).   (1.1)
Since periodic stuff is so common in physics, Fourier series turn out to be extremely
useful. They will also serve as the entry point for an even more useful idea, called
Fourier Transform, that we will study later on.
Periodic functions can have many shapes. Fig 1.1 shows some examples. The cool
thing about them is that you don’t need to consider the entire real line. All information
about them is contained in an interval of length L. The actual choice of interval is up
to you, as long as it has length L. This is illustrated in Fig. 1.2.
Figure 1.1: Examples of periodic functions. These functions were chosen to have period L = 1.
Figure 1.2: All information about periodic functions is contained in an interval of length L. The actual choice of interval is arbitrary, as long as it has length L.
The basic trigonometric functions sin(2πx/L) and cos(2πx/L) are periodic, with period L. We can explicitly verify that this is the case, e.g. by
expanding
sin((2π/L)(x + L)) = sin(2πx/L) cos(2πL/L) + cos(2πx/L) sin(2πL/L) = sin(2πx/L),
since cos(2π) = 1 and sin(2π) = 0. These functions, however, are not the only trigono-
metric functions which have period L. There is, in fact, an entire set of functions of the
form

cos(2πnx/L)   and   sin(2πnx/L),      n = 0, 1, 2, 3, . . . .
It suffices to stick to n ∈ N since n < 0 does not give us anything new (because
sin(−θ) = − sin(θ) and cos(−θ) = cos θ). One can check that these functions are peri-
odic, with period L, in the same way as above; for instance,
sin((2πn/L)(x + L)) = sin(2πnx/L) cos(2πnL/L) + cos(2πnx/L) sin(2πnL/L) = sin(2πnx/L),
because cos(2πn) = 1 and sin(2πn) = 0, when n is an integer. Some of the cosines are
illustrated in Fig. 1.3.
Since all these sines and cosines are periodic, any linear combination of them must
also be periodic. What I mean are combinations of the form
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(2πnx/L) + b_n sin(2πnx/L) },   (1.2)
where an and bn are arbitrary coefficients. I put a 1/2 on a0 for convenience (the
reason will become clear below). Eq. (1.2) is what we call a Fourier series. It is an
Figure 1.4: The two Fourier series in Eqs. (1.3) and (1.4).
infinite series of sines and cosines, with specific coefficients (an , bn ). Since each term is
periodic, the resulting series is guaranteed to be periodic for any choice of coefficients.
By appropriately choosing the coefficients, one can cook up all sorts of funny func-
tions. Here are two examples:
f(x) = sin(2πx) − (1/2) sin(4πx) + (1/3) sin(6πx),   (1.3)

f(x) = sin(2πx) + (1/2) sin(4πx) + (1/3) sin(6πx).   (1.4)
The only difference is the minus sign in the second term. These two functions are
plotted in Fig. 1.4. As can be seen, by changing only an innocent minus sign, one gets
rather different functions. The actual form of the function can thus change significantly
with the coefficients.
This is actually what I used to make the plot in Fig. 1.5. I don't need to specify it over
the other intervals, since I’m constructing it to be periodic. For instance, suppose I want
to evaluate f (x) at x = −0.9. How do I do it? First we translate it back to the interval
used in (1.5): since f (x) has period L = 1, f (−0.9) = f (−0.9 + 1) = f (0.1) = 0.1.
Figure 1.5: Imagine the periodic function you have to implement to make your nephew swing
higher and higher. Credits: illegally stolen from the internet.
The central claim of this chapter is that essentially any periodic function can be expanded in a Fourier series. This is actually a very strong statement. Even poorly
behaved functions, containing discontinuities and divergences can be expanded. We
will actually prove some theorems about this later on. But before doing so, let us first
establish a recipe for how to determine the (an , bn ) given a certain f (x). This is done
using the idea of orthogonality of the trigonometric functions, which is pretty cool.
Here is how it works.
We start by integrating both sides of Eq. (1.2) over an interval of length L. The
actual choice of interval does not matter. Common choices are from [0, L] or from
[−L/2, L/2]. We will mostly use the latter. Hence,
∫_{-L/2}^{L/2} f(x) dx = ∫_{-L/2}^{L/2} (a_0/2) dx + Σ_{n=1}^{∞} { a_n ∫_{-L/2}^{L/2} cos(2πnx/L) dx + b_n ∫_{-L/2}^{L/2} sin(2πnx/L) dx }.
Each cosine and sine integrates to zero over a full period. The only term that survives is therefore the one proportional to a_0, which leads us to
∫_{-L/2}^{L/2} f(x) dx = a_0 L/2.
Let me remind you, once again, that the choice of interval is immaterial. We could also
integrate from [0, L], for instance. In fact, I recommend you try repeating the above
calculations for [0, L] to see that it works.
Next we try to do something similar to find the other coefficients. In this case, it
is better if we first establish some integral identities for the trigonometric functions.
Consider first the integral
∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx.
We could solve this using complex exponentials. Or, what is a bit easier, using a
trigonometric identity like
cos(x) sin(y) = (1/2) sin(x + y) − (1/2) sin(x − y),

which you can look up on Wikipedia. We then get

∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx
= (1/2) ∫_{-L/2}^{L/2} sin(2π(n + m)x/L) dx − (1/2) ∫_{-L/2}^{L/2} sin(2π(n − m)x/L) dx
= −(L/(4π(n + m))) cos(2π(n + m)x/L) |_{-L/2}^{L/2} + (L/(4π(n − m))) cos(2π(n − m)x/L) |_{-L/2}^{L/2}
= 0,
since n and m are both integers. In doing these integrals, one should always be careful
with special cases (#protip). For instance, if n = m, we would be dividing by zero in
the last term. The right way to do this is to set n = m before doing the integral. Luckily,
in this case it makes no difference and we get 0 anyway. Thus, to summarize,
∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx = 0,      ∀ n, m ∈ N.
Next, consider the integral of two cosines,

∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx.
We start by assuming n ≠ m. We then get
∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx = (1/2) ∫_{-L/2}^{L/2} cos(2π(n + m)x/L) dx + (1/2) ∫_{-L/2}^{L/2} cos(2π(n − m)x/L) dx
= (L/(4π(n + m))) sin(2π(n + m)x/L) |_{-L/2}^{L/2} + (L/(4π(n − m))) sin(2π(n − m)x/L) |_{-L/2}^{L/2}
= 0.
This is starting to feel boring. We always get zero. But wait! This is only for n ≠ m. If
n = m, we obtain instead
∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πnx/L) dx = ∫_{-L/2}^{L/2} [ (1/2) cos(4πnx/L) + 1/2 ] dx = L/2.
The first integral is zero [Eq. (1.6)], but the second is not. We therefore conclude that
∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx = { 0,   n ≠ m;    L/2,   n = m. }
An identical formula also holds for a sin-sin integral. I will leave that for you as a
(fun!) exercise.
∫_{-L/2}^{L/2} cos(2πnx/L) sin(2πmx/L) dx = 0,   (1.10)

(2/L) ∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx = δ_{n,m},   (1.11)

(2/L) ∫_{-L/2}^{L/2} sin(2πnx/L) sin(2πmx/L) dx = δ_{n,m},   (1.12)
which hold for any n, m ∈ Z. The only exception is Eq. (1.11) when n = m = 0,
in which case we get 2 instead of 1.
These results are called the orthogonality relations of the trigonometric functions over an interval of length L. This term is used because this is actually quite similar to the orthogonality of vectors. The sines and cosines appearing in (1.10)-(1.12) form a basis for the space of periodic functions of period L. And instead of the usual scalar product, the inner product is defined here as the integral over the interval, times 2/L.
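As a sanity check, the orthogonality relations (1.10)-(1.12) can also be verified numerically. Here is a minimal Python sketch (my own illustration, not part of the original notes; it assumes numpy and scipy are available), which integrates the products of trigonometric functions over one period:

import numpy as np
from scipy.integrate import quad

L = 2.0  # any period works; the relations do not depend on the choice

def c(x, n): return np.cos(2 * np.pi * n * x / L)
def s(x, n): return np.sin(2 * np.pi * n * x / L)

for n in range(1, 4):
    for m in range(1, 4):
        cc = quad(lambda x: c(x, n) * c(x, m), -L/2, L/2)[0]
        ss = quad(lambda x: s(x, n) * s(x, m), -L/2, L/2)[0]
        cs = quad(lambda x: c(x, n) * s(x, m), -L/2, L/2)[0]
        # (2/L)*cc and (2/L)*ss should give delta_{n,m}; cs should vanish
        print(n, m, round(2 * cc / L, 6), round(2 * ss / L, 6), round(cs, 6))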
We are now ready to find the remaining coefficients. Multiply both sides of Eq. (1.2) by cos(2πmx/L) and integrate over the interval:

∫_{-L/2}^{L/2} f(x) cos(2πmx/L) dx = (a_0/2) ∫_{-L/2}^{L/2} cos(2πmx/L) dx
+ Σ_{n=1}^{∞} { a_n ∫_{-L/2}^{L/2} cos(2πnx/L) cos(2πmx/L) dx + b_n ∫_{-L/2}^{L/2} sin(2πnx/L) cos(2πmx/L) dx }.
Almost everything vanishes. The only term that survives is the cos-cos integral. Using
Eq. (1.11) we then find
∫_{-L/2}^{L/2} f(x) cos(2πmx/L) dx = Σ_{n=1}^{∞} a_n (L/2) δ_{n,m}.
At this point, it is worth getting used to the algebra of Kronecker deltas. The only
term in the sum over n that will not vanish will be when n = m. The Kronecker delta
therefore acts as a selector; it picks, out of all terms in the sum, the one with n = m:
∫_{-L/2}^{L/2} f(x) cos(2πmx/L) dx = a_m L/2.
We therefore now have a recipe for determining the coefficients an . Of course, now that
there is no sum anymore, it does not matter what we call m or n. We can thus write
a_n = (2/L) ∫_{-L/2}^{L/2} f(x) cos(2πnx/L) dx.
Compare this with Eq. (1.8) for a_0. We see that the formula for a_0 is just this one evaluated at n = 0. This is why I put the 1/2 in front of a_0 in the Fourier series definition (1.2). The expression for b_n is derived in exactly the same way. I will leave it for you as a fun fun fun exercise.
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(2πnx/L) + b_n sin(2πnx/L) },   (1.13)

a_n = (2/L) ∫_{-L/2}^{L/2} f(x) cos(2πnx/L) dx,   (1.14)

b_n = (2/L) ∫_{-L/2}^{L/2} f(x) sin(2πnx/L) dx.   (1.15)
The above results are written for functions of period L. Some books use period 2L,
because this makes the integration interval go from [−L, L]. You can adjust for this by
simply replacing L → 2L everywhere. Other books like to use the period L = 2π,
which is very convenient because it greatly simplifies the formulas:
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(nx) + b_n sin(nx) },   (1.16)

a_n = (1/π) ∫_{-π}^{π} f(x) cos(nx) dx,   (1.17)

b_n = (1/π) ∫_{-π}^{π} f(x) sin(nx) dx.   (1.18)
In most problems below, I will assume L = 2π. Eqs. (1.16)-(1.18) will therefore ac-
tually be used way more often than the general formulas (1.13)-(1.15). But it is useful
to have (1.13)-(1.15) in a big gray box, as this makes them convenient to adjust for
different L.
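When the integrals (1.17)-(1.18) are too awkward to do by hand, they can always be evaluated numerically. The following is a small Python sketch (the function name fourier_coeffs is mine, not part of the text; it assumes scipy is available):

import numpy as np
from scipy.integrate import quad

def fourier_coeffs(f, nmax):
    """Coefficients a_0..a_nmax and b_1..b_nmax of a 2*pi-periodic f, via (1.17)-(1.18)."""
    a = [quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0] / np.pi
         for n in range(nmax + 1)]
    b = [quad(lambda x: f(x) * np.sin(n * x), -np.pi, np.pi)[0] / np.pi
         for n in range(1, nmax + 1)]
    return a, b

# Quick test with f(x) = x**2 on [-pi, pi]: a_0/2 should be pi^2/3 and all b_n should vanish.
a, b = fourier_coeffs(lambda x: x**2, 4)
print(a[0] / 2, a[1:], b)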
It turns out, however, that as far as the Fourier coefficients are concerned, the choice
of interval does not matter at all. We can see this as follows. Suppose f (x) has period
L and Fourier series (1.13). Now define a new function g(x) = f (γx), where γ is a
constant. If f has period L, then g will have period L̃ = L/γ, since

g(x + L/γ) = f(γx + L) = f(γx) = g(x).
Thus, g(x) behaves just like f (x), but in a stretched interval. To find the Fourier series
of g(x), we use the series (1.13) for f (x), with x → γx:
g(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(2πnγx/L) + b_n sin(2πnγx/L) }.   (1.19)
In the trigonometric functions, we see that g’s period, L̃ = L/γ naturally appears. If we
now stare at this expression for a second, we will eventually conclude that it is already
in the form of a Fourier series, with the same Fourier coefficients (an , bn ) of f (x). We
thus reach a very important conclusion: stretching the interval does not affect the
Fourier coefficients; the change is only in the sines and cosines in Eq. (1.13).
Going back to generic intervals of length L, define

k_n = 2πn/L,      n = 0, 1, 2, 3, . . . .   (1.20)
Eq. (1.13) is then written more compactly as
f(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos(k_n x) + b_n sin(k_n x) },   (1.21)
These kn will turn out to have a neat physical interpretation when we talk about the
propagation of waves in electromagnetism and quantum mechanics. We already know
that, for each term in (1.21), the value n dictates how fast the oscillation is (Fig. 1.3).
Thus, n, or kn , is somehow related to the “speed” of that oscillation mode. We call
it momentum. This is not the usual momentum in classical mechanics. You will
only really appreciate why we use this word when we discuss quantum mechanical
applications (soon!). But for now I just wanted to introduce this jargon, so you could
get used to it. So, if you want to sound cool to your friends, from now on refer to the k_n as momenta.
1.4 Examples
Square wave
Consider the piecewise periodic function
f(x) = { 1,   0 < x ≤ π;    −1,   −π < x ≤ 0. }   (1.22)
This is usually called the sign function. But we are thinking about it in a piecewise
periodic fashion, as in Fig. 1.6, so the sign function becomes the square wave.
Figure 1.6: The square wave function (1.22), with period L = 2π.
Let us compute the coefficients a_n using Eq. (1.17):

a_n = (1/π) ∫_{-π}^{π} f(x) cos(nx) dx
= −(1/π) ∫_{-π}^{0} cos(nx) dx + (1/π) ∫_{0}^{π} cos(nx) dx
= −(1/(nπ)) sin(nx) |_{-π}^{0} + (1/(nπ)) sin(nx) |_{0}^{π}
= 0.
The coefficients an in this case are all identically zero. This is something we could
have actually figured out without any calculations. Our function f is odd ( f (−x) =
− f (x)) while the cosines are even. The integrand, f (x) cos(nx), is thus odd and the
integration is symmetric with respect to zero. What this means is that any positive area
that contributes to the integral in the interval [0, π] will have a corresponding negative
area in the interval [−π, 0]. Hence the integral must vanish. In the Fourier business, it
is really worth paying attention to this even/oddness thingy. It can save you tons of time
(#protip). We will talk more about it below.
As for bn , we have from Eq. (1.15):
b_n = (1/π) ∫_{-π}^{π} f(x) sin(nx) dx
= −(1/π) ∫_{-π}^{0} sin(nx) dx + (1/π) ∫_{0}^{π} sin(nx) dx
= (1/(nπ)) cos(nx) |_{-π}^{0} − (1/(nπ)) cos(nx) |_{0}^{π}
= (1/(nπ)) [1 − cos(−nπ)] − (1/(nπ)) [cos(nπ) − 1]
= (2/(nπ)) [1 − cos(nπ)].
Since n is an integer, cos(nπ) will be either 1 or −1, depending on whether n is even or
odd:
cos(nπ) = (−1)^n = { −1,   n odd;    1,   n even. }   (1.23)
Thus, we see that the coefficients bn will only be non-zero when n is odd:
b_n = { 4/(nπ),   n odd;    0,   n even. }   (1.24)
The Fourier series for the sign function (1.22) therefore reads
f(x) = Σ_{n=1,3,5,...} (4/(nπ)) sin(nx)   (1.25)
= Σ_{k=1}^{∞} (4/((2k − 1)π)) sin((2k − 1)x)
= (4/π) { sin(x) + (1/3) sin(3x) + (1/5) sin(5x) + . . . }.
In going from the 1st to the 2nd line, I defined a new variable n = 2k−1. The restriction
n = 1, 3, 5, . . . then implies k = 1, 2, 3, 4, . . .. This is therefore just an alternative way
of writing a sum over odd terms.
In Fig. 1.7 I compare the series with the actual function, by truncating the sum at
different values nmax . That is,
f(x) = Σ_{n=1,3,5,...}^{nmax} (4/(nπ)) sin(nx).
As one would hope, the larger the value of nmax , the better the series approximates f (x).
In the limit nmax → ∞ one recovers f (x) exactly.
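If you want to reproduce plots like Fig. 1.7 yourself, the truncated series is easy to evaluate on a grid. A minimal Python sketch (my own, assuming numpy and matplotlib; np.sign(np.sin(x)) is just a convenient way of generating the periodic square wave for comparison):

import numpy as np
import matplotlib.pyplot as plt

def square_series(x, nmax):
    # partial sum of Eq. (1.25): sum over odd n of (4/(n*pi)) sin(nx)
    return sum(4 / (n * np.pi) * np.sin(n * x) for n in range(1, nmax + 1, 2))

x = np.linspace(-2 * np.pi, 2 * np.pi, 2000)
for nmax in (1, 5, 99):
    plt.plot(x, square_series(x, nmax), label=f"nmax = {nmax}")
plt.plot(x, np.sign(np.sin(x)), "k--", label="square wave")
plt.legend()
plt.show()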
Figure 1.7: Fourier series for the square wave function, Eq. (1.22) (shown in black). In each plot
the red curves corresponds to the Fourier series, Eq. (1.25), truncated at different
levels nmax .
Figure 1.8: The function f(x) = x², piecewise periodic in [−π, π] (Eq. (1.26)).
The function x²

Next consider the function

f(x) = x²,   −π < x ≤ π,   (1.26)

again defined to be piecewise periodic. Let us start with a_n, using Eq. (1.17):
a_n = (1/π) ∫_{-π}^{π} x² cos(nx) dx.
Before we do any calculations, stop and think about even vs. odd. The integrand in
this case is even (if you take x → −x it does not change). So these coefficients will be
non-zero. The same cannot be said about bn , though:
b_n = (1/π) ∫_{-π}^{π} x² sin(nx) dx = 0.
Figure 1.9: Fourier series for f (x) = x2 (black), for different truncation sizes nmax .
We have therefore cut the problem in half. Back to the an , we can compute the integrals
using integration by parts twice (with u = x2 and dv = cos(nx)dx). This yields
a_n = (1/π) { (x²/n) sin(nx) |_{-π}^{π} − (1/n) ∫_{-π}^{π} 2x sin(nx) dx }.
The first term vanishes. We now integrate the remaining term by parts, again:
a_n = −(2/(nπ)) { −(x/n) cos(nx) |_{-π}^{π} + ∫_{-π}^{π} (cos(nx)/n) dx }.
Now it is the last term that vanishes, because this is the L = 2π version of Eq. (1.6). Or
you can also see it from (1.11) by setting m = 0. Thus, all that survives for an is
a_n = (4/n²) cos(nπ) = (4/n²) (−1)^n.   (1.27)
Notice that this result is bugged for n = 0. This is again one of those problems where
particular cases must be handled before carrying out the integration. Setting n = 0 in
Eq. (1.17) yields
a_0 = (1/π) ∫_{-π}^{π} x² dx = 2π²/3.
Putting everything together, the Fourier series of x² therefore reads

f(x) = 2π²/6 + Σ_{n=1}^{∞} (4(−1)^n/n²) cos(nx).

The first term is a_0/2, which is why there is a 2π²/6 instead of 2π²/3. The results are
shown in Fig. 1.9. As can be seen, in this case the convergence is insanely fast. Much
faster than the sign function in Fig. 1.7, actually.
Figure 1.10: Even (left) and odd (right) functions. For even functions, the area for x < 0 equals
that for x > 0. Conversely, for odd functions, they exactly cancel each other.
Considerable simplifications occur when the function f(x) is either even or odd. Even functions satisfy f(−x) = f(x) and odd functions satisfy f(−x) = −f(x) (Fig. 1.10). Let us formalize the ideas used in the previous section a bit better.
First notice the following:
• even × even → even.
• odd × odd → even.
• odd × even → odd.
If you ever get confused about this, think about the functions x (odd) and x2 (even).
Thus, x · x = x2 is the product of two odd functions, which is even. Similarly, x · x2 = x3
is the product of an odd with an even function, which is odd. It is also useful to
remember that cosines are even and sines are odd.
When dealing with this even/odd issue, it is convenient to write the integration
intervals in (1.14), (1.15) to be from −L/2 to L/2. The reason is that:
f(x) even:   ∫_{-L/2}^{L/2} f(x) dx = 2 ∫_{0}^{L/2} f(x) dx,   (1.29)

f(x) odd:    ∫_{-L/2}^{L/2} f(x) dx = 0.   (1.30)
For even functions, the area below the curve for x < 0 and x > 0 are equal. For odd
functions, they have opposite signs and thus cancel each other (Fig. 1.10).
What enters Eqs. (1.14), (1.15), however, are the products of f (x) with cosines and
sines, which are respectively even and odd. Thus, if f (x) is even, f (x) sin(kn x) will be
odd and bn will all vanish. And if f (x) is odd, f (x) cos(kn x) will be odd, so an will
vanish. Summarizing our results:

• If f(x) is even, all b_n = 0 and the series contains only cosines (a cosine series).

• If f(x) is odd, all a_n = 0 and the series contains only sines (a sine series).
Figure 1.11: A zoom at one of the discontinuities of the square function (Fig. 1.7), to illustrate
the Gibbs phenomenon.
This is actually very easy to remember. The type of the series should reflect
the parity of f (x). So if f is even, it must have a cosine series because cosines
are even. And similarly if f is odd.
Gibbs phenomenon
Compare Figs. 1.7 and 1.9. The latter is continuous and we see that the convergence
of the series is extremely rapid. The square wave in Fig. 1.7, on the other hand, has
discontinuities and, as a consequence, there is a lot more wiggling around. In fact, we
see that even for nmax = 99, in Fig. 1.7, there is still some wiggling present at the point
where the jumps occur. Far away from the jumps, all is good. But at the jumps it is not.
This is called the Gibbs phenomenon, or Gibbs overshoot. As it turns out, it does
not vanish as nmax → ∞. This is illustrated for the square wave in Fig. 1.11, where
I plot essentially a zoom of Fig. 1.7 around one of the discontinuities, but for much
higher values of nmax . As you can see, the oscillations tend to squeeze around the
discontinuities, making the series increasingly better outside the jumps. Exactly at the
jump, however, the general height of the overshoot does not go down. In fact, it can be
shown that even as nmax → ∞, the overshoot remains at about 9% of the value of the
discontinuity. However, as nmax → ∞, it gets infinitely squished and thus occurs at just a single point.
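The 9% figure is easy to check numerically: evaluate the truncated series (1.25) on a fine grid just to the right of the jump at x = 0 and record its maximum. A rough Python sketch (my own; recall that the jump of the square wave has total size 2, from −1 to +1):

import numpy as np

def square_series(x, nmax):
    return sum(4 / (n * np.pi) * np.sin(n * x) for n in range(1, nmax + 1, 2))

jump = 2.0
x = np.linspace(1e-6, 0.5, 20000)           # fine grid just after the jump at x = 0
for nmax in (19, 99, 499, 1999):
    overshoot = square_series(x, nmax).max() - 1.0
    print(nmax, overshoot / jump)            # tends to about 0.089, i.e. ~9% of the jump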
Figure 1.12: The truncated Fourier series for three increasing values of nmax.
Structurally, this looks very similar to the square wave in Eq. (1.25). The crucial dif-
ference is that here the sum is over all values of n, while in (1.25) it is restricted to
n = 1, 3, 5, . . .. The results are shown in Fig. 1.12. The Gibbs phenomenon can again
be clearly observed: the Fourier series tends to converge outside the jumps and wig-
gle around at the jump locations. The Gibbs phenomenon is, in fact, universal, being
directly associated with jump discontinuities of the functions.
Here are some additional #protips for dealing with Fourier series. Suppose
the series of a certain function f (x) is given by Eq. (1.13), and let g(x) = α f (x)+
β. Then
g(x) = β + αa_0/2 + Σ_{n=1}^{∞} { αa_n cos(2πnx/L) + αb_n sin(2πnx/L) }.   (1.32)
The coefficients of g will thus be ã0 = αa0 + 2β, ãn = αan and b̃n = αbn .
This trick can save you some time, since it allows you to focus on the “core
part” of functions. For instance, suppose you are asked to compute the Fourier
coefficients of π²x + 42. Forget about π². Forget about 42. Focus only on
the Fourier series of the function x. Similarly, suppose you are interested in the
Fourier series of the square wave, but defined to range from 0 to 1. The function
f (x) in Eq. (1.22) ranges from [−1, 1], so we must study g(x) = ( f (x) + 1)/2.
The corresponding Fourier series is then readily computed from Eq. (1.25).
We can also do scales and shifts in the argument x. Suppose g(x) =
f (λx + ω). The Fourier coefficients in this case don’t change. All you need to
do is change the argument in the Fourier series:
g(x) = a_0/2 + Σ_{n=1}^{∞} { a_n cos((2πn/L)(λx + ω)) + b_n sin((2πn/L)(λx + ω)) }.   (1.33)
You can then find the actual Fourier coefficients of g(x), by expanding the sines
and cosines using trigonometric identities. This will give ãn and b̃n as linear
combinations of an and bn (problem set 1).
1.5 Complex form

A more compact way of writing Fourier series uses complex exponentials. The starting point is Euler's formula,

e^{iθ} = cos θ + i sin θ,   (1.34)

or, equivalently,

cos θ = (e^{iθ} + e^{−iθ})/2,      sin θ = (e^{iθ} − e^{−iθ})/(2i).   (1.35)
The basic idea, therefore, is to construct a Fourier series based instead on ei2πnx/L ,
which will be periodic, with period L. As we will see, this yields a very convenient and
elegant formulation. It also allows us to naturally include complex functions f (x) (so
far, we have been assuming that f (x) was real). To this end, notice first that e−i2πnx/L is
not the same as ei2πnx/L , so we will need to consider both n > 0 and n < 0 (that is, we
need n ∈ Z, the set of all integers). The basic idea, therefore, is to expand f (x) as
f(x) = Σ_{n∈Z} c_n e^{i2πnx/L},   (1.36)
where cn are a new set of coefficients, which are usually complex. The notation n ∈ Z
means n = 0, ±1, ±2, ±3, . . ..
To find the cn , we need to establish orthogonality relations for the complex expo-
nentials, very much like Eqs. (1.10)-(1.12). In this case it turns out it is even easier to
prove them. Since eiπn = e−iπn for any integer n, it follows that
∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = (L/(i2π(n − m))) e^{i2π(n−m)x/L} |_{-L/2}^{L/2} = (L/(i2π(n − m))) (e^{iπ(n−m)} − e^{−iπ(n−m)}) = 0,
But, as before, this only holds for n ≠ m. Otherwise, one simply has

∫_{-L/2}^{L/2} dx = L.

Combining the two cases, we can summarize the result as

(1/L) ∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = δ_{n,m}.   (1.37)
This result is pretty cool. If you open up these complex exponentials, you may check
that it contains the same amount of information as Eqs. (1.10)-(1.12), but in a much
more compact and elegant way.
With this, we can now find the coefficients cn in Eq. (1.36). We multiply both sides
by e^{−i2πmx/L} (with a minus sign!) and integrate over one period. We then get, using (1.37),
∫_{-L/2}^{L/2} f(x) e^{−i2πmx/L} dx = Σ_{n∈Z} c_n ∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = Σ_{n∈Z} c_n L δ_{n,m} = L c_m.

Whence,

c_n = (1/L) ∫_{-L/2}^{L/2} f(x) e^{−i2πnx/L} dx.
The coefficients c_n are found using the orthogonality of the complex exponentials,

(1/L) ∫_{-L/2}^{L/2} e^{i2π(n−m)x/L} dx = δ_{n,m},   (1.39)

which leads to

c_n = (1/L) ∫_{-L/2}^{L/2} f(x) e^{−i2πnx/L} dx.   (1.40)

Notice that the exponential in c_n has the opposite sign as the one in f(x).
As with real series, it is common to choose L = 2π, leading to
f(x) = Σ_{n∈Z} c_n e^{inx},   (1.41)

c_n = (1/2π) ∫_{-π}^{π} f(x) e^{−inx} dx.   (1.42)
Next, let us try to connect the cn with our previous coefficients an , bn in Eq. (1.13).
This is very easy. We simply use Eq. (1.34) in Eq. (1.42), which leads to
c_n = (1/L) ∫_{-L/2}^{L/2} f(x) [cos(2πnx/L) − i sin(2πnx/L)] dx.

Comparing with Eqs. (1.14) and (1.15), this is simply

c_n = (1/2)(a_n − i b_n).   (1.43)
This result also holds for n = 0 and n < 0. For n = 0 we get c0 = a0 /2, since b0 = 0.
For n < 0 the situation is a bit weird because we normally don’t define an and bn . But
from (1.14) it is clear that a−n = an and from (1.15), b−n = −bn . Thus, when n < 0,
c_n = (1/2)(a_{−n} + i b_{−n}).
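As a consistency check of Eq. (1.43), one can compute c_n, a_n and b_n numerically for some function and compare. A minimal Python sketch of mine, using the square wave (1.22) and assuming scipy:

import numpy as np
from scipy.integrate import quad

f = lambda x: np.sign(x)   # square wave (1.22) on [-pi, pi]
n = 3

re = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0]
im = quad(lambda x: -f(x) * np.sin(n * x), -np.pi, np.pi)[0]
cn = (re + 1j * im) / (2 * np.pi)            # Eq. (1.42): c_n = (1/2pi) int f e^{-inx} dx

an = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi)[0] / np.pi
bn = quad(lambda x: f(x) * np.sin(n * x), -np.pi, np.pi)[0] / np.pi
print(cn, (an - 1j * bn) / 2)                # the two numbers agree, as in Eq. (1.43)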
The real and complex versions of Fourier’s series are entirely equivalent ways of
representing a periodic function. The set of functions
1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), . . .
forms a basis for the space of functions with period 2π. This is the basis we use to
expand f (x) in the real series. In the complex case, we use instead
1, eix , e−ix , e2ix , e−2ix , e3ix , e−3ix , . . . ,
which corresponds to a different basis. But the two are equivalent because each pair
(cos nx, sin nx) is a linear combination of (einx , e−inx ).
Eqs. (1.41) and (1.42) hold for complex functions f (x). That is, f : R → C. What
happens if f (x) is real? Taking the complex conjugate of Eq. (1.42) we find that, since
f (x)∗ = f (x),
c*_n = (1/L) ∫_{-L/2}^{L/2} f(x) e^{i2πnx/L} dx.
But looking at this for a second, we realize that it is nothing but c_{−n}. Thus, we conclude that if f: R → R, one must have

c_{−n} = c*_n.
This is a special symmetry of the cn . For general complex functions, cn and c−n are
generally unrelated complex numbers. But if f (x) is real, the negative c’s are related to
the conjugate of the positive c’s.
An even stronger symmetry can be uncovered when f (x) is real and has definite
parity. If f (x) is even, we must have a cosine series (an , 0 and bn = 0), so that
Eq. (1.43) shows cn must be real. Conversely, if f (x) is odd, cn will be purely imaginary.
Summarizing:
• f (x) even: cn = an /2 is real.
• f (x) odd: cn = −ibn /2 is purely imaginary.
Example: Triangle wave
Consider the function f (x) = |x|, but defined to be piecewise periodic in [−π, π].
The resulting function is called the Triangle wave. This function is even, so that its
real series should have only cosines (b_n = 0). As a consequence, from Eq. (1.43) we
should have cn = an /2; i.e., real. One could, of course, simply compute the coefficients
an from (1.14), and then reconstruct cn = an /2. But, for practice, let us compute it
directly from Eq. (1.42):
c_n = (1/2π) ∫_{-π}^{π} |x| e^{−inx} dx
= (1/2π) ( ∫_{0}^{π} x e^{−inx} dx + ∫_{-π}^{0} (−x) e^{−inx} dx ).
Carrying out the two integrals by parts and combining them yields

c_0 = π/2,      c_n = ((−1)^n − 1)/(πn²) = { −2/(πn²),  n odd;   0,  n even, n ≠ 0 }.   (1.45)

The complex Fourier series for the triangle wave therefore reads
f(x) = π/2 − (2/π) ( e^{ix} + e^{−ix} + (e^{3ix} + e^{−3ix})/9 + . . . ).   (1.46)
We could rewrite everything in terms of cosines. But I wanted to leave it like this
to emphasize that, even though we are summing complex exponentials, the result is
nonetheless real since each term is accompanied by its complex conjugate.
1.6 Parseval's identity

Let f(x) = Σ_{n∈Z} c_n e^{inx} and g(x) = Σ_{n∈Z} d_n e^{inx} be two functions of period 2π. Using the orthogonality relation (1.39), one readily finds

(1/2π) ∫_{-π}^{π} f(x)* g(x) dx = Σ_{n∈Z} c*_n d_n.   (1.47)

This property is pretty cool. It is exactly like the inner product you learn for vectors. When we expand a function in a Fourier series, like f(x) = Σ_n c_n e^{inx}, it is exactly like expanding a vector in a basis; that is to say, the e^{inx} form a basis for the space
of periodic functions. Eq. (1.47) is thus nothing but the inner product of the two
functions, which we write as the sum of the products of the components in the basis.
For the space of functions, the inner product is obtained as an integral (together with
the factor of 1/2π). In particular, if g = f Eq. (1.47) yields
(1/2π) ∫_{-π}^{π} |f(x)|² dx = Σ_{n∈Z} |c_n|²,   (1.48)
which is known as Parseval’s identity. This is just like expressing the absolute value
of a vector in terms of the squares of the coefficients in a basis. It is pretty cool that
this also holds for this (infinite dimensional) space of functions.
Parseval’s identity will be useful when we discuss applications of Fourier series.
The integral on the left-hand side of (1.48) is usually associated with some type of
energy input. The identity then decomposes this energy into the contributions from
each Fourier mode.
To give an example, consider the Fourier coefficients of f (x) = |x| in Eq. (1.45).
The left-hand side of (1.48) reads
(1/2π) ∫_{-π}^{π} |x|² dx = π²/3.
Since the summand is even in n, we can write the sum only over positive values and
double the result:
Σ_{n∈Z} |c_n|² = π²/4 + (8/π²) Σ_{n=1,3,5,...} 1/n⁴.
Using Eq. (1.48) and isolating the remaining sum over n, we then find
Σ_{n=1,3,5,...} 1/n⁴ = π⁴/96,   (1.49)
which is a funny looking sum over n, with an even funnier looking result.
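Results like (1.49) are also easy to check numerically; a two-line Python sketch (mine):

import numpy as np

odd = np.arange(1, 200001, 2)
print(np.sum(1.0 / odd**4), np.pi**4 / 96)   # both give ~1.014678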
We can also write Parseval’s identity in terms of an and bn , using (1.43). We split
the sum in 3 parts: n > 0, n = 0 and n < 0. For n > 0 we simply replace cn = 12 (an −ibn ).
For n = 0 we use c_0 = a_0/2. And for n < 0 we use c_n = (1/2)(a_{−n} + i b_{−n}). Thus
(1/2π) ∫_{-π}^{π} |f(x)|² dx = |a_0|²/4 + Σ_{n=1}^{∞} (1/4)(|a_n|² + |b_n|²) + Σ_{n=−∞}^{−1} (1/4)(|a_{−n}|² + |b_{−n}|²)

= |a_0|²/4 + (1/2) Σ_{n=1}^{∞} (|a_n|² + |b_n|²),
which is the sine-cosine analog of (1.48). All results above were derived for period 2π.
But they hold for arbitrary L. All we need to do is replace 2π → L.
Parseval’s identity
(1/L) ∫_{-L/2}^{L/2} |f(x)|² dx = Σ_{n∈Z} |c_n|² = |a_0|²/4 + (1/2) Σ_{n=1}^{∞} (|a_n|² + |b_n|²).   (1.51)
As another application, consider the odd function f(x) = x(π² − x²), taken to be piecewise periodic in [−π, π]; integrating by parts one finds b_n = 12(−1)^{n+1}/n³ (and a_n = 0). A boring calculation reveals that the integral on the left reads 8π⁶/105. Thus, the sum over n on the right reads

Σ_{n=1}^{∞} 1/n⁶ = π⁶/945.   (1.52)
This is a special case of the Riemann Zeta function
ζ(s) = Σ_{n=1}^{∞} 1/n^s.   (1.53)
This function usually cannot be represented in terms of ordinary constants. But for
some special values, like 2, 4, 6, . . ., it can.
1.7 Dirac delta and Heaviside functions

Consider the boxcar (or window) function

f(x) = { 1/a,   −a/2 ≤ x ≤ a/2;    0,   otherwise, }   (1.54)

where a > 0 is a parameter. I chose it in this way so that it has unit area:
∫_{-∞}^{∞} f(x) dx = 1.   (1.55)
The integral doesn’t have to be from −∞ to ∞. It can be over any interval which
encompasses [−a/2, a/2]. Written in this way, we therefore see that, as a gets smaller,
the function becomes thinner, but also taller. And it does so in a very precise way, such
that the area below the curve is always 1.
The reason why the boxcar can be viewed as a window is the following. Let g(x)
be another arbitrary function and consider the integral
∫_{-∞}^{∞} f(x) g(x) dx = (1/a) ∫_{-a/2}^{a/2} g(x) dx.   (1.56)
It picks out only the parts of g(x) that are within [−a/2, a/2]. In fact, the result can be
interpreted as the average of g(x) over this interval. It is a window because it selects
only one part of the function for us to see.
We can now think about what happens in the limit a → 0. This limit is somewhat
funny. The function becomes infinitely thin, but also infinitely tall. And the area below
the curve continues to be 1. We call this the Dirac delta function:
δ(x) = lim_{a→0} f(x).   (1.57)
The δ function takes the idea of a window to the limit: it throws away everything about g(x), and only picks the value of g at x = 0,

∫_{-∞}^{∞} δ(x) g(x) dx = g(0).   (1.58)

One can also place the window at a
different point y. I will leave it for you to convince yourself that this is accomplished
by δ(x − y). That is,
∫_{-∞}^{∞} δ(x − y) g(x) dx = g(y).   (1.59)
Now let’s compute the Fourier series of the Dirac delta function. We assume that
the function is piecewise periodic, with period 2π. In this case, we don’t get one δ
function, but a bunch of δ’s. This is called a Dirac comb (Fig. 1.14). We focus on
the complex series (1.41). The coefficients cn are given by Eq. (1.42). But because
of (1.59), it then follows that
c_n = (1/2π) ∫_{-π}^{π} δ(x) e^{−inx} dx = 1/2π.
Figure 1.14: The Dirac comb, obtained by making the Dirac delta function piecewise periodic,
with period 2π.
We therefore reach a quite curious result: the Fourier series for the Dirac comb is
independent of n. One simply has
δ(x) = (1/2π) Σ_{n∈Z} e^{inx},   (1.60)
for period 2π. This result will turn out to be very very useful later on. You may
naturally object that this series does not converge (we will talk more about convergence
of Fourier series in Sec. 1.8). This is one of those pathologies of distributions. But this
can be regularized by computing the Fourier series of the boxcar instead. You will do
this in problem set 1. What you will find is that the series always converges, for any
finite a. Since the δ-function is just a boxcar, with a tiny a, we can convince ourselves
that the series converges.
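To see this regularization at work, truncate the sum in (1.60) symmetrically at |n| ≤ N. The partial sums develop peaks at multiples of 2π that grow like (2N + 1)/2π, while oscillating around small values elsewhere. A quick Python sketch (mine, assuming numpy):

import numpy as np

def comb_partial(x, N):
    # (1/2pi) * sum_{n=-N}^{N} e^{inx}
    n = np.arange(-N, N + 1)
    return np.exp(1j * np.outer(x, n)).sum(axis=1).real / (2 * np.pi)

x = np.array([0.0, 0.5, np.pi])
for N in (10, 100, 1000):
    print(N, comb_partial(x, N))   # the value at x = 0 grows like (2N+1)/(2pi)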
Another very important function for this course will be the Heaviside θ function
defined as

θ(x) = { 1,  x > 0;    1/2,  x = 0;    0,  x < 0. }   (1.61)
The value at x = 0 seldom matters. The Heaviside function is therefore a step, being
zero for x < 0, and 1 for x > 0. We can also shift the step. For instance θ(x − b) will
jump from 0 to 1 when x = b. The Heaviside function can also be used to construct
the boxcar (1.54) in a more compact way. We need two steps, one up, one down. A
general boxcar between the interval [a, b] will therefore read
Boxcar_{a,b}(x) = θ(x − a) θ(b − x),   (1.62)
so that the boxcar used in Eq. (1.54) would be

f(x) = (1/a) θ(x + a/2) θ(a/2 − x).
Funny enough, because of the special structure of θ(x), Eq. (1.62) is not the only way
to write the boxcar. A completely equivalent expression would be
Boxcar_{a,b}(x) = θ(x − a) − θ(x − b).
The first θ ensures that the function is only non-zero for x > a. But then we want to cut
it out eventually at b, so we subtract another θ.
What is the derivative of the Heaviside function? Well, for x < 0 and x > 0 the
function is flat, so θ′(x) = 0. The tricky part is at x = 0. At this point the function jumps
instantaneously from 0 to 1. The slope of the curve should therefore be infinitely large.
It is easier to think that θ(x) changes quickly, but continuously, at x = 0, as represented
pictorially in Fig. 1.15(a). The typical slope will then look like that in Fig. 1.15(b). This
seems to imply that θ′(x) will be proportional to the Dirac delta function, θ′(x) = αδ(x),
for some constant α. To determine α we recall that δ(x) has unit area, so

1 = ∫_{-ε}^{ε} δ(x) dx = (1/α) ∫_{-ε}^{ε} θ′(x) dx = (1/α) [θ(ε) − θ(−ε)] = (1/α)(1 − 0) = 1/α.

Thus α = 1, and we conclude that

θ′(x) = δ(x).   (1.63)
In what follows we will often deal with functions having jumps, so let us fix some terminology. A function f is continuous at a point x if the lateral limits f(x + 0) := lim_{ε→0⁺} f(x + ε) and f(x − 0) := lim_{ε→0⁻} f(x + ε) are the same. Here ε → 0⁻ means tending to zero from the left (ε < 0) and 0⁺ means tending to zero from the right (ε > 0). Conversely, a function is piecewise continuous if it is continuous in most of the interval [a, b], except at a finite number of points where the lateral limits f(x + 0) and f(x − 0) remain finite. So, in a nutshell, piecewise continuous means the function never diverges and the number of discontinuities is not infinite.
Let us suppose that f (x) is continuous, except at one point x0 , where it jumps from
f (x − 0) to f (x + 0). We can represent this using the Heaviside function. All we need
is to do is multiply f (x) by a convenient “1”. In this case, 1 = θ(x − x0 ) + θ(x0 − x).
We then get
f(x) = f(x) θ(x₀ − x) + f(x) θ(x − x₀),   (1.64)
For x < x0 and x > x0 the function f (x) is now continuous and silky smooth. Differen-
tiating with respect to x and using (1.63), we then get

f′(x) = f′(x) θ(x₀ − x) + f′(x) θ(x − x₀) − f(x) δ(x − x₀) + f(x) δ(x − x₀),

where I also used the fact that δ(x₀ − x) = δ(x − x₀). The last two terms are only non-
zero when x → x0 . But in each term, it will tend to zero from a different side. Thus, in
the third term we can replace f (x) with f (x0 − 0) and in the fourth we can replace f (x)
by f(x₀ + 0). We then arrive at

f′(x) = f′(x) θ(x₀ − x) + f′(x) θ(x − x₀) + Δf(x₀) δ(x − x₀),   (1.65)

where Δf(x₀) = f(x₀ + 0) − f(x₀ − 0) is the size of the jump.
Figure 1.15: Left: the Heaviside function, imagined as a sharp jump at x = 0. Right: θ0 (x),
which is zero almost everywhere, except at x = 0, where it is very large.
The first two terms are the usual derivatives of f (x), outside the jump point, while the
last represents the infinite contribution from the jump.
The generalization of this to an arbitrary number of jumps is straightforward. We
assume the jumps occur at J points {x j }. Then
f′(x) = Σ_{j=1}^{J+1} f′(x) θ(x − x_{j−1}) θ(x_j − x) + Σ_{j=1}^{J} Δf(x_j) δ(x − x_j).   (1.66)
In addition to the J points x j , I also defined x0 , which is the beginning of the interval
we are interested in studying, and x J+1 , which is the end. They could be, for instance,
±π if the function is periodic or ±∞ if it is not. We will use this formula in the next
section to prove the convergence of Fourier series.
1.8 Convergence of Fourier series

There is one test of uniform convergence, which is the bread and butter of the business, and is worth knowing. It is called the Weierstrass M-test. Let Σ_{n=1}^{∞} u_n(x) denote a series of functions, defined in some interval I ⊂ R. Suppose that there exist constants M_n, such that

|u_n(x)| < M_n,   ∀x ∈ I,   (1.67)

and such that Σ_{n=1}^{∞} M_n converges. Then Σ_{n=1}^{∞} u_n(x) converges uniformly. The idea, therefore, is to majorize the series of functions by a numerical series (no x involved), which itself converges.
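For instance, the series Σ_{n=1}^{∞} cos(nx)/n² converges uniformly on the whole real line: |cos(nx)/n²| ≤ 1/n² =: M_n for every x, and Σ 1/n² converges.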
The nice thing about uniformly convergent series is that they comply with our intuition. In particular, the operations we would naively want to do are all perfectly allowed. Let f(x) = Σ_{n=1}^{∞} u_n(x) denote a uniformly convergent series. Then

• If the u_n(x) are all continuous, so is f(x).

• If the u_n(x) are integrable, so is f(x). Moreover, we can permute sums and integrals:

∫ ( Σ_{n=1}^{∞} u_n(x) ) dx = Σ_{n=1}^{∞} ∫ u_n(x) dx.

• If the u_n(x) are differentiable, then so is f(x). Moreover, we can differentiate term by term:

(d/dx) ( Σ_{n=1}^{∞} u_n(x) ) = Σ_{n=1}^{∞} du_n/dx.
You see? Uniformly convergent series of functions are awesome! This is the kind of series you want to work with. Of course, you will not be surprised to hear that Fourier series of sufficiently well-behaved (continuous, piecewise smooth) periodic functions converge uniformly. This is Fourier's theorem, which we are now going to prove.
We are going to go through the proof of Fourier’s theorem for the case of contin-
uous functions. The case of general, piecewise continuous functions, can be found in
any of the textbooks in the bibliography (e.g. Djairo, chapter 3). Although not the most
general, what I like about the proof below is that it will also give us insight into how
fast the Fourier series converges. The main ingredients we will need are the derivatives
of f (x). We begin by noticing that, even if f (x) is continuous, f 0 (x) may be piecewise
continuous. An example is shown in Fig. 1.16. A function which is continuous and
whose derivatives up to order k are also continuous is said to belong to the class C^k. The function in Fig. 1.16 would thus be of class C^0, since none of its derivatives are continuous.
When the function is of class C^k, f^(k+1) will no longer be continuous, but piecewise continuous. Consequently, f^(k+2) will involve δ-functions, like Eq. (1.66). And f^(k+3)
Figure 1.16: A continuous function f(x) whose derivative df/dx is only piecewise continuous.
will involve derivatives of δ functions, at which point things start to become messy.
We begin by considering the Fourier coefficients cn [Eq. (1.42)] of a function of
class C k . Integrating by parts, using u = f (x) and dv = e−inx dx, we get
c_n = (1/2π) { f(x) e^{−inx}/(−in) |_{-π}^{π} + (1/(in)) ∫_{-π}^{π} f′(x) e^{−inx} dx }.
The first term vanishes because e−inπ = einπ and f (−π) = f (π) (since f (x) is periodic).
We are thus left with
c_n = (1/(2πin)) ∫_{-π}^{π} f′(x) e^{−inx} dx.
Notice what happened. Because the cross term vanishes, integration by parts simply amounts to replacing f(x) → f′(x) and dividing by in. The function f′(x) is still
periodic, so we can now repeat the procedure. In fact, we can repeat it k + 1 times,
leading to
c_n = (1/(2π(in)^{k+1})) ∫_{-π}^{π} f^(k+1)(x) e^{−inx} dx.   (1.68)
We can do this k+1 times because f (k+1) is piecewise continuous, but finite; the integral
is thus still bounded.
We keep going, however, and integrate by parts once again. Since f (k+1) is piece-
wise continuous, it will have jumps at a certain set of points {x j }. Using (1.66) we then
get
c_n = (1/(2π(in)^{k+2})) Σ_{j=1}^{J+1} ∫_{x_{j−1}}^{x_j} f^(k+2)(x) e^{−inx} dx + (1/(2π(in)^{k+2})) Σ_{j=1}^{J} Δf^(k+1)(x_j) e^{−inx_j}.   (1.69)
Since we broke up f (k+2) into the intervals [x j−1 , x j ], the integrals above are all bounded.
And the same is true of the second term.
Thus, we see that
|c_n| ≤ M/n^{k+2},   (1.70)
where M is a finite number. And, what is most important, M is actually independent of
n; the only n-dependence comes from the factor of 1/nk+2 . You may protest and argue
that there is still an n in the e−inx term of the integral. But that can be eliminated further
using the Cauchy-Schwarz inequality
| ∫_{x_{j−1}}^{x_j} f^(k+2)(x) e^{−inx} dx |² ≤ ∫_{x_{j−1}}^{x_j} |f^(k+2)(x)|² dx ∫_{x_{j−1}}^{x_j} |e^{−inx}|² dx = (x_j − x_{j−1}) ∫_{x_{j−1}}^{x_j} |f^(k+2)(x)|² dx.
Thus, we get in the end a bound of the form (1.70), with M being a constant that
depends only on f (x).
To finish off with style, we now use the Weierstrass M-test. We consider the Fourier series

f(x) = Σ_{n∈Z} c_n e^{inx},

and notice that the summands c_n e^{inx} are bounded by

|c_n e^{inx}| ≤ M/n^{k+2} := M_n.
According to the test, all that is left is to ask whether the sum Σ_n M_n converges. Well, it is known that Σ_{n=1}^{∞} n^{−α} converges for all α > 1. Thus, Σ_n M_n will converge provided k + 2 > 1, which is certainly the case for any k ≥ 0. That is, the series is guaranteed to converge provided the function is at least of class C^0; i.e., continuous. Et voilà!
The convergence of the series Σ_{n=1}^{∞} n^{−α} is faster for large α. If α = 100, for instance, the series should converge super fast. The above arguments therefore show that the speed of convergence of the Fourier series is associated with how smooth the function is. A function that is of class C^k, with very large k, will be very smooth and its series will thus converge faster than that of a function of class C^0, which has kinks and stuff.
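This connection between smoothness and the decay of the Fourier coefficients can be seen numerically with the examples of this chapter: for the square wave (which has jumps) |c_n| decays like 1/n, while for the triangle wave (continuous, with kinks only) it decays like 1/n². A small Python sketch of mine (assuming numpy and scipy):

import numpy as np
from scipy.integrate import quad

def cn(f, n):
    re = quad(lambda x: f(x) * np.cos(n * x), -np.pi, np.pi, limit=200)[0]
    im = quad(lambda x: -f(x) * np.sin(n * x), -np.pi, np.pi, limit=200)[0]
    return (re + 1j * im) / (2 * np.pi)

square   = lambda x: np.sign(x)   # jump discontinuities
triangle = lambda x: np.abs(x)    # continuous, kinks only
for n in (1, 3, 9, 27):
    print(n, abs(cn(square, n)) * n, abs(cn(triangle, n)) * n**2)
    # n*|c_n| stays roughly constant for the square wave; n^2*|c_n| for the triangle wave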
1.9 Integrals and derivatives of Fourier series

Suppose f(x) has the Fourier series (1.41) and differentiate it term by term:

f′(x) = Σ_{n∈Z} (in) c_n e^{inx}.

The Fourier coefficients of f′(x) are thus in c_n. This can be used as a trick to obtain one set of Fourier coefficients from another.
Similarly, we can also integrate term by term. The integral of a periodic function,
however, is not necessarily periodic. Thus, consider instead the function
F(x) = ∫_{0}^{x} [ f(x′) − c_0 ] dx′.   (1.72)
By subtracting c0 = a0 /2, which is the average of the function over the interval, we
make F(x) periodic. I leave it for you to check that this is indeed true.
What are the Fourier coefficients Cn of F(x)? The Fourier series of f (x) − c0 is just
like that of f (x), except that the sum does not contain n = 0. Using this in (1.72) then
leads to
F(x) = Σ_{n≠0} c_n ∫_{0}^{x} e^{inx′} dx′ = Σ_{n≠0} c_n (e^{inx} − 1)/(in).
We assume f (x) is real, so that c−n = c∗n . This result already has the form of a Fourier
series. The coefficients C_n, with n ≠ 0, are simply the coefficients that multiply e^{inx}; that is, C_n = c_n/(in). The only special case is for n = 0. The coefficient C_0 is the constant part of F(x). It will therefore correspond to the entire sum in the last term above:

C_0 = − Σ_{n≠0} c_n/(in).
Let’s summarize these results in a pretty big box:
Let f (x) be a periodic function with Fourier coefficients cn [Eq. (1.41)]. Then
• The Fourier coefficients of f′(x) are in c_n;

• The Fourier coefficients of F(x) = ∫_{0}^{x} [f(x′) − c_0] dx′ are c_n/(in), for n ≠ 0, and i Σ_{n≠0} c_n/n, for n = 0.
The intuition should thus be clear: differentiate and you get in. Integrate and
you get 1/in (except for the annoying term with n = 0).
The above results also resonate well with the calculations of the previous section,
on the speed of convergence of a Fourier series. Integration makes functions smoother
and therefore converge faster. This is evidenced by the fact that integration multiplies
the Fourier coefficients by 1/n. Derivatives, on the other hand, make things more
irregular and the corresponding series converges more slowly.
arbitrary real number. As a side effect of that calculation, I also asked you to show that
∞
1 1 X 2x
cot(πx) − = . (1.73)
πx π n=1 x2 − n2
Let us assume x ∈ [0, 1]. Since the series converges uniformly, we can integrate each
side, term by term. One may readily check that
∫ cot(πx) dx = (1/π) ln sin(πx),
Whence,
∫_{0}^{x} [ cot(πx′) − 1/(πx′) ] dx′ = (1/π) ln( sin(πx′)/(πx′) ) |_{0}^{x} = (1/π) ln( sin(πx)/(πx) ),
where I also used the fact that lim_{x→0} sin(πx)/(πx) = 1.
On the other hand, to compute the integral of the right-hand side of (1.73), we
change variables to y = x′² − n², leading to

∫_{0}^{x} (2x′/(x′² − n²)) dx′ = ln(x′² − n²) |_{0}^{x} = ln(x² − n²) − ln(−n²) = ln(1 − x²/n²).
Finally, we get rid of the logarithm by exponentiating both sides. But to do that, we
must first change the sum on the right-hand side into a product using ln(a) + ln(b) =
ln(ab). That is, we rewrite this as
"Y∞ #
sin(πx)
ln = ln (1 − x /n ) .
2 2
πx n=1
Exponentiating on both sides finally yields the infinite product expansion of the sine
function
sin(πx)/(πx) = Π_{n=1}^{∞} (1 − x²/n²)   (1.74)
= (1 − x²)(1 − x²/4)(1 − x²/9)(1 − x²/16) · · ·
This is pretty fun. We are used to expressing sines as a Taylor series expansion. But it
can also be expressed as a product.
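A quick numerical check of the product formula (a Python sketch of mine; the product converges slowly, so many factors are kept):

import numpy as np

def sine_product(x, N):
    n = np.arange(1, N + 1)
    return np.prod(1 - x**2 / n**2)

x = 0.3
print(np.sin(np.pi * x) / (np.pi * x), sine_product(x, 100000))   # the two agree to ~6 digits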
One can also derive a similar formula for the cosine. Of course, cos(πx) = sin(πx +
π/2), so we could simply shift x in the product above. But the resulting formula is not
so fun. A nicer form can be found starting with the identity sin(2θ) = 2 sin θ cos θ and
writing
cos(πx) = sin(2πx)/(2 sin(πx)).
Using Eq. (1.74) for both the numerator and denominator we then find
cos(πx) = [ Π_{n=1}^{∞} (1 − (2x)²/n²) ] / [ Π_{n=1}^{∞} (1 − x²/n²) ].   (1.75)
We are almost done. All we need to do is realize that the two products are actually
related, so some terms will cancel out. Opening up the first few terms in the product
on the numerator, we find
Π_{n=1}^{∞} (1 − (2x)²/n²) = (1 − 4x²)(1 − x²)(1 − 4x²/9)(1 − x²/4)(1 − 4x²/25)(1 − x²/9) · · · .
Hey look! We see in every other term, exactly the product (1 − x2 /n2 ). The other terms
(those that have 4x2 in them) are like 1 − 4x2 /(2n − 1)2 , for n = 1, 2, 3, . . .. Thus, we
conclude that
Π_{n=1}^{∞} (1 − 4x²/n²) = Π_{n=1}^{∞} [ (1 − x²/n²)(1 − 4x²/(2n − 1)²) ].
The first product will cancel out the denominator in Eq. (1.75), leaving us with
cos(πx) = Π_{n=1}^{∞} ( 1 − x²/(n − 1/2)² ),
which is pretty neat. We now summarize the results and conclude the chapter.
Chapter 2
2.1 Overview
Most laws of physics are expressed in the form of differential equations. A differen-
tial equation is any type of equation involving derivatives. For instance, let N(t) denote
the number of atoms present in a radioactive sample. At each step ∆t, the number of
atoms that decay must be proportional to the number of atoms present. So the rate of
change dN/dt must depend on N. Something like
dN/dt = −λN,   (2.1)
where λ (units of 1/second) is the decay rate; λ∆t represents the probability that an atom
decays in a small time interval ∆t. This is called an ordinary differential equation
(ODE) because it involves only one variable (here t). This is to be contrasted with
partial differential equations (PDEs), which contain partial derivatives over several
variables. In this chapter we focus only on ODEs. PDEs will be studied next.
Another example of a differential equation is Newton's second law, m d²r/dt² = F.
It is a differential equation because it relates the second derivative of position with the
force acting on the particle. In this chapter we will be particularly interested in the
damped harmonic oscillator, which is described by the differential equation
m d²y/dt² = −α dy/dt − ky,   (2.2)
where y is the position, m is the mass, k is the spring constant and α is the damping
coefficient. This equation mixes y with its first and second derivatives. It is therefore
called a 2nd order ODE. Eq. (2.1), on the other hand, is a 1st order ODE, because it
contains only first derivatives.
We will also play with electric circuits. A simple series circuit containing a resis-
tor R, a capacitor C, an inductance L and a voltage source V(t), is described by the
differential equation
L d²Q/dt² + R dQ/dt + Q/C = V(t),   (2.3)
where Q(t) is the charge in the circuit. This equation is in general of 2nd order. But
sometimes it becomes of 1st order if an element is missing. For instance, if there is no
inductance present (the “RC circuit”) we get
R dQ/dt + Q/C = V(t).   (2.4)
Conversely, if the capacitance is infinite (the “LR circuit”), we get LQ̈ + RQ̇ = V(t).
This equation still looks 2nd order. But we can instead work with the current I(t) = Q̇,
in which case we get
L dI/dt + RI = V(t).   (2.5)
It is important to note how (2.4) and (2.5) are mathematically equivalent, even though
they describe different physical systems.
Another example of ODE we are going to study is Schrödinger’s equation, describ-
ing a one-dimensional quantum particle subject to a position dependent potential V(x).
The ODE reads
−(ℏ²/2m) d²ψ/dx² + V(x)ψ = Eψ,   (2.6)

where ℏ is Planck's constant, m is the mass and E is the energy. Unlike the previous
examples, here the independent variable is x, not t. In this chapter we will go back
and forth between x and t. The dependent variable, on the other hand, is ψ(x), which
is called the wavefunction. I will try to explain a bit better what it represents once we
play with some examples of (2.6).
Eqs. (2.1)-(2.6) are all examples of linear ODEs, because they depend linearly on
the dependent variable, N, or Q or I or ψ. Here are some examples of non-linear ODEs:
y′ + xy² = 1,      ẏ = cot(y),
y dy/dx = 1,      ẏ² = y.
Here I used y to denote the dependent variable. This is the notation we are going
to use whenever we talk about generic equations. In the formulas above I also used
different notations for derivatives: y0 or ẏ or dy/dx or dy/dt. I know it may seem a bit
messy, but all of these notations are used in physics, so we better get used to them. The
examples above represent non-linear ODEs, because they involve non-linear functions
of the dependent variable y and/or its derivatives. Non-linear equations are dramatically
more complicated to deal with than linear ones. But, lucky for us, the vast majority of
physical laws are linear.
Another ODE which will be quite important later on is Legendre’s equation,
which appears often in quantum mechanics and electromagnetism. It reads

(1 − x²) y″ − 2x y′ + n(n + 1) y = 0,   (2.7)

where n is an integer. This equation is linear because y, y′, y″ only appear linearly.
However, if we contrast it with (2.1) or (2.2), it certainly seems more complicated. The reason is that those examples had constant coefficients, whereas (2.7) does not.
For instance, the coefficient multiplying y00 in (2.7) is (1 − x2 ), which is a function of
the independent variable x. Conversely, in (2.2) the coefficient is just a constant, m.
ODEs with constant coefficients are much easier to deal with. But variable coefficients
are manageable as well, and we will work through some examples.
The entire picture of ODEs becomes a bit cleaner if we work with differential op-
erators. We call an object such as
L = Σ_{j=0}^{n} u_j(x) d^j/dx^j,   (2.8)
an n-th order, linear ordinary differential operator. Here is what each word means:
• This is an operator because it acts on functions to produce new functions.
• It is linear because derivatives are linear, so

L[a_1 y_1(x) + a_2 y_2(x)] = a_1 L(y_1) + a_2 L(y_2).
• The coefficients of L are the functions u j (x). In the particular case when the u j
are independent of x, we say this is a differential operator with constant coeffi-
cients.
The differential operator associated with the damped harmonic oscillator (2.2) reads
L = m d²/dt² + α d/dt + k,   (2.9)
Eq. (2.2) is then written as
Ly = 0. (2.10)
When the right-hand side is zero, we call the equation homogeneous. Conversely, an
inhomogeneous ODE has the form
Ly = f (x), (2.11)
or f (t), if you are using t as independent variable. Inhomogeneous equations are usually
associated to an external force. For instance, if we apply an external force F(t) to the
damped oscillator (2.2), we get instead

m d²y/dt² + α dy/dt + ky = F(t),   (2.12)

which is of the form (2.11). The same is also true for the RLC circuit ODE (2.3). In
this case the inhomogeneity is the external voltage V(t).
2.2 Separable equations
Before we enter into the more formal aspects on how to characterize the general
solutions of an ODE, let us practice with the simplest example possible. Consider a
differential equation of the form
dy/dx = f(x).
This equation is called separable because we can put all y’s on the left and all x’s on
the right,
dy = f (x)dx.
Integrating both sides we then find
y(x) = ∫ f(x) dx + c,   (2.13)
where c is a constant. We have just solved the differential equation. Curiously, you see
that whenever we integrate a function, we are actually solving a differential equation.
We could also have written the result as a definite integral. That is, we integrate dy =
f (x)dx from x0 to x, leading to
Zx
y(x) = y(x0 ) + f (x0 )dx0 . (2.14)
x0
As an example, consider the radioactive decay equation (2.1). It is separable, dN/N = −λ dt, so that ln N = −λt + const., or

N(t) = C e^{−λt},   (2.15)

where the constant C is fixed by an initial condition. For instance, if at t = 0 we had N(0) = N₀ then C = N₀ and thus the solution is finally

N(t) = N₀ e^{−λt},
which represents an exponential decay, from N0 at t = 0, toward 0 as t → ∞. It is
important to notice, however, that we did not have to specify N(t) exactly at time t = 0.
We could have specified N(t) at time t = 42. The general solution is still given by
Eq. (2.15).
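The same behaviour can be confirmed by integrating (2.1) numerically. A minimal sketch using scipy's general-purpose ODE solver (the numerical values of λ and N₀ below are just for illustration):

import numpy as np
from scipy.integrate import solve_ivp

lam, N0 = 0.5, 1000.0
sol = solve_ivp(lambda t, N: -lam * N,                  # dN/dt = -lambda*N, Eq. (2.1)
                t_span=(0, 10), y0=[N0],
                t_eval=np.linspace(0, 10, 5))
print(sol.y[0])                                          # numerical solution
print(N0 * np.exp(-lam * sol.t))                         # analytic solution N(t) = N0 e^{-lambda t}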
Here is another example: the ODE

y′ sin x = y ln y.

This is separable: dy/(y ln y) = dx/sin x. Integrating both sides (and calling the integration constant ln C) gives ln(ln y) = ln(tan(x/2)) + ln C, so that the solution is y = e^{C tan(x/2)}.
Please take notice of where the constant is. It is not true to say that the solution is
y = C̃ e^{tan(x/2)}, since this would be the same as y = e^{ln C̃ + tan(x/2)}. Where we put the
constant is very important.
We first consider the homogeneous equation L(y) = 0. A solution of this ODE is any
function y(x) satisfying Ly = 0. Since L is a linear operator, if y1 and y2 are solutions,
then any linear combination is also a solution:
L y_i = 0   →   L(a_1 y_1 + a_2 y_2) = 0.
This is very important since it allows us to build general solutions as linear combina-
tions of specific solutions. A set of solutions y1 , . . . , yk is said to be linearly indepen-
dent if, for all x, the only set of numbers satisfying
c_1 y_1 + · · · + c_k y_k = 0,

is the trivial set c_1 = · · · = c_k = 0. As we will prove below, homogeneous ODEs of order
n have n linearly independent solutions.
Next consider the inhomogeneous equation Ly(x) = f(x), and let y^i_1(x) and y^i_2(x) be any two of its solutions. Their difference satisfies L(y^i_1 − y^i_2) = f − f = 0, so it must be a linear combination of the homogeneous solutions,

y^i_1(x) − y^i_2(x) = Σ_{j=1}^{n} c_j y_j(x),

where c_j are coefficients (that can be adjusted, for instance, from the initial conditions). This must be true for any two solutions of Ly(x) = f(x). So suppose you found one particular solution y^i_2. This result then guarantees that any other solution can always be written as the particular solution y^i_2, plus a linear combination of the solutions to the homogeneous equation.
Linear inhomogeneous ODEs
Let y p (x) denote any particular solution of the inhomogeneous equation Ly(x) =
f (x). Then the most general solution will be of the form
y(x) = y_p(x) + Σ_{j=1}^{n} c_j y_j(x),   (2.18)
where c j are coefficients and y j are the solutions of the homogeneous equation
Ly j = 0.
This result is extremely powerful. And it is really cool how it follows from such a
simple reasoning. Notice also how, in order to solve Ly = f , one must first know the n
linearly independent solutions of Ly = 0.
Example: consider the ODE y″ + 5y′ + 4y = 2. We already saw that e^{−x} and e^{−4x} are solutions of the homogeneous equation. So now we only need one particular solution of the inhomogeneous equation. A very simple choice is y_p = 1/2. Thus, the most general solution will have the form

y(x) = 1/2 + c_1 e^{−x} + c_2 e^{−4x}.
Notice that I called this "a particular solution" instead of "the particular solution". The reason is because any particular solution works. For instance, y = 1/2 + e^{−x} is also a particular solution, so we could have equally well have written the general solution as

y(x) = 1/2 + e^{−x} + c_1 e^{−x} + c_2 e^{−4x}.
But, as you can see, this is just an unnecessary complication: since the c_i are constants
anyway, we can just absorb the 2nd term in the 3rd and call it a new constant c̃1 .
Next, consider the ODE y″ + 5y′ + 4y = 2x. This has the same differential operator, but the inhomogeneous term is different. A particular solution, which one may verify, is y_p = (4x − 5)/8 (once again, we will learn how to derive these in due time, I promise!). Thus, the most general solution is

y(x) = (4x − 5)/8 + c_1 e^{−x} + c_2 e^{−4x}.
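These claims are easy to verify with a computer algebra system. A short sympy sketch (my own check, not part of the text):

import sympy as sp

x, c1, c2 = sp.symbols('x c1 c2')
y = (4 * x - 5) / 8 + c1 * sp.exp(-x) + c2 * sp.exp(-4 * x)
lhs = sp.diff(y, x, 2) + 5 * sp.diff(y, x) + 4 * y
print(sp.simplify(lhs))   # prints 2*x for any c1, c2, so y is indeed the general solution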
Consider now a set of functions y_1(x), . . . , y_m(x) (where m is not necessarily equal to n). We define the Wronskian matrix as the m × m matrix with
entries
Y(x) = [ y_1        y_2        · · ·   y_m
         y_1′       y_2′       · · ·   y_m′
         ⋮          ⋮          ⋱      ⋮
         y_1^(m−1)  y_2^(m−1)  · · ·   y_m^(m−1) ].   (2.19)
It turns out this matrix can be used to test if the functions y j are linearly independent
or not. The reason is that linear dependence implies there exists a non-trivial set {c_j} such that

c_1 y_1 + · · · + c_m y_m = 0.   (2.20)
Differentiating this once yields
c_1 y_1′ + · · · + c_m y_m′ = 0,

and twice,

c_1 y_1″ + · · · + c_m y_m″ = 0,
and so on for any order of the derivative. To connect these equalities with the matrix
Y(x), we now define a vector c = (c1 , . . . , cm ). We can then compact all equalities into
a single matrix-vector multiplication
Yc = 0.
Try out an example to make sure this makes sense. For instance, Eq. (2.20) is the first
line of Y(x)c, and so on.
From linear algebra we know that Yc = 0 admits a non-trivial solution c ≠ 0 if and only if the
matrix Y has zero determinant, |Y| = 0. Thus, the functions are linearly dependent if |Y| = 0.
The determinant of the Wronskian matrix is often called simply the Wronskian (which sounds like the
name of a character in a Tarantino movie):
W(x) = |Y(x)| (2.21)
This provides a quick test to see if a set of functions are linearly dependent or not.
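As an aside (my own illustration, not part of the text), this test is easy to automate, e.g. with Python's sympy:

import sympy as sp

x = sp.symbols('x')

def wronskian(funcs):
    # Wronskian matrix: row i holds the i-th derivatives of the given functions
    m = len(funcs)
    Y = sp.Matrix([[sp.diff(f, x, i) for f in funcs] for i in range(m)])
    return sp.simplify(Y.det())

print(wronskian([sp.exp(-x), sp.exp(-4*x)]))   # -3*exp(-5*x): nonzero, independent
print(wronskian([x, 2*x]))                     # 0: linearly dependent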
Given a set of functions y_1(x), ..., y_m(x), we define the Wronskian (or Wron-
ski determinant) as the determinant of the Wronskian matrix,
W(x) = det Y(x),    Y(x)_{ij} = y_j^{(i−1)}(x),   i, j = 1, ..., m.    (2.22)
If W(x) ≠ 0, the functions are linearly independent.
We use the Wronskian to show that an n-th order linear operator has n linearly inde-
pendent solutions. Let's do it for n = 2. The generalization to n > 2 is straightforward.
We can always parametrize a 2nd order linear ODE in the form
y00 + P(x)y0 + Q(x)y = 0, (2.23)
where P(x) and Q(x) are generic functions of x. Suppose we found three solutions, y1 ,
y2 and y3 . Construct the Wronskian
W(x) = det of the 3 × 3 matrix with rows (y_1, y_2, y_3), (y_1', y_2', y_3') and (y_1'', y_2'', y_3'').
Example: Consider the ODE y'' + 5y' + 4y = 2x + 2. As we saw in the previous section,
a particular solution for f_1 = 2x is y_{p1} = (4x − 5)/8 and a particular solution for f_2 = 2
is y_{p2} = 1/2. Thus a particular solution for f = 2x + 2 will be y_p = (4x − 5)/8 + 1/2.
Next, consider the inhomogeneous ODE, but with f(x) being a Dirac delta:
Ly = δ(x − x_0).    (2.24)
You can imagine this as an external perturbation that acts only at a single specific
point x_0. A particular solution of this equation is called the Green's function of L:
L_x G(x, x_0) = δ(x − x_0).    (2.25)
We write it as G(x, x_0) because it depends on two parameters, the variable x, and the
position of the drive x_0. Note, however, that L is a differential operator acting only on
x, not x_0. This is why I put the subscript in L_x, just to be clear.
Green’s functions are not necessarily easy to find. But once we find them, they are
incredibly useful since they work as building blocks to study other types of inhomo-
geneities. This is associated to the window property of the Dirac δ. Consider a general
function f (x) and write it as
f(x) = ∫_{−∞}^{∞} δ(x − x_0) f(x_0) dx_0.    (2.26)
Green’s functions are therefore building blocks. The choices for f (x) that may appear
on Ly = f (x) are endless. But if we solve it for just a single drive (the δ-drive), then
we can build up the solution for any other drive. Pretty powerful, eh?
Green’s functions are a big business in physics and we will talk more about them
as the course progresses. Here I just wanted to introduce them to you, so you could say
hello.
We now turn to first-order linear ODEs, which can always be written in the form
y' + P(x) y = Q(x),    (2.29)
where P(x) and Q(x) are arbitrary functions of x. The radioactive decay equation (2.1)
and the RC and LR circuits in Eqs. (2.4) and (2.5) are all of this form. But those
equations are actually simpler since they have constant coefficients, while (2.29) has
arbitrary coefficients.
From what we learned in Sec. 2.3, the general solution will have the form y = y p +
cyh , where y p is a particular solution of (2.29) and yh is the solution to the homogeneous
equation, y0 + Py = 0 (we only need one solution because the ODE is first order). The
homogeneous equation is easy to solve since it is separable (Sec. 2.2):
dy/y = −P(x) dx   →   ln y = −∫ P(x) dx + const.
It is convenient to define
I(x) = ∫ P(x) dx,    (2.30)
in terms of which the homogeneous solution reads y_h = e^{−I(x)}.
Next we turn to the inhomogeneous equation (2.29). We will actually find not only
the particular solution, but the general one. We do this using the method of integrating
factors. The idea is as follows. From (2.30) we have that I 0 (x) = P(x). We can then
use this to rewrite the left-hand side of (2.29) as
y' + Py = e^{−I} (d/dx)(y e^{I}).
Please take a second to check that this is indeed true. We call e^{I(x)} an integrating factor
because it transformed the differential operator L = d/dx + P into something like (d/dx)(...).
Eq. (2.29) can then be rewritten as
e^{−I} (d/dx)(y e^{I}) = Q   →   (d/dx)(y e^{I}) = Q e^{I}.    (2.32)
Now it is easy to integrate on both sides, leading to
y e^{I} = ∫ Q(x) e^{I(x)} dx + c.
Multiplying both sides by e^{−I(x)} then finally yields our general solution:
y(x) = c e^{−I(x)} + e^{−I(x)} ∫ Q(x) e^{I(x)} dx,    (2.33)
where I(x) = ∫ P(x) dx and c is a constant. Please be careful not to put the e^{−I(x)}
inside the integral. Alternatively, we can also integrate Eq. (2.32) as a definite
integral, which leads to
y(x) = y(x_0) e^{−I(x)+I(x_0)} + e^{−I(x)} ∫_{x_0}^{x} Q(x') e^{I(x')} dx'.    (2.34)
Example. Consider the ODE y' + 2y = e^{−x}. This is in the form (2.29), with P = 2
and Q = e^{−x}. The integrating factor (2.30) therefore reads I(x) = ∫ 2 dx = 2x and so
Eq. (2.33) becomes
y(x) = c e^{−2x} + e^{−2x} ∫ e^{−x} e^{2x} dx = c e^{−2x} + e^{−2x} e^{x} = c e^{−2x} + e^{−x}.
Example. Consider the ODE x²y' + 3xy = 1. Dividing both sides by x² reveals that
this has the form (2.29), with P(x) = 3/x and Q(x) = 1/x². The integrating factor (2.30)
is
I(x) = ∫ (3/x) dx = 3 ln x.
Thus e^{I} = x³ and e^{−I} = 1/x³. Eq. (2.33) then yields
y = c/x³ + (1/x³) ∫ (x³/x²) dx = c/x³ + (1/x³)(x²/2).
Thus, the general solution is
y = c/x³ + 1/(2x).
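An optional symbolic check of this last result (my own sketch with Python's sympy; not part of the text):

import sympy as sp

x = sp.symbols('x', positive=True)
c = sp.symbols('c')
y = sp.Function('y')

# sympy solves x^2 y' + 3 x y = 1; the answer should be equivalent to C1/x^3 + 1/(2x)
ode = sp.Eq(x**2*y(x).diff(x) + 3*x*y(x), 1)
print(sp.dsolve(ode))

# Direct substitution of the solution found above
ysol = c/x**3 + 1/(2*x)
print(sp.simplify(x**2*ysol.diff(x) + 3*x*ysol))   # prints 1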
A very common situation is when P(x) is independent of x. That is, when the ODE
has the form
ẏ + λy = Q(t), (2.35)
where λ is a constant. Here I changed from x to t because, as we will see below, most
applications involve time as the independent variable. The integrating factor in this
case is I(t) = ∫ λ dt = λt. Thus, the general solution (2.33) simplifies to
y(t) = c e^{−λt} + e^{−λt} ∫ Q(t) e^{λt} dt,    (2.36)
or, as a definite integral [Eq. (2.34)],
y(t) = y(t_0) e^{−λ(t−t_0)} + e^{−λt} ∫_{t_0}^{t} Q(t') e^{λt'} dt'.    (2.37)
In most cases of interest, one has λ > 0. The reason is because, as we can see in the
first term, this ensures that the dynamics is stable. If λ < 0 the first term would grow
unboundedly with time and eventually explode (kabuum).
An immediate application of this result is to the LR circuit described by Eq. (2.5)
(or, similarly, to the RC circuit). Dividing by L on both sides we get that the current
I(t) will evolve according to
İ + (R/L) I = V(t)/L.
This has the same form as (2.35), with λ = R/L and Q(t) = V(t)/L. Whence, the
general solution will be
I(t) = c e^{−λt} + (e^{−λt}/L) ∫ V(t) e^{λt} dt,    (2.38)
L
with λ = R/L. You can view the RL circuit as a kind of black box, that processes the
input V(t) into an output I(t), according to this expression. This is a fun example to
work with, because we can just go to the lab and apply all sorts of weird electric signals
V(t). Eq. (2.38) then specifies how the circuit will respond. We will discuss this kind
of game further in the next section.
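Here is a small numerical sketch of that game (my own, in Python; R, L and the square-wave voltage are arbitrary choices), which simply integrates İ + (R/L)I = V(t)/L:

import numpy as np
from scipy.integrate import solve_ivp

R, L = 1.0, 0.5                      # arbitrary circuit parameters
lam = R/L

def V(t):                            # a square-wave input voltage
    return np.where(np.sin(2*np.pi*t) >= 0, 1.0, -1.0)

def rhs(t, I):                       # dI/dt = -lam*I + V(t)/L
    return -lam*I + V(t)/L

t = np.linspace(0, 10, 2000)
sol = solve_ivp(rhs, (t[0], t[-1]), [0.0], t_eval=t, max_step=0.01)
I = sol.y[0]                         # output current; plot against V(t) to see the lag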
Green’s function
Referring to the constant-P case in Eq. (2.35), let us study the Green's function.
That is, we study the system response to a δ impulse at a specific time s:
ẏ + λy = δ(t − s).    (2.39)
It is helpful to keep a physical image of what is happening. The δ-function plays the
role of a kick. It is an infinitely strong, but infinitely short kick. The picture, therefore,
is that the system was doing whatever for t < s; we then kick it at s and examine how
it evolves when t > s.
In this case, it is more convenient to use the definite-integral solution (2.37), which
becomes
y(t) = y(t_0) e^{−λ(t−t_0)} + e^{−λt} ∫_{t_0}^{t} δ(t' − s) e^{λt'} dt'.    (2.40)
Here we use the fact that
∫_a^b δ(t − s) f(t) dt = f(s) if s ∈ [a, b],   and 0 otherwise.
That is, the δ-function will only return something useful if the integration interval con-
tains the δ-peak at s.
It matters whether we specify the initial condition for t0 before or after the kick.
Physically, it is a bit weird to specify it after the kick (although mathematically, there
is nothing wrong with that). Thus, we usually assume that t0 < s. In this case, if t < s,
the integration interval [t0 , t] will not contain s and the integral in (2.40) will vanish
identically, leading to y(t) = y(t0 )e−λ(t−t0 ) . Conversely, for t > s the interval [t0 , t] will
contain the δ and we get instead y(t) = y(t0 )e−λ(t−t0 ) +e−λ(t−s) . Thus, the general solution,
assuming t0 < s is
y(t) = { y(t_0) e^{−λ(t−t_0)},                        t < s,
         y(t_0) e^{−λ(t−t_0)} + e^{−λ(t−s)},          t > s.    (2.41)
We often don’t worry too much about y(t0 ). We just assume that before the kick the
system was standing still (y(t0 ) = 0). Alternatively, we also imagine that t0 = −∞;
that is, the initial condition happened long long ago. The first term would then vanish
because of the exponential. In any case, we usually focus on
y(t) = { 0,                 t < s,
         e^{−λ(t−s)},       t > s.
We thus see that, because of the structure of the ODE, if at any point t0 < s the system
was in y(t0 ) = 0, then it must have been at zero for all t < s. We can neatly summarize
the above result using the Heaviside function.
Green’s function
This is the Green’s function associated to the differential operator L = (∂t + λ).
That is why I wrote G, instead of y. The Heaviside function clearly shows
that the kick happened at time s. Before that the system was standing still
and afterwards it relaxes exponentially. Usually, we write the Green’s function
in terms of two parameters, G(t, s), one being the independent variable and
the other the position of the kick. But in this case it turns out that the result
depends only on their difference, t − s. This happens whenever the coefficients
in the differential operator are constant. I recommend you have a look at the
Wikipedia page for Green’s functions. There is a nice table listing the Green’s
function’s associated to a bunch of differential operators.
Once we have the Green’s function, we can now use it as a building block to gener-
ate solutions of ẏ + λy = Q(t), for arbitrary Q(t). As we saw in Eq. (2.27), a particular
solution in this case will be
y_p(t) = ∫_{−∞}^{∞} G(t − s) Q(s) ds.
When we plug the solution (2.42), the Heaviside function will chop the integration
interval, from (−∞, ∞) to (−∞, t]:
y_p(t) = ∫_{−∞}^{t} e^{−λ(t−s)} Q(s) ds.    (2.43)
If we stare at this for a few seconds, we will start to see the logic behind it. This
result is actually very similar to the general solution (2.37). In fact, we can make them
equal if we consider the initial condition in (2.37) to be at t0 = −∞. Thus, in this
sense, we could have maybe even “read” what the Green’s function should be, directly
from (2.37). This is a bit frustrating: our Green’s function (2.42) definitely looks very
pretty, but once we plug it back to obtain (2.43), we are essentially back to (2.37). But
don’t let this turn you off: this only happened here because the ODE we are solving
is very simple, so that a solution like (2.37) can be written down explicitly. For most
other ODEs, finding the Green’s function is a significant effort, but which pays off big
time.
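For what it's worth, here is a quick numerical sketch (mine, in Python, with an arbitrary λ and an arbitrary drive) showing that the convolution (2.43) agrees with a direct integration of ẏ + λy = Q(t):

import numpy as np
from scipy.integrate import solve_ivp, trapezoid

lam = 2.0
Q = lambda t: np.cos(3*t)            # an arbitrary drive

def y_green(t, s_min=-30.0, n=20000):
    # Particular solution via the Green's function, Eq. (2.43):
    # y_p(t) = int_{-inf}^{t} exp(-lam (t - s)) Q(s) ds  (lower limit truncated)
    s = np.linspace(s_min, t, n)
    return trapezoid(np.exp(-lam*(t - s))*Q(s), s)

# Direct integration of y' = -lam y + Q(t), started far in the past so that
# the transient c exp(-lam t) has died out by t = 0
sol = solve_ivp(lambda t, y: -lam*y + Q(t), (-30.0, 5.0), [0.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

for t in (0.0, 1.0, 2.0):
    print(y_green(t), sol.sol(t)[0])   # the two should agree to several digits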
For a complex exponential drive, f(t) = f_0 e^{iωt}, Eq. (2.36) gives
z(t) = c e^{−λt} + f_0 e^{−λt} e^{(λ+iω)t}/(λ + iω).
The general solution of ż + λz = f0 eiωt is therefore
z(t) = c e^{−λt} + (f_0/(λ + iω)) e^{iωt}.    (2.45)
This result is now a complex number. To get the real or imaginary parts, recall that for
any complex number,
λ + iω = r e^{iφ},   with r = √(λ² + ω²) and φ = arctan(ω/λ),
and so
1/(λ + iω) = e^{−iφ}/r.
Whence, Eq. (2.45) may be written as
z(t) = c e^{−λt} + (f_0/√(λ² + ω²)) e^{i(ωt−φ)}.    (2.46)
When z(t) is written in this way, it becomes absolutely trivial to take the real or imagi-
nary parts: we simply replace ei(...) with cosine or sine. Thus, for instance, the real part
is
y(t) = c e^{−λt} + (f_0/√(λ² + ω²)) cos(ωt − φ),    (2.47)
Figure 2.1: (Green) Example solution of Eq. (2.47) as a function of ωt, with c = 1, f0 = 1 and
λ/ω = 0.3. (Black) The function cos(ωt), for comparison.
• The system oscillates with a phase lag φ. We say that it is always lagging behind
the drive (Fig. 2.1). This lag depends on the drive frequency ω. If we drive the
system very very slowly, ω ≈ 0 and thus φ ≈ 0. The faster the drive, the larger
is the lag. Here the word “faster” is used in comparison with λ, which is the
intrinsic time scale of the system.
Here I used capital C in the first term, to avoid confusion with the Fourier coefficients
cn . We can exchange the order of sums and integrals because Fourier series converge
uniformly (Sec. 1.9). Carrying out the integrals we then find
y(t) = C e^{−λt} + Σ_{n∈Z} (c_n/(λ + inω)) e^{inωt}.    (2.49)
This is the principle of superposition in its clearest form: the solution is just a sum of
the solutions for the different driving frequencies einωt . You may also wonder why I
used y(t) here, instead of z(t), since the exponentials are complex. The reason is that
even though the exponentials may be complex, the function f (t) may very well be real.
This is encoded in the Fourier coefficients, through the fact that c−n = c∗n . In other
words, if f (t) is real, the solution (2.49) will also be real, because in this case c−n = c∗n ,
so that we get, for instance,
c_1 e^{iωt}/(λ + iω) + c_{−1} e^{−iωt}/(λ − iω) = c_1 e^{iωt}/(λ + iω) + [c_1 e^{iωt}/(λ + iω)]^*,
which is manifestly real (a number plus its complex conjugate).
Figure 2.2: Steady-state response (2.50) for a square-wave input and different values of λ/ω.
At long times the first term in (2.49) dies out and we are left with the steady state, Eq. (2.50),
y(t) = Σ_{n∈Z} d_n e^{inωt},   with   d_n = c_n/(λ + inω).    (2.50)
In the steady-state the system will thus respond to all harmonics e^{inωt} of the input drive
f(t). Moreover, each response will be weighted by a factor d_n = c_n/(λ + inω), whose magnitude
is |c_n|/√(λ² + n²ω²), and will also lag behind by an angle arctan(nω/λ). The overall response
will therefore be somewhat complicated, with the system trying to follow all e^{inωt} the best it
can, but always lagging behind (poor guy!).
To illustrate the behavior of Eq. (2.50), we consider the case of a square-wave. The
Fourier coefficients were computed in Sec. 1.4, Eq. (1.24), and read
c_n = −2i/(nπ) for n odd,   c_n = 0 for n even.    (2.51)
(The result we obtained in Eq. (1.24) was for b_n; but recall that c_n = (a_n − i b_n)/2
and, in our case, a_n = 0.)
The corresponding steady-state y(t), computed using Eq. (2.50), is shown in Fig. 2.2
for different values of λ/ω. As can be seen, changing λ leads to dramatic changes.
Large λ/ω is the same as saying that the drive is very slow (ω small compared to λ).
In this case (figure (d)) we see that y(t) has a tendency to follow f (t) more closely. In
fact, if ω = 0 we simply get dn = cn /λ and the two coincide, up to a scaling factor.
Conversely, if ω is large (λ/ω small) we are in the fast drive regime, where the response
is now significantly different from the input.
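A small numerical sketch of this (mine, in Python), which evaluates Eq. (2.50) with the square-wave coefficients (2.51) and can be used to reproduce Fig. 2.2:

import numpy as np

omega = 1.0
lam = 0.5*omega                     # try different lam/omega ratios
t = np.linspace(0, 3*2*np.pi/omega, 1000)

# Steady state y(t) = sum_n c_n e^{i n omega t}/(lam + i n omega), Eq. (2.50),
# with c_n = -2i/(n pi) for odd n (square-wave drive), Eq. (2.51)
y = np.zeros_like(t, dtype=complex)
for n in range(-201, 202, 2):       # odd harmonics only, truncated
    cn = -2j/(n*np.pi)
    y += cn*np.exp(1j*n*omega*t)/(lam + 1j*n*omega)

y = y.real                          # imaginary parts cancel since c_{-n} = c_n^*
# plot omega*t versus y to reproduce one of the panels of Fig. 2.2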
There is a cool interpretation to the solution in Eq. (2.50). Recall that in Sec. 1.9 we
discussed how differentiating a function changed cn to incn , while integration changed
cn to cn /(in). There we were using period 2π. If we use arbitrary periods, we would get
inωcn and cn /(inω) instead. What we are doing now is not a simple integration. But it
is still the integration of an ODE. Since y(t) is the solution of Ly(t) = f (t), we can also
picture that y(t) = L−1 f (t): that is, y(t) is obtained by applying the inverse of L into the
input f(t). And what we just learned is that this operation takes a Fourier coefficient c_n
to c_n/(λ + inω):
c_n  --(L^{−1})-->  c_n/(λ + inω).    (2.52)
Isn’t this cool? I mean, if we had only the differential operator L = dtd , the inverse
would be the integral and we would get only cn /inω. But since we have L = dtd + λ, we
get cn /(inω + λ) instead. This type of reasoning is usually called input-output theory
and is very important in engineering and physics. The input is f (t) (the drive), which
goes through a black box L−1 to produce the output (the system response) z(t). This
black box does not mix harmonics, taking einωt into einωt . But it processes each one
with a different weight, taking cn to cn /(λ + inω).
Energy
Consider specifically the RL circuit (2.5), where y(t) = I(t) is the current, λ = R/L
and f (t) = V(t)/L is essentially the voltage. The energy stored in the inductor is
E = (1/2) L I².    (2.53)
The energy is thus seen to be a quadratic form in the output variables. Similarly, in the
RC circuit y(t) = Q(t) and the energy stored in the capacitor is
E = Q²/(2C).    (2.54)
Again, E is quadratic in y. In fact, this is very common. For most systems described by
ODEs of the form (2.44), the energy can be written as a quadratic form in the output,
E(t) = (1/2) κ y(t)²,    (2.55)
where κ is some constant to get the correct units.
Let us focus on the steady-state (long-time) regime of (2.50). The energy in this
case will oscillate periodically in time. Given any function g(t), periodic with period
T = 2π/ω, we define its time average as
ḡ = (1/T) ∫_0^T g(t) dt,    (2.56)
which is nothing but the Fourier coefficient a0 [Eq. (1.8)]. The average energy over a
period is thus
Ē = (1/T) ∫_0^T E(t) dt = (κ/2T) ∫_0^T y(t)² dt.    (2.57)
But since y(t) is given by the Fourier series (2.50), we can directly use Parseval’s iden-
tity, Sec 1.6, Eq. (1.48). In our notation, this is translated as
(1/T) ∫_0^T y(t)² dt = Σ_{n∈Z} |d_n|².    (2.58)
Whence, we conclude that the energy associated to the steady-state solution (2.50) is
Ē = (κ/2) Σ_{n∈Z} |c_n|²/(λ² + n²ω²).    (2.59)
This is one of the main practical uses of Parseval’s identity: it separates the energy
into specific contributions from each harmonic. We sometimes call this a spectral
decomposition.
Next, going back to E(t) in Eq. (2.55) and differentiating with respect to time, we
get
dE/dt = κ y dy/dt.
Plugging the original ODE (2.44) yields
dE/dt = −κλ y(t)² + κ y(t) f(t).    (2.60)
These two terms have a clear physical interpretation. In the RL circuit, for instance
(κ = L, y = I, λ = R/L and f = V/L), this becomes
dE/dt = −R I(t)² + V(t) I(t).    (2.61)
You may remember from Physics III that the first term is the power that is dissipated
in the resistor, whereas the second term is the power delivered by the voltage source.
These two terms combine to yield the net power dE/dt in the circuit.
We can compute the integrated power in the circuit over a full period. That is, the
average of dE/dt. But if we are already in the steady-state, this will give zero because
E(t) is periodic:
(1/T) ∫_0^T (dE/dt) dt = (1/T)[E(T) − E(0)] = 0.
This always happens in the steady-state: since the energy is periodic, in some parts of
the cycle dE/dt > 0 while in others dE/dt < 0, so that the area under the curve of
dE/dt yields zero when integrated. But saying that the average power is zero does not
mean that both terms in the right-hand side of (2.60) or (2.61) are individually zero. It
just means that the two must coincide. In fact, time-averaging each term yields
κλ \overline{y²} = κλ Σ_{n∈Z} |c_n|²/(λ² + n²ω²),    (2.62)
κ \overline{y(t) f(t)} = κ Σ_{n∈Z} |c_n|²/(λ + inω).    (2.63)
Let me explain what I just did. Eq. (2.62) is the same as (2.59). As for Eq. (2.63),
I actually used the more general “inner product” identity we developed in Eq. (1.47).
Essentially, I multiplied the Fourier coefficients cn of f (t), with the Fourier coefficients
dn of y(t).
The two results in Eqs. (2.62) and (2.63) don’t look equal. But they are. To see
that, we write
1/(λ + inω) = (λ − inω)/(λ² + n²ω²).
Eq. (2.63) then becomes
κ \overline{y(t) f(t)} = κλ Σ_{n∈Z} |c_n|²/(λ² + n²ω²) − iκω Σ_{n∈Z} n|c_n|²/(λ² + n²ω²).
But the last term vanishes, because the summand is an odd function of n, so that the
terms with n > 0 exactly cancel those with n < 0 (and the term with n = 0 is also zero).
Thus, we are left only with the first term, which is exactly (2.62).
The moral of the story is that in the steady-state the power dissipated equals the
power delivered:
κλ \overline{y²} = κ \overline{y(t) f(t)},    or    R \overline{I(t)²} = \overline{V(t) I(t)}.    (2.64)
Thus, even though the average energy of the system is no longer changing, stuff is
still happening: the voltage source is constantly injecting juice in the circuit, which is
constantly being burned in the resistor. And what is perhaps most fascinating, because
this is a linear circuit, the harmonics don’t mix so this balance is true for each individual
frequency n.
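A quick numerical check of this balance (my own sketch, reusing the square-wave coefficients of Eq. (2.51) and setting κ = 1):

import numpy as np

lam, omega = 0.5, 1.0
ns = np.arange(-201, 202, 2)                 # odd harmonics of the square wave
cn = -2j/(ns*np.pi)                          # Eq. (2.51)

# Average dissipated power, Eq. (2.62): lam * sum |c_n|^2/(lam^2 + n^2 omega^2)
dissipated = lam*np.sum(np.abs(cn)**2/(lam**2 + ns**2*omega**2))

# Average delivered power, Eq. (2.63): sum |c_n|^2/(lam + i n omega), which is real
delivered = np.sum(np.abs(cn)**2/(lam + 1j*ns*omega)).real

print(dissipated, delivered)                 # the two coincide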
Ly = ÿ + 2γẏ + ω²y = 0.    (2.65)
The coefficient γ in (2.65) is thus associated to damping, while ω is associated to oscil-
lations (it comes from the spring constant). In the vast majority of physical problems,
γ > 0 and ω ∈ R. But the solutions we will develop in this section also hold other-
wise. Another example of Eq. (2.65) is the LRC circuit (2.3) with zero external voltage,
V(t) = 0. In this case 2γ = R/L and ω² = 1/LC. So, once again, we can associate γ
with damping (in this case caused by the losses in the resistor) and now the oscillatory
behavior is linked with 1/LC.
It is convenient at this point to introduce the shorthand notation
D = d/dt
for the derivative operator. Then ẏ = Dy and ÿ = D²y. In this notation Eq. (2.65)
becomes
D²y + 2γDy + ω²y = 0,
which allows us to identify the differential operator
L = D² + 2γD + ω²,
a polynomial in D.
This new object, D, is a differential operator, so you cannot treat it like a number.
For instance, Dy ≠ yD since D acts on anything that is a function of t. We say D and
y(t) do not commute. On the other hand, D does commute with things which are not
functions of t, such as scalars: 2·D = D·2. It also commutes with itself. For instance,
D(D+2) = (D+2)D. Checking this kind of property can be a bit confusing at first. The
trick is to make the differential operator act on a generic function y(t). For instance:
(D + 2)Dy = D²y + 2Dy = D(Dy + 2y) = D(D + 2)y.
Since this must be true for any y(t), we then conclude that the identity must hold for
the operator itself; that is, D(D + 2) = (D + 2)D. It is convenient to use the notation
[A, B] = AB − BA,
to denote the commutator between two objects, A and B. Then our last calculation
reveals that
[D, (D + 2)] = 0.
Conversely, D(D + t) ≠ (D + t)D, since t does not commute with D. We can
again check this explicitly:
(D + t)Dy = D²y + tDy,   whereas   D(D + t)y = D(Dy + ty) = D²y + y + tDy.
The two are clearly not equal. In fact,
[D, D + t]y = y.
And, again, since this must hold true for any y(t), we conclude that
[D, D + t] = 1. (2.67)
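If you want, you can let the computer do these operator gymnastics; a minimal sketch (mine, with Python's sympy) that represents D by its action on a generic function y(t):

import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')(t)

D = lambda f: sp.diff(f, t)          # the derivative operator D = d/dt

# [D, D + 2] acting on y should give 0
comm1 = D(D(y) + 2*y) - (D(D(y)) + 2*D(y))
print(sp.simplify(comm1))            # 0

# [D, D + t] acting on y should give y back, i.e. [D, D + t] = 1
comm2 = D(D(y) + t*y) - (D(D(y)) + t*D(y))
print(sp.simplify(comm2 - y))        # 0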
Canonical quantization
I cannot resist to tell you of a neat application of differential operators to quan-
tum mechanics. Heisenberg showed that quantum mechanical experiments could be
explained if position x and momentum p were not numbers, but differential operators.
More specifically, they should be such as to satisfy
[x, p] = iℏ.    (2.68)
To see how a differential operator can do the job, note that, by the product rule,
D(xy) = y + xDy.
Thus,
[x, D]y = (xD − Dx)y = xDy − (y + xDy) = −y.
Since this must be true for all y, we then conclude that
[x, D] = −1.
This is very similar to Eq. (2.68), except we got a −1 on the right-hand side, instead of
iℏ. This therefore motivates us to define the momentum operator
p = −iℏ d/dx.    (2.69)
Back to ODEs
Sorry. I lost focus. Back to business. We can now use these properties of differ-
ential operators to establish a general solution for ODEs of the form (2.65). Consider,
for example, the differential operator L = D² + 5D + 4. This only involves scalars and
powers of D, which commute between themselves. But since everything commutes,
we can apply standard algebra. For instance, we can factor D² + 5D + 4 just like we
would factor a normal polynomial,
D² + 5D + 4 = (D + 1)(D + 4),
[which is also the same as (D + 4)(D + 1), since everything commutes]. Again, if you
ever feel insecure about this, plug a generic y and check it:
(D + 1)(D + 4)y = (D + 1)(Dy + 4y) = (D²y + 4Dy) + (Dy + 4y) = D²y + 5Dy + 4y.
Since this must hold for any y, it must then be true at the level of the differential operator
itself.
But why is this useful? The reason is simple: we know how to solve (D − a)y = 0,
for any constant a. This is just ẏ − ay = 0, whose solution is y = e^{at}:
(D − a)y = 0   →   y = e^{at}.    (2.70)
Going back to our example, we know that y = e^{−t} solves (D + 1)y = 0. So if we plug
e^{−t} into the entire ODE we get
D²y + 5Dy + 4y = (D + 4)(D + 1)e^{−t} = (D + 4) · 0 = 0,
meaning e^{−t} also solves D²y + 5Dy + 4y = 0. Similarly, y = e^{−4t} solves (D + 4)y = 0
and so will also solve D²y + 5Dy + 4y = 0, because (D + 4)(D + 1) = (D + 1)(D + 4)
(they commute). Thus, we conclude that y = e^{−t} and y = e^{−4t} are both solutions.
And since these are linearly independent and our ODE is 2nd order, we conclude that
the most general solution will be
y = c_1 e^{−t} + c_2 e^{−4t},
for constants c1 , c2 .
Unforced oscillations
The same trick applies to Eq. (2.65): we can factor L = D² + 2γD + ω² = (D − λ_+)(D − λ_−),
so the ODE becomes
(D − λ_+)(D − λ_−) y = 0,    (2.71)
where λ_± are the roots of the characteristic polynomial,
λ_± = −γ ± √(γ² − ω²).    (2.72)
One may then verify that e^{λ_+ t} and e^{λ_− t} will both be solutions of (2.71). And so,
as long as λ_+ ≠ λ_−, the general solution will be given by the linear combination
y(t) = c_1 e^{λ_+ t} + c_2 e^{λ_− t}    (2.73)
     = e^{−γt} ( c_1 e^{t√(γ²−ω²)} + c_2 e^{−t√(γ²−ω²)} ).    (2.74)
This shows that the solution is always enveloped by e^{−γt}. In most physical applications
γ > 0 and hence y(t) will decay exponentially. This is what happens, for instance, in the
damped harmonic oscillator, which wiggles around for a bit until eventually stopping.
The condition γ > 0 is thus associated with the stability of the ODE.
If γ > ω we say the solution is overdamped. In this case √(γ² − ω²) ∈ R so
all exponentials in (2.74) are real; there will be no oscillations and the system will
just relax exponentially towards equilibrium. Conversely, if γ < ω the solution is
called underdamped. In this case the square roots will be imaginary. Defining Ω =
√(ω² − γ²) > 0, we can also write it as
y(t) = e^{−γt} ( c_1 e^{iΩt} + c_2 e^{−iΩt} )    (2.75)
     = e^{−γt} ( C_1 cos Ωt + C_2 sin Ωt )    (2.76)
     = c e^{−γt} sin(Ωt − φ).    (2.77)
The three solutions are all equivalent and simply correspond to different parametriza-
tions of the constants. For instance, if we take the 3rd line and expand sin(Ωt − φ) =
sin Ωt cos φ − cos Ωt sin φ, we see that −c sin φ ≡ C_1 and c cos φ ≡ C_2.
Finally, there is the critically damped case where γ = ω. What is special about
this is that the two roots of L become equal, λ+ = λ− [Eq. (2.72)]. Let us go back to
the drawing board. What we are interested in, in this case, is finding a general solution of
(D − a)(D − a)y = 0.
We already know that y = e^{at} works. But to construct the general solution, we need
two linearly independent solutions. Let us then, instead, try a solution of the form
y = u(t)e^{at}, for some function u(t). One may verify that
(D − a)y = u̇ e^{at}.
Thus, every time we apply (D − a) to u e^{at}, we get back almost the same thing, but with
u replaced with u̇. So applying it a second time yields
(D − a)²(u e^{at}) = ü e^{at}.
Hence, we see that y = u(t)e^{at} will be a solution of (D − a)(D − a)y = 0, provided ü = 0.
The most general such u is u(t) = c_1 + c_2 t, so the general solution is of the form
y(t) = (c_1 + c_2 t) e^{at}.    (2.78)
We have just found our other linearly independent solution. One solution is e^{at} and the
other is te^{at}.
In terms of the initial conditions y(0) = y_0 and ẏ(0) = v_0, the three cases read
y(t) = e^{−γt} [ y_0 cosh(t√(γ²−ω²)) + (v_0 + γy_0)/√(γ²−ω²) · sinh(t√(γ²−ω²)) ],    γ > ω,
     = e^{−γt} [ y_0 + (v_0 + γy_0) t ],    γ = ω,    (2.79)
     = e^{−γt} [ y_0 cos(t√(ω²−γ²)) + (v_0 + γy_0)/√(ω²−γ²) · sin(t√(ω²−γ²)) ],    γ < ω.
I know this looks messy. But it is interesting to compare the 3 cases. First, notice
how γ > ω and γ < ω can be obtained by simply switching from trigonometric
to hyperbolic functions. Second, now that we wrote the solution in terms of
actual physical quantities, y0 and v0 , we can obtain the critically damped solu-
tion by taking the limit ω → γ. This does not work on Eq. (2.74) because it is
written in terms of generic constants. An illustration of the three solutions is
given in Fig. 2.3.
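For reference, a small sketch (mine, in Python) that evaluates the three branches of Eq. (2.79); the ω/γ values below are arbitrary examples, not necessarily those of Fig. 2.3:

import numpy as np

def y_free(t, gamma, omega, y0=1.0, v0=0.0):
    """Unforced damped oscillator, Eq. (2.79), covering all three regimes."""
    if np.isclose(gamma, omega):                     # critically damped
        return np.exp(-gamma*t)*(y0 + (v0 + gamma*y0)*t)
    if gamma > omega:                                # overdamped
        nu = np.sqrt(gamma**2 - omega**2)
        return np.exp(-gamma*t)*(y0*np.cosh(nu*t) + (v0 + gamma*y0)/nu*np.sinh(nu*t))
    Om = np.sqrt(omega**2 - gamma**2)                # underdamped
    return np.exp(-gamma*t)*(y0*np.cos(Om*t) + (v0 + gamma*y0)/Om*np.sin(Om*t))

t = np.linspace(0, 10, 500)                          # interpreted as gamma*t for gamma = 1
for ratio in (0.5, 1.0, 10.0):                       # omega/gamma: over-, critically, underdamped
    y = y_free(t, gamma=1.0, omega=ratio)
    # plot(t, y) to obtain curves like those of Fig. 2.3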
We now add an external drive to Eq. (2.65),
Ly = ÿ + 2γẏ + ω²y = f(t),    (2.80)
for some external drive f(t). The prototypical example is the forced, damped harmonic
oscillator described by
mÿ + αẏ + ky = F(t), (2.81)
where m is the mass, α is the damping, k the spring constant and F(t) the external force.
Dividing both sides by the mass then yields Eq. (2.80) with 2γ = α/m, ω² = k/m and
Figure 2.3: Example of the general solution (2.73) or (2.78), for different values of ω/γ, all
starting from y(0) = 1 and ẏ(0) = 0.
f (t) = F(t)/m. To be honest, the results we are going to develop also hold for ODEs of
order higher than 2. But these are seldom found in practice, so we focus on 2nd order
to gain intuition.
It helps to think about Eq. (2.80) as a probe & response problem, an idea that
encompasses many experiments in physics. Our system can be viewed as a kind of black
box, that we do not know much about. To learn something about it, we probe it with
an external perturbation f (t) and measure how it responds. This scenario happens
in particle physics, for instance: Rutherford probed gold atoms by poking them with
α particles. It also happens (a lot) in condensed matter. To characterize a magnetic
system, we probe it with a magnetic field. To characterize a Bose-Einstein condensate,
we shake it with an optical field.
Since Eq. (2.80) is inhomogeneous, the general solution will be
y(t) = y p + c1 y1 + c2 y2 , (2.82)
where y p is a particular solution and y1(2) are two linearly independent solutions of
Ly = 0, which is precisely what we studied in the last section. There is a general
method to solve this type of equation, for arbitrary right-hand side. But the method is
not very elegant and, for simple choices of f (t), more direct and insightful approaches
exist. Nonetheless, I discuss this general method below, in Eqs. (2.94) and (2.95), in
case you are curious.
For now I want to focus on the case when f(t) = f_0 e^{iΩt}, where I use Ω to avoid
confusion with the natural oscillation frequency ω in Eq. (2.80). We begin by noting the
following nice result: let L(D) = a_0 + a_1 D + a_2 D² + ... be any polynomial in D. Then
L(D) e^{iΩt} = L(iΩ) e^{iΩt}.    (2.83)
This is pretty cool: when L(D) acts on a complex exponential, we get the exact same
polynomial, but with differential operators replaced by numbers.
Let us prove this identity. Start with a simple example:
(D − 2)e^{iΩt} = d(e^{iΩt})/dt − 2e^{iΩt} = (iΩ − 2)e^{iΩt}.
Notice how the thingy on the left has the same algebraic structure as the differential
operator we started with: we simply replaced D − 2 with iΩ − 2. From this I think the
proof is pretty easy, right? I mean, we simply act with L(D) on eiΩt . Each time a D hits
the exponential, we get iΩ times the exponential again. So we are essentially replacing
D’s with iΩ’s everywhere.
Consider now the ODE
L(D)y = f_0 e^{iΩt}.
Use the ansatz y(t) = A e^{iΩt}, where A is a constant. We then get
L(D)[A e^{iΩt}] = A L(iΩ) e^{iΩt} = f_0 e^{iΩt}.
The ansatz will thus be a valid particular solution, provided the constant A is chosen as
A = f_0 / L(iΩ).    (2.84)
If it happens that L(iΩ) = 0, then this of course won’t work. In this case, the particular
solutions will usually have the form P(t)eiΩt , where P(t) is a polynomial in t. I will
leave this as an exercise for you to check. Try, for instance, L = (D − iΩ)(D − a) or
L = (D − iΩ)(D − iΩ).
For our oscillator, L(D) = D² + 2γD + ω², so L(iΩ) = ω² − Ω² + 2iγΩ and the particular
solution reads
y_p(t) = f_0 e^{iΩt} / (ω² − Ω² + 2iγΩ).    (2.87)
If we want the general solution, then we must still add to y_p the homogeneous
solutions y1 and y2 , as in Eq. (2.82). However, as we discussed in the previous section,
these solutions always have an exponential envelope and therefore decay in time (the
transient). The particular solution (2.87), on the other hand, is periodic and will thus
oscillate indefinitely (the steady-state). Thus, if we are only interested in the long-time
behavior of the system, all we need is Eq. (2.87).
The solution (2.87) is also complex and we may wish to take its real or imaginary
parts. The rationale is exactly the same as in Sec. 2.6: the ODE is linear with real coef-
ficients, so the real part of (2.87) will give us a particular solution for f (t) = f0 cos Ωt
and the imaginary part will give us a solution for f (t) = f0 sin Ωt. In this sense, it is
convenient to write
1/(ω² − Ω² + 2iγΩ) = e^{−iφ}/√((ω² − Ω²)² + 4γ²Ω²),    tan φ = 2γΩ/(ω² − Ω²).    (2.88)
Taking the real or imaginary parts is now trivial: simply replace the complex exponen-
tial by cosine or sine.
In particular, for f(t) = f_0 cos Ωt,
y_p(t) = f_0 cos(Ωt − φ) / √((ω² − Ω²)² + 4γ²Ω²).    (2.89)
The particular solution (2.89) shows that, in the steady-state, the system will oscil-
late with the same frequency Ω as the external force. The amplitude of the oscillations
is proportional to f0 (so big drives generate big responses), but modulated by a factor
that depends on Ω, ω and γ. This amplitude is illustrated in Fig. 2.4(a) as a function of
Ω/ω. Unlike what we had in first order systems [c.f. Eq. (2.47)], we see in this case the
possibility of a resonance effect: By tuning Ω we can either suppress or enhance the
amplitude. In particular, if the damping γ is very small, we can dramatically enhance
the response by tuning Ω exactly at the natural frequency ω. The magnitude of the
response depends on the damping constant γ, so the resonance is stronger for lower
dissipation.
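A minimal numerical sketch (mine, in Python) of the amplitude in Eq. (2.89) as a function of Ω/ω, which generates resonance curves like those in Fig. 2.4(a); the damping values are arbitrary:

import numpy as np

omega, f0 = 1.0, 1.0

def amplitude(Om, gamma):
    # |y_p| = f0 / sqrt((omega^2 - Om^2)^2 + 4 gamma^2 Om^2), from Eq. (2.89)
    return f0/np.sqrt((omega**2 - Om**2)**2 + 4*gamma**2*Om**2)

Om = np.linspace(0.01, 3.0, 500)
for gamma in (0.1, 0.3, 1.0):                 # arbitrary damping values
    A = amplitude(Om, gamma)
    print(gamma, Om[np.argmax(A)], A.max())   # peak position approaches omega for small gamma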
Now take a second to imagine all this from the perspective of an experimentalist. This
is extremely valuable information. You sweep through different values of Ω, until you
encounter a peak. The position of the peak indicates the natural oscillation frequency
of the system. And the height/width of the peak tells you about the magnitude of the
damping (an experimentalist might call this the quality factor, instead of damping).
Here we focused on a single harmonic oscillator, but more complicated systems tend to
behave in a similar way. The only difference is that they may have multiple resonances
and thence multiple peaks.
Fig. 2.4 also shows the phase lag φ in (b), as well as different example dynamics in
(c). In this case, the behavior is similar to that of 1st order systems, where the system
tries to follow f (t) around, but is always lagging behind.
Figure 2.4: (a) Amplitude and (b) phase φ [Eq. (2.89)], plotted as a function of Ω/ω, for differ-
ent values of γ/ω. (c) The system’s response ωy p (t)/ f0 as a function of Ωt for the
same values of γ/ω as in (a) and (b), with fixed Ω/ω = 0.8. The dotted black curve
is simply the external force, cos(Ωt).
The new and exciting feature here, as compared to the first order systems we studied in
Sec. 2.6, is the prospect of resonant effects. As we just saw, resonance occurs when the
driving frequency Ω matches the natural frequency ω. But now f (t) contains a super-
position of many driving frequencies Ω, 2Ω, 3Ω etc. This opens up more possibilities.
We can now have a resonance when any of these subharmonics gets very close to the
system’s natural frequency.
You may have heard about the Angers Bridge in France, in 1850, which collapsed
when a battalion was marching through it. Marching generates a periodic drive. It is
very far from a single harmonic drive, but it is still periodic. In fact, marching looks
somewhat like a narrow boxcar, such as that in Fig. 2.5. Let us take, for concreteness,
a boxcar of height f0 , frequency Ω and duration a:
The Fourier coefficients follow from the usual recipe, with the period (what we have
been calling T in this chapter) being 2π/Ω:
c_n = (1/T) ∫_{−T/2}^{T/2} f(t) e^{−inΩt} dt = (f_0/T) ∫_{−a}^{a} e^{−inΩt} dt = (f_0/T)(2/(nΩ)) sin(nΩa).
This result is very nice. The soldier’s march is modeled by three ingredients: f0 , a
and Ω. The first is the overall amplitude. And, as one would intuitively expect, larger
inputs f0 lead to larger outputs y p . Then there is a and Ω. The latter reflects the
overall periodicity of the steps. It describes at which frequency the input repeats itself.
Conversely, a describes the duration of each kick. That is, the amount of time the
soldier’s boots apply a force on the ground. It is thus something one has much less
control over.
But, as can be seen, the possibility of having resonances depends on Ω, not a.
The value of a will only influence the magnitude of the resonance, through the factor
sin(nΩa). A resonance occurs when Ωn = ω. If ω < Ω there will be no integer n that
gets close. So resonances occur when ω > Ω; that is, when the marching pace is slow,
compared to the natural oscillation frequency of the bridge. In this case, a resonance
will occur whenever ω and Ω are such that ω/Ω is close to an integer. If γ is small,
this may cause one of the Fourier coefficients in (2.93) to become very large. And this,
in turn, can make the bridge go pluft. Still today, soldiers break stride (stop marching)
when they go on a bridge, precisely for this reason.
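A rough numerical illustration of this (my own sketch; the numbers are invented, not real bridge data). For a boxcar drive, the steady-state weight of harmonic n is c_n/L(inΩ), cf. Eq. (2.84):

import numpy as np

omega, gamma = 1.0, 0.02                    # "bridge": natural frequency, small damping
Omega, a, f0 = omega/3.2, 0.3, 1.0          # "marching": pace, kick duration, strength
T = 2*np.pi/Omega

ns = np.arange(1, 60)
cn = (f0/T)*(2/(ns*Omega))*np.sin(ns*Omega*a)      # boxcar Fourier coefficients

# Steady-state weight of each harmonic: |c_n / L(i n Omega)|
weights = np.abs(cn/(omega**2 - (ns*Omega)**2 + 2j*gamma*ns*Omega))

n_res = ns[np.argmax(weights)]
print(n_res, n_res*Omega/omega)             # the dominant harmonic has n*Omega close to omega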
Imagine that you are given a specific f (t) and you solved this integral. This will give
you u(t) as some function. But since u = (D − λ− )y, we can use u(t) as a right-hand
side and solve the ODE (D − λ− )y = u, whose solution is
y(t) = c_2 e^{λ_− t} + e^{λ_− t} ∫ u(t) e^{−λ_− t} dt.    (2.95)
This gives you the general solution of (2.80), for any f (t), in terms of two integrals.
Example: consider the ODE
(D − 1)(D + 2)y = t.
Defining u = (D + 2)y, we first solve (D − 1)u = t with the integrating-factor formula:
u(t) = c_1 e^{t} + e^{t} ∫ t e^{−t} dt
     = c_1 e^{t} − e^{t} (1 + t) e^{−t}
     = c_1 e^{t} − (1 + t).
Next we use this u as the right-hand side of (D + 2)y = u, whose solution is Eq. (2.95):
y(t) = c_2 e^{−2t} + e^{−2t} ∫ u(t) e^{2t} dt
     = c_2 e^{−2t} + e^{−2t} [ c_1 e^{3t}/3 − (1/2)(t + 1/2) e^{2t} ]
     = c_2 e^{−2t} + (c_1/3) e^{t} − (1/2)(t + 1/2).
Since c_1 is a constant anyway, we may redefine c_1/3 → c_1, leading finally to the general
solution
y(t) = c_2 e^{−2t} + c_1 e^{t} − (1/2)(t + 1/2).
Chapter 3
configuration u(x, y, z). This happens in all three examples above. For instance, one
may ask what is the temperature profile of some solid that is connected to multiple heat
baths. Or what is the electrostatic potential resulting from some charge configuration.
In this sense, the most important equation, by far, is Laplace’s equation
∇²u = 0,    (3.1)
where ∇²u ≡ u_xx + u_yy + u_zz is the Laplacian.
equation. Its solutions are oscillatory, even though it is first order in time. This happens
because of the complex factor in front and is something which has major consequences
for the description of quantum phenomena, as we will discuss.
We look for product solutions of the form
u(r, t) = R(r) T(t),    (3.6)
where r = (x, y, z). That is, we look for solutions where the time part is factored from
the spatial part. As we will learn, there will usually be an infinite number of linearly
independent solutions of this form, which we can label as un (r, t) = Rn (r)T n (t), with
some index n. The general solution will definitely not have this form; otherwise time
and space would evolve independently and there would be no fun at all. But these
product solutions can be used as building blocks to construct more general solutions.
This is possible because the equation is linear, so the sum of two solutions is also a
solution. That is to say, the general solution will usually have the form
X
u(r, t) = cn Rn (r)T n (t), (3.7)
n
where cn are coefficients that need to be adjusted to match the initial and boundary
conditions. This will be the basic game that we will play throughout this chapter: we
find the set of product solutions of the form (3.6) and then use them to build the general
solution (3.7) by adjusting the constants to match the initial/boundary conditions.
Wave equation
Let us see what the ansatz (3.6) brings us in the case of the wave equation (3.3).
On the left we have
(1/c²) ∂²u/∂t² = (R/c²) d²T/dt²,
since R(r) does not depend on t. Similarly, on the right we have
∇²u = T(t) ∇²R,
since T (t) does not depend on r. Rearranging a bit, we can then write Eq. (3.3) as
T̈/(c²T) = ∇²R/R.    (3.8)
Now comes the key point: We are looking for functions T and R which solve this
equation. By “solve” we mean functions which satisfy it for all values of t and all
values of r. But the left-hand side is only a function of t, by hypothesis. And the
right-hand is only a function of r. So how can it be that, as we vary t and we vary r,
which we can of course do independently, the quantity on the left remains equal to the
quantity on the right? The answer is that this can only happen if they each are equal to
a constant. That is, if
T̈/(c²T) = ∇²R/R = constant ≡ −k².    (3.9)
We label the constant as −k² for convenience. This implies no loss of generality. The
above logic is very important. So please take a second to see if you really understand.
The value of the constant does not matter right now; we will see that there are many
constants k that satisfy this, whose values are imposed by the boundary conditions.
What really matters at this point is that T̈/T and ∇²R/R must be a constant. That is,
they cannot depend on either t or r.
Eq. (3.9) therefore implies two separate equations
T̈ = −c²k²T,    (3.10)
∇²R = −k²R.    (3.11)
The equation for R is called the Helmholtz equation. Notice how it looks similar to
Laplace’s or Poisson’s equations, (3.1) or (3.2). The difference is that the right-hand
side now depends on R. We will discuss how to solve this type of equation soon.
For now, I just wanted to anticipate one result: we will find that in most problems
Eq. (3.11) only has a solution for a discrete set of real constants kn . You may appreciate
the connection with eigenvalues and eigenvectors. Eq. (3.11) has the form Ax = λx,
where instead of a matrix A, we now have a differential operator ∇² and instead of
a vector x we have a function R(r). The allowed values kn , which solve (3.11), are
therefore the eigenvalues of the Laplace operator. And the corresponding functions
Rn (r) are the eigenfunctions.
Let us then focus on the time part, Eq. (3.10). This equation is easy; it is just a
2nd order homogeneous ODE with constant coefficients. The two linearly independent
solutions are
T(t) = e^{ickt},   e^{−ickt}.    (3.12)
The solutions are thus complex exponentials, befitting of a wave equation. We could
also use sines and cosines, but I prefer to leave it like this for now. Let us denote by Rn
the solutions of (3.11). The general solution can then be written as a linear combination
of these solutions,
u(r, t) = Σ_n (a_n e^{ick_n t} + b_n e^{−ick_n t}) R_n(r).    (3.13)
Heat equation
Next let us see what happens for the heat equation (3.4). We use the same type
of ansatz as in (3.6). As a result, we get something very similar to (3.8), but with a
different time part:
Ṫ/(αT) = ∇²R/R = −k².    (3.14)
The argument for the separation of variables remain exactly the same: the left-hand
side is only a function of t and the right-hand side is only a function of r. The resulting
equation for R continues to be the Helmholtz equation (3.11). But the equation for T
now becomes
Ṫ = −αk²T.
This is a 1st order ODE with constant coefficients. There is now only one independent
solution,
T(t) = e^{−αk²t}.
Since k is real, we therefore see that the solution is a decaying exponential: Heat
dissipates, while waves propagate. The general solution will thus have the form
u(r, t) = Σ_n c_n e^{−αk_n²t} R_n(r).    (3.15)
Schrödinger equation
Finally, we consider Schrödinger’s equation (3.5). Repeating the same procedure,
we find
iℏṪ/T = −(ℏ²/2m) ∇²R/R = E.
In the case of Schrödinger’s equation we call the constant E instead of −k2 . This
is because, it will turn out, E is associated with the energy of the system (we will
understand why later on). The equation for R now reads
−(ℏ²/2m) ∇²R = ER,    (3.16)
while that for T reads
Ṫ = −i(E/ℏ)T.    (3.17)
This is a 1st order ODE, so the solution reads
T(t) = e^{−iEt/ℏ}.    (3.18)
Hence, the general solution will have the form
u(r, t) = Σ_n c_n e^{−iE_n t/ℏ} R_n(r).    (3.19)
The value of E turns out to be real, so that the solutions are seen to be complex ex-
ponentials. Thus, even though the time-derivates are first order, the solutions are still
oscillatory. This is because of the factor of i in Eq. (3.5).
Eq. (3.5) is actually only a particular case of Schrödinger’s equation. The general
equation actually reads
iℏ ∂u/∂t = −(ℏ²/2m) ∇²u + V(r)u,    (3.20)
where V(r) is an arbitrary function of r, which is called the potential energy. I will
explain the logic a bit better later on. For now I just wanted to point out that, even in
this more general case, the method of separation of variables continues to hold. But
now it yields
Ṫ = −i(E/ℏ)T,    (3.21)
−(ℏ²/2m) ∇²R + V(r)R = ER.    (3.22)
Notice how the equation for the time-part remains unchanged; the solutions continue
to be T(t) = e^{−iEt/ℏ}. The equation for R is sometimes called the time-independent
Schrödinger equation. The solutions R_n(r) will still (usually) exist only for a discrete
set of energies E_n, which will depend sensitively on the function V(r). Notwithstand-
ing, since the time-part does not change, the general solution will still be given by
Eq. (3.19), but with new functions R_n(r).
Consider some region of space Γ and let
Q(t) = ∫∫∫_Γ u(r, t) dV    (3.23)
denote the net amount of particles in that region. A continuity equation then states that
dQ/dt = −∮_{S_Γ} J · dS,    (3.24)
where J (r, t) is the current that flows through point r at time t, and the integral is over
the surface S Γ encompassing the region Γ, with dS being a surface element. Using the
divergence theorem, however, we may write
∮_{S_Γ} J · dS = ∫∫∫_Γ (∇ · J) dV,
where ∇ · J = ∂_x J_x + ∂_y J_y + ∂_z J_z is the divergence. We may combine this and (3.23)
into (3.24), leading to
∫∫∫_Γ ∂u(r, t)/∂t dV = −∫∫∫_Γ (∇ · J) dV.
And since this must be true for any region Γ, we conclude that the integrands them-
selves must be equal. That is,
∂u/∂t = −∇ · J.    (3.25)
This is the continuity equation. It relates changes in u with the divergence of a current
J through that region.
This result is nice but, in a sense, it doesn’t say much because we haven’t really
defined what J is. What really determines the physics is the form of the current.
Unless we say something specific about it, there is nothing to do. This is where our
hero, Fourier, comes in. He argued that what generates a current is precisely a variation
of u. If u were constant everywhere, there would be no current. But if there is an
imbalance between u in one point and u in another, this will cause a current to flow.
This makes a lot of sense. For instance, if u is the concentration of particles, then what
generates a current is the fact that one region has more particles than others; or, if u is
the temperature, the heat will flow because one region may have a temperature higher
than the other.
According to Fourier, therefore, the current should have the form
J = −α∇u. (3.26)
That is, it should be proportional to minus the gradient of u (which is what quantifies
how steeply u changes). Here α is just a constant, which varies from material to ma-
terial, called the diffusivity. The minus sign is there because the flow tends to be from
high to low concentration. For instance, heat flows from hot to cold. For this reason,
we also have α > 0. When we are thinking in terms of heat flow, Eq. (3.26) is called
Fourier’s law of heat conduction. Conversely, if we are thinking in terms of particle
diffusion, it is called Fick’s law of diffusion.
Eq. (3.26) is the missing ingredient that makes it possible to extract useful infor-
mation from the continuity equation (3.25). Combining both results, we finally find the
diffusion equation
∂u/∂t = α∇²u,    (3.27)
where ∇²u = ∇ · (∇u). Interestingly, Fourier series were actually invented to treat the
heat equation. Fourier was interested in solving (3.27) and developed the theory behind
Fourier series as a method to solve it.
Figure 3.1: A practically 1D bar, of length L.
a thin but long metal rod, for instance, so that the temperature may change along the x
direction, but is practically constant along the y and z directions (Fig. 3.1). The bar is
assumed to have length L. In this case u(x, t) depends only on the x position, and time.
We are going to write the heat equation more compactly as
u_t = αu_xx.    (3.28)
This equation, by itself, doesn’t tell the whole story. We still need to specify the bound-
ary and initial conditions. The initial condition is specified by providing the function
u(x, 0) = u0 (x). That is, we need to know how the temperature profile looked like
initially. Only then can we actually say something about how it will evolve in time.
The boundary condition, on the other hand, is specified by saying what happens to
u at the endpoints, x = 0 and x = L. Can heat flow from the endpoints? Or maybe, are
the end-points in contact with some heat bath kept at some temperature? This actually
defines two types of boundary conditions:
• Dirichlet boundary condition: we fix the temperature at the boundaries, at all
times: u(0, t) = T 1 and u(L, t) = T 2 , for some temperatures T 1 and T 2 . This is the
case, for instance, if the endpoints are connected to thermal baths (like a flame
or a big bucket of water).
• Neumann boundary condition: we fix a constant heat flux at the boundaries.
Recall from Eq. (3.26) that the heat flux J is essentially the gradient of u. In 1D
this reduces to J = −α ∂_x u. Hence, fixing the heat flux is the same as fixing ∂_x u.
For instance, we could set J(0, t) = J1 and J(L, t) = J2 . A particularly common
choice is when we set J = 0 (and hence u_x = 0); this describes insulating walls.
That is, we block any heat flow to the outside world.
These boundary conditions are not the only ones; they are just two very common
choices that appear frequently. We could also have mixed boundary conditions, such
as Dirichlet on the left and Neumann on the right. Or we can also mix them at the same
boundary. Something like,
au(0, t) + bJ(0, t) = c, (3.29)
for constants a, b, c. This is called a Robin, or “radiation” boundary condition. It
has this name because, if we relabel c = T a, for some parameter T , we can write it as
bJ(0, t) = a(T − u(0, t)). (3.30)
The idea is that T is some kind of temperature of the surroundings, so this equation
specifies that the heat flux at the boundary is proportional to the temperature difference
between T and u(0, t), a hypothesis known as Newton’s law of cooling. Robin bound-
ary conditions can also be solved by similar methods, but turn out to be mathematically
more complicated. We will therefore focus on Dirichlet and Neumann in this course.
Eigenvalues and eigenfunctions
We can summarize the above results by writing the following boundary prob-
lem:
X'' = −k²X,    X(0) = X(L) = 0.    (3.34)
This is the 1D version of the Helmholtz equation (3.11), with Dirichlet bound-
ary conditions. What we have just learned, therefore, is that the allowed solu-
tions are Xn (x) = sin(kn x), where kn = nπ/L and n = 1, 2, 3, . . .. The kn are the
eigenvalues and Xn are called the eigenfunctions.
You should now see a Fourier series starting to take shape. Combining X(x) with
the time part, T(t) = e^{−αk²t}, all functions of the form
u_n(x, t) = e^{−αk_n²t} sin(k_n x),   k_n = πn/L,   n = 1, 2, 3, ...    (3.35)
will be solutions of our equation. Since the PDE is linear, linear combinations of them
will also be solutions. Therefore, the general solution of u_t = αu_xx, with Dirichlet
boundary conditions u(0, t) = u(L, t) = 0, will be
u(x, t) = Σ_{n=1}^{∞} b_n e^{−αk_n²t} sin(k_n x),   k_n = πn/L,   n = 1, 2, 3, ...
for constants bn , which are determined from the initial condition u(x, 0) = u0 (x).
We find the bn by setting t = 0, leading to
Σ_{n=1}^{∞} b_n sin(k_n x) = u_0(x).    (3.36)
The b_n can be extracted using the orthogonality of the sines in [0, L],
∫_0^L sin(k_n x) sin(k_m x) dx = (L/2) δ_{nm}.    (3.37)
I know these details seem a bit confusing. To be honest, it is easier to simply check
that this is the correct formula by making a couple of tests in Mathematica. We now
use (3.37) in (3.36): we multiply both sides by sin(k_m x) and integrate from 0 to L. As
a result, we find
b_n = (2/L) ∫_0^L u_0(x) sin(k_n x) dx.    (3.38)
Looking at this, however, we realize it is exactly the Fourier sine series of chapter 1.
In fact, so we are all on the same page, let us explore this connection in more depth.
Consider the Fourier coefficients of a function f(x), periodic in [−L/2, L/2], defined in
Eq. (1.15):
b_n = (2/L) ∫_{−L/2}^{L/2} f(x) sin(2πnx/L) dx.
Now shift the period to be 2L instead of L. We can get that by simply replacing every
L we see by 2L:
b_n = (1/L) ∫_{−L}^{L} f(x) sin(πnx/L) dx.
This is already starting to look like (3.38). Finally, let us split f (x) into an even and
an odd part, f (x) = fe (x) + fo (x), where fe (−x) = fe (x) and fo (−x) = − fo (x). We can
always do this for any function.1 Then it follows that
∫_{−L}^{L} f_e(x) sin(k_n x) dx = 0,
since sine is odd and fe (x) is even. Thus, only the odd part of the function contributes
to its Fourier series,
b_n = (1/L) ∫_{−L}^{L} f_o(x) sin(πnx/L) dx.
Finally, since the integrand is now even (because it is the product of two odd functions),
we can write the integration to be only from 0 to L, and multiply the result by 2. We
then finally arrive at
b_n = (2/L) ∫_0^L f_o(x) sin(πnx/L) dx,
which is exactly (3.38). Thus, we conclude that the coefficients bn are nothing but the
Fourier sine series of u0 (x). The reason why the result depends only on the sine series,
and not the cosine, is because of the boundary conditions: we wanted u(0, t) = 0, so a
cosine would never work.
Figure 3.2: Dashed-black: The function u0 (x) = x(x2 − 3Lx + 2L2 ), with L = 1. Colors: solution
u(x, t) for different times.
To summarize: the solution of the heat equation with Dirichlet boundary conditions
u(0, t) = u(L, t) = 0 and initial condition u(x, 0) = u_0(x) is
u(x, t) = Σ_{n=1}^{∞} b_n e^{−αk_n²t} sin(k_n x),   k_n = πn/L,    (3.39)
where n = 1, 2, 3, ... and
b_n = (2/L) ∫_0^L u_0(x) sin(k_n x) dx.    (3.40)
Example: Suppose
u_0(x) = T_0 x(x² − 3Lx + 2L²),    (3.41)
which is illustrated by the dashed curve in Fig. 3.2. This mimics something that is hot
in the middle and cold at the boundaries. The Fourier coefficients (3.40), as you may
quickly verify with the help of Mathematica, are
b_n = 12T_0 L³/(π³n³) = 12T_0/k_n³.
Thus, the general solution will be
u(x, t) = 12T_0 Σ_{n=1}^{∞} e^{−αk_n²t} sin(k_n x)/k_n³,   k_n = πn/L.    (3.42)
This is illustrated for different times in Fig. 3.2 and as a pretty contour plot in Fig. 3.3.
As can be seen in both figures, due to the negative exponentials in Eq. (3.39), the
solution tends to zero in the long time limit: u(x, t) → 0 when t → ∞. This makes
sense since the boundaries are kept at zero temperature. Thus, the initial temperature
concentration at the middle simply flows out to the boundaries and eventually dies out.
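A compact numerical sketch (mine, in Python) that evaluates the series (3.42) and reproduces the curves of Figs. 3.2 and 3.3:

import numpy as np

L, T0, alpha = 1.0, 1.0, 1.0
x = np.linspace(0, L, 200)

def u(x, t, nmax=200):
    # Dirichlet solution (3.42): u = 12 T0 sum_n exp(-alpha k_n^2 t) sin(k_n x)/k_n^3
    n = np.arange(1, nmax + 1)
    kn = np.pi*n/L
    terms = np.exp(-alpha*kn**2*t)[None, :]*np.sin(np.outer(x, kn))/kn**3
    return 12*T0*terms.sum(axis=1)

for t in (0.0, 0.01, 0.05, 0.1):
    profile = u(x, t)      # at t = 0 this reproduces u0(x) = T0 x (x^2 - 3 L x + 2 L^2)
    # plot(x, profile) to obtain curves like those of Fig. 3.2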
Example: Boxcar Suppose that the initial temperature profile is a boxcar function
somewhere in the middle of [0, L]:
Figure 3.3: Same as Fig. 3.2, but as a contour plot in (x, t).
Figure 3.4: Evolution of u(x, t) for u0 (x) being the boxcar function (3.43).
This represents a boxcar of width 2 centered around L/2. The result is presented in
Fig. 3.4. As can be seen, the initially hot part in the middle sort of dissipates in time
and eventually the entire bar reaches zero temperature when t → ∞.
Mismatch between boundary and initial conditions: Suppose the temperature pro-
file is initially given by u0 (x) = x. At x = 0 this will match the boundary condition
u(0, t) = 0, which is supposed to hold for all t. But at x = L the two will never match
because u0 (L) = L, while we are supposed to have u(L, t) = 0. What happens then?
The short answer is that this is an ill-posed problem because it implies that u(L, t) has
to change discontinuously from the value u(L, 0) = L to u(L, t) = 0 for any t > 0. You
are thus trying to solve for something that is unsolvable. The longer answer is that,
surprisingly, the heat equation kind of “adapts” to this situation.
To see what happens, let us look at what the Fourier series of the initial condition
is doing. The Fourier coefficients b_n in Eq. (3.40) become b_n = −2(−1)^n/k_n. Let us
then look at the resulting series at t = 0,
Σ_{n=1}^{∞} [−2(−1)^n/k_n] sin(k_n x).    (3.44)
Figure 3.5: The Fourier series (3.44), with L = 1, which is trying to simulate a straight line
u0 (x) = x.
This was supposed to be the initial condition u0 (x). But is it? After all, the bn are
chosen precisely so that u(x, 0) = u_0(x) [Eq. (3.39)]. The result for this function, with
the sum involving a finite number of terms, is shown in Fig. 3.5. The series is
trying to approximate the straight line u0 (x) = x. And it does a fairly good job at that,
except at x ∼ L.
At x ∼ L the function oscillates violently. This is the Gibbs phenomenon discussed
in Sec. 1.4. And, what is most important, the series always tends to zero when x → L.
This is seen in the smaller panels of Fig. 3.5, where I make a zoom around this region
and plot the result for different maximum values nmax . As can be seen, the function
wiggles around 1, but when x → L it always eventually falls down and touches zero.
It must do that because Eq. (3.44) is a sum of functions sin(kn x), which are identically
zero at x = L.
The moral of the story is that the Fourier expansion (3.44) cannot describe a func-
tion such as u0 (x) = x. If we wish to do that, we would need a Fourier series involving
sines and cosines. Then why not use cosines? Well... because cosines do not satisfy
the boundary conditions. Cosines fix the initial conditions, but break the boundary
conditions. This is the incompatibility I was talking about: it is impossible to solve the
Dirichlet problem if the initial conditions are not consistent with the boundary condi-
tions.
Well, what if we do it anyway? That is, what if we plug the bn found in (3.44)
into our general solution (3.39)? The result is shown in Fig. 3.6. As can be seen, for
any t > 0 the solution picks up the boundary conditions perfectly and the evolution is
smooth. To summarize, therefore, strictly speaking, it makes no sense to try to solve a
problem where the boundary and initial conditions do not match. That would require
a discontinuity in u(x, t). But still, if you are stubborn and try to do that anyway, you
Figure 3.6: Solution of the Dirichlet heat equation with initial condition u_0(x) = x. For any
t > 0, the function satisfies the boundary conditions.
will still find an answer. The heat equation tends to smooth things out and produce, for
all t > 0, a solution which is smooth and well behaved.
u(0, t) = T 1 , u(L, t) = T 2 .
Instead of looking at the time-dependence, let us first analyze the steady-state. That is,
the solutions u ss after a long-time has passed. In this case ∂u ss /∂t = 0 and so Eq. (3.28)
reduces to
d²u_ss/dx² = 0.    (3.45)
This is a 2nd order linear ODE. The solution is trivial:
u_ss = c_0 + c_1 x,
and imposing the boundary conditions,
u_ss(0) = c_0 = T_1,
u_ss(L) = c_0 + c_1 L = T_2,
gives u_ss(x) = T_1 + (T_2 − T_1) x/L. The corresponding steady-state heat flux (3.26) is
J_ss = −α du_ss/dx = −α(T_2 − T_1)/L.
Suppose T 1 > T 2 (left side hot). Then T 2 − T 1 < 0 and thence J ss > 0. That is, heat
flows from hot to cold. This is a manifestation of the 2nd law of thermodynamics.
Great. Now let’s go back to the full time-dependent problem. We already know
the steady-state solution. To find the full time-dependent solution, we use the magic of
linearity: the function uss (x) solves (3.28). And so does u(x, t) in Eq. (3.39). Hence,
their sum must also be a solution. The general solution will thus be
u(x, t) = u_ss(x) + Σ_{n=1}^∞ b_n e^{−α k_n² t} sin(k_n x). (3.48)
This also matches the boundary conditions. The values of the coefficients b_n are again determined from the initial condition, u(x, 0) = u_0(x). Setting t = 0 yields
u_ss(x) + Σ_{n=1}^∞ b_n sin(k_n x) = u_0(x).
Hence, by the usual Fourier recipe,
b_n = (2/L) ∫_0^L [u_0(x) − u_ss(x)] sin(k_n x) dx.
In summary, the solution of the problem
u_t = α u_xx, (3.49)
u(0, t) = T_1, u(L, t) = T_2,
u(x, 0) = u_0(x),
is
u(x, t) = u_ss(x) + Σ_{n=1}^∞ b_n e^{−α k_n² t} sin(k_n x), u_ss(x) = T_1 + (T_2 − T_1) x/L,
where
b_n = (2/L) ∫_0^L [u_0(x) − u_ss(x)] sin(k_n x) dx. (3.52)
As you can see, adding non-zero temperatures at the boundaries does not introduce
many changes from a mathematical point of view (although, of course, it completely
changes the physics). For this reason, most textbooks focus on T 1 = T 2 = 0.
As an example, consider the problem (with L = 1)
u_t = α u_xx, (3.53)
u(0, t) = 2, u(1, t) = 1,
u(x, 0) = 2 − x²,
whose steady-state, following the recipe above, is u_ss(x) = 2 − x. This result is plotted in Fig. 3.7. As can be seen, the initial parabolic temperature profile is slowly damped until it becomes the straight line u_ss(x) = 2 − x.
Let us now consider the heat equation with Neumann boundary conditions:
u_t = α u_xx, (3.55)
u_x(0, t) = 0, u_x(L, t) = 0,
u(x, 0) = u_0(x).
The difference here is in the second line, where we are fixing u x = ∂u/∂x, instead of u.
Since we set both u x (0, t) = u x (L, t) = 0, we are insulating the system, so that no heat
can flow to the outside.
To find the solution, we go back to the separation of variables. The basic solution (3.31) continues to be valid, with a, b and especially k still to be determined. We
Figure 3.7: u(x, t) from Eq. (3.54), describing the heat equation with Dirichlet boundary condi-
tions u(0, t) = 2 and u(1, t) = 1. Left: as a function of x, for different t; Right: as a
density plot in the (x, t) plane.
now try to use it to impose the boundary conditions. Differentiating with respect to x,
we get
u_x(x, t) = e^{−αk² t} (−a k sin kx + b k cos kx).
Imposing u x (0, t) = 0 yields b = 0 so the solutions will be of the form cos kx. Next,
imposing u x (L, t) = 0 implies that
ak sin kL = 0.
Once again, this will be satisfied when k = nπ/L, with n = 1, 2, 3, . . .. Now, however,
there is one extra possibility: we can also have k = 0. When the solutions were sines,
k = 0 was not important since sin(0) = 0. But now that the solutions are cosines, k = 0
is meaningful. Thus, in this case, the set of allowed values of k becomes
k_n = nπ/L, n = 0, 1, 2, 3, ....
The general solution will thus be a linear combination of the form
u(x, t) = Σ_{n=0}^∞ a_n e^{−α k_n² t} cos(k_n x).
Finally, we fix an from the initial conditions u(x, 0) = u0 (x). Again, we have exactly
the Fourier recipe:
a_n = (2/L) ∫_0^L u_0(x) cos(k_n x) dx,
which holds also for n = 0 (which is why we parametrize the coefficient as a0 /2).
To summarize, the solution of the Neumann problem
u_t = α u_xx, (3.56)
u_x(0, t) = 0, u_x(L, t) = 0,
u(x, 0) = u_0(x),
is
u(x, t) = a_0/2 + Σ_{n=1}^∞ a_n e^{−α k_n² t} cos(k_n x), (3.57)
where
a_n = (2/L) ∫_0^L u_0(x) cos(k_n x) dx. (3.58)
Figure 3.8: u(x, t) for the Neumann boundary condition.
For t → ∞ all terms with n ≥ 1 are exponentially damped, so u(x, t) → a_0/2 = (1/L) ∫_0^L u_0(x) dx, meaning the bar will tend to have a homogeneous temperature profile, given by the
average of the initial temperature distribution u0 (x). This happens because we insulated
the end-points, so that no heat can flow to the outside world. What the heat equation
predicts, therefore, is that the initial temperature profile u0 (x) will simply be distributed
uniformly through the bar.
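A quick way to play with the Neumann solution (3.57)–(3.58) is to compute the coefficients a_n by numerical integration and sum the series. Below is a minimal Python/SciPy sketch; the Gaussian-bump initial condition is my own choice, not from the text:

```python
import numpy as np
from scipy.integrate import quad

alpha, L = 1.0, 1.0
u0 = lambda x: np.exp(-20 * (x - 0.3) ** 2)   # arbitrary initial profile

def a_coef(n):
    """a_n = (2/L) * integral of u0(x) cos(k_n x), with k_n = n*pi/L."""
    kn = n * np.pi / L
    return (2 / L) * quad(lambda x: u0(x) * np.cos(kn * x), 0, L, limit=200)[0]

def u(x, t, nmax=50):
    """Partial sum of Eq. (3.57)."""
    res = a_coef(0) / 2 * np.ones_like(x)
    for n in range(1, nmax + 1):
        kn = n * np.pi / L
        res += a_coef(n) * np.exp(-alpha * kn**2 * t) * np.cos(kn * x)
    return res

x = np.linspace(0, L, 200)
print(u(x, 2.0).std())   # nearly zero: the profile flattens to the average of u0
```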
Conserved quantity
Let
Q(t) = ∫_0^L u(x, t) dx. (3.60)
In the case of particle diffusion, this represents the total number of particles in the
region [0, L]. In the case of heat, it represents the integrated temperature along the bar.
Now we start from the heat equation ut = αu xx and integrate from 0 to L:
∫_0^L u_t(x, t) dx = α ∫_0^L u_xx(x, t) dx.
The right-hand side is easy: α ∫_0^L u_xx(x, t) dx = α[u_x(L, t) − u_x(0, t)]. What I did here was to “remember” that u_xx is the derivative of u_x, and so the integral of a derivative must be the function evaluated at the end points. But now we can relate this to the heat current J(x, t) = −α u_x(x, t). We therefore see that
α ∫_0^L u_xx(x, t) dx = J(0, t) − J(L, t).
The left-hand side, on the other hand, is just dQ/dt. We therefore conclude that
dQ/dt = J(0, t) − J(L, t). (3.61)
This is a really nice result. It says that the net number of particles Q(t) in the system
can only change due to a flow of particles at the endpoints. Or that the net temperature
of the bar will only change if there is a flow of heat at the endpoints. To analyze the
consequences of this result, we look separately at the Neumann and Dirichlet cases.
Neumann BCs: In this case we explicitly set J(0, t) = J(L, t) = 0. Hence dQ/dt = 0,
so Q is a conserved quantity: Q(t) = Q(0) for all times.
We actually saw this in Fig. 3.8: the initial temperature profile simply spread out
through the system and eventually became uniform. What was not obvious from the
example was that this was indeed a conservation law. That is, that there was indeed a
certain quantity Q, which for Neumann problems do not change in time.
Dirichlet BCs: In this case J(0, t) and J(L, t) are not zero and so Q(t) is not a con-
served quantity. This is because we are keeping the end-points at a fixed temperature,
which can only be done if heat flows from the outside world. Let us take the exam-
ple analyzed in Fig. 3.7. The quantities Q(t), dQ/dt, J(0, t) and J(1, t) are plotted in
Fig. 3.9. What we see is that initially Q(t) changes with time. This is because we start
with a certain number of particles, Q(0) = ∫_0^L u_0(x) dx, which is allowed to change in
time. In this particular case, it decreases, but this is merely due to the choice of ini-
tial conditions. Notwithstanding, what actually matters is that as time progresses, Q(t)
tends to a constant, in general different from zero. Consequently, dQ/dt → 0. But hav-
ing dQ/dt = 0 does not mean that the currents J ss (0) and J ss (1) are themselves zero.
What it means, according to Eq. (3.61), is that they should be equal: J ss (0) = J ss (1).
This is in fact what we see in the left-most plot of Fig. 3.9.
In the Dirichlet case we must therefore distinguish between the transient and the
steady-state. During the transient Q(t) changes and so all 3 terms in Eq. (3.61) will
be non-zero. But in the steady-state the system will no longer change, so dQ/dt = 0,
meaning that J ss (0) = J ss (L). This type of state is called a non-equilibrium steady-
state (NESS). It is a steady-state because stuff is no longer changing. But it is not in
Figure 3.9: The quantities Q(t), dQ/dt, J(0, t) and J(1, t) for the same example as in Fig. 3.7.
equilibrium because there are still currents flowing. Conversely, if we happen to have
J ss (0) = J ss (L) ≡ 0, then we say the system is in an equilibrium steady-state (or simply
“equilibrium state” for short). This happens, for instance, if T 1 = T 2 .
Let us now go back to the Dirichlet problem,
u_t = α u_xx, (3.62)
u(0, t) = 0, u(L, t) = 0,
u(x, 0) = u_0(x),
or the Neumann problem,
u_t = α u_xx,
u_x(0, t) = 0, u_x(L, t) = 0,
u(x, 0) = u_0(x).
We are going to show that the solution u(x, t), in both cases, is unique. To do that, we
introduce the notion of “energy”. This is a concept that appears in many PDEs. In
the wave and Schrödinger equations, energy will actually be an “energy”, with a clear
physical interpretation. In the heat equation, not so much: energy for us will only be a
quantity with convenient properties.
Start with ut = αu xx , multiply both sides by u(x, t) and integrate from 0 to L:
∫_0^L u u_t dx = α ∫_0^L u u_xx dx.
Thus,
∫_0^L u u_t dx = (d/dt) ∫_0^L (u²/2) dx.
The term on the right, on the other hand, can be integrated by parts:
∫_0^L u u_xx dx = [u u_x]_0^L − ∫_0^L u_x² dx.
The boundary term vanishes, in the Dirichlet case because u(0, t) = u(L, t) = 0, and in the Neumann case because u_x(0, t) = u_x(L, t) = 0. Thus,
α ∫_0^L u u_xx dx = −α ∫_0^L u_x² dx.
Combining the two results then yields
(d/dt) ∫_0^L (u²/2) dx = −α ∫_0^L u_x² dx.
We now define the “energy”
E(t) = (1/2) ∫_0^L u² dx. (3.63)
The previous result then reads dE/dt = −α ∫_0^L u_x² dx ≤ 0; that is, the energy can only decrease (or stay constant) in time, so that E(t) ≤ E(0).
We now use this to show that the solution is unique. We do it for the Dirichlet prob-
lem (3.62). The reasoning for the Neumann problem is absolutely analogous. Suppose
u1 and u2 are two solutions of this problem and define v = u1 − u2 . We are going to
show that v(x, t) = 0 for all x, t, and so u1 = u2 .
By linearity, v will satisfy the Dirichlet problem
vt = αv xx , (3.66)
v(0, t) = 0 v(L, t) = 0,
v(x, 0) = 0.
Please take a second to convince yourself that this is true. The important part is the
initial condition, which now reads v(x, 0) = 0. The energy associated to v will thus be
E(t) = (1/2) ∫_0^L v(x, t)² dx ≤ E(0) = (1/2) ∫_0^L v(x, 0)² dx = 0.
But E(t), being the integral of a non-negative quantity, also satisfies E(t) ≥ 0. The only possibility is then E(t) = 0 for all t, which requires v(x, t) = 0. Hence u_1 = u_2, and the solution is unique.
I know this may seem a bit disappointing and I am pretty sure some of you may be
thinking “this guy is an idiot; he doesn’t know what he is talking about” (maybe you
were already thinking that before). But what you should keep in mind is the difference
between a mathematical blackbox and a physical blackbox. Embrace the former and
avoid the latter. I use the term blackbox here to mean a computer code that performs some operation, but whose inner workings you don't exactly know. Some of these
blackboxes may very well be open source (that is, they are not really black), but the
code inside can be so complicated, that it would take too much time to fully understand
the algorithm. So, for all intents and purposes, they behave like blackboxes.
A mathematical blackbox is a computer code that performs a well defined mathe-
matical operation. For instance, finding the roots of a polynomial, or the eigenvalues
of a matrix, or solving an ODE. Codes for this are available in all scientific computing
libraries, such as Mathematica, Scipy/Numpy, Matlab, Julia etc. Physical blackboxes,
on the other hand, solve some specific physical problem. For instance, a physical black-
box may solve Newton’s law for a bunch of particles. You input the mass and initial
coordinates, as well as the interaction potentials between the particles. The code then
integrates Newton’s 2nd law. My general advice is to avoid blackbox libraries of this
type. Build your own. The reason is simple: physical simulations always involve as-
sumptions and approximations. But physical blackboxes don’t make that clear, so you
never know exactly what you are doing. In other words, physical blackboxes make you
dumb. Mathematical blackboxes, on the other hand, are fine: they are performing well
defined operations, not prone to ambiguities.
Even though you don’t need to understand the details of a mathematical blackbox
implementation, you still need to understand the basic idea. This way you can know
when the code is performing well or not. Or when your specific problem may require
some additional ingredient. In this section we are going to discuss the basic ideas
behind the numerical solution of differential equations. PDEs are usually solved by
mapping them into ODEs. And, at the end of the day, both are solved by using discrete
versions of derivatives. So this is where we start.
Finite differences
Consider a function f (x). The derivative of f (x) is defined in a Calculus textbook
as
f′(x) = lim_{∆x→0} [f(x + ∆x) − f(x)]/∆x. (3.67)
In a computer there is no such thing as “∆x → 0”; we can choose ∆x to be small, but
it is always finite. The recipes on how to approximate derivatives, using a finite ∆x, go
by the name of finite differences.
The basic idea is to keep track of the error we make when we choose ∆x finite. We
can do that using a Taylor expansion2
f(x + ∆x) = f(x) + f′(x)∆x + (1/2) f′′(x)∆x² + (1/3!) f′′′(x)∆x³ + .... (3.68)
The naı̈ve definition of the derivative, which we call forward difference, is
∆_F[f(x)] := [f(x + ∆x) − f(x)]/∆x = f′(x) + (1/2) f′′(x)∆x. (3.69)
² Maybe you learned to do Taylor expansions around a fixed point x0:
f(x) = f(x0) + f′(x0)(x − x0) + (1/2) f′′(x0)(x − x0)² + ....
They are both the same thing. You can get (3.68) by replacing x → x + ∆x and x0 → x.
As can be seen, the error we make is proportional to ∆x. We sometimes write this using
a notation introduced by Landau:
∆_F[f(x)] = f′(x) + O(∆x). (3.70)
The “big-O notation” means the error we are making is of the order of ∆x. The
quantity in front, in this case f 00 /2, may be big or small. But when we say that the error
is of the order of ∆x, it means that if we halve ∆x, we halve the error.
We could also have defined a backward difference:
∆_B[f(x)] := [f(x) − f(x − ∆x)]/∆x. (3.71)
In the idealized world of calculus, this is entirely equivalent to (3.67) when ∆x is in-
finitesimal. But for finite ∆x it is not. To find the error, we do a Taylor expansion of
f (x − ∆x). This is exactly like Eq. (3.68), but with ∆x → −∆x:
f(x − ∆x) = f(x) − f′(x)∆x + (1/2) f′′(x)∆x² − (1/3!) f′′′(x)∆x³ + .... (3.72)
Thus, we see that
∆_B[f(x)] = f′(x) − (1/2) f′′(x)∆x.
The error is once again O(∆x). Although the prefactor differs a little, the order of the
error is the same.
With a bit of tweaking, however, one can come up with a much better differentiation
scheme, with error O(∆x)2 . This is done using centered differences:
∆_C[f(x)] = [f(x + ∆x) − f(x − ∆x)]/(2∆x). (3.73)
How do we know that this is good? We use the Taylor expansions (3.68) and (3.72).
Let me write them side by side, so it is easier for you to see what is happening:
f(x + ∆x) = f(x) + f′(x)∆x + (1/2) f′′(x)∆x² + (1/3!) f′′′(x)∆x³ + ...,
f(x − ∆x) = f(x) − f′(x)∆x + (1/2) f′′(x)∆x² − (1/3!) f′′′(x)∆x³ + ....
If we now subtract the two, the terms proportional to f′′ will cancel and we will be left with
f(x + ∆x) − f(x − ∆x) = 2 f′(x)∆x + (2/3!) f′′′(x)∆x³.
Thus, we see that the centered difference has an error
∆_C[f(x)] = f′(x) + (1/3!) f′′′(x)∆x², (3.74)
which is O(∆x)2 , one order higher than the forward and backward differences. Now if
we halve ∆x, we improve the precision by a factor of 4. This is quite dramatic.
What about f 00 (x)? The naive approach would be to apply the forward difference
twice. I will spare you the details of these calculations, as they start to get clumsy real
quick. But if you do it, either by hand or using Mathematica, you will find
∆_F[∆_F[f(x)]] = f′′(x) + f′′′(x)∆x.
The error is thus O(∆x). We can definitely do better. One alternative would be to apply
the centered difference twice:
∆_C[∆_C[f(x)]] = [f(x + 2∆x) − 2 f(x) + f(x − 2∆x)]/(4∆x²)
 = f′′(x) + (1/3) f^{(iv)}(x)∆x².
The error is O(∆x)2 , which is great. But this now introduces another detail, which we
haven’t looked at so far: This derivative does not use f (x ± ∆x), but further away points
f (x ± 2∆x). We are sort of drifting away fast from the center point x.
It turns out we can get the best of both worlds. We can get an error O(∆x)2 and
still keep the derivative compact, involving only x and x ± ∆x. The trick is to do one
forward and one backward (or vice-versa):
∆_F[∆_B[f(x)]] = ∆_F[(f(x) − f(x − ∆x))/∆x]
 = { [f(x + ∆x) − f(x)]/∆x − [f(x) − f(x − ∆x)]/∆x } / ∆x. (3.75)
Simplifying the formula, we then find
∆_F[∆_B[f(x)]] = [f(x + ∆x) − 2 f(x) + f(x − ∆x)]/∆x² (3.76)
 = f′′(x) + (f^{(iv)}(x)/12) ∆x²,
where in the last line I used the Taylor expansions of f (x ± ∆x). This is a very famous
formula for the second derivative of a function: left plus right minus twice the center.
And, as we can see, the error is O(∆x)2 and it involves only nearest neighbor points.
Finite differences
A good approximation for the first and second derivatives of a function f (x)
are
f′(x) ≈ [f(x + ∆x) − f(x − ∆x)]/(2∆x), (3.77)
f′′(x) ≈ [f(x + ∆x) − 2 f(x) + f(x − ∆x)]/∆x². (3.78)
Both have errors O(∆x)2 .
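You can check these error scalings yourself. The following sketch (my own, using f(x) = sin x as a test function) prints the errors of the forward, centered, and second-derivative formulas as ∆x is halved:

```python
import numpy as np

f   = np.sin                      # test function
df  = np.cos                      # exact first derivative
d2f = lambda x: -np.sin(x)        # exact second derivative

x0 = 0.7
for dx in (0.1, 0.05, 0.025):
    fwd = (f(x0 + dx) - f(x0)) / dx
    cen = (f(x0 + dx) - f(x0 - dx)) / (2 * dx)
    sec = (f(x0 + dx) - 2 * f(x0) + f(x0 - dx)) / dx**2
    print(dx,
          abs(fwd - df(x0)),     # shrinks roughly linearly in dx
          abs(cen - df(x0)),     # shrinks roughly quadratically in dx
          abs(sec - d2f(x0)))    # shrinks roughly quadratically in dx
```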
Systems of ODEs
Consider a 1st order ODE of the form
dy/dt = g(t, y), (3.79)
where g is an arbitrary function of both t and y (it can be arbitrarily non-linear). It
turns out that most ODEs, even those of order higher than 1, can always be written as
a system of ODEs of the form (3.79). For instance, consider the ODE
ÿ + 2γẏ + ω2 y = f (t),
that was studied in detail last chapter. Define r = ẏ. Then ÿ = ṙ and we can write this
as the system
ẏ = r,
ṙ = −2γr − ω²y + f(t),
which has exactly the form (3.79), with y now viewed as the vector (y, r). To solve (3.79) numerically, we discretize time into steps of size ∆t,
t_n = n∆t, n = 0, 1, 2, 3, ....
We also denote by yn = y(tn ), the variable y at the discrete times. Then the discrete
version of (3.79) would read
(y_{n+1} − y_n)/∆t = g(t_n, y_n).
This can be used to get an update rule, telling us how to build y_{n+1} from y_n:
y_{n+1} = y_n + g(t_n, y_n) ∆t,
which is known as Euler's method.
This method sucks, however. You should never use it. The problem is that we are
using the forward difference, which has a very low precision O(∆t). If you want a much
much better method, with almost minimal effort, use the midpoint method. Here is the
idea. Instead of looking only at times tn and tn+1 = tn + ∆t, we focus on the midpoint,
t_{n+1/2} = t_n + ∆t/2. Of course, that by itself would not give higher precision, because if we simply halve the interval, the error would be ∆t/2, which is still linear in ∆t. Instead, the idea is to use a centered difference (3.73) on this midpoint, which we
know has an error O(∆t2 ):
ẏ(t_{n+1/2}) ≈ (y_{n+1} − y_n)/∆t. (3.82)
The denominator is ∆t, instead of 2∆t, because in this case the step is ∆t/2. The
ODE (3.79) establishes that ẏ = g(t, y). Thus, the left-hand side of (3.82) must be
ẏ(t_{n+1/2}) = g(t_{n+1/2}, y_{n+1/2}), so that Eq. (3.82) gives
y_{n+1} = y_n + g(t_{n+1/2}, y_{n+1/2}) ∆t. (3.83)
This is not the end of the story, because we still need to know y(tn+1/2 ). We can
estimate it using a Taylor series expansion:
y_{n+1/2} ≈ y(t_n) + ẏ(t_n) ∆t/2 = y_n + g(t_n, y_n) ∆t/2.
This is the missing piece to finish the algorithm. Feeding this in the right-hand side
of (3.83) then leads to
y_{n+1} = y_n + g(t_{n+1/2}, y_n + g(t_n, y_n)∆t/2) ∆t.
Midpoint method
A simple but good method for solving the ODE (3.79) is the midpoint algo-
rithm, based on the following update rule:
y_{n+1} = y_n + g(t_{n+1/2}, y_n + g(t_n, y_n)∆t/2) ∆t,
which shows how to update directly from y_n to y_{n+1}.
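Here is a minimal implementation of this update rule (a sketch of my own; the damped-oscillator test problem is just an illustration, not from the text):

```python
import numpy as np

def midpoint_solve(g, y0, t0, tf, dt):
    """Integrate dy/dt = g(t, y) with the midpoint rule:
    y_{n+1} = y_n + g(t_n + dt/2, y_n + g(t_n, y_n)*dt/2) * dt."""
    ts = np.arange(t0, tf + dt, dt)
    ys = [np.asarray(y0, dtype=float)]
    for t in ts[:-1]:
        y = ys[-1]
        y_half = y + g(t, y) * dt / 2
        ys.append(y + g(t + dt / 2, y_half) * dt)
    return ts, np.array(ys)

# damped oscillator written as a first-order system: y = (position, velocity)
gamma, omega = 0.1, 2.0
g = lambda t, y: np.array([y[1], -2 * gamma * y[1] - omega**2 * y[0]])
ts, ys = midpoint_solve(g, [1.0, 0.0], 0.0, 10.0, 0.01)
print(ys[-1])
```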
The midpoint method serves to illustrate what is one of the basic principles in the
numerical solution of ODEs: To update from yn to yn+1 , we first find intermediate
points, at times t ∈ [tn , tn+1 ] where to evaluate the function (in this case the point
tn+1/2 ). The update then combines yn with function evaluations at these intermediate
points to update to yn+1 . The relative contribution of each midpoint, however, comes
with different weights, for instance in (3.84) there is a factor of 1/2 and in (3.85) there is
none. This is due to the clever use of finite differences to approximate the derivatives.
Ultimately, the factors are carefully chosen to build an algorithm whose error scales much better with ∆t than the simple Euler step.
This introduces two ideas: the algorithm’s order p and the number of stages s.
An algorithm of order p has an accumulated error that scales as O(∆t) p . Stages, on
the other hand, refer to the number of intermediate function evaluations required for
updating yn to yn+1 . The midpoint algorithm we just presented is therefore of order
p = 2 and also has s = 2 stages. It can be shown that for p = 1, 2, 3, 4, one must always have s ≥ p. That is, if you want an algorithm of order p = 4, you need at least
s = 4 stages. For p = 5, 6, 7, . . ., things become trickier and it is not known whether
the bound s = p can ever be reached. For instance, all currently known methods of
order p = 8 have at least s = 11 stages.
A very popular 4th order method is the Runge-Kutta 4 (RK4). (It would be
funny to have a 7th order method invented by scientists with initials C and R).
It reads
y_{n+1} = y_n + (k_1 + 2k_2 + 2k_3 + k_4) ∆t/6, (3.87)
where
k_1 = g(t_n, y_n), (3.88)
k_2 = g(t_{n+1/2}, y_n + k_1 ∆t/2), (3.89)
k_3 = g(t_{n+1/2}, y_n + k_2 ∆t/2), (3.90)
k_4 = g(t_{n+1}, y_n + k_3 ∆t). (3.91)
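For concreteness, here is a bare-bones RK4 stepper following the four stages above (a sketch of my own, applied to the same ODE as in Fig. 3.10):

```python
import numpy as np

def rk4_step(g, t, y, dt):
    """One RK4 step for dy/dt = g(t, y)."""
    k1 = g(t, y)
    k2 = g(t + dt / 2, y + k1 * dt / 2)
    k3 = g(t + dt / 2, y + k2 * dt / 2)
    k4 = g(t + dt, y + k3 * dt)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) * dt / 6

g = lambda t, y: y * np.cos(t + y)   # same ODE as in Fig. 3.10
t, y, dt = 0.0, 0.2, 0.05
while t < 30.0:
    y = rk4_step(g, t, y, dt)
    t += dt
print(y)
```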
For some people, RK4 is the final word in ODE solving. But modern algorithms
show that one can do better. Much better.
Figure 3.10: Red: numerical solution, using Mathematica's NDSolve, of the ODE y′ = y cos(x + y), starting at y(0) = 0.2. Black: the step size taken at each step of the numerical solution, which uses adaptive step sizes.
The first major feature of modern algorithms is the use of adaptive step sizes: in some regions the solution barely changes, while in other regions it may change extremely fast. A good algorithm should be able to adapt and
take big step sizes when nothing is happening and small step sizes when the action
really starts. This is not so trivial, because it requires some way of keeping track of
the errors that are being made. But practically all numerical libraries use adaptive step
sizes nowadays. An example is shown in Fig. 3.10. The solution is plotted in red, while
in black I show the size of the step taken at each point. As can be seen, in many regions
the step taken can be very small. And in others very big.
The second major feature of modern algorithms is “multistep”. The RK4 algo-
rithm uses the function evaluated at tn and tn+1/2 to update yn+1 . But then, in the next
step, we don’t use these points again to update from yn+1 to yn+2 . Multistep methods
recycle previous calculations. It helps to think in terms of cost; i.e., computer time.
The costly part of an ODE solver is the evaluation of g(t, y). A method with s stages
will therefore evaluate it s times for each step. Modern algorithms may very well use
s = 10, so that can be a lot. It therefore becomes essential that we recycle evaluations
of g(t, y) from previous steps in the next one. This is the idea behind multistep meth-
ods. I will not write down the explicit algorithm, as they can get a bit ugly. But if you
want, you can check them out here. There are two main multistep methods, Adams
and BDF (backward differentiation formulas, not BTS!). Both methods have varying
orders, but try to use previous steps in future calculations.
The difference between Adams and BDF is that the latter handles stiffness. This
term, although widely used in the literature, does not have a precise definition. But loosely speaking, a stiff equation is one for which very small ∆t's are essential.
What I mean is the following. In some systems, big ∆t’s simply mean low precision,
but the general shape of the solution remains the same. But in other systems, big ∆t’s
can lead to completely crazy behavior (like the function diverging, for instance), so
very small ∆t's need to be used. These systems are said to be stiff, and BDF methods are better at handling them (for somewhat technical reasons that are better not to get
into). Mathematica and Scipy/Numpy use Adams and BDF side by side. Adams is
faster, but BDF is better for stiff systems. Most solvers are in fact smart: they can
detect stiffness and automatically switch from Adams to BDF.
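In Scipy, for instance, the choice of method is just an argument of solve_ivp. The toy stiff equation below is my own choice, meant only to illustrate how a stiff-aware method gets away with far fewer right-hand-side evaluations:

```python
import numpy as np
from scipy.integrate import solve_ivp

# a classic stiff linear test problem (my own toy choice)
g = lambda t, y: -1000.0 * (y - np.cos(t)) - np.sin(t)

for method in ("RK45", "LSODA", "BDF"):
    sol = solve_ivp(g, (0, 2), [0.0], method=method, rtol=1e-6, atol=1e-9)
    # nfev = number of evaluations of g; stiff-aware methods need far fewer
    print(method, sol.nfev, sol.y[0, -1])
```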
To solve PDEs, convert them into ODEs
Let us now talk about PDEs. To be concrete, I will choose the heat equation with
Dirichlet boundary condition as an example:
u_t = α u_xx, (3.92)
u(0, t) = 0, u(L, t) = 0,
u(x, 0) = u_0(x).
There are many libraries for solving PDEs of this form. They use either one of two
methods: finite differences or finite elements. Both methods convert (3.92) into a
system of coupled ODEs. Finite differences accomplish that by discretizing the deriva-
tives, using the Taylor formulas we developed earlier. Finite elements, on the other
hand, expand u in a certain basis of functions, whose properties are well known. Finite
elements is generally superior. But it is also more sophisticated and takes some effort
to implement. For this reason, I will only talk about finite differences here.
We start by discretizing space into steps of some small number ∆x:
x j = j∆x, j = 0, 1, 2, . . . , N,
where N = L/∆x is a large integer defining the number of points in the spatial grid.
Each grid position x_j will have associated to it a temperature u_j(t) := u(x_j, t).
In Eq. (3.92) we have a 2nd derivative, so it is convenient to use Eq. (3.78) to discretize
it
u_xx = (u_{j+1} − 2u_j + u_{j−1})/∆x²,
which we know is O(∆x)2 . This converts the heat equation into a system of coupled
ODEs:
u̇_j = (α/∆x²)(u_{j+1} − 2u_j + u_{j−1}), (3.93)
where I wrote u̇ j for ut (x j , t). A nice feature of this system is that the right-hand side is
linear. We therefore have in our hands a system of linear coupled ODEs, which can be
solved quite easily. Other PDEs, like the Navier-Stokes equation of fluid mechanics,
will in general be non-linear.
In principle there are N + 1 coupled equations in (3.93), for u0 , u1 , . . . , uN . But
we also need to treat the boundary conditions. They read u(0, t) = u0 (t) = 0 and
u(L, t) = uN (t) = 0 where, recall, N was defined so that L = N∆x. Thus, out of the
N +1 points x0 , x1 , . . . , xN in our grid, the two on the boundaries are trivially fixed. That
is, we have in practice only a system of N − 1 coupled ODEs. Moreover, for this same
reason, we have to be a bit careful when we use Eq. (3.93) with j = 1 or j = N − 1. In
the former, since u0 = 0, we should have
u̇_1 = (α/∆x²)(u_2 − 2u_1),
Figure 3.11: Numerical solution of the Dirichlet problem (3.92) with u0 (x) = x(1 − x), for t =
0.05, 0.1 and 0.3. Left: The solid lines were obtained using Mathematica’s built-in
NDSolve, while the dots refer to the discretized system of ODEs (3.93), with the
interval [0, 1] discretized with N = 20 points. Right: the difference between the
two solutions.
and for j = N − 1,
u̇_{N−1} = (α/∆x²)(−2u_{N−1} + u_{N−2}).
To be compact, we usually write only Eq. (3.93), but with the additional caveat that
u0 = uN = 0.
Now that we have converted this into a system of ODEs, we can use the algorithms
discussed previously to solve them. In libraries such as Mathematica, a PDE such as
the heat equation can be solved automatically using NDSolve[]. That is, you don’t
need to do this type of discretization. Mathematica will do it for you, under the hood.
But knowing that this is what is happening is useful. For instance, particularly in higher
dimensions, handling less conventional boundary conditions can be hard. That is, if
you want to solve for instance the heat equation for a bar that does not have a simple
shape.
An example of how all this works is shown in Fig. 3.11. See also the accom-
panying Mathematica notebook. What I did was to solve numerically the Dirichlet
problem (3.92) with u0 (x) = x(1 − x). The solid lines are obtained using NDSolve[]
directly. The points, on the other hand, refer to the solution of the discretized sys-
tem (3.93), with N = 20 points. As can be seen, even such a small discretization grid, of
N = 20, is already enough to produce a decent answer. The error between the two is
shown in the right-hand plot and is around 10−4 for initial times.
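If you want to reproduce this kind of calculation without Mathematica, here is a sketch of the same discretization (3.93) in Python, using scipy's solve_ivp as the ODE blackbox (N = 20 and u0(x) = x(1 − x), as in the example):

```python
import numpy as np
from scipy.integrate import solve_ivp

alpha, L, N = 1.0, 1.0, 20
dx = L / N
x_in = np.linspace(dx, L - dx, N - 1)        # interior points x_1 ... x_{N-1}

def rhs(t, u):
    """Right-hand side of Eq. (3.93), with u_0 = u_N = 0 at the boundaries."""
    full = np.concatenate(([0.0], u, [0.0]))
    return alpha * (full[2:] - 2 * full[1:-1] + full[:-2]) / dx**2

u0 = x_in * (1 - x_in)                       # initial condition u0(x) = x(1 - x)
sol = solve_ivp(rhs, (0, 0.3), u0, t_eval=[0.05, 0.1, 0.3])
print(sol.y[:, -1])                          # temperature profile at t = 0.3
```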
Let us now go back to the heat equation, this time written in arbitrary spatial dimension,
∂u/∂t = α ∇²u.
As we have seen in the previous section, the solutions of this equation tend to relax,
in the long-time limit, to a steady-state, which is time-independent. This steady-state
must therefore satisfy ∂u/∂t = 0. Or, what is equivalent, Laplace’s equation
∇2 u = 0, (3.94)
subject to the same boundary conditions as the time-dependent problem (but no ini-
tial conditions to worry about). Laplace’s equation therefore describes the steady-state
temperature profile of a material, in any dimension. It also appears in electromag-
netism, with u being now the electrostatic potential.
We already solved Laplace’s equation in 1D and the result was not a lot of fun. In
that case the equation reduced to u xx = 0, so the solution had to be of the form
u(x) = c1 + c2 x,
for constants c1 , c2 . In 2D and 3D, on the other hand, the set of allowed solutions turns
out to be much richer. In this section we are going to study Laplace’s equation in 2D:
∇2 u = u xx + uyy = 0. (3.95)
The first thing to consider in this case, is that the geometry of the system becomes
important. We are going to focus on rectangular systems. That is, some piece of
material with sides L x and Ly , as depicted in Fig. 3.12. This choice really makes a
difference. We could’ve, for instance, solved it for a disk, or some other irregular
shape. The mathematical structure of the solutions would be completely different.
Once the geometry is fixed, the next step is to specify the boundary conditions. For
the rectangle in Fig. 3.12, we have 4 boundaries to specify: bottom, top, left, right. And
the BCs can be either Dirichlet or Neumann. Dirichlet conditions, as before, specify
the value of the function at the boundaries. Something like
u(x, 0) = f_0(x), u(x, L_y) = f_1(x), u(0, y) = g_0(y), u(L_x, y) = g_1(y).
The temperature does not have to be constant at the boundaries. Thus, at the bottom and
top walls, the BC can be a function of x, and at the left and right walls, it can be a
function of y.
We can also work with Neumann conditions, where instead of specifying the value
of u at the boundaries, we specify its derivative. But now things get trickier because we
have two variables, so which derivative do we specify? Recall that Neumann conditions
are meant to specify the heat flux J = −α∇u. Thus, Neumann conditions usually focus
on the component of the current that is normal to the surface. At the bottom and top
walls, the relevant normal component is uy and at the left and right walls the relevant
Figure 3.12: Rectangular geometry we are going to use to study Laplace’s equation (3.95).
To organize the problem, let u_b(x, y) denote the solution of the Dirichlet problem with
u_b(x, 0) = f_0(x), u_b(x, L_y) = 0, u_b(0, y) = u_b(L_x, y) = 0.
That is, with the bottom wall kept at f_0(x) and the other 3 walls at zero. Similarly, let u_t(x, y) denote the solution of the Dirichlet problem with the top wall being non-trivial and the other 3 kept at zero:
u_t(x, 0) = 0, u_t(x, L_y) = f_1(x), u_t(0, y) = u_t(L_x, y) = 0.
Now consider the sum of the two, u = u_b + u_t. This satisfies Laplace's equation, due to linearity. Moreover, it satisfies the Dirichlet boundary conditions
u(x, 0) = f_0(x), u(x, L_y) = f_1(x), u(0, y) = u(L_x, y) = 0.
Moral of the story: if you want to solve the Dirichlet problem with each wall kept
in a different configuration, first separately solve the problems in which one wall is
non-trivial and all others are kept at zero. Then add up the result.
Separation of variables
The starting point to solve any such problems is, once again, the method of sep-
aration of variables. That is, we look for solutions of the form u(x, y) = X(x)Y(y).
Plugging this in (3.95) then yields
X′′/X = −Y′′/Y = a, (3.104)
where a is a constant, since the left-hand side depends only on x and the right-hand
side only on y. Thus, we find two equations:
X′′ = aX, (3.105)
Y′′ = −aY. (3.106)
I am going to be quite careful about the nature of this constant a. What we know is that
a must be real, since we are interested in real solutions. But it can be either positive or
negative. The goal is to figure out precisely what values of a lead to valid solutions. The
fact that the signs of Eqs. (3.105) and√(3.106) must
√ be different has deep consequences:
√ of X = aX are either e or e− ax , while the solutions of Y 00 = −aY are
00 ax
the√ solutions
ei ay or e−i ay . Thus, the solutions will always be either real or complex exponentials,
depending on whether a itself is positive or negative: if a > 0 the X solution will be
a real exponential and the Y solution will be oscillatory. But if a < 0 then the roles
are inverted. This is the new feature which makes the 2D Laplace equation richer than
its 1D counterpart: now we can construct non-trivial solutions which oscillate in one
direction, but are damped in another. We will see exactly how this unfolds, by looking
at a specific example.
Let us then solve the problem where only the bottom wall is non-trivial:
u_xx + u_yy = 0, (3.107)
u(x, 0) = f_0(x), u(x, L_y) = 0,
u(0, y) = u(L_x, y) = 0.
We start looking at Eq. (3.105) for X(x). Since u(0, y) = u(L x , y) = 0, this equation
must be subject to X(0) = X(L_x) = 0. The X problem thus reduces to
X′′ = aX, X(0) = X(L_x) = 0.
This is the same problem found in the 1D heat equation. The solution demands some-
thing that goes up and then goes down. So it cannot be a real exponential, since ex-
ponentials are either monotonically increasing or decreasing. The solution must thus
be oscillatory. That is, a < 0. In fact, we already know from our work on the heat
equation that the solutions only exist provided
a = −(nπ/L_x)² := −k_n², n = 1, 2, 3, .... (3.110)
and they are of the form X(x) = sin(kn x).
Next we turn to Eq. (3.106) for Y(y), which is now modified to
Y′′ = k_n² Y.
Hence, the Y solutions must be real exponentials, either ekn y or e−kn y . We assume linear
combinations of the form
Y_n(y) = a_n e^{k_n y} + b_n e^{−k_n y},
where an and bn are constants that will be used to impose the remaining boundary
conditions, u(x, 0) = f_0(x) and u(x, L_y) = 0. The latter, in particular, forces a_n e^{k_n L_y} = −b_n e^{−k_n L_y}, so the solution must have the form
Y_n(y) = a_n (e^{k_n y} − e^{−k_n y} e^{2k_n L_y}).
We can make things a bit more symmetrical if we define an = An e−kn Ly /2, which we
can do, of course, since these are just constants anyway. This allows us to write the
solution as
Y_n(y) = (A_n/2)(e^{k_n(y−L_y)} − e^{−k_n(y−L_y)}) = A_n sinh(k_n(y − L_y)).
Combining everything, the general solution will then have the form
u(x, y) = Σ_{n=1}^∞ A_n sinh(k_n(L_y − y)) sin(k_n x).
The factor sinh(kn Ly ) is a bit ugly, but it is just a constant which can be absorbed into
An . It would, in fact, be more convenient if we relabel An → An / sinh(kn Ly ). That is,
write the solution as
u(x, y) = Σ_{n=1}^∞ A_n [sinh(k_n(L_y − y)) / sinh(k_n L_y)] sin(k_n x).
We have the freedom to do this because A_n is a constant. And the only reason we do it like this is because now the boundary condition at the bottom wall reads
u(x, 0) = Σ_{n=1}^∞ A_n sin(k_n x) = f_0(x),
which is nothing but a Fourier sine series for f_0(x).
Figure 3.13: Example of the solution (3.114) for u(x, 0) = x(x² − 3x + 2), with L_x = 1, L_y = 1/2.
To summarize, the solution of the Dirichlet problem
u_xx + u_yy = 0, (3.111)
u(x, 0) = f_0(x), u(x, L_y) = 0,
u(0, y) = u(L_x, y) = 0,
is
u(x, y) = Σ_{n=1}^∞ A_n [sinh(k_n(L_y − y)) / sinh(k_n L_y)] sin(k_n x), k_n = nπ/L_x, (3.114)
where
A_n = (2/L_x) ∫_0^{L_x} f_0(x) sin(k_n x) dx. (3.115)
Example: suppose f_0(x) = x(x² − 3x + 2) with L_x = 1 and L_y = 1/2. Then Eq. (3.115) yields A_n = 12/k_n³. The solution is shown in Fig. 3.13. As can be seen, since the other
3 walls (left, right, top) are all kept at 0, the temperature profile of the bottom wall is
attenuated and damped out toward the middle of the slab.
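The partial sums of (3.114) are straightforward to evaluate numerically. A minimal sketch (my own) for this particular example, with A_n = 12/k_n³, would be:

```python
import numpy as np

Lx, Ly, nmax = 1.0, 0.5, 60

def u(x, y):
    """Partial sum of Eq. (3.114) with A_n = 12/k_n^3, i.e. f0(x) = x(x^2 - 3x + 2)."""
    total = np.zeros_like(x, dtype=float)
    for n in range(1, nmax + 1):
        kn = n * np.pi / Lx
        An = 12 / kn**3
        total += An * np.sinh(kn * (Ly - y)) / np.sinh(kn * Ly) * np.sin(kn * x)
    return total

x = np.linspace(0, Lx, 101)
# the bottom-wall profile is progressively attenuated as y increases
print(u(x, 0.0)[50], u(x, 0.25)[50], u(x, Ly)[50])
```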
Semi-infinite slab
A particular case of the above solution is when the slab of material is semi-infinite.
That is, when Ly → ∞. Recall that sinh(x) = (e x − e−x )/2. When x is large, the 2nd
exponential becomes negligible and we can thus approximate sinh(x) ' e x /2. Hence,
in the limit where L_y is large, the hyperbolic sines in Eq. (3.114) become
sinh(k_n(L_y − y)) / sinh(k_n L_y) ≈ e^{k_n(L_y − y)} / e^{k_n L_y} = e^{−k_n y}.
Thus, in this case the general solution (3.114) is simplified to
u(x, y) = Σ_{n=1}^∞ A_n e^{−k_n y} sin(k_n x), k_n = nπ/L_x, (3.116)
whereas the expression for A_n remains unchanged [Eq. (3.115)]. This result nicely illustrates how we have oscillations in one direction and exponential damping in the other. This is the cool mechanism that allows us to find such a rich set of solutions to
Laplace’s equation in 2D.
For the record, the Fourier coefficients for each of the four single-wall problems are obtained from the corresponding wall function. For the bottom wall (f_0):
A_n = (2/L_x) ∫_0^{L_x} f_0(x) sin(k_n x) dx.
For the top wall (f_1):
A_n = (2/L_x) ∫_0^{L_x} f_1(x) sin(k_n x) dx.
For the left wall (g_0):
A_n = (2/L_y) ∫_0^{L_y} g_0(y) sin(k_n y) dy.
Figure 3.14: Solution of BVP0000(f_0, f_1, g_0, g_1) by adding up the solutions for each wall. Each wall is taken to be a boxcar profile, centered around the middle of the slab, of width 0.3, and of temperature 1 for bottom and top and 2 for left and right. The big plot on the right is what happens when we sum the solutions. I also set L_x = 1 and L_y = 1.5.
And for the right wall (g_1):
A_n = (2/L_y) ∫_0^{L_y} g_1(y) sin(k_n y) dy.
Notice that for the bottom and top walls, the quantization values of the kn depend on L x ,
while in the left and right walls it depends on Ly . An example of how to add up these
solutions to consider more complicated boundary conditions is shown in Fig. 3.14.
Insulated side walls: suppose now that the left and right walls are insulated, i.e. subject to the Neumann conditions
u_x(0, y) = u_x(L_x, y) = 0.
This means heat can only flow through the bottom or top walls. To obtain the full
solution, we of course still need to specify two more boundary conditions, at bottom
and top. But let us see what general conclusions we can draw by assuming only that
the left and right walls are insulated.
The X part continues to satisfy X′′ = aX, but is now subject to Neumann conditions X′(0) = X′(L_x) = 0. An identical problem was studied in the 1D heat equation. The
fact that the derivative is zero at the border can only happen for oscillatory solutions
(that is, a < 0). Hence, we redefine a := −k². The solutions will then be X(x) = a cos(kx) + b sin(kx). But X′(0) = bk = 0, so we must have b = 0, meaning the solution in this case has to contain only cosines, X(x) = a cos(kx). Next we impose X′(L_x) = −ak sin(kL_x) = 0, which means we must have
k_n = nπ/L_x, n = 0, 1, 2, 3, ....
In this case n = 0 is also included, since it leads to a non-trivial solution X0 (x) = const.
The solutions are thus either X0 (x) = const or Xn (x) = cos(kn x).
Next we turn to Y′′ = −aY = k_n² Y. Now something quite interesting happens: we need to separately treat n = 0 and n ≠ 0. The reason is that, when n = 0, the ODE for Y becomes Y′′ = 0, whose solution is Y_0(y) = c_1 + c_2 y. This is fundamentally different from the n ≠ 0 case, where the ODE is Y′′ = k_n² Y and thus admits real exponentials a_n e^{k_n y} + b_n e^{−k_n y}. Thus, summarizing, the solutions are X_0(x)Y_0(y) = c_1 + c_2 y for n = 0, and X_n(x)Y_n(y) = (a_n e^{k_n y} + b_n e^{−k_n y}) cos(k_n x) for n ≥ 1. And the general solution will have to be a linear combination of these solutions.
The general solution of
u_xx + u_yy = 0, (3.121)
u_x(0, y) = u_x(L_x, y) = 0, (3.122)
is therefore
u(x, y) = c_1 + c_2 y + Σ_{n=1}^∞ (a_n e^{k_n y} + b_n e^{−k_n y}) cos(k_n x), k_n = nπ/L_x. (3.123)
Dirichlet on bottom and top: suppose the bottom and top walls are fixed at u(x, 0) =
T 1 and u(x, Ly ) = T 2 . Imposing this in Eq. (3.123) yields
u(x, 0) = c_1 + Σ_{n=1}^∞ (a_n + b_n) cos(k_n x) = T_1, (3.124)
u(x, L_y) = c_1 + c_2 L_y + Σ_{n=1}^∞ (a_n e^{k_n L_y} + b_n e^{−k_n L_y}) cos(k_n x) = T_2. (3.125)
Since T_1 and T_2 are constants, all the cosine coefficients must vanish, a_n = b_n = 0, leaving c_1 = T_1 and c_2 = (T_2 − T_1)/L_y. The solution is thus simply
u(x, y) = T_1 + (T_2 − T_1) y / L_y.
Dirichlet on bottom and top (inhomogeneous): The last example is a very special
case, where the dependence on x vanishes entirely. Let us consider a more interesting
Dirichlet example. Suppose that u(x, 0) = 0 but u(x, Ly ) = f1 (x). Imposing u(x, 0) = 0
in Eq. (3.123) leads to
u(x, 0) = c_1 + Σ_{n=1}^∞ (a_n + b_n) cos(k_n x) = 0,
which implies that c1 = 0 and bn = −an . Thus, the solution must have the form
u(x, y) = c_2 y + Σ_{n=1}^∞ a_n sinh(k_n y) cos(k_n x), (3.126)
where I relabeled an → an /2 to make the sinh appear. Next we impose u(x, Ly ) = f1 (x).
This means
u(x, L_y) = c_2 L_y + Σ_{n=1}^∞ a_n sinh(k_n L_y) cos(k_n x) = f_1(x).
This is nothing but a Fourier cosine series for f_1(x). Recall the Fourier recipe:
a_0/2 + Σ_{n=1}^∞ a_n cos(k_n x) = f(x)  ⟹  a_n = (2/L_x) ∫_0^{L_x} f(x) cos(k_n x) dx.
Thus,
c_2 L_y = (1/L_x) ∫_0^{L_x} f_1(x) dx,
Figure 3.15: Example of the solution (3.126), with the left and right walls insulated, with
u(x, 0) = 0 and u(x, Ly ) given by Eq. (3.127).
and
a_n sinh(k_n L_y) = (2/L_x) ∫_0^{L_x} f_1(x) cos(k_n x) dx.
We now move on to the wave equation, which appears in many different physical contexts, for instance:
• Electromagnetic waves;
• Vibrations of a string;
• Vibrations of a solid.
The solutions of the wave equation itself will be explored in the next section. First, let us see how the equation arises in each of these contexts.
Electromagnetic waves
Maxwell's equations for the electric and magnetic fields, E and B, in the absence of charges or currents, read
∇·E = 0, (Gauss) (3.128)
∇×E = −∂B/∂t, (Faraday) (3.129)
∇·B = 0, (No magnetic monopoles) (3.130)
∇×B = (1/c²) ∂E/∂t, (Ampère–Maxwell) (3.131)
where c = 1/√(µ₀ε₀), and µ₀ and ε₀ are the permeability and permittivity of free space, respectively. In the presence of charges or currents, we would have to modify Eqs. (3.128) and (3.131), respectively.
The following derivation is a bit basic, and you may have seen it before. But I think
it is one of those things we should all do at least once in our lives, so I put it here for
completeness. We start by taking the curl of Eq. (3.129):
∇×(∇×E) = ∇×(−∂B/∂t) = −(∂/∂t)(∇×B) = −(1/c²) ∂²E/∂t².
We now use the general vector identity
∇×(∇×E) = ∇(∇·E) − ∇²E,
where ∇²E is the vector whose i-th component is ∇²E_i. But because of (3.128), the first term vanishes and we are thus left only with
−∇²E = −(1/c²) ∂²E/∂t².
Thus, we conclude that each component of the electric field satisfies the wave equation:
(1/c²) ∂²E/∂t² − ∇²E = 0. (3.132)
The same is also true for the magnetic field. You can check it by taking instead the curl
of Eq. (3.131), and then repeating the same procedure.
We therefore conclude that each component of the electric and magnetic fields sat-
isfy the wave equation.
Vibrating string
Next we consider a vibrating string. Imagine a guitar string in one dimension,
along direction x. It stays perfectly flat if we don't play it, and vibrates in the direction u perpendicular to x when we do (Fig. 3.16). The motion of the string can thus be
described by the function u(x, t), which says how much the string is pushed away from
[Figure 3.16: a string displaced by u(x, t) in the direction perpendicular to x.]
[Figure 3.17: force diagram for a small piece of string, with tensions T_1 and T_2 pulling at angles θ_1 and θ_2.]
equilibrium at point x in space and time t. We can prove that if the displacement is
small, u(x, t) will satisfy the wave equation. The basic idea is to assume that each atom
in the string can only move up or down (i.e. along the u direction) and never left or
right (along the x direction). This is called the transverse vibration approximation. It
essentially means the vibrating string cannot transport matter from left to right. It can
only wiggle up and down. What is fun about waves is that, notwithstanding, they can still transport energy, as we will see.
We discretize the string by considering small pieces of length ∆x. The interval
[0, L] is discretized as
x j = j∆x, j = 0, 1, 2, . . . , N,
where N = L/∆x. We let u j (t) = u(x j , t) denote the displacement of the small piece of
string located at position x_j. The mass of this small piece is m ≈ ρ∆x, where ρ is the (linear) density of the string. At the end of the calculation, we will take the limit ∆x → 0.
The basic force diagram is illustrated in Fig. 3.17. The only forces acting on the
piece of string u j are the tensions associated to the piece to its left (u j−1 ) and to its right
(u j+1 ). Other forces, such as gravity, can be neglected. This makes sense if you think
about a guitar string, which is highly tensioned. The horizontal and vertical forces
acting on this small piece will thus be
Fh = T 2 cos θ2 − T 1 cos θ1 ,
Fv = T 2 sin θ2 − T 1 sin θ1 ,
where T i = |T~i | is the magnitude of each tension and θ1 and θ2 are the angles that the
string makes to the left and right.
The horizontal force would cause the atoms in the string to move to the left or
right. Since we are assuming the vibrations are transverse, this cannot happen so the
horizontal components must vanish,
T_2 cos θ_2 = T_1 cos θ_1 := T. (3.133)
This is one of the basic assumptions in the derivation and is a very special property of
transverse vibrations. It means that the horizontal tension is the same on the left and
right sides and, hence, must be the same at all points x j of the string. This quantity T
is thus just a property of the string. It is related, for instance, to how hard you stretched
the string of your guitar when you first set it up.
We can use Eq. (3.133) to write T 2 = T/ cos θ2 and T 1 = T/ cos θ1 . This allows us
to express the vertical component of the force as
F_v = T [tan θ_2 − tan θ_1]. (3.134)
The reason why this is convenient is because, as the drawing in Fig. 3.17 indicates, the
angles θ1 and θ2 are related to the slope between u j and its neighbors u j±1 :
tan θ_2 = (u_{j+1} − u_j)/∆x,
tan θ_1 = (u_j − u_{j−1})/∆x.
This part is a bit confusing, I know. Please take a second to make sure you understand.
Looking at Fig. 3.16 helps. In any case, this allows us to finally conclude that
" #
u j+1 − u j u j − u j−1
Fv = T −
∆x ∆x
h u j+1 − 2u j + u j−1 i
= (T ∆x)
∆x2
Here I also multiplied and divided by ∆x because the quantity in square brackets then
becomes exactly the second derivative of u in Eq. (3.75). Whence
h u j+1 − 2u j + u j−1 i
Fv = (T ∆x) ' (T ∆x)u xx . (3.135)
∆x2
This vertical force will cause u(x, t) to move up and down according to Newton’s
2nd law,
m d²u_j/dt² = F_v,
where m = ρ∆x. Using our just obtained formula for Fv , the factors of ∆x cancel out
and we are left with
ρ d²u_j/dt² = T [(u_{j+1} − 2u_j + u_{j−1})/∆x²]. (3.136)
Or, in the limit ∆x → 0,
u_tt = c² u_xx, c² := T/ρ. (3.137)
The string thus obeys the 1D wave equation. And, what is more, we have found a neat
microscopic interpretation for the constant c. As we saw in the electromagnetic case,
c is the velocity of propagation of the waves. The relation c2 = T/ρ thus makes sense:
the waves propagate faster if the tension in the string is higher (T large) and if the string
is light (ρ small).
Pragmatic derivation: Start with Eq. (3.137), multiply by ut on both sides and inte-
grate from 0 to L:
∫_0^L u_tt u_t dx = c² ∫_0^L u_xx u_t dx.
The term on the left is just ∫_0^L u_tt u_t dx = (d/dt) ∫_0^L (u_t²/2) dx. I can put the d/dt outside since the integral is over x and I am assuming that u is smooth.
We now integrate the term on the right by parts. Integration by parts transfers the
derivative, from one function to another, plus a boundary term:
∫_0^L u_xx u_t dx = [u_x u_t]_0^L − ∫_0^L u_x u_tx dx,
where the last term involves utx because we transferred the derivative from u xx to ut .
The first term is what we call a boundary term. It involves the function and its deriva-
tives, evaluated at the boundaries. Since the strings are clamped at the boundaries
(u(0, t) = u(L, t) = 0), this term will be zero, leading to
∫_0^L u_xx u_t dx = −∫_0^L u_x u_tx dx = −(1/2)(d/dt) ∫_0^L (u_x)² dx.
Combining the two sides, we conclude that
(d/dt) ∫_0^L [u_t²/2 + c² u_x²/2] dx = 0. (3.139)
We therefore define the energy
E = (1/2) ∫_0^L (u_t² + c² u_x²) dx, (3.140)
so that dE/dt = 0.
In the case of the heat equation, energy was in general not conserved. But for the wave
equation it always is. This is a consequence of the 2nd derivative in time. Eq. (3.140) is
extremely useful, as it says that during the evolution there is this thing which is always
constant. But the downside of the above derivation is that it does not clarify what this
energy really means. For that, we move on to the 2nd derivation.
Mechanical derivation: We go back to the discretized version of the problem, Eq. (3.136). This can be viewed, from a classical mechanics perspective, as a system of N − 1 particles u_1, u_2, ..., u_{N−1} interacting with each other through conservative forces. The total energy will then be the sum of kinetic and potential energies:
E = (1/2) Σ_{j=0}^N ρ∆x u̇_j² + V(u_1, ..., u_{N−1}),
where V(u1 , . . . , uN−1 ) is the potential energy function, which is such that
−∂V/∂u_j = F_{v,j} = T∆x [(u_{j+1} − 2u_j + u_{j−1})/∆x²].
We now need a bit of reverse engineering: what is the function V(u1 , . . . , uN−1 ), which
is such that when we differentiate with respect to a given u j , yields the expression
above? I will tell you the answer and then we can check:
V(u_1, ..., u_{N−1}) = (T/2∆x) Σ_{j=0}^{N−1} (u_j − u_{j+1})², (3.141)
with the proviso that u_0 = u_N = 0 (to fix the boundary conditions). To see why, choose a certain j to differentiate. Say j = 42. There are only two terms in this sum that contain u_42: the term (u_42 − u_43)² and the term (u_41 − u_42)². Thus,
∂V/∂u_42 = (T/∆x) [(u_42 − u_43) − (u_41 − u_42)],
which is exactly what we are looking for.
The energy of the system will therefore be
E = (1/2) Σ_{j=0}^N ρ∆x u̇_j² + (T/2∆x) Σ_{j=0}^{N−1} (u_j − u_{j+1})².
This is now exactly in the form of the Riemann sum you learned in Introductory Cal-
culus:
lim_{∆x→0} Σ_{x_j} f(x_j)∆x = ∫ f(x) dx.
Thus, in the limit ∆x → 0 we will simply get the integral of whatever is inside the sum.
The first term is simply u̇ j → ut , the first time derivative. And the second term contains
(u j+1 − u j )/∆x → u x . Thus, we finally arrive at
E = (1/2) ∫_0^L (ρ u_t² + T u_x²) dx. (3.142)
This is now almost the same as (3.140), differing only by the pre-factor ρ. The reason for this discrepancy is that Eq. (3.140), actually, is not really an energy (it does not have energy units), while (3.142) is. This happened because the pragmatic derivation used only the wave equation u_tt = c² u_xx, where ρ is hidden in c². Usually we don't really care about this discrepancy, though: Eq. (3.140) is still the energy, just rescaled by ρ.
Figure 3.18: A one dimensional bar depicted as a system of particles coupled by springs.
Vibrations of a solid
Another major application of the wave equation is in the description of the vibra-
tions of a solid. Imagine a long and thin bar, made e.g. of steel or something. If you
now go on one side and hit it with a hammer, the bar will vibrate and this vibration will
propagate through the solid. It can even be felt from the other side.
These vibrations are different from the ones of a string that we just studied. The
reason is that they are longitudinal, instead of transverse. We can imagine the solid as a
bunch of atoms, each sitting in a certain equilibrium position labeled as x1 , x2 , x3 , . . . , xN .
For simplicity we are going to think in 1D, as in Fig. 3.18. If nothing is vibrating, the atoms are just standing still. But when they start to vibrate, they will be displaced, very slightly, from their equilibrium positions. We call u(x_n, t) the displacement of atom
n from its equilibrium position xn . Notice how everything is 1D. This u(xn , t) refers
to the displacement to the left or right. This is different from Fig. 3.16, where the
displacement was up and down.
The atoms are bound together by chemical forces, which in general can be quite
complicated. But lucky for us, that does not matter. What matters is that the positions
x1 , x2 , . . . correspond to the equilibrium configuration of these forces. That is, they are
the positions where everything balances out. If an atom now moves a bit, its neighbors
to the left and right will tend to push it back. Try to picture this like people on a
crowded subway (wearing masks because of Covid). For instance if u(xn ) > 0, then the
atom at xn+1 will tend to push the atom at xn to the left and the atom at xn−1 will push
it to the right. If u(xn ) < 0 the situation reverses. The force acting on atom xn by the
atom at xn+1 can be approximated by a harmonic force, with spring constant k. That is,
something like
F_{n+1→n} = k [u(x_{n+1}, t) − u(x_n, t)].
The rationale is as follows: the force is zero only if both are in their equilibrium positions, u(x_n) = u(x_{n+1}) = 0. If u(x_{n+1}) = 0, then the force must have the opposite sign of u(x_n), because it is supposed to be a restoring force (it pushes in the direction opposite
of the motion). Conversely, if u(xn ) = 0, the force should have the same sign as u(xn+1 )
because if u(xn+1 ) < 0, it should push u(xn ) to the left and vice-versa. Just like people
in a crowded subway.
Similarly, the force of atom n − 1 on n will be
F_{n−1→n} = k [u(x_{n−1}, t) − u(x_n, t)].
The same logic applies. Thus, Newton’s 2nd law for atom n reads
m u_tt(x_n, t) = k [u(x_{n+1}, t) − 2u(x_n, t) + u(x_{n−1}, t)],
where m is the mass of the atom. The thing on the right almost looks like a 2nd
derivative. But here things are still discrete. We will only get a wave equation if we
look at the dynamics from farther away. This is called a coarse graining. We can make
a 2nd derivative appear by dividing on both sides by ∆x2 , where ∆x is the equilibrium
spacing of the atoms. Then we can approximate
[u(x_{n+1}, t) − 2u(x_n, t) + u(x_{n−1}, t)]/∆x² ≈ u_xx(x, t),
and so our equation of motion becomes
(1/c²) u_tt = u_xx, c² = k∆x²/m := E/ρ, (3.144)
where ρ := m/∆x is the density of the bar, just like in Eq. (3.137). The quantity E :=
k∆x, on the other hand, is called the Young modulus and is a property characterizing
the elasticity of the material. For instance, steel has a Young modulus which is 3 times
that of aluminum. But the density of steel is also almost 3 times higher, so that the
speed of propagation of waves in steel and aluminum are somewhat close.
Let us now actually solve the wave equation on a string of length L, clamped at the two ends:
u_tt = c² u_xx, (3.145)
u(0, t) = 0, u(L, t) = 0,
u(x, 0) = f(x), u_t(x, 0) = g(x),
for given functions f and g. As always, we solve this using separation of variables, by
setting u(x, t) = X(x)T (t) (Sec. 3.2). This yields
T̈/(c²T) = X′′/X = −k²,
where k is a constant to be determined. This results in the pair of equations
X′′ = −k²X, (3.146)
T̈ = −c²k²T. (3.147)
Since the spatial part is subject to the boundary conditions X(0) = X(L) = 0, we obtain the familiar family of solutions
X_n(x) = sin(k_n x), k_n = nπ/L, n = 1, 2, 3, .... (3.148)
The time part, on the other hand, will have solutions
T_n(t) = a_n cos(ck_n t) + b_n sin(ck_n t),
for constants a_n and b_n. You can also use complex exponentials if you prefer. For the
present problem, sines and cosines are a bit more convenient.
The general solution will thus be a linear superposition of these basic solutions:
u(x, t) = Σ_{n=1}^∞ [a_n cos(ck_n t) + b_n sin(ck_n t)] sin(k_n x). (3.149)
Finally, the values of an and bn are determined from the initial conditions:
u(x, 0) = Σ_{n=1}^∞ a_n sin(k_n x) = f(x), (3.150)
u_t(x, 0) = Σ_{n=1}^∞ ck_n b_n sin(k_n x) = g(x). (3.151)
These are Fourier sine series, whence
a_n = (2/L) ∫_0^L f(x) sin(k_n x) dx,
b_n = (2/Lck_n) ∫_0^L g(x) sin(k_n x) dx,
which are nothing but the Fourier sine coefficients of f(x) and g(x) on [0, L]. Actually, to be precise, the Fourier sine coefficient of g(x) is ck_n b_n, not b_n. With this we arrive at the following general solution:
u(x, t) = Σ_{n=1}^∞ [a_n cos(ck_n t) + b_n sin(ck_n t)] sin(k_n x), (3.152)
Figure 3.19: Example solution of the wave equation, Eq. (3.152), for f(x) = x(x³ − 2Lx² + L³) and g(x) = x(L − x). The plot was made with L = c = 1.
where
a_n = (2/L) ∫_0^L f(x) sin(k_n x) dx, (3.153)
b_n = (2/Lck_n) ∫_0^L g(x) sin(k_n x) dx, (3.154)
are the Fourier sine coefficients of f (x) and g(x) on [0, L]. Some conditions
must be met by f (x) and g(x) for this solution to actually be valid. These will
be discussed below.
Example: We take f(x) = x(x³ − 2Lx² + L³) and g(x) = x(L − x). The solution is
shown in Fig. 3.19. Since (3.152) is composed only of oscillatory functions, there is no
damping and the solution just keeps repeating itself over and over. See also this GIF.
This is very different from the heat equation in Sec. 3.4, which was damped in time.
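To reproduce this kind of plot, one can compute a_n and b_n by numerical integration and then sum the series (3.152). Here is a minimal sketch of my own:

```python
import numpy as np
from scipy.integrate import quad

L, c, nmax = 1.0, 1.0, 40
f = lambda x: x * (x**3 - 2 * L * x**2 + L**3)
g = lambda x: x * (L - x)

def coeffs(n):
    """a_n and b_n from Eqs. (3.153)-(3.154), computed numerically."""
    kn = n * np.pi / L
    an = (2 / L) * quad(lambda x: f(x) * np.sin(kn * x), 0, L, limit=200)[0]
    bn = (2 / (L * c * kn)) * quad(lambda x: g(x) * np.sin(kn * x), 0, L, limit=200)[0]
    return kn, an, bn

def u(x, t):
    """Partial sum of Eq. (3.152)."""
    total = np.zeros_like(x, dtype=float)
    for n in range(1, nmax + 1):
        kn, an, bn = coeffs(n)
        total += (an * np.cos(c * kn * t) + bn * np.sin(c * kn * t)) * np.sin(kn * x)
    return total

x = np.linspace(0, L, 200)
print(u(x, 0.0).max(), u(x, 0.37).max())   # no damping: the profile keeps oscillating
```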
Let us now use the energy to show that the solution is unique. Recall that
E = (1/2) ∫_0^L (u_t² + c² u_x²) dx. (3.155)
We showed that the energy is a constant of motion, dE/dt = 0. Moreover, from its
definition, we can see that E ≥ 0.
[Figure 3.20: the first few harmonics, n = 1, ..., 5, as functions of x/L, showing their nodes.]
Now suppose we found two solutions, u1 and u2 , of the boundary value prob-
lem (3.145), and define u = u1 − u2 . Since the wave equation is linear, u is also
a solution of utt = c2 u xx . Moreover, it is subject to the same boundary conditions
u(0, t) = u(L, t) = 0. But, since it is the difference between u1 and u2 , it is now subject
to the initial conditions u(x, 0) = ut (x, 0) = 0. Thus, at time t = 0 the energy contained
in u will be E = 0. And since energy is a constant of motion, it has to remain zero for
all times. But E is an integral of positive quantities and so the only way it can be zero
is if the integrand itself is zero. Hence, we conclude that u(x, t) = 0 for all times, which
implies u1 = u2 . That is, the solution is unique.
The above reasoning is actually a very powerful method for establishing the unique-
ness of solutions. Please take a second to make sure you understand it.
The solution (3.152) is given by an infinite sum of terms. Each term,
u_n(x, t) = [a_n cos(ck_n t) + b_n sin(ck_n t)] sin(k_n x), (3.156)
is called the n-th harmonic of the string. For each harmonic, there are certain special
positions x where un = 0 at all times. This always happens at the boundaries, un (0, t) =
un (L, t) = 0, of course. But it also happens if kn x = `π, for some integer `. These points
are called nodes. Since kn = nπ/L, this implies x = (`/n)L. Of course, we must also
have x ∈ [0, L], so the number of integers ` which satisfy this is limited by n. That is,
the n-th harmonic will have n − 1 nodes, located at
x_ℓ = (ℓ/n) L, ℓ = 1, 2, ..., n − 1.
The first harmonic, n = 1, has no nodes. The second harmonic has 1 node and the third
harmonic has 2. This is illustrated in Fig. 3.20.
Each harmonic also evolves in time. The oscillation frequency is seen to be
ω_n = ck_n. (3.158)
We can also compute the energy contained in each harmonic, by plugging (3.156) into Eq. (3.155). I will leave it for you as an exercise to check that, if we carry out the integral, we are left with
E_n = L (a_n² + b_n²) ω_n². (3.159)
The energy density, E_n/L, therefore grows with the square of the frequency, ω_n² (and with the constants a_n, b_n that determine the amplitude of that harmonic). Higher harmonics are thus more energetic. The total energy in the string is then simply the sum of the
energy contained in each harmonic,
E = Σ_{n=1}^∞ E_n = L Σ_{n=1}^∞ (a_n² + b_n²) ω_n², (3.160)
which is nothing but a slightly modified version of Parseval’s identity (Sec. 1.6).
The solution (3.152) requires some restrictions on f(x) and g(x) (essentially, that they be sufficiently smooth and compatible with the boundary conditions). These restrictions are required for essentially two reasons: (i) to ensure that u satisfies the boundary conditions u(0, t) = u(L, t) = 0 at all times and (ii) to ensure that the second derivatives u_tt and u_xx actually make sense. For instance, if f(x) is not twice differentiable, then at the initial time, how can we compute u_xx(x, 0)? It can be shown (see
Trench, Sec. 12.3) that if these restrictions apply, then (3.152) will be a valid solution
of (3.145) (we have already shown that it is unique from the energy argument).
But what if f and g violate some of these conditions? There are, in fact, physically
important problems which do. One is the so-called plucked string, given by g(x) = 0
and
f(x) = { x/a,              0 ≤ x ≤ a,
         (L − x)/(L − a),  a ≤ x ≤ L, }      (3.166)
(see Fig. 3.21). This is what happens if you start the string at rest (g(x) = 0), but stretch it at a certain point a, as we would do in a guitar or harp. This f(x) is not twice
differentiable, so strictly speaking a solution of the boundary value problem (3.145)
with this initial condition does not exist.
This is a bit frustrating. What is going on? A pragmatist would argue that this is
a mathematical pathology. When we pluck a string, the profile f (x) does not actually
[Figure 3.21: the plucked-string initial condition f(x) of Eq. (3.166).]
have a kink, like Fig. 3.21, but curves smoothly. Thus, a “real” f (x) would be twice
differentiable.
The naive approach, on the other hand, would be to simply say “I don’t care”: I’ll
just plug this f (x) in Eqs. (3.152) and (3.153) and see what happens. Doing that we
find that the $a_n$ are given by
$$a_n = \frac{2}{a(L-a)\,k_n^2}\,\sin(k_n a), \qquad (3.167)$$
while bn = 0 since g(x) = 0. Thus Eq. (3.152) becomes
$$u(x, t) = \frac{2}{a(L-a)}\sum_{n=1}^{\infty}\frac{1}{k_n^2}\,\sin(k_n a)\cos(ck_n t)\sin(k_n x). \qquad (3.168)$$
This doesn’t look too bad. See, for instance, the following GIF. The solution is kind
of weird, but it does evolve in an oscillatory fashion, as one would expect from the
wave-equation.
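If you want to play with this solution yourself, here is a minimal numerical sketch (Python/NumPy; the tooling and the parameter values $L$, $c$, $a$ and the truncation $n_{\max}$ are my own illustrative choices, not part of the notes). It evaluates a truncated version of Eq. (3.168) on a grid of $x$ for a given $t$; animating over $t$ reproduces the behaviour of the GIF.

```python
import numpy as np

# Hypothetical parameters of the plucked string (illustration only).
L, c, a = 1.0, 1.0, 0.3     # string length, wave speed, plucking point
nmax = 200                  # number of harmonics kept in the truncated sum

n = np.arange(1, nmax + 1)
kn = n * np.pi / L          # allowed wave numbers k_n = n*pi/L

def u(x, t):
    """Truncated version of Eq. (3.168) for the plucked string."""
    modes = (np.sin(kn[:, None] * a) / kn[:, None]**2
             * np.cos(c * kn[:, None] * t) * np.sin(kn[:, None] * x[None, :]))
    return 2.0 / (a * (L - a)) * modes.sum(axis=0)

x = np.linspace(0, L, 201)
print(u(x, 0.0).max())      # at t = 0 the profile peaks near 1, at x ~ a
```

At $t = 0$ the truncated sum reproduces the triangular profile (3.166), and for later $t$ it shows the somewhat "kinky" oscillatory evolution mentioned above.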
It turns out that the wave equation admits something called “Sobolev generalized
solutions”. And the plucked string is one such example. These are not actual solutions
of the PDE, but they are solutions of another integral equation associated with it. This is why, when we make a plot, they still look like they are working just fine. For more information, see Djairo Sec. 5.11.
denote the Fourier series of the initial conditions f (x) and g(x) [c.f. Eqs. (3.153)
and (3.154)]. The reason why I don't call these $f(x)$ and $g(x)$ is because, strictly speaking, these two functions only need to be defined in $[0, L]$, while $S_f$ and $S_g$ are defined
for any x (being, of course, periodic with period L).
Now let us go back to the general solution (3.152) and use the trigonometric iden-
tities:
$$\sin(x)\cos(y) = \tfrac{1}{2}\big[\sin(x-y) + \sin(x+y)\big], \qquad (3.171)$$
$$\sin(x)\sin(y) = \tfrac{1}{2}\big[\cos(x-y) - \cos(x+y)\big]. \qquad (3.172)$$
We then get
$$\sum_{n=1}^{\infty} a_n \cos(ck_n t)\sin(k_n x) = \frac{1}{2}\sum_{n=1}^{\infty} a_n\big[\sin(k_n(x-ct)) + \sin(k_n(x+ct))\big] = \frac{1}{2}\big[S_f(x-ct) + S_f(x+ct)\big].$$
Similarly,
$$\sum_{n=1}^{\infty} b_n \sin(ck_n t)\sin(k_n x) = \frac{1}{2}\sum_{n=1}^{\infty} b_n\big[\cos(k_n(x-ct)) - \cos(k_n(x+ct))\big].$$
This doesn’t look at all like S g . To make it appear, we use the following trick:
$$\int_{x-ct}^{x+ct}\sin(k_n y)\,dy = \frac{1}{k_n}\big[\cos(k_n(x-ct)) - \cos(k_n(x+ct))\big].$$
Combining everything, we see that the general solution (3.152) can also be written as
$$u(x, t) = \frac{1}{2}\big[S_f(x-ct) + S_f(x+ct)\big] + \frac{1}{2c}\int_{x-ct}^{x+ct} S_g(y)\,dy. \qquad (3.173)$$
Figure 3.22: (Left) A localized initial condition f (x) propagates to the left and right according
to Eq. (3.173). (Right) The light cone, described by the straight lines x − ct and
x + ct.
where
$$S_f(x) = \sum_{n=1}^{\infty} a_n \sin(k_n x), \qquad a_n = \frac{2}{L}\int_0^L f(x)\sin(k_n x)\,dx, \qquad (3.174)$$
$$S_g(x) = \sum_{n=1}^{\infty} c k_n b_n \sin(k_n x), \qquad b_n = \frac{2}{Lck_n}\int_0^L g(x)\sin(k_n x)\,dx. \qquad (3.175)$$
At any given time t, the solution $u(x, t)$ at a certain point $x$ will have been influenced only by the parts of
the initial conditions f (x0 ) and g(x0 ) if that point x0 is inside the cone defined by x − ct
and $x + ct$. Or, to put it differently, the effects of the initial conditions $f(x_0)$ and $g(x_0)$ at a given point $x_0$ will only start to influence $u(x, t)$ after at least a time $t = |x - x_0|/c$. The influence of the initial conditions therefore travels at a speed $c$, which is why we call $c$
the speed of propagation of the waves (mind-blowing!). We use the term “light-cone”
because light satisfies the wave equation. But, of course, this also holds for strings,
where c is the speed of propagation, which depends on the properties of the string (and
has nothing to do with light).
Consider now the wave equation defined on the entire real line, subject only to initial conditions:
$$u_{tt} = c^2 u_{xx}, \qquad u(x, 0) = f(x), \qquad u_t(x, 0) = g(x). \qquad (3.176)$$
Or we could study the heat (diffusion) equation,
$$u_t = \alpha u_{xx}, \qquad u(x, 0) = f(x).$$
Or we can study a free quantum particle in 1D, which obeys Schrödinger’s equation,
$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial x^2}, \qquad \psi(x, 0) = f(x), \qquad (3.177)$$
where m is the mass and ~ is Planck’s constant. In any case, we still have to specify the
initial conditions (2 in the case of waves, 1 in the case of diffusion and Schrödinger).
But we do not impose any boundary conditions. Problems of this form are called
Cauchy problems.
The best tool for handling Cauchy problems is the Fourier transform, which will
be the subject of the next chapter. Here, I just want to prepare the terrain. More specif-
ically, what I would like to do is to compare the wave Eq. (3.176) with Schrödinger’s
Eq. (3.177). In fact, Schrödinger himself talked about the solutions of his equation as
describing matter waves. So it is only fair we compare them.
Wave equation and plane waves
Let us start with the Cauchy problem for the wave equation (3.176). We search for
solutions of the form u(x, t) = Aei(kx−ωt) , where k and ω are to be determined (and A is
just a silly constant). Plugging this in (3.176) yields
$$A(-\omega^2) = c^2 A(-k^2).$$
We therefore see that this will indeed be a solution, but only if $k$ and $\omega$ are related by
$$\omega = c|k|. \qquad (3.178)$$
This is very similar to what we found last section, Eq. (3.158). But in that case, due
to the boundary conditions, the allowed values of k were discrete. Here, because there
are no boundary conditions, any k ∈ R provides a valid solution.
The fact that now k can vary continuously is a bit of a complication, because we
cannot write the general solution as a sum of these basic building blocks. Instead, it
must be an integral. This will be the main motivation to define a Fourier Transform in
the next chapter, which is the continuous analog of a Fourier series.
A solution like ei(kx−ωt) is called a plane wave. The quantity k is called the wave
vector and a relation like (3.178) is called a dispersion relation. That is, the disper-
sion relation establishes how frequency is related to wave vector. In high school you
probably learned that for light the wavelength λ was related to the frequency ν accord-
ing to ν = c/λ. This is actually the same as (3.178) because now we are using angular
frequency, ω = 2πν. Moreover, the wave vector is related to the wavelength according
to k = 2π/λ.
Schrödinger equation
Next let us try a plane wave solution ψ = Aei(kx−ωt) for Schrödinger’s equation. This
yields
$$A(i\hbar)(-i\omega) = -\frac{\hbar^2}{2m}A(-k^2).$$
It therefore also works, but only if $\omega$ and $k$ are related via
$$\omega = \frac{\hbar k^2}{2m}. \qquad (3.179)$$
To understand what this means, let us recall that Planck's constant has the value
$$\hbar \simeq 1.05 \times 10^{-34}\ \text{J s}. \qquad (3.180)$$
It thus has units of energy times seconds. Hence, $E = \hbar\omega$ will have units of energy.
Moreover, [k] = 1/m and so [~k] = kg m/s, which are units of momentum. This
suggests the association
$$E = \hbar\omega, \qquad p = \hbar k. \qquad (3.181)$$
Using this, Eq. (3.179) becomes
$$E = \frac{p^2}{2m}, \qquad (3.182)$$
which is nothing but the usual energy-momentum relation of classical mechanics.
Dispersion relations are therefore a synonym of energy-momentum relations. Fre-
quency is just an ~ away from energy. And wave vector is just an ~ away from momen-
tum. This finally explains why, all the way back in Chapter 1, I called the quantities $k_n = n\pi/L$ appearing in Fourier series "momenta".
This also motivates us to do the same for the wave equation dispersion relation (3.178).
This yields
E = c|p|, (3.183)
which is the energy-momentum relation for photons. As you will learn when you study
Quantum Physics, a photon of frequency ω has energy E = ~ω. Moreover, even though
photons have no mass, they still carry momentum. I know this may seem weird since
we always think of momentum as mass times velocity. But momentum is actually more
general and you can have momentum even if m = 0.
A key distinction between waves and matter waves, therefore, is in the energy-
momentum dispersion relation. Photons have a linear dispersion E ∝ |p|, while for
matter waves this is quadratic, E ∝ p2 . It turns out that these are both limiting cases of
a more general dispersion relation
$$E = \sqrt{m^2c^4 + p^2c^2}, \qquad (3.184)$$
called the relativistic dispersion relation. If m = 0, this yields the photon case (3.183).
Conversely, if the momentum is small, $pc \ll mc^2$, we can write
$$E = mc^2\sqrt{1 + p^2/(m^2c^2)}$$
and use the series expansion $\sqrt{1+x} \simeq 1 + x/2$, which yields
$$E \simeq mc^2 + \frac{p^2}{2m}. \qquad (3.185)$$
The first term is called the self-energy: it is the energy that the particle has even if p =
0. And the second term is the Schrödinger dispersion relation (3.179). Schrödinger’s
equation is thus valid in the non-relativistic limit. That is, it is valid for motion with $pc \ll mc^2$.
Klein-Gordon equation
You may also be wondering, which kind of PDE yields the general relativistic dis-
persion relation (3.184). The answer is Klein-Gordon’s equation
$$u_{tt} = c^2 u_{xx} - \mu^2 u, \qquad (3.186)$$
where $\mu = mc^2/\hbar$. I will leave it for you to check that this admits plane wave solutions $e^{i(kx-\omega t)}$, but with
$$\omega^2 = c^2k^2 + m^2c^4/\hbar^2. \qquad (3.187)$$
Multiplying by $\hbar^2$ and using $E = \hbar\omega$, $p = \hbar k$, this yields (3.184). Eq. (3.186), or some variation of it, is in fact one of the basic equations obeyed by many elementary particles.
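If you prefer not to grind through the algebra by hand, the check can also be done symbolically. The sketch below (Python/SymPy, a tool I am assuming here; it is not used in the notes) plugs the plane wave into (3.186) and confirms that it is a solution precisely when (3.187) holds.

```python
import sympy as sp

x, t, k, w, c, mu = sp.symbols('x t k omega c mu', real=True)
psi = sp.exp(sp.I * (k * x - w * t))

# Klein-Gordon equation (3.186): u_tt - c^2 u_xx + mu^2 u = 0
kg = sp.diff(psi, t, 2) - c**2 * sp.diff(psi, x, 2) + mu**2 * psi

# Dividing by psi shows the plane wave solves (3.186) iff this bracket vanishes,
# i.e. omega^2 = c^2 k^2 + mu^2, which is Eq. (3.187) once mu = m c^2 / hbar.
print(sp.simplify(kg / psi))     # -> -omega**2 + c**2*k**2 + mu**2
```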
When Schrödinger was trying to derive his equation, he actually first considered
the Klein-Gordon equation (3.186). But he found a problem: this equation does not
admit a continuity equation. We discussed continuity equations in Sec. 3.4. They have
the form
$$\frac{\partial q}{\partial t} = -\nabla\cdot \boldsymbol{J}, \qquad (3.188)$$
where q is a “charge” and J is a current. Continuity equations happen naturally when
the PDE is 1st order in time. But they don’t hold when it is 2nd order. Schrödinger
knew that.
Continuity equations ensure that particles are not spontaneously created or de-
stroyed. Schrödinger was interested in describing electrons, protons and atoms, which
cannot simply disappear out of nowhere. The wave equation can only describe pho-
tons, which can be spontaneously destroyed or created. Schrödinger therefore realized
that his equation had to be 1st order in time. But it was also supposed to describe wave
motion. And we know that a 1st order PDE like the heat equation cannot describe
waves. So what to do? This was Schrödinger’s big eureka moment: introduce complex
numbers. Write down a 1st order PDE, but put an i in front to turn real exponentials
into complex exponentials.
Despite its enormous success, Schrödinger’s equation only describes non-relativistic
motion. How to describe relativistic particles? This only came a few years later with
Dirac. He realized that the only way of doing this was to introduce an additional level
of complexity: the function ψ could not be a number; it had to be a vector. And what is
funnier, it has to be a vector of dimension 4. Dirac’s equation for a free particle reads
$$i\hbar\,\partial_t\!\begin{pmatrix}\psi_1\\ \psi_2\\ \psi_3\\ \psi_4\end{pmatrix} = -i\,\partial_x\!\begin{pmatrix}\psi_4\\ \psi_3\\ \psi_2\\ \psi_1\end{pmatrix} + \partial_y\!\begin{pmatrix}-\psi_4\\ \psi_3\\ -\psi_2\\ \psi_1\end{pmatrix} - i\,\partial_z\!\begin{pmatrix}\psi_3\\ -\psi_4\\ \psi_1\\ -\psi_2\end{pmatrix} + m\begin{pmatrix}\psi_1\\ \psi_2\\ -\psi_3\\ -\psi_4\end{pmatrix}. \qquad (3.189)$$
Pretty funny eh? Of course, I will not have the time for us to go into more detail about
Dirac’s equation. But I wanted you to see how a deep understanding of PDEs can aid us
in formulating fundamental laws of physics. I think this is really beautiful and shows
how something seemingly simple, like the order of a derivative, can have a profound
influence on how we describe physical systems.
Chapter 4
Fourier Transforms
4.1 Introduction
Let f (x) be an arbitrary function of x. It doesn’t have to be periodic or even real.
We define its Fourier Transform (FT) as
$$\tilde f(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-ikx}\,dx. \qquad (4.1)$$
The FT integrates over x, but introduces this new parameter k, so that the result is a
function of k. When we take the Fourier transform of a function, we like to say we
moved to Fourier space. Real space has x. Fourier space has k. As we will see
soon, when x is position, this k has the interpretation of momentum or wave vector.
Conversely, when dealing with functions of time, f (t), we usually use ω instead of k:
$$\tilde f(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{-i\omega t}\,dt. \qquad (4.2)$$
Mathematically speaking, this is silly: we can call this parameter whatever we want.
But physically, we like to keep the correspondence x ↔ k and t ↔ ω.
The Fourier Transform is an invertible transformation. That is, we can always go
back and find f (x) from f˜(k). This is called the Inverse Fourier Transform and is
given by
$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \tilde f(k)\,e^{ikx}\,dk. \qquad (4.3)$$
The integral is now in dk and all that changes is the sign in the exponential. It is not at
all obvious why Eq. (4.3) works; but we are going to prove it below.
The Fourier transform generalizes the notion of Fourier series to functions which
are not necessarily periodic. When dealing with Fourier series, we had a set of param-
eters cn . In the Fourier transform this becomes a continuous function f˜(k). We can
Figure 4.1: (a) The boxcar functions (4.4) for different values of a. (b) The corresponding
Fourier transform (4.5). When f (x) is wide, f˜(k) tends to be thin and vice-versa.
move from series to transform if we take f (x) to be a periodic function, but with period
L → ∞. We will do this in a second. But before, let us work out some examples.
Boxcar
Consider a boxcar centered around 0, with width 2a and height 1/2a (so that the
area under the curve is 1):
$$B_a(x) = \frac{1}{2a}\,\theta(x+a)\,\theta(a-x), \qquad (4.4)$$
where θ(x) is the Heaviside theta function. See Fig. 4.1(a). The Fourier transform (4.1)
yields
$$\tilde B_a(k) = \frac{1}{\sqrt{2\pi}}\,\frac{1}{2a}\int_{-a}^{a} e^{-ikx}\,dx = \frac{1}{2a\sqrt{2\pi}}\left(\frac{e^{-ika} - e^{ika}}{-ik}\right) = \frac{\sin(ka)}{\sqrt{2\pi}\,ka}. \qquad (4.5)$$
Dirac δ-function
The Fourier transform of a Dirac δ function is
$$\tilde\delta(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\delta(x)\,e^{-ikx}\,dx = \frac{1}{\sqrt{2\pi}}. \qquad (4.6)$$
It is thus a constant, independent of k. The δ function is the sharpest function one can
have. And we see that its FT is the widest function one can have: that is, a constant
on the entire real line. We can also see Eq. (4.6) as a particular case of the boxcar
example (4.4). Recall that $\delta(x)$ is the limit of $B_a(x)$ when $a \to 0$. And indeed, if we take the limit $a \to 0$ of Eq. (4.5) and use the famous limit $\sin x/x \to 1$, we also get $1/\sqrt{2\pi}$.
Now that we have δ̃(k), we can also recover δ(x) using the Inverse FT (4.3):
$$\delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ikx}\,dk, \qquad (4.7)$$
which looks super weird. This thing is supposed to mimic the δ-function. That is, it
should be infinite when x = 0 and zero when x , 0. The case x = 0 is consistent
since integrating 1 from −∞ to ∞ will definitely give infinity. But if x , 0, it is less
clear because eikx just oscillates indefinitely, so how can we evaluate it at k = ±∞? The
intuition is that ei10000x is a very very fast oscillatory function in x, which oscillates
symmetrically between positive and negative values. Thus, “on average”, it should be
zero. But this, of course, is not at all rigorous. A rigorous derivation can be done if
we regularize the integral. That is, we introduce some term which makes eikx slowly
decay as k → ±∞. An example of how to do this will be discussed below.
There is also another way of understanding (4.7). Let f (x) be some generic function
and let us try to combine the FT with the Inverse FT; that is, we insert Eq. (4.1) in
Eq. (4.3):
$$f(x) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,e^{ikx}\,\tilde f(k) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,e^{ikx}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(y)\,e^{-iky}.$$
Here I used a different letter y because this variable is being integrated, so we should
not confuse it with x. We now exchange the integrals over dk and dy (assuming this is
allowed) and write this as
$$f(x) = \int_{-\infty}^{\infty}dy\, f(y)\int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(x-y)}.$$
The integral in $k$ results in a function of $x - y$ only. That is, this must have the form
$$f(x) = \int_{-\infty}^{\infty}dy\, f(y) \times (\text{function of } x - y),$$
which should be compared with the definition of the Dirac delta function:
$$f(x) = \int_{-\infty}^{\infty}dy\, f(y)\,\delta(x-y).$$
Thus, we conclude that
$$\delta(x-y) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{ik(x-y)}. \qquad (4.8)$$
This reflects the fact that if we apply first the FT, and then the Inverse FT, we
must get back to where we started.
If we change the integration variable in Eq. (4.10) from k to k0 = −k, we get a minus
sign in dk0 = −dk. But the integration limits now go from +∞ to −∞. So inverting
them back to the natural order, gets rid of the other minus sign. As a consequence, we
may also write (4.10) as
$$\delta(x-y) = \int_{-\infty}^{\infty}\frac{dk'}{2\pi}\,e^{-ik'(x-y)}.$$
That is, all that changes is the sign of the exponent. This shows that δ(x − y) = δ(y − x).
Changing integration variables in this way, from k → −k, is a common trick, which is
worth getting used to.
Gaussian
Consider a Gaussian function
$$f(x) = \frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}, \qquad (4.11)$$
where σ measures the width of the Gaussian. The pre-factor was chosen so that f (x)
has unit area:
$$\int_{-\infty}^{\infty}\frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\,dx = 1. \qquad (4.12)$$
This is shown in Fig. 4.2(a). In the limit σ → 0, the Gaussian tends to a δ function.
The Fourier transform (4.1) will be
$$\tilde f(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi\sigma^2}}\,e^{-x^2/2\sigma^2}\,e^{-ikx}.$$
To do the integral we complete the square in the exponent, $-\frac{x^2}{2\sigma^2} - ikx = -\frac{(x + ik\sigma^2)^2}{2\sigma^2} - \frac{k^2\sigma^2}{2}$, which gives
$$\tilde f(k) = \frac{e^{-k^2\sigma^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi\sigma^2}}\,e^{-(x+ik\sigma^2)^2/2\sigma^2}.$$
Changing variables to $y = x + ik\sigma^2$,$^1$
$$\tilde f(k) = \frac{e^{-k^2\sigma^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi\sigma^2}}\,e^{-y^2/2\sigma^2}.$$
The remaining integral is nothing but Eq. (4.12) again. Hence, we finally obtain
$$\tilde f(k) = \frac{e^{-k^2\sigma^2/2}}{\sqrt{2\pi}}. \qquad (4.13)$$
The Fourier Transform of a Gaussian is thus also a Gaussian, but with width 1/σ in-
stead of σ. Once again, we have this interplay where wide in real space implies thin in
Fourier space and vice-versa. This is shown in Fig. 4.2(b).
The inverse Fourier Transform (4.3) now states that
$$\frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}} = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-k^2\sigma^2/2}\,e^{ikx}. \qquad (4.14)$$
This is precisely a regularized version of Eq. (4.7): the factor $e^{-k^2\sigma^2/2}$ makes the integrand decay as $k \to \pm\infty$, and the result is a Gaussian, essentially bounded within a finite region of space. And this will be true no matter how small $\sigma$ is, as long as $\sigma \ne 0$.
1 This step is actually a bit subtle since y is complex, so that the integral is now in the complex plane,
instead of the real line. But using complex integration methods, one can show that this turns out to be
unimportant.
Figure 4.2: (a) The Gaussian (4.11) for different values of σ. (b) The corresponding Fourier
transform (4.13).
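As a quick sanity check, Eq. (4.13) is easy to verify numerically by discretizing the integral (4.1) directly. The sketch below is in Python/NumPy (my choice of tooling; the grid and the value of σ are arbitrary illustrative choices).

```python
import numpy as np

sigma = 0.5
x = np.linspace(-20, 20, 4001)      # wide grid, so the Gaussian has fully decayed
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)   # Eq. (4.11)

def ft(k):
    """Direct Riemann-sum approximation of the transform (4.1)."""
    return np.sum(f * np.exp(-1j * k * x)) * dx / np.sqrt(2 * np.pi)

for k in [0.0, 1.0, 3.0]:
    exact = np.exp(-k**2 * sigma**2 / 2) / np.sqrt(2 * np.pi)        # Eq. (4.13)
    print(k, abs(ft(k) - exact))    # differences are tiny
```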
All that matters is that in a full loop, you get an overall factor of 1/2π.
In the FT business, you have to be clear about which notation you are using. And
you have to be consistent and use the same notation throughout. If you are both clear
and consistent, then feel absolutely free to use whatever notation you prefer. Mathematica, for instance, uses $1/\sqrt{2\pi}$ and an $e^{+ikx}$ in the FT (opposite sign of (4.1)).
We will now see how the FT emerges from this series when L → ∞. Instead of labeling
the coefficients by n, let us label them by
$$k_n = \frac{2\pi n}{L}.$$
These kn are in one-to-one correspondence with n. Moreover, they are spaced by ∆kn =
2π/L. Thus, as L → ∞ the spacing between the kn becomes smaller and smaller,
suggesting we can use it to take the limit of a continuum.
In terms of $k_n$, we write (4.16) as
$$f(x) = \sum_{n\in\mathbb{Z}} c_n\, e^{ik_n x}.$$
To make this sum look like an integral, we introduce a “convenient 1”: since ∆kn =
$2\pi/L$, we can multiply $f(x)$ by
$$1 = \frac{L}{2\pi}\,\Delta k_n.$$
This leads to
$$f(x) = \frac{L}{2\pi}\sum_{n\in\mathbb{Z}} c_n\, e^{ik_n x}\,\Delta k_n.$$
Now the remaining sum looks exactly like the Riemann sum we learn in calculus: when
L → ∞ the increment ∆kn becomes infinitesimal and the kn tend to vary continuously.
Thus, defining
$$\tilde f(k_n) = \frac{L}{\sqrt{2\pi}}\,c_n, \qquad (4.17)$$
leads to
$$f(x) = \sum_{n}\frac{\Delta k_n}{\sqrt{2\pi}}\,\tilde f(k_n)\,e^{ik_n x} \;\to\; \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde f(k)\,e^{ikx},$$
which is exactly Eq. (4.3).
We can also do the same for the coefficients cn . From Eq. (4.16),
$$\tilde f(k_n) = \frac{L}{\sqrt{2\pi}}\,c_n = \frac{1}{\sqrt{2\pi}}\int_{-L/2}^{L/2} f(x)\,e^{-ik_n x}\,dx.$$
Taking the limit L → ∞ then yields exactly Eq. (4.1). Incidentally, we have also proven
that Eq. (4.3) is indeed the inverse operation of (4.1).
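Here is a small numerical illustration of this limit (Python/NumPy sketch; my own addition, with arbitrary grid choices): truncate a Gaussian to $[-L/2, L/2]$, compute a few coefficients $c_n$, and watch $L c_n/\sqrt{2\pi}$ approach the transform (4.13) as $L$ grows.

```python
import numpy as np

def coeff(nvals, L, npts=20001):
    """Fourier coefficients c_n of the Gaussian restricted to [-L/2, L/2]."""
    x = np.linspace(-L / 2, L / 2, npts)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)       # Gaussian with sigma = 1
    kn = 2 * np.pi * np.asarray(nvals) / L
    cn = np.array([np.sum(f * np.exp(-1j * k * x)) * dx / L for k in kn])
    return cn, kn

for L in [4.0, 8.0, 16.0]:
    cn, kn = coeff(range(0, 5), L)
    approx = L * cn / np.sqrt(2 * np.pi)             # Eq. (4.17)
    exact = np.exp(-kn**2 / 2) / np.sqrt(2 * np.pi)  # Eq. (4.13) with sigma = 1
    print(L, np.max(np.abs(approx - exact)))         # error shrinks as L grows
```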
Example: Consider the boxcar (4.4), but let us assume it is repeated periodically with period $L$ (we assume $a < L/2$). The Fourier coefficients are
$$c_n = \frac{1}{L}\int_{-L/2}^{L/2} B_a(x)\,e^{-ik_n x}\,dx = \frac{1}{2aL}\int_{-a}^{a} e^{-ik_n x}\,dx = \frac{\sin(k_n a)}{aLk_n},$$
so that, by Eq. (4.17), $\tilde f(k_n) = Lc_n/\sqrt{2\pi}$ coincides with the transform (4.5) evaluated at the discrete points $k_n$.
Real functions and other symmetries
The variable x is real, but the function f (x) can in principle be complex. Let us
see what happens if f (x) is real. We start with the Fourier transform (4.1) and take the
complex conjugate. But since f (x)∗ = f (x), we get
$$\tilde f(k)^* = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{ikx}\,dx. \qquad (4.18)$$
The only thing that changed is that e−ikx → eikx . So this would be the same as comput-
ing f˜(−k). Thus, we conclude that
$$\tilde f(k)^* = \tilde f(-k) \quad\text{when } f(x) \in \mathbb{R}. \qquad (4.19)$$
When the function is real, the Fourier coefficients may still be complex, in general. But they are not entirely arbitrary; instead, they satisfy this special symmetry.
What if f (x) is both real and even? This was the case of the Boxcar and Gaussian
examples we studied before. And, coincidentally, in both cases we found that f˜(k)
happened to be real. Is this general? We start with Eq. (4.18) and use the fact that
f (x) = f (−x):
$$\tilde f(k)^* = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(-x)\,e^{ikx}\,dx.$$
We now change variables to x0 = −x. There is a minus sign in the differential, dx0 =
−dx. But the integral is now from +∞ to −∞. So in the end we get a double sign
change, leading to
$$\tilde f(k)^* = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x')\,e^{-ikx'}\,dx' = \tilde f(k),$$
which is nothing but the original FT (4.1). Thus, if $f(x)$ is real and even, the FT will be real: $\tilde f(k)^* = \tilde f(k)$. Similarly, if $f(x)$ is real and odd, we use $f(-x) = -f(x)$ to get
$$\tilde f(k)^* = -\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(-x)\,e^{ikx}\,dx,$$
and repeating the same change of variables now yields $\tilde f(k)^* = -\tilde f(k)$.
Symmetries of the FT
• If f (x) is real, then f˜(k) will still be complex, but will satisfy f˜(k)∗ =
f˜(−k).
• If f (x) is real and even, the FT will be real, f˜(k)∗ = f˜(k).
• If f (x) is real and odd, the FT will be purely imaginary, f˜(k)∗ = − f˜(k).
$$\tilde f(\boldsymbol{k}, t) = \int\frac{d^3r}{(2\pi)^{3/2}}\,f(\boldsymbol{r}, t)\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}. \qquad (4.23)$$
Conversely, taking the FT in both $\boldsymbol{r}$ and $t$ would lead to a function $\tilde f(\boldsymbol{k}, \omega)$,
$$\tilde f(\boldsymbol{k}, \omega) = \int\frac{d^3r\,dt}{(2\pi)^{3/2}\sqrt{2\pi}}\,f(\boldsymbol{r}, t)\,e^{-i(\omega t + \boldsymbol{k}\cdot\boldsymbol{r})}, \qquad (4.24)$$
where ω is the Fourier variable associated to t, as in Eq. (4.2). We could have called it
a 4D vector k = (k1 , k2 , k3 , k4 ), but as I mentioned above, we like to distinguish time
and space.
In this case, it is also customary to define the time part with an opposite sign: as
we discussed around Eq. (4.15), it does not matter if we use eiωt or e−iωt in the FT. It is
just a matter of convention. So usually, when we have to take the FT in both time and
space, we would define it as
$$\tilde f(\boldsymbol{k}, \omega) = \int\frac{d^3r\,dt}{(2\pi)^{3/2}\sqrt{2\pi}}\,f(\boldsymbol{r}, t)\,e^{i(\omega t - \boldsymbol{k}\cdot\boldsymbol{r})}. \qquad (4.25)$$
That is, with e+iωt but e−ik·r . We don’t actually have to do this. We only do it be-
cause, as we saw in Sec. 3.9, plane waves usually have the form ei(ωt−k·r) , so Eq. (4.25)
naturally looks like an expansion in plane waves.
$$e^{ikx} = 1 + (ik)x + \frac{(ik)^2}{2!}x^2 + \frac{(ik)^3}{3!}x^3 + \dots$$
Plugging this in Eq. (4.27) and using the fact that the average is a linear operation, $\langle A + B\rangle = \langle A\rangle + \langle B\rangle$, we get
$$G(k) = 1 + (ik)\langle x\rangle + \frac{(ik)^2}{2!}\langle x^2\rangle + \frac{(ik)^3}{3!}\langle x^3\rangle + \dots = \sum_{n=0}^{\infty}\frac{(ik)^n}{n!}\,\langle x^n\rangle. \qquad (4.28)$$
Thus, we see that if we do a series expansion of G(k), the coefficient multiplying kn
will be proportional to hxn i.
So suppose you started with a certain P(x) and succeeded in finding G(k). Now we
expand it in a power series as
$$G(k) = \sum_{n=0}^{\infty}\frac{c_n k^n}{n!}, \qquad (4.29)$$
where
$$c_n = \frac{d^nG}{dk^n}\bigg|_{k=0}. \qquad (4.30)$$
Comparing (4.28) and (4.29) we can then conclude that $c_n = i^n\langle x^n\rangle$. Or, what is equivalent:
$$\langle x^n\rangle = \frac{1}{i^n}\,\frac{d^nG}{dk^n}\bigg|_{k=0}. \qquad (4.31)$$
We therefore obtain the moments by differentiating G(k). This is infinitely easier than
the integral (4.26).
Example: For the exponential distribution, $P(x) = \lambda e^{-\lambda x}$ with $x \ge 0$, the characteristic function is
$$G(k) = \lambda\int_0^{\infty} e^{-(\lambda - ik)x}\,dx = \lambda\,\frac{e^{-(\lambda-ik)x}}{-(\lambda-ik)}\bigg|_0^{\infty} = \frac{\lambda}{\lambda - ik},$$
where the term evaluated at ∞ vanished since λ > 0. We now use the Geometric series
$$\frac{1}{1-a} = 1 + a + a^2 + a^3 + a^4 + \dots,$$
which leads to
$$G(k) = \frac{1}{1 - ik/\lambda} = 1 + (ik/\lambda) + (ik/\lambda)^2 + (ik/\lambda)^3 + \dots$$
Comparing this with (4.28) we then find that
$$\frac{\langle x^n\rangle}{n!} = \frac{1}{\lambda^n} \quad\Longrightarrow\quad \langle x^n\rangle = \frac{n!}{\lambda^n}.$$
This therefore determines a neat and compact formula for all moments of the exponen-
tial distribution.
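The same computation can be done symbolically in a couple of lines (Python/SymPy sketch; assumed tooling, not part of the notes): differentiate $G(k)$ as in Eq. (4.31) and compare with $n!/\lambda^n$.

```python
import sympy as sp

k = sp.Symbol('k', real=True)
lam = sp.Symbol('lambda', positive=True)
G = lam / (lam - sp.I * k)        # characteristic function of the exponential

for n in range(1, 5):
    moment = sp.simplify(sp.diff(G, k, n).subs(k, 0) / sp.I**n)   # Eq. (4.31)
    print(n, moment, sp.factorial(n) / lam**n)                    # the two agree
```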
Example: Consider the Gaussian distribution
$$P(x) = \frac{e^{-x^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}. \qquad (4.33)$$
The Fourier transform √ was already computed in Eq. (4.13). We just need to adjust it
slightly because
√ of the 2π. In that occasion, we were using the definition (4.1), which
divided by 2π. The characteristic function (4.27) does not. Hence,
$$G(k) = e^{-k^2\sigma^2/2} = \sum_{n=0}^{\infty}\frac{(-1)^n k^{2n}\sigma^{2n}}{2^n\, n!} = \sum_{n=0}^{\infty}\frac{(ik)^{2n}\sigma^{2n}}{2^n\, n!}.$$
To make this look like (4.28), I introduced in the second equality a fake “i”, by writing
(−1)n k2n = (ik)2n . The resulting series contain only even powers in k. Hence, we can
immediately conclude that all odd moments must vanish, hx2n+1 i = 0. As for the even
moments, if we multiply and divide by (2n)! we get
$$G(k) = \sum_{n=0}^{\infty}\frac{(2n)!\,\sigma^{2n}}{2^n\, n!}\,\frac{(ik)^{2n}}{(2n)!}.$$
This is now exactly the even part of the series (4.28), so that we can recognize
$$\langle x^{2n}\rangle = \frac{(2n)!}{2^n\, n!}\,\sigma^{2n}. \qquad (4.34)$$
In particular, when n = 1 we obtain the second moment hx2 i = σ2 . Usually σ2 is the
variance hx2 i − hxi2 , but in this case hxi = 0.
This is the analog of the dot product, but for functions instead of vectors. In particular,
the inner product of f with itself is
$$(f, f) = \int_{-\infty}^{\infty}|f(x)|^2\,dx. \qquad (4.36)$$
Clearly, this quantity is non-negative and will be zero if and only if f (x) is zero. Func-
tions which are such that ( f, f ) is finite are said to be square integrable. Intuitively
speaking, square integrable functions are those that decay sufficiently fast as x → ±∞;
that is, that are essentially “confined” within some finite region of space (and hence
vanish at x → ±∞).
Using the definition of the Inverse FT, Eq. (4.3), we can also write (4.35) as
$$(f, g) = \int_{-\infty}^{\infty}dx\int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde f^*(k)\,e^{-ikx}\int_{-\infty}^{\infty}\frac{dk'}{\sqrt{2\pi}}\,\tilde g(k')\,e^{ik'x} = \int dk\,dk'\;\tilde f^*(k)\,\tilde g(k')\int_{-\infty}^{\infty}\frac{dx}{2\pi}\,e^{i(k'-k)x}.$$
All I did here was to change the order of the integrals, so that we can first integrate
over x. This is convenient since the resulting integral is nothing but the δ-function
representation (4.10):
$$\int_{-\infty}^{\infty}\frac{dx}{2\pi}\,e^{i(k'-k)x} = \delta(k - k').$$
The $\delta$ function then kills the $k'$ integral, and we are left with
$$(f, g) = \int_{-\infty}^{\infty} f^*(x)\,g(x)\,dx = \int_{-\infty}^{\infty}\tilde f^*(k)\,\tilde g(k)\,dk. \qquad (4.37)$$
This is known as Parseval’s relation. It means we can take the inner product in real
space or in Fourier space; it does not matter. In particular, if we take the inner product
of f (x) with itself, we are essentially computing the norm of the function. In this case
Eq. (4.37) yields
$$\int_{-\infty}^{\infty}|f(x)|^2\,dx = \int_{-\infty}^{\infty}|\tilde f(k)|^2\,dk. \qquad (4.38)$$
One may, in fact, show that the Fourier Transform maps the space of square integrable
functions onto itself, in a one-to-one manner. Eq. (4.38) corroborates this idea.
Figure 4.3: The Fourier Transform as a machine/blackbox. The input function f (x) is processed
by the transform F to output a new function f˜(k).
This is also an expansion in a basis: now the basis elements are $\delta(x-y)$ and the coefficients are $f(y)$. Both $e^{ikx}/\sqrt{2\pi}$ and $\delta(x-y)$ are valid choices of basis for the space of square integrable functions. And Parseval's identity (4.38) is
essentially saying that the choice of basis does not affect the inner product (just
like in the usual case of vectors).
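Here is a quick numerical illustration of Parseval's identity (4.38) (Python/NumPy sketch; my own addition, with arbitrary parameters): for the Gaussian (4.11) and its transform (4.13), the two integrals agree.

```python
import numpy as np

sigma = 0.7
x = np.linspace(-30, 30, 6001)
k = np.linspace(-30, 30, 6001)
dx, dk = x[1] - x[0], k[1] - k[0]

f  = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)   # Eq. (4.11)
ft = np.exp(-k**2 * sigma**2 / 2) / np.sqrt(2 * np.pi)                # Eq. (4.13)

# Both sides of (4.38); they agree (and equal 1/(2*sigma*sqrt(pi))).
print(np.sum(np.abs(f)**2) * dx, np.sum(np.abs(ft)**2) * dk)
```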
Indeed, F is a linear operator from the space of square integrable functions to itself.
Similarly, I will leave it for you as an exercise to show that
Or that
$$\mathcal{F}\big[f(x/a)\big] = a\,\tilde f(ak). \qquad (4.41)$$
Most importantly for us will be what happens when we take derivatives or integrals
of Fourier Transforms. Start with
$$f(x) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde f(k)\,e^{ikx}.$$
Differentiating both sides with respect to $x$ simply brings down a factor of $ik$ inside the integral:
$$f'(x) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\big(ik\,\tilde f(k)\big)\,e^{ikx}.$$
Thus, we see that if the FT of $f(x)$ is $\tilde f(k)$, then the FT of $f'(x)$ will be $ik\tilde f(k)$. In
Fourier space, derivatives are mapped into multiplication by ik:
$$\frac{d}{dx} \;\Longleftrightarrow\; ik. \qquad (4.42)$$
We can also see this in another way. Start with the FT of f 0 (x):
$$\mathcal{F}\big[f'(x)\big] = \int\frac{dx}{\sqrt{2\pi}}\,f'(x)\,e^{-ikx}.$$
Now we integrate by parts. This just transfers the derivative from $f'$ to $e^{-ikx}$, plus a boundary term:
$$\mathcal{F}\big[f'(x)\big] = \frac{1}{\sqrt{2\pi}}\,f(x)\,e^{-ikx}\Big|_{-\infty}^{\infty} - \int\frac{dx}{\sqrt{2\pi}}\,f(x)\,\frac{d}{dx}e^{-ikx}.$$
But if the function is square integrable, it must vanish at ±∞, so the boundary term
vanishes. Moreover, the d/dx in the last term simply produces a factor of −ik, so that
we are left with
$$\mathcal{F}\big[f'(x)\big] = ik\int\frac{dx}{\sqrt{2\pi}}\,f(x)\,e^{-ikx} = ik\,\tilde f(k),$$
which again shows that the FT of f 0 (x) is just ik f˜(k).
We may also repeat the process as many times as we want. For instance,
$$\mathcal{F}\big[f''(x)\big] = (ik)^2\,\tilde f(k).$$
Each time we differentiate, we simply get an extra ik. And the reverse logic holds for
the indefinite integral of f (x):
$$\mathcal{F}\Big[\int^x dx'\,f(x')\Big] = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\frac{\tilde f(k)}{ik}\,e^{ikx}.$$
When we integrate, we get instead 1/ik. These properties, as we will see in the next
section, will be the key for solving PDEs using Fourier Transforms.
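The rule (4.42) is also easy to test on a computer with the discrete FFT (Python/NumPy sketch below; my own addition). The DFT normalization differs from (4.1), but that does not matter here, since we only multiply by $ik$ and transform back.

```python
import numpy as np

N, Lbox = 1024, 40.0
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])   # wave numbers of the DFT grid

f = np.exp(-x**2)                                  # a smooth, localized test function
df_exact = -2 * x * np.exp(-x**2)

# differentiate in Fourier space: multiply the transform by ik, then invert
df_fft = np.fft.ifft(1j * k * np.fft.fft(f)).real

print(np.max(np.abs(df_fft - df_exact)))           # tiny (spectral accuracy)
```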
Convolutions
The convolution between two functions f (x) and g(x) is another function
$$(f * g)(x) = \int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y). \qquad (4.43)$$
This may seem like a weird way of combining two functions. But it turns out convolutions appear often in physics, especially in connection with differential equations and Green's functions. For instance, in Sec. 2.4 we saw how the particular solution of an inhomogeneous ODE $Ly = f(t)$ could be written as
$$y_p(t) = \int_{-\infty}^{\infty} G(t-t')\,f(t')\,dt',$$
where the Green’s function G(t) was the solution of LG(t) = δ(t). As you can see, this
is precisely a convolution.
The convolution (4.43) is actually symmetric in f and g:
$$(f * g)(x) = \int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y) = \int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,g(x-y)\,f(y) = (g * f)(x). \qquad (4.44)$$
To see that, we need to change variables in (4.43) to y0 = x − y. This makes dy0 = −dy,
but also flips the integration limits, so that in the end both changes cancel each other:
$$\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y) = -\int_{\infty}^{-\infty}\frac{dy'}{\sqrt{2\pi}}\,f(y')\,g(x-y') = \int_{-\infty}^{\infty}\frac{dy'}{\sqrt{2\pi}}\,g(x-y')\,f(y').$$
The most important property of convolutions is the convolution theorem:
$$\mathcal{F}\big[f * g\big] = \tilde f(k)\,\tilde g(k). \qquad (4.45)$$
To see why, we simply take the Fourier Transform (4.1) of f ∗g, using also the definition
of the convolution in (4.43)
$$\mathcal{F}\big[f * g\big] = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,e^{-ikx}\,(f * g)(x) = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,e^{-ikx}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,f(x-y)\,g(y).$$
We now do the following sorcery: we write e−ikx = e−ik(x−y) e−iky (woooow! Ninja!).
This allows us to rearrange the integrals and put the one over y to the left:
$$\mathcal{F}\big[f * g\big] = \int\frac{dy}{\sqrt{2\pi}}\,e^{-iky}\,g(y)\int\frac{dx}{\sqrt{2\pi}}\,e^{-ik(x-y)}\,f(x-y).$$
The reason why this is useful is because, if we now change variables to x0 = x − y in
the x integral, while keeping the y integral intact, we get
$$\mathcal{F}\big[f * g\big] = \int\frac{dy}{\sqrt{2\pi}}\,e^{-iky}\,g(y)\int\frac{dx'}{\sqrt{2\pi}}\,e^{-ikx'}\,f(x').$$
The two integrals are now completely factored. And, what is more, each integral is
nothing but the Fourier Transforms of g(x) and f (x). We therefore arrive at Eq. (4.45).
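Below is a small numerical check of the convolution theorem (4.45), with the $1/\sqrt{2\pi}$ convention of (4.43) (Python/NumPy sketch; my own addition, using two arbitrary test functions).

```python
import numpy as np

x = np.linspace(-25, 25, 2001)
dx = x[1] - x[0]
f = np.exp(-x**2)                      # two test functions
g = np.exp(-(x - 1)**2 / 2)

def ft(h, k):
    """Fourier transform (4.1) by direct Riemann sum."""
    return np.sum(h * np.exp(-1j * k * x)) * dx / np.sqrt(2 * np.pi)

def conv_at(x0):
    """Convolution (4.43), with its 1/sqrt(2 pi), evaluated at x0."""
    return np.sum(np.exp(-(x0 - x)**2) * g) * dx / np.sqrt(2 * np.pi)

k0 = 0.7                               # check Eq. (4.45) at one value of k
lhs = np.sum([conv_at(xi) * np.exp(-1j * k0 * xi) for xi in x]) * dx / np.sqrt(2 * np.pi)
rhs = ft(f, k0) * ft(g, k0)
print(abs(lhs - rhs))                  # should be very small
```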
defined for all x ∈ R. This is a Cauchy problem: we impose initial conditions, but
no boundary conditions. The best way of solving this is via Fourier Transforms. We
define the FT with respect only to the position x as
$$\tilde u(k, t) = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u(x, t)\,e^{-ikx}, \qquad (4.47)$$
which is therefore still a function of $t$. To take the FT of (4.46), we multiply both sides by $e^{-ikx}/\sqrt{2\pi}$ and integrate from $-\infty$ to $\infty$:
$$\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u_t(x, t)\,e^{-ikx} = \alpha\int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u_{xx}(x, t)\,e^{-ikx}. \qquad (4.48)$$
Heat in Fourier space
Combining Eqs. (4.49) and (4.50) in Eq. (4.48) we then finally find that
$$\partial_t\,\tilde u(k, t) = -\alpha k^2\,\tilde u(k, t) \quad\Longrightarrow\quad \tilde u(k, t) = e^{-\alpha k^2 t}\,\tilde u(k, 0), \qquad (4.51)$$
where ũ(k, 0) is the initial condition, which can be found directly from Eq. (4.47):
$$\tilde u(k, 0) = \int_{-\infty}^{\infty}\frac{dx}{\sqrt{2\pi}}\,u_0(x)\,e^{-ikx}. \qquad (4.52)$$
Moving to Fourier space makes solving the PDE very easy. But now we have to go
back to real space, by taking the inverse FT:
$$u(x, t) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde u(k, t)\,e^{ikx} = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,\tilde u(k, 0)\,e^{ikx - \alpha k^2 t}. \qquad (4.53)$$
To actually compute this integral, we need to know ũ(k, 0) or, what is equivalent, the
initial condition u0 (x).
Consider first the initial condition $u_0(x) = \delta(x - x_0)$.
If we think of u as a concentration, this then means that the problem started with u
being sharply concentrated on a certain point x0 . The corresponding initial condition
in Fourier space, Eq. (4.52), will be simply
$$\tilde u(k, 0) = \frac{e^{-ikx_0}}{\sqrt{2\pi}}.$$
Plugging this in Eq. (4.53) leads to
$$u(x, t) = \int_{-\infty}^{\infty}\frac{dk}{2\pi}\,e^{-\alpha k^2 t}\,e^{ik(x - x_0)}. \qquad (4.54)$$
Figure 4.4: The solution (4.55) of the 1D heat equation for a δ initial condition, with x0 = 0 and
different values of αt.
This is exactly the Fourier transform of the Gaussian; see, for instance, Eq. (4.14) with
$x$ replaced by $x - x_0$ and $\sigma^2 = 2\alpha t$. The result will thus be Eq. (4.11):
$$u(x, t) = \frac{1}{\sqrt{4\pi\alpha t}}\,e^{-\frac{(x-x_0)^2}{4\alpha t}}, \qquad t > 0. \qquad (4.55)$$
The initial δ-peak therefore evolves as a Gaussian, centered at position x0 but with a
growing variance σ2 = 2αt. That is, as time evolves the Gaussian spreads out. This
makes a lot of sense, when we think of u as a concentration. The variance of the
Gaussian is $\sigma^2 = 2\alpha t$, so that the standard deviation scales as
$$\sigma \sim \sqrt{t}. \qquad (4.56)$$
This type of spreading is usually taken as a trademark of diffusion. That is, when dealing with more general or complicated problems, we say that a problem is "diffusive" if $\sigma \sim \sqrt{t}$.
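The whole procedure (4.51)–(4.55) can be mimicked on a computer with the discrete FFT. The sketch below (Python/NumPy; my own addition, with arbitrary parameters) evolves a narrow Gaussian initial condition and compares with the exact answer: a Gaussian whose variance grows by $2\alpha t$.

```python
import numpy as np

alpha, t = 0.5, 2.0
N, Lbox = 2048, 80.0
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])

s0 = 0.4                                           # width of the initial Gaussian
u0 = np.exp(-x**2 / (2 * s0**2)) / np.sqrt(2 * np.pi * s0**2)

# heat equation in Fourier space: u~(k,t) = exp(-alpha k^2 t) u~(k,0), cf. (4.51)
u = np.fft.ifft(np.exp(-alpha * k**2 * t) * np.fft.fft(u0)).real

# analytic answer: a Gaussian whose variance grew from s0^2 to s0^2 + 2*alpha*t
s2 = s0**2 + 2 * alpha * t
u_exact = np.exp(-x**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

print(np.max(np.abs(u - u_exact)))                 # very small
```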
For a generic initial condition, we can plug (4.52) in (4.53), leading to
$$u(x, t) = \int_{-\infty}^{\infty}\frac{dk}{\sqrt{2\pi}}\,e^{ikx - \alpha k^2 t}\int_{-\infty}^{\infty}\frac{dy}{\sqrt{2\pi}}\,u_0(y)\,e^{-iky}.$$
Exchanging the order of the integrals, the integral over $k$ is exactly the $\delta$-function solution (4.54)–(4.55), with $x_0$ replaced by $y$. We thus find
$$u(x, t) = \int_{-\infty}^{\infty}dy\;u_0(y)\,\frac{1}{\sqrt{4\pi\alpha t}}\,e^{-\frac{(x-y)^2}{4\alpha t}}. \qquad (4.57)$$
The solution (4.55) of the δ initial condition is also called the Green's function of the heat equation:
$$G(x, t) = \frac{1}{\sqrt{4\pi\alpha t}}\,e^{-\frac{x^2}{4\alpha t}}\,\theta(t). \qquad (4.58)$$
With this definition, we see that the general solution (4.57) can be written as
$$u(x, t) = \int_{-\infty}^{\infty}dy\;u_0(y)\,G(x - y, t). \qquad (4.59)$$
We therefore see that from the Green’s function, we generate the solution to
any initial condition by taking the convolution of u0 (y) with G(x − y, t).
Green's functions and inhomogeneous PDEs
Define $\tilde G(k, t) = e^{-\alpha k^2 t}$. This is the FT of the Green's function (4.58) [c.f.
Eq. (4.54)]. With this we can then write the solution as
$$\tilde u(k, t) = \int_0^t dt'\;\tilde G(k, t - t')\,\tilde f(k, t'). \qquad (4.63)$$
This result is fairly cool: we have a PDE in x and t, but we only moved to
Fourier space with respect to x. As a result, we find that ũ(k, t) is a convolution
in time between G̃ and f˜. But as far as x and k are concerned, this is just a
product of the two Fourier transforms. Thus, according to the convolution the-
orem (4.45), if we now move back to real space, the result will be a convolution
in position between G and f (the integral in t0 does not interfere at all):
$$u(x, t) = \int_0^t dt'\int_{-\infty}^{\infty}dy\;G(x - y, t - t')\,f(y, t'). \qquad (4.64)$$
Isn’t this awesome?! The Green’s function propagates the solution. It says how
u(x, t) responds to the forcing f (y, t0 ), that occurred at previous times t0 and at
different positions y.
We haven’t yet proven our claim about the initial conditions, though. So far this is
general and holds for any inhomogeneity f . We now specialize it to f (x, t) = δ(t)δ(x −
x0 ). This kills both integrals in Eq. (4.64), leaving us precisely with u(x, t) = G(x −
x0 , t). Thus, indeed, what we are calling a Green’s function is actually a solution of two
different, but related problems:
• G(x, t) is the solution of ut − αu xx = 0, with u0 (x) = δ(x).
• G(x, t) is the solution of ut − αu xx = δ(t)δ(x), with u0 (x) = 0.
Notice also how, in both cases, we use G(x, t) to build more general solutions:
• The solution of ut − αu xx = 0, with arbitrary u0 (x) can be computed from
Eq. (4.59).
• The solution of ut − αu xx = f (x, t) with u0 (x) = 0 but arbitrary f (x, t) can be
computed from Eq. (4.64).
where r = (x1 , x2 , . . . , xd ), with d = 1, 2, 3, . . . being the dimension of the system (if
this confuses, imagine that d = 3 and (x1 , x2 , x3 ) = (x, y, z)). We now take the Fourier
Transform of the spatial coordinates, by defining
$$\tilde u(\boldsymbol{k}, t) = \int\frac{d^dr}{(2\pi)^{d/2}}\,u(\boldsymbol{r}, t)\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}, \qquad (4.66)$$
where dd r = dx1 dx2 . . . dxd . Taking the FT on both sides of (4.65) then leads to
$$\partial_t\,\tilde u(\boldsymbol{k}, t) = \alpha\int\frac{d^dr}{(2\pi)^{d/2}}\,(\nabla^2 u)\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}. \qquad (4.67)$$
Since $\nabla^2 u = \partial_{x_1}^2 u + \dots + \partial_{x_d}^2 u$, the right-hand side will involve a sum of terms. In each term, the same logic of the 1D case means we should replace $\partial_{x_j}$ by $ik_j$. Put differently, the substitution logic is now generalized to
$$\partial_{x_j} \;\Longleftrightarrow\; ik_j,$$
so that Eq. (4.67) becomes $\partial_t\,\tilde u(\boldsymbol{k}, t) = -\alpha k^2\,\tilde u(\boldsymbol{k}, t)$, with $k^2 = k_1^2 + \dots + k_d^2$,
whose solution is
$$\tilde u(\boldsymbol{k}, t) = e^{-\alpha k^2 t}\,\tilde u(\boldsymbol{k}, 0).$$
Things will start to change once we plug this back in the inverse FT:
$$u(\boldsymbol{r}, t) = \int\frac{d^dk}{(2\pi)^{d/2}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r} - \alpha k^2 t}\,\tilde u(\boldsymbol{k}, 0).$$
This integral is now more complicated because it is multidimensional. To learn how to
do it, we focus on the Green’s function case, where ũ(k, 0) = 1/(2π)d/2 . In this case
the solution will read
$$G(\boldsymbol{r}, t) = \int\frac{d^dk}{(2\pi)^{d}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r} - \alpha k^2 t}. \qquad (4.70)$$
If we can compute this integral, then the solution for a generic initial condition will be,
in analogy with Eq. (4.59),
$$u(\boldsymbol{r}, t) = \int d^dr'\;u_0(\boldsymbol{r}')\,G(\boldsymbol{r} - \boldsymbol{r}', t). \qquad (4.71)$$
Integrals of the form (4.70) can be very difficult. But today is actually our lucky
day, because this one in particular is super easy! What we have to realize is that k2 =
k12 + . . . + kd2 , so that the integral factors as a product:
$$G(\boldsymbol{r}, t) = \int\frac{dk_1}{2\pi}\,e^{ik_1x_1 - \alpha k_1^2 t}\int\frac{dk_2}{2\pi}\,e^{ik_2x_2 - \alpha k_2^2 t}\cdots\int\frac{dk_d}{2\pi}\,e^{ik_dx_d - \alpha k_d^2 t}.$$
Each integral is just a copy of the 1D Gaussian integral that led us from (4.54) to (4.55).
This is really an exercise in pattern matching: we don’t have to redo anything; just re-
cycle previous results. We therefore conclude that G(r, t) will be a product of solutions
of the form (4.55):
$$G(\boldsymbol{r}, t) = \frac{1}{(4\pi\alpha t)^{d/2}}\,e^{-x_1^2/4\alpha t}\,e^{-x_2^2/4\alpha t}\cdots e^{-x_d^2/4\alpha t}.$$
Finally, we can combine everything and write G solely in terms of r2 = x12 + . . . + xd2 :
$$G(\boldsymbol{r}, t) = \frac{1}{(4\pi\alpha t)^{d/2}}\,e^{-r^2/4\alpha t}. \qquad (4.72)$$
Diffusion in arbitrary dimensions is therefore also Gaussian. The concentration of a
particle in a river, or in the air, diffuses in all directions.
One interesting thing to notice about Eq. (4.72) is that the Green’s function depends
only on the magnitude of the position r = |r|. This is a consequence of the fact that
Eq. (4.65) is isotropic; that is, it is symmetric under rotations. We could also study
anisotropic diffusion: it would read something like $u_t = \alpha_1\partial_{x_1}^2 u + \dots + \alpha_d\partial_{x_d}^2 u$. That is, with different diffusion constants for each direction. I will leave it for you as
an exercise to think about how the Green’s function would change in this case.
is
$$u(\boldsymbol{r}, t) = \int d^dr'\;u_0(\boldsymbol{r}')\,G(\boldsymbol{r} - \boldsymbol{r}', t),$$
where
$$G(\boldsymbol{r}, t) = \frac{1}{(4\pi\alpha t)^{d/2}}\,e^{-r^2/4\alpha t}.$$
where Ĥ is called the Hamiltonian operator. In this section, to be more careful, I will
put hats on operators, so that we know they are not just plain numbers. In the simplest
case of a free particle in 1D, we have
$$\hat H = \frac{\hat p^2}{2m}, \qquad (4.74)$$
where m is the mass and
$$\hat p = -i\hbar\frac{\partial}{\partial x}, \qquad (4.75)$$
is the momentum operator. Eq. (4.73) then becomes
$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi}{\partial x^2}, \qquad (4.76)$$
which is similar to the heat equation (4.46), but with a complex left-hand side.
There is, however, a fundamental conceptual difference. The wavefunction ψ(x, t)
represents a probability amplitude (instead of a probability). The actual probability is $|\psi(x, t)|^2$. Thus, for instance, the average position of the particle is given by
$$\langle x\rangle = \int dx\;|\psi(x, t)|^2\,x. \qquad (4.77)$$
The order of terms inside the integral matters: Ô is a differential operator and therefore
acts on anything to its right. So the integrand in (4.80) means that we must first act
with Ô on ψ and then multiply the result by ψ∗ . For instance, the average momentum
is
$$\langle\hat p\rangle = -i\hbar\int dx\;\psi^*\,\frac{\partial\psi}{\partial x}, \qquad (4.81)$$
and the average momentum squared is
$$\langle\hat p^2\rangle = -\hbar^2\int dx\;\psi^*\,\frac{\partial^2\psi}{\partial x^2}. \qquad (4.82)$$
Again, we use these to compute the variance of the momentum,
$$\sigma_p^2 = \langle\hat p^2\rangle - \langle\hat p\rangle^2, \qquad (4.83)$$
which measures overall fluctuations of the momentum. Similarly, we can compute the
average kinetic energy
$$\langle\hat H\rangle = \frac{\langle\hat p^2\rangle}{2m}. \qquad (4.84)$$
The position is also an operator x̂, but is one whose effect is kind of trivial: applying x̂
on ψ is the same as multiplying ψ by the number x.
In classical mechanics we describe systems in terms of both position and momen-
tum. In quantum mechanics the wavefunction is only a function of x, while momentum
is upgraded to an operator. We then extract information about the momentum by taking
averages such as (4.80).
Let us now insert the inverse FT of $\psi$ into Eq. (4.81):
$$\langle\hat p\rangle = -i\hbar\int dx\left(\int\frac{dk'}{\sqrt{2\pi}}\,e^{-ik'x}\,\tilde\psi^*(k', t)\right)\frac{\partial}{\partial x}\left(\int\frac{dk}{\sqrt{2\pi}}\,e^{ikx}\,\tilde\psi(k, t)\right).$$
The derivative ∂/∂x acts only on eikx , leaving us with
$$\langle\hat p\rangle = -i\hbar\int\frac{dx\,dk'\,dk}{2\pi}\,e^{-ik'x}\,\tilde\psi^*(k', t)\,(ik)\,e^{ikx}\,\tilde\psi(k, t) = \int dk\,dk'\;(\hbar k)\,\tilde\psi^*(k', t)\,\tilde\psi(k, t)\int\frac{dx}{2\pi}\,e^{i(k-k')x}$$
$$= \int dk\,dk'\;(\hbar k)\,\tilde\psi^*(k', t)\,\tilde\psi(k, t)\,\delta(k - k') = \int dk\;(\hbar k)\,|\tilde\psi(k, t)|^2.$$
We therefore reach the important conclusion that, if we work in Fourier space, the
average momentum (4.81) becomes simply
$$\langle\hat p\rangle = \int dk\;(\hbar k)\,|\tilde\psi(k, t)|^2. \qquad (4.86)$$
This looks exactly like the average position (4.77). There are no derivatives. We simply average $\hbar k$ over all $\tilde\psi$. Thus, $|\tilde\psi(k, t)|^2$ can be directly recognized as the probability for finding the particle with momentum $\hbar k$, just like $|\psi(x, t)|^2$ is the probability of finding it
at position x. Moving to Fourier space is therefore the same as moving to momentum
space: k and momentum are just an ~ apart.
Gaussian wavepacket
Consider a Gaussian wavefunction
$$\psi(x) = \frac{1}{(2\pi\sigma^2)^{1/4}}\,e^{iqx - (x-x_0)^2/4\sigma^2}, \qquad (4.87)$$
where x0 , σ and q are parameters. We are not worrying about any possible time depen-
dence. You can think of this as being the state of the system at some fixed time. We
could use this as an initial condition for Eq. (4.76) and it would then start to evolve
as time goes on. But for now let us just think about the properties of this state, at one
given instant of time.
The constant in front of (4.87) is chosen so that the wavefunction is properly nor-
malized [compare with Eq. (4.12)],
$$\int dx\;|\psi|^2 = \frac{1}{(2\pi\sigma^2)^{1/2}}\int dx\;e^{-(x-x_0)^2/2\sigma^2} = 1. \qquad (4.88)$$
A repetition of the Gaussian integrals we have already done quite a few times will show
that
$$\langle x\rangle = x_0, \qquad \langle x^2\rangle = x_0^2 + \sigma^2, \qquad (4.89)$$
so that the x0 and σ2 are directly interpreted as the average position and the variance
of the Gaussian wavepacket: σ x = σ.
But what about the parameter q? It turns out it is related to momentum. Applying
the momentum operator to ψ, we find, with some small simplifications,
$$\hat p\,\psi = -i\hbar\left(iq - \frac{x - x_0}{2\sigma^2}\right)\psi. \qquad (4.90)$$
We now use this in Eq. (4.81); that is, we multiply by ψ∗ and integrate, which leads to
$$\langle\hat p\rangle = -i\hbar\int dx\left(iq - \frac{x - x_0}{2\sigma^2}\right)|\psi|^2.$$
The second term will be an average of $x - x_0$, which is zero since $\langle x\rangle = x_0$. In the first term, on the other hand, we can put $iq$ outside the integral. All that remains is therefore $|\psi|^2$, which is normalized to 1. Whence, we finally arrive at
$$\langle\hat p\rangle = \hbar q. \qquad (4.91)$$
The parameter q in Eq. (4.87) is therefore directly related to the average momentum.
What about the variance of p̂? To find that, we first compute the second moment.
Applying p̂2 to ψ is a little bit messier, but I will just tell you the result:
$$\hat p^2\psi = -\hbar^2\left\{\frac{(x - x_0)^2}{4\sigma^4} - q^2 - \frac{iq}{\sigma^2}(x - x_0) - \frac{1}{2\sigma^2}\right\}\psi.$$
We insert this in Eq. (4.80). The term which is quadratic in (x − x0 )2 , when averaged,
yields exactly σ2 . And the term linear in x − x0 integrates to zero. Whence, we find
$$\langle\hat p^2\rangle = -\hbar^2\left(\frac{1}{4\sigma^2} - q^2 - \frac{1}{2\sigma^2}\right),$$
or, simplifying a bit,
$$\langle\hat p^2\rangle = \hbar^2 q^2 + \frac{\hbar^2}{4\sigma^2}.$$
The variance in momentum, $\sigma_p^2 = \langle\hat p^2\rangle - \langle\hat p\rangle^2$, will thus be
$$\sigma_p^2 = \frac{\hbar^2}{4\sigma^2}. \qquad (4.92)$$
As one might expect, we see here the same kind of trade-off we saw in the Fourier
business: a large variance σ2 in position implies a small variance ~2 /4σ2 in momen-
tum, and vice-versa. This trade-off is neatly summarized by computing the uncertainty
product:
$$\sigma_x\,\sigma_p = \frac{\hbar}{2}. \qquad (4.93)$$
The product of the two uncertainties is constant. So if we want to decrease one, we must increase the other.
Gaussian wavepackets
For the Gaussian wavepacket (4.87),
$$\langle x\rangle = x_0, \qquad \sigma_x = \sigma, \qquad \langle\hat p\rangle = \hbar q, \qquad \sigma_p = \frac{\hbar}{2\sigma},$$
so that $\sigma_x\sigma_p = \hbar/2$. I will leave it for you as an exercise to show that the Fourier Transform is
$$\tilde\psi(k) = \left(\frac{2\sigma^2}{\pi}\right)^{1/4} e^{-i(k-q)x_0 - (k-q)^2\sigma^2},$$
which is itself a Gaussian in $k$, centered at $q$. For a general wavefunction $\psi$, the uncertainty product is instead bounded from below by Heisenberg's uncertainty principle,
$$\sigma_x\,\sigma_p \ge \frac{\hbar}{2}. \qquad (4.94)$$
The principle states that there is a lower bound to the uncertainty product. We can never
know both x and p with infinite precision. The Gaussian wavepacket is a limiting case,
where the bound is saturated. And even in this case, infinite precision on x would imply
infinite ignorance on p, and vice-versa. For other states ψ, we always get something
larger.
We are going to prove Eq. (4.94). But the first thing to realize is that Heisenberg’s
uncertainty is not only a quantum thing: it is actually a direct consequence of Fourier
analysis and even has important applications in, e.g., signal processing. In fact, it can
be essentially summarized by the statement that a function and its Fourier transform
cannot both be sharply localized. Our proof emphasizes this connection. Actually, to
really emphasize it, I am going to call our function f (x) and its Fourier transform f˜(k).
That way we can be sure that it holds for any square-integrable function. We assume, without loss of generality, that $f(x)$ is normalized, $\int|f(x)|^2dx = 1$, and we will denote by $\langle x^2\rangle = \int x^2|f(x)|^2dx$ and $\langle k^2\rangle = \int k^2|\tilde f(k)|^2dk$ the corresponding second moments.
We can also suppose, again without loss of generality, that the means vanish, hxi =
hki = 0.2 The basic idea, therefore, is to compare the width of f (x) in real space, with
the width of f˜(k) in Fourier space. This is the essence of Heisenberg’s principle.
To start, consider the integral
$$I(\lambda) = \int\Big|\lambda x f(x) + \partial_x f(x)\Big|^2\,dx \;\ge\; 0. \qquad (4.95)$$
Expanding the square, we get
$$I(\lambda) = \lambda^2\int x^2|f|^2\,dx + \lambda\int x\big(f^*\partial_x f + (\partial_x f^*)f\big)\,dx + \int|\partial_x f|^2\,dx. \qquad (4.96)$$
The first integral is simply hx2 i. In the second integral, we integrate by parts one of the
terms. Assuming all boundary contributions vanish, we get only
$$\int(\partial_x f^*)\,x f\,dx = -\int f^*\,\partial_x(x f)\,dx = -\int|f|^2\,dx - \int x f^*\,\partial_x f\,dx.$$
The last term is going to cancel a corresponding term in Eq. (4.96), while the first is 1
by normalization. Thus, we are left with
$$I(\lambda) = \lambda^2\langle x^2\rangle - \lambda + \int|\partial_x f|^2\,dx.$$
$^2$ If the means don't vanish, define a new function $F(x) = f(x + x_0)\,e^{-ik_0x}$, where $x_0 = \langle x\rangle$ and $k_0 = \langle k\rangle$. Then one may verify that $\langle x^2\rangle_F = \langle(x - x_0)^2\rangle_f$ and $\langle k^2\rangle_F = \langle(k - k_0)^2\rangle_f$, where $\langle\dots\rangle_F$ means an average over $F$ (and $\tilde F$) instead of $f$.
Finally, we play with the last term. Recall Parseval's identity:
$$\int|f(x)|^2\,dx = \int|\tilde f(k)|^2\,dk.$$
Moreover, recall that if f˜ is the FT of f , then the FT of ∂ x f will be ik f˜. Thus, Parseval’s
identity for f 0 must yield
$$\int|\partial_x f|^2\,dx = \int\big|ik\tilde f(k)\big|^2\,dk = \langle k^2\rangle.$$
If this
R Parseval argument did not convince you, try plugging the definition of the FT
into |∂ x f |2 dx and show that it is indeed hk2 i. In any case, with this last result, we
finally get
$$I(\lambda) = \lambda^2\langle x^2\rangle - \lambda + \langle k^2\rangle. \qquad (4.97)$$
We started with I(λ) > 0, so this must of course continue to be true. But we can also
look at I(λ) as being a quadratic polynomial in λ. The condition to have I(λ) > 0 is for
this polynomial to have no real roots. The discriminant must thus be non-positive:$^3$
$$1 - 4\langle x^2\rangle\langle k^2\rangle \le 0. \qquad (4.98)$$
Whence,
$$\langle x^2\rangle\langle k^2\rangle \ge \frac{1}{4}. \qquad (4.99)$$
This is Heisenberg’s uncertainty solely from Fourier analysis. We can recover the quan-
tum result (4.94) by noticing that h p̂2 i = ~2 hk2 i.
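Equation (4.99) is easy to probe numerically (Python/NumPy sketch; my own addition): compute $\langle x^2\rangle$ and $\langle k^2\rangle = \int|\partial_x f|^2dx$ for a normalized $f$. A Gaussian saturates the bound, while other functions exceed it.

```python
import numpy as np

x = np.linspace(-40, 40, 16001)
dx = x[1] - x[0]

def uncertainty_product(f):
    f = f / np.sqrt(np.sum(np.abs(f)**2) * dx)     # normalize f
    x2 = np.sum(x**2 * np.abs(f)**2) * dx          # <x^2>
    df = np.gradient(f, dx)
    k2 = np.sum(np.abs(df)**2) * dx                # <k^2> = int |f'|^2 dx
    return x2 * k2

print(uncertainty_product(np.exp(-x**2 / 4)))      # Gaussian: ~0.25 (bound saturated)
print(uncertainty_product(1 / np.cosh(x)))         # larger than 0.25
```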
$$i\hbar\,\partial_t\tilde\psi = E_k\,\tilde\psi, \qquad E_k := \frac{\hbar^2k^2}{2m}. \qquad (4.100)$$
The solution is just ψ̃(k, t) = e−iEk t/~ ψ̃(k, 0). This motivates us to define the Fourier-
space Green’s function
$$\tilde G(k, t) = e^{-iE_k t/\hbar}. \qquad (4.101)$$
The Fourier space solution then becomes
$$\tilde\psi(k, t) = \tilde G(k, t)\,\tilde\psi(k, 0). \qquad (4.102)$$
Figure 4.5: Solution of Eq. (4.105) for different times, with an initial wavepacket ψ(x, 0) taken
as a boxcar between [−1, 1] (dashed curve). The curves are plotted with m = ~ = 1.
This is therefore the product of two Fourier Transforms, G̃(k, t) and ψ̃(k, 0).
To go back to real space, we now take the inverse FT. We can skip a bit of work
here using the convolution theorem (4.45). Since $\tilde\psi(k, t)$ is the product of two FTs, then
ψ(x, t) must be the convolution of ψ(x, 0) with4
$$G(x, t) = \int\frac{dk}{2\pi}\,e^{i(kx - E_k t/\hbar)}, \qquad (4.103)$$
which is the FT of $\tilde G(k, t)$. This is the same Gaussian integral that led us to (4.58), but with $\alpha = i\hbar/2m$. We thus find
$$G(x, t) = \sqrt{\frac{m}{2\pi i\hbar t}}\;e^{-mx^2/2i\hbar t}\,\theta(t). \qquad (4.104)$$
2πi~t
The final solution is thus
$$\psi(x, t) = \int_{-\infty}^{\infty}dy\;G(x - y, t)\,\psi(y, 0). \qquad (4.105)$$
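The evolution in Fig. 4.5 can be reproduced with the discrete FFT, which implements Eqs. (4.100)–(4.102) directly (Python/NumPy sketch; my own addition, with $\hbar = m = 1$ and a boxcar initial wavepacket).

```python
import numpy as np

hbar, m = 1.0, 1.0
N, Lbox = 4096, 60.0
x = np.linspace(-Lbox / 2, Lbox / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
Ek = hbar**2 * k**2 / (2 * m)                  # dispersion relation (4.100)

# boxcar wavepacket between [-1, 1], normalized to 1
psi0 = np.where(np.abs(x) < 1.0, 1 / np.sqrt(2), 0.0).astype(complex)

def evolve(t):
    """psi(x,t): multiply psi~(k,0) by exp(-i E_k t / hbar), then invert; cf. (4.102)."""
    return np.fft.ifft(np.exp(-1j * Ek * t / hbar) * np.fft.fft(psi0))

for t in [0.0, 0.5, 1.0]:
    psi = evolve(t)
    print(t, np.sum(np.abs(psi)**2) * dx)      # the norm is conserved (~1)
```

Plotting $|\psi(x,t)|^2$ for these times gives exactly the kind of spreading, rippled profiles shown in Fig. 4.5.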
where ρ(r) is the inhomogeneous term, sometimes also called the external source. This
equation can be defined in any dimension, and the minus sign is placed only for convenience (it could be absorbed into ρ).
Poisson’s equation appears often in electrostatics, for instance. Consider a region
of space containing a certain charge density ρ(r). Gauss’ law states that the electric
field E(r) generated by this density will satisfy
$$\nabla\cdot\boldsymbol{E} = \rho/\epsilon_0. \qquad (4.107)$$
Writing $\boldsymbol{E} = -\nabla\phi$ in terms of the electrostatic potential $\phi$, this becomes
$$-\nabla^2\phi = \rho/\epsilon_0. \qquad (4.108)$$
To solve Poisson's equation, we take the FT of $\phi(\boldsymbol{r})$,
$$\tilde\phi(\boldsymbol{k}) = \int\frac{d^3r}{(2\pi)^{3/2}}\,\phi(\boldsymbol{r})\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}}. \qquad (4.109)$$
Each ∇ is converted into ik, so that −∇2 is then converted into k2 = |k|2 . Thus,
Eq. (4.106) in Fourier space becomes
$$k^2\,\tilde\phi = \tilde\rho, \qquad (4.110)$$
where ρ̃ is the FT of ρ(r), and is defined exactly as in (4.109). In Fourier space the
solution is thus trivial,
$$\tilde\phi = \frac{\tilde\rho}{k^2}. \qquad (4.111)$$
We now go back to real space and write
$$\phi(\boldsymbol{r}) = \int\frac{d^3k}{(2\pi)^{3/2}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r}}\,\tilde\phi(\boldsymbol{k}) = \int\frac{d^3k}{(2\pi)^{3/2}}\,e^{i\boldsymbol{k}\cdot\boldsymbol{r}}\,\frac{\tilde\rho(\boldsymbol{k})}{k^2} = \int\frac{d^3k}{(2\pi)^{3/2}}\,\frac{e^{i\boldsymbol{k}\cdot\boldsymbol{r}}}{k^2}\int\frac{d^3r'}{(2\pi)^{3/2}}\,e^{-i\boldsymbol{k}\cdot\boldsymbol{r}'}\,\rho(\boldsymbol{r}').$$
Rearranging the integrals allows us to identify the Green's function, just like before:
$$\phi(\boldsymbol{r}) = \int d^3r'\,\rho(\boldsymbol{r}')\int\frac{d^3k}{(2\pi)^{3}}\,\frac{e^{i\boldsymbol{k}\cdot(\boldsymbol{r}-\boldsymbol{r}')}}{k^2} = \int d^3r'\,\rho(\boldsymbol{r}')\,G(\boldsymbol{r} - \boldsymbol{r}'), \qquad (4.112)$$
where
$$G(\boldsymbol{r}) = \int\frac{d^3k}{(2\pi)^{3}}\,\frac{e^{i\boldsymbol{k}\cdot\boldsymbol{r}}}{k^2}. \qquad (4.113)$$
Figure 4.6: The convenient choice of axes for carrying out the k-integral in Eq. (4.113).
Eq. (4.112) is a convolution of the Green’s function with the external source. Ac-
cording to the convolution theorem, in Fourier space φ̃(k) should then simply be a
product of the Fourier Transforms of ρ and G. And, indeed, this is exactly what we see
in Eq. (4.111): The Fourier Transform of G(r) is (being sloppy with 2π’s) G̃(k) = 1/k2 ,
which can be read off directly from (4.113). So Eq. (4.111) is indeed nothing but
φ̃(k) = ρ̃(k)G̃(k).
Before we compute G(r), it is fun to realize that you actually already know the
answer; you saw it in your introductory electromagnetism lectures. Eq. (4.114) must
describe (except for a factor of 1/0 ) the electrostatic potential produced by a point
charge (of magnitude $q = 1$) at position $\boldsymbol{r} = 0$. Thus, $G(r)$ must be given by Coulomb's law:
$$G(r) = \frac{1}{4\pi r}. \qquad (4.115)$$
Of course, numerical factors like 4π are harder to predict. But what is important is that
G ∝ 1/r. Let us now compute the integral in Eq. (4.113) and indeed show that this is
the case.
Integrals of the form (4.114) appear often in this Fourier business. This is a volume
integral, over d3 k = dk x dky dkz . You are probably used to doing volume integrals in real
space. This is the same thing, but in Fourier space. And, in fact, the real space position
r is just a parameter (as far as the integral is concerned), so we must, in principle,
compute a new integral for each value of r we plug in. The trick to evaluate G is to
realize that we have some freedom in how we choose the orientation of the k reference
frame, for a given r. This is illustrated in Fig. 4.6. For a given r, we can always
reorient the k axis and choose it so that kz is parallel to r.
We then move to spherical coordinates (in k-space), by defining d3 k = k2 sin θdkdθdϕ.
Since we chose r to be parallel to kz , it then follows that k · r = kr cos θ. The integral
in (4.113) then becomes
$$G(\boldsymbol{r}) = \frac{1}{(2\pi)^3}\int_0^{\infty}dk\,k^2\int_0^{\pi}d\theta\,\sin\theta\int_0^{2\pi}d\varphi\;\frac{e^{ikr\cos\theta}}{k^2}. \qquad (4.116)$$
The integral in ϕ is trivial and gives 2π. For the other 2, we first compute the integral
in θ and then the one in k. That is, we write this as
$$G(\boldsymbol{r}) = \frac{2\pi}{(2\pi)^3}\int_0^{\infty}dk\int_0^{\pi}d\theta\,\sin\theta\,e^{ikr\cos\theta}.$$
To compute the θ integral, we define z = cos θ. This yields dz = − sin θ dθ. Moreover,
the integration limits are now z = 1 when θ = 0 and z = −1 when θ = π. In fact, the
following transformation rule is useful to know:
$$\int_0^{\pi}d\theta\,\sin\theta\,f(\cos\theta) = \int_{-1}^{1}dz\,f(z). \qquad (4.117)$$
As a consequence, we get
$$\int_0^{\pi}d\theta\,\sin\theta\,e^{ikr\cos\theta} = \int_{-1}^{1}dz\,e^{ikrz} = \frac{e^{ikr} - e^{-ikr}}{ikr}.$$
Whence,
$$G(\boldsymbol{r}) = \frac{1}{(2\pi)^2 r}\int_0^{\infty}dk\,\frac{e^{ikr} - e^{-ikr}}{ik} = \frac{1}{2\pi^2 r}\int_0^{\infty}dk\,\frac{\sin kr}{k}.$$
Computing this integral is slightly tricky. One way to do it is using residues and com-
plex integration. Alternatively, we can get it as a particular case of an integral that will
appear in the problem set. In any case, the result turns out to be
$$\int_0^{\infty}dk\,\frac{\sin kr}{k} = \frac{\pi}{2}. \qquad (4.118)$$
Green's function for Poisson's equation
Combining the last two results, we find $G(r) = \frac{1}{2\pi^2 r}\cdot\frac{\pi}{2} = \frac{1}{4\pi r}$, exactly as anticipated in (4.115). The solution (4.112) of Poisson's equation can therefore be written as
$$\phi(\boldsymbol{r}) = \int d^3r'\,\frac{\rho(\boldsymbol{r}')}{4\pi|\boldsymbol{r} - \boldsymbol{r}'|}. \qquad (4.119)$$
Notice also how the Green’s function depends only on the magnitude r. This is a
consequence of the fact that −∇2 is an isotropic differential operator.
Example: a thin rod of length 2L. Suppose the external source is generated by a
thin rod of length 2L, displaced along the x axis, from −L to L. In this case we get
$$\rho(\boldsymbol{r}') = \rho_0\,\delta(y')\,\delta(z')\,\theta(x' + L)\,\theta(L - x'), \qquad (4.121)$$
where $\rho_0$ is the magnitude of the source density. That is, two deltas in $y$ and $z$, plus a boxcar in $x$. When plugging this in Eq. (4.119), it is convenient to write $|\boldsymbol{r} - \boldsymbol{r}'| = \sqrt{(x-x')^2 + (y-y')^2 + (z-z')^2}$. Since the δ's kill the integrals in $y'$ and $z'$, we are then left only with
$$\phi(\boldsymbol{r}) = \frac{\rho_0}{4\pi}\int_{-L}^{L}dx'\,\frac{1}{\sqrt{x'^2 - 2xx' + r^2}},$$
where I wrote $(x - x')^2 + y^2 + z^2$ as $x'^2 - 2xx' + r^2$. Using that
$$\int\frac{dx'}{\sqrt{x'^2 - 2xx' + r^2}} = \ln\Big(x' - x + \sqrt{x'^2 - 2xx' + r^2}\Big),$$
we then arrive at
$$\phi(\boldsymbol{r}) = \frac{\rho_0}{4\pi}\ln\left\{\frac{L - x + \sqrt{L^2 - 2xL + r^2}}{-L - x + \sqrt{L^2 + 2xL + r^2}}\right\}. \qquad (4.122)$$
In particular, if we are at x = 0, this simplifies to
$$\phi(0, y, z) = \frac{\rho_0}{4\pi}\ln\left\{\frac{L + \sqrt{L^2 + r^2}}{-L + \sqrt{L^2 + r^2}}\right\}.$$
We can also analyze what happens if the rod is very large. To do that, we first rewrite
this as
$$\phi(0, y, z) = \frac{\rho_0}{4\pi}\ln\left\{\frac{1 + \sqrt{1 + (r/L)^2}}{-1 + \sqrt{1 + (r/L)^2}}\right\}.$$
Figure 4.7: A function $f(t)$ of period $T$ (infrared), discretized in small steps $\Delta t$ (ultraviolet).
√
When $L \gg r$, we can then expand $\sqrt{1 + \epsilon} \simeq 1 + \epsilon/2$ with $\epsilon = (r/L)^2$, to find
$$\phi(0, y, z) \simeq \frac{\rho_0}{4\pi}\ln\left\{\frac{2 + (r/L)^2/2}{(r/L)^2/2}\right\} = \frac{\rho_0}{4\pi}\ln\left(1 + \frac{4L^2}{r^2}\right).$$
The last term is now much larger than 1, so that we can also write
$$\phi(0, y, z) \simeq \frac{\rho_0}{4\pi}\ln\big(4L^2/r^2\big) = \frac{\rho_0}{2\pi}\ln\big(2L/r\big). \qquad (4.123)$$
We could also split the log into ln(2L) and − ln(r). The first term is in principle infinite
if L → ∞. But this is not unphysical since this is a constant and electrostatic potentials
are only defined up to a constant (this term, for instance, does not affect at all the
electric field).
$$c_n = \frac{1}{T}\int_0^T dt\;e^{i2\pi nt/T}\,f(t). \qquad (4.125)$$
The period T is what we are going to call the infrared cut-off of f (t). This name is
borrowed from high energy physics, and is just a mnemonic, to give you some intuition.
The logic is that infrared means something that has a low energy, or long wavelengths.
And the periodicity T is the longest length there is, and also what defines the lowest
energy harmonic $e^{i2\pi t/T}$. As we saw in Sec. 4.1, if we want to obtain a Fourier Transform, we can simply take $T \to \infty$. That is, we eliminate the infrared cut-off and this
converts the sum to an integral. The term “infrared” is thus directly related with the
fact that the series is a sum, instead of an integral.
But now suppose we cannot sample f (t) for all t ∈ [0, T ], but only on a discrete set
of points
$$t_j = j\,\Delta t, \qquad j = 0, 1, 2, \ldots, N-1, \qquad \Delta t = T/N. \qquad (4.126)$$
That is, we assume [0, T ) is divided into N equally spaced points and we are only able
to determine f at these points (Fig. 4.8). The function evaluated at these points then
generates a sequence of N points f j = f (t j ). Since the function is periodic, we can
of course also extend the sequence to points outside $[0, T)$. And the corresponding sequence will thus also be periodic. That is, $f_{j+N} = f_j$. This is illustrated in Fig. 4.8.
The discretization step ∆t will be called the ultraviolet cut-off. We often like to
think that $\Delta t$ is very small. So just like the infrared cut-off fixed the lowest energy harmonic, we will now see that the ultraviolet cut-off fixes the highest energy harmonic. This is super cool: the series (4.124) has no ultraviolet cut-off; the harmonics are summed all the way to infinity. It has only an infrared cut-off due to the periodicity $T$. Conversely, discretizing the function introduces an ultraviolet cut-off. That is, it truncates the sum
in n to a maximum value nmax . This will be the main result of the next section.
To contrast, Fourier Transforms have neither infrared nor ultraviolet cut-offs; the
magnitude of the energy, |ω|, is allowed to range from 0 (lowest energy) to ∞ (highest
energy). The hierarchy of cut-offs is therefore as follows:
• Generic continuous function f (t): has a Fourier Transform (no cut-offs).
• Continuous, but periodic with period T : has an infrared cut-off, yielding a series
instead of an integral.
• Discrete and periodic f j : has both infrared and ultraviolet cut-offs. Infrared
means the Fourier representation is a series, while ultraviolet truncates the series
to a finite number of terms.
since e−i2π j = 1 for any integer j. Thus, we see that shifting n → n + N takes e−i2πn j/N
to itself. This means that, as we vary n, there are actually only N distinct numbers
e−i2πn j/N . Usually we take them to be those with n = 0, 1, 2, . . . , N − 1. But we are free
to take any other choice. For instance, a popular one is
h N N i
n∈ − , −1 , N even,
2 2
h N − 1 N − 1i
n∈ − , , N odd.
2 2
This produces the same N points e−i2πn j/N . It is just a bit more annoying to use since
we need to keep track of whether N is even or odd. This is illustrated in Table 4.1.
We can therefore regroup the sum in (4.127) as
$$f_j = \sum_{n=0}^{N-1} e^{-i2\pi n j/N}\sum_{\nu=-\infty}^{\infty} c_{n+\nu N}.$$
That is, for each n ∈ [0, N − 1], we group all terms cn , cn±N , cn±2N , . . . which will be
multiplied by the same exponentials. Recall that the exponentials e−i2πn j/N form a basis
for the set of functions (in this case these are discrete functions; i.e., sequences). We
may therefore define new coefficients
$$\tilde f_n = \sum_{\nu=-\infty}^{\infty} c_{n+\nu N}, \qquad (4.128)$$
in terms of which
$$f_j = \sum_{n=0}^{N-1}\tilde f_n\,e^{-i2\pi n j/N}. \qquad (4.129)$$
Table 4.1: The basic exponentials e−i2πn j/N for n ∈ [0, N − 1] (upper table) and n ∈ [−N/2, N/2 − 1]
(lower table), with N = 10 and j = 3.
n 0 1 2 3 4 5 6 7 8 9
e−i2πn j/N 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5
n -5 -4 -3 -2 -1 0 1 2 3 4
e−i2πn j/N 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5 1 e−i2π/5 e−i4π/5 ei4π/5 ei2π/5
The series now has only N terms. The assumption that f is only evaluated at a discrete
set of points therefore reduces an infinite sum to a finite one. If we want to recover a
continuous function, we take N → ∞. That is, we evaluate f at an infinite number of
points.
I want to convince you that when j, j0 are integers, this also satisfies an orthogonality
relation. In fact, we are going to prove that
$$\frac{1}{N}\sum_{n=0}^{N-1} e^{i2\pi n(j - j')/N} = \delta_{jj'}, \qquad (4.130)$$
where $\delta_{jj'}$ is the Kronecker delta. It is easy to accept that the case $j = j'$ is correct, since in this case we get $\sum_n 1$, and the sum has $N$ terms. It is much less obvious what
happens if j , j0 (both still integers, of course). The idea is that in this case the complex
exponentials all cancel out. I will try to give two arguments to convince you of this.
The first is Fig. 4.9: I plot the position of $e^{i2\pi n j/N}$ in the complex plane. Each plot is for a fixed $j$, with $N = 6$. The message is that, except for $j = 0$, in all other cases the horizontal and vertical components of each red dot cancel identically.
Another, more rigorous way of seeing this is by defining $x = e^{i2\pi(j-j')/N}$, so that our sum becomes
$\sum_{n=0}^{N-1} e^{i2\pi n(j-j')/N} = \sum_{n=0}^{N-1} x^n$.
We assume $j \neq j'$. This is then a truncated geometric series, so we may use the standard result
$\sum_{n=0}^{N-1} x^n = \frac{1 - x^N}{1 - x}$.   (4.131)
That is,
$\sum_{n=0}^{N-1} e^{i2\pi n(j-j')/N} = \frac{1 - e^{i2\pi(j-j')}}{1 - e^{i2\pi(j-j')/N}}$.
But this, in turn, will be identically zero since $e^{i2\pi(j-j')} = 1$. Thus, indeed, if $j \neq j'$ the sum yields zero. And if $j = j'$, it gives $N$. We have thus proved (4.130).
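If you want to see Eq. (4.130) in action, here is a small numerical sketch in Python (the choice N = 6 is just an illustrative assumption of mine):

import numpy as np

def ortho(j, jp, N):
    """(1/N) * sum_{n=0}^{N-1} exp(i*2*pi*(j - jp)*n/N), cf. Eq. (4.130)."""
    n = np.arange(N)
    return np.sum(np.exp(1j * 2 * np.pi * (j - jp) * n / N)) / N

N = 6
for j in range(N):
    for jp in range(N):
        expected = 1.0 if j == jp else 0.0
        assert abs(ortho(j, jp, N) - expected) < 1e-12
print("Orthogonality (4.130) verified for N =", N)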
Armed with Eq. (4.130) we can now finally compute the Fourier coefficients f˜n in
Eq. (4.129). We multiply both sides by $e^{i2\pi m j/N}$ and sum over $j$, from 0 to $N-1$:
$\sum_{j=0}^{N-1} f_j\, e^{i2\pi m j/N} = \sum_{n=0}^{N-1} \tilde{f}_n \sum_{j=0}^{N-1} e^{i2\pi(m-n)j/N}$.
The sum over $j$ yields $N\delta_{n,m}$, which in turn kills the sum over $n$, leaving us with
$\sum_{j=0}^{N-1} f_j\, e^{i2\pi m j/N} = N \tilde{f}_m$,
Figure 4.10: Left: the function f j = sin(Ω j) + R j , where R j is a random number between [−2, 2]
representing some noise that has been added to the data. The plot was produced
using Ω = 1. Right: Corresponding Fourier Transform | f˜n |.
which is our desired formula. It determines the Fourier coefficients directly from the original function $f_j$. Notice the beautiful symmetry with respect to (4.129).
The move from one to the other and back is a consequence of the orthogonality
relation
$\frac{1}{N} \sum_{n=0}^{N-1} e^{i2\pi (j-j')n/N} = \delta_{jj'}$,   (4.133)
which holds for any integers $j$, $j'$, $N$. The original function is evaluated at a set of points $t_j = j\Delta t$, with $j = 0, 1, \dots, N-1$. And since
Figure 4.11: Left: ratio between Euro and Brazilian Real as a function of time. Right: Corre-
sponding Fourier Transform | f˜n |.
some noise to it. That is, we consider the sequence f j = sin(Ω j) + R j , where R j is a
random number between [−2, 2]. The result is the left panel in Fig. 4.10. Note that sin()
varies between −1 and 1, while the noise varies from -2 to 2. We are therefore adding a
huge noise on top of it. In fact, the original sinusoidal behavior is barely recognizable.
But we can do a spectral analysis. That is, we take the DFT (4.132) (more on
how to do that on a computer below). The result for | f˜n | is shown in the right panel
of Fig. 4.10. Remarkably, the identification of the frequencies is still quite clear. On
top of all the background noise, two peaks clearly stand out. One peak corresponds to
$\omega_n \sim \Omega$. And since $\sin a = (e^{ia} - e^{-ia})/2i$, there will also be a peak at $\omega_n \sim 2\pi - \Omega$.
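For the curious, here is a rough Python sketch of the kind of spectral analysis behind Fig. 4.10. The number of samples and the random seed are my own illustrative choices; note also that numpy's FFT uses the opposite sign convention in the exponent, which does not affect the magnitudes $|\tilde{f}_n|$:

import numpy as np

Omega = 1.0
N = 400                                   # number of samples (illustrative)
j = np.arange(N)
rng = np.random.default_rng(0)
f = np.sin(Omega * j) + rng.uniform(-2, 2, N)   # sine buried in noise

f_tilde = np.fft.fft(f) / N               # DFT coefficients (up to conjugation)
omega = 2 * np.pi * j / N                 # frequencies omega_n = 2*pi*n/N

# |f_tilde| shows peaks near omega ~ Omega and omega ~ 2*pi - Omega.
peak = omega[np.argmax(np.abs(f_tilde[1:N // 2])) + 1]
print("dominant frequency ~", peak)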
In the real world, the examples are not always this gentle. In Fig. 4.11 I plot the
time series for the ratio between the Euro and the Brazilian real, over the last 12 years.
As can be seen, on top of the overall trend, there is a bunch of noise. This noise, how-
ever, has several forms. Some noises are low frequency, which can mean for instance
seasonal fluctuations and so on. Conversely, there are noises which have very high
frequency, like the fact that the market generally behaves differently on Mondays or
Fridays. Each point in Fig. 4.11 is for one day. So ∆t = 1 day. This defines the ultra-
violet cut-off of this series. The highest frequency is therefore 2π/(1 day). However,
I plot only up to π, since, as we saw in Fig. 4.10, the interval [π, 2π] will just be a
repetition of the previous one.
The Fourier analysis presented in Fig. 4.11 is quite crude and there are many other,
more sophisticated ways of analyzing a time series in Fourier space. This is related to
terms such as “Power spectral density” and “Autocorrelation function”. Unfortunately
I will not have time to go into these details. But you can learn about it in any data
analysis or signal processing book.
The Fast Fourier Transform (FFT) algorithm
This section is based on the book “Linear Algebra”, by Gilbert Strang, which, by
the way, is the best book ever written in the entire world.
The reason why the DFT is so useful in practice is because it can be computed
insanely fast. The algorithm that does this is called the Fast Fourier Transform, or FFT.
It was invented by Cooley and Tukey in 1965, although there are bits and pieces of it
already present in Gauss’ work in 1805. For a sequence $\{f_j\}$ of $N$ points, a naive calculation of $\{\tilde{f}_n\}$ using Eq. (4.132) would require $N^2$ operations: there are $N$ coefficients $\tilde{f}_n$ to compute, and for each one we need to add up $N$ terms $f_j e^{-i2\pi n j/N}$. The FFT does it with $\tfrac{1}{2} N \log_2 N$. This is a huge improvement. It is the difference between 1 million and 5000. Imagine if this was money in a bank account we were talking about. In fact, the dependence on $\log N$ is not very relevant [remember that $\log_{10}(10^{23}) = 23$], so the algorithm scales roughly linearly with the size of the list.
The first step in constructing the FFT algorithm is to realize that the DFT (4.132) is actually nothing but a matrix multiplication. Define the $N \times N$ matrix $W_N$, with entries
$(W_N)_{jn} = \omega_N^{jn}$,   (4.134)
where $\omega_N = e^{-i2\pi/N}$ is introduced for convenience. Note that the indices of the matrix go from $0, \dots, N-1$, like in many programming languages. With this definition, we
can now write the DFT (4.132) as
$f_j = \sum_{n=0}^{N-1} (W_N)_{jn}\, \tilde{f}_n = \sum_{n=0}^{N-1} \omega_N^{jn}\, \tilde{f}_n$.   (4.135)
It is exactly the same thing. We can thus store the sequences $\{f_j\}$ and $\{\tilde{f}_n\}$ in vectors $f = (f_0, f_1, \dots, f_{N-1})$ and $\tilde{f} = (\tilde{f}_0, \tilde{f}_1, \dots, \tilde{f}_{N-1})$. Then Eq. (4.135) can be written as
$f = W_N\, \tilde{f}$,   (4.136)
For instance, for $N = 4$ we have $\omega_4 = e^{-i\pi/2} = -i$, and Eq. (4.136) would then become
$\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ f_3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix} \begin{pmatrix} \tilde{f}_0 \\ \tilde{f}_1 \\ \tilde{f}_2 \\ \tilde{f}_3 \end{pmatrix}$.
This matrix $W_N$ is very special. Its entries are complex and it is symmetric: $W^T = W$. But most importantly, it is (up to a normalization) what we call a unitary matrix. To explain what
that means, we first define an operation called the Hermitian conjugate, according to
A† = (AT )∗ .
That is, you first transpose the matrix and then take the complex conjugate. The symbol
A† reads “A-dagger”. A matrix U is said to be unitary when U † U = I, where I is the
identity matrix. Unitary matrices are the complex generalization of rotation matrices.
The reason why they are special is because $U^\dagger$ is the inverse of $U$. Recall that the inverse of a matrix is defined so that $A^{-1}A = I$. So, since $U^\dagger U = I$, it follows that $U^{-1} = U^\dagger$. Inverting (which is usually a difficult operation) becomes trivial for unitary matrices.
The matrix $W_N$ itself is not unitary. But $\frac{1}{\sqrt{N}}\, W_N$ is:
$W_N^\dagger W_N = N I_N \quad\Rightarrow\quad W_N^{-1} = \frac{1}{N}\, W_N^\dagger$.   (4.139)
This gives us an easy method to invert Eq. (4.136): $\tilde{f} = \frac{1}{N} W_N^\dagger f$, which is nothing but the second equation in (4.132). The journey back is therefore as easy as the journey forward (I wish it had been this easy for Frodo and Sam). Note also that even though $W^T = W$, it is not true that $W^\dagger = W$. That is, the matrix is not Hermitian.
We are now ready to describe the main idea behind the FFT algorithm. All the
algorithm does is compute WN f˜. It multiplies a vector by a matrix. This takes N 2
operations and, in principle, one would think that it is hard to do any better than that.
But the FFT can. The trick is recursion. The matrix WN is expressed in terms of
WN/2 . Then WN/2 is expressed in terms of WN/4 . And so on. This is possible because
WN is a very special matrix. First, since its entries are $\omega_N^{nj}$, they are arranged in a very special way along the matrix. And second, since $\omega_N = e^{-i2\pi/N}$, we have the very special relation $\omega_{N/2} = \omega_N^2$.
Let us see how this factorization works. What we are interested in is computing the
matrix vector product
$y = W_N\, x$, or $y_j = \sum_{n=0}^{N-1} \omega_N^{nj}\, x_n$.   (4.140)
The trick is to split the sum into even and odd terms:
$y_j = \sum_{n=0,2,4,\dots} \omega_N^{nj}\, x_n + \sum_{n=1,3,5,\dots} \omega_N^{nj}\, x_n = \sum_{k=0}^{N/2-1} \omega_N^{2kj}\, x_{2k} + \sum_{k=0}^{N/2-1} \omega_N^{(2k+1)j}\, x_{2k+1}$.
If we stare at this for a second, we will realize that the resulting sums are nothing but the application of $W_{N/2}$ to the two smaller vectors, $x_e = (x_0, x_2, x_4, \dots)$ and $x_o = (x_1, x_3, x_5, \dots)$. That is, using $\omega_N^{2kj} = \omega_{N/2}^{kj}$,
$y_j = (W_{N/2}\, x_e)_j + \omega_N^j\, (W_{N/2}\, x_o)_j$,   (4.141)
where the index $j$ on the right-hand side is understood mod $N/2$.
This is the essence of the FFT: instead of computing the big product WN x, we compute
the smaller products WN/2 xe/o , associated with the even and odd components of x.
And, as you can probably imagine, we don’t stop there, but keep going recursively.
That is, instead of computing WN/2 xe directly, we split this further into two other
vectors $x_{ee} = (x_0, x_4, x_8, \dots)$ and $x_{eo} = (x_2, x_6, x_{10}, \dots)$, and apply $W_{N/4}$ to them,
following the recipe in Eq. (4.141). And similarly for WN/2 xo .
The FFT always uses vectors whose lengths are powers of 2. That is, $N = 2^\ell$. If your vector has a different length, the algorithm automatically increases it to the closest power of 2 by padding with zeros. The first step is then to rearrange the vector $x$
into even/odd components, several times. For instance, it may produce something like
x = (xee , xeo , xoe , xoo ). Or it can go further; it depends on how many recursions
you want. The idea is to go down to a subvector which is small enough, so that the
application of W becomes very fast. This ordering operation is very fast. Once x
is reordered, the algorithm applies the small W to each, saves the result, and then
uses (4.141) recursively to reconstruct back the result. There are $\ell = \log_2 N$ levels to move through, and reconstructing each level requires $N/2$ multiplications/additions like the one in (4.141). The total cost is thus $\frac{N}{2}\,\ell = \frac{N}{2}\log_2 N$.
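To make the recursion concrete, here is a minimal Python sketch of a radix-2 FFT along the lines described above. It is mine, assumes the input length is a power of 2, and is meant for illustration rather than performance:

import numpy as np

def fft_rec(x):
    """Compute y_j = sum_n omega_N^{nj} x_n with omega_N = exp(-2i*pi/N),
    i.e. the product W_N x of Eq. (4.140), by even/odd recursion."""
    N = len(x)
    if N == 1:
        return np.array(x, dtype=complex)
    even = fft_rec(x[0::2])           # W_{N/2} applied to (x0, x2, x4, ...)
    odd = fft_rec(x[1::2])            # W_{N/2} applied to (x1, x3, x5, ...)
    j = np.arange(N // 2)
    twiddle = np.exp(-2j * np.pi * j / N)   # omega_N^j
    # Reconstruction: y_j = even_j + omega_N^j odd_j, and
    # y_{j+N/2} = even_j - omega_N^j odd_j, since omega_N^{N/2} = -1.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

x = np.random.rand(8)
assert np.allclose(fft_rec(x), np.fft.fft(x))   # agrees with numpy's FFT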
Chapter 5
Legendre Polynomials
This chapter discusses two related concepts: series solutions of ODEs and orthog-
onal polynomials. The prototypical example we are going to analyze is Legendre’s
differential equation
$(1 - x^2)\, y'' - 2x\, y' + n(n+1)\, y = 0$,   (5.1)
where $n$ is a constant. This is a 2nd order, linear ODE but with non-constant coefficients. It is therefore more difficult to handle than the equations we solved in chapter 2.
This type of equation appears, as we will see, when dealing with Poisson or Laplace’s
equation in spherical coordinates. It therefore appears in electromagnetism and quan-
tum mechanics. For instance, do you remember those funny-looking atomic orbitals
that you learned in chemistry? It will turn out that they are directly related with the
solutions of Eq. (5.1).
The usual method for solving these equations is to try out a series solution; that is,
a solution of the form
$y(x) = \sum_{j=0}^{\infty} a_j x^j = a_0 + a_1 x + a_2 x^2 + \dots$,   (5.2)
where a j are coefficients that we try to adjust. In general the sum will be infinite, so the
solution y(x) may be any kind of exotic function. Sometimes, however, the series trun-
cates, yielding a solution y(x) that is a polynomial in x. In the Legendre equation (5.1), this will happen when the constant $n$ is an integer, $n = 0, 1, 2, 3, \dots$. The resulting
solutions are called Legendre polynomials, and the first few such polynomials are
$P_0(x) = 1$,
$P_1(x) = x$,
$P_2(x) = \frac{1}{2}(3x^2 - 1)$,
$P_3(x) = \frac{1}{2}(5x^3 - 3x)$,   (5.3)
$P_4(x) = \frac{1}{8}(35x^4 - 30x^2 + 3)$,
$P_5(x) = \frac{1}{8}(63x^5 - 70x^3 + 15x)$,
$P_6(x) = \frac{1}{16}(231x^6 - 315x^4 + 105x^2 - 5)$.
For instance, if you are bored, you can check by hand that P6 (x) is a solution of
Eq. (5.1) when n = 6, and so on. We will learn below a more sophisticated and general
way of doing this.
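If checking by hand sounds too boring, a computer algebra system can do it for you. A small sympy sketch of my own (not part of the text):

import sympy as sp

x = sp.symbols('x')
n = 6
P6 = sp.Rational(1, 16) * (231*x**6 - 315*x**4 + 105*x**2 - 5)

# Plug P6 into the left-hand side of Legendre's equation (5.1) with n = 6.
lhs = (1 - x**2)*sp.diff(P6, x, 2) - 2*x*sp.diff(P6, x) + n*(n + 1)*P6
print(sp.simplify(lhs))                       # prints 0: P6 solves Eq. (5.1)
print(sp.simplify(sp.legendre(6, x) - P6))    # sympy's built-in P6 agrees: 0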
The Legendre polynomials satisfy a remarkable property. Namely, they form a set
of orthogonal functions in the interval [−1, 1]. That is, they satisfy
$\int_{-1}^{1} P_n(x)\, P_m(x)\, dx = \frac{2}{2n+1}\, \delta_{n,m}$.   (5.4)
If $n \neq m$, they integrate to zero. If $n = m$, they give this silly constant $2/(2n+1)$. It is easy to check this for, say, $P_1$ and $P_2$. I recommend you do it, to get a feeling of what is happening.
Eq. (5.4) should remind you of the orthogonality relations of sines, cosines and
complex exponentials that we learned in chapter 1. In that occasion, it was precisely the
orthogonality relations which allowed us to construct the Fourier series. That is, which
allowed us to express an arbitrary periodic function in terms of a linear combination of
sines and cosines. Here a similar logic will apply. That is, any function f (x) defined in
the interval x ∈ [−1, 1] can be expanded in Legendre polynomials as
$f(x) = \sum_{n=0}^{\infty} c_n\, P_n(x)$,   (5.5)
where cn are coefficients that are determined just like in the Fourier case: we multiply
both sides of (5.5) by Pm (x) and integrate from -1 to 1. Due to the orthogonality (5.4),
we are then left with
$\int_{-1}^{1} dx\, f(x)\, P_m(x) = \sum_{n=0}^{\infty} c_n \int_{-1}^{1} dx\, P_n(x)\, P_m(x) = \frac{2}{2m+1}\, c_m$,
or,
$c_n = \frac{2n+1}{2} \int_{-1}^{1} P_n(x)\, f(x)\, dx$.   (5.6)
This kind of procedure is identical to the one we used in the Fourier business, many
times. And this example serves to show that it is actually more general. It works
whenever we want to expand a function as a linear combination of a set of orthogonal
functions. As we will learn, there are in fact quite a few such sets, depending on the
interval in question and the types of properties one is interested in.
This chapter will therefore be centered on these two new concepts: series solutions
of ODEs and orthogonal polynomials. They are, to a great extent, generalizations of the
ideas that we already treated in previous chapters. First, series solutions offer a more
sophisticated method for solving harder ODEs. And second, orthogonal polynomials
generalize the idea of orthogonality of functions beyond sines and cosines. I think this
is a nice way of concluding the course: we spend the entire semester learning about
new ideas and methods. And now we finish it by realizing that these methods are just
the tip of the iceberg, and these core ideas can actually be extended much much further.
where {. . .} will be coefficients that depend on the a j . Since the functions x j are lin-
early independent, for y to be a solution for all x, each coefficient {. . .} must vanish
independently. This will then give us a relation between the a j ’s.
To execute this idea, we need to compute $-2xy'$ and $(1-x^2)y''$, with $y$ given by (5.2).
First,
$y' = \sum_{j=1}^{\infty} j\, a_j x^{j-1}$.
The sum in principle starts at 1 because the derivative of a0 is zero. But notice we
could also write it as starting at 0 since there is a factor j in there, which is zero when
j = 0. Playing around with the index of the sum is an important trick in this business.
So please make sure you understand this point. Now, what we actually want is $-2xy'$, so we get
$-2xy' = -2 \sum_{j=0}^{\infty} j\, a_j x^j$,   (5.7)
where I already wrote the sum starting from 0. We see that multiplying by −2x replen-
ishes the missing x from the derivative. That is, the result is now already proportional
to x j . We will try to write all our sums as being proportional to x j .
Next we turn to the second derivative:
$y'' = \sum_{j=2}^{\infty} j(j-1)\, a_j x^{j-2}$.
The sum now starts at 2, but we could also have written it starting from 0 if we wanted, since $j(j-1)$ is zero when $j = 0$ or $j = 1$. From this we then compute
$(1 - x^2)\, y'' = \sum_{j=2}^{\infty} j(j-1)\, a_j x^{j-2} - \sum_{j=0}^{\infty} j(j-1)\, a_j x^j$.
The last term is already proportional to $x^j$, so I wrote the sum starting from 0. But the first sum still needs some adjustments. Since we want everything in powers of $x^j$, we change variables from $j$ to $j' = j - 2$, only in this first sum:
$\sum_{j=2}^{\infty} j(j-1)\, a_j x^{j-2} = \sum_{j'=0}^{\infty} (j'+2)(j'+1)\, a_{j'+2}\, x^{j'}$.
We can now call $j'$ as $j$ again, since it is a dummy variable (i.e., it is being summed over). This allows us to combine the two sums as
$(1 - x^2)\, y'' = \sum_{j=0}^{\infty} \left[ (j+2)(j+1)\, a_{j+2} - j(j-1)\, a_j \right] x^j$.   (5.8)
Plugging Eqs. (5.7) and (5.8) back into Eq. (5.1) finally yields
$\sum_{j=0}^{\infty} \left\{ \left[ (j+2)(j+1)\, a_{j+2} - j(j-1)\, a_j \right] - 2j\, a_j + n(n+1)\, a_j \right\} x^j = 0$.
Equating each term to zero gives a recursion relation, specifying $a_{j+2}$ in terms of $a_j$:
$a_{j+2} = \frac{j(j+1) - n(n+1)}{(j+2)(j+1)}\, a_j, \qquad j = 0, 1, 2, 3, \dots$   (5.9)
This therefore fixes the even coefficients in terms solely of a0 , and the odd ones in
terms solely a1 . That is, given a0 , we use this to determine a2 , then a4 , then a6 , etc. For
instance,
1
a2 = − n(n + 1)a0 ,
2
1 1
a4 = − (n + 3)(n − 2)a2 = n(n + 1)(n + 3)(n − 2)a0 ,
12 24
177
etc. Similarly, given a1 , this determines a3 , then a5 and so on. The general solution
will thus have the form
" #
1 1
y(x) = 1 − n(n + 1)x2 + n(n + 1)(n − 2)(n + 3)x4 − . . . a0 (5.10)
2 4!
" #
1 1
+ x − (n − 1)(n + 2)x + (n − 1)(n + 2)(n − 3)(n + 4)x − . . . a1 .
3 5
3! 5!
The values of a0 and a1 are undetermined, leaving us with two constants to adjust, as
befits a 2nd order ODE. We have therefore found the two independent solutions. One
is even in x, while the other is odd. I know they may look a bit messy. But it is quite
remarkable that we can actually write them down explicitly. After all, it is not an easy
ODE we are dealing with.
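If you want to see this solution (and its behavior outside $[-1,1]$, discussed next) for yourself, here is a small Python sketch that builds the series (5.10) from the recursion (5.9). The values of n and jmax below are arbitrary illustrative choices:

import numpy as np

def legendre_series(x, n, a0=1.0, a1=0.0, jmax=200):
    """Evaluate the series solution (5.10), using the recursion (5.9),
    truncated at j = jmax."""
    a = np.zeros(jmax + 3)
    a[0], a[1] = a0, a1
    for j in range(jmax + 1):
        a[j + 2] = (j*(j + 1) - n*(n + 1)) / ((j + 2)*(j + 1)) * a[j]
    return sum(a[j] * x**j for j in range(jmax + 3))

# Inside (-1, 1) the value settles as more terms are kept; outside it blows up.
print(legendre_series(0.90, n=4.5, jmax=50), legendre_series(0.90, n=4.5, jmax=200))
print(legendre_series(1.05, n=4.5, jmax=50), legendre_series(1.05, n=4.5, jmax=200))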
Several questions still remain, though. First, is the solution (5.10) unique? Yes.
We have found two linearly independent solutions to a linear 2nd order ODE. It must
therefore be unique. Second, does the solution converge? We can analyze this using
the ratio test. A series $\sum_{j=0}^{\infty} u_j$ converges absolutely when $\lim_{j\to\infty} |u_{j+1}/u_j| < 1$. In our case, we have to test the series $\sum_{j=0}^{\infty} a_j x^j$ (that is, $u_j = a_j x^j$); since the recursion (5.9) relates $a_{j+2}$ to $a_j$, we look at the ratio of successive terms within each (even or odd) subseries. From Eq. (5.9), we have that
$\lim_{j\to\infty} \left| \frac{a_{j+2}\, x^{j+2}}{a_j\, x^j} \right| = \lim_{j\to\infty} \left| \frac{j(j+1) - n(n+1)}{(j+2)(j+1)} \right| x^2 = x^2$,
since the limit of the $j$-dependent factor is 1. Thus, we see that the series converges only for $x^2 < 1$. For $x^2 = 1$, in general, the series does not converge. The limits of validity of the solution are thus
$-1 < x < 1$.   (5.11)
Quite often, in applications, we will see that Legendre's equation appears with $x$ being the cosine of something. That is, $x = \cos\theta$. The restriction to $x^2 < 1$ thus turns out to be ok.
Fig. 5.1 shows a plot of the solution (5.10) for (a0 , a1 ) = (1, 0) and (a0 , a1 ) = (0, 1),
with fixed n = 4.5. In the interval x ∈ [−1, 1] the function is well behaved, and wiggles
around happily. But as soon as we leave this interval, it diverges quickly. The table in panel (b) illustrates this divergence, by showing the values of $y(x)$ at $x = 1.05$ for the solution (5.10), summed only up to a maximum value $j_{\max}$.
Legendre polynomials
The solution (5.10) is not a polynomial. Even though it is a power series in x, the
series is infinite and therefore it may sum up to look like any kind of function. But
something quite special happens if n is an integer. Looking at Eq. (5.9), we see that in
this case one of the series truncates. For instance, suppose n = 4. Then, when j = 4,
the numerator will cancel, leading to a6 = 0. But since a6 = 0, we will also have a8 = 0
and then a10 = 0 and so on. The only even coefficients for n = 4 will thus be a0 , a2 and
a4 . Conversely, the odd series does not truncate, because the numerator never vanishes.
Thus, if n is even, the solution proportional to a0 in (5.10) will be a finite polyno-
mial and the other will be an infinite series. And if n is odd, the solution proportional
Figure 5.1: Solution (5.10) of Legendre’s equation for n = 4.5. (a) (a0 , a1 ) = (1, 0) and (b)
(a0 , a1 ) = (0, 1). The table in (b) illustrates the values of the solution at x = 1.05,
but summing only up to a certain maximum value jmax . If we were to sum (5.10) to
infinity, outside [−1, 1], we would simply get infinity.
Figure 5.2: The first few Legendre polynomials in the interval [−1, 1].
to a1 will be a polynomial, and the other will not. The family of polynomials this
generates are the Legendre polynomials.
For instance, if $n = 4$ the polynomial will have the form
$P_4(x) = a_0 \left( 1 - 10x^2 + \tfrac{35}{3} x^4 \right)$.
The value of $a_0$ is arbitrary and depends on the convention that is adopted. Usually, we choose $a_0$ so that $P_4(1) = 1$. This then gives $a_0 = 3/8$, which reproduces the $P_4(x)$ listed in Eq. (5.3).
Sturm-Liouville theory
Looking back at Eq. (5.1), we can also write the first two terms as
$(1 - x^2)\, y'' - 2x\, y' = \frac{d}{dx}\left[ (1 - x^2)\, \frac{d}{dx} \right] y$.
Let us then define the differential operator
$L = \frac{d}{dx}\left[ (1 - x^2)\, \frac{d}{dx} \right]$,   (5.12)
so that (5.1) is converted into
L(y) = −n(n + 1)y. (5.13)
This is now starting to look like eigenstuff business: an operator times a function is supposed to be a number times the same function. In fact, it can be shown that the
eigenvalue/eigenfunction equation
L(y) = −λy, (5.14)
together with the added assumption that the solution should be regular at x = ±1, has
eigenfunctions which are precisely the Legendre polynomials Pn (x), and eigenvalues
λ = n(n + 1). This type of scenario is usually called a Sturm-Liouville problem. A
generic Sturm-Liouville problem is of the form
$\frac{d}{dx}\left[ p(x)\, \frac{dy}{dx} \right] + q(x)\, y = -\lambda\, w(x)\, y$,   (5.15)
for given functions p(x), q(x) and w(x). One must also specify the interval of interest,
x ∈ [a, b], and boundary conditions have to be specified at a and b. The problem is
then to find the allowed solutions y(x), together with the eigenvalues λ.
For instance, consider the ODE
y00 = −λy, y(0) = y(L) = 0, (5.16)
which we studied exhaustively in Chapter 3. This is a Sturm-Liouville problem, with
$p(x) = w(x) = 1$ and $q(x) = 0$. Indeed, we saw that the eigenfunctions were $y(x) = \sin(kx)$, with eigenvalues $\lambda = k^2$, where $k = \pi\ell/L$. Sturm-Liouville problems form an
important part in the Mathematics of differential equations, as there are many applica-
tions which can be placed under this category. Unfortunately, we will not have the time
to discuss the general theory in much detail, but will have to focus only on some particular examples.
Leibniz’ rule for differentiating products
Before we start, we must first briefly discuss a formula that will be quite useful in
what follows. Let u(x) and v(x) be two arbitrary functions. Then
$\frac{d}{dx}(uv) = (\partial_x u)\, v + u\, (\partial_x v)$.
Similarly, for the second derivative (writing $\partial_x$ for the derivatives, which makes the pattern clearer),
$\frac{d^2}{dx^2}(uv) = (\partial_x^2 u)\, v + 2(\partial_x u)(\partial_x v) + u\, (\partial_x^2 v)$.
You can see here a kind of binomial shape taking place:
(a + b)2 = a2 + 2ab + b2 .
But instead of powers, we have derivatives. We can continue this and prove, by induc-
tion, that
$\frac{d^n}{dx^n}(uv) = \sum_{j=0}^{n} \binom{n}{j}\, (\partial_x^j u)(\partial_x^{n-j} v)$,   (5.17)
which is called Leibniz' rule for the derivative of a product. Here $\binom{n}{j} = \frac{n!}{j!(n-j)!}$ is the binomial coefficient.
Rodrigues formula
We will now use Leibniz’ rule to find a more systematic way of generating the Leg-
endre polynomials, which will make their analytical properties much more transparent.
Namely, we will now show that
1 dn h 2 i
Pn (x) = n n
(x − 1)n .
2 n! dx
We are going to prove this in two ways. First, we will simply check that it indeed
matches the polynomials in Eq. (5.3). And second, we will show that this indeed
satisfies Legendre’s differential equation (5.1).
To compare with (5.3), we use Leibniz' rule (5.17), by first factoring $x^2 - 1 = (x-1)(x+1)$. This then leads to
$P_n(x) = \frac{1}{2^n n!} \sum_{j=0}^{n} \binom{n}{j}\, \partial_x^j (x-1)^n\, \partial_x^{n-j} (x+1)^n$.
The first of these derivatives gives $\partial_x^j (x-1)^n = \frac{n!}{(n-j)!}\,(x-1)^{n-j}$. And the second is
$\partial_x^{n-j} (x+1)^n = n(n-1)\cdots\big(n - (n-j) + 1\big)\,(x+1)^{n-(n-j)} = \frac{n!}{j!}\,(x+1)^j$.
I know these two formulas are a bit nasty. It helps if you think about specific examples, like $n = 4$ and $j = 2$ or something. In any case, with these two results, and using also $\binom{n}{j} = \frac{n!}{j!(n-j)!}$, we finally get
$P_n(x) = \frac{1}{2^n} \sum_{j=0}^{n} \binom{n}{j}^2 (x-1)^{n-j}\, (x+1)^j$.
This provides an explicit and very useful formula for computing any Legendre polynomial. I leave it to you to verify that if we plug in specific values of $n$, we get, for instance, the polynomials in Eq. (5.3).
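Rodrigues' formula is also very convenient on a computer. Here is a small sympy sketch of my own that generates the polynomials from Eq. (5.18) and checks them against sympy's built-in Legendre polynomials:

import sympy as sp

x = sp.symbols('x')

def P_rodrigues(n):
    """Legendre polynomial from Rodrigues' formula (5.18)."""
    return sp.expand(sp.diff((x**2 - 1)**n, x, n) / (2**n * sp.factorial(n)))

for n in range(7):
    assert sp.simplify(P_rodrigues(n) - sp.legendre(n, x)) == 0
print(P_rodrigues(4))   # 35*x**4/8 - 15*x**2/4 + 3/8, matching Eq. (5.3)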
Next, we verify that Eq. (5.18) is indeed a solution of Legendre’s differential equa-
tion (5.1). Except for a numerical factor, the Legendre polynomials in Eq. (5.18) are
essentially the n-th derivative of the function ψ(x) := (x2 − 1)n . That is,
$P_n(x) = \frac{1}{2^n n!}\, \frac{d^n \psi}{dx^n}$.   (5.20)
But since $\psi' = 2nx\,(x^2-1)^{n-1}$, this function satisfies the cute property
$(x^2 - 1)\,\psi' = 2nx\,\psi$.   (5.21)
We will now obtain Legendre’s equation by differentiating both sides of this expression
n + 1 times:
$\frac{d^{n+1}}{dx^{n+1}}\left[ (x^2 - 1)\,\psi' \right] = 2n\, \frac{d^{n+1}}{dx^{n+1}}(x\psi)$.   (5.22)
We differentiate n + 1 times because Pn is the n-th derivative of ψ, plus Eq. (5.1) is
a second order ODE, while (5.21) still has only one derivative. Differentiating n + 1
times must, therefore, give us a 2nd order ODE for Pn . And then we can verify that
this ODE is indeed (5.1).
To actually carry out the computation, we use Leibniz’ rule (5.17) again. We start
with the right-hand side of (5.22), using u = x and v = ψ:
$\frac{d^{n+1}}{dx^{n+1}}(x\psi) = \sum_{j=0}^{n+1} \binom{n+1}{j}\, (\partial_x^j x)(\partial_x^{n+1-j}\psi)$.
The sum goes now up to $n+1$, instead of $n$, since we want the $(n+1)$-th derivative. But lucky for us, the derivative $\partial_x^j x$ will be non-zero only for $j = 0$ or $j = 1$. Whence
$\frac{d^{n+1}}{dx^{n+1}}(x\psi) = \frac{(n+1)!}{0!\,(n+1)!}\, x\, (\partial_x^{n+1}\psi) + \frac{(n+1)!}{1!\,n!}\, (\partial_x^{n}\psi)$.
Or, simplifying,
$\frac{d^{n+1}}{dx^{n+1}}(x\psi) = x\,\psi^{(n+1)} + (n+1)\,\psi^{(n)}$.   (5.23)
Next we do the same for the left-hand side of (5.22), with $u = (x^2 - 1)$ and $v = \psi'$:
$\frac{d^{n+1}}{dx^{n+1}}\left[ (x^2 - 1)\,\psi' \right] = \sum_{j=0}^{n+1} \binom{n+1}{j}\, \partial_x^j (x^2 - 1)\, (\partial_x^{n+1-j}\psi')$.
Orthogonality of the Legendre polynomials
A major application of Rodrigues’ formula is to prove that the Legendre poly-
nomials are orthogonal. In fact, it turns out, the Legendre polynomials are not only
orthogonal among themselves. They are also orthogonal with respect to any monomial
xm , with m < n. That is, we will now show that
$\int_{-1}^{1} P_n(x)\, x^m\, dx = 0, \qquad m < n.$   (5.25)
This derivative will now be identically zero whenever m < n. Hence, we just proved (5.25).
From this, we then know that Pn and Pm are orthogonal [Eq. (5.26)] when m , n.
The last thing we need to do is to compute the value of the integral when m = n; that
is,
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{1}{(2^n n!)^2} \int_{-1}^{1} \frac{d^n\psi}{dx^n}\, \frac{d^n\psi}{dx^n}\, dx$.
Integrating by parts, as before, we can write this as
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{(-1)^n}{(2^n n!)^2} \int_{-1}^{1} \psi\, \frac{d^{2n}\psi}{dx^{2n}}\, dx$.
Now, $\psi = (x^2-1)^n$ is a polynomial of degree $2n$, so when we differentiate it $2n$ times all terms vanish, except the first term $x^{2n}$ in the binomial expansion, which will give a factor of $(2n)!$. Thus
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{(-1)^n (2n)!}{(2^n n!)^2} \int_{-1}^{1} (x^2 - 1)^n\, dx$.
Finally, we compute this remaining integral, again using integration by parts. We first write $(x^2 - 1)^n = (x-1)^n (x+1)^n$ and identify $u = (x-1)^n$ and $dv = (x+1)^n dx$. The boundary terms again vanish and we get
$\int_{-1}^{1} (x-1)^n (x+1)^n\, dx = -\frac{n}{n+1} \int_{-1}^{1} (x-1)^{n-1}(x+1)^{n+1}\, dx$.
Repeating this procedure $n$ times leads to
$\int_{-1}^{1} (x-1)^n (x+1)^n\, dx = (-1)^n \frac{(n!)^2}{(2n)!} \int_{-1}^{1} (x+1)^{2n}\, dx = (-1)^n \frac{(n!)^2}{(2n)!}\, \frac{2^{2n+1}}{2n+1}$.
Whence
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{(-1)^n (2n)!}{(2^n n!)^2}\, (-1)^n\, \frac{(n!)^2}{(2n)!}\, \frac{2^{2n+1}}{2n+1}$.
Or, simplifying,
$\int_{-1}^{1} P_n^2(x)\, dx = \frac{2}{2n+1}$.   (5.27)
$G(x, \lambda) = \frac{1}{\sqrt{1 - 2x\lambda + \lambda^2}}$.   (5.28)
Expanding this function in a power series in $\lambda$,
$G(x, \lambda) = \sum_{n=0}^{\infty} q_n(x)\, \lambda^n$,   (5.29)
defines a set of coefficient functions $q_n(x)$. We are going to show that these coefficients turn out to be exactly the Legendre polynomials; $q_n(x) \equiv P_n(x)$. For this reason $G$ is called the generating function of the Legendre polynomials. If you think about it, this is quite impressive: all polynomials can be generated from this very simple formula, by a simple Taylor expansion. The
variable λ does not need to have a particular physical meaning (although sometimes it
does). It is simply an auxiliary variable for obtaining the polynomials.
The way to prove this is quite fun. We are going to first obtain a partial differential equation for $G$ in terms of $x$ and $\lambda$. We then try to solve this PDE by the series
method, expanding G as in Eq. (5.29). What we are going to find is that each coefficient
qn (x) will have to be a solution of Legendre’s equation (5.1) for a different value of n.
To cook up our PDE, let us first play with the partial derivatives of G. From (5.28)
we have that
$\frac{\partial G}{\partial x} = \frac{\lambda}{(1 - 2x\lambda + \lambda^2)^{3/2}} = \lambda G^3$,   (5.30)
$\frac{\partial^2 G}{\partial x^2} = \frac{3\lambda^2}{(1 - 2x\lambda + \lambda^2)^{5/2}} = 3\lambda^2 G^5$,   (5.31)
and
$\frac{\partial G}{\partial \lambda} = \frac{x - \lambda}{(1 - 2x\lambda + \lambda^2)^{3/2}} = (x - \lambda)\, G^3$,   (5.32)
$\frac{\partial^2 G}{\partial \lambda^2} = -\frac{1}{(1 - 2x\lambda + \lambda^2)^{3/2}} + \frac{3(x - \lambda)^2}{(1 - 2x\lambda + \lambda^2)^{5/2}} = -G^3 + 3(x - \lambda)^2 G^5$.   (5.33)
In this last term, we may also write
(x − λ)2 = x2 − 2xλ + λ2 = (x2 − 1) + (1 − 2xλ + λ2 ) = (x2 − 1) + G−2 .
Thus,
$\frac{\partial^2 G}{\partial \lambda^2} = 3(x^2 - 1)\, G^5 + 2 G^3$.   (5.34)
We now try to combine all these derivatives in a clever way.
For instance, we can start by trying to match the coefficients that involve G5 .
From (5.31) and (5.34) we see that
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} + \lambda^2\, \frac{\partial^2 G}{\partial \lambda^2} = 2\lambda^2 G^3$,
so we kill the terms with $G^5$. We now only need to combine $\partial G/\partial x$ and $\partial G/\partial \lambda$ in a way that yields $-2\lambda^2 G^3$. The way to do so is
$-2x\, \frac{\partial G}{\partial x} + 2\lambda\, \frac{\partial G}{\partial \lambda} = -2\lambda^2 G^3$.
Thus, adding the two will finally give zero:
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} - 2x\, \frac{\partial G}{\partial x} + 2\lambda\, \frac{\partial G}{\partial \lambda} + \lambda^2\, \frac{\partial^2 G}{\partial \lambda^2} = 0$.
This is a PDE satisfied by G, in terms of x and λ. The last two terms, in particular, can
also be written in a more compact way, leading to
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} - 2x\, \frac{\partial G}{\partial x} + \lambda\, \frac{\partial^2}{\partial \lambda^2}(\lambda G) = 0$.   (5.35)
We now try to solve this using the power series method, similar to what we did in
Sec. 5.1. But since we have two variables, we attempt a series solution only in λ, just
like in Eq. (5.29). And then see what happens for the coefficients qn (x). The first two
terms in (5.35) are easy to handle. Since the derivatives are with respect to x, they act
only on the qn ’s. That is,
$(1 - x^2)\, \frac{\partial^2 G}{\partial x^2} - 2x\, \frac{\partial G}{\partial x} = \sum_{n=0}^{\infty} \left\{ (1 - x^2)\, q_n'' - 2x\, q_n' \right\} \lambda^n$.
As for the last term in (5.35), since $\lambda G = \sum_n q_n \lambda^{n+1}$, we get $\lambda\,\partial_\lambda^2(\lambda G) = \sum_{n=0}^{\infty} n(n+1)\, q_n \lambda^n$. Collecting everything, the coefficient of each power $\lambda^n$ in (5.35) must vanish separately. This is pretty cool, eh? We see that the coefficients are each given precisely by Legendre's equation (5.1), for integer values of $n$. Hence, the solutions must be the Legendre polynomials, $q_n(x) = P_n(x)$.
There is one final detail we must address. The Legendre polynomials are normal-
ized so that Pn (1) = 1. And we must check that G is indeed giving this. On the one
hand,
G(1, λ) = q0 (1) + q1 (1)λ + q2 (1)λ2 + . . . .
On the other, if we use (5.28) we get that
$G(1, \lambda) = \frac{1}{1 - \lambda} = 1 + \lambda + \lambda^2 + \dots$,
which is nothing but the geometric series. Comparing the two formulas we then find, indeed, that $q_n(1) = 1$. Thus, in all details, the $q_n$ are exactly the Legendre polynomials.
$\sum_{n=0}^{\infty} \left[ (1 - x^2)\, q_n'' - 2x\, q_n' + n(n+1)\, q_n \right] \lambda^n = 0$.   (5.38)
Each term in the sum is given exactly by Legendre's equation (5.1) for integer $n$.
Recurrence relations
One of the major applications of the generating function is in finding recurrence
relations between the Legendre polynomials. Recurrence relations are a little bit like
trigonometric identities. They are relations between the Pn which can be used to sim-
plify the calculations. For instance, one such recursion relation is
$(n+1)\, P_{n+1}(x) = (2n+1)\, x\, P_n(x) - n\, P_{n-1}(x)$,   (5.39)
known as Bonnet's recursion formula. This is not at all obvious, right? And it is also
super useful because it provides a very easy method for constructing the polynomials
systematically: if we know P0 = 1 and P1 = x, we can use this to construct P2 , then P3
and so on.
There is no single general recipe for deriving relations like (5.39). But usually,
the idea is to play around with the generating function and see if you can find useful
relations. For instance, we can prove (5.39) starting with Eq. (5.32) and rewriting it as
$(1 - 2x\lambda + \lambda^2)\, \frac{\partial G}{\partial \lambda} = (x - \lambda)\, G$.   (5.40)
We now plug the expansion (5.37) on both sides, which leads to
$(1 - 2x\lambda + \lambda^2) \sum_{n=0}^{\infty} n\, P_n\, \lambda^{n-1} = (x - \lambda) \sum_{n=0}^{\infty} P_n\, \lambda^n$.
To get the recursion relation (5.39), we now just need to write everything under the
same power of λn . This can be done by manipulating the indices of the sums, just like
we did in Sec. 5.1. I will leave it for you as an exercise.
5.4 Multipole expansion of Poisson’s equation
We now finally arrive at the first big application of Legendre polynomials. In
Sec. 4.6 we discussed the solution of Poisson’s equation
− ∇2 φ = ρ(r), (5.45)
where ρ is the external source and r = (x, y, z). We saw that the solution could be
written as
$\phi(R) = \int d^3r\; G(R - r)\, \rho(r)$,   (5.46)
where r = |r| and R = |R|. Moreover, let θ denote the angle between the vectors r and
R, so that R · r = rR cos θ. We may then write
$|R - r| = \sqrt{R^2 - 2Rr\cos\theta + r^2} = R\,\sqrt{1 - 2(r/R)\cos\theta + (r/R)^2}$.
If we now stare at this for a second, we realize that the inverse of this square root is nothing but the generating function of the Legendre polynomials in Eq. (5.37), provided we identify $\lambda = r/R$ and $x = \cos\theta$. That is,
$G(R - r) = \frac{1}{4\pi R}\, G(\cos\theta,\, r/R)$.
A series expansion in powers of $r/R$, as in Eq. (5.48), will then lead to
$G(R - r) = \frac{1}{4\pi R} \sum_{n=0}^{\infty} (r/R)^n\, P_n(\cos\theta)$.   (5.49)
Figure 5.3: Very often one is interested in the potential generated by a localized source ρ(r), at
a distant point R.
This should then be plugged back into the solution (5.46), leading to
$\phi(R) = \frac{1}{4\pi R} \int d^3r\; \frac{\rho(r)}{\sqrt{1 - 2(r/R)\cos\theta + (r/R)^2}} = \frac{1}{4\pi R} \sum_{n=0}^{\infty} \int d^3r\; \rho(r)\, (r/R)^n\, P_n(\cos\theta)$,   (5.50)
Consider the first term, $n = 0$, for which $P_0 = 1$. This integral is nothing but the total charge $Q = \int d^3r\, \rho(r)$. Thus, infinitely far away,
where, to make the dependence on r more explicit, I wrote cos θ = (r · R̂)/r, with
R̂ = R/R being the unit vector in the direction of R. The quantity
$p = \int d^3r\; \rho(r)\, r$,   (5.51)
is called the dipole moment of the charge distribution.¹ Up to first order in $(r/R)$, the potential (5.50) will thus be
$\phi(R) = \frac{Q}{4\pi R} + \frac{p \cdot \hat{R}}{4\pi R^2} + \dots$   (5.52)
Quite often, we can have distributions ρ(r) where the net charge Q is zero. This hap-
pens if the distribution is neutral, but not necessarily uniform. For instance, when it is
composed of two charges +q and −q. The first term in (5.52) will vanish in this case.
But, as we can see, there will still be an electrostatic potential, although it will now fall
as 1/R2 instead of 1/R (the field will thus fall as 1/R3 ). A non-uniform distribution
will hence still generate an electrostatic potential. But it falls faster with the distance.
Eq. (5.50) goes by the name of multipole expansion. The first term is the monopole and the second is the dipole. The next term, $n = 2$, is associated with the quadrupole moment. In this case $P_2 = (3x^2 - 1)/2$, so this term will have the form
$\frac{1}{4\pi R} \int d^3r\; \rho(r)\, (r/R)^2\, \frac{3\cos^2\theta - 1}{2} = \frac{1}{4\pi R^3} \int d^3r\; \rho(r)\, r^2\, \frac{1}{2}\left[ 3\,\frac{(r\cdot\hat{R})^2}{r^2} - 1 \right]$.
The quadrupole cannot be associated to a single vector, like the dipole in Eq. (5.51).
Instead, it is associated with a tensor,
$T_{ij} = \int d^3r\; \rho(r)\, \big( r_i r_j - r^2 \delta_{ij} \big)$.   (5.53)
In fact, I will leave it for you as an exercise to check that the $n = 2$ contribution is written as
$\frac{1}{4\pi R^3}\, \frac{1}{2} \sum_{ij} T_{ij}\, R_i R_j$.   (5.54)
The solution, up to $n = 2$, will thus be
$\phi(R) = \frac{Q}{4\pi R} + \frac{1}{4\pi R^2} \sum_i p_i R_i + \frac{1}{4\pi R^3}\, \frac{1}{2}\sum_{ij} T_{ij}\, R_i R_j + \dots$   (5.55)
You can probably see a pattern emerging: The first term is a scalar operation in R. The
second is the contraction of R with a vector (which is a rank 1 tensor). The next is a
contraction with a rank 2 tensor and so on. There are some systems for which both the
monopole and dipole terms vanish, and the first contribution is the quadrupole. This
happens, for instance, in some types of liquid crystals and is associated with their
shapes and composition.
¹ You may have seen the dipole moment in the case of a system with only two charges, $+q$ and $-q$. To obtain that, we can simply write $\rho(r) = q\,\delta(r - r_1) - q\,\delta(r - r_2)$. The Dirac deltas kill the integral, leaving us with $p = q(r_1 - r_2)$, which is the more familiar formula for the dipole moment.
5.5 Associated Legendre functions
Next section, when we study the Laplacian in spherical coordinates, the following
ODE is going to show up:
$(1 - x^2)\, y'' - 2x\, y' + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] y = 0$,   (5.56)
where $m$ is an integer. This is called the associated Legendre equation. And, as the very name hints, it turns out to be closely related to Legendre's equation (5.1). First, clearly if $m = 0$ we recover (5.1). Next, for $m \neq 0$, define a new variable $u$ according to
$y = (1 - x^2)^{m/2}\, u$.   (5.57)
Plugging this in Eq. (5.56) then leads, after a bit of algebra/patience, to
$(1 - x^2)\, u'' - 2(m+1)x\, u' + \left[ \ell(\ell+1) - m(m+1) \right] u = 0$.   (5.58)
This is looking even more like Legendre’s equation (5.1). In fact, again, if m = 0, we
get exactly that.
The solutions of (5.62) are not Legendre polynomials. Instead, as we now show,
they are derivatives of the Legendre polynomials:
$u = \frac{d^m}{dx^m}\, P_\ell(x)$.
To verify that, we start with Legendre’s equation
which is exactly Eq. (5.62). This therefore proves that the solutions are, indeed, deriva-
tives of the Legendre polynomials.
Associated Legendre functions
The solution of the associated Legendre equation
$(1 - x^2)\, y'' - 2x\, y' + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] y = 0$   (5.62)
is
$P_\ell^m(x) := (1 - x^2)^{m/2}\, \frac{d^m}{dx^m} P_\ell(x)$,   (5.63)
which is called an associated Legendre function. Since $P_\ell$ is a polynomial of degree $\ell$, these solutions only make sense for $|m| \leq \ell$. Moreover, since Eq. (5.62) involves only $m^2$, Eq. (5.63) is also a solution for $m < 0$. That is, $P_\ell^{-m} = P_\ell^m$. But please note that some sources, including Mathematica, adopt a different convention for negative $m$.
Many people call these functions the associated Legendre polynomials; but, as you may have noticed, if $m$ is odd, these will not be polynomials. So the term “functions” is a bit more precise.
Here is a list of the first few associated Legendre functions:
$P_0^0 = 1$,
$P_1^0 = x$,  $P_1^1 = -\sqrt{1 - x^2}$,
$P_2^0 = \frac{3x^2 - 1}{2}$,  $P_2^1 = -3x\sqrt{1 - x^2}$,  $P_2^2 = 3(1 - x^2)$,
$P_3^0 = \frac{5x^3 - 3x}{2}$,  $P_3^1 = \frac{3}{2}\sqrt{1 - x^2}\,(1 - 5x^2)$,  $P_3^2 = 15x(1 - x^2)$,  $P_3^3 = -15(1 - x^2)^{3/2}$.
The associated Legendre functions are not all mutually orthogonal. For instance, $P_1^1$ is not orthogonal to $P_2^2$. But some subsets turn out to be. For instance, one may show that
$\int_{-1}^{1} dx\; P_\ell^m(x)\, P_n^m(x) = \frac{2\,(\ell + |m|)!}{(2\ell+1)\,(\ell - |m|)!}\, \delta_{\ell,n}$,   (5.64)
with the same m in both. I put |m| on the right-hand side to also contemplate the case
m < 0. Similarly, one may show that they also satisfy
$\int_{-1}^{1} dx\; \frac{P_\ell^m(x)\, P_\ell^k(x)}{1 - x^2} = \frac{(\ell + |m|)!}{|m|\,(\ell - |m|)!}\, \delta_{m,k}$.   (5.65)
That is, now with the same $\ell$ in both, but varying the upper index. This result only holds for $m, k \neq 0$. If $m = k = 0$, the integral diverges.
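A quick numerical check of (5.64) can be done with scipy's built-in associated Legendre functions. This is just a sketch of mine; any overall sign convention for $P_\ell^m$ drops out of (5.64), since both factors carry it:

import numpy as np
from scipy.integrate import quad
from scipy.special import lpmv, factorial

m = 1
for l in range(m, 4):
    for n in range(m, 4):
        # lpmv(m, l, x) evaluates the associated Legendre function P_l^m(x).
        val, _ = quad(lambda x: lpmv(m, l, x) * lpmv(m, n, x), -1, 1)
        expected = 2 * factorial(l + m) / ((2*l + 1) * factorial(l - m)) if l == n else 0.0
        assert abs(val - expected) < 1e-8
print("Eq. (5.64) checked for m = 1 and l, n < 4")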
Figure 5.4: Spherical coordinates.
And one has the following formulas for the gradient, divergence and Laplacian in
spherical coordinates:
$\nabla f = \frac{\partial f}{\partial r}\, e_r + \frac{1}{r}\frac{\partial f}{\partial \theta}\, e_\theta + \frac{1}{r\sin\theta}\frac{\partial f}{\partial \phi}\, e_\phi$,   (5.68)
$\nabla \cdot v = \frac{1}{r^2}\frac{\partial (r^2 v_r)}{\partial r} + \frac{1}{r\sin\theta}\frac{\partial (v_\theta \sin\theta)}{\partial \theta} + \frac{1}{r\sin\theta}\frac{\partial v_\phi}{\partial \phi}$,   (5.69)
$\nabla^2 f = \frac{1}{r^2}\frac{\partial}{\partial r}\left( r^2 \frac{\partial f}{\partial r} \right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial f}{\partial \theta} \right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 f}{\partial \phi^2}$.   (5.70)
Unfortunately I will not have time to properly derive these formulas. You may look,
for instance, at Cahill, Sec. 6.4.
The angular part of the Laplacian is very important and it is worth defining the
differential operator
$\hat{L}^2 = -\left[ \frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial}{\partial \theta} \right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial \phi^2} \right]$,   (5.71)
so that the Laplacian (5.70) can be written as
$\nabla^2 f = \frac{1}{r^2}\frac{\partial}{\partial r}\left( r^2 \frac{\partial f}{\partial r} \right) - \frac{1}{r^2}\, \hat{L}^2 f$.   (5.72)
In quantum theory, L̂2 turns out to be associated with the angular momentum operator.
Proving this is a bit nasty; you will do it when you study Quantum Mechanics properly.
The idea is to start with the definition of the angular momentum vector, L = r × p, and
upgrade p to an operator, p = −i~∇. One may then verify that L · L = ~2 L̂2 , with L̂2
being the operator in (5.71). You can give it a go. It is a fun exercise. What it means is
that, except for a factor of ~2 , this operator L̂2 is nothing but the square of the angular
momentum. Notwithstanding this interpretation, introducing this separation between
an angular and a radial part turns out to also be quite convenient, as we will see.
Consider now Helmholtz' equation,
$\nabla^2 f = -k^2 f$,   (5.73)
and let us look for separable solutions of the form $f(r, \theta, \phi) = R(r)\, Y(\theta, \phi)$, for unknown functions $R$ and $Y$. The important thing to notice is that $\hat{L}^2$ acts only on the angular part, since it only contains derivatives with respect to $\theta$ and $\phi$. Hence, we find
$\frac{Y}{r^2}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) - \frac{R}{r^2}\,\hat{L}^2 Y = -k^2\, R\, Y$.
Multiplying both sides by $r^2/(RY)$ leads to
$\frac{1}{R}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) + k^2 r^2 = \frac{1}{Y}\,\hat{L}^2 Y$.
The left-hand side is now only a function of r, while the right-hand side is only a
function of θ and φ. This can thus only happen if both quantities are a constant. For
reasons that will become clear in the next section, we label this constant as $\ell(\ell+1)$. That is, we write
$\hat{L}^2 Y = \ell(\ell+1)\, Y$,   (5.75)
$\frac{1}{R}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) + k^2 r^2 = \ell(\ell+1)$.   (5.76)
Radial equation
Consider first the particular case where k = 0. That is, where Helmholtz’ equation
reduces to Laplace’s equation, ∇2 f = 0. The radial equation (5.76) becomes, in this
case,
$\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) = \ell(\ell+1)\, R$.
Let us try a solution of the form $R(r) = r^n$, for some constant $n$. We then get $R' = n r^{n-1}$ and so
$\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) = \frac{d}{dr}\left( n\, r^{n+1} \right) = n(n+1)\, r^n$.
This will therefore be a solution, but only provided $n(n+1) = \ell(\ell+1)$. There are two possible solutions, befitting a second order ODE: either $n = \ell$ or $n = -(\ell+1)$. Thus, for each value of $\ell$, the general solution will be
$R_\ell(r) = a\, r^\ell + \frac{b}{r^{\ell+1}}$,   (5.77)
where a and b are constants. Since ` is a non-negative integer, the solution r` will be
regular at all points, while 1/r`+1 will diverge at r = 0.
In the case where k > 0, Eq. (5.76) becomes more complicated. The solution, in this
case, turns out to be a special set of functions, called the spherical Bessel functions.
These are a bit outside the scope of this course (you will learn about Bessel functions
next semester). So I won’t discuss it any further. Instead, we now turn to the angular
equation.
$\hat{L}^2 Y = -\left[ \frac{1}{\sin\theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial Y}{\partial \theta} \right) + \frac{1}{\sin^2\theta}\frac{\partial^2 Y}{\partial \phi^2} \right] = \ell(\ell+1)\, Y$.
We then attempt a separation of variables again, now writing Y(θ, φ) = Θ(θ)Φ(φ). This
yields, after multiplying both sides by sin2 θ/Y,
$\frac{\sin\theta}{\Theta}\frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial \Theta}{\partial \theta} \right) + \ell(\ell+1)\sin^2\theta = -\frac{1}{\Phi}\frac{\partial^2 \Phi}{\partial \phi^2}$.
The left-hand side depends only on θ and the right-hand side only on φ. As a conse-
quence, each term must be a constant. Traditionally, we call this constant m2 . That is,
we write
$\frac{\sin\theta}{\Theta}\frac{d}{d\theta}\left( \sin\theta\, \frac{d\Theta}{d\theta} \right) + \ell(\ell+1)\sin^2\theta = m^2$,   (5.78)
$\frac{1}{\Phi}\frac{d^2\Phi}{d\phi^2} = -m^2$.   (5.79)
Things are starting to improve. Eq. (5.79) can now be easily solved:
$\frac{d^2\Phi}{d\phi^2} = -m^2\, \Phi \quad\Rightarrow\quad \Phi = e^{\pm i m \phi}$.
This looks just like the usual harmonic oscillator equation $y'' = -\omega^2 y$. There is, however, one fundamental difference: $\phi$ is an angle so, for physical reasons, the solution must be periodic in $\phi$, with period $2\pi$. That is,
$e^{im(\phi + 2\pi)} = e^{im\phi}$.
Consequently, we see that not all values of $m$ solve Eq. (5.79); a solution will exist only if $m = 0, \pm 1, \pm 2, \pm 3, \dots$. Thus, to summarize, the solution of Eq. (5.79) is
$\Phi(\phi) = e^{im\phi}, \qquad m = 0, \pm 1, \pm 2, \pm 3, \dots$   (5.80)
There should be two solutions, $e^{im\phi}$ and $e^{-im\phi}$, since the ODE is 2nd order. But we are already taking care of both by allowing $m$ to run over the negatives as well.
Next we turn to Eq. (5.78) for Θ, which we rewrite as
$\sin\theta\, \frac{\partial}{\partial \theta}\left( \sin\theta\, \frac{\partial \Theta}{\partial \theta} \right) + \left[ \ell(\ell+1)\sin^2\theta - m^2 \right] \Theta = 0$.   (5.81)
We now change variables to $x = \cos\theta$, for which $\sin\theta\, \frac{d}{d\theta} = -\sin^2\theta\, \frac{d}{dx}$, so that Eq. (5.81) becomes
$\sin^2\theta\, \frac{\partial}{\partial x}\left( \sin^2\theta\, \frac{\partial \Theta}{\partial x} \right) + \left[ \ell(\ell+1)\sin^2\theta - m^2 \right] \Theta = 0$.
Dividing by $\sin^2\theta = 1 - x^2$ then gives
$\frac{\partial}{\partial x}\left[ (1 - x^2)\, \frac{\partial \Theta}{\partial x} \right] + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] \Theta = 0$.
Finally, expanding the first term,
$\frac{\partial}{\partial x}\left[ (1 - x^2)\, \frac{\partial \Theta}{\partial x} \right] = (1 - x^2)\,\Theta'' - 2x\,\Theta'$.
Thus, what we find is, ta-da!, exactly the associated Legendre equation (5.62):
$(1 - x^2)\,\Theta'' - 2x\,\Theta' + \left[ \ell(\ell+1) - \frac{m^2}{1 - x^2} \right] \Theta = 0$.   (5.82)
As we saw all the way back in Sec. 5.1, these solutions will be well behaved at $x = \pm 1$ only when $\ell$ is an integer,
$\ell = 0, 1, 2, \dots$
Moreover, as we saw in Sec. 5.5, for each `, the only allowed values of m are integers
satisfying $|m| \leq \ell$. Hence, the solutions will be the associated Legendre functions $P_\ell^m(x)$, defined in Eq. (5.63), with $x = \cos\theta$:
$\Theta(\theta) = P_\ell^m(\cos\theta), \qquad \ell = 0, 1, 2, 3, \dots$   (5.83)
Combining this with the $\Phi$ solution, Eq. (5.80), we then finally obtain the general family of linearly independent solutions of the angular equation, $Y(\theta, \phi) = A\, e^{im\phi}\, P_\ell^m(\cos\theta)$, where $A$ is a normalization constant. It is customary to fix $A$ by imposing
$\int_0^{\pi} d\theta\, \sin\theta \int_0^{2\pi} d\phi\; |Y(\theta, \phi)|^2 = 1$.   (5.86)
Using the orthogonality relation (5.64), this gives
$\int_0^{\pi} d\theta\, \sin\theta \int_0^{2\pi} d\phi\; |Y(\theta, \phi)|^2 = |A|^2\, (2\pi)\, \frac{2(\ell + m)!}{(2\ell+1)(\ell - m)!} = 1$.
The sign of A is arbitrary and different sources use different conventions. I will adopt
here the “quantum convention” (because it is the convention used in quantum mechan-
ical applications):
$A = (-1)^m \sqrt{\frac{2\ell+1}{4\pi}\, \frac{(\ell - |m|)!}{(\ell + |m|)!}}$.
This is a good time to summarize what we have learned.
Spherical Harmonics
The general solutions of the angular equation (5.75) are the Spherical Harmonics
$Y_\ell^m(\theta, \phi) = (-1)^m \sqrt{\frac{2\ell+1}{4\pi}\, \frac{(\ell - |m|)!}{(\ell + |m|)!}}\; e^{im\phi}\, P_\ell^m(\cos\theta)$,   (5.87)
where
$\ell = 0, 1, 2, 3, \dots$   (5.88)
and $m = -\ell, -\ell+1, \dots, \ell$. They are eigenfunctions of $\hat{L}^2$, with eigenvalue $\ell(\ell+1)$. The shape of the first few harmonics is illustrated in Fig. 5.5.
It is interesting to note how Y`m is defined by two indices, ` and m, while the eigenvalues
depend only on `. We say the eigenvalues of L̂2 are degenerate, which means each
eigenvalue is associated with more than one eigenfunction. For each given `, there
are in fact $2\ell + 1$ allowed values of $m$: $-\ell, -\ell+1, \dots, \ell-1, \ell$. Hence, we say that the degeneracy of the eigenvalue $\ell(\ell+1)$ is $2\ell+1$ (“degeneracy” is the number of eigenfunctions associated to a certain eigenvalue).
The Spherical Harmonics form a basis for functions living on the unit sphere.
That is, functions f (θ, φ) which depend only on the angles in spherical coordinates.
The functions $e^{im\phi}$ are orthogonal in the sense that
$\int_0^{2\pi} e^{i(m - m')\phi}\, d\phi = 2\pi\, \delta_{m,m'}$.
If we combine this with the orthogonality (5.64) of the associated Legendre functions
and our choice of normalization (5.86), we then see that the Spherical Harmonics are
naturally orthonormal,
$\int d\Omega\; (Y_\ell^m)^*\, Y_{\ell'}^{m'} = \delta_{\ell,\ell'}\, \delta_{m,m'}$,   (5.91)
where dΩ = sin θdθdφ is an element of solid angle. Hence, any function f (θ, φ) can be
expanded as
$f(\theta, \phi) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} c_{\ell,m}\, Y_\ell^m(\theta, \phi)$,   (5.92)
with coefficients
$c_{\ell,m} = \int d\Omega\; f(\theta, \phi)\, (Y_\ell^m)^*$.   (5.93)
This is exactly the same logic as the Fourier business we started our course with.
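As a sanity check of the orthonormality (5.91), here is a Python sketch using scipy. Beware the slightly confusing argument order of scipy.special.sph_harm, which takes the azimuthal angle before the polar one; the pairs of (l, m) below are arbitrary choices of mine:

import numpy as np
from scipy.integrate import dblquad
from scipy.special import sph_harm

def overlap(l1, m1, l2, m2):
    """Integral of conj(Y_{l1}^{m1}) Y_{l2}^{m2} over the unit sphere."""
    return dblquad(lambda pol, az: np.real(np.conj(sph_harm(m1, l1, az, pol))
                                           * sph_harm(m2, l2, az, pol)) * np.sin(pol),
                   0, 2*np.pi, 0, np.pi)[0]

print(overlap(2, 1, 2, 1))   # ~ 1, as in Eq. (5.91)
print(overlap(2, 1, 3, 1))   # ~ 0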
Figure 5.5: Plots of the first few spherical harmonics. The colors distinguish the regions where
Re(Y`m ) > 0 (orange) and < 0 (blue).
5.8 The hydrogen atom
Lastly, we turn to what is one of the major applications of orthogonal polynomials:
the quantum properties of a hydrogen atom. We assume that the proton is very heavy
and therefore does not participate in the dynamics. The problem is then reduced to
modeling an electron, of mass m, subject to the Coulomb potential
$V(r) = -\frac{e^2}{4\pi\epsilon_0\, r}$,   (5.94)
where $e$ is the electron charge, $\epsilon_0$ is the vacuum permittivity and $r = |r|$ is the position, measured with respect to the nucleus. Schrödinger's equation for the electron
wavefunction is given by
$i\hbar\, \frac{\partial \Psi}{\partial t} = \hat{H}\, \Psi$,   (5.95)
where
$\hat{H} = \frac{\hat{p}^2}{2m} + V(r)$,   (5.96)
is the Hamiltonian. All quantum problems we treated before were in 1D, while here one must naturally work in 3D. In this case the momentum operator $\hat{p}$ will be a vector, $\hat{p} = -i\hbar\nabla$, so that Schrödinger's equation becomes
$i\hbar\, \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2 \Psi + V(r)\,\Psi$,   (5.98)
which is a partial differential equation for Ψ(r, t).
As always, we start by solving it using separation of variables in time and space:
Ψ(r, t) = e−iEt/~ ψ(r). This transforms (5.98) into an eigenstuff equation Ĥψ = Eψ. Or,
more explicitly,
$-\frac{\hbar^2}{2m}\nabla^2 \psi + V(r)\,\psi = E\,\psi$.   (5.99)
The goal is then to simultaneously solve this for the eigenfunctions ψ(r) and the eigen-
values E.
The situation here is very similar to that of Sec. 5.6. We first write the Laplacian as
in Eq. (5.72), with the operator L̂2 defined in Eq. (5.71). Eq. (5.99) then becomes
$-\frac{\hbar^2}{2m}\left\{ \frac{1}{r^2}\frac{\partial}{\partial r}\left( r^2 \frac{\partial \psi}{\partial r} \right) - \frac{1}{r^2}\,\hat{L}^2 \psi \right\} + V(r)\,\psi = E\,\psi$.   (5.100)
We multiply both sides by −2mr2 /~2 and write ψ(r, θ, φ) = R(r)Y(θ, φ):
$Y\, \frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) - R\, \hat{L}^2 Y - \frac{2mr^2}{\hbar^2}\left[ V(r) - E \right] R\, Y = 0$.
Finally, we divide both sides by RY:
$\frac{1}{R}\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) - \frac{2mr^2}{\hbar^2}\left[ V(r) - E \right] = \frac{1}{Y}\,\hat{L}^2 Y$.   (5.101)
The left-hand side is now only a function of r, while the right-hand side is only a
function of θ and φ. Whence, they must each be a constant. Just like we did in (5.75),
we write this constant as $\ell(\ell+1)$. The angular part then gives $\hat{L}^2 Y = \ell(\ell+1)\, Y$, whose solutions are the spherical harmonics. The radial part, in turn, can be rearranged by defining the effective potential
$V_{\rm eff}(r) = V(r) + \frac{\hbar^2\, \ell(\ell+1)}{2m\, r^2}$,   (5.104)
in terms of which it reads
$-\frac{\hbar^2}{2mr^2}\, \frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) + \left[ V_{\rm eff}(r) - E \right] R = 0$.   (5.105)
2mr dr dr
This means that, in practice, what the electron feels is not the Coulomb potential, but
Veff . This new term is called a centrifugal potential. It is positive and therefore rep-
resents a repulsion: it tends to throw the particle outward, away from the center. This
Figure 5.6: Pictorial illustration of the effective potential (5.104). The Coulomb potential is negative (blue-dashed) while the centrifugal term is positive (green-dotted). One is thus attractive and the other repulsive. As a consequence, we get the red curve, which has a minimum at $r_{\min} = a_0\, \ell(\ell+1)$, where $a_0 = \frac{4\pi\epsilon_0 \hbar^2}{m e^2}$ is Bohr's radius.
is the strange way in which angular momentum manifests itself in quantum theory: the eigenvalue $\ell$ represents the possible eigenmodes of the angular momentum and, the higher the value of $\ell$, the stronger the push outward.
This centrifugal term competes with the Coulomb potential (5.94), which is nega-
tive and hence always attractive. The combination leads to something like the potential
shown in Fig. 5.6. One curve goes down, the other goes up. When we mix the two,
we get a potential that eventually has a minimum somewhere. The position of the minimum can be found by computing $dV_{\rm eff}/dr = 0$ and reads
$r_{\min} = a_0\, \ell(\ell+1)$,   (5.106)
where
$a_0 = \frac{4\pi\epsilon_0 \hbar^2}{m e^2} \simeq 5.29 \times 10^{-11}\ \mathrm{m}$,   (5.107)
is known as Bohr’s radius. It can be viewed as a kind of typical distance scale for atomic
stuff. We thus see that the minimum occurs further from the nucleus, the larger is the
angular momentum `. The existence of a minimum is important because it suggests that
there may be stable configurations where the smallest possible energy is “somewhere
in between”. That is, where the electron neither collapses toward the proton nor flies
away to infinity. We call these configurations bound states.
Returning to Eq. (5.105), define u = rR. One may verify that
$\frac{d}{dr}\left( r^2 \frac{dR}{dr} \right) = r\, \frac{d^2 u}{dr^2}$,
so that Eq. (5.105) becomes
$-\frac{\hbar^2}{2m}\frac{d^2 u}{dr^2} + \left[ -\frac{e^2}{4\pi\epsilon_0 r} + \frac{\hbar^2\,\ell(\ell+1)}{2m\, r^2} - E \right] u = 0$.   (5.108)
We now have to start cleaning this up. Define
$\kappa = \frac{\sqrt{-2mE}}{\hbar}$,   (5.109)
$\eta = \frac{1}{\kappa a_0} = \frac{m e^2}{4\pi\epsilon_0 \hbar^2 \kappa}$,   (5.110)
$\rho = 2\kappa r$.   (5.111)
The constant κ has units of 1/length (wavenumber), so the new variable ρ is dimension-
less (as is η). I will skip a bit of annoying calculations here. But you may check that,
in terms of these new quantities, Eq. (5.108) becomes
$\frac{d^2 u}{d\rho^2} = \left[ \frac{1}{4} - \frac{\eta}{\rho} + \frac{\ell(\ell+1)}{\rho^2} \right] u$.   (5.112)
This is very nice because now everything is dimensionless. Note that this is still an
eigenequation, so the energies E are also unknowns. In this new notation, they are
hidden away in η. That is to say, Eq. (5.112) is really an eigenequation for determining
both u and η.
At this point things start to get a little nasty. We are interested in bound solutions. That is, solutions which remain finite as $\rho \to \infty$ and $\rho \to 0$. This is where quantization comes from. Not all values of $\eta$ will lead to regular solutions. Most, actually, will not.
The situation is very similar to what happened with the Legendre polynomials. We
started the chapter with Eq. (5.1), where n could be any constant. But we then saw
that most solutions were not well behaved and only if n was an integer would we get
something regular. The same thing will also happen here, although seeing it is a bit
trickier. What one has to do is first make another change of variables to²
$u(\rho) = \rho^{\ell+1}\, e^{-\rho/2}\, v(\rho)$,   (5.113)
for a new unknown function $v(\rho)$. I will leave for you then the boring task of checking that $v(\rho)$ satisfies
$\rho\, v'' + (2\ell + 2 - \rho)\, v' + (\eta - \ell - 1)\, v = 0$.   (5.114)
This equation will be associated with the Laguerre polynomials. In fact, in the problem
set, you saw that the associated Laguerre polynomials, defined by the Rodrigues
formula,
$L_j^k(x) = (-1)^k \frac{d^k}{dx^k} L_{j+k}(x), \qquad L_j(x) = \frac{e^x}{j!} \frac{d^j}{dx^j}\left( x^j e^{-x} \right)$,   (5.115)
² This may seem quite magical. But there is actually a logic to it. The idea is to strip away the asymptotic behavior of the solution, when $\rho \to \infty$ and $\rho \to 0$. For instance, if $\rho$ is very large Eq. (5.112) is approximated by $u'' = u/4$, whose solutions are $e^{\rho/2}$ or $e^{-\rho/2}$. Since $e^{\rho/2}$ would be divergent, we can then conclude that at very large distances the solution must decay as $u \sim e^{-\rho/2}$. Similarly, when $\rho \to 0$ the dominant term in (5.112) will be the centrifugal one, and the solution which does not explode will be $u \sim \rho^{\ell+1}$. What we are doing in Eq. (5.113) is essentially stripping away these two asymptotic behaviors, in the hope that the equation for $v$ will turn out to be more manageable.
were a solution of the ODE
$x\, y'' + (k + 1 - x)\, y' + j\, y = 0$,   (5.116)
but only provided k and j were integers.
Eq. (5.114) is of this form, with
$k = 2\ell + 1, \qquad j = \eta - \ell - 1$.
Thus, regular solutions will only exist provided
$j = 0, 1, 2, 3, \dots$
This quantizes the allowed values of $\eta$. It says, essentially, that
$\eta \equiv n = 1, 2, 3, \dots$,
i.e., any positive integer $n$. Moreover, for each given $n$, it also imposes that $\ell$ must be constrained to
$\ell = 0, 1, 2, \dots, n - 1$.
The regular solutions of (5.112) will thus be, up to a constant,
$u(\rho) = \rho^{\ell+1}\, e^{-\rho/2}\, L^{2\ell+1}_{n-\ell-1}(\rho)$.   (5.117)
Please don’t get me wrong. I am not being rigorous at all here. To really show that
these are the only regular solutions and so on, is a bit harder and, unfortunately, we
will not have time to do it. What I am doing here is merely to try to show you how
these orthogonal polynomials may emerge in this quantum business.
Bohr’s formula
The condition that $\eta$, defined in Eq. (5.110), must be a positive integer, implies that
$\kappa = \frac{\sqrt{-2mE}}{\hbar} = \frac{1}{n a_0}$,
which therefore determines the allowed energies:
$E_n = -\frac{\hbar^2}{2m a_0^2\, n^2}, \qquad n = 1, 2, 3, \dots$   (5.118)
This is called Bohr’s formula for the energy levels of the Hydrogen atom.
From a historical perspective, it is perhaps the most important result in quan-
tum mechanics. It was figured out by Bohr in 1913, before quantum theory, and
matches very well with experiments in spectroscopy. When Schrödinger first
proposed his equation in 1926, the first thing he did was to apply it to the Hy-
drogen atom, exactly like we did above. And to find Bohr’s formula naturally
emerge from the formalism was seen, by him, as a strong confirmation that his
ideas were correct.
The smallest (most negative) energy is called the ground state and has the
Figure 5.7: First few energy levels of the Hydrogen atom, in electron-Volt.
value
$E_1 = -\frac{\hbar^2}{2m a_0^2} \simeq -13.6\ \mathrm{eV}$.   (5.119)
The other levels, n = 2, 3, . . . are called excited states and all have energies
larger than E1 , but still negative (Fig. 5.7). States with energy above zero are
not bound states. That is, they do not describe the electron being bound to the
proton.
Here the factor of $r^2$ appears because we are working in spherical coordinates, so that the element of integration is $d^3r = r^2 \sin\theta\, dr\, d\theta\, d\phi$. It is a little bit nasty to find the properly normalized radial function explicitly, but the final result is
Figure 5.8: Pretty plots of the Hydrogen atom wavefunctions |ψn`m |2 defined in Eq. (5.122), for
a bunch of values of (n, `, m).
$R_{n\ell}(r) = \sqrt{\left( \frac{2}{n a_0} \right)^3 \frac{(n - \ell - 1)!}{2n\, (n + \ell)!}}\; e^{-r/n a_0} \left( \frac{2r}{n a_0} \right)^{\ell} L^{2\ell+1}_{n-\ell-1}(2r/n a_0)$.   (5.121)
Yeah, I told you: it's nasty. But nasty as it may be, we now have the full solution of Schrödinger's equation for the Hydrogen atom:
$\psi_{n\ell m}(r, \theta, \phi) = R_{n\ell}(r)\, Y_\ell^m(\theta, \phi)$,   (5.122)
where $Y_\ell^m$ are the spherical harmonics in Eq. (5.87). Some pretty plots of $|\psi_{n\ell m}|^2$ are
shown in Fig. 5.8.
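If you want to play with these wavefunctions numerically, here is a Python sketch of mine for the radial part (5.121), in units where $a_0 = 1$, together with a check of its normalization. It assumes scipy's genlaguerre follows the same Laguerre convention as Eq. (5.115), which is the standard one:

import numpy as np
from scipy.special import genlaguerre, factorial
from scipy.integrate import quad

a0 = 1.0   # work in units of the Bohr radius

def R(n, l, r):
    """Radial wavefunction (5.121) with a0 = 1."""
    rho = 2 * r / (n * a0)
    norm = np.sqrt((2 / (n * a0))**3 * factorial(n - l - 1) / (2 * n * factorial(n + l)))
    return norm * np.exp(-r / (n * a0)) * rho**l * genlaguerre(n - l - 1, 2*l + 1)(rho)

# The integral of |R_nl|^2 r^2 dr should be 1 for every (n, l).
for n, l in [(1, 0), (2, 0), (2, 1), (3, 2)]:
    val, _ = quad(lambda r: R(n, l, r)**2 * r**2, 0, np.inf)
    print(n, l, val)   # ~ 1 in each case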
The states with different values of ` usually receive funny names, which are due to
historical reasons. For instance, ` = 0 is called s, ` = 1 is called p, ` = 2 is called d
and ` = 3 is called f . Thus, for instance, all eigenstates with n = 2, ` = 1 are called 2p
states. This may be reminding you of chemistry class, and that’s exactly the point. In
chemistry you probably learned how to build the periodic table by filling the electrons
into orbitals, which were labeled as