A Guide to Distribution Theory and Fourier Transforms

CRC Press
Boca Raton  Ann Arbor  London  Tokyo
Library of Congress Cataloging-in-Publication Data

Strichartz, Robert.
A guide to distribution theory and Fourier transforms / by Robert Strichartz.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-8273-4
1. Theory of distributions (Functional analysis). 2. Fourier analysis. I. Title.
QA324.S77 1993
515'.782-dc20 93-36911
CIP
This book contains information obtained from authentic and highly regarded sources.
Reprinted material is quoted with permission, and sources are indicated. A wide variety
of references are listed. Reasonable efforts have been made to publish reliable data and
information, but the author and the publisher cannot assume responsibility for the validity
of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording,
or by any information storage or retrieval system, without prior permission in writing
from the publisher.
CRC Press, Inc.'s consent does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press for such copying.
Direct all inquiries to CRC Press, Inc., 2000 Corporate Blvd., N.W., Boca Raton, Florida 33431.
Contents

Preface

1  What are Distributions?
   1.1  Generalized functions and test functions
   1.2  Examples of distributions
   1.3  What good are distributions?
   1.4  Problems

2  The Calculus of Distributions
   2.1  Functions as distributions
   2.2  Operations on distributions
   2.3  Adjoint identities
   2.4  Consistency of derivatives
   2.5  Distributional solutions of differential equations
   2.6  Problems

3  Fourier Transforms
   3.1  From Fourier series to Fourier integrals
   3.2  The Schwartz class S
   3.3  Properties of the Fourier transform on S
   3.4  The Fourier inversion formula on S
   3.5  The Fourier transform of a Gaussian
   3.6  Problems

4  Fourier Transforms of Tempered Distributions
   4.1  The definitions
   4.2  Examples
   4.3  Convolutions with tempered distributions
   4.4  Problems
Preface
Distribution theory was one of the two great revolutions in mathematical anal-
ysis in the 20th century. It can be thought of as the completion of differential
calculus, just as the other great revolution, measure theory (or Lebesgue integra-
tion theory), can be thought of as the completion of integral calculus. There are
many parallels between the two revolutions. Both were created by young, highly
individualistic French mathematicians (Henri Lebesgue and Laurent Schwartz).
Both were rapidly assimilated by the mathematical community, and opened up
new worlds of mathematical development. Both forced a complete rethinking
of all mathematical analysis that had come before, and basically altered the
nature of the questions that mathematical analysts asked. (This is the reason
I feel justified in using the word "revolution" to describe them.) But there
are also differences. When Lebesgue introduced measure theory (circa 1903),
it almost came like a bolt from the blue. Although the older integration the-
ory of Riemann was incomplete-there were many functions that did not have
integrals-it was almost impossible to detect this incompleteness from within,
because the non-integrable functions really appeared to have no well-defined in-
tegral. As evidence that the mathematical community felt perfectly comfortable
with Riemann's integration theory, one can look at Hilbert's famous list (dating
to 1900) of 23 unsolved problems that he thought would shape the direction of
mathematical research in the 20th century. Nowhere is there a hint that com-
pleting integration theory was a worthwhile goal. On the other hand, a number
of his problems do foreshadow the developments that led to distribution theory
(circa 1945). When Laurent Schwartz came out with his theory, he addressed
problems that were of current interest, and he was able to replace a number of
more complicated theories that had been developed earlier in an attempt to deal
with the same issues.
From the point of view of this work, the most important difference is that in
retrospect, measure theory still looks hard, but distribution theory looks easy.
Because it is relatively easy, distribution theory should be accessible to a wide
audience, including users of mathematics and mathematicians who specialize in
other fields. The techniques of distribution theory can be used, confidently and
ysis will be able to get something out of this book, but will have to accept that
there will be a few mystifying passages. A solid background in multidimen-
sional calculus is essential, however, especially in part II.
Recently, when I was shopping at one of my favorite markets, I met a graduate
of Cornell (who had not been in any of my courses). He asked me what I was
doing, and when I said I was writing this book, he asked sarcastically "do you
guys enjoy writing them as much as we enjoy reading them?" I don't know
what other books he had in mind, but in this case I can say quite honestly that
I very much enjoyed writing it. I hope you enjoy reading it.
Acknowledgments
I am grateful to John Hubbard, Steve Krantz, and Wayne Yuhasz for encouraging
me to write this book, and to June Meyermann for the excellent job of typesetting
the book in LaTeX.
Ithaca, NY
October 1993
What are Distributions?
$$\int f(x)\varphi(x)\,dx$$

where $\varphi(x)$ depends on the nature of the thermometer and where you place it; $\varphi(x)$ will tend to be "concentrated" near the location of the thermometer bulb and will be nearly zero once you are sufficiently far away from the bulb. To say this is an "average" is to require

$$\varphi(x) \ge 0 \text{ everywhere, and } \int \varphi(x)\,dx = 1.$$

However, do not let these conditions distract you. With two thermometers you can measure

$$\int f(x)\varphi_1(x)\,dx \quad\text{and}\quad \int f(x)\varphi_2(x)\,dx,$$
and by subtracting you can deduce the value of $\int f(x)[\varphi_1(x) - \varphi_2(x)]\,dx$. Note that $\varphi_1(x) - \varphi_2(x)$ is no longer nonnegative. By doing more arithmetic you can even compute $\int f(x)(a_1\varphi_1(x) - a_2\varphi_2(x))\,dx$ for constants $a_1$ and $a_2$, and $a_1\varphi_1(x) - a_2\varphi_2(x)$ may have any finite value for its integral.
The above discussion is meant to convince you that it is often more meaningful physically to discuss quantities like $\int f(x)\varphi(x)\,dx$ than the value of $f$ at a particular point $x$. The secret of successful mathematics is to eliminate all unnecessary and irrelevant information; a mathematician would not ask what color the thermometer is (neither would an engineer, I hope). Since we have decided that the value of $f$ at $x$ is essentially impossible to measure, let's stop requiring our functions to have a value at $x$. That means we are considering a larger class of objects. Call them generalized functions. What we will require of a generalized function is that something akin to $\int f(x)\varphi(x)\,dx$ exist for a suitable choice of averaging functions $\varphi$ (call them test functions). Let's write $\langle f, \varphi\rangle$ for this something. It should be a real number (or a complex number if we wish to consider complex-valued test functions and generalized functions).
What other properties do we want? Let's recall some arithmetic we did before, namely

$$\langle f, a_1\varphi_1 + a_2\varphi_2\rangle = a_1\langle f, \varphi_1\rangle + a_2\langle f, \varphi_2\rangle.$$

Notice we have tacitly assumed that if $\varphi_1, \varphi_2$ are test functions then $a_1\varphi_1 + a_2\varphi_2$ is also a test function. I hope these conditions look familiar to you; if not, please read the introductory chapter of any book on linear algebra.
You have almost seen the entire definition of generalized functions. All you are lacking is a description of what constitutes a test function and one technical hypothesis of continuity. Do not worry about continuity; it will always be satisfied by anything you can construct (wise guys who like using the axiom of choice will have to worry about it, along with wolves under the bed, etc.).

So, now, what are the test functions? There are actually many possible choices for the collection of test functions, leading to many different theories of generalized functions. I will describe the space called $\mathcal{D}$, leading to the theory of distributions. Later we will meet other spaces of test functions.

The underlying point set will be an $n$-variable space $\mathbb{R}^n$ (points $x$ stand for $x = (x_1, \ldots, x_n)$) or even a subset $\Omega \subseteq \mathbb{R}^n$ that is open. Recall that this means
FIGURE 1.1: three graphs; the first is in $\mathcal{D}$, while the second and third are not.
The second example fails because it does not vanish near the boundary point 0, and the third example fails because it is not differentiable at three points. To actually write down a formula for a function in $\mathcal{D}$ is more difficult. Notice that no analytic function (other than $\varphi \equiv 0$) can be in $\mathcal{D}$ because of the vanishing requirement. Thus any formula for $\varphi$ must be given "in pieces." For example, in $\mathbb{R}^1$

$$\psi(x) = \begin{cases} e^{-1/x^2} & x > 0 \\ 0 & x \le 0 \end{cases}$$

has continuous derivatives of all orders:

$$\left(\frac{d}{dx}\right)^k e^{-1/x^2} = \frac{\text{polynomial in } x}{\text{polynomial in } x}\, e^{-1/x^2}$$
and as $x \to 0$ this approaches zero since the zero of $e^{-1/x^2}$ beats out the pole of the polynomial in $\frac{1}{x}$. Thus $\varphi(x) = \psi(x)\psi(1 - x)$ has continuous derivatives of all orders (we abbreviate this by saying $\varphi$ is $C^\infty$) and vanishes outside $0 < x < 1$, so $\varphi \in \mathcal{D}(\mathbb{R}^1)$; in fact, $\varphi \in \mathcal{D}(a < x < b)$ provided $a < 0$ and $b > 1$ (why not $a \le 0$ and $b \ge 1$?). Once you have one example you can manufacture more by
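The smooth cutoff just constructed is easy to explore numerically. A minimal sketch in Python (the names `psi` and `phi` are mine, and floating-point evaluation only illustrates, rather than proves, smoothness):

```python
import math

def psi(x):
    # psi(x) = exp(-1/x^2) for x > 0, and 0 for x <= 0
    return math.exp(-1.0 / (x * x)) if x > 0 else 0.0

def phi(x):
    # phi(x) = psi(x) * psi(1 - x): C-infinity, positive exactly on 0 < x < 1
    return psi(x) * psi(1.0 - x)

print(phi(0.5) > 0)                    # positive inside (0, 1)
print(phi(-1.0), phi(0.0), phi(1.0))   # zero at the endpoints and outside
print(phi(0.01))                       # extremely small near 0: the function is very flat there
```

Note how rapidly `phi` flattens out near the endpoints; that flatness is what lets all derivatives vanish there.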
(Yes, Virginia, $a_1\varphi_1 + a_2\varphi_2$ is in $\mathcal{D}(\Omega)$ if $\varphi_1$ and $\varphi_2$ are in $\mathcal{D}(\Omega)$.) By continuous I mean that if $\varphi_1$ is close enough to $\varphi$ then $\langle f, \varphi_1\rangle$ is close to $\langle f, \varphi\rangle$; the exact definition can wait until later. Continuity has an intuitive physical interpretation: you want to be sure that different thermometers give approximately the same reading provided you control the manufacturing process adequately. Put another way, when you repeat an experiment you do not want to get a different answer because small experimental errors get magnified. Now, whereas discontinuous functions abound, linear functionals all tend to be continuous. This happy fact deserves a bit of explanation. Fix $\varphi$ and $\varphi_1$ and call the difference $\varphi_1 - \varphi = \varphi_2$. Then $\varphi_1 = \varphi + \varphi_2$. Now perhaps $\langle f, \varphi\rangle$ and $\langle f, \varphi_1\rangle$ are far apart. So what? Move $\varphi_1$ closer to $\varphi$ by considering $\varphi + t\varphi_2$ and let $t$ get small. Then $\langle f, \varphi + t\varphi_2\rangle = \langle f, \varphi\rangle + t\langle f, \varphi_2\rangle$ by linearity, and as $t$ gets small this gets close to $\langle f, \varphi\rangle$. This does not constitute a proof of continuity, since the definition requires more "uniformity," but it should indicate that a certain amount of continuity is built into linearity. At any rate, all linear functionals on $\mathcal{D}(\Omega)$ you will ever encounter will be continuous.
FIGURE 1.2
FIGURE 1.3: the graph of $f_k$, with area 1, supported on $(-1/k, 1/k)$.
for suitable choice of $f_k$, but this is nonsense, showing the futility of pointwise thinking.)

Now suppose we first differentiate $f_k$ and then let $k \to \infty$?

FIGURE 1.4: the graphs of $f_k$ and of $f_k'$, supported on $(-1/k, 1/k)$.
and

$$\langle f_k', \varphi\rangle \approx k\left[\varphi\left(-\tfrac{1}{2k}\right) - \varphi\left(\tfrac{1}{2k}\right)\right]$$

(the points $-\frac{1}{2k}$ and $\frac{1}{2k}$ are the midpoints of the intervals and the factor $(1/k)^{-1} = k$ is the area) which approaches $-\varphi'(0)$ as $k \to \infty$. We obtain the same answer formally by integrating by parts:

$$\int f_k'(x)\varphi(x)\,dx = -\int f_k(x)\varphi'(x)\,dx.$$
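Numerically the midpoint shortcut behaves as claimed; a sketch in Python with a test function whose derivative at 0 is 1 (the particular $\varphi$ is my choice, not the book's):

```python
import math

def phi(x):
    # a smooth test-function stand-in with phi(0) = 0 and phi'(0) = 1
    return math.sin(x) * math.exp(-x * x)

def pairing(k):
    # k * [phi(-1/(2k)) - phi(1/(2k))], the approximation to <f_k', phi>
    return k * (phi(-1.0 / (2.0 * k)) - phi(1.0 / (2.0 * k)))

for k in (10, 100, 1000):
    print(k, pairing(k))   # tends to -phi'(0) = -1 as k grows
```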
$$\frac{\partial^2 u(x,t)}{\partial t^2} = k^2\,\frac{\partial^2 u(x,t)}{\partial x^2}$$

has a solution $u(x,t) = f(x - kt)$ for any function of one variable $f$, which has the physical interpretation of a "traveling wave" with "shape" $f(x)$ moving at velocity $k$.

FIGURE 1.5: the graph of $f(x) = u(x, 0)$ and its translate $f(x - k) = u(x, 1)$.
$$\Delta u = \frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} \quad\text{in } \mathbb{R}^2$$

and

$$\frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} + \frac{\partial^2 u}{\partial x_3^2} \quad\text{in } \mathbb{R}^3.$$
in $\mathbb{R}^2$, satisfies the vibrating string equation. But $u(x_1, x_2) = \log(x_1^2 + x_2^2)$ as a distribution in $\mathbb{R}^2$ does not satisfy the Laplace equation. In fact

$$\frac{\partial^2 u}{\partial x_1^2} + \frac{\partial^2 u}{\partial x_2^2} = c\,\delta$$

(and in $\mathbb{R}^3$ similarly

$$\frac{\partial^2 v}{\partial x_1^2} + \frac{\partial^2 v}{\partial x_2^2} + \frac{\partial^2 v}{\partial x_3^2} = c_1\,\delta$$

for $v(x_1, x_2, x_3) = (x_1^2 + x_2^2 + x_3^2)^{-1/2}$) for certain constants $c$ and $c_1$. These facts are not just curiosities; they are useful in solving Poisson's equation $\Delta u = f$, as we shall see.
1.4 Problems

1. Let $H$ be the distribution in $\mathcal{D}'(\mathbb{R}^1)$ defined by the Heaviside function

$$H(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases}$$

Show that if $h_n(x)$ are differentiable functions such that $\int h_n(x)\varphi(x)\,dx \to \langle H, \varphi\rangle$ as $n \to \infty$ for all $\varphi \in \mathcal{D}$, then
is a distribution.

7. Prove that $g_a$ does not depend on $a$, and the resulting distribution may also be given by

$$\lim_{a\to 0^+}\left(\int_{-\infty}^{-a} + \int_a^{\infty}\right)\frac{\varphi(x)}{x}\,dx.$$
8. Suppose $f$ is a distribution on $\mathbb{R}^1$. Show that $\langle F, \varphi\rangle = \langle f, \varphi_y\rangle$, for $\varphi \in \mathcal{D}(\mathbb{R}^2)$, where $\varphi_y(x) = \varphi(x, y)$ (here $y$ is any fixed value), defines a distribution on $\mathbb{R}^2$.

9. Suppose $f$ is a distribution on $\mathbb{R}^1$. Show that $\langle G, \varphi\rangle = \int_{-\infty}^{\infty} \langle f, \varphi_y\rangle\,dy$ for $\varphi \in \mathcal{D}(\mathbb{R}^2)$ defines a distribution on $\mathbb{R}^2$. Is $G$ the same as $F$ in problem 8?
10. Show that
If

$$f(x) = \begin{cases} 0 & \text{if } x \ne 0 \\ 1 & \text{if } x = 0, \end{cases}$$

then $\int f(x)\varphi(x)\,dx = 0$ for all test functions, so $f$ and the zero function define
the same distribution (call it the zero distribution). Intuitively this makes sense-
a function that vanishes everywhere except at the origin must also vanish at the
origin. Anything else is an experimental error. Already you see how distribution
theory forces you to overlook useless distinctions! But if the functions $f_1$ and $f_2$ are "really" different, say $f_1 > f_2$ on some interval, then the distributions $f_1$ and $f_2$ are really different: $\langle f_1, \varphi\rangle > \langle f_2, \varphi\rangle$ if $\varphi$ is a nonnegative test function vanishing outside the interval.
Now for some notation. A function $f(x)$ defined on $\Omega$ for which $\int f(x)\varphi(x)\,dx$ is absolutely convergent for every $\varphi \in \mathcal{D}(\Omega)$ is called locally integrable, denoted $f \in L^1_{\mathrm{loc}}(\Omega)$. A criterion for local integrability is the finiteness of the integral $\int_B |f(x)|\,dx$ over all sets $B \subseteq \Omega$ that are bounded and stay away from the boundary of $\Omega$. To explain this terminology let me mention that an integrable function on $\Omega$ ($f \in L^1(\Omega)$) is one for which $\int_\Omega |f(x)|\,dx$ is finite. Clearly an integrable function is locally integrable but not conversely. For example $f(x) \equiv 1$ is locally integrable but not integrable on $\mathbb{R}^n$, and $f(x) = 1/x$ is locally integrable but not integrable on $0 < x < \infty$.

For every locally integrable function $f$ we associate a distribution, also denoted $f$, given by $\langle f, \varphi\rangle = \int f(x)\varphi(x)\,dx$. We say that two locally integrable functions are equivalent if as distributions they are equal. Thus by ignoring the distinction between equivalent functions, we can regard the locally integrable functions as a subset of the distributions, $L^1_{\mathrm{loc}}(\Omega) \subseteq \mathcal{D}'(\Omega)$. This makes precise the intuitive statement that the distributions are a set of objects larger than the set of functions, justifying the term "generalized functions."
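The pairing $\langle f, \varphi\rangle = \int f(x)\varphi(x)\,dx$ is concrete enough to compute. A rough numerical sketch in Python (the Gaussian stand-in for a test function and the quadrature window are my choices; a true test function would be compactly supported):

```python
import math

def phi(x):
    # rapidly decaying stand-in for a test function
    return math.exp(-x * x)

def pairing(f, a=-30.0, b=30.0, n=120000):
    # midpoint-rule approximation to <f, phi> = integral of f(x) phi(x) dx
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * phi(a + (i + 0.5) * h) for i in range(n)) * h

# f(x) = 1 is locally integrable (though not integrable) on R, and it pairs
# with phi to give the integral of exp(-x^2), namely sqrt(pi)
print(pairing(lambda x: 1.0), math.sqrt(math.pi))
```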
There are many interesting functions that are not locally integrable, for instance $1/x$ and $1/|x|$ on $\mathbb{R}^1$ (you should not be upset that I said recently that $1/x$ is locally integrable on $0 < x < \infty$; the concept of local integrability depends on the set $\Omega$ being considered; $\Omega$ determines what "local" means). In the problems you have already encountered distributions associated with these functions; for example, a distribution satisfying

$$\langle f, \varphi\rangle = \int \frac{\varphi(x)}{|x|}\,dx$$

whenever $\varphi(0) = 0$. However, this is an entirely different construction. You should never expect that properties or operations concerning the function carry over to the distribution, unless the function is locally integrable. For instance, you have seen that more than one distribution corresponds to $1/|x|$; in fact, it is impossible to single out one as more "natural" than another. Also, although the function $1/|x|$ is nonnegative, none of the associated distributions are nonnegative (a distribution $f$ is nonnegative if $\langle f, \varphi\rangle \ge 0$ for every test function $\varphi$ that satisfies $\varphi(x) \ge 0$ for all $x \in \Omega$). In contrast, a nonnegative locally integrable function is nonnegative as a distribution. (Exercise: Verify this.) Thus you see that this is an aspect of distribution theory where it is very easy to make mistakes.
on test functions may be extended to distributions. Now why did I say "test functions" rather than "locally integrable functions"? Simply because there are more things you can do with them: for example, differentiation. How does the extension to distributions work? There are essentially two ways to proceed. Both always give the same result.

At this point I will make the convention that operation means linear operator (or linear transformation), that is, for any $\varphi \in \mathcal{D}$, $T\varphi \in \mathcal{D}$ and

$$T(a_1\varphi_1 + a_2\varphi_2) = a_1 T\varphi_1 + a_2 T\varphi_2.$$

At times we will want to consider operations $T\varphi$ that may not yield functions in $\mathcal{D}$, but we will always want linearity to hold.
The first way to proceed is to approximate an arbitrary distribution by test functions. That this is always possible is a remarkable and basic fact of the theory. We say that a sequence of distributions $f_1, f_2, \ldots$ converges to the distribution $f$ if the sequence $\langle f_n, \varphi\rangle$ converges to the number $\langle f, \varphi\rangle$ for all test functions. Write this $f_n \to f$, or say simply that the $f_n$ approximate $f$. Incidentally, you can use the limiting process to construct distributions. If $\{f_n\}$ is a sequence of distributions for which the limit as $n \to \infty$ of $\langle f_n, \varphi\rangle$ exists for all test functions $\varphi$, then $\langle f, \varphi\rangle = \lim_{n\to\infty}\langle f_n, \varphi\rangle$ defines a distribution and of course $f_n \to f$.

THEOREM 2.2.1
Given any distribution $f \in \mathcal{D}'(\Omega)$, there exists a sequence $\{\varphi_n\}$ of test functions such that $\varphi_n \to f$ as distributions.
$$\lim_{n\to\infty}\int \varphi_n(x + y)\varphi(x)\,dx.$$

If $f$ is the Dirac $\delta$-function, $f = \delta$, $\langle\delta, \varphi\rangle = \varphi(0)$, we may take the sequence $\varphi_n$ to look like

FIGURE 2.1: bumps supported on $(-1/n, 1/n)$.

Then

$$\lim_{n\to\infty}\int \varphi_n(x + y)\varphi(x)\,dx = \varphi(-y).$$

Thus $\langle\tau_y\delta, \varphi\rangle = \varphi(-y)$. (This is sometimes called the $\delta$-function at $-y$.)
In general we can do a similar simplification by making a change of variable $x \to x - y$:

$$= \lim_{n\to\infty}\int \varphi_n(x)\varphi(x - y)\,dx.$$
Next example: $T = d/dx$ (for simplicity assume $n = 1$). This is the infinitesimal version of translation. If we write $\tau_y\varphi(x) = \varphi(x + y)$ for translation and $I\varphi(x) = \varphi(x)$ for the identity, then

$$\frac{d}{dx} = \lim_{y\to 0}\frac{1}{y}(\tau_y - I).$$

So we expect to have

$$\left\langle\frac{d}{dx}f, \varphi\right\rangle = \lim_{y\to 0}\frac{1}{y}\bigl(\langle\tau_y f, \varphi\rangle - \langle f, \varphi\rangle\bigr).$$

Since

$$\langle\tau_y f, \varphi\rangle = \langle f, \tau_{-y}\varphi\rangle,$$

this equals $\lim_{y\to 0}\langle f, \frac{1}{y}(\tau_{-y} - I)\varphi\rangle$. But

$$\lim_{y\to 0}\frac{1}{y}(\tau_{-y} - I)\varphi = -\frac{d\varphi}{dx},$$

so

$$\left\langle\frac{d}{dx}f, \varphi\right\rangle = -\left\langle f, \frac{d\varphi}{dx}\right\rangle$$

(notice that this defines a distribution since $(d/dx)\varphi$ is a test function whenever $\varphi$ is).
Let's do that again more directly. Let $\varphi_n \to f$, $\varphi_n \in \mathcal{D}(\mathbb{R}^1)$. Then $\frac{d}{dx}\varphi_n \to \frac{d}{dx}f$, meaning

$$\lim_{n\to\infty}\int \frac{d\varphi_n}{dx}(x)\varphi(x)\,dx = \left\langle\frac{d}{dx}f, \varphi\right\rangle.$$

Integrating by parts,

$$\int \frac{d\varphi_n}{dx}(x)\varphi(x)\,dx = -\int \varphi_n(x)\frac{d\varphi}{dx}(x)\,dx$$

(there are no boundary terms because $\varphi(x) = 0$ outside a bounded set). Substituting in we obtain

$$\left\langle\frac{d}{dx}f, \varphi\right\rangle = -\left\langle f, \frac{d\varphi}{dx}\right\rangle.$$
Again it is instructive to look at the special case $f = \delta$. Then we have $\varphi_n \to \delta$ where

FIGURE 2.2: bumps $\varphi_n$ with $\int \varphi_n(x)\,dx = 1$ and $\varphi_n(0) = n$.
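The formula $\langle\frac{d}{dx}f, \varphi\rangle = -\langle f, \varphi'\rangle$ can be tried out numerically on the Heaviside function $H$ (which appears in problem 1 of Chapter 1); the test function and quadrature below are my own illustrative choices:

```python
import math

def dphi(x):
    # derivative of the test-function stand-in phi(x) = exp(-x^2)
    return -2.0 * x * math.exp(-x * x)

# <H', phi> = -<H, phi'> = -integral_0^infinity phi'(x) dx
a, b, n = 0.0, 30.0, 300000
h = (b - a) / n
val = -sum(dphi(a + (i + 0.5) * h) for i in range(n)) * h
print(val)   # ~ phi(0) = 1: differentiating H produces the delta function
```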
$$\int T\psi(x)\varphi(x)\,dx = \int \psi(x)S\varphi(x)\,dx$$

for all $\varphi, \psi \in \mathcal{D}$. Here $T$ is the operator we are interested in, and $S$ is an operator that makes the identity true (in other words, there is in general no recipe for $S$ in terms of $T$). Suppose we are lucky enough to discover an adjoint identity involving $T$. Then we may simply define $\langle Tf, \varphi\rangle = \langle f, S\varphi\rangle$. The adjoint identity says this is true if $f \in \mathcal{D}$, and since any distribution $f$ is the limit of test functions, $\varphi_n \to f$, $\varphi_n \in \mathcal{D}$, and we have $\langle T\varphi_n, \varphi\rangle = \langle\varphi_n, S\varphi\rangle$ by the adjoint identity, we obtain

$$\langle Tf, \varphi\rangle = \lim_{n\to\infty}\langle T\varphi_n, \varphi\rangle = \lim_{n\to\infty}\langle\varphi_n, S\varphi\rangle = \langle f, S\varphi\rangle$$

by the first definition. This also shows that the answer does not depend on the choice of the approximating sequence, something that was not obvious before.

In most cases we will use this second definition and completely bypass the first definition. But that means we first have to do the work of finding the adjoint identity. Another point we must bear in mind is that we must have $S\varphi \in \mathcal{D}$ whenever $\varphi \in \mathcal{D}$, otherwise $\langle f, S\varphi\rangle$ may not be defined.
Now let us look at the two previous examples from this point of view. In both cases the final answer gives us the adjoint identity if we specialize to $f \in \mathcal{D}$:

$$\int \tau_y\psi(x)\varphi(x)\,dx = \int \psi(x)\tau_{-y}\varphi(x)\,dx$$

and

$$\int \frac{d\psi}{dx}(x)\varphi(x)\,dx = -\int \psi(x)\frac{d\varphi}{dx}(x)\,dx.$$

Of course both of these are easy to verify directly, the first by the change of variable $x \to x - y$ and the second by integration by parts.

Note that in Hilbert space theory the adjoint identity involves complex conjugates, while in distribution theory we do not take complex conjugates even when we deal with complex-valued functions.
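Both adjoint identities are easy to check numerically as well; a Python sketch of the translation identity (Gaussians stand in for the compactly supported $\psi, \varphi$, and the quadrature window is mine):

```python
import math

def integral(g, a=-30.0, b=30.0, n=120000):
    # midpoint rule, accurate for smooth rapidly decaying integrands
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

psi = lambda x: math.exp(-(x - 1.0) ** 2)
phi = lambda x: math.exp(-2.0 * x * x)
y = 0.7

lhs = integral(lambda x: psi(x + y) * phi(x))   # <tau_y psi, phi>
rhs = integral(lambda x: psi(x) * phi(x - y))   # <psi, tau_{-y} phi>
print(lhs, rhs)                                 # equal, as the identity claims
```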
and

$$\int \frac{\partial\psi}{\partial x_k}(x)\varphi(x)\,dx = -\int \psi(x)\frac{\partial\varphi}{\partial x_k}(x)\,dx$$

for any test function by the integration-by-parts formula in the $x_k$-integral. However, the distribution $(\partial/\partial x_k)f$ is defined by

$$\left\langle\frac{\partial f}{\partial x_k}, \varphi\right\rangle = -\left\langle f, \frac{\partial\varphi}{\partial x_k}\right\rangle.$$
THEOREM 2.4.1
Let $f(x) \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, let $g(x)$ as above exist and be continuous except at a single point $y$, and let $g \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ (define it arbitrarily at the point $y$). Then, if $n \ge 2$ the distribution derivative $\partial f/\partial x_k$ equals $g$, while if $n = 1$ this is true if, in addition, $f$ is continuous at $y$.
We have cut away a neighborhood of the exceptional point. Now we can apply
integration by parts:
and
This time there are boundary terms, but they appear with opposite sign. Adding,
we obtain
Now let $\epsilon \to 0$. The first two terms approach $f(0)\varphi(0) - f(0)\varphi(0) = 0$ because $f$ is continuous. Thus

$$= \int_{-\infty}^{\infty} g(x)\varphi(x)\,dx.$$
Now for every $x_2$, except $x_2 = 0$, $f(x_1, x_2)$ is continuously differentiable in $x_1$, so we integrate by parts:
Putting this back in the iterated integral gives what we want since the single
point X2 = 0 does not contribute to the integral.
and

$$y = x - kt \quad\text{and}\quad z = x + kt,$$

so

$$\frac{\partial(y,z)}{\partial(x,t)} = 2k, \qquad dx\,dt = \frac{1}{2k}\,dy\,dz.$$

Then

$$\frac{\partial}{\partial x} = \frac{\partial}{\partial y} + \frac{\partial}{\partial z} \quad\text{and}\quad \frac{\partial}{\partial t} = -k\frac{\partial}{\partial y} + k\frac{\partial}{\partial z},$$

so

$$\frac{\partial^2}{\partial t^2} - k^2\frac{\partial^2}{\partial x^2} = -4k^2\frac{\partial^2}{\partial y\,\partial z}.$$

Thus
We claim the $z$ integration already produces zero, for all $y$. To see this note that

$$\int_a^b \frac{\partial^2\varphi}{\partial y\,\partial z}(y, z)\,dz = \frac{\partial\varphi}{\partial y}(y, b) - \frac{\partial\varphi}{\partial y}(y, a).$$

Thus

$$\int_{-\infty}^{\infty}\frac{\partial^2\varphi}{\partial y\,\partial z}(y, z)\,dz = 0$$

since $\varphi$, and hence $\partial\varphi/\partial y$, vanishes outside a bounded set. Thus $u(x, t) = f(x - kt)$ is a weak solution.
In contrast, let's see if $\log(x^2 + y^2)$ is a weak solution to the Laplace equation, meaning

$$\left\langle u, \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\varphi\right\rangle = 0$$

for all test functions $\varphi$. It will simplify matters if we pass to polar coordinates, in which case

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}$$

and $dx\,dy = r\,dr\,d\theta$. Then for $u = \log(x^2 + y^2) = \log r^2$ the question boils down to

$$\int_0^{2\pi}\!\int_0^{\infty}\log r^2\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}\right)\varphi(r,\theta)\,r\,dr\,d\theta = 0\,?$$
To avoid the singularity of $u$ at the origin we take the $r$-integration from $\epsilon$ to $\infty$ for $\epsilon > 0$ and let $\epsilon \to 0$. Before letting $\epsilon \to 0$ we integrate by parts to put all the derivatives on $u$. Since $u$ is harmonic away from the origin we expect to get zero from the integral term, but the boundary terms will not be zero and they will determine what happens. So let's compute. Now

$$\frac{\partial}{\partial r}\log r^2 = \frac{2}{r},$$

so adding everything up we obtain

$$= \int_0^{2\pi}\!\int_\epsilon^{\infty}\left(-\frac{2}{r} + \frac{2}{r}\right)\varphi(r,\theta)\,dr\,d\theta + \int_0^{2\pi}\left(-\log\epsilon^2 + \log\epsilon^2 + 2\right)\varphi(\epsilon,\theta)\,d\theta + \int_0^{2\pi}\left(-\epsilon\log\epsilon^2\right)\frac{\partial\varphi}{\partial r}(\epsilon,\theta)\,d\theta.$$
The first term is zero, as we expected, so

$$= \int_0^{2\pi}\left(2\varphi(\epsilon,\theta) - \epsilon\log\epsilon^2\,\frac{\partial\varphi}{\partial r}(\epsilon,\theta)\right)d\theta.$$

Since $\varphi$ is continuous, $\varphi(\epsilon,\theta)$ approaches the value of $\varphi$ at the origin as $\epsilon \to 0$. Thus the first term approaches $4\pi\langle\delta, \varphi\rangle$. In the second term $\partial\varphi/\partial r$ remains bounded while $\epsilon\log\epsilon^2 \to 0$ as $\epsilon \to 0$, so the limit is zero. Thus $\Delta\log(x^2 + y^2) = 4\pi\delta$, so $\log(x^2 + y^2)$ is not a weak solution of $\Delta u = 0$.

Actually the above computation is even more significant, for it will enable us to solve the equation $\Delta u = f$ for any function $f$. We will return to this later.
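The constant $4\pi$ can be checked numerically: pairing $\log r^2$ with the Laplacian of a radial test function should give $4\pi\varphi(0)$. A Python sketch (the Gaussian $\varphi = e^{-r^2}$ and the cutoff radius are my choices):

```python
import math

def lap_phi(r):
    # two-dimensional Laplacian of phi = exp(-r^2): (4 r^2 - 4) exp(-r^2)
    return (4.0 * r * r - 4.0) * math.exp(-r * r)

R, n = 12.0, 200000
h = R / n
val = 0.0
for i in range(n):
    r = (i + 0.5) * h
    # <log r^2, Delta phi> in polar coordinates; the angular integral gives 2*pi
    val += math.log(r * r) * lap_phi(r) * r
val *= 2.0 * math.pi * h
print(val, 4.0 * math.pi)   # Delta log r^2 = 4*pi*delta, and phi(0) = 1
```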
2.6 Problems

1. Extend to distributions the dilations $d_r\varphi(x) = \varphi(rx_1, rx_2, \ldots, rx_n)$. (Hint: $\int (d_r\psi(x))\varphi(x)\,dx = r^{-n}\int \psi(x)\,d_{r^{-1}}\varphi(x)\,dx$.) Show that $f(x) = |x|^t$
$$\int_{|x|\ge 1}\frac{\varphi(x)}{|x|}\,dx + \int_{|x|\le 1}\frac{\varphi(x) - \varphi(0)}{|x|}\,dx$$
(the integral may be taken over any interval of length $2\pi$ since both $f(x)$ and $e^{-ikx}$ are periodic) and satisfy Parseval's identity

$$\frac{1}{2\pi}\int |f(x)|^2\,dx = \sum_{k=-\infty}^{\infty}|a_k|^2.$$

Perhaps you are more familiar with these formulas in terms of sines and cosines:

$$f(x) \sim \frac{1}{2}b_0 + \sum_{k=1}^{\infty}\left(b_k\cos kx + c_k\sin kx\right).$$

It is easy to pass back and forth from one to the other using the relation $e^{ikx} = \cos kx + i\sin kx$. Although the second form is a bit more convenient for dealing with real-valued functions (the coefficients $b_k$ and $c_k$ must be real for $f(x)$ to be
real, whereas for the coefficients $a_k$ the condition is $a_{-k} = \overline{a_k}$), the exponential form is a lot simpler when it comes to dealing with derivatives.
Returning to the exponential form, we can obtain a Fourier series expansion for functions periodic of arbitrary period $T$ by changing variable. Indeed, if $f(x + T) = f(x)$ then setting $F(x) = f\left(\frac{Tx}{2\pi}\right)$ we have

$$F(x) \sim \sum_{k=-\infty}^{\infty} a_k e^{ikx}, \qquad a_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} F(x)e^{-ikx}\,dx.$$

Substituting back, to prepare for passing to the limit let's change notation. Let $\xi_k = 2\pi k/T$ and let $g(\xi_k) = (T/2\pi)a_k$. Then we may rewrite the equations as

$$g(\xi_k) = \frac{1}{2\pi}\int_{-T/2}^{T/2} f(x)e^{-ix\xi_k}\,dx,$$

$$f(x) \sim \sum_{k=-\infty}^{\infty} g(\xi_k)e^{ix\xi_k}\,\frac{2\pi}{T} = \sum_{k=-\infty}^{\infty} g(\xi_k)e^{ix\xi_k}(\xi_k - \xi_{k-1}).$$
Written in this way the sums look like approximations to integrals. Notice that the grid of points $\{\xi_k\}$ gets finer as $T \to \infty$, so formally passing to the limit we obtain

$$g(\xi) \sim \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)e^{-ix\xi}\,dx, \qquad f(x) \sim \int_{-\infty}^{\infty} g(\xi)e^{ix\xi}\,d\xi, \qquad \frac{1}{2\pi}\int_{-\infty}^{\infty}|f(x)|^2\,dx \sim \int_{-\infty}^{\infty}|g(\xi)|^2\,d\xi,$$

where I have written $\sim$ instead of $=$ since we have not justified any of the above steps. But it turns out that we can write $=$ if we impose some conditions on $f$. The above formulas are referred to as the Fourier transform, the Fourier inversion formula, and the Plancherel formula, although there is some dispute as to which name goes with which formula.
The analogy with Fourier series is clear. Here we have $f(x)$ represented as an integral of the "simple" functions $e^{ix\xi}$, $\xi \in \mathbb{R}$, instead of a sum of "simple" functions $e^{ikx}$, $k \in \mathbb{Z}$, and the weighting factor $g(\xi)$ is given by integrating $f(x)$ against the complex conjugate of the "simple" function $e^{ix\xi}$ just as the coefficient $a_k$ was given, only now the integral extends over the whole line instead of over a period. The third formula expresses the mean-square of $f(x)$ in terms of the mean-square of the weighting factors $g(\xi)$.

Notice the symmetry of the first two formulas (in contrast to the asymmetry of Fourier series). We could also regard $g(\xi)$ as the given function, the second formula defining a weighting factor $f(x)$ and the first giving $g(\xi)$ as an integral of "simple" functions $e^{-ix\xi}$; here $\xi$ is the variable and $x$ is a parameter indexing the "simple" functions.
Let us take this point of view and simultaneously interchange the names of
some of the variables.
and
Warning: There is no general agreement on whether to put the minus sign with $\mathcal{F}$ or $\mathcal{F}^{-1}$, or on what to do with the $1/2\pi$. Sometimes the $1/2\pi$ is put with $\mathcal{F}^{-1}$, sometimes it is split so that a factor of $1/\sqrt{2\pi}$ is put with both $\mathcal{F}$ and $\mathcal{F}^{-1}$, and sometimes the $2\pi$ crops up in the exponential ($\int e^{2\pi i x\xi}f(x)\,dx$), in which case it disappears altogether as a factor. Therefore, in using Fourier transform formulas from different sources, always check which definition is being used. This is a great nuisance, for me as well as for you.
Returning now to our Fourier inversion formula $\mathcal{F}\mathcal{F}^{-1}f = f$ (or $\mathcal{F}^{-1}\mathcal{F}f = f$; they are equivalent if you make a change of variable), we want to know for which functions $f$ it is valid. In the case of Fourier series there was a smoothness condition (differentiability) on $f$ for $f(x) = \sum a_k e^{ikx}$ to hold. In this case we will need more, for in order for the improper integral $\int_{-\infty}^{\infty} f(x)e^{ix\xi}\,dx$ to exist we must have $f(x)$ tending to zero sufficiently rapidly. Thus we need a combination of smoothness and decay at infinity. We now turn to a discussion of a class of functions that has these properties (actually it has much more than is needed for the Fourier inversion theorem).

note that any derivative is a polynomial times $e^{-|x|^2}$, and $e^{-|x|^2}$ decreases fast enough as $x \to \infty$ that it beats out the growth of any polynomial.)
We need the following elementary properties of the class $\mathcal{S}(\mathbb{R}^n)$:

$$c\int_0^{\infty}(1 + r)^{-n-1}r^{n-1}\,dr < \infty.$$

$$\hat f(\xi) = \int_{\mathbb{R}^n} f(x)e^{ix\cdot\xi}\,dx,$$

where $x, \xi \in \mathbb{R}^n$, $x\cdot\xi = x_1\xi_1 + \cdots + x_n\xi_n$, and
1. Translation: $\tau_y f(x) = f(x + y)$,

$$\mathcal{F}(\tau_y f)(\xi) = \int_{\mathbb{R}^n} f(x + y)e^{ix\cdot\xi}\,dx = \int_{\mathbb{R}^n} f(x)e^{i(x-y)\cdot\xi}\,dx = e^{-iy\cdot\xi}\,\mathcal{F}f(\xi).$$

2. Multiplication by $e^{ix\cdot y}$:

$$\mathcal{F}\left(e^{ix\cdot y}f\right)(\xi) = \int f(x)e^{ix\cdot(\xi + y)}\,dx = \mathcal{F}f(\xi + y) = \tau_y\mathcal{F}f(\xi).$$
3. Differentiation:

$$\mathcal{F}\left(\frac{\partial f}{\partial x_k}\right)(\xi) = -\int f(x)\frac{\partial}{\partial x_k}\left(e^{ix\cdot\xi}\right)dx = -i\xi_k\int f(x)e^{ix\cdot\xi}\,dx = -i\xi_k\,\mathcal{F}f(\xi),$$

where we have integrated by parts in the $x_k$ variable (the boundary terms are zero because $f$ is rapidly decreasing).
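With the book's convention $\hat f(\xi) = \int f(x)e^{ix\cdot\xi}\,dx$, the differentiation rule is easy to spot-check numerically in one dimension; a Python sketch (the Gaussian and the grid are my choices):

```python
import cmath
import math

def ft(g, xi, a=-20.0, b=20.0, n=80000):
    # F g(xi) = integral of g(x) exp(i x xi) dx, by the midpoint rule
    h = (b - a) / n
    s = 0j
    for i in range(n):
        x = a + (i + 0.5) * h
        s += g(x) * cmath.exp(1j * x * xi)
    return s * h

f = lambda x: math.exp(-x * x)               # f in the Schwartz class
df = lambda x: -2.0 * x * math.exp(-x * x)   # its derivative

xi = 1.3
lhs = ft(df, xi)            # F(f')(xi)
rhs = -1j * xi * ft(f, xi)  # -i xi F f(xi), as the table predicts
print(abs(lhs - rhs))       # essentially zero
```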
$$\mathcal{F}\left(p\left(\frac{\partial}{\partial x}\right)f\right) = p(-i\xi)\,\mathcal{F}f(\xi)$$

for any polynomial $p$, where $p(\partial/\partial x)$ means we replace $x_k$ by $\partial/\partial x_k$ in the polynomial.

$$\frac{\partial}{\partial\xi_k}\mathcal{F}f(\xi) = \int f(x)\frac{\partial}{\partial\xi_k}\left(e^{ix\cdot\xi}\right)dx = \int ix_k f(x)e^{ix\cdot\xi}\,dx = i\,\mathcal{F}(x_k f(x))(\xi)$$

(the interchange of the derivative and integral is valid because $f \in \mathcal{S}$). Iterating, we obtain

$$p\left(\frac{\partial}{\partial\xi}\right)\mathcal{F}f(\xi) = \mathcal{F}\left(p(ix)f(x)\right)(\xi).$$
$$\frac{\partial}{\partial x_k}(f * g)(x) = \frac{\partial}{\partial x_k}\int f(x - y)g(y)\,dy = \int \frac{\partial f}{\partial x_k}(x - y)g(y)\,dy = \frac{\partial f}{\partial x_k} * g(x),$$

but also

$$\frac{\partial}{\partial x_k}(f * g)(x) = \frac{\partial}{\partial x_k}\int f(y)g(x - y)\,dy = \int f(y)\frac{\partial g}{\partial x_k}(x - y)\,dy = f * \frac{\partial g}{\partial x_k}(x).$$
$$\mathcal{F}(f * g)(\xi) = \iint f(x - y)g(y)\,dy\,e^{ix\cdot\xi}\,dx.$$

For each fixed $y$ make the change of variable $x \to x + y$ in the inner integral and you have just the Fourier transform of $f$, times $e^{iy\cdot\xi}$; the remaining $y$-integral is then the Fourier transform of $g$. Thus

$$\mathcal{F}(f * g)(\xi) = \mathcal{F}f(\xi)\,\mathcal{F}g(\xi).$$

A similar formula holds for $\mathcal{F}^{-1}$, since

$$\mathcal{F}^{-1}f(x) = \frac{1}{(2\pi)^n}\,\mathcal{F}f(-x).$$
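The convolution theorem can be spot-checked numerically in one dimension. The sketch below uses the fact (obtainable by completing the square) that $e^{-x^2} * e^{-x^2} = \sqrt{\pi/2}\,e^{-x^2/2}$, and compares transforms at one frequency; all numerical choices are mine:

```python
import cmath
import math

def ft(g, xi, a=-20.0, b=20.0, n=80000):
    # F g(xi) = integral of g(x) exp(i x xi) dx (the book's convention)
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) * cmath.exp(1j * (a + (i + 0.5) * h) * xi)
               for i in range(n)) * h

f = lambda x: math.exp(-x * x)
conv = lambda x: math.sqrt(math.pi / 2.0) * math.exp(-x * x / 2.0)  # f * f

xi = 0.9
lhs = ft(conv, xi)            # F(f * f)(xi)
rhs = ft(f, xi) * ft(f, xi)   # F f(xi)^2
print(abs(lhs - rhs))         # essentially zero
```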
Many of the above identities are valid for more general classes of functions
than S. We will return to this question later.
We should pause now to consider the significance of this table. Some of the
operations are simple, and some are complicated. For instance, differentiation is
$$p(-i\xi)\hat u(\xi) = \hat f(\xi).$$

This is a simple equation to solve for $\hat u$; we divide:

$$\hat u(\xi) = \frac{1}{p(-i\xi)}\hat f(\xi) = \mathcal{F}(g * f)(\xi)$$

if

$$\mathcal{F}g(\xi) = \frac{1}{p(-i\xi)},$$

i.e., if

$$g = \mathcal{F}^{-1}\left(\frac{1}{p(-i\xi)}\right).$$

Thus $u = g * f$. Thus the problem is to compute

$$\mathcal{F}^{-1}\left(\frac{1}{p(-i\xi)}\right).$$

In many cases this can be done explicitly. Of course this is an oversimplification because in general it is difficult to make sense of $\mathcal{F}^{-1}(1/p(-i\xi))$, especially if $p(-i\xi)$ has zeroes for $\xi \in \mathbb{R}^n$. At any rate, this is the basic idea of Fourier analysis: by using the above table many diverse sorts of problems may be reduced to the problem of computing Fourier transforms.
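As a concrete instance of this recipe (my example, not the book's), take $p(x) = 1 - x^2$, so $p(d/dx)u = u - u''$ on $\mathbb{R}^1$ and $p(-i\xi) = 1 + \xi^2$, which has no real zeroes:

```latex
% Worked instance: solve u - u'' = f on the line.
\[
  \widehat{u}(\xi) = \frac{\widehat{f}(\xi)}{1+\xi^2},
  \qquad
  g = \mathcal{F}^{-1}\!\left(\frac{1}{1+\xi^2}\right) = \tfrac12\, e^{-|x|},
\]
\[
  \text{since } \mathcal{F}\bigl(e^{-|x|}\bigr)(\xi)
  = \int_{-\infty}^{\infty} e^{-|x|}e^{ix\xi}\,dx
  = \frac{2}{1+\xi^2},
\]
\[
  \text{so } u(x) = \tfrac12\int_{-\infty}^{\infty} e^{-|x-y|}f(y)\,dy .
\]
```

Here the Fourier transform of $e^{-|x|}$ is a standard direct computation, and dividing by $1+\xi^2$ causes no trouble precisely because $p(-i\xi)$ never vanishes.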
$$|\hat f(\xi)| \le \int |f(x)e^{ix\cdot\xi}|\,dx = \int |f(x)|\,dx,$$

and $|e^{ix\cdot\xi}| = 1$.

Now let $f \in \mathcal{S}$. We want to show that $\hat f$ is rapidly decreasing. That means that $p(\xi)\hat f(\xi)$ remains bounded for any polynomial $p$. But by the table $p(\xi)\hat f(\xi) = \mathcal{F}\left(p\left(i\frac{\partial}{\partial x}\right)f\right)$ and so
But
where

$$G_t(x - y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i(x-y)\xi}e^{-t|\xi|^2}\,d\xi,$$

which just means that $G_t$ is the inverse Fourier transform of the Gaussian function $e^{-t|\xi|^2}$.
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ix\xi}\hat f(\xi)e^{-t|\xi|^2}\,d\xi = \int_{-\infty}^{\infty} f(y)G_t(x - y)\,dy$$

for all $t > 0$, and we take the limit as $t \to 0$. The limit of the left side is $\frac{1}{2\pi}\int e^{-ix\xi}\hat f(\xi)\,d\xi$, which is just $\mathcal{F}^{-1}\mathcal{F}f(x)$. The limit on the right side is $f(x)$ because it is what is called a convolution with an approximate identity, $G_t * f(x)$. The properties of $G_t$ that make it an approximate identity are the following:

1. $\int_{-\infty}^{\infty} G_t(x)\,dx = 1$;
2. $G_t(x) \ge 0$;
3. $G_t$ becomes concentrated near $x = 0$ as $t \to 0$.

Indeed 2 and 3 are obvious, and 1 follows by a change of variable from the formula

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi},$$

which we will verify in the process of computing the Fourier transform of the Gaussian.
Thus the Fourier inversion formula is a direct consequence of the approximate identity theorem: if $G_t$ is an approximate identity then $G_t * f \to f$ as $t \to 0$ for any $f \in \mathcal{S}$. To see why this theorem is plausible we have to observe that the convolution $G_t * f$ can be interpreted as a weighted average of $f$ (by properties 1 and 2), with $G_t(x - y)$ as the weighting factor (here $x$ is fixed and $y$ is the variable of integration). But by property 3 this weighting factor is concentrated around $y = x$, and this becomes more pronounced as $t \to 0$, so that in the limit we get $f(x)$.
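A quick numerical illustration of the approximate-identity theorem in one dimension, with the heat kernel $G_t(x) = (4\pi t)^{-1/2}e^{-x^2/4t}$ that appears later in the chapter (the sample function $f$ and the grid are my choices):

```python
import math

def G(t, x):
    # the n = 1 Gaussian kernel (4 pi t)^{-1/2} exp(-x^2 / 4t)
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def f(x):
    return 1.0 / (1.0 + x * x)   # smooth, bounded sample function

def conv_at(x0, t, R=40.0, n=200000):
    # (G_t * f)(x0) by the midpoint rule
    h = 2.0 * R / n
    s = 0.0
    for i in range(n):
        y = -R + (i + 0.5) * h
        s += G(t, y) * f(x0 - y)
    return s * h

for t in (0.1, 0.01, 0.001):
    print(t, conv_at(0.5, t))   # approaches f(0.5) = 0.8 as t -> 0
```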
functions is of the form $ce^{-t|x|^2}$ for some constant $c$ and complex $t$ with $\operatorname{Re} t > 0$). Since the Fourier transform preserves these properties we expect the Fourier transform to have the same form. But we give a more direct computation. The method used is the calculus of residues. We have $f(x) = e^{-t|x|^2}$ and, completing the square,

$$-tx^2 + ix\xi = -t\left(x - \frac{i\xi}{2t}\right)^2 - \frac{\xi^2}{4t},$$

so

$$\hat f(\xi) = \int_{-\infty}^{\infty} e^{-tx^2 + ix\xi}\,dx = e^{-\xi^2/4t}\int_{-\infty}^{\infty} e^{-t(x - i\xi/2t)^2}\,dx.$$
Now it is tempting to make the change of variable $x \to x + i\xi/2t$ in this last integral to eliminate the dependence on $\xi$. But this is a complex change of variable and must be justified by a contour integral. Consider the contour $C_R$ in the following figure for $\xi < 0$ (if $\xi > 0$ the rectangle will lie below the $x$-axis). Since $e^{-tz^2}$ is analytic,

$$\int_{C_R} e^{-tz^2}\,dz = 0$$

by the Cauchy integral theorem.

FIGURE 3.1: the rectangular contour $C_R$, with one side the segment from $-R$ to $+R$ on the $x$-axis and the opposite side on the line $\operatorname{Im} z = -\xi/2t$.

The integral along the $x$-axis side is compared with the integral along the opposite side; as $R \to \infty$ the contributions of the two vertical sides vanish, leaving

$$\int_{-\infty}^{\infty} e^{-tx^2}\,dx - \int_{-\infty}^{\infty} e^{-t(x - i\xi/2t)^2}\,dx = 0.$$
Our computation will be complete once we evaluate $g(t) = \int_{-\infty}^{\infty} e^{-tx^2}\,dx$. There is a famous trick for doing this:

$$g(t)^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-t(x^2+y^2)}\,dx\,dy = \int_0^{2\pi}\!\int_0^{\infty} e^{-tr^2}r\,dr\,d\theta = 2\pi\int_0^{\infty} e^{-tr^2}r\,dr = \frac{\pi}{t},$$
and so
and
_1_ { e-tl~12e-ix.~ d~ = 1 e-lxI2/4t.
(27r)n iRn (47rt)n/2
An interesting special case is when t = 1/2; then if f(x) = e^{−|x|²/2} we have
Ff = (2π)^{n/2} f.
This formula is the starting point for many Fourier transform computations,
and it is useful for solving the heat equation and Schrödinger's equation.
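A quick numerical sanity check of the Gaussian transform formula (a sketch; the value of t is arbitrary, and the book's convention Ff(ξ) = ∫ f(x)e^{ixξ} dx is used):

```python
import numpy as np

# Check F(e^{-t x^2})(xi) = (pi/t)^(1/2) e^{-xi^2/4t} by direct quadrature.
t = 0.7
x = np.linspace(-30, 30, 60001)
dx = x[1] - x[0]
f = np.exp(-t * x**2)

xis = np.array([0.0, 0.5, 1.3, 2.0])
numeric = np.array([np.sum(f * np.exp(1j * x * xi)) * dx for xi in xis])
exact = np.sqrt(np.pi / t) * np.exp(-xis**2 / (4 * t))
max_err = np.max(np.abs(numeric - exact))
print(max_err)  # tiny
```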
3.6 Problems
1. Compute e^{−s|x|²} * e^{−t|x|²} using the Fourier inversion formula.
2. Show that F(d_r f) = r^{−n} d_{1/r} Ff for any f ∈ S, where d_r f(x) =
f(rx₁, ..., rx_n).
3. Show that f(x) is real valued if and only if f̂(−ξ) is the complex conjugate of f̂(ξ).
4. Compute F(x e^{−tx²}) in R¹.
5. Compute F(e^{−ax²+bx+c}) in R¹ for a > 0. (Hint: Complete the square
and use the ping-pong table.)
6. Let
f(x) = 1 if −a < x < a, and f(x) = 0 otherwise.

∫_{−∞}^{∞} e^{x²/2} dx = +∞
Fourier Transforms of Tempered Distributions
∫_{|x| ≤ A} |f(x)| dx ≤ cA^N as A → ∞.
Exercise: Verify that if f satisfies this estimate then ∫_{R^n} |f(x)φ(x)| dx < ∞
for all φ ∈ S(R^n), so ∫_{R^n} f(x)φ(x) dx is a tempered distribution.
Now why complicate things by introducing tempered distributions? The answer
is that it is possible to define the Fourier transform of a tempered distribution
as a tempered distribution, but it is impossible to define the Fourier
transform of all distributions in D′(R^n) as distributions.
Recall that we were able to define operations on distributions via adjoint
identities. If T and S were linear operations that took functions in D to functions
in D such that
∫ (Tψ)(x) φ(x) dx = ∫ ψ(x) (Sφ)(x) dx
for all ψ, φ ∈ D, then we defined (Tf, φ) = (f, Sφ) for any distribution f ∈ D′.
The same idea works for tempered distributions.
The adjoint identity for ψ, φ ∈ S is usually no more difficult than for ψ, φ ∈ D.
The only new twist is that the operations T and S must preserve the class S
instead of D. This is true for the operations we discussed previously with one
exception: multiplication by a C^∞ function m(x) is allowed only if m(x) does
not grow too fast at infinity; specifically, we require |m(x)| ≤ c|x|^N as |x| → ∞
for some c and N. This includes polynomials but excludes e^{|x|²}, for e^{|x|²}·e^{−|x|²/2}
is not in S while e^{−|x|²/2} ∈ S.
But in dealing with the Fourier transform it is a real boon to have the class
S: if φ ∈ S then Fφ ∈ S, while if φ ∈ D it may not be true that Fφ ∈ D
(surprisingly, it turns out that if both φ and Fφ are in D then φ = 0!). So all
that remains is to discover an adjoint identity involving F. Such an identity
should look like
∫ ψ̂(x) φ(x) dx = ∫ ψ(x) Sφ(x) dx,
and interchanging the order of integration shows that S = F works:
∫ ψ̂(x) φ(x) dx = ∫∫ ψ(y) e^{ix·y} φ(x) dy dx = ∫ ψ(y) φ̂(y) dy.
Now the adjoint identity allows us to define the Fourier transform of a tempered
distribution f ∈ S′(R^n): (f̂, φ) = (f, φ̂). In other words, f̂ is that
functional on S that assigns to φ the value (f, φ̂). If f is actually a function
in S then f̂ is the tempered distribution identified with the function f̂(x). In
other words, this definition is consistent with the previous definition, since we
are identifying functions f(x) with the distribution ∫ f(x)φ(x) dx, and the adjoint
identity says
∫ f̂(x)φ(x) dx = ∫ f(x)φ̂(x) dx.
4.2 Examples
1. Let f = δ. What is f̂? We must have (f̂, φ) = (δ, φ̂) = φ̂(0). But by
definition φ̂(0) = ∫ φ(x) dx, so f̂ ≡ 1. In this example f is not at all smooth,
so f̂ has no decay at infinity. But f has rapid decay at infinity, so f̂ is smooth.
2. Let f = δ′ (n = 1). Since f = (d/dx)δ and δ̂ = 1 we would like to use our
"ping-pong" table to say f̂(ξ) = −iξ · δ̂(ξ) = −iξ. This is possible; in fact,
the entire table is essentially valid for tempered distributions (for convolutions
and products one factor must be in S). Let us verify for instance that
F(∂f/∂x_k) = −ix_k f̂
for any f ∈ S′(R^n). By definition
(F(∂f/∂x_k), φ) = (∂f/∂x_k, φ̂).
By definition of the derivative of a distribution,
(∂f/∂x_k, φ̂) = −(f, ∂φ̂/∂x_k) = −(f, F(ix_k φ)) = −(f̂, ix_k φ).
Finally, use the definition of multiplication by −ix_k:
−(f̂, ix_k φ) = (−ix_k f̂, φ).
Altogether then,
(F(∂f/∂x_k), φ) = (−ix_k f̂, φ),
which is to say F(∂f/∂x_k) = −ix_k f̂. The other entries in the table are verified
by similar "definition chasing" arguments. Note that δ′ is somewhat "rougher"
than δ, so its Fourier transform grows at infinity.
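The table entry just verified can also be spot-checked numerically for a smooth f (a sketch; the Gaussian and the sample frequency ξ = 1.7 are arbitrary choices):

```python
import numpy as np

# Spot-check F(df/dx)(xi) = -i xi Ff(xi), with Ff(xi) = integral f(x) e^{i x xi} dx.
x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]
f = np.exp(-x**2)
fp = -2 * x * np.exp(-x**2)            # derivative of f, computed exactly

xi0 = 1.7
F = lambda g: np.sum(g * np.exp(1j * x * xi0)) * dx
err = abs(F(fp) - (-1j * xi0) * F(f))
print(err)  # quadrature-level error
```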
Having first obtained the answer, how do we justify it from the definition?
We have to show
which is to say
for all cp E S, both integrals being well defined. Now our starting point was the
fact that F(e^{−t|x|²}) = (π/t)^{n/2} e^{−|x|²/4t}, which via the adjoint identity gives
and
for fixed φ ∈ S. For Re z > 0 the integrals converge (note that 1/z also has
real part > 0) and can be differentiated with respect to z, so they define analytic
functions in Re z > 0.
We have seen that F and G are equal if z is real (and > 0). But an analytic
function is determined by its values for z real, so F(z) = G(z) in Re z > 0.
Finally F and G are continuous up to the boundary z = is for s ≠ 0.
4. Let f(x) = e^{−t|x|}, t > 0. This is a rapidly decreasing function but it is not in
S because it fails to be differentiable at x = 0. For n = 1 it is easy to compute
the Fourier transform directly:
f̂(ξ) = ∫_{−∞}^{∞} e^{−t|x|} e^{ixξ} dx
     = ∫_{−∞}^{0} e^{tx+ixξ} dx + ∫_{0}^{∞} e^{−tx+ixξ} dx
     = [e^{x(t+iξ)}/(t+iξ)]_{−∞}^{0} + [e^{x(−t+iξ)}/(−t+iξ)]_{0}^{∞}
     = 1/(t+iξ) − 1/(−t+iξ) = 2t/(t²+ξ²).
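The formula 2t/(t² + ξ²) is easy to confirm numerically (a sketch; t and the sample frequencies are arbitrary choices):

```python
import numpy as np

# Check F(e^{-t|x|})(xi) = 2t/(t^2 + xi^2) by direct quadrature.
t = 1.5
x = np.linspace(-60, 60, 240001)
dx = x[1] - x[0]
f = np.exp(-t * np.abs(x))

xis = np.array([0.0, 0.7, 2.0, 5.0])
numeric = np.array([np.sum(f * np.exp(1j * x * xi)) * dx for xi in xis])
exact = 2 * t / (t**2 + xis**2)
max_err = np.max(np.abs(numeric - exact))
print(max_err)
```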
e^{−t|x|} = ∫_{0}^{∞} g(s) e^{−s|x|²} ds    (g depends on t).
Then
F(e^{−t|x|}) = ∫_{0}^{∞} g(s) F(e^{−s|x|²}) ds = ∫_{0}^{∞} g(s) (π/s)^{n/2} e^{−|x|²/4s} ds.
We then try to evaluate this integral (even if we cannot evaluate it explicitly, it
may give more information than the original Fourier transform formula).
Now the identity we seek is independent of the dimension because all that
appears in it is |x|, which is just a positive number (call it λ for emphasis). So
we want
e^{−tλ} = ∫_{0}^{∞} g(s) e^{−sλ²} ds
for all λ > 0. We will obtain such an identity from the one-dimensional Fourier
transform we just computed. We begin by computing
∫_{0}^{∞} e^{−st²} e^{−sξ²} ds = [−e^{−s(t²+ξ²)}/(t²+ξ²)]_{0}^{∞} = 1/(t²+ξ²).
Since we know e^{−t|x|} = (1/2π) ∫_{−∞}^{∞} (2t/(t²+ξ²)) e^{−ixξ} dξ, we may substitute in for 1/(t²+ξ²) and get
e^{−t|x|} = (t/π) ∫_{−∞}^{∞} ∫_{0}^{∞} e^{−st²} e^{−sξ²} e^{−ixξ} ds dξ,
so, interchanging the order of integration and evaluating the Gaussian ξ-integral,
e^{−t|x|} = (t/√π) ∫_{0}^{∞} s^{−1/2} e^{−st²} e^{−x²/4s} ds,
which is essentially what we wanted. (We could make the change of variable
S -+ 1/4s to obtain the exact form discussed above, but this is unnecessary.)
Now we let x vary in R^n and substitute |x| for λ to obtain
so
The last step is to evaluate this integral. We first try to remove the dependence
on ξ. Note that e^{−st²} e^{−s|ξ|²} = e^{−s(t²+|ξ|²)}. This suggests the change of variable
s → s/(t² + |ξ|²). Doing this we get
F(e^{−t|x|})(ξ) = t/(t² + |ξ|²)^{(n+1)/2} ∫_{0}^{∞} (πs)^{−1/2} (4πs)^{n/2} e^{−s} ds.
Note this agrees with our previous computation where n = 1. Once again we see
the decay at infinity of e^{−t|x|} mirrored in the smoothness of its Fourier transform,
while the lack of smoothness of e^{−t|x|} at x = 0 results in the polynomial decay
at infinity of the Fourier transform.
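The remaining s-integral has a closed form in terms of the Gamma function (this evaluation is my addition, not the text's): ∫₀^∞ (πs)^{−1/2}(4πs)^{n/2} e^{−s} ds = 2^n π^{(n−1)/2} Γ((n+1)/2), which equals 2 when n = 1, in agreement with the one-dimensional computation. A numerical sketch:

```python
import math

# Midpoint-rule check of
#   integral_0^inf (pi s)^(-1/2) (4 pi s)^(n/2) e^{-s} ds
#     = 2^n * pi^((n-1)/2) * Gamma((n+1)/2).
def s_integral(n, steps=200000, smax=60.0):
    h = smax / steps
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h                  # midpoint avoids s = 0
        total += (math.pi * s) ** -0.5 * (4 * math.pi * s) ** (n / 2) * math.exp(-s) * h
    return total

def closed_form(n):
    return 2**n * math.pi ** ((n - 1) / 2) * math.gamma((n + 1) / 2)

errs = [abs(s_integral(n) - closed_form(n)) for n in (1, 2, 3)]
print(errs)
```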
Actually we will need to know F^{−1}(e^{−t|x|}), which is (1/(2π)^n) F(e^{−t|x|})(−x),
so
5. Let f(x) = |x|^α. For α > −n (we may even take α complex if Re α > −n),
f is locally integrable (it is never integrable) and does not increase
(we have made the change of variable s → s|x|^{−2}). Of course for this integral
to converge, the singularity at s = 0 must be better than s^{−1}, so we require
α < 0. Thus we have imposed the conditions −n < α < 0 to obtain
F(|x|^α)(ξ) = (π^{n/2}/Γ(−α/2)) ∫_{0}^{∞} s^{−α/2 − n/2 − 1} e^{−|ξ|²/4s} ds.
Now to evaluate the integral make the change of variable s → |ξ|²/4s, ds → −(|ξ|²/4s²) ds, so
φ * ψ(x) = ∫ φ(x − y) ψ(y) dy
if φ, ψ ∈ S defines a function in S, and F(φ * ψ) = φ̂ · ψ̂. Since products are
not defined for all distributions we cannot expect to define convolutions of two
tempered distributions. However, if one factor is in S there is no problem. Fix
ψ ∈ S. Then convolution with ψ is an operation that preserves S, so to define
ψ * f for f ∈ S′ we need only find an adjoint identity. Now
true is that the distribution defined by this function is tempered and agrees with
the previous definition. This is in fact the case. What you have to show is that
if we denote g(x) = (f, τ_{−x}ψ̃) then ∫ g(x)φ(x) dx = (f, ψ̃ * φ). Formally
we can derive this by substituting:
∫ g(x)φ(x) dx = ∫ (f, τ_{−x}ψ̃) φ(x) dx
             = (f, ∫ (τ_{−x}ψ̃) φ(x) dx).
What this shows is that convolution is a smoothing process. If you start with
any tempered distribution, no matter how rough, and take the convolution with
a test function, you get a smooth function.
Let us look at some simple examples. If f = δ then
ψ * δ = ψ,
and if f = (∂/∂x_k)δ then
ψ * (∂/∂x_k)δ = ∂ψ/∂x_k.
then it is just the convolution of f with (1/2π) ∫_{−∞}^{∞} e^{−ix·ξ} dξ, which is the inverse
Fourier transform of the constant function 1. But we recognize from δ̂ = 1 that
(1/2π) ∫_{−∞}^{∞} e^{−ix·ξ} dξ = δ(x) (in the distribution sense, of course), so that
F^{−1}F f = f * δ = f.
4.4 Problems
1. Let
f(x) = c ∫_{0}^{∞} t^{a−1} e^{−t} e^{−t|x|²} dt
9. Let
(f, φ) = lim_{ε→0} ∫_{|x|>ε} (φ(x)/x) dx   in R¹.
Show that Ff(ξ) = c sgn ξ because Ff is odd and homogeneous of degree
zero. Compute the constant c by using (d/dξ) sgn ξ = 2δ. (Convolution
with f is called the "Hilbert transform".)
10. Compute the Fourier transform of sgn x · e^{−t|x|} on R¹. Take the limit as
t → 0 to compute the Fourier transform of sgn x. Compare the result with
problem 9.
11. Use the Fourier inversion formula to "evaluate" ∫_{−∞}^{∞} (sin x / x) dx (this integral
converges as an improper integral
lim_{N→∞} ∫_{−N}^{N} (sin x / x) dx).
12. Use the Plancherel formula to evaluate
∫_{−∞}^{∞} (sin x / x)² dx   and   ∫_{−∞}^{∞} 1/(1 + x²)² dx.
13. Compute the Fourier transform of 1/(1 + x²)² in R¹.
14. Compute the Fourier transform of
f(x) = { …  if x > 0
       { 0  if x ≤ 0.
Can you do this even when k is not an integer (but k > 0)?
Solving Partial Differential Equations
in R² or
∂²/∂x₁² + ∂²/∂x₂² + ∂²/∂x₃²
in R³. First we ask if there are solutions to the equation Δu = f for a given
f. If there are, they are not unique, for we can always add a harmonic function
(a solution of Δu = 0) without changing the right-hand side.
Now suppose we could solve the equation ΔP = δ. Then
Δ(P * f) = (ΔP) * f = δ * f = f,
so P * f is a solution of Δu = f. Of course we have reasoned only formally,
but if P turns out to be a tempered distribution and f ∈ S, then every step is
justified.
Such solutions P are called fundamental solutions or potentials and have been
known for centuries. We have already found a potential when n = 2. Remember
we found Δ log(x₁² + x₂²) = 4πδ, so that log(x₁² + x₂²)/4π is a potential (called
the logarithmic potential).
When n = 3 we can solve ΔP = δ by taking the Fourier transform of both
sides. We get
F(ΔP) = −|ξ|² P̂(ξ) = Fδ = 1.
So P̂(ξ) = −|ξ|^{−2} and we have computed (example 5 of section 4.2)
The Laplace equation
(this is called the Newtonian potential). We could also verify directly using
Stokes' theorem that ΔP = δ in this case.
Now there is one point that should be bothering you about the above computation. We said that the solution was not unique, and yet we came up with just
one solution. There are two explanations for this. First, we did cheat a little.
From the equation −|ξ|² P̂ = 1 we cannot conclude that P̂ = −1/|ξ|², because
the multiplication is not of two functions but of a function times a distribution.
Now it is true that −|ξ|² · (−|ξ|^{−2}) = 1, regarding −|ξ|^{−2} as a distribution. But
if we write P̂ = −|ξ|^{−2} + g, then −|ξ|² P̂ = 1 is equivalent to −|ξ|² g = 0, and
this equation has nonzero solutions. For instance, g = δ is a solution since
w = g- h on B
where h = P * F restricted to B. Now it can be shown that h is continuous
so that the problem for w is the classical Dirichlet problem: find a harmonic
function on D with prescribed continuous values on B. This problem always
has a unique solution, and for some domains D it is given by explicit integrals.
Once you have the unique solution w to the Dirichlet problem, u = v + w is
the unique solution to the original problem.
Next we will use Fourier transforms to study the Dirichlet problem when
D is a half-plane (D is not bounded, of course, but it is the easiest domain
to study). It will be convenient to change notation here. We let t be a real
variable that will always be ≥ 0, and we let x = (x₁, ..., x_n) be a variable in
R^n (the cases of physical interest are n = 1, 2). We consider functions u(x, t)
for x ∈ R^n, t ≥ 0 which are harmonic
and which take prescribed values on the boundary t = 0: u(x, 0) = f(x). For
now let us take f ∈ S(R^n).
The solution is not unique, for we may always add ct, that is, a harmonic
function that vanishes on the boundary. To get a unique solution we must add
a growth condition at infinity, say that u is bounded.
The method we use is to take Fourier transforms in the x-variables only (this
is sometimes called the partial Fourier transform). That is, for each fixed t ≥ 0,
we regard u(x, t) as a function of x. Since it is bounded it defines a tempered
distribution, and we may take its Fourier transform F_x u(ξ, t). The equation
∂²/∂t² u(x, t) + Δ_x u(x, t) = 0
becomes
∂²/∂t² F_x u(ξ, t) − |ξ|² F_x u(ξ, t) = 0.
For each fixed ξ this ordinary differential equation
has solutions c₁ e^{t|ξ|} + c₂ e^{−t|ξ|}, where c₁ and c₂ are constants. Since these
constants can change with ξ we should write
F_x u(ξ, t) = c₁(ξ) e^{t|ξ|} + c₂(ξ) e^{−t|ξ|}
for the general solution. Now we can simplify this formula by considering the
fact that we want u(x, t) to be bounded. The term c₁(ξ) e^{t|ξ|} is going to grow
with t as t → ∞, unless c₁(ξ) = 0. So we are left with F_x u(ξ, t) = c₂(ξ) e^{−t|ξ|}.
From the boundary condition F_x u(ξ, 0) = f̂(ξ) we obtain c₂(ξ) = f̂(ξ), so
F_x u(ξ, t) = f̂(ξ) e^{−t|ξ|}, hence
so
u(x, t) = (Γ((n+1)/2)/π^{(n+1)/2}) ∫_{R^n} t f(y) / (t² + |x − y|²)^{(n+1)/2} dy.
This is referred to as the Poisson integral formula for the half-space. The
integral is convergent as long as f is a bounded function, and it gives a bounded
harmonic function with boundary values f. The derivation we gave involved
some questionable steps; however, the validity of the result can be checked and
the uniqueness proved by other methods.
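For n = 1 the kernel in this integral is P_t(x) = t/(π(t² + x²)), the inverse Fourier transform of e^{−t|ξ|} computed earlier. A finite-difference sketch (sample point and step size are arbitrary choices) confirms that this kernel is harmonic in (x, t):

```python
import numpy as np

# Check that the Laplacian of P(x,t) = t/(pi (t^2 + x^2)) vanishes at a sample point.
P = lambda x, t: t / (np.pi * (t**2 + x**2))
x0, t0, h = 0.7, 0.9, 1e-4
lap = (P(x0 + h, t0) + P(x0 - h, t0) + P(x0, t0 + h) + P(x0, t0 - h)
       - 4 * P(x0, t0)) / h**2
print(abs(lap))  # near 0
```

P is the real part of i/z with z = x + it, which is why it is harmonic.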
When n = 1 this takes the form
u(x, t) = (1/π) ∫_{−∞}^{∞} t f(y) / (t² + (x − y)²) dy,
and this formula can be derived from the Poisson integral formula for the disk by conformal
mapping.
and the initial condition becomes F_x u(ξ, 0) = f̂(ξ). Solving the differential
equation we have
F_x u(ξ, t) = c(ξ) e^{−kt|ξ|²}
and from the initial condition c(ξ) = f̂(ξ), so that
One interesting aspect of this solution is the way it behaves with respect to
time. This is easiest to see on the Fourier transform side:
which correspond to a circular piece of wire (insulated, so heat does not transfer
to the ambient space around the wire). In the problems you will encounter other
boundary conditions.
The word "periodic" is the key to the solution. We imagine the initial
temperature f(x), which is defined on [0, 1] (to be consistent with the periodic
boundary conditions we must have f(0) = f(1)), extended to the whole
line as a periodic function of x, f(x + 1) = f(x) for all x, and similarly for
u(x, t): u(x + 1, t) = u(x, t) for all x (note that the periodic condition on u implies
the boundary condition just by setting x = 0).
Now if we substitute our periodic function f in the solution for the whole
line, we obtain
u(x, t) = ∫_{0}^{1} ( Σ_{j=−∞}^{∞} (4πkt)^{−1/2} e^{−(x+j−y)²/4kt} ) f(y) dy,
where the interchange of integration and summation is justified
because the series converges rapidly. Observe that this formula does indeed
produce a periodic function u(x, t), since the substitution x → x + 1 can be
erased by the change of summation variable j → j − 1.
Perhaps you are more familiar with a solution to the same problem using
Fourier series. This, in fact, was one of the first problems that led Fourier to the
discovery of Fourier series. Since the problem has a unique solution, the two
solutions must be equal, but they are not the same. The Fourier series solution
looks like
u(x, t) = Σ_{n=−∞}^{∞} a_n e^{−4π²n²kt} e^{2πinx}.
Both solutions involve infinite series, but the one we derived has the advantage
that the terms are all positive (if f is positive).
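Since the problem has a unique solution, the periodized-Gaussian kernel and the Fourier-series kernel must agree pointwise (the underlying identity is Poisson summation). A numerical sketch, with illustrative parameter values:

```python
import numpy as np

# Compare the two periodic heat kernels: the periodized whole-line Gaussian
# and the Fourier-series kernel. They should agree to rounding error.
k, t = 1.0, 0.02
x = np.linspace(0.0, 1.0, 201)

gauss_side = np.zeros_like(x)
for j in range(-20, 21):
    gauss_side += np.exp(-(x + j)**2 / (4 * k * t)) / np.sqrt(4 * np.pi * k * t)

series_side = np.ones_like(x)
for n in range(1, 40):
    series_side += 2 * np.exp(-4 * np.pi**2 * n**2 * k * t) * np.cos(2 * np.pi * n * x)

max_diff = np.max(np.abs(gauss_side - series_side))
print(max_diff)
```

For small t the Gaussian sum converges fastest; for large t the Fourier series does.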
Before inverting the Fourier transform let us make some observations about
the solution. First, it is clear that time is reversible: except for a minus sign
in the second term, there is no difference between t and −t. So the past is
determined by the present as well as the future.
Another thing we can see, although it requires work, is the conservation of
energy. The energy of the solution u(x, t) at time t is defined as
The first term is kinetic energy and the second is potential energy. Conservation
of energy says that E(t) is independent of t.
To see this we express the energy in terms of F_x u(ξ, t). Since
|∂/∂t F_x u(ξ, t)|² + k²|ξ|² |F_x u(ξ, t)|² = k²|ξ|² |f̂(ξ)|² + |ĝ(ξ)|²
(the cross terms cancel and the sin² + cos² terms add to one), we get
E(t) = ∫ (1/(2(2π)^n)) ( k²|ξ|² |f̂(ξ)|² + |ĝ(ξ)|² ) dξ,
independent of t.
Now to invert the Fourier transform. When n = 1 this is easy since cos kt|ξ| =
½(e^{iktξ} + e^{−iktξ}), so
F^{−1}( f̂(ξ) cos kt|ξ| ) = ½ [ f(x + kt) + f(x − kt) ].
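This half-sum formula can be checked by finite differences (a sketch; f, k, and the sample point are arbitrary choices, and the initial velocity g is zero here):

```python
import numpy as np

# u(x,t) = (f(x+kt) + f(x-kt))/2 should satisfy the wave equation u_tt = k^2 u_xx.
k = 2.0
f = lambda s: np.exp(-s**2)
u = lambda x, t: 0.5 * (f(x + k * t) + f(x - k * t))

x0, t0, h = 0.3, 0.4, 1e-4
u_tt = (u(x0, t0 + h) - 2 * u(x0, t0) + u(x0, t0 - h)) / h**2
u_xx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / h**2
residual = abs(u_tt - k**2 * u_xx)
print(residual)  # near 0
```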
Similarly
where dσ(x) is the element of surface integration on the unit sphere. In terms
of spherical coordinates (x, y, z) = (cos θ₁, sin θ₁ cos θ₂, sin θ₁ sin θ₂) for the
sphere, with 0 ≤ θ₁ ≤ π and 0 ≤ θ₂ ≤ 2π, this is just
Now to compute σ̂(ξ) we need to evaluate this integral when φ(x) = e^{ix·ξ}. To
make the computation easier we use the observation (from problem 3.8) that σ̂ is
radial, so it suffices to compute σ̂(ξ₁, 0, 0), for then σ̂(ξ₁, ξ₂, ξ₃) = σ̂(|ξ|, 0, 0).
But
(σ, e^{ix₁ξ₁}) = ∫_{0}^{2π} ∫_{0}^{π} e^{iξ₁ cos θ₁} sin θ₁ dθ₁ dθ₂ = 4π sin ξ₁ / ξ₁
The wave equation
and so σ̂(ξ) = 4π sin|ξ| / |ξ|. Similarly, if σ_r denotes the surface integral over
the sphere |x| = r of radius r, then σ̂_r(ξ) = 4πr sin r|ξ| / |ξ|, and so
F( (1/(4πk²t)) σ_kt )(ξ) = sin kt|ξ| / (k|ξ|).
Thus
F^{−1}( ĝ(ξ) sin kt|ξ| / (k|ξ|) ) = (1/(4πk²t)) σ_kt * g(x).
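The closed form 4π sin|ξ|/|ξ| can be sanity-checked by quadrature in the spherical coordinates used above (a sketch; the sample frequencies are arbitrary):

```python
import numpy as np

# sigma-hat(xi) for the unit-sphere surface measure, via the theta_1 integral
# (the theta_2 integral contributes the factor 2*pi).
def sigma_hat(xi1):
    th = np.linspace(0.0, np.pi, 20001)
    d = th[1] - th[0]
    vals = np.exp(1j * xi1 * np.cos(th)) * np.sin(th)
    trap = np.sum(0.5 * (vals[1:] + vals[:-1])) * d   # trapezoid rule
    return 2 * np.pi * trap

errs = [abs(sigma_hat(s) - 4 * np.pi * np.sin(s) / s) for s in (0.5, 1.0, 3.0)]
print(errs)
```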
Furthermore, if we differentiate this identity with respect to t we find
u(x, t) = ∂/∂t ( (1/(4πk²t)) σ_kt * f(x) ) + (1/(4πk²t)) σ_kt * g(x).
The convolution can be written directly as
σ_kt * f(x) = ∫_{|y|=kt} f(x + y) dσ_kt(y),
or it can be expressed in terms of integration over the unit sphere:
u(x, t) = ∂/∂t ( (t/4π) ∫_{0}^{2π} ∫_{0}^{π} f(x₁ + kt cos θ₁, x₂ + kt sin θ₁ cos θ₂, x₃ + kt sin θ₁ sin θ₂) sin θ₁ dθ₁ dθ₂ )
        + (t/4π) ∫_{0}^{2π} ∫_{0}^{π} g(x₁ + kt cos θ₁, x₂ + kt sin θ₁ cos θ₂, x₃ + kt sin θ₁ sin θ₂) sin θ₁ dθ₁ dθ₂.
There is another way to express this solution. The pair of variables (cos θ₁,
sin θ₁ cos θ₂) describes the unit disk y₁² + y₂² ≤ 1 in a two-to-one fashion (two
different values of θ₂ give the same value to cos θ₂) as (θ₁, θ₂) vary over
0 ≤ θ₁ ≤ π, 0 ≤ θ₂ ≤ 2π. Thus if we make the substitution (y₁, y₂) =
(cos θ₁, sin θ₁ cos θ₂) then dy₁ dy₂ = sin²θ₁ |sin θ₂| dθ₁ dθ₂ and sin θ₁ |sin θ₂| =
√(1 − |y|²), so
Note that these are improper integrals because (1 − |y|²)^{−1/2} becomes infinite
as |y| → 1, but they are absolutely convergent. Still another way to write the
integrals is to introduce polar coordinates:
The convergence of the integral is due to the fact that ∫_{0}^{1} r dr/√(1 − r²) is finite.
There are several astounding qualitative facts that we can deduce from these
elegant quantitative formulas. The first is that k is the maximum speed of
propagation of signals. Suppose we make a "noise" located near a point y at
time t = 0. Can this noise be "heard" at a point x at a later time t? Certainly
not if the distance |x − y| from x to y exceeds kt, for the contribution to u(x, t)
from f(y) and g(y) is zero until kt ≥ |x − y|. This is true in all dimensions,
and it is a direct consequence of the fact that u(x, t) is expressed as a sum of
convolutions of f and g with distributions that vanish outside the ball of radius
kt about the origin. (Compare this with the heat equation, where the "speed
of smell" is infinite!) Also, of course, there is nothing special about starting at
t = 0. The finite speed of sound and light are well-known physical phenomena
(light is governed by a system of equations, called Maxwell's equations, but each
component of the system satisfies the wave equation). But something special
happens when n = 3 (it also happens when n is odd, n ≥ 3). After the noise
is heard, it moves away and leaves no reverberation (physical reverberations
of sound are due to reflections off walls, ground, and objects). This is called
Huyghens' principle and is due to the fact that the distributions we convolve f and
g with also vanish inside the ball of radius kt. Another way of saying this
is that signals propagate at exactly speed k. In particular, if f and g vanish
outside a ball of radius R, then after a time (R + |x|)/k there will be total
silence at the point x. This is clearly not the case when n = 1, 2 (when n = 1
it is true for the initial position f, but not the initial velocity g). This can be
thought of as a ripple effect: after the noise reaches point x, smaller ripples
continue to be heard. A physical model of this phenomenon is the ripples you
see on the surface of a pond, but this is in fact a rather unfair example, since
the differential equations that govern the vibrations on the surface of water are
nonlinear and therefore quite different from the linear wave equation we have
been studying. In particular, the rippling is much more pronounced than it is
for the 2-dimensional wave equation.
There is a weak form of Huyghens' principle that does hold in all dimensions:
the singularities of the signal propagate at exactly speed k. This shows up in the
convolution form of the solution when n = 2 in the smoothness of (1 − |y|²)^{−1/2}
everywhere except on the surface of the sphere |y| = 1.
Another interesting property is the focusing of singularities, which shows up
most strikingly when n = 3. Since the solution involves averaging over a sphere,
we can have relatively mild singularities in the initial data over the whole sphere
produce a sharp singularity at the center when they all arrive simultaneously.
Assume the initial data is radial: f(x) and g(x) depend only on Ixl (we write
f(x) = f(|x|), g(x) = g(|x|)). Then u(0, t) = (∂/∂t)(t f(kt)) + t g(kt), since
(1/(4πk²t)) ∫_{|y|=kt} f(y) dσ = (1/(4πk²t)) · 4π(kt)² f(kt) = t f(kt),
etc.
It is the appearance of the derivative that can make u(0, t) much worse than
f or g. For instance, take g(x) = 0 and
Schrödinger's equation and quantum mechanics

The wave function changes with time. If u(x, t) is the wave function at time t
then
∂/∂t u(x, t) = ik Δ_x u(x, t).
This is the free Schrödinger equation. There are additional terms if there is a
potential or other physical interaction present. The constant k is related to the
mass of the particle and Planck's constant.
The free Schrödinger equation is easily solved with initial condition u(x, 0) =
φ(x). We have (∂/∂t) F_x u(ξ, t) = −ik|ξ|² F_x u(ξ, t) and F_x u(ξ, 0) = φ̂(ξ), so
that
F_x u(ξ, t) = e^{−ikt|ξ|²} φ̂(ξ).
Referring to example (3) of 4.2, we find that
∫_{R³} |u(x, t)|² dx = (1/(2π)³) ∫_{R³} |F_x u(ξ, t)|² dξ
is independent of t, so once the wave function is normalized at t = 0 it remains
normalized.
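The conservation of the normalization is transparent numerically: evolving on the Fourier side multiplies each mode by a unit-modulus phase. A sketch (one space dimension for simplicity; all parameters are illustrative):

```python
import numpy as np

# Free Schrodinger evolution via FFT: each Fourier mode picks up a phase of
# modulus one, so the discrete L^2 norm is exactly conserved.
k, t = 0.5, 1.3
N, L = 1024, 40.0
x = (np.arange(N) - N // 2) * (L / N)
phi = np.exp(-x**2) * np.exp(2j * x)     # illustrative initial wave function

xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
u_t = np.fft.ifft(np.exp(-1j * k * t * xi**2) * np.fft.fft(phi))

diff = abs(np.sum(np.abs(phi)**2) - np.sum(np.abs(u_t)**2))
print(diff)  # rounding-level
```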
The interpretation of the wave function is somewhat controversial, but the
standard description is as follows: there is an imperfect coupling between phys-
ical measurement and the wave function, so that a position measurement of a
particle with wave function cp will not always produce the same answer. Instead
we have only a probabilistic prediction: the probability that the position vector
will measure in a set A ⊆ R³ is
∫_A |φ(x)|² dx / ∫_{R³} |φ(x)|² dx.
We have a similar statement for measurements of momentum. If we choose
units appropriately, the probability that the momentum vector will measure in a
set B ⊆ R³ is
∫_B |φ̂(ξ)|² dξ / ∫_{R³} |φ̂(ξ)|² dξ.
Note that ∫_{R³} |φ̂(ξ)|² dξ = (2π)³ ∫_{R³} |φ(x)|² dx, so the denominator is
always finite.
Now what happens as time changes? The position probabilities change in a
very complicated way, but |F_x u(ξ, t)| = |φ̂(ξ)|, so the momentum probabilities
do not change at all.
5.5 Problems
1. For the Laplace and the heat equation in the half-space prove via the
Plancherel formula that
|u(x, t)| ≤ [ ∫_{R^n} G_t(y) dy ] sup_{y ∈ R^n} |f(y)|.
3. Solve
[ ∂²/∂t² + Δ_x ]² u(x, t) = 0
and the factors commute. Deduce from this that an analytic function is
harmonic.
8. Let f be a real-valued function in S(R¹) and define g by ĝ(ξ) =
−i(sgn ξ) f̂(ξ). Show that g is real-valued, and that if u(x, t) and v(x, t) are the
harmonic functions in the half-plane with boundary values f and g then
they are conjugate harmonic functions: u + iv is analytic in z = x + it.
(Hint: Verify the Cauchy-Riemann equations.) Find an expression for v
in terms of f. (Hint: Evaluate F^{−1}(−i sgn ξ e^{−t|ξ|}) directly.)
9. Show that a solution to the heat equation (or wave equation) that is inde-
pendent of time (stationary) is a harmonic function of the space variables.
10. Solve the initial value problem for the Klein-Gordon equation
∂²u/∂t² = Δ_x u − m²u,   m > 0,
u(x, 0) = f(x),   ∂u/∂t (x, 0) = g(x)
for F_x u(ξ, t) (do not attempt to invert the Fourier transform).
11. Show that the energy
(Hint: Extend all functions by odd reflection about the boundary points
x = 0 and x = 1, and periodicity with period 2.)
13. Do the same as problem 12 for Neumann boundary conditions
∂/∂x u(0, t) = 0,   ∂/∂x u(1, t) = 0.
where
G_s(y) = (1/(4πks)^{n/2}) e^{−|y|²/4ks}
is the solution kernel for the homogeneous heat equation. Use this to
solve the fully inhomogeneous problem
∂/∂t u(x, t) = k Δ_x u(x, t) + F(x, t)
u(x, 0) = f(x).
15. Show that the inhomogeneous wave equation on R^n with homogeneous
initial data
∂²/∂t² u(x, t) = k² Δ_x u(x, t) + F(x, t)
u(x, 0) = 0
∂/∂t u(x, 0) = 0
where
Show this is valid for negative t as well. Use this to solve the inhomoge-
neous wave equation with inhomogeneous initial data.
16. Interpret the solution in problem 15 in terms of finite propagation speed
and Huyghens' principle (n = 3) for the influence of the inhomogeneous
term F(x, t).
17. Show that if the initial temperature is a radial function then the temperature
at all later times is radial.
18. Maxwell's equations in a vacuum can be written
(1/c) ∂/∂t E(x, t) = curl H(x, t)
(1/c) ∂/∂t H(x, t) = −curl E(x, t)
where the electric and magnetic fields E and H are vector-valued functions on R³. Show that each component of these fields satisfies the wave
equation with speed of propagation c.
19. Let u(x, t) be a solution of the free Schrödinger equation with initial wave
function φ satisfying ∫_{R³} |φ(x)| dx < ∞. Show that |u(x, t)| ≤ c t^{−3/2} for
some constant c. What does this tell you about the probability of finding
a free particle in a bounded region of space as time goes by?
The Structure of Distributions
Definition 6.1.1
T ≡ 0 on Ω means (T, φ) = 0 for all test functions φ with supp φ ⊆ Ω.
Then supp T is the complement of the set of points x such that T vanishes in a
neighborhood of x.
where the sum set A + B is defined to be the set of all sums a + b with a E A
and bE B.
∫_Ω |f(x)| dx = 0
implies
∫_Ω f(x) φ(x) dx = 0
for every test function φ with support in Ω (in fact this is true for any function
φ). This should seem plausible, and it follows from the inequalities
(not alphabetical order, alas, even in French). For the distributions the containments are reversed:
E′ ⊆ S′ ⊆ D′.
In particular, since distributions of compact support are tempered, the Fourier
transform theory applies to them.
We will have a lot more to say about this in section 7.2.
T = Σ_{n=1}^{∞} δ^{(n)}(x − n);
that means
(T, φ) = Σ_{n=1}^{∞} φ^{(n)}(n).
For each fixed φ, all but a finite number of terms will be zero. Of course
we have not yet written T as a sum of derivatives of functions, but recall that
δ = H′ where H is the Heaviside function,
so
T = Σ_{n=1}^{∞} H^{(n+1)}(x − n)
does the job. In fact, we can go a step further and make the functions continuous,
if we note that H = χ′ where
χ(x) = x if x > 0, and χ(x) = 0 if x ≤ 0.
Thus we can also write
T = Σ_{n=1}^{∞} χ^{(n+2)}(x − n).
This is typical of what is true in general.
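The mechanism here, that the continuous ramp χ has δ as its second distributional derivative, amounts to ⟨χ″, φ⟩ = ⟨χ, φ″⟩ = φ(0) for every test function φ, and is easy to check numerically (a sketch; the test function is an arbitrary choice):

```python
import numpy as np

# <chi, phi''> should equal phi(0) for the ramp chi(x) = x if x > 0, else 0,
# since integrating by parts twice moves both derivatives onto chi.
x = np.linspace(-10.0, 10.0, 40001)
dx = x[1] - x[0]
chi = np.where(x > 0, x, 0.0)

phi = np.exp(-x**2) * (1 + x)            # a rapidly decaying test function
phi_dd = np.gradient(np.gradient(phi, dx), dx)

pairing = np.sum(chi * phi_dd) * dx
err = abs(pairing - 1.0)                 # phi(0) = 1
print(err)
```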
The results we are talking about can be thought of as "structure theorems"
since they describe the structure of the distribution classes E′, S′, and D′. On the
other hand, they are not as explicit as they might seem, since it is rather difficult
to understand exactly what the 27th derivative of a complicated function is.
Before stating the structure theorems in R^n, we need to introduce some notation. This multi-index notation will also be very useful when we discuss
general partial differential operators. We use lowercase Greek letters α, β, ...
to denote multi-indices, which are just n-tuples α = (α₁, ..., α_n) of nonnegative integers. Then (∂/∂x)^α stands for (∂/∂x₁)^{α₁} ··· (∂/∂x_n)^{α_n}. Similarly
x^α = x₁^{α₁} ··· x_n^{α_n}. We write |α| = α₁ + ··· + α_n and call this the order of the
multi-index. This is consistent with the usual definition of the order of a partial
derivative. We also write α! = α₁! ··· α_n!.
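The multi-index conventions transcribe directly into code (a trivial sketch; the helper names are my own):

```python
from math import factorial

def order(alpha):                 # |alpha| = alpha_1 + ... + alpha_n
    return sum(alpha)

def mi_factorial(alpha):          # alpha! = alpha_1! ... alpha_n!
    out = 1
    for a in alpha:
        out *= factorial(a)
    return out

def monomial(x, alpha):           # x^alpha = x_1^a_1 ... x_n^a_n
    out = 1.0
    for xi, a in zip(x, alpha):
        out *= xi ** a
    return out

alpha = (2, 0, 1)
print(order(alpha), mi_factorial(alpha), monomial((2.0, 5.0, 3.0), alpha))  # 3 2 12.0
```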
such that
Structure Theorem for D′: Let T be a distribution on R^n. Then there exist
continuous functions f_α such that
T = Σ_α (∂/∂x)^α f_α,
where for every bounded open set Ω, all but a finite number of the distributions
(∂/∂x)^α f_α vanish identically on Ω.
Note that the functions f_α cannot in general be combined: writing a sum such as
(∂/∂x₁) f₁ + (∂/∂x₂) f₂
with a single function would require a term of the form (∂²/∂x₁∂x₂) f.
The highest order of derivative (the maximum of |α| = α₁ + α₂ + ··· + α_n) in
a representation T = Σ (∂/∂x)^α f_α is called the order of the distribution. The
trouble with this definition is that we really want to take the minimum value of
this order over all such representations (remember the lack of uniqueness). Thus
if we know T = Σ (∂/∂x)^α f_α with |α| ≤ m then we can only assert that the
order of T is ≤ m, since there might be a better representation. (Also, there
is no general agreement on whether or not to insist that the functions fa in the
representation be continuous; frequently one obtains such representations where
the functions fa are only locally bounded, or locally integrable, and for many
purposes that is sufficient.) However, despite all these caveats, the imprecise
notion of the order of a distribution is a widely used concept. One conclusion
that is beyond question is that the first two structure theorems assert that every
distribution in E′ or S′ has finite order. We have seen above an example of a
distribution of infinite order.
One subtle point in the first structure theorem is that the functions f_α themselves need not have compact support. For example, the δ-function is not the
derivative (of any order) of a function of compact support, even though δ has
support at the origin. The reason for this is that (T′, 1) = 0 if T has compact
support, where 1 denotes the constant function. Indeed (T′, 1) = (T′, ψ) by
definition, where ψ ∈ D is one on a neighborhood of the support of T′. But
(T′, ψ) = −(T, ψ′) and ψ′ vanishes on a neighborhood of the support of T, so
(T′, ψ) = 0. Notice that this argument does not work if T′ has compact support
but T does not.
The proofs of the structure theorems are not elementary. There are some
proofs that are short and tricky and entirely nonconstructive. There are con-
structive proofs, but they are harder. When n = 1, the idea behind the proof
is that you integrate the distribution many times until you obtain a continuous
function. Then that function differentiated as many times brings you back to
the original distribution. The problem is that it is tricky to define the integral
of a distribution (see problem 9). Once you have the right definition, it requires
the correct notion of continuity of a distribution to prove that for T in E′ or S′
On the other hand, we might also conjecture that certain infinite sums might be
allowed if the coefficients a_α tend to zero rapidly enough. It turns out that the
first guess is right, and the second is wrong. In fact, the following is true: given
any numbers c_α, there exists a test function φ in D such that (∂/∂x)^α φ(0) =
c_α. In other words, there is no restriction on the Taylor expansion of a C^∞ function.
This means that the Taylor expansion does not have to converge (even if it does
converge, it does not have to converge to the function, unless φ is analytic).
Suppose we tried to define T = Σ a_α (∂/∂x)^α δ where infinitely many a_α
are nonzero. Then (T, φ) = Σ (−1)^{|α|} a_α c_α, and by choosing c_α appropriately
we can make this infinite. So such infinite sums of derivatives of δ are not
distributions. Incidentally, it is not hard to write down a formula for φ given
c_α. It is just
Distributions with point support

FIGURE 6.1
and λ_α → ∞ rapidly as α → ∞. If supp ψ is |x| ≤ 1, then supp ψ(λ_α x) is
|x| ≤ λ_α^{−1}, which is shrinking rapidly. For any fixed x ≠ 0, only a finite number
of terms in the sum defining φ will be nonzero, and for x = 0 only the first
term is nonzero. If we are allowed to differentiate the series term by term, then
we get (∂/∂x)^α φ(0) = c_α because all the derivatives of ψ(λ_α x) vanish at the
origin. It is in justifying the term-by-term differentiation that all the hard work
lies, and in fact the rate at which λ_α must grow will be determined by the size
of the coefficients c_α.
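In one dimension, and for finitely many coefficients, this construction can be checked numerically. The sketch below is only an illustration (it is not from the text): the bump ψ, the coefficients c_k, and the rates λ_k are arbitrary choices, and the derivatives at 0 are recovered by finite differences.

```python
import math
import numpy as np

# Smooth step s(t): 0 for t <= 0, 1 for t >= 1.
def h(t):
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, np.exp(-1.0 / np.maximum(t, 1e-300)), 0.0)

def step(t):
    return h(t) / (h(t) + h(1.0 - t))

# Bump psi: identically 1 on |x| <= 1/2, supported in |x| <= 1.
def psi(x):
    return step(2.0 * (1.0 - np.abs(x)))

# Prescribed Taylor coefficients c_k and growing rates lam_k (illustrative).
c = [2.0, -3.0, 5.0, 7.0]
lam = [1.0, 2.0, 4.0, 8.0]

def phi(x):
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for k in range(4):
        total = total + (c[k] / math.factorial(k)) * x**k * psi(lam[k] * x)
    return total

# Central finite differences for phi^(k)(0); every sample point lies where
# each psi(lam_k x) = 1, so phi is locally just the polynomial.
d = 1e-3
approx = [
    float(phi(0.0)),
    float((phi(d) - phi(-d)) / (2 * d)),
    float((phi(d) - 2 * phi(0.0) + phi(-d)) / d**2),
    float((phi(2 * d) - 2 * phi(d) + 2 * phi(-d) - phi(-2 * d)) / (2 * d**3)),
]
print(approx)  # close to [2.0, -3.0, 5.0, 7.0]
```

Since every point used by the difference quotients lies in the region where each ψ(λ_k x) = 1, the computed values agree with c_0, …, c_3 up to the stencil error.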
Just because there are no infinite sums Σ a_α (∂/∂x)^α δ does not in itself
prove that every distribution supported at the origin is a finite sum of this sort.
It is conceivable that there could be other, wilder distributions with one-point
support. In fact, there are none. Although I cannot give the entire proof, I can
indicate some of the ideas involved. If T has support {0}, then (T, φ) = 0 for
every test function φ vanishing in a neighborhood of the origin. However, it
then follows that (T, φ) = 0 for every test function φ that vanishes to order N
at zero, (∂/∂x)^α φ(0) = 0 for all |α| ≤ N (here N depends on the distribution
T). This is a consequence of the structure theorem, T = Σ (∂/∂x)^α f_α (finite
sum), and an approximation argument: if φ vanishes to order N we approximate
φ by φ(x)(1 − ψ(λx)) as λ → ∞. Then (T, φ(x)(1 − ψ(λx))) = 0 because
φ(x)(1 − ψ(λx)) vanishes in a neighborhood of zero and

lim_{λ→∞} (T, φ(x)(1 − ψ(λx))) = (T, φ)
(the convergence as λ → ∞ follows from the fact that φ vanishes to order N at
zero and we never take more than N derivatives; the value of N is the upper
bound for |α| in the representation T = Σ (∂/∂x)^α f_α). Thus (T, φ) is the
limit of terms that are always zero, so (T, φ) = 0. This is the tricky part of the
argument, because for general test functions, say if φ(0) ≠ 0, the approximation
of φ(x) by φ(x)(1 − ψ(λx)) is no good; it gives the wrong value at x = 0.
Once we know that (T, φ) = 0 for all φ vanishing to order N at the origin, it
is just a matter of linear algebra to complete the proof. Let us write, temporarily,
D_N for this space of test functions. It is then easy to construct a finite set of
test functions φ_α (one for each α with |α| ≤ N) so that every test function φ
can be written uniquely φ = ψ + Σ c_α φ_α with ψ ∈ D_N. Indeed we just take
φ_α = (1/α!) x^α ψ(x), and then c_α = (∂/∂x)^α φ(0) does the trick. In terms of
linear algebra, D_N has finite codimension in D. Now (T, φ) = Σ c_α (T, φ_α) for
any φ ∈ D because (T, ψ) = 0. But the distribution Σ b_α (∂/∂x)^α δ satisfies

(Σ b_α (∂/∂x)^α δ, φ) = Σ (−1)^{|α|} c_α b_α

since (∂/∂x)^α φ_β(0) = 0 if α ≠ β and 1 if α = β. Thus we need only choose
b_α = (−1)^{|α|} (T, φ_α) in order to have (T, φ) equal to (Σ b_α (∂/∂x)^α δ, φ).
FIGURE 6.2
test function ψ that is identically one on a neighborhood of supp(T). Then
(T, ψ) is a positive number; call it M. We then claim that |(T, φ)| ≤ cM where
c = sup_x |φ(x)|, for any test function φ.
To prove this claim we simply note that the test functions cψ ± φ are nonnegative
on a neighborhood of supp(T), and so (T, cψ ± φ) ≥ 0 (here we have used
the observation that if T is a positive distribution of compact support we only
need nonnegativity of the test function φ on a neighborhood of the support of
T to conclude (T, φ) ≥ 0). But then both (T, φ) ≤ (T, cψ) and −(T, φ) ≤ (T, cψ), so

|(T, φ)| ≤ (T, cψ) = cM

as claimed.
What does this inequality tell us? It says that the size of (T, φ) is controlled
by the size of φ (the maximum value of |φ(x)|). That effectively rules out
anything that involves derivatives of φ, because φ can wiggle a lot (have large
derivatives) while remaining small in absolute value.
Now suppose T = T₁ − T₂ where T₁ and T₂ are positive distributions of
compact support. Then if M₁ and M₂ are the associated constants we have

|(T, φ)| ≤ (M₁ + M₂) sup_x |φ(x)|

so T satisfies the same kind of inequality. Since we know δ′ does not satisfy
such an inequality, regardless of the constant, we have shown that it is not the
difference of positive distributions of compact support. But it is easy to localize
the argument to show that δ′ ≠ T₁ − T₂ even if T₁ and T₂ do not have compact
support: just choose ψ as before and observe that δ′ = T₁ − T₂ would imply
δ′ = ψδ′ = ψT₁ − ψT₂, and ψT₁ and ψT₂ would be positive distributions of
compact support after all.
For readers who are familiar with measure theory, I can explain the story
more completely. The positive distributions all come from positive measures. If
μ is a positive measure that is locally finite (μ(A) < ∞ for every bounded set
A), then (T, φ) = ∫ φ dμ defines a positive distribution. The converse is also
true: every positive distribution comes from a locally finite positive measure.
This follows from a powerful result known as the Riesz representation theorem
and uses the inequality we derived above. If you are not familiar with measure
theory, you should think of a positive measure as the most general kind of
probability distribution, but without the assumption that the total probability adds
up to one.
FIGURE 6.3 (the first two stages of the Cantor set construction)
Now define a probability measure by saying each of the two intervals in stage
1 has probability 1/2, each of the four intervals in stage 2 has probability 1/4, and
in general each of the 2ⁿ intervals in the nth stage has probability 2⁻ⁿ. The
resulting measure is called the Cantor measure. It assigns total measure one to
the Cantor set and zero to the complement. The usual length measure of the
Cantor set is zero.
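One concrete way to compute with the Cantor measure: it is the distribution (law) of the random sum Σ 2ε_k 3^{−k} with independent fair bits ε_k, which gives each stage-n interval mass 2^{−n}. The sketch below (an illustration, not from the text) estimates ∫ x dμ(x) by Monte Carlo; the exact value is 1/2 by the symmetry of the measure about x = 1/2.

```python
import random

random.seed(0)

def sample_cantor(depth=40):
    # A Cantor-distributed point: base-3 digits are 0 or 2, equally likely.
    return sum(2 * random.randint(0, 1) * 3.0**(-k) for k in range(1, depth + 1))

n = 100000
samples = [sample_cantor() for _ in range(n)]
mean = sum(samples) / n
print(mean)  # the integral of x against the Cantor measure is 1/2
```

Any continuous function can be integrated against the Cantor measure the same way, even though the measure has no density with respect to length.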
φ then lim_{j→∞} (T, φ_j) = (T, φ), and (3) would say that |(T, φ)| is less than a
constant times some quantity that measures the size of φ. Statements like (1)
and (2) should be familiar from the definition of continuous function, although
we have not yet explained what "φ₁ is close to φ₂" or "lim_{j→∞} φ_j = φ"
should mean. Statement (3) has no analog for continuity of ordinary functions,
and depends on the linearity of (T, φ) for its success. Readers who have seen
some functional analysis will recognize this kind of estimate, which goes under
the name "boundedness."
To make (1) precise we now list the conditions we would like to have in
order to say that φ₁ is close to φ₂. We certainly want the values φ₁(x) close
to φ₂(x), and this should hold uniformly in x. To express this succinctly it is
convenient to introduce the "sup-norm" notation: ‖φ‖_∞ = sup_x |φ(x)|.
for a finite number of derivatives, because higher order derivatives tend to grow
rapidly as the order of the derivative goes to infinity.
There is one more condition we will require in order to say φ₁ is close to
φ₂, and this condition is not something you might think of at first: we want the
supports of φ₁ and φ₂ to be fairly close. Actually it will be enough to demand
that the supports be contained in a fixed bounded set, say |x| ≤ R for some
given R. Without such a condition, we would have to say that a test function
φ is close to 0 even if it has a huge support (of course it would have to be very
small far out), and this would be very awkward for the kind of locally finite
infinite sums we considered in section 6.2.
for all |α| ≤ m and supp φ₁, supp φ₂ ⊆ {|x| ≤ R}. Of course "close" is a
qualitative word, but the three parameters δ, m, R give it a quantitative meaning.
The statement that T is continuous at φ₁ is the statement: for every ε > 0 and
R sufficiently large there exist parameters δ, m (depending on φ₁, R, and ε)
such that if φ₁ is close to φ₂ with parameters δ, m, R then |(T, φ₁) − (T, φ₂)| ≤ ε.
for every α (α = 0 takes care of the case of no derivatives), and the condition on
the supports is that there exists a bounded set, say |x| ≤ R, that contains supp
φ and all supp φ_j. Again this is not an obvious condition, but it is necessary if
we are to have everything localized when we take limits. This is the definition
of the limit process for D. Readers who are familiar with the concepts will
recognize that this makes D into a topological space, but not a metric space:
there is no single notion of distance in any of our test function spaces D, S,
or E.
With this definition of limits of test functions, it is easy to see that if T
is continuous then (T, φ) = lim_{j→∞} (T, φ_j) whenever φ = lim_{j→∞} φ_j. The
argument goes as follows: we use the continuity of T at φ. Given ε > 0 we
take R as above and then find δ, m so that |(T, φ) − (T, ψ)| ≤ ε whenever φ
and ψ are close with parameters δ, m, R. Then every φ_j has support in |x| ≤ R,
and for j large enough

‖(∂/∂x)^α (φ − φ_j)‖_∞ ≤ δ for all |α| ≤ m.

Thus, for j large enough, all φ_j are close to φ with parameters δ, m, R and so
|(T, φ) − (T, φ_j)| ≤ ε. But this is what lim_{j→∞} (T, φ_j) = (T, φ) means. A similar argument from
the contrapositive shows the converse: if T satisfies lim_{j→∞} (T, φ_j) = (T, φ)
whenever lim_{j→∞} φ_j = φ (in the above sense) then T is continuous.
This form of continuity is extremely useful, since it allows us to take limiting
processes inside the action of the distribution. For example, if φ_s(t) is a test
function of the two variables (s, t), then

(∂/∂s)(T, φ_s) = (T, (∂/∂s)φ_s).

The reason for this is first that the linearity allows us to bring the difference
quotient inside the distribution,

(1/h)((T, φ_{s+h}) − (T, φ_s)) = (T, (1/h)(φ_{s+h} − φ_s)),

and then the continuity handles the limit as h → 0. Similarly

(T, ∫ φ_s ds) = ∫ (T, φ_s) ds

because the Riemann sums approximating ∫_{−∞}^{∞} φ_s ds converge in the sense of
D, and the Riemann sums are linear. We will use this observation in section
7.2 when we discuss the Fourier transform of distributions of compact support.
The third form of continuity, the boundedness of T, involves estimates that
generalize the inequality above:

|(T, φ)| ≤ M max_{|α|≤m} ‖(∂/∂x)^α φ‖_∞

for all φ with support in |x| ≤ R. It is easy to see that this boundedness implies
continuity in the first two forms. For example, if φ₁ and φ₂ both have support
in |x| ≤ R then

|(T, φ₁) − (T, φ₂)| = |(T, φ₁ − φ₂)|

and the estimate applied to φ₁ − φ₂ makes this small whenever φ₁ is close to
φ₂ with parameters δ, m, R.
(T, φ) = Σ_{|α|≤m} (−1)^{|α|} ∫ f_α(x) (∂/∂x)^α φ(x) dx
hence

|(T, φ)| ≤ Σ_{|α|≤m} (∫_{|x|≤R} |f_α(x)| dx) ‖(∂/∂x)^α φ‖_∞
for some m and M, for all test functions φ, regardless of support. For tempered
distributions, the boundedness condition is again just a single estimate, but this
time of the form

|(T, φ)| ≤ M sup_{|α|≤m, x} (1 + |x|)^N |(∂/∂x)^α φ(x)|

for some m, M, and N, and all test functions (it turns out to be the same
whether you require φ ∈ D or just φ ∈ S, for reasons that will be discussed
in the next section). This kind of condition is perhaps not so surprising if you
recall that the finiteness of
if supp φ ⊆ {|x| ≤ R}. For the same R, then, we have

|(T′, φ)| = |(T, φ′)| ≤ M max_{|α|≤m+1} ‖(∂/∂x)^α φ‖_∞

if supp φ ⊆ {|x| ≤ R}, which is of the required form with m increased to
m + 1. The same goes for all the other operations on distributions. Thus, by
postponing the discussion of boundedness until now, I have saved both of us a
lot of trouble.
Here is an example of how the boundedness is implicit in the proof of existence.
Remember the distribution

(T, φ) = ∫_{|x|≥1} φ(x)/x dx + ∫_{−1}^{1} (φ(x) − φ(0))/x dx

(the principal value of 1/x); both integrals exist and are finite. To get the boundedness we simply use a
standard inequality to estimate the integrals:

|(T, φ)| ≤ ∫_{|x|≥1} |φ(x)|/|x| dx + ∫_{−1}^{1} |φ′(y(x))| dx.
If φ is supported in |x| ≤ R then

∫_{|x|≥1} |φ(x)|/|x| dx = (∫_{−R}^{−1} + ∫_{1}^{R}) |φ(x)|/|x| dx ≤ 2 log R ‖φ‖_∞

and

∫_{−1}^{1} |φ′(y(x))| dx ≤ 2 ‖φ′‖_∞.

Thus altogether

|(T, φ)| ≤ 2 log R ‖φ‖_∞ + 2 ‖φ′‖_∞
which is an estimate of the required form with m = 1. Perhaps you are surprised
that the estimate involves a derivative, while the original formula for T did not
appear to involve any derivatives. But, of course, that derivative does show up
in the existence proof.
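The estimate can be sanity-checked numerically. The sketch below is an illustration (not from the text): it takes the principal value of 1/x, a bump φ supported in [0, 2] (so R = 2, and φ(0) = 0 makes the pairing an ordinary convergent integral), and verifies |(T, φ)| ≤ 2 log R ‖φ‖∞ + 2‖φ′‖∞.

```python
import numpy as np

# A bump supported in (0, 2); since phi(0) = 0, <p.v.(1/x), phi> reduces
# to the ordinary integral of phi(x)/x.
def phi(vals_x):
    u = vals_x - 1.0
    out = np.zeros_like(vals_x)
    inside = np.abs(u) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - u[inside]**2))
    return out

x = np.linspace(1e-6, 2.0, 400001)
f = phi(x)
integrand = f / x
dx = x[1] - x[0]
pairing = ((integrand[:-1] + integrand[1:]) / 2).sum() * dx   # trapezoid rule

R = 2.0                                    # supp phi lies inside |x| <= R
sup_phi = f.max()
sup_dphi = np.abs(np.gradient(f, x)).max()
bound = 2 * np.log(R) * sup_phi + 2 * sup_dphi
print(pairing <= bound)  # the boundedness estimate holds for this phi
```

One test function proves nothing, of course, but it is a quick way to catch a sign or constant error when reconstructing estimates like this.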
Finally, you should beware of confusing the continuity of a single distribution
with the notion of limits of a sequence of distributions. In the first case we
hold the distribution fixed and vary the test functions, while in the second case
we hold the test function fixed and vary the distributions, because we defined
lim_{j→∞} T_j = T to mean lim_{j→∞} (T_j, φ) = (T, φ) for every test function. This
is actually a much simpler definition, but of course each distribution T_j and T
must be continuous in the sense we have discussed.
FIGURE 6.4
It is identically one on a large set and then falls off to zero. Actually we
want to think about a family of cut-off functions, with the property that the set
on which the function is one gets larger and larger. To be specific, let ψ be one
fixed test function, say with support in |x| ≤ 2, such that ψ(x) = 1 for |x| ≤ 1.
Then look at the family ψ(εx) as ε → 0. We have ψ(εx) = 1 if |εx| ≤ 1,
or equivalently, if |x| ≤ ε⁻¹, and ε⁻¹ → ∞ as ε → 0. Since ψ(εx) ∈ D, we
can form ψ(εx)T for any distribution T, and it should come as no surprise that
ψ(εx)T → T as ε → 0 as distributions. Indeed if φ is any test function, then
φ has support in |x| ≤ R for some R, and then φ(x)ψ(εx) = φ(x) if ε < 1/R
(for then ψ(εx) = 1 on the support of φ). Thus

(ψ(εx)T, φ) = (T, ψ(εx)φ) = (T, φ)

for ε < 1/R, hence trivially lim_{ε→0} (ψ(εx)T, φ) = (T, φ), which is what we
mean by ψ(εx)T → T as ε → 0. But it is equally trivial to show that ψ(εx)T
is a distribution of compact support: in fact its support is contained in |x| ≤ 2/ε
because ψ(εx) = 0 if |x| ≥ 2/ε. To summarize: by the method of multiplication
by cut-off functions, every distribution may be approximated by distributions of
compact support. To show that every distribution can be approximated by test
functions, it remains to show that every distribution of compact support can be
approximated by test functions.
Now if T is a distribution of compact support and φ is a test function, then
φ * T is also a test function. In fact we have already seen that supp φ * T ⊆ supp
φ + supp T (in the sense of sum sets), so φ * T has compact support, and φ * T
is the function (φ * T)(x) = (T, τ₋ₓφ̃) (recall that φ̃(x) = φ(−x)), which is C∞
because all derivatives can be put on φ. Let φ_ε(x) be an approximate identity (so
φ_ε(x) = ε⁻ⁿφ(ε⁻¹x) where ∫ φ(x) dx = 1). Then we claim lim_{ε→0} φ_ε * T = T.
Indeed, if ψ is any test function then (φ_ε * T, ψ) = (T, φ̃_ε * ψ) and φ̃_ε is also
an approximate identity, so φ̃_ε * ψ → ψ, hence (T, φ̃_ε * ψ) → (T, ψ).
Actually, this argument requires that φ̃_ε * ψ → ψ in D, as described in the last
section. That means we need to show that all the supports of φ̃_ε * ψ remain in a
fixed compact set, and that all derivatives (∂/∂x)^α (φ̃_ε * ψ) converge uniformly
to (∂/∂x)^α ψ. But both properties are easy to see, since supp(φ̃_ε * ψ) ⊆
supp φ̃_ε + supp ψ and supp φ̃_ε shrinks as ε → 0, while (∂/∂x)^α (φ̃_ε * ψ) =
φ̃_ε * (∂/∂x)^α ψ and the approximate identity theorem shows that this converges
uniformly to (∂/∂x)^α ψ.
We can combine the two approximation processes into a single, two-step
procedure. Starting with any distribution T, form φ_ε * (ψ(εx)T). This is a test
function, and it converges to T as ε → 0 (to get a sequence, set ε = 1/k, k =
1, 2, …). It is also true that we can do it in the reverse order: ψ(εx)(φ_ε * T)
is a test function (not the same one as before), and these also approximate T as
ε → 0. The proof is similar, but slightly different.
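Here is a numerical sketch of the mollification step of the procedure (an illustration, not from the text): since f(x) = e^{−x²} already has rapid decay, the cutoff ψ(εx) is omitted and only the uniform convergence φ_ε * f → f is exhibited.

```python
import numpy as np

# Normalized-by-hand bump used to build the approximate identity phi_eps.
def bump(t):
    out = np.zeros_like(t)
    inside = np.abs(t) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - t[inside]**2))
    return out

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
f = np.exp(-x**2)

errors = []
for eps in (0.5, 0.25, 0.1):
    s = np.arange(-eps, eps + dx, dx)
    kernel = bump(s / eps) / eps
    kernel /= kernel.sum() * dx            # make the total integral exactly 1
    smoothed = np.convolve(f, kernel, mode="same") * dx
    errors.append(np.abs(smoothed - f).max())
print(errors)  # sup-norm errors shrinking as eps decreases
```

For a symmetric mollifier the error is of order ε², which is visible in how quickly the printed values fall.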
The same procedure shows that functions in D can approximate functions in
S or E, and distributions in S′ and E′, where in each case the approximation
is in the sense appropriate to the object being approximated. For example, if
f ∈ S then we can regard f as a distribution, and so φ_ε * (ψ(εx)f(x)) → f as
a distribution; but we are asserting more, that the approximation is in terms of
S convergence, i.e.,

x^β (∂/∂x)^α (φ_ε * (ψ(εx)f(x)) − f(x)) → 0

uniformly as ε → 0.
Now we can explain more clearly why we are allowed to think of tempered
distributions as a subclass of distributions. Suppose T is a distribution (so (T, φ)
is defined initially only for φ ∈ D), and T satisfies an estimate of the form

|(T, φ)| ≤ M sup_{|α|≤m, x} (1 + |x|)^N |(∂/∂x)^α φ(x)|

for all φ ∈ D. Then I claim there is a natural way to extend T so that (T, φ)
is defined for all φ ∈ S. Namely, we approximate φ ∈ S by test functions
φ_k ∈ D, in the above sense. Because of the estimate that T satisfies, it follows
that (T, φ_k) converges as k → ∞, and we define (T, φ) to be this limit. It is
not hard to show that the limit is unique; it depends only on φ and not on the
particular approximating sequence. And, of course, T satisfies the same estimate
as before, but now for all φ ∈ S. So the extended T is indeed continuous on
S, hence is a tempered distribution. The extension is unique, so we are justified
in saying the original distribution is a tempered distribution, and the existence
of an estimate of the given form is the condition that characterizes tempered
distributions.
For any φ ∈ D(0, 1) this is a finite sum, since φ must be zero in a neighborhood
of 0, and it is easy to see that T is linear and continuous on D(0, 1). But clearly
T is not the restriction of any distribution in D′(ℝ¹), because the structure
theorem would imply that such a distribution has locally finite order, which T
manifestly does not. Another example is

(T, φ) = ∫₀¹ φ(x)/x dx

for φ ∈ D(0, 1). Since φ vanishes near zero the integral is finite (i.e., 1/x is
locally integrable on (0, 1)). We have seen how to extend T to a distribution
on ℝ¹, as for example

The trouble is that the finite sum Σ_j c_j φ^{(j)}(0) is always well defined, while
if we take φ with φ(0) ≠ 0 the integral will diverge.
On the other hand, it is always possible to find an extension whose support is
a small neighborhood of Ω, provided of course that T is a restriction. Indeed,
if T = RT₁ then also T = R(ψT₁) where ψ is any C∞ function equal to 1
on Ω. We then have only to construct such a ψ that vanishes outside a small
neighborhood of Ω.
Turning to the second question, we write E′(Ω) for the distributions of compact
support whose support is contained in Ω. These are global distributions,
but it is also natural to consider them as local distributions. The point is that if
T₁ ≠ T₂ are distinct distributions in E′(Ω), then also RT₁ ≠ RT₂, so that we do
not lose any information by identifying distributions in E′(Ω) ⊆ D′(ℝⁿ) with
their restrictions in D′(Ω). Thus E′(Ω) ⊆ D′(Ω). Actually E′(Ω) is a smaller
space than the space of restrictions, since the support of a distribution in E′(Ω)
must stay away from the boundary of Ω. Thus, for example,

(T, φ) = ∫_{S²} φ(x) dσ(x)

which we have seen in connection with the wave equation. A simpler example is

(T, φ) = (∂/∂x₁) φ(1, 0, …, 0).
FIGURE 6.5
x = r cos θ        x = r cos θ sin φ
y = r sin θ        y = r sin θ sin φ
                   z = r cos φ
We obtain points on the sphere by setting r = 1, which leaves us one (θ) and
two (θ, φ) coordinates for S¹ and S². In other words, (cos θ, sin θ) gives a point
on S¹ for any choice of θ, and (cos θ sin φ, sin θ sin φ, cos φ) gives a point on
the sphere for any choice of θ, φ (you can think of θ as the longitude and π/2 − φ
as the latitude). Of course we must impose some restrictions on the coordinates
if we want a unique representation of each point, and here is where the locality
of the coordinate system comes in.
For the circle, we could restrict θ to satisfy 0 ≤ θ < 2π to have one θ value
for each point. But including the endpoint 0 raises certain technical problems, so
we prefer to consider a pair of local coordinate systems to cover the circle, say
0 < θ < 3π/2 and −π < θ < π/2, each given by an open interval in the parameter
space and each wrapping ¾ of the way around the circle, with overlaps:

FIGURE 6.6
Once we have D(S^{n−1}) defined, we can define D′(S^{n−1}) as the continuous
linear functionals on D(S^{n−1}). Continuity requires that we define convergence
in D(S^{n−1}). From the extrinsic viewpoint, lim_{j→∞} φ_j = φ in D(S^{n−1}) just
means lim_{j→∞} Eφ_j = Eφ in D(ℝⁿ). Intrinsically, it means φ_j converges
uniformly to φ and all derivatives with respect to the coordinate variables also
converge (a minor technicality arises in that the uniformity of convergence is
only local, away from the boundary of the local coordinate system). Because
the sphere is compact we do not have to impose the requirement of a common
compact support.
Now if T ∈ D′(S^{n−1}), we can also consider it as a distribution on ℝⁿ, say
T̃, via the identity

(T̃, φ) = (T, φ|_{S^{n−1}})

where φ|_{S^{n−1}} denotes the restriction of φ ∈ D(ℝⁿ) to the sphere, and of course
φ|_{S^{n−1}} ∈ D(S^{n−1}). It comes as no surprise that T̃ has support contained in
S^{n−1}. But not every distribution supported on the sphere comes from a distribution
in D′(S^{n−1}) in this way. For example (T, φ) = (∂/∂x₁) φ(1, 0, …, 0)
involves a derivative in a direction perpendicular to the sphere.
FIGURE 6.7
Theorem 6.7.1
Every distribution T ∈ D′(ℝⁿ) supported on the sphere S^{n−1} can be written

T = Σ_{k=0}^{N} (∂/∂r)^k T̃_k
(Here ∂/∂r means the radial derivative,

∂/∂r = Σ_{j=1}^{n} (x_j/|x|) ∂/∂x_j,

but |x| = 1 on the sphere, and of course ∂/∂r is undefined at the origin.)
The same idea can be used to define distributions D′(Σ) for any smooth
surface Σ ⊆ ℝⁿ. If Σ is compact (e.g., an ellipsoid Σ_{j=1}^{n} A_j x_j² = 1, A_j > 0)
there is virtually no change, while if Σ is not compact (e.g., a hyperboloid
x₁² + x₂² − x₃² = 1) it is necessary to insist that test functions in D(Σ) have
compact support. (Readers who are familiar with the definition of an abstract
C∞ manifold can probably guess how to define test functions and distributions
on a manifold.)
Here is one more interesting structure theorem involving D′(S^{n−1}) for n ≥ 2.
A function that is homogeneous of degree a (remember this means f(λx) =
λ^a f(x) for all λ > 0) can be written f(x) = |x|^a F(x/|x|) where F is a
function on the sphere. Could the same be true for homogeneous distributions?
It turns out that it is true if a > −n (or even complex a with Re a > −n).
Let T ∈ D′(S^{n−1}). We want to make sense of the symbol |x|^a T(x/|x|). If T
were a function on the sphere we could use polar coordinates to write

(|x|^a T(x/|x|), φ) = ∫₀^∞ r^{a+n−1} (T, φ(rω)) dr.
Theorem 6.7.2
If Re a > −n then this expression defines a distribution on ℝⁿ that is homogeneous
of degree a, for any T ∈ D′(S^{n−1}). Conversely, every distribution on
ℝⁿ that is homogeneous of degree a is of this form, for some T ∈ D′(S^{n−1}).
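For functions the decomposition is easy to check numerically. In the sketch below (the choices of F and a are illustrative, not from the text), f(x) = |x|^a F(x/|x|) satisfies the homogeneity identity f(λx) = λ^a f(x) to machine precision:

```python
import numpy as np

a = -1.5

def F(omega):
    # An arbitrary smooth function on the unit sphere in R^3.
    return omega[0]**2 + 0.5 * omega[2]

def f(x):
    # f(x) = |x|^a F(x/|x|), defined away from the origin.
    r = np.linalg.norm(x)
    return r**a * F(x / r)

rng = np.random.default_rng(1)
x = rng.normal(size=3)
checks = [abs(f(lam * x) - lam**a * f(x)) for lam in (0.5, 2.0, 10.0)]
print(checks)  # all differences are at round-off level
```

The theorem asserts that the same polar-coordinate decomposition survives in the distributional setting when Re a > −n.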
To see what happens at the exceptional points, let's look at a = −n. Then the
integral is

∫₀^∞ (T, φ(rω)) dr/r
which will certainly diverge if (T, φ(rω)) does not tend to zero as r → 0. But
if T ∈ D′(S^{n−1}) has the property that (T, 1) = 0 (1 is the constant function on
S^{n−1}, which is a test function) then (T, φ(rω)) = (T, φ(rω) − φ(0)1) and since
φ(rω) − φ(0)1 → 0 as r → 0 (in the sense of D(S^{n−1}) convergence) we do
have (T, φ(rω)) → 0 as r → 0, and the integral converges. Thus the theorem
generalizes to say ∫₀^∞ (T, φ(rω)) dr/r is a distribution homogeneous of degree
−n provided (T, 1) = 0. However, the converse says that every homogeneous
distribution of degree −n can be written as a sum of such a distribution and a
multiple of the delta distribution.
In a sense, there is a perfect balance in this result: we pick up a one-dimensional
space, the multiples of δ, which have nothing to do with the sphere; at
the same time we give up a one-dimensional space on the sphere by imposing
the condition (T, 1) = 0. An analogous balancing happens at a = −n − k, but
it involves spaces of higher dimension.
6.8 Problems
1. Show that supp φ′ ⊆ supp φ for every test function φ on ℝ¹. Give an
example where supp φ′ ≠ supp φ.
20. Give an example of a test function φ such that φ₊ and φ₋ are not test
functions.
21. Show that any real-valued test function φ can be written φ = φ₁ − φ₂
where φ₁ and φ₂ are nonnegative test functions. Why does this not contradict
problem 20?
22. Define the Cantor function f to be the continuous, nondecreasing function
that is zero on (−∞, 0], one on [1, ∞), and on the unit interval is constant
on each of the deleted thirds in the construction of the Cantor set, filling
in half-way between

FIGURE 6.8

(so f(x) = 1/2 on the first stage middle third [1/3, 2/3], f(x) = 1/4 and 3/4 on
the second stage middle thirds, and so on). Show that the Cantor measure
is equal to the distributional derivative of the Cantor function.
23. Show that the total length of the deleted intervals in the construction of
the Cantor set is one. In this sense the total length of the Cantor set is
zero.
24. Give an example of a function on the line for which ‖φ‖_∞ ≤ 10⁻⁶,
‖φ′‖_∞ ≤ 10⁻⁶ but ‖φ″‖_∞ ≥ 10⁶. (Hint: Try a sin bx for suitable
constants a and b.)
25. Let φ be a nonzero test function on ℝ¹. Does the sequence φ_n(x) =
(1/n) φ(x − n) tend to zero in the sense of D? Can you find an example of a
distribution T for which lim_{n→∞} (T, φ_n) ≠ 0? Can you find a tempered
distribution T with lim_{n→∞} (T, φ_n) ≠ 0?
26. Let T be a distribution and φ a test function. Show that

∫ φ(x) |x|^{−a} dx

(x_j ∂/∂x_k − x_k ∂/∂x_j) T
It shows that f̂ is bounded if ∫ |f(x)| dx is finite. But it turns out that something
more is true: f̂(ξ) goes to zero as ξ → ∞. What do we mean by this? In one
dimension it means lim_{ξ→+∞} f̂(ξ) = 0 and lim_{ξ→−∞} f̂(ξ) = 0. In more than
one dimension it means something slightly stronger than that the limit of f̂(ξ) is
zero along every curve tending to infinity.
Definition 7.1.1
We say a function g(ξ) defined on ℝⁿ vanishes at infinity, written lim_{ξ→∞} g(ξ) =
0, if for every ε > 0 there exists N such that |g(ξ)| ≤ ε for all |ξ| ≥ N.
for any integrable function f. This is the meaning of (2) above, and we will not
go into the details because it involves integration theory. Notice that by using
our basic estimate we may conclude that f̂_n converges uniformly to f̂,
Lemma 7.1.2
If g_n vanishes at infinity and g_n → g uniformly, then g vanishes at infinity.

The proof of this lemma is easy. Given the error ε > 0, we break it in half and
find g_n such that |g_n(ξ) − g(ξ)| ≤ ε/2 for all ξ (this is the uniform convergence).
Then for that particular g_n, using the fact that it vanishes at infinity, we can find
N such that |g_n(ξ)| ≤ ε/2 for |ξ| ≥ N. But then for |ξ| ≥ N we have

|g(ξ)| ≤ |g(ξ) − g_n(ξ)| + |g_n(ξ)| ≤ ε/2 + ε/2 = ε

so g vanishes at infinity.
This first proof is conceptually clear, but it uses the machinery of the Schwartz
class S. Riemann did not know about S, so instead he used a clever trick. He
observed that you can make the change of variable x → x + π/ξ in the definition
of the Fourier transform (this is in one dimension, but a simple variant works
in higher dimensions) to obtain

f̂(ξ) = −∫ f(x + π/ξ) e^{ixξ} dx.

So what? Well not much, until you get the even cleverer idea to average this
with the original definition to obtain

|f̂(ξ)| ≤ (1/2) ∫ |f(x) − f(x + π/ξ)| dx.
What happens when ξ → ∞? Clearly π/ξ → 0, and it is not surprising that

lim_{ξ→∞} ∫ f(x) sin(xξ) dx = 0

and

lim_{ξ→∞} ∫ f(x) cos(xξ) dx = 0

if f is integrable. In fact, these forms follow by considering f̂(ξ) ± f̂(−ξ).
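The lemma can be watched in action numerically. The sketch below (an illustration, not from the text) takes f to be the indicator function of [0, 1], which is integrable but not smooth; its transform decays only at the rate 1/|ξ|.

```python
import numpy as np

# Fourier transform, in the convention f^(xi) = integral of f(x) e^{i x xi} dx,
# of the indicator function of [0, 1], computed by the trapezoid rule.
x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]

def fhat(xi):
    vals = np.exp(1j * x * xi)
    return ((vals[:-1] + vals[1:]) / 2).sum() * dx

mags = [abs(fhat(xi)) for xi in (5.0, 50.0, 500.0)]
print(mags)  # tending to zero, consistent with the exact bound 2/|xi|
```

Here the decay rate can be confirmed by hand: the transform is (e^{iξ} − 1)/(iξ), whose modulus is 2|sin(ξ/2)|/|ξ| ≤ 2/|ξ|, so no faster decay than 1/|ξ| occurs for this merely integrable f.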
As an application of the Riemann-Lebesgue lemma we prove the equipartition
of energy for solutions of the wave equation. Remember that in section 5.3 we
showed that the energy

= ¼k²|ξ|²|f̂(ξ)|² − ¼k²|ξ|² cos(2kt|ξ|)|f̂(ξ)|² + ¼|ĝ(ξ)|² + ¼ cos(2kt|ξ|)|ĝ(ξ)|²
− ¼k|ξ| sin(2kt|ξ|)(f̂(ξ)\overline{ĝ(ξ)} + \overline{f̂(ξ)}ĝ(ξ))
using the familiar trigonometric identities for double angles. The point is that
when we take the ξ-integral we find that the kinetic energy is equal to the sum
of

¼ (2π)⁻ⁿ [∫ cos(2kt|ξ|)|ĝ(ξ)|² dξ
− k² ∫ cos(2kt|ξ|) ||ξ|f̂(ξ)|² dξ
− k ∫ sin(2kt|ξ|) (|ξ|f̂(ξ)\overline{ĝ(ξ)} + |ξ|\overline{f̂(ξ)}ĝ(ξ)) dξ]
which we claim must go to zero as t → ±∞ by the Riemann-Lebesgue lemma.
This argument is perhaps simpler to understand if we consider the case n = 1
first. Consider the term

∫ cos(2kt|ξ|)|ĝ(ξ)|² dξ.
We may drop the absolute values of |ξ| because cosine is even. Now the
requirement that the initial energy is finite implies ∫ |ĝ(ξ)|² dξ is finite, hence
|ĝ(ξ)|² is integrable. Then lim_{t→±∞} ∫ cos(2ktξ)|ĝ(ξ)|² dξ = 0 is the Riemann-
Lebesgue lemma for the Fourier cosine transform of |ĝ(ξ)|². Similarly, we
must have |ξf̂(ξ)|² integrable, hence the vanishing of ∫ cos(2ktξ)|ξf̂(ξ)|² dξ as
t → ±∞. Finally, the function ξf̂(ξ)\overline{ĝ(ξ)} is also integrable (this follows from
the Cauchy-Schwarz inequality for integrals,
∫_{ℝⁿ} cos(2kt|ξ|)|ĝ(ξ)|² dξ = ∫₀^∞ cos(2ktr) (∫_{S^{n−1}} |ĝ(rω)|² dω) r^{n−1} dr.

The fact that ∫ |ĝ(ξ)|² dξ is finite is equivalent to r^{n−1} ∫_{S^{n−1}} |ĝ(rω)|² dω being
integrable as a function of the single variable r, and then the one-dimensional
Riemann-Lebesgue lemma gives
positive, it is easy to show that their convolution decays exactly at the rate
|ξ|^{β−n}. In fact we have the estimate

≤ C ∫_{|η|≤1} |ξ − η|^{β−n} e^{−|η|²/2} dη
These examples (and the general negative result) show that the Riemann-
Lebesgue lemma is best possible, or sharp, meaning that it cannot be improved,
at least in the direction we have considered. Of course all such claims must be
taken with a grain of salt, because there are always other possible directions in
which a result can be improved. It is also important to realize that it is easy to put
conditions on f that will imply specific decay rates on the Fourier transform;
however, as far as we know now, all such conditions involve some sort of
smoothness on f, such as differentiability or Hölder or Lipschitz conditions.
The point of the Riemann-Lebesgue lemma is that we are only assuming a
condition on the size of the function.
Another theorem that relates the size of a function and its Fourier transform
is the Hausdorff-Young inequality, which says ‖f̂‖_q ≤ c‖f‖_p where 1/p + 1/q = 1
and 1 ≤ p ≤ 2.
f̂(ξ + iη) = ∫_{−∞}^{∞} e^{ixξ} e^{−xη} f(x) dx.

We see that this is just the ordinary Fourier transform of the function e^{−xη} f(x).
The trouble is, if we are not careful, this may not be well defined because e^{−xη}
grows too rapidly at infinity. But there is no problem if f has compact support,
for e^{−xη} is bounded on this compact set. Then it is easy to see that f̂(ζ) is
analytic, either by verifying the Cauchy-Riemann equations or by computing
the complex derivative

(d/dζ) f̂(ζ) = ℱ_x(ix e^{−xη} f(x))(ξ).

In fact, the same reasoning shows that if f is a distribution of compact support,
we can define f̂(ζ) = (f, e^{ixζ}) and this is analytic with derivative (d/dζ) f̂(ζ) =
(f, ix e^{ixζ}).
What kind of analytic functions do we obtain? Observe that f̂(ζ) is an
entire analytic function (defined and analytic on the entire complex plane). But
not every entire analytic function can be obtained in this way, because f̂(ζ)
satisfies some growth conditions. Say, to be more precise, that f is bounded
and supported on |x| ≤ A. Then

|f̂(ζ)| ≤ ∫_{−A}^{A} |f(x)| e^{−xη} dx ≤ 2A ‖f‖_∞ e^{A|η|}

because e^{−xη} achieves its maximum value e^{A|η|} at one of the endpoints x = ±A.
Thus the growth of |f̂(ζ)| is at most exponential in the imaginary part of ζ.
Such functions are called entire functions of exponential type (or more precisely
exponential type A). Note that it is not really necessary to assume that f is
bounded; we could get away with the weaker assumption that f is integrable.
We can now state two Paley-Wiener theorems.
Theorem 7.2.1
P.W. 1: Let f be supported in [−A, A] and satisfy ∫_{−A}^{A} |f(x)|² dx < ∞.
Then f̂(ζ) is an entire function of exponential type A, and ∫_{−∞}^{∞} |f̂(ξ)|² dξ <
∞. Conversely, if F(ζ) is an entire function of exponential type A, and
∫_{−∞}^{∞} |F(ξ)|² dξ < ∞, then F = f̂ for some such function f.
Theorem 7.2.2
P.W. 2: Let f be a C∞ function supported in [−A, A]. Then f̂(ζ) is an entire
function of exponential type A, and f̂(ξ) is rapidly decreasing, i.e., |f̂(ξ)| ≤
c_N(1 + |ξ|)^{−N} for all N. Conversely, if F(ζ) is an entire function of exponential
type A, and F(ξ) is rapidly decreasing, then F = f̂ for some such function f.
One thing to keep in mind with these and other Paley-Wiener theorems is
that since you are dealing with analytic functions, some estimates imply other
estimates. Therefore it is possible to characterize the Fourier transforms in
several different, but ultimately equivalent, ways.
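A quick numerical illustration of exponential type (not from the text): take f to be the indicator of [−1, 1], so A = 1 and f̂(ζ) = 2 sin ζ/ζ. Along the imaginary axis the growth e^{|η|} allowed by the theorems is realized up to a bounded factor.

```python
import numpy as np

# f = indicator of [-1, 1]; its transform extends to the entire function
# f^(zeta) = 2 sin(zeta)/zeta, which has exponential type 1.
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

def fhat(zeta):
    vals = np.exp(1j * x * zeta)           # integrand e^{i x zeta}
    return ((vals[:-1] + vals[1:]) / 2).sum() * dx

# Ratio |f^(i eta)| / e^{eta} on the imaginary axis zeta = i eta, eta > 0.
ratios = [abs(fhat(1j * eta)) / np.exp(eta) for eta in (1.0, 5.0, 10.0)]
print(ratios)  # bounded by 1, decaying roughly like 1/eta
```

In closed form |f̂(iη)| = 2 sinh(η)/η, so the ratio is (1 − e^{−2η})/η; the exponential factor in the type-A estimate is exactly the right growth scale.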
The argument we gave for f̂(ζ) to be entire of exponential type A is valid
under the hypotheses of P.W. 1 because we have

∫_{−A}^{A} |f(x)| dx ≤ (∫_{−A}^{A} |f(x)|² dx)^{1/2} (∫_{−A}^{A} 1 dx)^{1/2}
≤ (2A)^{1/2} (∫_{−A}^{A} |f(x)|² dx)^{1/2} < ∞
proof. In both cases we know by the Fourier inversion formula that there exists
f(x), namely

f(x) = (1/2π) ∫_{−∞}^{∞} F(ξ) e^{−ixξ} dξ,

such that f̂(ξ) = F(ξ). The hard step is to show that f is supported in [−A, A].
In other words, if |x| > A we need to show f(x) = 0. To do this we argue
that it is permissible to make the change of variable ξ → ξ + iη in the Fourier
inversion formula, for any fixed η. Of course this requires a contour integration
argument. The result is

f(x) = (1/2π) ∫_{−∞}^{∞} F(ξ + iη) e^{−ixξ} e^{xη} dξ

for any fixed η. Then we let η go to infinity; specifically, if x > A, we let
η → −∞ and if x < −A we let η → +∞, so that e^{xη} goes to zero. The fact
that F is of exponential type A means F(ξ + iη) grows at worst like e^{A|η|}, but
e^{xη} goes to zero faster, so f(x) = 0.
Theorem 7.2.3
P.W. 3: Let f be a distribution supported in [−A, A]. Then f̂ is an entire
function satisfying a growth estimate
$$|\hat f(\zeta)| \le c\,(1+|\zeta|)^N e^{A|\eta|}$$
for some N. The direct argument (pairing f with a cut-off function ψ supported in a slightly larger interval) gives only a bound of the form c_ε e^{(A+ε)|η|}(1+|ζ|)^N,
which is not quite what we wanted, because of the ε. (We cannot eliminate ε
by letting it tend to zero because the constant c depends on ε and may blow
up in the process.) The trick for eliminating the ε is to make ε and ψ depend
on η, so that the product ε|η| remains constant. The argument for the converse
involves convolving with an approximate identity and using P.W. 2.
All three theorems extend in a natural way to n dimensions. Perhaps the
only new idea is that we must explain what is meant by an analytic function of
several complex variables. It turns out that there are many equivalent definitions,
perhaps the simplest being that the function is continuous and analytic in each
of the n complex variables separately. If f is a function on ℝⁿ supported in
|x| ≤ A, then the Fourier transform extends to ℂⁿ,
$$\hat f(\zeta) = \int_{|x|\le A} f(x)e^{ix\cdot\zeta}\,dx,$$
since e^{−x·η} has maximum value e^{A|η|} on |x| ≤ A (take x = −Aη/|η|). Then
P.W. 1, 2, 3 are true as stated with the ball |x| ≤ A replacing the interval
[−A, A].
For a different kind of Paley-Wiener theorem, which does not involve compact
support, let's return to one dimension and consider the factor e^{−xη} in the formula
for f̂(ζ). Note that this either blows up or decays, depending on the sign of
xη. In particular, if η ≥ 0, then e^{−xη} will be bounded for x ≥ 0. So if f
is supported on [0, ∞), then f̂(ζ) will be well defined, and analytic, on the
upper half-space η > 0 (similarly for f supported on (−∞, 0] and the lower
half-space η < 0).
Theorem 7.2.4
P.W. 4: Let f be supported on [0, ∞) with ∫₀^∞ |f(x)|² dx < ∞. Then f̂(ζ) is an
analytic function on η > 0 and satisfies the growth estimate
$$\sup_{\eta > 0}\int_{-\infty}^{\infty} |\hat f(\xi + i\eta)|^2\,d\xi < \infty.$$
Furthermore, the usual Fourier transform f̂(ξ) is the limit of f̂(ξ + iη) as
η → 0⁺ in the sense that
$$\int_{-\infty}^{\infty} |\hat f(\xi + i\eta) - \hat f(\xi)|^2\,d\xi \to 0 \quad\text{as } \eta\to 0^+.$$
are supported in |x| ≤ k|t|. To apply P.W. 3 we need first to understand how
the functions cos kt|ξ| and sin kt|ξ|/k|ξ| may be extended to entire analytic
functions. Note that it will not do to replace |ξ| by |ζ|, since |ζ| is not analytic.
The key observation is that since both cos z and sin z/z are even functions of
z, we can take cos √z and sin √z/√z and these will be entire analytic functions
of z, even though √z is not entire (it is not single-valued). Then the desired
analytic functions are
$$\cos\big(kt\sqrt{\zeta\cdot\zeta}\big) \quad\text{and}\quad \frac{\sin\big(kt\sqrt{\zeta\cdot\zeta}\big)}{k\sqrt{\zeta\cdot\zeta}}.$$
Indeed they are entire analytic, being the composition of analytic functions (ζ · ζ
is clearly analytic), and they assume the correct values when ζ is real.
To apply P.W. 3 we need to estimate the size of these analytic functions; in
particular we will show |F(ζ)| ≤ c e^{kt|η|} for each of them. The starting point is
the observation
$$|\cos(u + iv)| \le e^{|v|}$$
and similarly
$$\Big|\frac{\sin(u + iv)}{u + iv}\Big| \le e^{|v|}$$
The Poisson summation formula 117
(this requires a separate argument for |u + iv| ≤ 1 and |u + iv| ≥ 1). To apply
these estimates to our functions we need to write (ζ · ζ)^{1/2} = u + iv (this does
not determine u and v uniquely, since we can multiply both by −1, but it does
determine |v| uniquely), and then we have |F(ζ)| ≤ c e^{|ktv|} for both functions.
To complete the argument we have to show |v| ≤ |η|. This follows by algebra,
but the argument is tricky. We have ζ · ζ = (u + iv)², which means
$$u^2 - v^2 = |\xi|^2 - |\eta|^2, \qquad uv = \xi\cdot\eta$$
by equating real and imaginary parts, and of course |ξ · η| ≤ |ξ| |η| so we can
replace the second equation by the inequality
$$|uv| \le |\xi|\,|\eta|.$$
(Indeed, if we had v² > |η|², the first equation would force u² > |ξ|², hence
u²v² > |ξ|²|η|², contradicting the inequality; so |v| ≤ |η|.)
For simplicity let's assume n = 1. Let y vary over the integers (call it k) and
sum:
$$\mathcal{F}\Big(\sum_{k=-\infty}^{\infty}\delta(x-k)\Big) = \sum_{k=-\infty}^{\infty} e^{ik\xi}.$$
118 Fourier Analysis
The sum on the left (before taking the Fourier transform) defines a legitimate
tempered distribution, which we can think of as an infinite "comb" of evenly
spaced point masses. The sum on the right looks like a complicated function.
In fact, it isn't. What it turns out to be is almost exactly the same as the sum
on the left! This incredible fact is the Poisson summation formula.
Of course the sum Σ_{k=−∞}^{∞} e^{ikξ} does not exist in the usual sense, so we will
not discover the Poisson summation formula by a direct attack. We will take a
more circular route, starting with the idea of periodization. Given a function f
on the line, we can create a periodic function (of period 2π) by the recipe
$$\sum_{j=-\infty}^{\infty} f(x - 2\pi j).$$
Expanding this periodic function in a Fourier series Σ_k c_k e^{ikx}, the coefficients are
$$c_k = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\int_0^{2\pi} f(x - 2\pi j)e^{-ikx}\,dx
= \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\int_{-2\pi j}^{-2\pi(j-1)} f(x)e^{-ikx}\,dx
= \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)e^{-ikx}\,dx
= \frac{1}{2\pi}\hat f(-k).$$
Summing the Fourier series at x = 0 then gives
$$\sum_{k=-\infty}^{\infty} f(2\pi k) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} \hat f(k)$$
for f ∈ S, so we have
$$\sum_{k=-\infty}^{\infty} e^{ik\xi} = 2\pi\sum_{k=-\infty}^{\infty}\delta(\xi - 2\pi k)$$
and
$$\sum_{k=-\infty}^{\infty} f(2\pi k) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\hat f(k).$$
For which functions f does this second form hold? (Of course this is the
form that Poisson discovered more than a century before distribution theory.)
Certainly for f ∈ S, but for many applications it is necessary to apply it to more
general functions, and this usually requires a limiting argument. Unfortunately,
it is not universally valid; there are examples of functions f for which both
series converge absolutely, but to different numbers. A typical number theory
application involves taking f to be the characteristic function of the ball |x| ≤
R. Then the right side gives the number of lattice points inside the ball or,
equivalently, the number of representations of all integers ≤ R² as the sum of
n squares. This number is, to first-order approximation, the volume of the ball.
By using the Poisson summation formula, it is possible to find estimates for the
difference.
The Poisson summation formula yields all sorts of fantastic results simply by
substituting functions f whose Fourier transform can be computed explicitly.
For example, take a Gaussian f(x) = e^{−tx²}, with f̂(ξ) = √(π/t) e^{−ξ²/4t}; the formula then yields the theta-function identity
$$\sum_{k=-\infty}^{\infty} e^{-4\pi^2 t k^2} = \frac{1}{2\sqrt{\pi t}}\sum_{k=-\infty}^{\infty} e^{-k^2/4t}.$$
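The Gaussian case can be checked numerically. The sketch below (my own illustration) uses f(x) = e^{−x²/2}, whose Fourier transform is f̂(ξ) = √(2π) e^{−ξ²/2} under the chapter's convention, and verifies Σ f(2πk) = (1/2π) Σ f̂(k) with truncated sums:

```python
import math

def lhs(terms=20):
    # Σ_k f(2πk) with f(x) = e^{-x²/2}; the tail is negligible
    return sum(math.exp(-0.5 * (2 * math.pi * k) ** 2)
               for k in range(-terms, terms + 1))

def rhs(terms=20):
    # (1/2π) Σ_k f̂(k), where f̂(ξ) = √(2π) e^{-ξ²/2}
    return sum(math.sqrt(2 * math.pi) * math.exp(-0.5 * k ** 2)
               for k in range(-terms, terms + 1)) / (2 * math.pi)

assert abs(lhs() - rhs()) < 1e-12
```

Both sides come out extremely close to 1: the left side is dominated by the k = 0 term, while on the right the sum Σ e^{−k²/2} is almost exactly √(2π), a small instance of the theta identity.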
$$\mathcal{F}\Big(\sum_{\gamma\in\Gamma}\delta(x - \gamma)\Big) = c_\Gamma\sum_{\gamma'\in\Gamma'}\delta(x - \gamma')$$
is the Poisson summation formula for general lattices, where the constant c_Γ
can be identified with the volume of the fundamental domain of Γ′. The idea
is that we have
$$\mathcal{F}\Big(\sum_{\gamma\in\Gamma}\delta(x - \gamma)\Big)(\xi) = \sum_{\gamma\in\Gamma} e^{i\gamma\cdot\xi}
= \sum_{k\in\mathbb{Z}^n} e^{i\xi\cdot Ak} = \sum_{k\in\mathbb{Z}^n} e^{i(A^{\mathrm{tr}}\xi)\cdot k}
= (2\pi)^n\sum_{k\in\mathbb{Z}^n}\delta(A^{\mathrm{tr}}\xi - 2\pi k)$$
|y| ≤ b parallel to the x-axis and then project the lattice points in this strip onto
the x-axis, as shown in the figure.
FIGURE 7.1
The result is just Σ_{|γ₂|≤b} δ(x − γ₁), where γ = (γ₁, γ₂) varies over Γ. This
is clearly a sum of discrete "atoms," and it will not be periodic. The Fourier
transform is Σ_{|γ₂|≤b} e^{iγ₁ξ}, and this can be given a form similar to the Poisson
summation formula. We start with
$$\sum_{\gamma\in\Gamma} e^{i\gamma_1\xi}e^{i\gamma_2\eta} = (2\pi)^2\sum_{\gamma'\in\Gamma'}\delta(\xi - \gamma_1',\,\eta - \gamma_2'),$$
multiply both sides by a function g(η), and integrate to obtain
$$\sum_{\gamma\in\Gamma}\hat g(\gamma_2)\,e^{i\gamma_1\xi} = (2\pi)^2\sum_{\gamma'\in\Gamma'} g(\gamma_2')\,\delta(\xi - \gamma_1').$$
Clearly we want to choose the function g such that ĝ(t) = χ(|t| ≤ b). Since
we know ∫_{−b}^{b} e^{its} dt = 2 sin sb/s, it follows from the Fourier inversion formula
that g(s) = (sin sb)/πs is the correct choice. This yields
transform as a discrete set of relatively bright stars amid a Milky Way of stars
too dim to perceive individually. The trouble is that this image is not quite
accurate, because of the slow decay of g(s). The sum defining the Fourier
transform is not absolutely convergent, so our Milky Way is not infinitely bright
only because of cancellation between positive and negative stars. This difficulty
can be overcome by choosing a smoother cut-off function as you approach the
boundary of the strip. This leads to a different choice of g, one with faster
decay.
if the sets A_j are disjoint. All the intuitive examples of probability distributions
on ℝⁿ satisfy these conditions, although it is not always easy to verify them.
Associated to a probability measure μ is an integral ∫ f dμ or ∫ f(x) dμ(x),
defined for continuous functions (and more general functions as well), which
gives the "expectation" of the function with respect to the probability measure.
For continuous functions with compact support, the integral may be formed in
the usual way, as the limit of sums Σ f(x_j)μ(I_j) where x_j ∈ I_j and {I_j} is a
partition of ℝⁿ into small boxes. Another way of thinking about it is to use a
"Monte Carlo" approximation: choose points y₁, y₂, ... at random according to
the probability measure and take lim_{N→∞} (1/N)(f(y₁) + ··· + f(y_N)).
A probability measure can be regarded as a distribution, namely (μ, φ) =
∫ φ dμ, and as we have seen in section 6.4, it is a positive distribution, meaning
(μ, φ) ≥ 0 if φ ≥ 0. The condition μ(ℝⁿ) = 1 is equivalent to ∫ 1 dμ = 1,
which we can write (μ, 1) = 1 by abuse of notation (the constant function 1 is not
in D, so (μ, 1) is not defined a priori). Of course this means lim_{k→∞} (μ, ψ_k) = 1
where ψ_k is a suitable sequence of test functions approximating the constant
function 1. The converse statement is also true: any positive distribution μ
with (μ, 1) = 1 is associated to a probability measure in this way. If f
is a positive, integrable function with ∫ f(x) dx = 1 (ordinary integration),
then there is an associated probability measure, denoted f(x) dx, defined by
μ(A) = ∫_A f(x) dx, whose associated integral is just ∫ φ dμ = ∫ φ(x)f(x) dx.
Such measures are called absolutely continuous. Other examples are discrete
Probability measures and positive definite functions 123
$$|\hat\mu(\xi)| \le \int 1\,d\mu = 1$$
and
$$\hat\mu(0) = \int 1\,d\mu = 1.$$
If μ = f dx then μ̂ = f̂ in the usual sense. In general, μ̂ does not vanish at
infinity; for example, δ̂ = 1.
It turns out that we can exactly characterize the Fourier transforms of proba-
bility measures. This characterization is called Bochner's theorem and involves
the notion of positive definiteness. Before describing it, let me review the anal-
ogous concept for matrices.
Let A_{jk} denote an N × N matrix with complex entries. Associated to this
matrix is a quadratic form on ℂᴺ, which we define by (Au, u) = Σ_{j,k} A_{jk}u_j ū_k
for u = (u₁, ..., u_N) a vector in ℂᴺ. We say the matrix is positive definite if
the quadratic form is always nonnegative, (Au, u) ≥ 0 for all u. Note that, a
priori, the quadratic form is not necessarily real, but this is required when we
write (Au, u) ≥ 0. Also note that the term nonnegative definite is sometimes
used, with "positive definite" being reserved for the stronger condition (Au, u) > 0 for all u ≠ 0.
Definition 7.4.1
A continuous (complex-valued) function F on ℝⁿ is said to be positive definite
if the matrix A_{jk} = F(x_j − x_k) is positive definite for any choice of points
x₁, ..., x_N in ℝⁿ, i.e.,
$$\sum_j\sum_k F(x_j - x_k)\,u_j\bar u_k \ge 0.$$
Theorem 7.4.2
Bochner's Theorem: A function F is the Fourier transform of a probability
measure, if and only if
1. F is continuous.
2. F(O) = 1.
3. F is positive definite.
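The forward direction of Bochner's theorem can be illustrated numerically. The sketch below (my own illustration) takes F(ξ) = e^{−ξ²/2}, the Fourier transform of the standard Gaussian probability measure, and checks the positive definite condition Σ_{j,k} F(x_j − x_k)u_j ū_k ≥ 0 for random points and complex weights:

```python
import math
import random

def quad_form(F, xs, us):
    # Σ_j Σ_k F(x_j - x_k) u_j conj(u_k); should be real and nonnegative
    s = 0j
    for xj, uj in zip(xs, us):
        for xk, uk in zip(xs, us):
            s += F(xj - xk) * uj * uk.conjugate()
    return s

F = lambda x: math.exp(-x * x / 2)   # Fourier transform of a Gaussian measure

random.seed(0)
for _ in range(100):
    xs = [random.uniform(-5, 5) for _ in range(6)]
    us = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
    q = quad_form(F, xs, us)
    assert abs(q.imag) < 1e-9      # the form is real ...
    assert q.real >= -1e-9          # ... and nonnegative, up to rounding
```

Replacing F by a function that is not a Fourier transform of a probability measure (say F(ξ) = cos 2ξ + 0.5) would produce negative values of the form for suitable choices of points.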
To prove the converse, we first need to transform the positive definite con-
dition from a discrete to a continuous form. The idea is that, in the condition
defining positive definiteness, we can think of the values u_j as weights asso-
ciated to the points x_j. Then the sum approximates a double integral. The
continuous form should thus be
$$\int\!\!\int F(x-y)\,\varphi(x)\overline{\varphi(y)}\,dx\,dy \ge 0.$$
In fact, the continuous form of positive definiteness says exactly (F ∗ φ, φ) ≥ 0
for every φ ∈ D, and the same holds for every φ ∈ S because D is dense in S.
Now a change of variable lets us write this as
$$(F,\ \tilde\varphi * \varphi) \ge 0,$$
where φ̃(x) denotes the conjugate reflection of φ.
$$\sum_j\sum_k e^{-tx_j^2}\,e^{-tx_k^2}\,e^{2tx_jx_k}\,u_j\bar u_k.$$
When we substitute this into our computation we can take the m-summation
out front to obtain
is a good index of the amount of "spread" of the measure, hence the "uncer-
tainty" in measuring the position of the particle. We will write Var(ψ) to indicate
the dependence on the wave function. An important but elementary observation
is that the variance can be expressed directly as
Thus
We will not be concerned here with the constants in the theory, including the
famous Planck's constant. For this operator, the computation of mean and
variance can be moved to the Fourier transform side (this was the way we
described it in section 5.4). For the mean, we have
and so the mean of the momentum observable for ψ is the same as the mean
of the position variable for (2π)^{−n/2}ψ̂. Similarly, the variance for momentum
is Var((2π)^{−n/2}ψ̂).
¹Note that in this section we are using the complex conjugate in the inner product, contrary to
previous usage.
$$\mathrm{Var}(\psi)\,\mathrm{Var}\big((2\pi)^{-n/2}\hat\psi\big) \ge n^2/4.$$
Thus if we try to make Var(ψ) very small, we are forced to make Var((2π)^{−n/2}ψ̂)
large, and vice versa.
To prove the Heisenberg uncertainty principle, consider first the case n = 1.
Let us write A for the operator i(d/dx) and B for the operator of multiplication
by x. Note that these operators do not commute. If we write [A, B] = AB − BA
for the commutator, then we find
$$[A, B] = iI,$$
or simply −i[A, B] = I (the identity operator). Also, both operators are Her-
mitian, (Aψ₁, ψ₂) = (ψ₁, Aψ₂) by integration by parts (the i factor produces a
minus sign under complex conjugation, canceling the minus sign from integra-
tion by parts), and similarly (Bψ₁, ψ₂) = (ψ₁, Bψ₂).
or
by e^{−iax}) without changing the position computation. Thus applying the above
argument to e^{−iax}ψ(x − b), where a and b are the means of the momentum and
position observables, yields a wave function with both means zero, and the
above argument applied to this wave function yields the desired Heisenberg
uncertainty principle for ψ. Another way to see the same thing is to observe
that the operators A − aI and B − bI satisfy the same commutation relation, so
the above argument applied to these operators yields the same conclusion. One
lesson that emerges from the proof is the identity
$$\mathrm{Var}(A) = \mathrm{Var}\big((2\pi)^{-1/2}\hat\psi\big)$$
for the momentum operator A = i(d/dx).
The second important lesson that emerges from the proof is that we can also
determine exactly when the inequality in the uncertainty principle is an equality.
In the special case that both operators have mean zero, the only place we used
an inequality was in the estimate
This gives a "snapshot" of the strength of the signal in the vicinity of the time
t = b and in the vicinity of the frequency a. The rough localization achieved by
the Gaussian is the best we can do in this regard, according to the uncertainty
principle.
We can always recover the signal f from its Gabor transform, via the inversion
formula
Hermite functions 131
To establish this formula, say for f E S (it holds more generally, in a suitable
sense) we just substitute into the right side the definition of G(a, b) (renaming
the dummy variable s):
and using the Fourier inversion formula for the s and a integrals we obtain
then
$$\mathcal{F}f = \sqrt{2\pi}\,f.$$
From a certain point of view, the Gaussian is just the first of a sequence of func-
tions, called Hermite functions, all of them satisfying an analogous condition
of being an eigenfunction of the Fourier transform (recall that an eigenfunction
for any operator A is defined to be a nontrivial solution of Af = λf for a con-
stant λ, called the eigenvalue, where nontrivial means f is not identically zero).
Because the Fourier inversion formula can be written F²φ(x) = 2πφ(−x), it
follows that F⁴φ = (2π)²φ, so the only allowable eigenvalues must satisfy
λ⁴ = (2π)², hence
$$\lambda = \pm\sqrt{2\pi}\ \text{ or }\ \pm i\sqrt{2\pi}.$$
From a certain point of view, the problem of finding eigenfunctions for the
Fourier transform has a trivial solution. If we start with any function φ, then
the function
$$\varphi + \lambda^{-1}\mathcal{F}\varphi + \lambda^{-2}\mathcal{F}^2\varphi + \lambda^{-3}\mathcal{F}^3\varphi$$
is an eigenfunction with eigenvalue λ, for any of the four allowable values
of λ, since F⁴φ = (2π)²φ. The Hermite functions arise instead from the creation
and annihilation operators A* and A, which are related to the Hermite operator H by
$$A^*A = H - 1 \quad\text{and}\quad AA^* = H + 1,$$
as you can easily check. Then AH = AA*A + A while HA = AA*A − A, so
[A, H] = 2A, and similarly A*H = A*AA* − A* while HA* = A*AA* + A*,
so [A*, H] = −2A*.
So what are the possible eigenvalues λ for H? Suppose Hφ = λφ. Then λ
is real because H is self-adjoint:
$$\lambda(\varphi,\varphi) = (H\varphi,\varphi) = (\varphi,H\varphi) = \bar\lambda(\varphi,\varphi).$$
The bottom eigenvalue λ = 1 is attained by the Gaussian, since Ae^{−x²/2} = 0 gives
$$He^{-x^2/2} = e^{-x^2/2} + A^*Ae^{-x^2/2} = e^{-x^2/2}.$$
Lemma 7.6.1
Suppose φ is an eigenfunction of H with eigenvalue λ. Then A*φ is an eigen-
function with eigenvalue λ + 2, and Aφ is an eigenfunction (as long as λ ≠ 1)
with eigenvalue λ − 2.
Indeed, HA*φ = A*Hφ + 2A*φ = (λ + 2)A*φ, and similarly HAφ = AHφ − 2Aφ = (λ − 2)Aφ;
hence we may define the Hermite functions
$$h_n(x) = c_n(A^*)^n e^{-x^2/2},$$
where the positive constants c_n are chosen so that ∫ h_n(x)² dx = 1 (in fact
c_n = π^{−1/4}(2ⁿn!)^{−1/2}, but we will not use this formula). Then Hh_n = (2n + 1)h_n
by the lemma. We claim that there are no other eigenvalues, and each positive
odd integer has multiplicity one (the space of eigenfunctions is one-dimensional,
consisting of multiples of h_n). The reasoning goes as follows: if we start with
any λ not a positive odd integer, then by applying a high enough power of the
annihilation operator we would end up with an eigenvalue less than 1 (note that
we never pass through the eigenvalue 1), which contradicts our observation that
1 is the bottom of the spectrum. Similarly, the fact that the eigenspace of 1 has
$$h_n(x) = c_n\Big(\frac{d}{dx} - x\Big)^n e^{-x^2/2},$$
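The Hermite functions can be generated and their orthonormality checked numerically. The sketch below (my own illustration) builds the Hermite polynomials with the classical recurrence H_{n+1} = 2xH_n − H_n′, which realizes one application of the creation operator up to the book's (d/dx − x)ⁿ sign convention (the sign (−1)ⁿ does not affect orthonormality), and uses the normalization c_n = π^{−1/4}(2ⁿn!)^{−1/2} quoted in the text:

```python
import math

def polyderiv(p):
    # derivative of a polynomial given by its coefficient list
    return [k * p[k] for k in range(1, len(p))]

def next_hermite(p):
    # H_{n+1}(x) = 2x H_n(x) - H_n'(x)
    two_x_p = [0.0] + [2.0 * c for c in p]
    d = polyderiv(p)
    return [a - (d[i] if i < len(d) else 0.0) for i, a in enumerate(two_x_p)]

def polyval(p, x):
    return sum(c * x ** k for k, c in enumerate(p))

def h(n, x):
    # n-th Hermite function h_n(x) = c_n H_n(x) e^{-x²/2}
    p = [1.0]
    for _ in range(n):
        p = next_hermite(p)
    c = math.pi ** -0.25 / math.sqrt(2.0 ** n * math.factorial(n))
    return c * polyval(p, x) * math.exp(-x * x / 2)

def inner(m, n, a=-12.0, b=12.0, steps=4000):
    # midpoint-rule approximation of ∫ h_m h_n dx
    step = (b - a) / steps
    return sum(h(m, a + (k + 0.5) * step) * h(n, a + (k + 0.5) * step)
               for k in range(steps)) * step
```

Running `inner(n, n)` returns values indistinguishable from 1, and `inner(m, n)` for m ≠ n returns values indistinguishable from 0, as the orthonormality of eigenfunctions of H predicts.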
$$f = \sum_{n=0}^{\infty} (f, h_n)h_n,$$
called the Hermite expansion of f. The coefficients (f, h_n) satisfy the Parseval
identity
$$\sum_{n=0}^{\infty} |(f, h_n)|^2 = \int |f|^2\,dx.$$
The expansion is very well suited to the spaces S and S′. We have f ∈
S if and only if the coefficients are rapidly decreasing, |(f, h_n)| ≤ c_N(1 +
n)^{−N} for all N. Notice that (f, h_n) is well defined for f ∈ S′. In that case we
have |(f, h_n)| ≤ c(1 + n)^N for some N, and conversely Σ a_n h_n represents a
tempered distribution if the coefficients satisfy such a polynomial bound.
Now it is not hard to see that the Hermite functions are eigenfunctions for the
Fourier transform. Indeed, we only have to check (from the ping-pong table)
the behavior of the creation and annihilation operators on the Fourier transform
side, namely
$$\mathcal{F}A^*\varphi = iA^*\mathcal{F}\varphi \quad\text{and}\quad \mathcal{F}A\varphi = -iA\mathcal{F}\varphi.$$
Radial Fourier transforms and Bessel functions 135
$$\mathcal{F}h_n = c_n\mathcal{F}(A^*)^n e^{-x^2/2} = c_n i^n(A^*)^n\mathcal{F}e^{-x^2/2} = i^n\sqrt{2\pi}\,c_n(A^*)^n e^{-x^2/2} = i^n\sqrt{2\pi}\,h_n.$$
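The eigenfunction relation Fh_n = iⁿ√(2π) h_n can be verified numerically. The sketch below (my own illustration, using the chapter's convention f̂(ξ) = ∫ f(x)e^{ixξ} dx) transforms h₁(x) = c₁·2x·e^{−x²/2} by quadrature (a sign convention for h₁ does not affect the eigenvalue) and compares with i√(2π) h₁(ξ):

```python
import cmath
import math

c1 = math.pi ** -0.25 / math.sqrt(2.0)      # normalization for n = 1
h1 = lambda x: c1 * 2.0 * x * math.exp(-x * x / 2)

def ft(f, xi, a=-12.0, b=12.0, n=8000):
    # f̂(ξ) = ∫ f(x) e^{i x ξ} dx, midpoint rule
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * cmath.exp(1j * (a + (k + 0.5) * h) * xi)
               for k in range(n)) * h

for xi in (0.3, 1.0, 2.5):
    assert abs(ft(h1, xi) - 1j * math.sqrt(2 * math.pi) * h1(xi)) < 1e-8
```

The transform of h₁ is purely imaginary and proportional to h₁ itself, matching the eigenvalue i¹√(2π).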
$$\frac{\partial u}{\partial t} = ikHu$$
(and the n-dimensional analog), which is important in quantum mechanics.
f(x) = f(−x). The Fourier transform of an even function is also even, and is
given by the Fourier cosine formula:
$$\hat f(\xi) = \int_{-\infty}^{\infty} e^{ix\xi}f(x)\,dx = \int_{-\infty}^{\infty}(\cos x\xi + i\sin x\xi)f(x)\,dx = 2\int_0^{\infty} f(x)\cos x\xi\,dx,$$
the sine term vanishing because it is the integral of an odd function.
There is a similar formula for n = 3, but we have to work harder to derive it.
We work in a spherical coordinate system. If f is radial we write f(x) = f(|x|)
by abuse of notation. Suppose we want to compute f̂(0, 0, R). Then we take
the z-axis as the central axis and the spherical coordinates are
$$x = r\sin\phi\cos\theta, \qquad y = r\sin\phi\sin\theta, \qquad z = r\cos\phi$$
with 0 ≤ r < ∞, 0 ≤ θ ≤ 2π, 0 ≤ φ ≤ π, and the element of integration is
dx dy dz = r² sin φ dr dφ dθ (r² sin φ is the determinant of the Jacobian ma-
trix ∂(x, y, z)/∂(r, φ, θ), a computation you should do if you have not seen it
before). The Fourier transform formula is then
$$\hat f(0,0,R) = \int_0^{\infty}\!\int_0^{\pi}\!\int_0^{2\pi} e^{iRr\cos\phi} f(r)\,r^2\sin\phi\,d\theta\,d\phi\,dr.$$
Now the θ-integral produces a factor of 2π since nothing depends on θ. The
φ-integral can also be done explicitly, since
$$\int_0^{\pi} e^{iRr\cos\phi}\sin\phi\,d\phi = -\frac{e^{iRr\cos\phi}}{iRr}\bigg|_0^{\pi} = \frac{2\sin rR}{rR}.$$
Thus we have altogether
$$\hat f(0,0,R) = 4\pi\int_0^{\infty}\frac{\sin rR}{rR}\,f(r)\,r^2\,dr.$$
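The φ-integral identity used above is easy to confirm numerically. The sketch below (my own illustration) evaluates ∫₀^π e^{iRr cos φ} sin φ dφ by the midpoint rule and compares it with 2 sin(rR)/(rR):

```python
import cmath
import math

def phi_integral(Rr, n=20000):
    # ∫₀^π e^{i (Rr) cos φ} sin φ dφ, midpoint rule
    h = math.pi / n
    return sum(cmath.exp(1j * Rr * math.cos((k + 0.5) * h)) * math.sin((k + 0.5) * h)
               for k in range(n)) * h

for Rr in (0.5, 2.0, 7.3):
    exact = 2 * math.sin(Rr) / Rr
    assert abs(phi_integral(Rr) - exact) < 1e-5   # imaginary part cancels
```

The imaginary part of the numerical integral vanishes by symmetry, and the real part matches the closed form.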
Since f is radial, this is the formula for f̂(R) (or, if you prefer, given any ξ
with |ξ| = R, set up a spherical coordinate system with principal axis in the
direction of ξ, and the computation is identical).
Superficially, the formula for n = 3 resembles the formula for n = 1, with
the cosine replaced by the sine. But the appearance of rR in the denominator
$$\int_0^{\infty} |f(r)|\,r\,dr < \infty.$$
Then the Riemann-Lebesgue lemma for the Fourier sine transform of the one-
dimensional function f(r)r implies
$$\lim_{R\to\infty}\int_0^{\infty}\sin rR\,f(r)\,r\,dr = 0.$$
x = r cos θ, y = r sin θ, and
$$\hat f(R, 0) = \int_0^{2\pi}\!\int_0^{\infty} e^{irR\cos\theta} f(r)\,r\,dr\,d\theta.$$
This time there is no friendly sin θ factor, so we cannot do the θ-integration in
terms of elementary functions. It turns out that ∫₀^{2π} e^{is cos θ} dθ is a new kind
of special function, a Bessel function of order zero. As the result of historical
accident, the exact notation is
$$J_0(s) = \frac{1}{2\pi}\int_0^{2\pi} e^{is\cos\theta}\,d\theta,$$
so that
$$\hat f(R) = 2\pi\int_0^{\infty} J_0(rR)\,f(r)\,r\,dr.$$
This would fit the pattern of the other two cases if we could convince ourselves
that J₀ behaves something like a cosine or sine times a power.
Now there are whole books devoted to the properties of J₀ and its cousins,
the other Bessel functions, and precise numerical computations are available.
Here I will only give a few salient facts. First we observe that by substituting
the power series for the exponential we can integrate term by term, using the
elementary fact
$$\frac{1}{2\pi}\int_0^{2\pi}\cos^{2k}\theta\,d\theta = \frac{(2k)!}{2^{2k}(k!)^2}$$
to obtain
$$J_0(s) = \frac{1}{2\pi}\int_0^{2\pi}\sum_{k=0}^{\infty}\frac{(is\cos\theta)^k}{k!}\,d\theta
= \sum_{k=0}^{\infty}\frac{(-1)^k s^{2k}}{(2k)!}\cdot\frac{1}{2\pi}\int_0^{2\pi}\cos^{2k}\theta\,d\theta
= \sum_{k=0}^{\infty}\frac{(-1)^k s^{2k}}{2^{2k}(k!)^2}$$
(the integrals of the odd powers are zero by cancelation). This power series
converges everywhere, showing that Jo is an even entire analytic function. Also
Jo(O) = 1. Notice that these properties are shared by cos s and sin s / s, which
are the analogous functions for n = 1 and n = 3.
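The power series and the integral defining J₀ can be checked against each other numerically. The sketch below (my own illustration) evaluates both; for the integral, the midpoint rule on a periodic integrand is extremely accurate:

```python
import cmath
import math

def j0_integral(s, n=20000):
    # J₀(s) = (1/2π) ∫₀^{2π} e^{i s cos θ} dθ
    h = 2 * math.pi / n
    total = sum(cmath.exp(1j * s * math.cos((k + 0.5) * h)) for k in range(n)) * h
    return (total / (2 * math.pi)).real

def j0_series(s, terms=40):
    # J₀(s) = Σ_k (-1)^k (s/2)^{2k} / (k!)²
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(s / 2) ** 2 / ((k + 1) ** 2)
    return total

for s in (0.0, 1.0, 5.0, 10.0):
    assert abs(j0_integral(s) - j0_series(s)) < 1e-8
```

Both evaluations agree; J₀(0) = 1 and the function oscillates with slowly decaying amplitude, just as the text describes.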
The behavior of J₀(s) as s → ∞ is more difficult to discern. First let's make
a change of variable to reduce the integral defining J₀ to a Fourier transform:
t = cos θ. We obtain
$$J_0(s) = \frac{1}{\pi}\int_{-1}^{1} e^{ist}(1-t^2)^{-1/2}\,dt.$$
Near the endpoint t = 1 we have (1 − t²)^{−1/2} ≈ (2(1 − t))^{−1/2}, and the Fourier
transform of (1 − t)_+^{−1/2} can be computed exactly; together with the symmetric
contribution from t = −1 this gives
$$J_0(s) \approx \sqrt{\frac{2}{\pi s}}\,\cos\Big(s - \frac{\pi}{4}\Big).$$
This is exactly what we were looking for: the product of a power and a sine
function. Of course our approximate calculation only suggests this answer, but
a more careful analysis of the error shows that this is correct; in fact the error
is of order s-3/2. In fact, this is just the first term of an asymptotic expansion
(as s -+ 00) involving powers of s and translated sines.
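The quality of the one-term asymptotic approximation, and the claimed O(s^{−3/2}) error, can be tested directly. The sketch below (my own illustration) compares the integral form of J₀ with √(2/πs) cos(s − π/4) at several large values of s:

```python
import cmath
import math

def j0(s, n=20000):
    # J₀(s) = (1/2π) ∫₀^{2π} e^{i s cos θ} dθ, midpoint rule
    h = 2 * math.pi / n
    return (sum(cmath.exp(1j * s * math.cos((k + 0.5) * h)) for k in range(n)) * h
            / (2 * math.pi)).real

for s in (20.0, 50.0, 100.0):
    asym = math.sqrt(2 / (math.pi * s)) * math.cos(s - math.pi / 4)
    # the error of the one-term expansion should be of order s^{-3/2}
    assert abs(j0(s) - asym) < s ** -1.5
```

The observed error tracks s^{−3/2} closely, consistent with the next term of the asymptotic expansion being smaller by a factor of order 1/s.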
To unify the three special cases n = 1, 2, 3 we have considered, we need
to give the definition of the Bessel function J_α of arbitrary (at least α > −1/2)
order:
$$J_\alpha(s) = \gamma_\alpha\,s^\alpha\int_{-1}^{1}(1-t^2)^{\alpha-1/2}e^{ist}\,dt$$
(the constant γ_α is given by γ_α = 1/(√π 2^α Γ(α + ½))). Note that when α = ½
it is easy to evaluate
$$J_{1/2}(s) = \sqrt{\frac{2}{\pi}}\,s^{-1/2}\sin s.$$
Using integration by parts we can prove the following recursion relations for
Bessel functions of different orders:
$$\frac{d}{ds}\big(s^\alpha J_\alpha(s)\big) = s^\alpha J_{\alpha-1}(s),
\qquad \frac{d}{ds}\big(s^{-\alpha}J_\alpha(s)\big) = -s^{-\alpha}J_{\alpha+1}(s),$$
and more generally, if α = k + ½, k an integer, then J_{k+½} is expressible as a
finite sum of powers and sines and cosines. The function s^{−α}J_α(s) is clearly
an entire analytic function. Also, the asymptotic behavior of J_α(s) as s → +∞
is given by
$$J_\alpha(s) \approx \sqrt{\frac{2}{\pi s}}\,\cos\Big(s - \frac{\alpha\pi}{2} - \frac{\pi}{4}\Big).$$
Now each of the formulas for the radial Fourier transform for n = 1, 2, 3 can
be written
$$\hat f(R) = c_n\int_0^{\infty}\frac{J_{(n-2)/2}(rR)}{(rR)^{(n-2)/2}}\,f(r)\,r^{n-1}\,dr,$$
and in fact the same is true for general n. The general form of the decay rate
for radial Fourier transforms is the following: if f is a radial function that is
integrable, and bounded near the origin, then f̂(R) decays faster than R^{−(n−1)/2}.
Many radial Fourier transforms can be computed explicitly, using properties
of Bessel functions. For example, the Fourier transform of (1 − |x|²)₊^α in ℝⁿ
is equal to
$$c(n,\alpha)\,\frac{J_{n/2+\alpha}(R)}{R^{n/2+\alpha}}.$$
$$\psi(x) = \begin{cases} 1 & \text{if } 0 < x \le \tfrac12 \\ -1 & \text{if } \tfrac12 < x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
FIGURE 7.2
The support of ψ is [0, 1]. We dilate ψ by powers of 2, so ψ(2ʲx) is supported
on [0, 2^{−j}] (j is an integer, not necessarily positive), and we translate the dilate
by 2^{−j} times an integer, to obtain
$$\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$$
(the factor 2^{j/2} normalizes the L² norm).
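The orthonormality of the Haar family is easy to confirm numerically. The sketch below (my own illustration) samples ψ_{j,k}(x) = 2^{j/2}ψ(2ʲx − k) on a dyadic grid fine enough that the midpoint rule integrates these piecewise-constant functions exactly:

```python
def haar(x):
    # the Haar function: 1 on (0, 1/2], -1 on (1/2, 1], 0 otherwise
    if 0 < x <= 0.5:
        return 1.0
    if 0.5 < x <= 1:
        return -1.0
    return 0.0

def psi(j, k, x):
    # ψ_{j,k}(x) = 2^{j/2} ψ(2^j x - k); the factor keeps the L² norm equal to 1
    return 2.0 ** (j / 2) * haar(2.0 ** j * x - k)

def inner(j1, k1, j2, k2, lo=-4.0, hi=4.0, n=1 << 12):
    # exact for piecewise-constant functions on this dyadic grid
    h = (hi - lo) / n
    return sum(psi(j1, k1, lo + (i + 0.5) * h) * psi(j2, k2, lo + (i + 0.5) * h)
               for i in range(n)) * h

assert abs(inner(0, 0, 0, 0) - 1.0) < 1e-9   # unit norm
assert abs(inner(2, 1, 2, 1) - 1.0) < 1e-9   # unit norm after dilation/translation
assert abs(inner(0, 0, 1, 0)) < 1e-9          # different scales: orthogonal
assert abs(inner(0, 0, 0, 1)) < 1e-9          # disjoint supports: orthogonal
```

Different translates at the same scale have disjoint supports, while across scales the oscillation of the finer function integrates to zero against the constant pieces of the coarser one.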
$$f = \sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} (f, \psi_{j,k})\psi_{j,k}.$$
We refer to this as the Haar series expansion of f. The system is called complete
if the expansion is valid. Of course this is something of an oversimplification
for two reasons: we must specify in what sense the series converges, and for
which functions f the expansion holds.
Since both these problems already occur with ordinary Fourier series, we should
not be surprised to meet them again here.
We claim in the case of Haar functions that we do have completeness. Before
demonstrating this, we should point out one paradoxical consequence of this
claim. Each of the functions ψ_{j,k} has total integral zero, ∫_{−∞}^{∞} ψ_{j,k}(x) dx = 0,
so any linear combination will also have zero integral. If we start out with a
function f whose total integral is nonzero, how can we expect to write it as a
series in functions with zero integral?
The resolution of the paradox rests on the fact that an infinite series Σ_{j,k} c_{jk}ψ_{j,k}(x)
is not quite the same as a finite linear combination. The argument that
$$\int\sum_{j,k} c_{jk}\psi_{j,k}(x)\,dx = \sum_{j,k} c_{jk}\int\psi_{j,k}(x)\,dx = \sum_{j,k} c_{jk}\cdot 0 = 0$$
Haar functions and wavelets 143
requires the interchange of an integral and an infinite sum, and such an inter-
change is not always valid. In particular, the convergence of the Haar series
expansion does not allow term-by-term integration.
We can explain the existence of Haar series expansions if we accept the
following general criterion for completeness of an orthonormal system: the
system is complete if and only if any function whose expansion is identically
zero (i.e., (f, ψ_{j,k}) = 0 for all j and k) must be the zero function (in the sense
discussed in section 2.1). We can verify this criterion rather easily in our case,
at least under the assumption that f is integrable (this is not quite the correct
assumption, which should be that |f|² is integrable).
So suppose f is integrable and (f, ψ_{j,k}) = 0 for all j and k. First we claim
∫₀^{1/2} f(x) dx = 0. Why? Since (f, ψ_{0,0}) = 0 we know
$$\int_0^{1/2} f(x)\,dx = \int_{1/2}^{1} f(x)\,dx,$$
so ∫₀^{1/2} f(x) dx = ½∫₀¹ f(x) dx; similarly (f, ψ_{−1,0}) = 0 gives
$$\int_0^{1} f(x)\,dx = \int_1^{2} f(x)\,dx,$$
so
$$\sum_{k=-\infty}^{\infty} (f, \varphi_{0,k})\varphi_{0,k} \qquad\text{and}\qquad \sum_{k=-\infty}^{\infty} (f, \psi_{j,k})\psi_{j,k}$$
for j = 0,1,2, ... adds finer and finer detail on the scale of 2- j . One of the
advantages of this expansion is its excellent space localization. Each of the Haar
functions is supported on a small interval; the larger j, the smaller the interval.
If the function f being expanded vanishes on an interval I, then the coefficients
of f will be zero for all ψ_{j,k} whose support lies in I.
A grave disadvantage of the Haar series expansions is that the Haar functions
are discontinuous. Thus, no matter how smooth the function f may be, the
approximations obtained by taking a finite number of terms of the expansion
will be discontinuous. This defect is remedied in the smooth wavelet expansions
we will mention shortly.
Another closely related defect in the Haar series expansion is the lack of
localization in frequency. We can compute directly the Fourier transform of φ:
$$\hat\varphi(\xi) = \int_0^1 e^{ix\xi}\,dx = \frac{e^{i\xi} - 1}{i\xi}.$$
Aside from the oscillatory factors, this decays only like |ξ|^{−1}, which means it is
not integrable (of course we should have known this in advance: a discontinuous
function cannot have an integrable Fourier transform). The behavior of ψ̂ is
similar. In fact, since
we have
$$\varphi(x) = \sum_{k=0}^{N} a_k\varphi(2x - k).$$
The coefficients ak must be chosen with extreme care, and we will not be able
to say more about them here. The number of terms, N + 1, has to be taken fairly
large to get smoother wavelets (approximately 5 times the number of derivatives
desired).
The scaling identity determines φ, up to a constant multiple. If we restrict φ
to the integers, then the scaling identity becomes an eigenvalue equation for a
finite matrix, which can be solved by linear algebra. Once we know φ on the
integers, we can obtain the values of φ on the half-integers from the scaling
identity (if x = m/2 on the left side, then the values 2x − k = m − k on the
FIGURE 7.3
right are all integers). Proceeding inductively, we can obtain the values of φ at
dyadic rationals m2^{−j}, and by continuity this determines φ(x) for all x. Then
the wavelet ψ is determined from φ by the identity
$$\psi(x) = \sum_{k=0}^{N}(-1)^k a_{N-k}\varphi(2x - k).$$
$$\hat\varphi(\xi) = p(\xi/2)\hat\varphi(\xi/2), \qquad\text{where } p(\xi) = \frac12\sum_{k=0}^{N} a_k e^{ik\xi},$$
FIGURE 7.4
and iterating this identity yields the infinite product
$$\hat\varphi(\xi) = \prod_{j=1}^{\infty} p(\xi/2^j).$$
only 10 or 20 terms are needed to compute φ̂(ξ) for small values of ξ. Then
ψ̂ is expressible in terms of φ̂ as
$$\hat\psi(\xi) = q(\xi/2)\hat\varphi(\xi/2),$$
where
$$q(\xi) = \frac12\sum_{k=0}^{N}(-1)^k a_{N-k}e^{ik\xi}.$$
It is interesting to compare the infinite product form of φ̂ and the direct com-
putation of φ̂ in the case of the Haar functions. Here p(ξ) = ½(1 + e^{iξ}) =
e^{iξ/2} cos ξ/2, so
$$\hat\varphi(\xi) = \prod_{j=1}^{\infty} e^{i\xi/2^{j+1}}\cos(\xi/2^{j+1}) = e^{i\xi/2}\prod_{j=2}^{\infty}\cos(\xi/2^j),$$
and comparing with the direct computation φ̂(ξ) = e^{iξ/2} sin(ξ/2)/(ξ/2) we obtain
$$\prod_{j=2}^{\infty}\cos(\xi/2^j) = \frac{\sin(\xi/2)}{\xi/2},$$
an identity known to Euler, and in special cases, François Viète in the 1590s.
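The Euler-Viète product converges very quickly and is easy to confirm numerically. The sketch below (my own illustration) truncates the product and compares with sin(ξ/2)/(ξ/2):

```python
import math

def truncated_product(xi, J=60):
    # Π_{j=2}^{J} cos(ξ / 2^j); the omitted factors differ from 1 by ~ (ξ/2^J)²
    prod = 1.0
    for j in range(2, J + 1):
        prod *= math.cos(xi / 2 ** j)
    return prod

for xi in (0.7, 2.0, 5.0):
    assert abs(truncated_product(xi) - math.sin(xi / 2) / (xi / 2)) < 1e-12
```

Each halving of the argument roughly quadruples the accuracy of the truncated product, which is why only a handful of factors of p(ξ/2ʲ) are needed in practice to evaluate φ̂.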
7.9 Problems
1. Show that
$$\lim_{t\to 0}\int_a^b f(x)\sin(tg(x))\,dx = 0.$$
for all f with ‖f‖_p < ∞ cannot hold unless 1/p + 1/q = 1, by considering
the dilates of a fixed function.
4. Let f(x) ∼ Σ c_n e^{inx} be the Fourier series of a periodic function with
∫₀^{2π} |f(x)| dx < ∞. Show that
$$\lim_{n\to\pm\infty} c_n = 0.$$
5. Suppose |f(x)| ≤ ce^{−a|x|}. Prove that f̂ extends to an analytic function
on the region |Im ζ| < a.
6. Show that the distributions with Fourier transforms
$$\cos\big(t\sqrt{|\xi|^2 + m^2}\big) \quad\text{and}\quad \frac{\sin\big(t\sqrt{|\xi|^2 + m^2}\big)}{\sqrt{|\xi|^2 + m^2}}$$
are supported in |x| ≤ |t|. Use this to prove a finite speed of propagation
for solutions of the Klein-Gordon equation
$$\frac{\partial^2 u}{\partial t^2} = \Delta_x u - m^2 u.$$
17. Compute
$$\sum_{k=-\infty}^{\infty}\Big(\frac{\sin tk}{tk}\Big)^2$$
using the Poisson summation formula.
18. Is a translate of a positive definite function necessarily positive definite?
What about a dilate?
19. Show that e- 1xl is positive definite. (Hint: What is its Fourier transform?)
20. Show that the product of continuous positive definite functions is positive
definite.
21. Show that if f and g are continuous positive definite functions on ℝ¹, then
f(x)g(y) is positive definite on ℝ².
22. Show that f̃ ∗ f is always positive definite, where f̃(x) is the complex
conjugate of f(−x) and f ∈ S.
23. Let μ be a probability measure on [0, 2π). Characterize the coefficients
of the Fourier series of μ in terms of positive definiteness.
24. Compute Var(ψ) when ψ is a Gaussian. Use this to check that the uncer-
tainty principle inequality is an equality in this case.
25. Show that if a distribution and its Fourier transform both have compact
support, then it is the zero distribution.
26. If ψ has compact support, obtain an upper bound for Var(ψ).
27. Let ψ_t(x) = t^{−n/2}ψ(x/t), for a fixed function ψ. How does Var(ψ_t)
depend on t? How does Var(ψ_t)Var((2π)^{−n/2}ψ̂_t) depend on t?
28. Show that the Hermite polynomials satisfy
$$\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} = e^{2xt - t^2}.$$
31. Show that H_n′(x) = 2nH_{n−1}(x).
32. Prove the recursion relation
33. Compute the Fourier transform of the characteristic function of the ball
|x| ≤ b in ℝ³.
and
$$\frac{d}{ds}\big(s^{-\alpha}J_\alpha(s)\big) = -s^{-\alpha}J_{\alpha+1}(s).$$
$$\int_{-\infty}^{\infty} f(x-k)f(x-m)\,dx = \begin{cases} 1 & \text{if } k = m \\ 0 & \text{if } k \ne m \end{cases}$$
if and only if f̂ satisfies
$$\sum_{k=-\infty}^{\infty} |\hat f(\xi + 2\pi k)|^2 \equiv 1.$$
38. Show that the three families of functions φ_{j,k}(x)ψ_{j,k′}(y), ψ_{j,k}(x)φ_{j,k′}(y),
and ψ_{j,k}(x)ψ_{j,k′}(y), for j, k, and k′ varying over the integers, form a
complete orthonormal system in ℝ², where ψ is the Haar function and
φ the associated scaling function (the same for any wavelet and scaling
function).
39. Show that a wavelet satisfies the vanishing moment conditions
$$\int_{-\infty}^{\infty} x^m\psi(x)\,dx = 0, \qquad m = 0, 1, \ldots, M,$$
provided
$$q(\xi) = \frac12\sum_{k=0}^{N}(-1)^k a_{N-k}e^{ik\xi}$$
has a zero of order M + 1 at ξ = 0.
40. Let V_j denote the linear span of the functions φ_{j,k}(x) as k varies over the
integers, where φ satisfies a scaling identity
$$\varphi(x) = \sum_{k=0}^{N} a_k\varphi(2x - k).$$
Show that V_j ⊆ V_{j+1}. Also show that f(x) ∈ V_j if and only if f(2x) ∈
V_{j+1}.
41. Let φ and ψ be defined by φ̂(ξ) = χ_{[−π,π]}(ξ) and
154 Sobolev Theory and Microlocal Analysis
∫ |f(x)|^p dx. But because the situation is reversed when |f(x)| < 1, there is no
necessary relationship between the values of ∫ |f(x)|^p dx for different values
of p.
The standard terminology is to call (∫ |f(x)|^p dx)^{1/p} the L^p norm of f,
written ‖f‖_p. The conditions implied by the use of the word "norm" are as
follows:
1. ‖f‖_p ≥ 0 with equality only for the zero function (positivity)
2. ‖cf‖_p = |c| ‖f‖_p (homogeneity)
3. ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p (triangle inequality).
The first two conditions are evident (with f = 0 taken in the appropriate sense)
but the proof of the triangle inequality is decidedly tricky (except when p = 1)
and we will not discuss it. The case p = ∞, called the L^∞ norm, is defined
separately by
$$\|f\|_\infty = \operatorname{ess\,sup}|f(x)|,$$
the smallest constant M such that |f(x)| ≤ M except on a set of measure zero.
(Exercise: Verify conditions 1, 2, 3 for this norm.) The justification for this
definition is the fact that
$$\lim_{p\to\infty}\|f\|_p = \|f\|_\infty.$$
to get honest derivatives (n is the dimension of the space). We can state the
result precisely as follows:
Theorem 8.1.1
(L¹ Sobolev inequality): Let f be an integrable function on ℝⁿ. Suppose
the distributional derivatives (∂/∂x)^α f are also integrable functions for all
|α| ≤ n. Then f is continuous and bounded, and
$$\|f\|_\infty \le c\sum_{|\alpha|\le n}\|(\partial/\partial x)^\alpha f\|_1.$$
$$\|f\|_\infty \le \|f'\|_1.$$
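The one-dimensional inequality ‖f‖_∞ ≤ ‖f′‖₁ can be illustrated numerically. The sketch below (my own illustration) uses f(x) = e^{−x²}, which is not compactly supported but vanishes at infinity, which is all the fundamental-theorem argument needs; here ‖f‖_∞ = 1 and ‖f′‖₁ = 2:

```python
import math

def l1_norm(g, a=-20.0, b=20.0, n=40000):
    # midpoint-rule approximation of ∫ |g(x)| dx
    h = (b - a) / n
    return sum(abs(g(a + (k + 0.5) * h)) for k in range(n)) * h

f = lambda x: math.exp(-x * x)
fprime = lambda x: -2 * x * math.exp(-x * x)

sup_f = max(abs(f(-10 + k * 0.001)) for k in range(20001))  # grid supremum
assert sup_f <= l1_norm(fprime) + 1e-9   # ‖f‖_∞ ≤ ‖f'‖₁
```

The slack factor of 2 here reflects the sharper form f(x) = ∫_{−∞}^x f′, which only needs half of ‖f′‖₁ when f vanishes at both ends.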
This result looks somewhat better than the L¹ Sobolev inequality, but it was
purchased by two strong hypotheses: differentiability and compact support. It
is clearly nonsense without compact support (or at least vanishing at infinity).
The constant function f ≡ 1 has ‖f‖_∞ = 1 and ‖f′‖₁ = 0. To remedy this,
we derive a consequence of our inequality by applying it to ψ · f where ψ is
a cut-off function. Since ψ has compact support, we can drop that assumption
about f. As long as f is C^∞, ψ · f ∈ D and so
$$\|\psi f\|_\infty \le \|(\psi f)'\|_1.$$
Now
$$(\psi f)' = \psi' f + \psi f'$$
and
$$\frac{\partial f}{\partial x}(x,y) = \int_{-\infty}^{y} \frac{\partial^2 f}{\partial x\,\partial y}(x,t)\,dt$$
for any fixed $x$ and $y$. We then integrate in the $x$-variable to obtain
$$f(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} \frac{\partial^2 f}{\partial x\,\partial y}(s,t)\,dt\,ds.$$
We can then estimate
$$|f(x,y)| \le \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| \frac{\partial^2 f}{\partial x\,\partial y}(s,t) \right|\,dt\,ds.$$
This argument required compact support for f, and we remove this hypothesis
This is our $L^1$ Sobolev inequality for $n = 2$. Notice that it is slightly better than advertised, because it only involves the mixed second derivative $\partial^2 f/\partial x\,\partial y$, and neither of the pure second derivatives $\partial^2 f/\partial x^2$, $\partial^2 f/\partial y^2$. The same argument in $n$ dimensions yields the $L^1$ Sobolev inequality
$$\|f\|_\infty \le c \sum_{\alpha \in A} \|(\partial/\partial x)^\alpha f\|_1,$$
where $A$ is the set of all multi-indexes $\alpha = (\alpha_1, \ldots, \alpha_n)$ where each $\alpha_j$ assumes the value 0 or 1.
The proof of the rest of the theorem in $n$ dimensions is the same as before, except that we have to trade in $n$ derivatives because of the $n$th-order derivative on the right in the $L^1$ Sobolev inequality.
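The $n = 2$ estimate above is easy to test numerically; the sketch below (with a hypothetical Gaussian, not a function from the text) compares $\sup|f|$ with $\iint |\partial^2 f/\partial x\,\partial y|$, which are 1 and 4 analytically.

```python
import numpy as np

# Hypothetical test function f(x,y) = exp(-x^2 - y^2); its mixed derivative is
# d^2 f / dx dy = 4xy exp(-x^2 - y^2).  Analytically sup|f| = 1 and the L^1
# norm of the mixed derivative is 4, so the inequality holds with room to spare.
x = np.linspace(-6.0, 6.0, 1201)
dx = x[1] - x[0]
X, Y = np.meshgrid(x, x)
f = np.exp(-X ** 2 - Y ** 2)
fxy = 4.0 * X * Y * np.exp(-X ** 2 - Y ** 2)

sup_f = f.max()
rhs = np.sum(np.abs(fxy)) * dx * dx
print(sup_f, rhs)
```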
The $L^1$ Sobolev inequality is sharp. It is easy to give examples of functions with fewer derivatives in $L^1$ which are unbounded. Nevertheless, if we have fewer than $n$ derivatives to trade in, we can obtain the same conclusion if these derivatives have finite $L^p$ norm with larger values of $p$. In fact the rule is that we need more than $n/p$ derivatives. We illustrate this with the case $p = 2$.
Theorem 8.1.2
($L^2$ Sobolev inequality): Suppose
$$\|(\partial/\partial x)^\alpha f\|_2$$
is finite for all $\alpha$ satisfying $|\alpha| \le N$ for $N$ equal to the smallest integer greater than $n/2$ (so $N = \frac{n+1}{2}$ if $n$ is odd and $N = \frac{n+2}{2}$ if $n$ is even). Then $f$ is continuous and bounded, with
$$\|f\|_\infty \le c \sum_{|\alpha| \le N} \|(\partial/\partial x)^\alpha f\|_2.$$
More generally, if
We can give a rather quick proof using the Fourier transform. We will show in fact that $\hat f$ is integrable, which implies that $f$ is continuous and bounded by the Fourier inversion formula. Since
$$\int |(\partial/\partial x)^\alpha f(x)|^2\,dx = \frac{1}{(2\pi)^n} \int |\xi^\alpha|^2 |\hat f(\xi)|^2\,d\xi,$$
the hypotheses of the theorem imply $\int |\xi^\alpha|^2 |\hat f(\xi)|^2\,d\xi$ is finite for all $|\alpha| \le N$. Just taking the cases $\xi^\alpha = 1$ and $\xi^\alpha = \xi_j^N$ and summing we obtain
$$\int \Bigl( 1 + \sum_{j=1}^{n} |\xi_j|^{2N} \Bigr) |\hat f(\xi)|^2\,d\xi < \infty.$$
We obtain
$$\int |\hat f(\xi)|\,d\xi \le \Bigl( \int \bigl( 1 + \sum_{j=1}^{n} |\xi_j|^{2N} \bigr) |\hat f(\xi)|^2\,d\xi \Bigr)^{1/2} \Bigl( \int \bigl( 1 + \sum_{j=1}^{n} |\xi_j|^{2N} \bigr)^{-1}\,d\xi \Bigr)^{1/2}$$
by the Cauchy–Schwarz inequality. We have already seen that the first integral on the right is finite. We claim that the second integral on the right is also finite, because $2N > n$. The idea is that $\sum_{j=1}^{n} |\xi_j|^{2N} \ge c|\xi|^{2N}$, so
on all slightly smaller sets $V$, for $|\alpha| \le N + k$. The reason is that for each $x \in U$ we may apply the $L^2$ Sobolev inequality to $\psi f$ where $\psi \in \mathcal{D}$ is supported in $U$ and $\psi \equiv 1$ on a neighborhood of $x$. Similarly, if
$$\int_{|x| \le R} |(\partial/\partial x)^\alpha f(x)|^2\,dx$$
is finite for all $R < \infty$ and all $|\alpha| \le N + k$, then $f$ is $C^k$ on $\mathbb{R}^n$. Of course the same remark applies to the $L^1$ Sobolev inequality.
The statement of the $L^p$ Sobolev inequality for $p > 1$ is similar to the case $p = 2$, but the proof is more complicated and we will omit it.
Theorem 8.1.3
($L^p$ Sobolev inequality): Let $N_p$ be the smallest integer greater than $n/p$, for fixed $p > 1$. If $\|(\partial/\partial x)^\alpha f\|_p$ is finite for all $|\alpha| \le N_p$, then $f$ is bounded and continuous and
$$\|f\|_\infty \le c \sum_{|\alpha| \le N_p} \|(\partial/\partial x)^\alpha f\|_p.$$
The $L^p$ Sobolev inequalities are sharp in the sense that we cannot eliminate the requirement $N_p > n/p$, and we should emphasize that the inequality must be strict. But there is another sense in which they are flabby: if we are required to trade in strictly more than $n/p$ derivatives, what have we gotten in return for the excess $N_p - n/p$? It turns out that we do get something, and we can make a precise statement as long as $\beta = N_p - n/p < 1$. We get Hölder continuity of order $\beta$; in the case $p = 2$ with $n$ odd (so $N = \frac{n+1}{2}$ and $\beta = \frac{1}{2}$):
$$\frac{|f(x) - f(y)|}{|x - y|^{1/2}} \le c \sum_{|\alpha| \le N} \left\| \left( \frac{\partial}{\partial x} \right)^\alpha f \right\|_2.$$
and estimate
$$|f(x) - f(y)| \le \frac{1}{(2\pi)^n} \Bigl( \int |\hat f(\xi)|^2 |\xi|^{n+1}\,d\xi \Bigr)^{1/2} \Bigl( \int |e^{-ix\cdot\xi} - e^{-iy\cdot\xi}|^2\,|\xi|^{-n-1}\,d\xi \Bigr)^{1/2}$$
using the Cauchy–Schwarz inequality. Since the first integral is finite (dominated by
$$c \sum_{|\alpha| \le N} \|(\partial/\partial x)^\alpha f\|_2^2$$
as before), it suffices to show that the second integral is less than a multiple of $|x - y|$. To do this we use
$$|e^{-ix\cdot\xi} - e^{-iy\cdot\xi}| \le 2$$
(this is actually a terrible estimate for small values of $|\xi|$, but it turns out not to matter). Then to estimate
we break the integral into two pieces at $|\xi| = |x - y|^{-1}$. If $|\xi| \ge |x - y|^{-1}$ we dominate $|e^{-ix\cdot\xi} - e^{-iy\cdot\xi}|^2$ by 4, and obtain
$$\int_{|\xi| \ge |x-y|^{-1}} |e^{-ix\cdot\xi} - e^{-iy\cdot\xi}|^2\,|\xi|^{-n-1}\,d\xi \le 4 \int_{|\xi| \ge |x-y|^{-1}} |\xi|^{-n-1}\,d\xi = c|x - y|$$
after a change of variable $\xi \to |x - y|^{-1}\xi$ (the integral converges because $-n - 1 < -n$). On the other hand, if $|\xi| \le |x - y|^{-1}$ we use the mean-value theorem to estimate
$$|e^{-ix\cdot\xi} - e^{-iy\cdot\xi}| \le |x - y|\,|\xi|,$$
hence
$$\int_{|\xi| \le |x-y|^{-1}} |e^{-ix\cdot\xi} - e^{-iy\cdot\xi}|^2\,|\xi|^{-n-1}\,d\xi \le |x - y|^2 \int_{|\xi| \le |x-y|^{-1}} |\xi|^{1-n}\,d\xi = c|x - y|.$$
(homology, Hardy space, hyperbolic space, for example, all with a good excuse for the H). The L in our notation stands for Lebesgue. Other common notation for Sobolev spaces includes $W_{p,k}$, and the placement of the two indices $p$ and $k$ is subject to numerous changes.
Definition 8.2.1
The Sobolev space $L^p_k(\mathbb{R}^n)$ is defined to be the space of functions on $\mathbb{R}^n$ such that
$$\|(\partial/\partial x)^\alpha f\|_p$$
is finite for all $|\alpha| \le k$. Here $k$ is a nonnegative integer and $1 \le p < \infty$. The Sobolev space norm is defined by
$$\|f\|_{L^p_k} = \sum_{|\alpha| \le k} \|(\partial/\partial x)^\alpha f\|_p.$$
Theorem 8.2.2
$L^p_k(\mathbb{R}^n) \subseteq L^q_m(\mathbb{R}^n)$ provided $1 \le p < q < \infty$ and $k - n/p \ge m - n/q$. We have the corresponding inequality $\|f\|_{L^q_m} \le c\|f\|_{L^p_k}$.

The theorem is most interesting in the case when we have equality $k - n/p = m - n/q$, in which case it is sharp.
Sobolev spaces 165
$$\int_{\mathbb{R}^n} \int_{\mathbb{R}^n} \frac{|f(x + y) - f(x)|^2}{|y|^{n+2s}}\,dx\,dy$$
is finite. The idea is that this integral is equal to a multiple of
$$\int |\hat f(\xi)|^2 |\xi|^{2s}\,d\xi,$$
and it is not hard to see that $\int |\hat f(\xi)|^2 (1 + |\xi|^2)^s\,d\xi$ is finite if and only if $\int |\hat f(\xi)|^2\,d\xi$ and $\int |\hat f(\xi)|^2 |\xi|^{2s}\,d\xi$ are finite.
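The "not hard to see" claim can be spelled out with an elementary two-sided bound (a sketch of my own, not from the text): for $s > 0$,

```latex
\max\bigl(1, |\xi|^{2s}\bigr)
  \;\le\; (1 + |\xi|^2)^s
  \;\le\; 2^s \max\bigl(1, |\xi|^{2s}\bigr)
  \;\le\; 2^s \bigl( 1 + |\xi|^{2s} \bigr),
```

and also $\max(1, |\xi|^{2s}) \ge \frac{1}{2}(1 + |\xi|^{2s})$. Multiplying through by $|\hat f(\xi)|^2$ and integrating, the middle quantity is finite exactly when both $\int |\hat f(\xi)|^2\,d\xi$ and $\int |\hat f(\xi)|^2 |\xi|^{2s}\,d\xi$ are.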
We begin with the observation
$$\int \frac{|e^{-iy\cdot\xi} - 1|^2}{|y|^{n+2s}}\,dy = c|\xi|^{2s}.$$
The homogeneity follows by the change of variable $y \to |\xi|^{-1}y$, and the fact that the integral is radial (rotation invariant) follows from the observation that the same rotation applied to $y$ and $\xi$ leaves the dot product $y \cdot \xi$ unchanged.
For $s > 1$ we write $s = k + s'$ where $k$ is an integer and $0 < s' < 1$. Then functions in $L^2_s$ are functions in $L^2_k$ whose derivatives of order $k$ are in $L^2_{s'}$, and then we can use the previous characterization.
An interpretation of $L^2_s$ for $s$ negative can be given in terms of duality, in the same sense that $\mathcal{D}'$ is the dual of $\mathcal{D}$. If say $s > 0$, then $L^2_{-s}$ is exactly the space of continuous linear functionals on $L^2_s$. If $f \in L^2_{-s}$ and $\varphi \in L^2_s$, the value of the linear functional $\langle f, \varphi \rangle$ can be expressed on the Fourier transform side by
Then we have
$$\Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}$$
$$P = \sum_{|\alpha| \le m} a_\alpha \left( \frac{\partial}{\partial x} \right)^\alpha$$
where the coefficients $a_\alpha$ are constant (we allow them to be complex). The number $m$, the highest order of the derivatives involved, is called the order of the operator. As we have seen, the Fourier transform of $Pu$ is obtained from $\hat u$ by multiplication by a polynomial,
$$\widehat{Pu}(\xi) = p(\xi)\hat u(\xi),$$
where
$$p(\xi) = \sum_{|\alpha| \le m} a_\alpha (-i\xi)^\alpha.$$
This polynomial is called the full symbol of $P$, while
$$p_m(\xi) = \sum_{|\alpha| = m} a_\alpha (-i\xi)^\alpha$$
is called the top-order symbol. (To make matters confusing, the term symbol is sometimes used for one or the other of these; I will resist this temptation.) For example, the Laplacian has $-|\xi|^2$ as both full and top-order symbol. The operator $I - \Delta$ has full symbol $1 + |\xi|^2$ and top-order symbol $|\xi|^2$. Notice that the top-order symbol is always homogeneous of degree $m$.
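The symbol computation is mechanical enough to automate; this sketch (using SymPy, my own illustration rather than anything in the text) recovers the full and top-order symbols of $I - \Delta$ in $\mathbb{R}^2$ by substituting $-i\xi_j$ for $\partial/\partial x_j$.

```python
import sympy as sp

# Full symbol p(xi) = sum_alpha a_alpha (-i xi)^alpha for I - Delta in R^2:
# multi-index coefficients a_(0,0) = 1 and a_(2,0) = a_(0,2) = -1.
xi1, xi2 = sp.symbols('xi1 xi2', real=True)
coeffs = {(0, 0): 1, (2, 0): -1, (0, 2): -1}

full = sp.expand(sum(a * (-sp.I * xi1) ** a1 * (-sp.I * xi2) ** a2
                     for (a1, a2), a in coeffs.items()))
top = sp.expand(sum(a * (-sp.I * xi1) ** a1 * (-sp.I * xi2) ** a2
                    for (a1, a2), a in coeffs.items() if a1 + a2 == 2))
print(full)   # the full symbol 1 + |xi|^2
print(top)    # the top-order symbol |xi|^2
```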
We have already observed that the nonvanishing of the full symbol is extremely useful, for then we can solve the equation $Pu = f$ by taking Fourier transforms and dividing,
$$\hat u(\xi) = \frac{1}{p(\xi)}\hat f(\xi).$$
Definition 8.3.1
An operator of order $m$ is called elliptic if the top-order symbol $p_m(\xi)$ has no real zeroes except $\xi = 0$. Equivalently, if the full symbol satisfies $|p(\xi)| \ge c|\xi|^m$ for $|\xi| \ge A$, for some positive constants $c$ and $A$.

To see the equivalence, let $c_1$ denote the minimum of $|p_m(\xi)|$ on the sphere $|\xi| = 1$ (since the sphere is compact, the minimum value is assumed, so $c_1 > 0$). Since $p(\xi) = p_m(\xi) + q(\xi)$ where $q$ is a polynomial of degree $\le m - 1$, we have $|q(\xi)| \le c_2|\xi|^{m-1}$ for $|\xi| \ge 1$, so $|p(\xi)| \ge |p_m(\xi)| - |q(\xi)| \ge \frac{1}{2}c_1|\xi|^m$ if $|\xi| \ge A$ for $A = 2c_2/c_1$. Conversely, if $|p(\xi)| \ge c|\xi|^m$ for $|\xi| \ge A$ then a similar argument shows $|p_m(\xi)| \ge \frac{1}{2}c|\xi|^m$ for $|\xi| \ge A'$, and by homogeneity $p_m(\xi) \ne 0$ for $\xi \ne 0$.
From the first definition, it is clear that ellipticity depends only on the highest order derivatives; for an $m$th order operator, the terms of order $\le m - 1$ can be modified at will. We will now give examples of elliptic operators in terms of the highest order terms. For $m = 2$, $P = \sum_{|\alpha|=2} a_\alpha (\partial/\partial x)^\alpha$ with $a_\alpha$ real will be elliptic if the quadratic form $\sum_{|\alpha|=2} a_\alpha \xi^\alpha$ is positive (or negative) definite. For $n = 2$, this means the level sets $\sum_{|\alpha|=2} a_\alpha \xi^\alpha = \text{constant}$ are ellipses, hence the etymology of the term "elliptic." For $m = 1$, the Cauchy–Riemann operator
$$\frac{\partial}{\partial \bar z} = \frac{1}{2}\left( \frac{\partial}{\partial x} + i\frac{\partial}{\partial y} \right)$$
in $\mathbb{R}^2 = \mathbb{C}$ is elliptic, because the symbol is $-\frac{i}{2}(\xi + i\eta)$. It is not hard to see that for $m = 1$ there are no examples with real coefficients or for $n \ge 3$. Thus both harmonic and holomorphic functions are solutions to homogeneous elliptic equations $Pu = 0$. For our last example, we claim that when $n = 1$ every operator is elliptic, because $p_m(\xi) = a_m(-i\xi)^m$ vanishes only at $\xi = 0$.
For elliptic operators, we would like to implement the division-on-the-Fourier-transform-side algorithm for solving $Pu = f$. We have no problem with division by $p(\xi)$ for large $\xi$, but we still have possibly small problems if $p(\xi)$ has zeroes for small $\xi$. Rather than deal with this problem head on, we resort to a strategy that might be cynically described as "defining the problem away." We pose an easier problem that can be easily solved by the technique at hand. Instead of seeking a fundamental solution, which is a distribution $E$ satisfying $PE = \delta$ (so $E * f$ solves $Pu = f$, at least for $f$ having compact support, or some other condition that makes the convolution well defined), we seek a parametrix (the correct pronunciation of this almost unpronounceable word puts the accent on the first syllable), which is defined to be a distribution $F$ which solves $PF \approx \delta$, for the appropriate meaning of approximate equality. What should this be?
Let us write $PF = \delta + R$. Then the remainder $R$ should be small in some sense, you might think. But this is not quite the right idea. Instead of "small," we want the remainder to be "smooth." The reason is that we are mainly interested in the singularities of solutions, and convolution with a smooth function produces a smooth function, hence it does not contribute at all to the singularities. This point of view is one of the key ideas in microlocal analysis and might be described as the doctrine of microlocal myopia: pay attention to the singularities, and other issues will take care of themselves.
Elliptic partial differential equations (constant coefficients) 169
with kernel $\psi(x)R(x - y)\varphi(y)$, and $Rg$ always has support in $V$ since $\psi(x)$ has support in $V$. We are tempted to solve the equation $g + Rg = \psi f$ by the perturbation series $g = \sum_{k=0}^{\infty} (-1)^k R^k(\psi f)$. In fact, if we take the neighborhoods $U$ and $V$ sufficiently small, this series converges to a solution (and the solution is supported in $V$). The reason for this is that the kernel $\psi(x)R(x - y)\varphi(y)$ actually does become small. The exact condition we need is
For a more subtle application of the parametrix we return to the question of location of singularities. If $Pu = f$ and $f$ is singular on a closed set $K$, what can we say about the singularities of $u$? More precisely, we say that a distribution $f$ is $C^\infty$ on an open set $U$ if there exists a $C^\infty$ function $F$ on $U$ such that $\langle f, \varphi \rangle = \int F(x)\varphi(x)\,dx$ for all test functions supported in $U$, and we define the singular support of $f$ (written sing supp$(f)$) to be the complement of the union of all open sets on which $f$ is $C^\infty$. (Notice the analogy with the definition of "support," which is the complement of the union of all open sets on which $f$ vanishes.) By definition, the singular support is always a closed set. When we ask about the location of the singularities of a distribution, we mean: what is its singular support?
So we can rephrase our question: what is the relationship between sing supp $Pu$ and sing supp $u$? We claim that there is one obvious containment:
$$\text{sing supp}\,Pu \subseteq \text{sing supp}\,u.$$
The reason for this is that if $u$ is $C^\infty$ on $U$, then so is $Pu$, so the complement of sing supp $Pu$ contains the complement of sing supp $u$; then taking complements reverses the containment. However, it is possible that applying the differential operator $P$ might "erase" some of the singularities of $u$. For example, if $u(x, y) = h(y)$ for some rough function $h$, then $u$ is not $C^\infty$ on any open set, so sing supp $u$ is the whole plane. But $(\partial/\partial x)u = 0$ so sing supp $(\partial/\partial x)u$ is empty. This example is made possible because $\partial/\partial x$ is not elliptic in $\mathbb{R}^2$. For elliptic operators we have the identity sing supp $Pu$ = sing supp $u$. This property is called hypoellipticity (a very unimaginative term that means "weaker than elliptic"). It says exactly that every solution to $Pu = f$ will be smooth whenever $f$ is smooth ($f$ is $C^\infty$ on $U$ implies $u$ is $C^\infty$ on $U$). In particular, if $f$ is $C^\infty$ everywhere then $u$ is $C^\infty$ everywhere. Note that this implies that harmonic and holomorphic functions are $C^\infty$. Even more, it says that if a distribution satisfies the Cauchy–Riemann equations in the distribution sense, then it corresponds to a holomorphic function. This can be thought of as a generalization of the classical theorem that a function that is complex differentiable at every point is continuously differentiable.
To prove hypoellipticity we have to show that if $Pu = f$ is $C^\infty$ on an open set $U$, then so is $u$. Now we will immediately localize by multiplying $u$ by $\varphi \in \mathcal{D}$ which is supported in $U$ and identically one on a slightly smaller open set $V$. Observe that $\varphi u = u$ on $V$ so $P(\varphi u) = g$ with $g = f$ on $V$. So $g$ is $C^\infty$ on $V$, and we would like to conclude $\varphi u$ is $C^\infty$ on $V$, which implies $u$ is $C^\infty$ on $V$ (because $u = \varphi u$ on $V$). Since $V$ can be varied in this argument, it follows that $u$ is $C^\infty$ on all of $U$. The point of the localization is that $\varphi u$ and $g$ have compact support.
Reverting to our old notation, it suffices to show that if $u$ and $f$ have compact support and $Pu = f$, then $f$ being $C^\infty$ on an open set $U$ implies $u$ is $C^\infty$ on $U$. Since $u$ and $f$ have compact support, we can convolve with the parametrix $F$ to obtain $F * Pu = F * f$. Since $u$ has compact support, we have $F * Pu =$
$$|x|^{2N} F(x) = \frac{1}{(2\pi)^n} \int_{|\xi| \ge A} e^{-ix\cdot\xi} (-\Delta_\xi)^N \bigl( p(\xi)^{-1} \bigr)\,d\xi$$
(we are ignoring the boundary terms at $|\xi| = A$, which are all $C^\infty$ functions). Now from the fact that $P$ is elliptic we can conclude that
Theorem 8.3.2
For $P$ an elliptic operator of order $m$, if $u \in L^2$ and $Pu \in L^2_k$ then $u \in L^2_{k+m}$, with
$$\|u\|_{L^2_{k+m}} \le c\bigl( \|u\|_{L^2} + \|Pu\|_{L^2_k} \bigr).$$
To prove this a priori estimate we work entirely on the Fourier transform side. We are justified in taking Fourier transforms because $u \in L^2$. Now to show $u \in L^2_{k+m}$ we need to show that $\int |\hat u(\xi)|^2 (1 + |\xi|^2)^{k+m}\,d\xi$ is finite. We break the integral into two parts, $|\xi| \le A$ and $|\xi| \ge A$. For $|\xi| \le A$ we use the fact that $u \in L^2$ and the bound $(1 + |\xi|^2)^{k+m} \le (1 + A^2)^{k+m}$ to estimate
$$\int_{|\xi| \le A} |\hat u(\xi)|^2 (1 + |\xi|^2)^{k+m}\,d\xi \le (1 + A^2)^{k+m} \int_{|\xi| \le A} |\hat u(\xi)|^2\,d\xi \le c\|u\|_{L^2}^2.$$
For $|\xi| \ge A$ we use $Pu \in L^2_k$ and $|p(\xi)| \ge c|\xi|^m$ to estimate
$$\int_{|\xi| \ge A} |\hat u(\xi)|^2 (1 + |\xi|^2)^{k+m}\,d\xi \le c \int_{|\xi| \ge A} |p(\xi)\hat u(\xi)|^2 (1 + |\xi|^2)^{k}\,d\xi.$$
Together, these two estimates prove that $u \in L^2_{k+m}$ and give the desired a priori estimate.
The hypothesis $u \in L^2$ perhaps seems unnatural, but the theorem is clearly false without it. There are plenty of global harmonic functions, $\Delta u = 0$ so $\Delta u \in L^2_k$ for all $k$, but none of them are even in $L^2$. You should think of $u \in L^2$ as a kind of minimal smoothness hypothesis (it could be weakened to just $u \in L^2_{-N}$ for some $N$); once you have this minimal smoothness, the exact Sobolev space $L^2_{k+m}$ for $u$ is given by the exact Sobolev space $L^2_k$ for $Pu$, and we get the gain of $m$ derivatives as predicted.
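A discrete sketch of the gain of derivatives (periodic, one-dimensional, my own illustration rather than the text's): for $P = I - d^2/dx^2$, with full symbol $1 + \xi^2$, dividing $\hat f$ by the symbol produces a $u$ whose $L^2_2$-type norm equals the $L^2$ norm of $f$, exactly the predicted gain of $m = 2$ derivatives.

```python
import numpy as np

# Periodic model of elliptic regularity for P = I - d^2/dx^2 (symbol 1 + xi^2):
# u = F^{-1}( fhat / (1 + xi^2) ) gains two derivatives over f.
n = 1024
x = 2 * np.pi * np.arange(n) / n
xi = np.fft.fftfreq(n, d=1.0 / n)        # integer frequencies on the circle

def sobolev(vhat, s):
    # discrete analogue of ( sum |vhat(xi)|^2 (1 + xi^2)^s )^(1/2)
    return np.sqrt(np.sum(np.abs(vhat) ** 2 * (1.0 + xi ** 2) ** s) / n)

f = np.sign(np.sin(x))                    # a rough (discontinuous) right-hand side
fhat = np.fft.fft(f)
uhat = fhat / (1.0 + xi ** 2)             # division by the full symbol

lhs = sobolev(uhat, 2)                    # "L^2_2 norm" of u
rhs = sobolev(fhat, 0)                    # L^2 norm of f
print(lhs, rhs)                           # equal for this operator (constant c = 1)
```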
The a priori estimates can also be localized, but the story is more complicated. Suppose $u \in L^2$, $Pu = f$, and $f$ is in $L^2_k$ on an open set $U$ (just restrict all integrals to $U$). Then $u$ is in $L^2_{k+m}$ on a smaller open set $V$. The proof of this result is not easy, however, because when we try to localize by multiplying $u$ by $\varphi$, we lose control of $P(\varphi u)$. On the set where $\varphi$ is identically one we have $P(\varphi u) = Pu = f$, but in general the product rule for derivatives produces many terms, $P(\varphi u) = \varphi Pu + {}$ terms involving lower order derivatives of $u$. The idea of the proof is that the lower order terms are controlled by $Pu$ also, but we cannot give the details of this argument.
We have seen that the parametrix, and the ideas associated with it, yield a lot of interesting information. However, I will now show that it is possible to construct a fundamental solution after all. We will solve $Pu = f$ for $f \in \mathcal{D}$ as $u = \mathcal{F}^{-1}(p^{-1}\hat f)$ with the appropriate modifications to take care of the zeroes of $p$. It can be shown that the solution is of the form $E * f$ for a distribution $E$ (not necessarily tempered) satisfying $PE = \delta$, and then that $u = E * f$ solves $Pu = f$ for any $f \in \mathcal{E}'$. Thus we really are constructing a fundamental solution.
We recall that the Paley–Wiener theorem tells us that $\hat f$ is actually an analytic function. We use this observation to shift the integral in the Fourier inversion formula into the complex domain to get around the zeroes of $p$. We write $\xi' = (\xi_2, \ldots, \xi_n)$ so $\xi = (\xi_1, \xi')$. Then we will write the Fourier inversion formula as
FIGURE 8.1
(The contour in the complex $\xi_1$-plane: the real axis with marked points $-B$, $-A$, $A$, $B$, the segment between $-B$ and $B$ replaced by a semicircle avoiding the zeroes of $p$.)
converges absolutely. Indeed, for the portion of the integral that coincides with the real axis we may use the estimate $|p(\xi)^{-1}| \le c|\xi|^{-m}$ and the rapid decay of $\hat f(\xi)$ to bound the integrand by $c_N(1 + |\xi|)^{-N}$ for any $N$. For the remainder of the contour (the semicircle) we encounter no zeroes of $p$, so the integrand is bounded and the path is finite.
The particular contour we choose will depend on $\xi'$, so we denote it by $\gamma(\xi')$. The formula for $u$ is thus
$$u(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^{n-1}} \int_{\gamma(\xi')} \frac{\hat f(\zeta, \xi')}{p(\zeta, \xi')}\,e^{-i(x_1\zeta + x'\cdot\xi')}\,d\zeta\,d\xi'.$$
When $|\xi'| \ge A$ we choose $\gamma(\xi')$ to be just the real axis, while for $|\xi'| \le A$ we choose $\gamma(\xi')$ among the $m + 1$ contours as described above with $B = A, A + 1, \ldots, A + m$. Since the semicircles are all of distance at least one apart, at least one of them must be at a distance at least $\frac{1}{2}$ from all the $m$ zeroes of $p(\zeta, \xi')$. We choose that contour (or the one with the smallest semicircle if there is a choice). This implies not only that $p(\zeta, \xi')$ is not zero on the contours chosen, but there is a universal bound for $|p(\zeta, \xi')^{-1}|$ over all the semicircular arcs, and there is a universal bound for the lengths of these arcs (they are chosen from a finite number) and for the terms $\hat f(\zeta, \xi')$ and $e^{-ix_1\zeta}$ along these arcs. Thus the integral defining $u(x)$ converges and may be differentiated with respect to the $x$ variables. When we do this differentiation to compute $Pu$, we produce exactly a factor of $p(\zeta, \xi')$, which cancels the same factor in the denominator:
$$Pu(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^{n-1}} \int_{\gamma(\xi')} \hat f(\zeta, \xi')\,e^{-i(x_1\zeta + x'\cdot\xi')}\,d\zeta\,d\xi'.$$
Observe that we were able to shift the contour $\gamma(\xi')$ only after we had applied $P$ to eliminate $p$ in the denominator. For the original integral defining $u$, the zeroes of $p$ prevent you from shifting contours at will. Because we move into the complex domain, the fundamental solution $E$ constructed may not be given by a tempered distribution. More elaborate arguments show that fundamental solutions can be constructed that are in $\mathcal{S}'$, in fact for any constant coefficient operator (not necessarily elliptic). Although we have constructed a fundamental solution explicitly, the result is too complicated to have any algorithmic significance, except in very special cases.
with the coefficients $a_\alpha(x)$ being functions. We will assume the $a_\alpha(x)$ are $C^\infty$ functions. (There is life outside the $C^\infty$ category, but it is considerably harder.) One very naive way of thinking of such an operator is to "freeze the coefficients." We fix a point $x$ and evaluate the functions $a_\alpha$ at $x$ to obtain the constant coefficient operator
$$\sum_{|\alpha| \le m} a_\alpha(x) \left( \frac{\partial}{\partial x} \right)^\alpha.$$
As $x$ varies we obtain a family of constant coefficient operators, and we are tempted to think that the behavior of the variable coefficient operator should be an amalgam of the behaviors of the constant coefficient operators. This kind of wishful thinking is very misleading; not only does it lead to incorrect conjectures, but it makes us overlook some entirely new phenomena that only show up in the variable coefficient setting. However, for elliptic operators this approach works very well.
(such an estimate always holds for fixed $x$, and in fact for $x$ in any compact set).
Suppose we pursue the frozen coefficient paradigm and attempt to construct a parametrix. In the variable coefficient case we cannot expect convolution operators, so we will define a parametrix to be an operator $F$ such that
$$PFu = u + Ru, \qquad FPu = u + R_1u,$$
where $R_1$ is an operator of the same type as $R$. We could try to take for $F$ the parametrix for the frozen coefficient operator at each point:
$$Fu(x) = \frac{1}{(2\pi)^n} \int_{|\xi| \ge A(x)} \frac{\hat u(\xi)}{p(x, \xi)}\,e^{-ix\cdot\xi}\,d\xi,$$
where $A(x)$ is chosen large enough that $p(x, \xi)$ has no zeroes in $|\xi| \ge A(x)$. It turns out that this guess is not too far off the mark.
The formula we have guessed for a parametrix is essentially an example of what is called a pseudodifferential operator (abbreviated $\psi$DO). By definition, a $\psi$DO is an operator of the form
$$u \mapsto \frac{1}{(2\pi)^n} \int \sigma(x, \xi)\hat u(\xi)\,e^{-ix\cdot\xi}\,d\xi,$$
where $\sigma(x, \xi)$ belongs to a suitable class of symbols; we restrict attention here to classical symbols. (I will not attempt to justify the use of the term "classical" in mathematics; in current usage it seems to mean anything more than five years old.) A classical symbol of order $r$ ($r$ any real number) is a $C^\infty$ function $\sigma(x, \xi)$ that has an asymptotic expansion
$$\sigma(x, \xi) \sim \sum_{k=0}^{\infty} \sigma_{r-k}(x, \xi),$$
where $\sigma_{r-k}(x, \xi)$ is homogeneous of degree $r - k$ in $\xi$. In other words, the difference decays at infinity in $\xi$ as fast as the next order in the asymptotic expansion. (There is also a related estimate for derivatives, with the rate of decay increasing with each $\xi$ derivative, but unchanged by $x$ derivatives.)
The simplest example of a classical symbol is a polynomial in $\xi$ of degree $r$ ($r$ a positive integer), in which case $\sigma_{r-k}(x, \xi)$ is just the homogeneous terms of degree $r - k$. The asymptotic sum is just a finite sum in this case, and we have equality
$$\frac{1}{(2\pi)^n} \int \sigma(x, \xi)\hat u(\xi)\,e^{-ix\cdot\xi}\,d\xi = \sum_{|\alpha| \le r} a_\alpha(x) \left( \frac{\partial}{\partial x} \right)^\alpha u(x).$$
Thus the class of pseudodifferential operators is a natural generalization of the differential operators.
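With an $x$-independent symbol the $\psi$DO recipe is simply "multiply the Fourier transform by $\sigma(\xi)$ and invert." The sketch below (my own, not from the text; note that NumPy's FFT sign convention is opposite to the text's, so the multiplier for $d/dx$ is $i\xi$ rather than $-i\xi$) recovers the exact derivative of a trigonometric polynomial on the circle.

```python
import numpy as np

# Apply d/dx on the circle as a symbol multiplication: uhat -> sigma(xi)*uhat
# with sigma(xi) = i*xi in NumPy's FFT convention.
n = 256
x = 2 * np.pi * np.arange(n) / n
u = np.sin(3 * x) + 0.5 * np.cos(5 * x)
du_exact = 3 * np.cos(3 * x) - 2.5 * np.sin(5 * x)

xi = np.fft.fftfreq(n, d=1.0 / n)              # integer frequencies
du = np.fft.ifft(1j * xi * np.fft.fft(u)).real  # symbol multiplication, then inversion

err = np.max(np.abs(du - du_exact))
print(err)
```

Spectral differentiation of a band-limited function is exact up to rounding, so the printed error is at machine-precision level.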
The theory of pseudodifferential operators embraces the doctrine of microlocal myopia in the following way: two operators are considered equivalent if they differ by an integral operator with $C^\infty$ kernel ($\int R(x, y)u(y)\,dy$ for $R$ a $C^\infty$ function). Such an operator produces a $C^\infty$ output regardless of the input ($u$ may be a distribution) and is considered trivial. The portion of the symbol corresponding to small values of $\xi$ produces such a trivial operator. For this reason, in describing a $\psi$DO one frequently just specifies the symbol for $|\xi| \ge 1$. (In what follows we will ignore behavior for $|\xi| \le 1$, so when we say a function is "homogeneous," this means just for $|\xi| \ge 1$.) Also, from this point of view,
$$\sum_{\alpha} \frac{1}{\alpha!} \left( \frac{\partial}{\partial x} \right)^\alpha p(x, \xi) \left( -i\frac{\partial}{\partial \xi} \right)^\alpha q(x, \xi).$$
Note that $(\partial/\partial x)^\alpha p(x, \xi)$ is a symbol of order $r$, and
cancellation that the kernel actually exists and is $C^\infty$, if the integral is suitably interpreted (as the Fourier transform of a tempered distribution, for example). As $x$ approaches $y$, the kernel may blow up, although it can be interpreted as a distribution on $\mathbb{R}^{2n}$.

4. (Invariance under change of variable) If $h : \mathbb{R}^n \to \mathbb{R}^n$ is $C^\infty$ with a $C^\infty$ inverse $h^{-1}$, then $(P(u \circ h^{-1})) \circ h$ is also a $\psi$DO of the same order, whose symbol is $\sigma(h(x), ((h'(x))^t)^{-1}\xi) + {}$ lower order terms. This shows that the class of pseudodifferential operators does not depend on the coordinate system, and it leads to a theory of $\psi$DO's on manifolds where the top-order symbol is interpreted as a section of the cotangent bundle.

5. (Closure under adjoint) The adjoint of a $\psi$DO is a $\psi$DO of the same order whose symbol is $\overline{\sigma(x, \xi)} + {}$ lower order terms.

6. (Sobolev space estimates) A $\psi$DO of order $r$ takes $L^2$ Sobolev spaces $L^2_s$ into $L^2_{s-r}$, at least locally. If $u \in L^2_s$ and $u$ has compact support, then $\psi Pu \in L^2_{s-r}$ if $\psi \in \mathcal{D}$. With additional hypotheses on the behavior of the symbol in the $x$-variable, we have the global estimate
$$\sum_{j, k, \alpha} \frac{1}{\alpha!} \left( \frac{\partial}{\partial x} \right)^\alpha q_{-m-k}(x, \xi) \left( -i\frac{\partial}{\partial \xi} \right)^\alpha p_{m-j}(x, \xi),$$
where each term is homogeneous of degree $-k - j - |\alpha|$. Thus there are only a finite number of terms of any fixed degree of homogeneity, the largest degree being zero, for which there is just the one term $q_{-m}(x, \xi)p_m(x, \xi)$. Since we want the symbol to be 1, which has homogeneity zero, we set $q_{-m}(x, \xi) = 1/p_m(x, \xi)$, which is well defined because $P$ is elliptic, and has the correct homogeneity $-m$. All the other terms involving $q_{-m}$ have lower order homogeneity. We next choose $q_{-m-1}$ to kill the sum of the terms of homogeneity $-1$. These terms are just
$$q_{-m-1}p_m + q_{-m}p_{m-1} + \sum_{|\alpha| = 1} \frac{1}{\alpha!} \left( \frac{\partial}{\partial x} \right)^\alpha q_{-m} \left( -i\frac{\partial}{\partial \xi} \right)^\alpha p_m,$$
and we can set this equal to zero and solve for $q_{-m-1}$ (with the correct homogeneity), again because we can divide by $p_m$. Continuing this process inductively, we can solve for $q_{-m-k}$ by setting the sum of terms of order $-k$ equal to zero, since this sum contains $q_{-m-k}p_m$ plus terms already determined.
This process gives us an asymptotic expansion of $q(x, \xi)$. By the symbolic completeness of $\psi$DO symbols, this implies that there actually is a symbol with this expansion and a corresponding $\psi$DO $Q$. By our construction $QP$ has symbol 1, hence it is equivalent to the identity as desired. A similar computation shows that $PQ$ has symbol 1, so $Q$ is our desired parametrix.
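As a worked sketch of the recursion (my own illustration, not from the text, using the homogeneity-$(-1)$ terms in the form displayed above), take the one-dimensional elliptic operator $P = a(x)(d/dx)^2$ with $a(x) > 0$, so $p_2(x, \xi) = -a(x)\xi^2$ and $p_1 = p_0 = 0$. Then:

```latex
q_{-2}(x,\xi) = \frac{1}{p_2(x,\xi)} = -\frac{1}{a(x)\xi^2},
\qquad
q_{-3}\,p_2 + \frac{\partial q_{-2}}{\partial x}\Bigl(-i\frac{\partial p_2}{\partial \xi}\Bigr) = 0,
```

and since $\partial q_{-2}/\partial x = a'(x)/(a(x)^2\xi^2)$ and $-i\,\partial p_2/\partial\xi = 2i\,a(x)\xi$, solving gives

```latex
q_{-3}(x,\xi)
= -\frac{1}{p_2}\cdot\frac{a'(x)}{a(x)^2\xi^2}\cdot 2i\,a(x)\xi
= \frac{2i\,a'(x)}{a(x)^2\,\xi^3},
```

and so on; each $q_{-2-k}$ is obtained by one more division by $p_2$, which is legitimate precisely because $P$ is elliptic.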
Once we have the existence of the parametrix, we may deduce the equivalents of the properties of constant coefficient elliptic operators given in section 8.3 by essentially the same reasoning. The local existence follows exactly as before, because once we localize to a sufficiently small neighborhood, the remainder term becomes small. The hypoellipticity follows from the pseudo-local property of the parametrix. Local a priori estimates follow from the Sobolev space estimates for $\psi$DO's (global a priori estimates require additional global assumptions on the coefficients). However, there is no analog of the fundamental solution. Elliptic equations are subject to the "index" phenomenon: existence and uniqueness in expected situations may fail by finite-dimensional spaces. We will illustrate this shortly.
Most situations in which elliptic equations arise involve either closed manifolds (such as spheres, tori, etc.) or bounded regions with boundary. If $\Omega$ is a bounded open set in $\mathbb{R}^n$ with a regular boundary $\Gamma$, a typical problem would be to solve $Pu = f$ on $\Omega$ with certain boundary conditions, the values of $u$ and some of its derivatives on the boundary, as given. Usually, the number of boundary conditions is half the order of $P$, but clearly this expectation runs into difficulties for the Cauchy–Riemann operator. We will not be able to discuss such boundary value problems here, except to point out that one way to approach them is to reduce the problem to solving certain pseudodifferential equations on the boundary.
One example of a boundary value problem in which the index phenomenon shows up is the Neumann problem for the Laplacian. Here we seek solutions of
Hyperbolic operators 181
$$d_v f(x) = \lim_{h \to 0} \frac{1}{h}\bigl( f(x + hv) - f(x) \bigr).$$
If $v_1, \ldots, v_n$ is any basis of $\mathbb{R}^n$ (not necessarily orthogonal), then the chain rule implies that any first-order partial derivative is a linear combination of $d_{v_1}, \ldots, d_{v_n}$. Thus any linear partial differential operator of order $m$ can be written as $\sum_{|\alpha| \le m} b_\alpha(x)(d_v)^\alpha$ where
$$(d_v)^\alpha = d_{v_1}^{\alpha_1} \cdots d_{v_n}^{\alpha_n}.$$
At a given point $x$, the characteristic directions $v$ are those for which $b_\alpha(x) = 0$ if $\alpha = (m, 0, \ldots, 0)$ and $v_1 = v$, while the noncharacteristic directions are those for which $b_\alpha(x) \ne 0$. It is easy to see by the chain rule that this definition only depends on the direction $v = v_1$ and not on the other directions $v_2, \ldots, v_n$ in the basis.
Still, it is rather awkward to have to recompute the operator in terms of a new basis for every direction $v$ in order to settle the issue. Fortunately, there is a much simpler approach. We claim that the characteristic directions are exactly those for which the top-order symbol vanishes, $p_m(x, v) = 0$. This is obvious if $v = (1, 0, \ldots, 0)$, because $p_m(x, v) = \sum_{|\alpha|=m} a_\alpha(x)(-iv)^\alpha$ and for this choice of $v$ we have $(-iv)^\alpha = 0$ if any factor $v_2, \ldots, v_n$ appears to a nonzero power. Thus the single term corresponding to $\alpha = (m, 0, \ldots, 0)$ survives (lower order terms are not part of the top-order symbol), $p_m(x, v) = (-i)^m a_\alpha(x)$, and we are back to the previous definition. More generally, if
where $f$ (defined on $\mathbb{R}^n$) and $g_0, g_1, \ldots, g_{m-1}$ (defined on $S$) are called the Cauchy data. The rationale for giving this sort of data is the following. Once we know the value of $u$ on $S$, we can compute all tangential first derivatives, so the only first derivative it makes sense to specify is the normal one. Once all first derivatives are known on $S$, we can again differentiate in tangential directions. Thus, among second derivatives, only $(\partial^2/\partial n^2)u$ remains to be specified on $S$. If we continue in this manner, we see that there are no obvious relationships among the Cauchy data, and together they determine all derivatives of order $\le m - 1$ on $S$. Now we bring in the differential equation. Because $S$ is noncharacteristic, we can solve for $(\partial/\partial n)^m u$ on $S$ in terms of derivatives already known on $S$. By repeatedly differentiating the differential equation, we eventually obtain the value of all derivatives of $u$ on $S$, with no consistency conditions arising from two different computations of the same derivative. In other words, the Cauchy data gives exactly the amount of information, with no redundancy, needed to determine $u$ to infinite order on $S$.
There is still the issue of whether knowing $u$ to infinite order on $S$ allows us to solve the differential equation off $S$. In the real analytic case this is the content of the famous Cauchy–Kovalevska Theorem. We have to assume that all the Cauchy data, and the coefficients of $P$ as well, are real analytic functions (that means they have convergent power series expansions locally). The theorem says that then there exist solutions of the Cauchy problem locally (in a small enough neighborhood of any point of $S$), the solutions being real analytic and unique. This is an example of a theorem that is too powerful for its own good. Because it applies to such a large class of operators, its conclusions are too weak to be very useful. In general, the solution may exist in only a very small neighborhood of $S$, may not depend continuously on the data, and may fail to exist altogether if the data is not analytic (even for $C^\infty$ data).
For this reason, it seems worthwhile to try to find a smaller class of operators for which the Cauchy problem is well-posed. (By well-posed we mean that a solution exists, is unique, and depends continuously on the data.)
This is the structural motivation for the class of hyperbolic equations. Of course, to be useful, the definition of "hyperbolic" must be formalistic, only involving properties of the symbol that are easily checked. In order to simplify the discussion, we begin by assuming that the operator has constant coefficients and contains only terms of the highest order $m$. In other words, the full symbol is a polynomial $p(\xi)$ that is homogeneous of degree $m$, hence equal to its top-order symbol.
To give the definition of hyperbolic in this context, we look at the polynomial in one complex variable $z$, $p(\xi + zv)$, for each fixed $\xi$ and $v$ a given direction. This is a polynomial of degree $\le m$, and is exactly of degree $m$ when $v$ is noncharacteristic, the coefficient of $z^m$ being $p(v)$. In that case we can factor the polynomial
$$p(\xi + zv) = p(v) \prod_{j=1}^{m} \bigl( z - z_j(\xi) \bigr),$$
where $z_j(\xi)$ are the complex roots. We say $P$ is hyperbolic in the direction $v$ if $v$ is a noncharacteristic direction and all the roots $z_j(\xi)$ are real, for all $\xi$.
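The realness of the roots is easy to test numerically; this sketch (my own, not from the text) factors the wave symbol $p(\tau, \xi) = -\tau^2 + k^2|\xi|^2$ along the $t$-direction and confirms that the roots are real.

```python
import numpy as np

# For v = (1, 0, ..., 0), p(zeta + z*v) = -(tau0 + z)^2 + k^2 |xi0|^2 is a
# quadratic in z whose roots -tau0 +/- k|xi0| are real for every real zeta.
k = 2.0
rng = np.random.default_rng(1)

max_imag = 0.0
for _ in range(100):
    tau0 = rng.standard_normal()
    xi0 = rng.standard_normal(3)
    coeffs = [-1.0, -2.0 * tau0, -tau0 ** 2 + k ** 2 * (xi0 @ xi0)]
    roots = np.roots(coeffs)
    max_imag = max(max_imag, np.max(np.abs(roots.imag)))
print(max_imag)   # zero up to rounding: hyperbolic in the t-direction
```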
The prototype example is the wave equation, which is hyperbolic in the $t$-direction. The notation is slightly different, in that $p(\tau, \xi) = -\tau^2 + k^2|\xi|^2$, so $v = (1, 0, \ldots, 0)$ is the $t$-direction. Then
u(x) = _1_
(271")n
Jj(~ ++
p(~
iAV) e-ix·(Hi,Xv) ~
iAV)
for A i- 0 real is well defined and solves Pu = f for any f E V. The idea is
that we do not encounter any zeroes of the symbol, and in fact we can estimate
p(~ + iAV) from below; since p(~ + iAV) = p(v) I1j:l (iA - Zj(~)) and Zj(~) is
real so liA-Zj(OI :::: IAI we obtain Ip(~+iAV)1 :::: Ip(v)IIAlm. This estimate and
the Paley-Wiener estimates for j(~ + iAV) guarantee that the integral defining u
converges and that we can differentiate with respect to x any number of times
inside the integral. Thus
Pu(x) = _1_
(271" )n
Jj(~ + iAv)e-ix·(Hi,Xv) d~.
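The lower bound |p(ξ + iλv)| ≥ |p(v)||λ|^m can be probed numerically; the sketch below (NumPy assumed, m = 2 for the wave symbol, all names ours) samples random real frequencies ξ and records the worst ratio of the two sides.

```python
import numpy as np

# Numerical check of |p(xi + i*lam*v)| >= |p(v)| * |lam|^m for the wave
# symbol p(tau, xi) = -tau^2 + k^2|xi|^2 in R^{1+2}, where m = 2.
k, lam = 1.0, 0.7
v = np.array([1.0, 0.0, 0.0])

def p(w, k=1.0):                        # w may be a complex vector
    return -w[0]**2 + k**2 * np.sum(w[1:]**2)

rng = np.random.default_rng(1)
worst = np.inf
for _ in range(200):
    xi = rng.normal(size=3)
    ratio = abs(p(xi + 1j * lam * v)) / (abs(p(v)) * abs(lam)**2)
    worst = min(worst, ratio)
```

The recorded worst ratio never drops below 1, matching the factorization argument |iλ − z_j(ξ)| ≥ |λ| applied to each real root.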
Hyperbolic operators 185
By using the Cauchy integral formula we can shift the contour (one dimension
at a time) back to ℝⁿ, and then Pu = f by the Fourier inversion formula. In
fact, the same Cauchy integral formula argument can be applied to the integral
defining u to shift the value of λ, provided we do not try to cross the λ = 0
divide (where there are zeroes of the symbol). Thus our construction really
only produces two different fundamental solutions, corresponding to λ > 0 and
λ < 0. We denote them by E⁺ and E⁻.
These turn out to be very special fundamental solutions. First, by a variant of
the Paley–Wiener theorem, we can show that E⁺ is supported in the half-space
x · v ≥ 0, and E⁻ is supported in the other half-space x · v ≤ 0. (By "support"
of E± we mean the support of the distributions that give E± by convolution.)
But we can say even more. There is an open cone Γ that contains the direction
v, with the property that P is hyperbolic with respect to every direction in Γ.
The fundamental solutions E± are the same for every v ∈ Γ, so in fact the
support of E⁺ is contained in the dual cone Γ* defined by {x : x · v ≥ 0 for
all v ∈ Γ}, and the support of E⁻ is contained in −Γ*. I use the word cone
here to mean a subset of ℝⁿ that contains the entire ray λx (λ > 0) for every
point x in the cone. It can also be shown that the cones Γ and Γ* are convex.
The cone Γ* is referred to as the forward light cone (or sometimes this term
is reserved for the boundary of Γ*, and Γ* is called the interior of the forward
light cone).
There are two ways to describe the cone Γ. The first is to take ℝⁿ and
remove all the characteristic directions; what is left breaks up into a number
of connected components, and Γ is the component containing v. For the other
description, we again look at the zeroes of the polynomial p(ξ + zv), which are
all real. Then ξ is in Γ if all the real roots are negative (note that v is in Γ
according to this definition, since p(v + zv) = p(v)(z + 1)^m, which has z = −1
as its only root). It is not obvious that these two descriptions coincide, nor that
P is hyperbolic with respect to all directions in Γ, but these facts can be proved
using the algebra of polynomials in ℝⁿ.
The fact that Γ is an open cone implies that the dual cone Γ* is proper, meaning
that it is properly contained in a half-space. In particular, this means that if
we slice the light cone by intersecting it with any hyperplane perpendicular to
v, we get a bounded set. This observation will have an interesting interpretation
concerning solutions of the hyperbolic equation.
We can illustrate these ideas with the example of the wave equation. The
characteristic directions were given by the equation τ² − k²|ξ|² = 0 in ℝ^{n+1} (τ ∈
ℝ¹, ξ ∈ ℝⁿ). The complement breaks up into 3 regions (or 4 regions if n = 1).
Γ is the region where τ > 0 and k|ξ| < |τ|. The other two regions are −Γ (τ < 0
and k|ξ| < |τ|) and the outer region where k|ξ| > |τ|.
Notice how the second description of Γ works in this example. The two
roots were computed to be −τ − k|ξ| and −τ + k|ξ|. If these are both to be
negative, then by summing we see τ > 0. Then −τ − k|ξ| < 0 is automatic,
and −τ + k|ξ| < 0 is equivalent to k|ξ| < |τ|.
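The agreement of the two descriptions of Γ for the wave symbol can be verified by random sampling (a sketch, not a proof; NumPy assumed):

```python
import numpy as np

# Two descriptions of Gamma for the wave symbol: (tau, xi) has both roots
# -tau - k|xi| and -tau + k|xi| negative  <=>  tau > 0 and k|xi| < |tau|.
k = 1.0
rng = np.random.default_rng(2)
agree = True
for _ in range(500):
    tau = float(rng.normal())
    xi = rng.normal(size=2)
    r = k * np.linalg.norm(xi)
    by_roots = (-tau - r < 0) and (-tau + r < 0)
    by_region = (tau > 0) and (r < abs(tau))
    agree = agree and (by_roots == by_region)
```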
FIGURE 8.2
will be positive if |x| ≤ kt. However, if |x| > kt we can choose ξ in the direction
opposite x to make the inequality an equality, and then we get a negative value.
Thus Γ* is given exactly by the conditions t ≥ 0 and |x| ≤ kt.
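A sampling sketch of this description of Γ* (not a proof, with helper names of our own choosing): a point with |x| ≤ kt passes the dual-cone test tτ + x·ξ ≥ 0 against sampled directions of Γ, while a point outside fails.

```python
import numpy as np

# Dual cone test for the wave equation: (t, x) is in Gamma* iff
# t*tau + x.xi >= 0 for all (tau, xi) in Gamma; normalize tau = 1, k|xi| < 1.
k = 1.0
rng = np.random.default_rng(3)

def in_dual_cone(t, x, trials=2000):
    ok = True
    for _ in range(trials):
        xi = rng.uniform(-1, 1, size=2)
        if k * np.linalg.norm(xi) >= 1:
            continue                      # direction not in Gamma
        ok = ok and (t + x @ xi >= -1e-12)
    return ok

inside = in_dual_cone(2.0, np.array([1.0, 1.0]))    # |x| ~ 1.41 <= k*t = 2
outside = in_dual_cone(1.0, np.array([2.0, 0.0]))   # |x| = 2 > k*t = 1
```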
Using the fundamental solutions E±, we can solve the Cauchy problem for
any hypersurface S that is spacelike. The definition of spacelike is that the
normal direction must lie in Γ. The simplest example is the flat hyperplane
x · v = 0, whose normal direction is v. In the example of the wave equation,
if the surface S is defined by the equation t = h(x), then the normal direction
is (1, −∇h(x)), so the condition that S be spacelike is that |∇h(x)| ≤ 1/k
everywhere. We will assume that the spacelike surface is complete (it cannot
be extended further) and smooth, and that it divides the remaining points of
ℝⁿ into two pieces, a "past" and a "future," with the cone Γ pointing in the
future direction. In this situation it can be shown that the Cauchy problem is
well-posed. Because the argument is intricate we only give a special case.
Suppose we want to solve Pu = f,
where f is a C^∞ function with support in the future. We claim that the solution
is simply u = E⁺ * f. We have seen that u solves Pu = f if f has compact
FIGURE 8.3: the surface S with its past and future sides, and the future portion of x − Γ*.
support. However, at any fixed point x, E⁺ * f only involves the values of f in
the set x − Γ*. For x in S or in the past, x − Γ* lies entirely in the past where
f is zero, so u vanishes on S and in the past, hence vanishes to infinite order on
S. But for x in the future, x − Γ* intersects the future only in a bounded set,
as shown in Figure 8.3.
The convolution is well defined even if f does not have compact support. In
fact, this argument shows that the solution u at x in the future depends only
on the values of f in the future portion of x − Γ*. More generally, for the full
Cauchy problem Pu = f, (∂/∂n)^k u|_S = g_k, k = 0, …, m − 1, the solution
u(x) for x in the future depends only on the values of f in the future portion of
x − Γ*, and the values of g_k on the portion of S that lies in x − Γ*. (A similar
statement holds for x in the past, with the past portion of x + Γ* in place of
the future portion of x − Γ*.) This is the finite speed of propagation property
of hyperbolic equations, which was already discussed for the wave equation in
section 5.3.
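Finite speed of propagation can be watched in a simple numerical experiment; the sketch below uses a standard leapfrog discretization of u_tt = k²u_xx (our own choice of scheme and parameters, not the book's) and checks that nothing appreciable leaks outside |x| ≤ kT plus the radius of the initial support.

```python
import numpy as np

# Leapfrog scheme for u_tt = k^2 u_xx with a bump concentrated near x = 0.
# At time T the solution should be negligible for |x| > k*T + (support radius).
k = 1.0
dx, dt = 0.01, 0.005                  # CFL number k*dt/dx = 0.5 < 1
x = np.arange(-2.0, 2.0, dx)
u_prev = np.exp(-200 * x**2)          # bump, numerically supported in |x| < 0.5
u_curr = u_prev.copy()                # zero initial velocity
c2 = (k * dt / dx)**2
steps = 200                           # final time T = 1.0
for _ in range(steps):
    u_next = (2*u_curr - u_prev
              + c2*(np.roll(u_curr, 1) - 2*u_curr + np.roll(u_curr, -1)))
    u_prev, u_curr = u_curr, u_next
T = steps * dt
outside = float(np.abs(u_curr[np.abs(x) > k*T + 0.5]).max())
peak = float(np.abs(u_curr).max())
```

The solution splits into two traveling bumps of roughly half the initial height, and the amplitude outside the (slightly enlarged) light cone stays at roundoff level.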
We have described completely the definition of hyperbolicity for constant
coefficient equations with no lower order terms. The story in general is too
complicated to describe here. However, there is a special class of hyperbolic
operators, called strictly hyperbolic, which has the property that we can ignore
lower order terms and which can be defined in a naive frozen-coefficients fashion.
Recall that for a noncharacteristic direction v, we required that the roots of the
polynomial p(ξ + zv) be real. For a strictly hyperbolic operator, we require in
addition that the roots be distinct, for ξ not a multiple of v. Since we want
to deal with the case of variable coefficient operators, we take the top-order
symbol p_m(x, ξ), and we say P is strictly hyperbolic at x in the direction v
if p_m(x, v) ≠ 0 and the roots of p_m(x, ξ + zv) are real and distinct for all
ξ not a multiple of v. We then define a strictly hyperbolic operator P to be
one for which there is a smooth vector field v(x) such that P is strictly
hyperbolic at x in the direction v(x). It is easy to see that the wave operator is strictly
hyperbolic because the two roots −τ − k|ξ| and −τ + k|ξ| are distinct if ξ ≠ 0
(this is exactly the condition that (τ, ξ) not be a multiple of (1, 0)). For strictly
hyperbolic operators we have a theory very much like what was described above,
at least locally. Of course the cone Γ may vary from point to point, and the
analog of the light cone is a more complicated geometric object. In particular,
we can perturb the wave operator by adding terms of order 1 and 0, even with
variable coefficients, without essentially altering what was said above. We can
even handle a variable speed wave operator (∂²/∂t²) − k(x)²Δₓ where k(x) is
a smooth function, bounded and nonvanishing.
To say that the Cauchy problem is well-posed means, in addition to existence
and uniqueness, that the solution depends continuously on the data. This is
expressed in terms of a priori estimates involving L² Sobolev spaces. A typical
estimate (for strictly hyperbolic P) is

$$\|\psi u\|_{L^2_{m+k}} \le c\Bigl( \|f\|_{L^2_k} + \sum_{j=0}^{m-1} \|g_j\|_{L^2_{m+k-j-1/2}} \Bigr)$$

where ψ ∈ 𝒟(ℝⁿ) is a localizing factor. Here, of course, the Sobolev norms
for u and f refer to the whole space ℝⁿ, while the Sobolev norms for g_j refer
to the (n − 1)-dimensional space S (we have only discussed this for the case of
flat S). This estimate shows the continuous dependence on data when applied
to the difference of two solutions. If u and v have data that are close together in
the appropriate Sobolev spaces, then u and v are close in Sobolev norm. Using
the Sobolev inequalities, we can get the conclusion that u is close to v in a
pointwise sense, at least locally.
If u is in L²_{k+m}, then Pu will be in L²_k, since P involves m derivatives, and
so the appearance of the L²_k norm for f on the right side of the a priori estimate
is quite natural. However, the Sobolev norms of the boundary terms seem to
be off by −1/2. In other words, since g_j = (∂/∂n)^j u|_S, we would expect g_j
to land in L²_{m+k−j}, not L²_{m+k−j−1/2}. But we can explain this discrepancy
because we are restricting to the hypersurface S. There is a general principle
that restriction of L2 Sobolev spaces reduces the number of derivatives by 1/2
times the codimension (the codimension of a hypersurface being 1).
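The bookkeeping of this principle can be sanity-checked on a Gaussian, for which everything is explicit. The sketch below (NumPy assumed; it uses the transform convention f̂(ξ) = ∫ f(x)e^{ix·ξ}dx implicit in the inversion formulas of this chapter) compares integrating out the normal frequency variable of the 2-D transform against the 1-D transform of the restriction to the axis.

```python
import numpy as np

# f(x, y) = e^{-(x^2+y^2)/2} has fhat(xi, eta) = 2*pi*e^{-(xi^2+eta^2)/2}.
# Restricting to y = 0 gives e^{-x^2/2}, with 1-D transform
# sqrt(2*pi)*e^{-xi^2/2}; this should equal (1/2pi) \int fhat(xi, eta) d(eta).
def fhat(xi, eta):
    return 2*np.pi*np.exp(-(xi**2 + eta**2)/2)

xi0 = 0.8
step = 1e-3
eta = np.arange(-20.0, 20.0, step)
integral = fhat(xi0, eta).sum() * step / (2*np.pi)   # Riemann sum of the eta-integral
direct = np.sqrt(2*np.pi) * np.exp(-xi0**2 / 2)      # transform of the restriction
```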
To illustrate this principle, consider the restriction of f ∈ L²_s(ℝⁿ) to the flat
hypersurface xₙ = 0. We can regard this as a function on ℝ^{n−1}, and we want
to show that it is in L²_{s−1/2}(ℝ^{n−1}) if s > 1/2. Write Rf(x₁, …, x_{n−1}) =
f(x₁, …, x_{n−1}, 0). By the one-dimensional Fourier inversion formula we have

$$Rf(x_1, \dots, x_{n-1}) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, e^{ix_n \xi_n}\, dx_n\, d\xi_n,$$

and from this we obtain

$$\widehat{Rf}(\xi_1, \dots, \xi_{n-1}) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat f(\xi_1, \dots, \xi_n)\, d\xi_n.$$

So the Fourier transform of
FIGURE 8.4: the normal directions to the boundary of the region where f = 1 (f = 0 outside).
the function at a point on the curve, it would seem to be the normal direction at
that point. It would also seem reasonable to say that the function is smooth in
the tangential direction. There is less intuitive justification for deciding about all
the other directions. We are actually going to give a definition that decides that
this function is smooth in all directions except the two normal directions. Here
is one explanation for that decision. Choose a direction v, and try to smooth
the function out by averaging it in the directions perpendicular to v (in ℝⁿ this
will be an (n − 1)-dimensional space, but for n = 2 it is just a line). If you
get a smooth function, then the original function must have been smooth in the
v direction. Applying this criterion to our example, we see that averaging in
any direction other than the tangential direction will smooth out the singularity,
and so the function is declared to be smooth in all directions except the normal
ones. (Actually, this criterion is a slight oversimplification, because it does not
distinguish between v and −v.)
Let's consider another example, the function Ixl. The only singularity is at
the origin. Since the function is radial, all directions are equivalent at the origin,
so the function cannot be smooth in any direction there.
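The decay criterion suggests a simple one-dimensional experiment: the transform of a localized smooth function decays rapidly, while the kink of φ(x)|x| at the origin forces only polynomial decay. A plain-FFT sketch (all parameter choices ours):

```python
import numpy as np

# Compare large-frequency decay of the transforms of phi (smooth) and
# phi*|x| (kink at 0); only polynomial decay survives the singularity.
N = 2**16
x = np.linspace(-10.0, 10.0, N, endpoint=False)
dx = x[1] - x[0]
phi = np.exp(-x**2)                       # smooth localizing factor
F_smooth = np.abs(np.fft.fft(phi)) * dx
F_kink = np.abs(np.fft.fft(phi * np.abs(x))) * dx
freqs = 2*np.pi*np.fft.fftfreq(N, d=dx)
hi = np.abs(freqs) > 50.0
smooth_tail = float(F_smooth[hi].max())   # roundoff-level
kink_tail = float(F_kink[hi].max())       # only ~ 1/xi^2 decay
```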
How do we actually define microlocal smoothness for a function (or distribution)?
First we localize in space. We take the function f and multiply by a
test function φ supported near the point x (we require φ(x) ≠ 0 to preserve the
behavior of the function f near x). If f were smooth near x, then φf would
be a C^∞ function of compact support, and so (φf)^(ξ) would be rapidly decreasing.
Then we localize in frequency. There are many different directions to let ξ → ∞.
In some directions we may find rapid decrease, in others not. If we find an open
cone Γ containing the direction v, such that

$$|\widehat{\varphi f}(\xi)| \le C_N (1 + |\xi|)^{-N} \quad \text{for all } \xi \in \Gamma,$$

then f is declared to be microlocally smooth at x in the direction v.
FIGURE 8.5
Suppose we know |(φf)^(ξ)| ≤ C_N(1 + |ξ|)^{−N} for all ξ ∈ Γ. Then if we take
any smaller open cone Γ′ ⊂ Γ, but still containing v, we will have a similar
estimate, |(φf)^(ξ)| ≤ C′_N(1 + |ξ|)^{−N} for all ξ ∈ Γ′.
(The constant C′_N in this estimate depends on Γ′, and blows up if we try to take
FIGURE 8.6: the complement of the cone Γ.
and this is dominated by a multiple of (1 + |ξ|)^{−N} once |ξ| ≥ 2A. Since ψ does
not actually have compact support, the argument is a little more complicated,
but this is the essential idea.
Although the definition of the wave front set is complicated, the concept turns
out to be very useful. We can already give a typical application. Multiplication
of two distributions is not defined in general. We have seen that it can be
defined if one or the other factor is a Coo function, and it should come as
no surprise that it can be defined if the singular supports are disjoint. If T₁
and T₂ are distributions with disjoint singular support, then T₁ is equal to a
C^∞ function on the complement of sing supp T₂, so T₁ · T₂ is defined there.
Similarly T₂ · T₁ is defined on the complement of sing supp T₁ because T₂ is
equal to a C^∞ function there, and by piecing the two together we get a product
defined everywhere. Now I claim that we can even define the product if the
singular supports overlap, provided that the wave front sets satisfy a separation
condition. The condition we need is the following: we never have both (x, v)
in WF(T₁) and (x, −v) in WF(T₂). The occurrence of the minus sign will be
clear from the proof.
To define T₁ · T₂ it suffices to define φ²T₁ · T₂ for test functions φ of sufficiently
small support, for then we can piece together T₁ · T₂ = Σ φⱼ²T₁ · T₂ from such
test functions with Σ φⱼ² ≡ 1. Now we will try to define φ²T₁ · T₂ as the inverse
Fourier transform of (1/(2π)ⁿ) (φT₁)^ * (φT₂)^. The problem is that the integral
defining the convolution may not make sense. Since φT₁ and φT₂ have compact
support, we do know that they satisfy estimates

$$|\widehat{\varphi T_1}(\xi)| \le c(1 + |\xi|)^{N_1}, \qquad |\widehat{\varphi T_2}(\xi)| \le c(1 + |\xi|)^{N_2},$$
and these estimates will not make the integral converge. However, for any fixed
v, either (φT₂)^(η) is rapidly decreasing in an open cone Γ containing v, or
(φT₁)^(η) is rapidly decreasing in an open cone containing −v,
by the separation condition on the wave front sets. This is good enough to make
the portion of the integral over Γ converge absolutely (for every ξ). In the first
alternative

$$\int_{\Gamma} |\widehat{\varphi T_1}(\xi - \eta)|\, |\widehat{\varphi T_2}(\eta)|\, d\eta \le \int_{\Gamma} c(1 + |\xi - \eta|)^{N_1}\, C_N (1 + |\eta|)^{-N}\, d\eta$$

for any N, which will converge if N₁ − N < −n; in the second alternative we
need the observation that for fixed ξ, ξ − η will belong to −Γ if η is in a slightly
smaller cone Γ₁, for η large enough, so

$$\int_{\Gamma_1} |\widehat{\varphi T_1}(\xi - \eta)|\, |\widehat{\varphi T_2}(\eta)|\, d\eta \le \int_{\Gamma_1} C_N (1 + |\xi - \eta|)^{-N}\, c(1 + |\eta|)^{N_2}\, d\eta$$
and this converges if N₂ − N < −n. By piecing together these estimates, using
a compactness argument to show that we can cover ℝⁿ by a finite collection
of such cones, we see that the convolution is well defined. These estimates
also show that (φT₁)^ * (φT₂)^ is of polynomial growth in ξ, so the inverse
or lower half-plane, so φf is C^∞. For a point (x₀, 0) on the x-axis, choose
φ(x, y) of the form φ₁(x)φ₂(y) where φ₁(x₀) = 1 and φ₂(0) = 1. Then
f(x, y)φ(x, y) = φ₁(x)h(y) where
Also a rotation preserves perpendicularity, so our conclusion that the wave front
set of the characteristic function of the upper half-plane points perpendicular to
the boundary carries over to any half-plane. But to understand regions bounded
by curves we need to consider nonlinear coordinate changes.
Let g : ℝⁿ → ℝⁿ be a smooth mapping with smooth inverse g⁻¹. Then for
any distribution T we may define T∘g by

$$\langle T \circ g, \varphi \rangle = \langle T,\; J(g^{-1})\, \varphi \circ g^{-1} \rangle,$$

where J(g) = |det g′| is the Jacobian. The analogous statement for the wave
front set is the following: (x, ξ) ∈ WF(T∘g) if and only if (g(x), (g′(x)⁻¹)ᵗʳ ξ)
∈ WF(T). Note that this is consistent with the linear case because a linear
map is equal to its derivative.
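The covector transform (g′(x)⁻¹)ᵗʳ can be checked numerically on a sample diffeomorphism (our choice, g(x, y) = (x + y³, y)) and the unit circle: it carries a conormal of the curve to a conormal of the image curve.

```python
import numpy as np

# g is a shear-like diffeomorphism of the plane; its derivative is explicit.
def g(p):
    return np.array([p[0] + p[1]**3, p[1]])

def gprime(p):
    return np.array([[1.0, 3.0*p[1]**2],
                     [0.0, 1.0]])

t0, h = 0.3, 1e-6
curve = lambda t: np.array([np.cos(t), np.sin(t)])   # unit circle
tangent = np.array([-np.sin(t0), np.cos(t0)])
xi = curve(t0)                                  # outward normal covector
eta = np.linalg.inv(gprime(curve(t0))).T @ xi   # transformed covector
image_tangent = (g(curve(t0 + h)) - g(curve(t0 - h))) / (2*h)
perp_before = float(tangent @ xi)               # zero: xi is conormal
perp_after = float(eta @ image_tangent)         # should again be ~ zero
```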
This transformation law for wave front sets preserves perpendicularity to a
curve. Suppose x(t) is a curve and (x(t₀), ξ) is a point in WF(f∘g) with ξ
perpendicular to the curve, x′(t₀) · ξ = 0. Then (g(x(t₀)), (g′(x(t₀))⁻¹)ᵗʳ ξ) is a
point in WF(f) with (g′(x(t₀))⁻¹)ᵗʳ ξ perpendicular to the curve g(x(t)). This
follows because g′(x(t₀))x′(t₀) is the tangent to the curve at g(x(t₀)), and

$$(g'(x(t_0))^{-1})^{tr} \xi \cdot g'(x(t_0))\, x'(t_0) = \xi \cdot x'(t_0) = 0.$$
FIGURE 8.7: the diffeomorphism g straightens the curve x(t); f = 1 and f = 0 on the two sides correspond to f∘g = 1 and f∘g = 0.
The transformation property of the wave front set indicates that the second
variable ξ should be thought of as a cotangent vector, not a tangent vector. In
the more general context of manifolds, the wave front set is a subset of the
cotangent bundle. The distinction between tangents and cotangents is subtle,
cone Γ containing the vector (0, 1), and similarly for −Γ containing (0, −1),
since by assumption T is microlocally smooth in these cotangent directions. To
define the restriction of φT we will define its Fourier transform. But we have
seen that the Fourier transform of the restriction should be given by integrating
the Fourier transform of φT. In other words, if R denotes the restriction, we
should have

$$\widehat{R(\varphi T)}(\xi) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \widehat{\varphi T}(\xi, \eta)\, d\eta.$$
FIGURE 8.8
What we claim is that this integral converges absolutely. This will enable us
to define (R(φT))^ by this formula, and that is the definition of R(φT) as a
distribution on ℝ¹. The proof of the convergence of the integral follows from
the picture, which shows that for any fixed ξ the vertical line (ξ, η) eventually lies
in Γ or −Γ as η → ±∞.
Once inside Γ or −Γ, the rapid decay guarantees the convergence, and the
finite part of the line not in these two cones just produces a finite contribution
to the integral, since (φT)^ is continuous. A more careful estimate shows that
R(φT) is well defined.
To prove the restriction property for a general curve in the plane, we simply
apply a change of coordinates to straighten out the curve. The hypothesis that the
normal cotangents not be in the wave front set is preserved by the transformation
property of wave front sets, as we have already observed. A similar argument
works for the restriction from ℝⁿ to any smooth k-dimensional surface. Note
that in this case the normal directions to a point on the surface form a subspace
of dimension n − k of the cotangent space.
A more refined version of this argument allows us to locate the wave front set
of the restriction. Suppose x is a point on the surface S to which we restrict T.
If we take the cotangents ξ at x for which (x, ξ) ∈ WF(T) and project them
orthogonally onto the cotangent directions to S, then we get the cotangents that
may lie in the wave front set of the restriction of T to S at x. In other words,
if we have a cotangent η to S at x that is not the projection of any cotangent
in the wave front set of T at x, then RT is microlocally smooth at (x, η).
WF(Pu) ⊆ WF(u).
This is analogous to the fact that P is a local operator (the singular support is
not increased), and so we say that any operator with this property is a microlocal
operator. In fact it is true that pseudodifferential operators are also microlocal.
We illustrate this principle by considering the case of a constant coefficient
differential operator. The microlocal property is equivalent to the statement that
if u is microlocally smooth at (x, ξ) then so is Pu. Now the idea is that if
(φu)^ is rapidly decreasing in an open cone Γ, then so is (P(φu))^, since this
is just the product of (φu)^ with the polynomial p. This is almost the entire
story, except that what we have to show is that (φPu)^ is rapidly decreasing
in Γ, and φPu ≠ P(φu). In this case this is just a minor inconvenience,
since φPu = P(φu) + Σ φⱼQⱼu where the Qⱼ are lower order constant coefficient
differential operators and the φⱼ are test functions with the same support as φ. Thus
we prove the assertion by induction on the order of the differential operator, and
the induction hypothesis takes care of the φⱼQⱼu terms.
Since applying a differential operator does not increase the wave front set,
the next natural question to ask is, can it decrease it? For elliptic operators,
the answer is no. This is the microlocal analog of hypoellipticity and so is
called microlocal hypoellipticity. It is easy to understand this fact in terms of
the parametrix. If P is an elliptic operator and Q is the ψDO parametrix, so
QPu = u + Ru with Ru a C^∞ function, then the microlocal nature of Q implies
WF(u) ⊆ WF(Pu).

Theorem 8.7.1
WF(u) ⊆ WF(Pu) ∪ char(P). Another way of stating this theorem is that if
Pu is microlocally smooth at (x, ξ), and ξ is noncharacteristic at x, then u is
microlocally smooth at (x, ξ).
using the chain rule shows that p_m is constant along a bicharacteristic curve,
because

$$\frac{d}{dt}\, p_m(x(t), \xi(t)) = \sum_{j=1}^{n} \left( \frac{\partial p_m}{\partial x_j}\, \frac{dx_j(t)}{dt} + \frac{\partial p_m}{\partial \xi_j}\, \frac{d\xi_j(t)}{dt} \right) = \sum_{j=1}^{n} \left( \frac{\partial p_m}{\partial x_j}\, \frac{\partial p_m}{\partial \xi_j} - \frac{\partial p_m}{\partial \xi_j}\, \frac{\partial p_m}{\partial x_j} \right) = 0,$$

on substituting the bicharacteristic equations dxⱼ/dt = ∂p_m/∂ξⱼ and dξⱼ/dt = −∂p_m/∂xⱼ
(all partial derivatives evaluated at (x(t), ξ(t))).
We will only be interested in the null bicharacteristics, which are those for
which Pm is zero. Thus the set char(P) splits into a union of null bicharacteristic
curves. Note that the assumption that P is of principal type means that the x-
velocity is nonzero along null bicharacteristic curves. This guarantees that they
really are curves (they do not degenerate to a point), and the projection onto the
x coordinates is also a curve. We will follow the standard convention in deleting
the adjective "null"; all our bicharacteristic curves will be null bicharacteristic
curves.
If P is a constant coefficient operator, then ∂p_m/∂xⱼ ≡ 0 and so the bicharacteristic
curves have a constant ξ value. Also ∂p_m/∂ξⱼ is independent of x,
and so dxⱼ(t)/dt = (∂p_m/∂ξⱼ)(ξ) is a constant; hence x(t) is a straight line in
the direction ∇p_m(ξ). We call these projections of bicharacteristic curves light
rays.
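The constant-coefficient case can be checked by integrating the bicharacteristic equations directly; the sketch below (forward Euler, all parameters ours) follows a null bicharacteristic of the wave symbol and confirms that the projected light ray is a straight line moving at spatial speed k.

```python
import numpy as np

# Bicharacteristics of p_m(tau, xi) = -tau^2 + k^2|xi|^2 in R^{1+2}:
# dX/ds = grad_w p_m, dw/ds = -grad_X p_m = 0 (constant coefficients),
# so w is fixed and the spacetime path X(s) = (t, x1, x2) is a straight line.
k = 1.0
w = np.array([k, 1.0, 0.0])            # characteristic cotangent: tau = k|xi|
X0 = np.zeros(3)
X = X0.copy()
ds = 1e-3
for _ in range(1000):
    X = X + ds * np.array([-2.0*w[0], 2.0*k**2*w[1], 2.0*k**2*w[2]])
pm_val = -w[0]**2 + k**2*(w[1]**2 + w[2]**2)        # stays 0: null
speed = float(np.linalg.norm((X - X0)[1:]) / abs((X - X0)[0]))
```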
For example, suppose P is the wave operator (∂²/∂t²) − k²Δₓ. Then
p_m(τ, ξ) = −τ² + k²|ξ|², and the characteristics are τ = ±k|ξ|, or in other
words the cotangents (±k|ξ|, ξ) for ξ ≠ 0. Starting at a point (t̄, x̄), and choosing
a characteristic cotangent, the corresponding light ray (with parameter s) is
given by t = t̄ ± 2k|ξ|s and x = x̄ + 2k²sξ. If we use the first equation to
eliminate the parameter s and reparametrize the light ray in terms of t, the result
is

$$x = \bar{x} \pm k(t - \bar{t})\, \frac{\xi}{|\xi|}.$$
Theorem 8.7.2
Let P be an operator of principal type, with real coefficients, and suppose
Pu = f is C^∞. For each null bicharacteristic curve, either all points are in
the wave front set of u or none are.
Microlocal analysis of singularities 201
Let's apply this theorem to the wave equation. Suppose we know the solution
in a neighborhood of (t, x) and we can compute which cotangents (±k|ξ|, ξ)
are in the wave front set at that point. Then we know the singularity will
propagate along the light rays in the ±ξ/|ξ| direction passing through that point.
If (±k|ξ|, ξ) is a direction of microlocal smoothness, then it will remain so along
the entire light ray. Thus the "direction" of the singularity dictates the direction
it will travel.
In dealing with the wave equation, it is usually more convenient to describe
the singularities in terms of Cauchy data. Suppose we are given u(x, 0) = f(x)
and u_t(x, 0) = g(x) for u a solution of the wave equation.
What can we say about the singularities of u at a later time in terms of the
singularities of f and g? Fix a point x in space ℝⁿ, and consider the wave front
set of u at the point (0, x) in spacetime ℝ^{n+1}. We know that the cotangents
(τ, ξ) must be characteristics of the equation, so must be of the form (±k|ξ|, ξ).
Note that exactly two of these characteristic spacetime cotangents project onto
each space cotangent ξ. So if (x, ξ) belongs to WF(f) or WF(g), then either
((0, x), (k|ξ|, ξ)) or ((0, x), (−k|ξ|, ξ)) belongs to WF(u) (usually both,
although it is possible to construct examples where only one does). Conversely,
if both f and g are microlocally smooth at (x, ξ), then it can be shown that u
is microlocally smooth at both points ((0, x), (±k|ξ|, ξ)).
Now fix a point in spacetime (t̄, x̄). Given a characteristic cotangent
(±k|ξ|, ξ), we want to know if u is microlocally smooth there. The light ray
passing through (t̄, x̄) corresponding to this cotangent intersects the t = 0 subspace
at the point x̄ ∓ kt̄(ξ/|ξ|). So if f and g are microlocally smooth in the
direction ξ at this point, we know that u is microlocally smooth along the whole
bicharacteristic curve, hence at ((t̄, x̄), (±k|ξ|, ξ)). In particular, if we want to
know that u is smooth in a neighborhood of (t̄, x̄), we have to show that f and
g are microlocally smooth at all points (x̄ ∓ kt̄(ξ/|ξ|), ξ). This requires that we
examine the sphere of radius kt̄ about x̄, but it does not require that f and g
have no singularities on this sphere; it just requires that the singularities not lie
in certain directions.
To illustrate this phenomenon further, let's look at a specific choice of Cauchy
data. Let γ be a smooth closed curve in the plane, and let f be the characteristic
function of the interior of γ. Also take g = 0. So the singular support of f is
γ, and the wave front set consists of (x, ξ) where x is in γ and ξ is normal to γ.
This leads to singularities at a later time t̄ at the points x ± kt̄(ξ/|ξ|).
Clearly it suffices to consider ξ of unit length, and in view of the choice
of sign we may restrict ξ to be the unit normal pointing outward. Thus the
singularities at the time t̄ lie on the two curves x ± kt̄ n where n denotes this
unit normal vector.
FIGURE 8.9
In other words, starting from the curve γ, we travel in the perpendicular
direction a distance kt̄ in both directions to get the two new curves. Furthermore,
at each point on the two new curves, the wave front set of u(t̄, x) as a function
of x is exactly in the n direction.
Incidentally, we can verify in this situation that the direction n is still perpendicular
to the new curves. If x(s) traces out γ with parameter s, then
x′(s) is the tangent direction, so n(s) satisfies x′(s) · n(s) = 0. Also we have
n(s) · n(s) = 1 because it is of unit length. Differentiating this identity we
obtain n′(s) · n(s) = 0. But now the new curves are parametrized by s as
x(s) ± kt̄ n(s) and so their tangent directions are given by x′(s) ± kt̄ n′(s).
Since (x′(s) ± kt̄ n′(s)) · n(s) = x′(s) · n(s) ± kt̄ n′(s) · n(s) = 0, we see that
n(s) is still normal to the new curves.
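For the unit circle this parallel-curve computation is easy to confirm numerically (a sketch with parameter choices of our own):

```python
import numpy as np

# gamma = unit circle, outward normal n(s) = (cos s, sin s); check that n(s)
# is still normal to the shifted curves x(s) +/- k*tbar*n(s) carrying the
# singularities at time tbar.  Tangents computed by periodic central differences.
k, tbar = 1.0, 0.4
s = np.linspace(0.0, 2*np.pi, 1000, endpoint=False)
h = s[1] - s[0]
errs = []
for sign in (1.0, -1.0):
    r = 1.0 + sign * k * tbar           # shifted curves are circles of radius r
    cx, cy = r*np.cos(s), r*np.sin(s)
    tx = (np.roll(cx, -1) - np.roll(cx, 1)) / (2*h)
    ty = (np.roll(cy, -1) - np.roll(cy, 1)) / (2*h)
    errs.append(float(np.abs(tx*np.cos(s) + ty*np.sin(s)).max()))
```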
8.8 Problems
1. Show that if f ∈ L^{p₁} and f ∈ L^{p₂} with p₁ < p₂, then f ∈ L^p for
p₁ ≤ p ≤ p₂. (Hint: Split f in two pieces according to whether or not
|f| ≤ 1.)
2. Let P be a differential operator of order m with symbol p(x, ξ). Show that

$$P(uv) = \sum_{|\alpha| \le m} \frac{1}{\alpha!} \left( \left( \frac{\partial}{\partial x} \right)^{\alpha} u \right) p^{(\alpha)} v.$$
15. Show that Δu = λu has no solutions of polynomial growth if λ > 0, but
does have such solutions if λ < 0.
16. Let P₁, …, P_k be first-order differential operators on ℝⁿ. Show that
P₁² + ⋯ + P_k² is never elliptic if k < n.
17. Show that a first-order differential operator on ℝⁿ cannot be elliptic if
either n ≥ 3 or the coefficients are real-valued.
18. Show that a C^∞ function that is homogeneous of any degree must be a
polynomial. (Hint: A high enough derivative would be homogeneous of
negative degree, hence not continuous at the origin.)
19. Show that the Hilbert transform Hf = ℱ⁻¹(sgn(ξ) f̂(ξ)) is a ψDO of order
zero in ℝ¹. Show the same for the Riesz transforms R_j f = ℱ⁻¹((ξ_j/|ξ|) f̂(ξ))
in ℝⁿ.
20. Verify that the asymptotic formula for the symbol of a composition of
ψDOs gives the exact symbol of the composition of two partial differential
operators.
21. Verify that the commutator of two partial differential operators of orders
m₁ and m₂ is of order m₁ + m₂ − 1.
22. If P is a ψDO of order r < −n with symbol p(x, ξ), show that k(x, y) =
(1/(2π)ⁿ) ∫ p(x, ξ) e^{−i(x−y)·ξ} dξ is a convergent integral and Pu(x) =
∫ k(x, y)u(y) dy if u is a test function.
23. If u(x, ξ) is a classical symbol of order r and h : ℝⁿ → ℝⁿ is a C^∞ mapping
with C^∞ inverse, show that u(h(x), (h′(x)ᵗʳ)⁻¹ξ) is also a classical
symbol of order r.
24. Show that u(x, ξ) = (1 + |ξ|²)^{a/2} is a classical symbol of order a. (Hint:
Write (1 + |ξ|²)^{a/2} = |ξ|^a (1 + |ξ|⁻²)^{a/2} and use the binomial expansion
for |ξ| > 1.)
25. If P is a ψDO of order zero with symbol p(ξ) that is independent of x,
show that ‖Pu‖_{L²} ≤ c‖u‖_{L²}. (Hint: Use the Plancherel formula.)
26. If h is a C^∞ function, show that hΔ is elliptic if and only if h is never
zero.
27. Find a fundamental solution for the Hermite operator −(d²/dx²) + x² in
ℝ¹ using Hermite expansions. Do the same for −Δ + |x|² in ℝⁿ.
28. Let P be any second-order elliptic operator with real coefficients on ℝⁿ.
Show that (∂²/∂t²) − P is strictly hyperbolic in the t direction on ℝ^{n+1}.
29. In which directions is the operator ∂²/∂x∂y hyperbolic in ℝ²?
30. Show that the surface y = 0 in ℝ³ is noncharacteristic for the wave
operator

$$\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}$$

but is not spacelike.
31. For a smooth surface of the form t = h(x), what are the conditions on h
in order that the surface be spacelike for the variable speed wave operator
(∂²/∂t²) − k(x)²Δₓ?
32. Show that if f ∈ L²_s(ℝⁿ) with s > 1 then Rf(x₁, …, x_{n−2}) =
f(x₁, …, x_{n−2}, 0, 0) is well defined in L²_{s−1}(ℝ^{n−2}). (Hint: Iterate the
codimension-one restrictions.)
33. Show that if n ≥ 2, no operator can be both elliptic and hyperbolic.
34. Show that the equation Pu = 0 can never have a solution of compact
support for P a constant coefficient differential operator. (Hint: What
would that say about û?)
35. If u is a distribution solution of ∂²u/∂x∂y = 0 and v is a distribution
solution of

$$\frac{\partial^2 v}{\partial x^2} - \frac{\partial^2 v}{\partial y^2} = 0$$

in ℝ², show that the product uv is well defined.
36. For the anisotropic wave operator

$$\frac{\partial^2}{\partial t^2} - k_1^2 \frac{\partial^2}{\partial x_1^2} - k_2^2 \frac{\partial^2}{\partial x_2^2}$$

in ℝ², compute the characteristics, the light cone, the bicharacteristic
curves, and the light rays.
37. Show that the operator a(x, y)(∂/∂x) + b(x, y)(∂/∂y) is of principal type
provided a and b do not both vanish at any point. Compute the characteristics
and the bicharacteristic curves, and show that the light rays are solutions of the system
x′(t) = a(x(t), y(t)), y′(t) = b(x(t), y(t)).
Suggestions for Further Reading
For a more thorough and rigorous treatment of distribution theory and Fourier
transforms:
"Generalized functions, vol. I" by I.M. Gelfand and G.E. Shilov, Academic
Press, 1964.