
A Guide to Distribution Theory
and Fourier Transforms

STUDIES IN ADVANCED MATHEMATICS

ROBERT S. STRICHARTZ
Cornell University, Department of Mathematics

CRC PRESS
Boca Raton  Ann Arbor  London  Tokyo
Library of Congress Cataloging-in-Publication Data

Strichartz, Robert.
A guide to distribution theory and Fourier transforms / by Robert Strichartz.
p. cm.
Includes bibliographical references and index.
ISBN 0-8493-8273-4
1. Theory of distributions (Functional analysis). 2. Fourier analysis. I. Title.
QA324.S77 1993
515'.782-dc20    93-36911
CIP

DISCLAIMER OF WARRANTY AND LIMITS OF LIABILITY: The author of this


book has used his best efforts in preparing this material. These efforts include the devel-
opment, research, and testing of the theories and programs to determine their effective-
ness. NEITHER THE AUTHOR NOR THE PUBLISHER MAKE WARRANTIES OF
ANY KIND, EXPRESS OR IMPLIED, WITH REGARD TO THESE PROGRAMS OR
THE DOCUMENTATION CONTAINED IN THIS BOOK, INCLUDING WITHOUT
LIMITATION WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PAR-
TICULAR PURPOSE. NO LIABILITY IS ACCEPTED IN ANY EVENT FOR ANY
DAMAGES, INCLUDING INCIDENTAL OR CONSEQUENTIAL DAMAGES, LOST
PROFITS, COSTS OF LOST DATA OR PROGRAM MATERIAL, OR OTHERWISE IN
CONNECTION WITH OR ARISING OUT OF THE FURNISHING, PERFORMANCE,
OR USE OF THE PROGRAMS IN THIS BOOK.

This book contains information obtained from authentic and highly regarded sources.
Reprinted material is quoted with permission, and sources are indicated. A wide variety
of references are listed. Reasonable efforts have been made to publish reliable data and
information, but the author and the publisher cannot assume responsibility for the validity
of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording,
or by any information storage or retrieval system, without prior permission in writing
from the publisher.

CRC Press, Inc.'s consent does not extend to copying for general distribution, for pro-
motion, for creating new works, or for resale. Specific permission must be obtained in
writing from CRC Press for such copying.

Direct all inquiries to CRC Press, Inc., 2000 Corporate Blvd., N.W., Boca Raton, Florida,
33431.

© 1994 by CRC Press, Inc.

No claim to original U.S. government works

International Standard Book Number 0-8493-8273-4

Library of Congress Card Number 93-36911

Printed in the United States of America  1 2 3 4 5 6 7 8 9 0

Printed on acid-free paper


Contents

Preface vii
1 What are Distributions? 1
1.1 Generalized functions and test functions 1
1.2 Examples of distributions 5
1.3 What good are distributions? 7
1.4 Problems 9
2 The Calculus of Distributions 11
2.1 Functions as distributions 11
2.2 Operations on distributions 12
2.3 Adjoint identities 17
2.4 Consistency of derivatives 19
2.5 Distributional solutions of differential equations 21
2.6 Problems 23
3 Fourier Transforms 26
3.1 From Fourier series to Fourier integrals 26
3.2 The Schwartz class S 29
3.3 Properties of the Fourier transform on S 30
3.4 The Fourier inversion formula on S 35
3.5 The Fourier transform of a Gaussian 38
3.6 Problems 41
4 Fourier Transforms of Tempered Distributions 43
4.1 The definitions 43
4.2 Examples 46
4.3 Convolutions with tempered distributions 52
4.4 Problems 54


5 Solving Partial Differential Equations 56


5.1 The Laplace equation 56
5.2 The heat equation 60
5.3 The wave equation 62
5.4 Schrodinger's equation and quantum mechanics 67
5.5 Problems 69
6 The Structure of Distributions 73
6.1 The support of a distribution 73
6.2 Structure theorems 77
6.3 Distributions with point support 80
6.4 Positive distributions 83
6.5 Continuity of distributions 85
6.6 Approximation by test functions 92
6.7 Local theory of distributions 95
6.8 Problems 102
7 Fourier Analysis 106
7.1 The Riemann-Lebesgue lemma 106
7.2 Paley-Wiener theorems 112
7.3 The Poisson summation formula 117
7.4 Probability measures and positive definite functions 122
7.5 The Heisenberg uncertainty principle 126
7.6 Hermite functions 131
7.7 Radial Fourier transforms and Bessel functions 135
7.8 Haar functions and wavelets 141
7.9 Problems 148
8 Sobolev Theory and Microlocal Analysis 153
8.1 Sobolev inequalities 153
8.2 Sobolev spaces 162
8.3 Elliptic partial differential equations (constant coefficients) 166
8.4 Pseudodifferential operators 175
8.5 Hyperbolic operators 181
8.6 The wave front set 189
8.7 Microlocal analysis of singularities 198
8.8 Problems 202
Suggestions for Further Reading 207
Index 209
Preface

Distribution theory was one of the two great revolutions in mathematical anal-
ysis in the 20th century. It can be thought of as the completion of differential
calculus, just as the other great revolution, measure theory (or Lebesgue integra-
tion theory), can be thought of as the completion of integral calculus. There are
many parallels between the two revolutions. Both were created by young, highly
individualistic French mathematicians (Henri Lebesgue and Laurent Schwartz).
Both were rapidly assimilated by the mathematical community, and opened up
new worlds of mathematical development. Both forced a complete rethinking
of all mathematical analysis that had come before, and basically altered the
nature of the questions that mathematical analysts asked. (This is the reason
I feel justified in using the word "revolution" to describe them.) But there
are also differences. When Lebesgue introduced measure theory (circa 1903),
it almost came like a bolt from the blue. Although the older integration the-
ory of Riemann was incomplete-there were many functions that did not have
integrals-it was almost impossible to detect this incompleteness from within,
because the non-integrable functions really appeared to have no well-defined in-
tegral. As evidence that the mathematical community felt perfectly comfortable
with Riemann's integration theory, one can look at Hilbert's famous list (dating
to 1900) of 23 unsolved problems that he thought would shape the direction of
mathematical research in the 20th century. Nowhere is there a hint that com-
pleting integration theory was a worthwhile goal. On the other hand, a number
of his problems do foreshadow the developments that led to distribution theory
(circa 1945). When Laurent Schwartz came out with his theory, he addressed
problems that were of current interest, and he was able to replace a number of
more complicated theories that had been developed earlier in an attempt to deal
with the same issues.
From the point of view of this work, the most important difference is that in
retrospect, measure theory still looks hard, but distribution theory looks easy.
Because it is relatively easy, distribution theory should be accessible to a wide
audience, including users of mathematics and mathematicians who specialize in
other fields. The techniques of distribution theory can be used, confidently and
effectively-just like the techniques of calculus are used-without a complete
knowledge of the formal mathematical foundations of the subject. The aim of
this book is thus very similar to the aim of a typical calculus textbook: to explain
the techniques of the theory with precision, to provide an intuitive discussion
of the ideas that underlie the techniques, and to offer a selection of problems
applying the techniques.
Because the Lebesgue theory of integration preceded distribution theory his-
torically, and is required for the rigorous mathematical development of the the-
ory, it might be thought that a knowledge of the Lebesgue theory would have
to be a prerequisite for studying distribution theory. I do not believe that this is
true, and I hope this book makes a good case for my point of view. When you
see an integral sign in this book, you are free to interpret it in the sense of any
integration theory you have learned. If you have studied the Lebesgue theory
in any form, then of course think of the integrals as Lebesgue integrals. But if
not, don't worry about it. Let the integral mean what you think it means.
Distribution theory is a powerful tool, but it becomes an even more pow-
erful tool when it works in conjunction with the theory of Fourier transforms.
One of the main areas of applications is to the theory of partial differential
equations. These three theories form the main themes of this book. The first
two chapters motivate and introduce the basic concepts and computational tech-
niques of distribution theory. Chapters three and four do the same for Fourier
transforms. Chapter five gives some important and substantial applications to
particular partial differential equations that arise in mathematical physics. These
five chapters, part I of the book, were written with the goal of getting to the
point as quickly as possible. They have been used as a text for a portion of a
course in applied mathematics at Cornell University for more than 10 years.
The last three chapters, part II of the book, return to the three themes in greater
detail, filling in topics that were left aside in the rapid development of part I,
but which are of great interest in and of themselves, and point toward further
applications. Chapter six returns to distribution theory, explaining the notion of
continuity, and giving the important structure theorems. Chapter seven covers
Fourier analysis. In addition to standard material, I have included some topics of
recent origin, such as quasicrystals and wavelets. Finally, Chapter eight returns
to partial differential equations, giving an introduction to the modern theory
of general linear equations. Here the reader will meet Sobolev spaces, a priori
estimates, equations of elliptic and hyperbolic type, pseudodifferential operators,
wave front sets, and the ideology known as microlocal analysis. Part II was
written for this book and deals with somewhat more abstract material. It was
not designed for use as a textbook, but more to satisfy the curiosity of those
readers of part I who want to learn in greater depth about the material. I also
hope it will serve as an appetizer for readers who will go on to study these
topics in greater detail.
The prerequisites for reading this book are multidimensional calculus and an
introduction to complex analysis. A reader who has not seen any complex
analysis will be able to get something out of this book, but will have to accept that
there will be a few mystifying passages. A solid background in multidimen-
sional calculus is essential, however, especially in part II.
Recently, when I was shopping at one of my favorite markets, I met a graduate
of Cornell (who had not been in any of my courses). He asked me what I was
doing, and when I said I was writing this book, he asked sarcastically "do you
guys enjoy writing them as much as we enjoy reading them?" I don't know
what other books he had in mind, but in this case I can say quite honestly that
I very much enjoyed writing it. I hope you enjoy reading it.

Acknowledgments
I am grateful to John Hubbard, Steve Krantz, and Wayne Yuhasz for encouraging
me to write this book, and to June Meyermann for the excellent job of typesetting
the book in LaTeX.

Ithaca, NY
October 1993
What are Distributions?

1.1 Generalized functions and test functions


You have often been asked to consider a function f(x) as representing the value
of a physical variable at a particular point x in space (or space-time). But is
this a realistic thing to do? Let us borrow a perspective from quantum theory
and ask: What can you measure?
Suppose that f(x) represents temperature at a point x in a room (or, if you
prefer, let f(x, t) be temperature at point x and time t). You can measure
temperature with a thermometer, placing the bulb of the thermometer at the
point x. Unlike the point, the bulb of the thermometer has a nonzero size, so
what you measure is more an average temperature over a small region of space
(again, if you think of temperature as varying with time also, then you are also
averaging over a small time interval preceding the time t when you actually
read the thermometer). Now there is no reason to believe that the average is
"fair" or "unbiased." In mathematical terms, a thermometer measures

∫ f(x)φ(x) dx

where φ(x) depends on the nature of the thermometer and where you place
it; φ(x) will tend to be "concentrated" near the location of the thermometer
bulb and will be nearly zero once you are sufficiently far away from the bulb.
To say this is an "average" is to require

φ(x) ≥ 0 everywhere, and

∫ φ(x) dx = 1 (the integral is taken over all space).

However, do not let these conditions distract you. With two thermometers you
can measure

∫ f(x)φ₁(x) dx  and  ∫ f(x)φ₂(x) dx

and by subtracting you can deduce the value of ∫ f(x)[φ₁(x) − φ₂(x)] dx. Note
that φ₁(x) − φ₂(x) is no longer nonnegative. By doing more arithmetic you
can even compute ∫ f(x)(a₁φ₁(x) − a₂φ₂(x)) dx for constants a₁ and a₂, and
a₁φ₁(x) − a₂φ₂(x) may have any finite value for its integral.
The above discussion is meant to convince you that it is often more meaningful
physically to discuss quantities like ∫ f(x)φ(x) dx than the value of f at a
particular point x. The secret of successful mathematics is to eliminate all
unnecessary and irrelevant information-a mathematician would not ask what
color is the thermometer (neither would an engineer, I hope). Since we have
decided that the value of f at x is essentially impossible to measure, let's stop
requiring our functions to have a value at x. That means we are considering a
larger class of objects. Call them generalized functions. What we will require
of a generalized function is that something akin to ∫ f(x)φ(x) dx exist for a
suitable choice of averaging functions φ (call them test functions). Let's write
(f, φ) for this something. It should be a real number (or a complex number if
we wish to consider complex-valued test functions and generalized functions).
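The thermometer picture can be tried numerically. The sketch below is mine, not the book's: the temperature profile f and the triangle-shaped weight φ are invented for illustration. It computes a "reading" ∫ f(x)φ(x) dx by a Riemann sum and compares it with the point value f(x₀):

```python
import math

# Hypothetical temperature profile (invented for illustration): smooth, slowly varying.
def f(x):
    return 20.0 + math.sin(x)

# Triangle-shaped weight centered at x0 with half-width h: nonnegative, area 1.
def make_phi(x0, h):
    def phi(x):
        return max(0.0, 1.0 - abs(x - x0) / h) / h
    return phi

def pairing(f, phi, a, b, n=100000):
    # midpoint Riemann sum approximating the integral of f(x) * phi(x) over [a, b]
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * phi(a + (i + 0.5) * dx) for i in range(n)) * dx

x0, h = 1.0, 0.05
phi = make_phi(x0, h)
reading = pairing(f, phi, x0 - h, x0 + h)
# The reading is an average near x0: close to, but not exactly, the point value f(x0).
print(reading, f(x0))
```

The reading differs from f(x₀) by a term of order h², which is exactly the "bias" of the averaging the text describes.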
What other properties do we want? Let's recall some arithmetic we did before,
namely

a₁ ∫ f(x)φ₁(x) dx − a₂ ∫ f(x)φ₂(x) dx = ∫ f(x)(a₁φ₁(x) − a₂φ₂(x)) dx.

We want to be able to do the same sort of thing with generalized functions,
so we should require a₁(f, φ₁) − a₂(f, φ₂) = (f, a₁φ₁ − a₂φ₂). This property
is called linearity. The minus sign is a bit silly, since we can get rid of it by
replacing a₂ by −a₂. Doing this we obtain the condition

(f, a₁φ₁ + a₂φ₂) = a₁(f, φ₁) + a₂(f, φ₂).

Notice we have tacitly assumed that if φ₁, φ₂ are test functions then a₁φ₁ + a₂φ₂
is also a test function. I hope these conditions look familiar to you-if not,
please read the introductory chapter of any book on linear algebra.
You have almost seen the entire definition of generalized functions. All you
are lacking is a description of what constitutes a test function and one technical
hypothesis of continuity. Do not worry about continuity-it will always be
satisfied by anything you can construct (wise guys who like using the axiom of
choice will have to worry about it, along with wolves under the bed, etc.).
So, now, what are the test functions? There are actually many possible
choices for the collection of test functions, leading to many different theories of
generalized functions. I will describe the space called 𝒟, leading to the theory
of distributions. Later we will meet other spaces of test functions.
The underlying point set will be n-dimensional space ℝⁿ (points x stand for
x = (x₁, …, xₙ)) or even a subset Ω ⊂ ℝⁿ that is open. Recall that this means
every point x ∈ Ω is surrounded by a ball {y : |x − y| < ε} contained in Ω,
where ε depends on x, and

|x − y| = √((x₁ − y₁)² + ⋯ + (xₙ − yₙ)²).

Of course Ω = ℝⁿ is open, as is every open ball {y : |x − y| < r}. Intuitively,
an open set is just a union of open balls.
The class of test functions 𝒟(Ω) consists of all functions φ(x) defined in Ω,
vanishing outside a bounded subset of Ω that stays away from the boundary of
Ω, and such that all partial derivatives of all orders of φ are continuous.
For example, if n = 1 and Ω = {0 < x < 1}, Figure 1.1 shows three graphs:
the first is in 𝒟, and the other two are not.

FIGURE 1.1
Three graphs: a smooth bump vanishing near both endpoints (in 𝒟), a function
that does not vanish near the endpoint 0 (not in 𝒟), and a piecewise linear
function with corners (not in 𝒟).

The second example fails because it does not vanish near the boundary point 0,
and the third example fails because it is not differentiable at three points. To
actually write down a formula for a function in 𝒟 is more difficult. Notice that
no analytic function (other than φ ≡ 0) can be in 𝒟 because of the vanishing
requirement. Thus any formula for φ must be given "in pieces." For example,
in ℝ¹

ψ(x) = e^{−1/x²}  if x > 0,
ψ(x) = 0          if x ≤ 0
has continuous derivatives of all orders:

(d/dx)^k e^{−1/x²} = (polynomial in 1/x) e^{−1/x²}

and as x → 0 this approaches zero, since the zero of e^{−1/x²} beats out the pole
of the polynomial in 1/x. Thus φ(x) = ψ(x)ψ(1 − x) has continuous derivatives of all
orders (we abbreviate this by saying φ is C∞) and vanishes outside 0 < x < 1,
so φ ∈ 𝒟(ℝ¹); in fact, φ ∈ 𝒟(a < x < b) provided a < 0 and b > 1 (why not
a ≤ 0 and b ≥ 1?). Once you have one example you can manufacture more by

1. moving it about: φ(x + x₀)

2. changing the vertical scale: aφ(x)
3. changing the horizontal scale: φ(ax)
4. taking linear combinations: a₁φ₁(x) + a₂φ₂(x) if φ₁, φ₂ ∈ 𝒟
5. taking products: Φ(x₁, …, xₙ) = φ(x₁)φ(x₂)⋯φ(xₙ) to obtain examples in higher dimensions.
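The bump ψ(x)ψ(1 − x) and the five operations are easy to see in code. A minimal sketch (mine, not the book's; all helper names are invented):

```python
import math

# psi(x) = e^{-1/x^2} for x > 0, and 0 for x <= 0: smooth everywhere, including 0.
def psi(x):
    return math.exp(-1.0 / x**2) if x > 0 else 0.0

# phi(x) = psi(x) * psi(1 - x): smooth, positive on (0, 1), zero outside.
def phi(x):
    return psi(x) * psi(1.0 - x)

# The five operations applied to phi:
shifted   = lambda x: phi(x + 0.25)                 # 1. moving it about
stretched = lambda x: 3.0 * phi(x)                  # 2. changing the vertical scale
squeezed  = lambda x: phi(2.0 * x)                  # 3. changing the horizontal scale
combo     = lambda x: phi(x) + 2.0 * phi(2.0 * x)   # 4. a linear combination
product2d = lambda x, y: phi(x) * phi(y)            # 5. a test function on the plane

print(phi(0.5), phi(-0.1), phi(1.2))
```

Evaluating phi at points inside and outside (0, 1) confirms the support property: positive inside, exactly zero outside.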

Exercise: Draw the pictures associated with operations 1-4.


These considerations should convince you that you can make a test function
in 𝒟 do anything you can draw a picture of a smooth function doing. You can
make it take on prescribed values at a finite set of points, make it vanish on any
open set, and even take on a constant value (say 1) on a bounded open set away
from the boundary (this requires a little more work).
O.K. That is what 𝒟(Ω) is. We can now define the class of distributions on Ω,
denoted 𝒟′(Ω), to be all continuous linear functionals on 𝒟(Ω). By functional
I mean a real- (or complex-) valued function on 𝒟(Ω), written (f, φ). By linear
I mean it satisfies the identity

(f, a₁φ₁ + a₂φ₂) = a₁(f, φ₁) + a₂(f, φ₂).

(Yes, Virginia, a₁φ₁ + a₂φ₂ is in 𝒟(Ω) if φ₁ and φ₂ are in 𝒟(Ω).) By contin-
uous I mean that if φ₁ is close enough to φ then (f, φ₁) is close to (f, φ)-
the exact definition can wait until later. Continuity has an intuitive physical
interpretation-you want to be sure that different thermometers give approxi-
mately the same reading provided you control the manufacturing process ade-
quately. Put another way, when you repeat an experiment you do not want to
get a different answer because small experimental errors get magnified. Now,
whereas discontinuous functions abound, linear functionals all tend to be con-
tinuous. This happy fact deserves a bit of explanation. Fix φ and φ₁ and call
the difference φ₁ − φ = φ₂. Then φ₁ = φ + φ₂. Now perhaps (f, φ) and (f, φ₁)
are far apart. So what? Move φ₁ closer to φ by considering φ + tφ₂ and let t
get small. Then (f, φ + tφ₂) = (f, φ) + t(f, φ₂) by linearity, and as t gets small
this gets close to (f, φ). This does not constitute a proof of continuity, since
the definition requires more "uniformity," but it should indicate that a certain
amount of continuity is built into linearity. At any rate, all linear functionals on
𝒟(Ω) you will ever encounter will be continuous.

1.2 Examples of distributions


Now for some examples. Any function gives rise to a distribution by setting
(f, φ) = ∫_Ω f(x)φ(x) dx, at least if the integral can be defined. This is certainly
true if f is continuous, but actually more general functions will work. Depending
on what theory of integration you are using, you may make f discontinuous
and even unbounded, provided the improper integral converges absolutely. For
instance, ∫_{|x|≤r} |x|^{−t} dx converges for t < n (in n dimensions) and diverges
for t ≥ n. Thus the function |x|^{−t} for t < n gives rise to the distribution in
𝒟′(ℝⁿ) given by (f, φ) = ∫_{ℝⁿ} φ(x)|x|^{−t} dx (the actual range of integration is bounded
since φ vanishes outside a bounded set).
A different sort of example is the Dirac δ-function: (δ, φ) = φ(0). In this
case we have to check the linearity property, but it is trivial to verify:

(δ, a₁φ₁ + a₂φ₂) = a₁φ₁(0) + a₂φ₂(0) = a₁(δ, φ₁) + a₂(δ, φ₂).

An even wilder example is δ′ (now in 𝒟′(ℝ¹)) given by (δ′, φ) = −φ′(0).

Exercise: Verify linearity.
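In code, a distribution can be modeled as a function that eats a test function and returns a number. This sketch is my illustration, not the book's formalism; the sample "test functions" are ad hoc and only finitely differentiable. It checks the linearity of δ and approximates (δ′, φ) = −φ′(0) by a central difference:

```python
# A distribution modeled as a Python function: it takes a test function phi
# and returns a number.
def delta(phi):
    return phi(0.0)                       # (delta, phi) = phi(0)

def delta_prime(phi, h=1e-5):
    return -(phi(h) - phi(-h)) / (2 * h)  # minus a central difference for phi'(0)

phi1 = lambda x: (1.0 - x**2) ** 3 if abs(x) < 1 else 0.0   # bump-like
phi2 = lambda x: x * phi1(x)                                # phi2'(0) = 1

# Linearity of delta: (delta, a1*phi1 + a2*phi2) = a1*(delta, phi1) + a2*(delta, phi2)
a1, a2 = 2.0, -3.0
lhs = delta(lambda x: a1 * phi1(x) + a2 * phi2(x))
rhs = a1 * delta(phi1) + a2 * delta(phi2)
print(lhs, rhs, delta_prime(phi2))   # delta_prime(phi2) is near -1
```

Linearity holds exactly because evaluation at a point is linear; the δ′ value is approximate only because the derivative is computed numerically.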


These examples demand a closer look, some pictures, and an explanation of
the minus sign. Consider any function fₖ(x) (for simplicity we work in one
dimension) that satisfies the conditions

1. fₖ(x) = 0 unless |x| ≤ 1/k

2. ∫_{−1/k}^{1/k} fₖ(x) dx = 1.

The simplest examples are the box function of height k/2 on −1/k ≤ x ≤ 1/k,
or the triangle function of height k on the same interval (Figure 1.2).

FIGURE 1.2
A box of height k/2 and a triangle of height k, each supported on
−1/k ≤ x ≤ 1/k and each with area 1.

but we may want to take fₖ smoother, even in 𝒟 (Figure 1.3).

FIGURE 1.3
A smooth bump supported on −1/k ≤ x ≤ 1/k with area 1.

Now the distribution

(fₖ, φ) = ∫ fₖ(x)φ(x) dx

is an average of φ near zero, so that if φ does not vary much in −1/k ≤ x ≤
1/k, it is close to φ(0). Certainly in the limit as k → ∞ we get (fₖ, φ) →
φ(0) = (δ, φ). Thus we may think of δ as limₖ→∞ fₖ. (Of course, pointwise

limₖ→∞ fₖ(x) = 0 if x ≠ 0,  +∞ if x = 0

for suitable choice of fₖ, but this is nonsense, showing the futility of pointwise
thinking.)
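The limit (fₖ, φ) → φ(0) can be checked numerically. In this sketch (mine, not the book's; cos is just a stand-in for a test function near 0), fₖ is the box of height k/2 on [−1/k, 1/k]:

```python
import math

# The box function f_k has height k/2 on [-1/k, 1/k]; its pairing with a test
# function is (k/2) times the integral of phi over [-1/k, 1/k], here computed
# by a midpoint Riemann sum.
def pair_box(phi, k, n=20000):
    a, b = -1.0 / k, 1.0 / k
    dx = (b - a) / n
    return (k / 2.0) * sum(phi(a + (i + 0.5) * dx) for i in range(n)) * dx

phi = lambda x: math.cos(x)    # stand-in for a test function; phi(0) = 1
vals = [pair_box(phi, k) for k in (1, 10, 100)]
print(vals)    # approaches phi(0) = 1 as k grows
```

As k grows the box samples φ on a shrinking interval, so the average converges to the point value φ(0), exactly as in the text.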
Now suppose we first differentiate fₖ and then let k → ∞? If fₖ is the triangle
function of height k, then fₖ′ is a pair of boxes, of height +k² on −1/k < x < 0
and −k² on 0 < x < 1/k (Figure 1.4),

FIGURE 1.4
The triangle fₖ and its derivative fₖ′, a positive box followed by a negative box.

and

(fₖ′, φ) ≈ [φ(−1/2k) − φ(1/2k)] / (1/k)

(the points −1/2k and 1/2k are the midpoints of the intervals and the factor (1/k)^{−1}
= k is the area), which approaches −φ′(0) as k → ∞. We obtain the same
answer formally by integrating by parts:

∫ fₖ′(x)φ(x) dx = − ∫ fₖ(x)φ′(x) dx = −(fₖ, φ′) → −φ′(0)

as k → ∞. Here we might assume that fₖ is continuously differentiable. Note
that there are no boundary terms in the integration-by-parts formula because φ
vanishes for large values of x. Thus if fₖ → δ then fₖ′ → δ′, which justifies
the notation of δ′ as the derivative of δ.
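The integration-by-parts computation can also be checked numerically: with box functions fₖ as before, the pairing −(fₖ, φ′) should approach −φ′(0). A sketch (mine; sin is a stand-in test function with φ′(0) = 1):

```python
import math

# -(f_k, phi') should approach -phi'(0) = (delta', phi), with f_k the box of
# height k/2 on [-1/k, 1/k]; the pairing is a midpoint Riemann sum as before.
def pair_box(g, k, n=20000):
    a, b = -1.0 / k, 1.0 / k
    dx = (b - a) / n
    return (k / 2.0) * sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

phi_deriv = lambda x: math.cos(x)   # derivative of the stand-in phi(x) = sin(x)
vals = [-pair_box(phi_deriv, k) for k in (1, 10, 100)]
print(vals)    # approaches -phi'(0) = -1 as k grows
```

Note that only φ′, never fₖ′, appears in the computation: that is the point of the adjoint formula, and it is why the definition of δ′ makes sense even though fₖ need not be differentiable.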

1.3 What good are distributions?


Enough examples for now. We will have more later. Let us pause to consider
the following question: Why this particular (perhaps peculiar) choice of test
functions 𝒟? The answer is not easy. In fact, the theory of generalized func-
tions was in use for almost twenty years before Laurent Schwartz proposed the
definition of 𝒟. So it is not possible to use physical or intuitive grounds to say
𝒟 is the only "natural" class of test functions. However, it does yield an elegant
and useful theory-so that after the fact we may thank Laurent Schwartz for
his brilliant insight. You can get some feel for what is going on if you observe
that the smoother you require the test functions φ to be, the "rougher" you can
allow the generalized functions f to be. To define (δ, φ), φ must be at least
continuous, and to define (δ′, φ), you must require φ to be differentiable. Later
I will show you how to define derivatives for any distribution-the key point
in the definition will be the ability to differentiate the test functions.
The requirement that test functions in 𝒟 vanish outside a bounded set and
near the boundary of Ω is less crucial. It allows distributions to "grow arbitrarily
rapidly" as you approach the boundary (or infinity). Later we will consider a
smaller class of distributions, called tempered distributions, which cannot grow
as rapidly at infinity, by considering a larger class of test functions that have
weaker vanishing properties.
Another question you should be asking at this point is: What good are dis-
tributions? Let me give a hint of one answer. Differential equations are used
to construct models of reality. Sometimes the reality we are modeling suggests
that some solutions of the differential equation need not be differentiable! For
example, the "vibrating string" equation

∂²u(x, t)/∂t² = k² ∂²u(x, t)/∂x²

has a solution u(x, t) = f(x − kt) for any function of one variable f, which
has the physical interpretation of a "traveling wave" with "shape" f(x) moving
at velocity k.

FIGURE 1.5
The initial shape f(x) = u(x, 0) and its translate f(x − k) = u(x, 1), one time
unit later.

There is no physical reason for the "shape" to be differentiable, but if it is
not, the differential equation is not satisfied at some points. But we do not
want to throw away physically meaningful solutions because of technicalities.
You might be tempted therefore to think that if a function satisfies a differential
equation except for some points where it is not differentiable, it should be
admitted as a solution. The next example shows that such a simplistic idea does
not work.
Consider Laplace's equation Δu = 0, where Δ (called the Laplacian and sometimes
written ∇²) is defined by

Δu = ∂²u/∂x₁² + ∂²u/∂x₂²  in ℝ²

and

Δu = ∂²u/∂x₁² + ∂²u/∂x₂² + ∂²u/∂x₃²  in ℝ³.

A solution to Laplace's equation has the physical interpretation of a potential
in a region without charge. A straightforward (albeit long) calculation shows
that u(x₁, x₂) = log(x₁² + x₂²) and u(x₁, x₂, x₃) = (x₁² + x₂² + x₃²)^{−1/2} are
solutions at every point except the origin, where they fail to be differentiable.
But they must be rejected on physical grounds because potentials away from
charges must be continuous (even smooth), while these functions have poles.
The moral of the story is that distribution theory allows you to distinguish
between these two cases. In the first case u(x, t) = f(x − kt), as a distribution

in ℝ², satisfies the vibrating string equation. But u(x₁, x₂) = log(x₁² + x₂²) as a
distribution in ℝ² does not satisfy the Laplace equation. In fact

∂²u/∂x₁² + ∂²u/∂x₂² = cδ

(and in ℝ³ similarly

∂²v/∂x₁² + ∂²v/∂x₂² + ∂²v/∂x₃² = c₁δ

for v(x₁, x₂, x₃) = (x₁² + x₂² + x₃²)^{−1/2}) for certain constants c and c₁. These
facts are not just curiosities-they are useful in solving Poisson's equation

Δu = f,

as we shall see.
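As a check that the δ term sits only at the origin, here is a sketch of the "straightforward (albeit long) calculation" for the two-dimensional case, showing u(x₁, x₂) = log(x₁² + x₂²) is harmonic away from (0, 0):

```latex
u = \log(x_1^2 + x_2^2), \qquad
\frac{\partial u}{\partial x_1} = \frac{2x_1}{x_1^2 + x_2^2}, \qquad
\frac{\partial^2 u}{\partial x_1^2}
  = \frac{2(x_1^2 + x_2^2) - 4x_1^2}{(x_1^2 + x_2^2)^2}
  = \frac{2(x_2^2 - x_1^2)}{(x_1^2 + x_2^2)^2}.
```

Interchanging x₁ and x₂ gives ∂²u/∂x₂² = 2(x₁² − x₂²)/(x₁² + x₂²)², so the two second derivatives cancel wherever (x₁, x₂) ≠ (0, 0); the constant c measures the singularity concentrated at the origin.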

1.4 Problems
1. Let H be the distribution in 𝒟′(ℝ¹) defined by the Heaviside function

H(x) = 1 if x > 0,  0 if x ≤ 0.

Show that if hₙ(x) are differentiable functions such that ∫ hₙ(x)φ(x) dx
→ (H, φ) as n → ∞ for all φ ∈ 𝒟, then

∫ hₙ′(x)φ(x) dx → (δ, φ).

Does the answer change if you redefine H(x) to be 1 at x = 0?


2. Let fₙ be the distribution (fₙ, φ) = n(φ(1/n) − φ(−1/n)). What distri-
bution is limₙ→∞(fₙ, φ)?
3. For any a > 0 show that

(fₐ, φ) = ∫_{−∞}^{−a} φ(x)/|x| dx + ∫_{a}^{∞} φ(x)/|x| dx + ∫_{−a}^{a} (φ(x) − φ(0))/|x| dx

is a distribution. (Hint: the problem is the convergence of the last integral
near x = 0. Use the mean value theorem.)
4. Show (fₐ, φ) = ∫_{−∞}^{∞} φ(x)/|x| dx for any φ ∈ 𝒟(ℝ¹) for which φ(0) = 0.
5. What distribution is fₐ − f_b for a > b > 0?

6. For any a > 0 show that

(gₐ, φ) = ∫_{−∞}^{−a} φ(x)/x dx + ∫_{a}^{∞} φ(x)/x dx + ∫_{−a}^{a} (φ(x) − φ(0))/x dx

is a distribution.
7. Prove that gₐ does not depend on a, and the resulting distribution may
also be given by

lim_{a→0+} ( ∫_{−∞}^{−a} φ(x)/x dx + ∫_{a}^{∞} φ(x)/x dx ).

8. Suppose f is a distribution on ℝ¹. Show that (F, φ) = (f, φ_y), for φ ∈
𝒟(ℝ²), where φ_y(x) = φ(x, y) (here y is any fixed value), defines a
distribution on ℝ².
9. Suppose f is a distribution on ℝ¹. Show that (G, φ) = ∫_{−∞}^{∞} (f, φ_y) dy for
φ ∈ 𝒟(ℝ²) defines a distribution on ℝ². Is G the same as F in problem 8?
10. Show that

Σ_{n=1}^{∞} …

defines a distribution on ℝ¹. (Hint: Is it really an infinite series?)


11. Why doesn't (f, φ) = φ(0)² define a distribution?
12. Show that the line integral ∫_C φ ds along a smooth curve C in the plane
defines a distribution on ℝ². Do the same for surface integrals in ℝ³.
13. Working with complex-valued test functions and distributions, define f to
be real-valued if (f, φ) = \overline{(f, \overline{φ})}, the bar denoting complex
conjugation. Show that (Re f, φ) = (1/2)((f, φ) + \overline{(f, \overline{φ})})
and (Im f, φ) = (1/2i)((f, φ) − \overline{(f, \overline{φ})}) define real-valued
distributions, and f = Re f + i Im f.
14. Suppose f and g are distributions such that (f, φ) = 0 if and only if
(g, φ) = 0. Show that (f, φ) = c(g, φ) for some constant c.
The Calculus of Distributions

2.1 Functions as distributions


The next thing you have to learn is how to calculate with distributions. By
now you should have the idea that distributions are some sort of "function-like"
objects, some actually being functions and some, like δ and δ′, definitely not
functions. Things that you are used to doing with functions, such as adding
them or differentiating them, should be possible with distributions. You should
also expect to get the same answer if you regard a function as a function or as a
distribution.
Now let me backtrack a little and discuss more carefully the "identification"
of some functions with some distributions. If f is a function such that the
integral ∫ f(x)φ(x) dx exists for every test function (the integral must exist
in an absolutely convergent sense-∫ |f(x)φ(x)| dx must be finite-although it
may be an improper integral because f becomes infinite at some points), then

(f, φ) = ∫ f(x)φ(x) dx

defines a distribution. Do two different functions define the same distribution?
They may if they are equal except on a set that is so small it does not affect
the integral (in terms of Lebesgue integration theory, if they are equal almost
everywhere). For instance, if

f(x) = 1 if x = 0,  0 if x ≠ 0,

then ∫ f(x)φ(x) dx = 0 for all test functions, so f and the zero function define
the same distribution (call it the zero distribution). Intuitively this makes sense-
a function that vanishes everywhere except at the origin must also vanish at the
origin. Anything else is an experimental error. Already you see how distribution
theory forces you to overlook useless distinctions! But if the functions f₁ and
f₂ are "really" different, say f₁ > f₂ on some interval, then the distributions f₁
and f₂ are really different: (f₁, φ) > (f₂, φ) if φ is a nonnegative test function
vanishing outside the interval.
Now for some notation. A function f(x) defined on Ω for which ∫ f(x)φ(x) dx
is absolutely convergent for every φ ∈ 𝒟(Ω) is called locally integrable, de-
noted f ∈ L¹_loc(Ω). A criterion for local integrability is the finiteness of the
integral ∫_B |f(x)| dx over all sets B ⊂ Ω that are bounded and stay away
from the boundary of Ω. To explain this terminology let me mention that an
integrable function on Ω (f ∈ L¹(Ω)) is one for which ∫_Ω |f(x)| dx is finite.
Clearly an integrable function is locally integrable but not conversely. For ex-
ample, f(x) ≡ 1 is locally integrable but not integrable on ℝⁿ, and f(x) = 1/x
is locally integrable but not integrable on 0 < x < ∞.
For every locally integrable function f we associate a distribution, also de-
noted f, given by U,'P) = I f(x)'P(x)dx. We say that two locally integrable
functions are equivalent if as distributions they are equal. Thus by ignoring the
distinction between equivalent functions, we can regard the locally integrable
functions as a subset of the distributions, Lloc(O) ~ D'(O). This makes precise
the intuitive statement that the distributions are a set of objects larger than the
set of functions, justifying the term "generalized functions."
There are many interesting functions that are not locally integrable, for instance
1/x and 1/|x| on ℝ¹ (you should not be upset that I said recently that
1/x is locally integrable on 0 < x < ∞; the concept of local integrability
depends on the set Ω being considered, since Ω determines what "local" means). In
the problems you have already encountered distributions associated with these
functions; for example, a distribution satisfying

$$(f, \varphi) = \int \frac{\varphi(x)}{|x|}\,dx$$

whenever φ(0) = 0. However, this is an entirely different construction. You
should never expect that properties or operations concerning the function carry
over to the distribution, unless the function is locally integrable. For instance,
you have seen that more than one distribution corresponds to 1/|x|; in fact, it is
impossible to single out one as more "natural" than another. Also, although the
function 1/|x| is nonnegative, none of the associated distributions are nonnegative
(a distribution f is nonnegative if (f, φ) ≥ 0 for every test function φ that
satisfies φ(x) ≥ 0 for all x ∈ Ω). In contrast, a nonnegative locally integrable
function is nonnegative as a distribution. (Exercise: Verify this.) Thus you see
that this is an aspect of distribution theory where it is very easy to make mistakes.

2.2 Operations on distributions


Now we are ready to move forward. We have L¹_loc(Ω) ⊆ D′(Ω) and hence
D(Ω) ⊆ D′(Ω) since every test function is integrable. Any property or operation

on test functions may be extended to distributions. Now why did I say "test
functions" rather than "locally integrable functions"? Simply because there are
more things you can do with them: for example, differentiation. How does
the extension to distributions work? There are essentially two ways to proceed.
Both always give the same result.

At this point I will make the convention that operation means linear operator
(or linear transformation); that is, for any φ ∈ D, Tφ ∈ D and

$$T(a\varphi + b\psi) = aT\varphi + bT\psi.$$

At times we will want to consider operations T for which Tφ may not yield functions
in D, but we will always want linearity to hold.
The first way to proceed is to approximate an arbitrary distribution by test
functions. That this is always possible is a remarkable and basic fact of the
theory. We say that a sequence of distributions f₁, f₂, ... converges to the
distribution f if the sequence (fₙ, φ) converges to the number (f, φ) for all
test functions. Write this fₙ → f, or say simply that the fₙ approximate f.
Incidentally, you can use the limiting process to construct distributions. If {fₙ}
is a sequence of distributions for which the limit as n → ∞ of (fₙ, φ) exists
for all test functions φ, then (f, φ) = limₙ→∞ (fₙ, φ) defines a distribution and
of course fₙ → f.

THEOREM 2.2.1
Given any distribution f ∈ D′(Ω), there exists a sequence {φₙ} of test functions
such that φₙ → f as distributions.

Accepting this as true, if T is any operation defined on test functions we may
extend T to distributions as follows:

Meta-Definition 1: Tf = limₙ→∞ Tφₙ if {φₙ} is a sequence of test functions
such that f = limₙ→∞ φₙ.

Of course for this to make sense, limₙ→∞ Tφₙ must be independent of the
choice of the sequence {φₙ} approximating f. Fortunately, this is the case for
most interesting operators; it is a consequence of continuity.
To see how this definition works, let us consider two examples. In the first,
let T be a translation Tφ(x) = φ(x + y), y a fixed vector in ℝⁿ, and φ ∈
D(ℝⁿ). Now let f ∈ D′(ℝⁿ) be a distribution. How shall we define Tf? We
must say what (Tf, φ) is for an arbitrary test function φ. So we approximate
f = limₙ→∞ φₙ for φₙ ∈ D(ℝⁿ), which means

$$(f,\varphi) = \lim_{n\to\infty}\int \varphi_n(x)\,\varphi(x)\,dx.$$



According to the meta-definition, Tf = limₙ→∞ Tφₙ, so

$$(Tf,\varphi) = \lim_{n\to\infty}\int T\varphi_n(x)\,\varphi(x)\,dx.$$

Since Tφₙ(x) = φₙ(x + y) we can write this as

$$\lim_{n\to\infty}\int \varphi_n(x+y)\,\varphi(x)\,dx.$$
If f is the Dirac δ-function f = δ, (δ, φ) = φ(0), we may take the sequence
φₙ to look like

[FIGURE 2.1: a tall spike concentrated on the interval (−1/n, 1/n)]

Then φₙ(x + y) as a function of x is concentrated near x = −y so that

$$\lim_{n\to\infty}\int \varphi_n(x+y)\,\varphi(x)\,dx = \varphi(-y).$$

Thus (Tδ, φ) = φ(−y). (This is sometimes called the δ-function at −y.)
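The limit above is easy to test numerically. In the sketch below (my own illustration, not the book's; it uses Gaussian bumps gₙ in place of the compactly supported spikes of Figure 2.1, and the particular function φ is an arbitrary smooth choice) the pairing ∫ gₙ(x + y)φ(x) dx indeed approaches φ(−y).

```python
import math

def g(n, x):
    # Gaussian approximate identity: total integral 1, width about 1/n
    return n / math.sqrt(math.pi) * math.exp(-(n * x) ** 2)

def pair_translated(n, phi, y, h=1e-4, half_width=0.5):
    # Riemann sum of g_n(x + y) phi(x); the bump is concentrated near
    # x = -y, so it suffices to sum over a window around -y
    lo = -y - half_width
    steps = int(2 * half_width / h)
    return sum(g(n, lo + i * h + y) * phi(lo + i * h) for i in range(steps)) * h

phi = lambda x: 1.0 / (1.0 + x * x)   # smooth; stands in for a test function
y = 0.5
for n in (5, 20, 100):
    print(n, pair_translated(n, phi, y))   # approaches phi(-0.5) = 0.8
```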
In general we can do a similar simplification by making a change of variable
x → x − y:

$$(Tf,\varphi) = \lim_{n\to\infty}\int \varphi_n(x+y)\,\varphi(x)\,dx
= \lim_{n\to\infty}\int \varphi_n(x)\,\varphi(x-y)\,dx
= (f, \varphi(x-y)).$$

Paraphrased: To translate a distribution through y, translate the test function
through −y.

If you are puzzled by the appearance of the minus sign, please go over the
above manipulations again. It will show up in later computations as well (also
look back at the δ-function at −y).

Next example: T = d/dx (for simplicity assume n = 1). This is the infinitesimal
version of translation. If we write T_y φ(x) = φ(x + y) for translation and
Iφ(x) = φ(x) for the identity, then

$$\frac{d}{dx} = \lim_{y\to 0}\frac{1}{y}(T_y - I).$$

So we expect to have

$$\left(\frac{d}{dx}f,\varphi\right) = \lim_{y\to 0}\frac{1}{y}\big((T_y f,\varphi) - (f,\varphi)\big).$$

Since

$$(T_y f,\varphi) = (f, T_{-y}\varphi)$$

(this was the first example) we have

$$\left(\frac{d}{dx}f,\varphi\right) = \lim_{y\to 0}\left(f, \frac{1}{y}(T_{-y}\varphi - \varphi)\right).$$

But

$$\lim_{y\to 0}\frac{1}{y}(T_{-y}\varphi - \varphi) = -\frac{d}{dx}\varphi$$

(there's that minus sign again). So it stands to reason that

$$\left(\frac{d}{dx}f,\varphi\right) = -\left(f, \frac{d}{dx}\varphi\right)$$

(notice that this defines a distribution since (d/dx)φ is a test function whenever
φ is).
Let's do that again more directly. Let φₙ → f, φₙ ∈ D(ℝ¹). Then (d/dx)φₙ →
(d/dx)f, meaning

$$\left(\frac{d}{dx}f,\varphi\right) = \lim_{n\to\infty}\int \frac{d}{dx}\varphi_n(x)\,\varphi(x)\,dx.$$

To simplify this, we integrate by parts (this is analogous to the change of variable
x → x − y in the first example):

$$\int \frac{d}{dx}\varphi_n(x)\,\varphi(x)\,dx = -\int \varphi_n(x)\,\frac{d}{dx}\varphi(x)\,dx$$

(there are no boundary terms because φ(x) = 0 outside a bounded set). Substituting
in we obtain

$$\left(\frac{d}{dx}f,\varphi\right) = -\lim_{n\to\infty}\int \varphi_n(x)\,\frac{d}{dx}\varphi(x)\,dx = -\left(f, \frac{d}{dx}\varphi\right).$$

Again it is instructive to look at the special case f = δ. Then we have φₙ → δ
where

[FIGURE 2.2: the graph of φₙ, with ∫ φₙ(x) dx = 1 and φₙ(0) = n; the shaded
area is concentrated near 0]

So

$$\int \varphi_n'(x)\,\varphi(x)\,dx \approx n\left(\varphi\left(-\tfrac{1}{2n}\right) - \varphi\left(\tfrac{1}{2n}\right)\right) \to -\varphi'(0) \quad\text{as } n\to\infty.$$
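Here is a numerical version of this limit (my own sketch; it replaces the piecewise-linear φₙ of Figure 2.2 by Gaussian bumps, whose derivatives have a simple closed form, and the smooth function below is an arbitrary illustrative choice).

```python
import math

def g_prime(n, x):
    # derivative of the Gaussian bump g_n(x) = (n / sqrt(pi)) exp(-(n x)^2)
    return -2.0 * n ** 3 * x / math.sqrt(math.pi) * math.exp(-(n * x) ** 2)

def pair(n, phi, h=1e-5, w=0.3):
    # Riemann sum of g_n'(x) phi(x) over [-w, w], where the bump lives
    steps = int(2 * w / h)
    return sum(g_prime(n, -w + i * h) * phi(-w + i * h) for i in range(steps)) * h

phi = lambda x: math.exp(-(x - 1.0) ** 2)   # phi'(0) = 2/e
for n in (10, 30, 100):
    print(n, pair(n, phi))   # approaches -phi'(0) = -2/e = -0.7357...
```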

Exercise: Verify that for f ∈ D′(Ω), Ω ⊆ ℝⁿ, we have

$$\left(\frac{\partial}{\partial x_k}f,\varphi\right) = -\left(f, \frac{\partial}{\partial x_k}\varphi\right).$$

2.3 Adjoint identities


Now let us turn to the second method of defining operations on distributions, the
method of adjoint identities. An adjoint identity looks like¹

$$\int T\psi(x)\,\varphi(x)\,dx = \int \psi(x)\,S\varphi(x)\,dx$$

for all φ, ψ ∈ D. Here T is the operator we are interested in, and S is an
operator that makes the identity true (in other words, there is in general no
recipe for S in terms of T). Suppose we are lucky enough to discover an
adjoint identity involving T. Then we may simply define (Tf, φ) = (f, Sφ).
The adjoint identity says this is true if f ∈ D, and since any distribution f is
the limit of test functions, φₙ → f, φₙ ∈ D, and we have (Tφₙ, φ) = (φₙ, Sφ)
by the adjoint identity, we obtain

$$(Tf,\varphi) = \lim_{n\to\infty}(T\varphi_n,\varphi) = \lim_{n\to\infty}(\varphi_n, S\varphi) = (f, S\varphi).$$
Thus, if an adjoint identity exists, then we must have


$$(Tf,\varphi) = (f, S\varphi)$$

by the first definition. This also shows that the answer does not depend on the
choice of the approximating sequence, something that was not obvious before.
In most cases we will use this second definition and completely by-pass the
first definition. But that means we first have to do the work of finding the adjoint
identity. Another point we must bear in mind is that we must have Sφ ∈ D
whenever φ ∈ D, otherwise (f, Sφ) may not be defined.
Now let us look at the two previous examples from this point of view. In both
cases the final answer gives us the adjoint identity if we specialize to f ∈ D:

$$\int T_y\psi(x)\,\varphi(x)\,dx = \int \psi(x)\,T_{-y}\varphi(x)\,dx$$

and

$$\int \frac{d}{dx}\psi(x)\,\varphi(x)\,dx = -\int \psi(x)\,\frac{d}{dx}\varphi(x)\,dx.$$
Of course both of these are easy to verify directly, the first by the change of

¹Note that in Hilbert space theory the adjoint identity involves complex conjugates, while in
distribution theory we do not take complex conjugates even when we deal with complex-valued
functions.

variable x → x − y and the second by integration by parts. From these we get
back

$$(T_y f,\varphi) = (f, T_{-y}\varphi)$$

and

$$\left(\frac{d}{dx}f,\varphi\right) = -\left(f, \frac{d}{dx}\varphi\right).$$

One more example: multiplication by a function m(x). The adjoint identity is
obvious:

$$\int \big(m(x)\psi(x)\big)\varphi(x)\,dx = \int \psi(x)\big(m(x)\varphi(x)\big)\,dx.$$

However, here we must be careful that m(x)φ(x) ∈ D whenever φ(x) ∈ D.
This is not always the case; what we need to assume is that m(x) is infinitely
differentiable (it is not necessary to assume anything about the "size" of m(x)
at ∞ or near the boundary of Ω). In that case we have

$$(m\cdot f, \varphi) = (f, m\cdot\varphi).$$

Note that if m is discontinuous at x = 0, say m(x) = sgn x, then m·δ cannot
be sensibly defined, for by rights we should have (m·δ, φ) = φ(0)m(0), but
there is no consistent way to define m(0). Thus the product of two arbitrary
distributions is undefined.
Another thing to keep in mind is that the product m·f for m ∈ C^∞ may not
be what you first think it is. For instance, let's compute m·δ′. We have

$$\begin{aligned}
(m\cdot\delta',\varphi) &= (\delta', m\cdot\varphi)\\
&= -\left(\delta, \frac{d}{dx}(m\varphi)\right)\\
&= -(\delta, m'\varphi) - (\delta, m\varphi')\\
&= -m'(0)\varphi(0) - m(0)\varphi'(0)\\
&= (-m'(0)\delta + m(0)\delta', \varphi)
\end{aligned}$$

so m·δ′ = m(0)δ′ − m′(0)δ (not just m(0)δ′ as you may have guessed).
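The formula m·δ′ = m(0)δ′ − m′(0)δ can be checked numerically by smoothing δ′, as in the sketch below (mine, not the book's; the Gaussian bumps and the particular m and φ are illustrative choices only). The pairing lands on −m′(0)φ(0) − m(0)φ′(0), not on the naive m(0)(δ′, φ).

```python
import math

def g_prime(n, x):
    # derivative of the Gaussian approximate identity (n / sqrt(pi)) e^{-(n x)^2}
    return -2.0 * n ** 3 * x / math.sqrt(math.pi) * math.exp(-(n * x) ** 2)

def pair(n, F, h=1e-5, w=0.3):
    steps = int(2 * w / h)
    return sum(g_prime(n, -w + i * h) * F(-w + i * h) for i in range(steps)) * h

m = lambda x: 2.0 + math.sin(x)             # m(0) = 2, m'(0) = 1
phi = lambda x: math.exp(-(x - 1.0) ** 2)   # phi(0) = 1/e, phi'(0) = 2/e

computed = pair(200, lambda x: m(x) * phi(x))              # smoothed (m delta', phi)
predicted = -1.0 * (1.0 / math.e) - 2.0 * (2.0 / math.e)   # -m'(0)phi(0) - m(0)phi'(0)
naive = -2.0 * (2.0 / math.e)                              # m(0)(delta', phi) alone
print(computed, predicted, naive)
```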
Now let us pause to consider the momentous consequences of what we have
done. We start with the class of locally integrable functions. We enlarge it
to the class of distributions. Suddenly everything has derivatives! Of course
if f ∈ L¹_loc is regarded as a distribution, then the distribution (∂/∂x_k)f need
not correspond to a locally integrable function. Nevertheless, something called
(∂/∂x_k)f exists, and we can perform operations on it and postpone the question
of "what exactly is it" until all computations are completed. This is the point of
view that has revolutionized the theory of differential equations. In this sense we
are justified in claiming that distribution theory is the completion of differential
calculus.

2.4 Consistency of derivatives


At this point a very natural question arises: if (∂/∂x_k)f exists in the ordinary
sense, does it agree with (∂/∂x_k)f in the distribution sense? Fortunately the
answer is yes, as long as (∂/∂x_k)f exists as a continuous derivative.

To see this we had better change notation temporarily. Suppose f(x) ∈ L¹_loc,
and let

$$g(x) = \lim_{h\to 0}\frac{1}{h}\big(f(x + he_k) - f(x)\big)$$

(e_k is the unit vector in the x_k-direction) exist and be continuous. Then

$$\int g(x)\varphi(x)\,dx = -\int f(x)\frac{\partial}{\partial x_k}\varphi(x)\,dx$$

for any test function by the integration-by-parts formula in the x_k-integral. However,
the distribution (∂/∂x_k)f is defined by

$$\left(\frac{\partial}{\partial x_k}f,\varphi\right) = -\left(f, \frac{\partial}{\partial x_k}\varphi\right) = -\int f(x)\frac{\partial}{\partial x_k}\varphi(x)\,dx$$

so (∂/∂x_k)f = g as distributions. Thus the two notions coincide.
You can go a little farther, as the following theorem asserts:

THEOREM 2.4.1
Let f(x) ∈ L¹_loc(ℝⁿ) and let g(x) as above exist and be continuous except at a
single point y, and let g ∈ L¹_loc(ℝⁿ) (define it arbitrarily at the point y). Then,
if n ≥ 2 the distribution derivative ∂f/∂x_k equals g, while if n = 1 this is true
if, in addition, f is continuous at y.

The hypothesis of continuity when n = 1 is needed because (d/dx) sgn x = 0
for all x ≠ 0 in the ordinary sense but (d/dx) sgn x = 2δ in the distribution
sense.
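The claim (d/dx) sgn x = 2δ is easy to confirm numerically: a sketch (mine, with an arbitrary smooth decaying function standing in for a test function) evaluates −(sgn, φ′) by a midpoint Riemann sum and compares it with 2φ(0).

```python
import math

phi = lambda x: math.exp(-(x - 0.3) ** 2)
phi_prime = lambda x: -2.0 * (x - 0.3) * phi(x)
sgn = lambda x: 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

# (d/dx sgn, phi) = -(sgn, phi'), as a midpoint Riemann sum over [-8, 8]
a, b, n = -8.0, 8.0, 160000
h = (b - a) / n
pairing = -sum(sgn(a + (i + 0.5) * h) * phi_prime(a + (i + 0.5) * h)
               for i in range(n)) * h
print(pairing, 2.0 * phi(0.0))   # both close to 2 e^{-0.09} = 1.8278...
```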
Let me indicate a brief proof. For simplicity, take y = 0. First for n = 1 we
have

$$\left(\frac{d}{dx}f,\varphi\right) = -\left(f, \frac{d}{dx}\varphi\right) = -\int_{-\infty}^{\infty} f(x)\varphi'(x)\,dx = -\lim_{\epsilon\to 0}\left(\int_{-\infty}^{-\epsilon} + \int_{\epsilon}^{\infty}\right) f(x)\varphi'(x)\,dx.$$

We have cut away a neighborhood of the exceptional point. Now we can apply
integration by parts:

$$\int_{-\infty}^{-\epsilon} f(x)\varphi'(x)\,dx = f(-\epsilon)\varphi(-\epsilon) - \int_{-\infty}^{-\epsilon} g(x)\varphi(x)\,dx$$

and

$$\int_{\epsilon}^{\infty} f(x)\varphi'(x)\,dx = -f(\epsilon)\varphi(\epsilon) - \int_{\epsilon}^{\infty} g(x)\varphi(x)\,dx.$$

This time there are boundary terms, but they appear with opposite sign. Adding,
we obtain

$$f(-\epsilon)\varphi(-\epsilon) - f(\epsilon)\varphi(\epsilon) - \left(\int_{-\infty}^{-\epsilon} + \int_{\epsilon}^{\infty}\right) g(x)\varphi(x)\,dx.$$

Now let ε → 0. The first two terms approach f(0)φ(0) − f(0)φ(0) = 0 because
f is continuous. Thus

$$\left(\frac{d}{dx}f,\varphi\right) = \lim_{\epsilon\to 0}\left(\int_{-\infty}^{-\epsilon} + \int_{\epsilon}^{\infty}\right) g(x)\varphi(x)\,dx = \int_{-\infty}^{\infty} g(x)\varphi(x)\,dx$$

because g is locally integrable.

Exercise: What happens if f has a jump discontinuity at 0?


For n ≥ 2 the argument is simpler. Let's do n = 2 and ∂/∂x₁ for instance.
We have

$$\left(\frac{\partial}{\partial x_1}f,\varphi\right) = -\int f(x)\frac{\partial\varphi}{\partial x_1}(x)\,dx.$$

We write the double integral as an iterated integral, doing the x₁-integration
first:

$$-\int\left(\int f(x_1,x_2)\,\frac{\partial\varphi}{\partial x_1}(x_1,x_2)\,dx_1\right)dx_2.$$

Now for every x₂, except x₂ = 0, f(x₁, x₂) is continuously differentiable in x₁,
so we integrate by parts:

$$-\int f(x_1,x_2)\,\frac{\partial\varphi}{\partial x_1}(x_1,x_2)\,dx_1 = \int g(x_1,x_2)\,\varphi(x_1,x_2)\,dx_1.$$

Putting this back in the iterated integral gives what we want since the single
point x₂ = 0 does not contribute to the integral.

2.5 Distributional solutions of differential equations


Now we are in a position to discuss solutions to differential equations from a
distribution point of view. Remember the vibrating string equation:

$$\frac{\partial^2 u(x,t)}{\partial t^2} = k^2\,\frac{\partial^2 u(x,t)}{\partial x^2}.$$

Both sides of the equation are meaningful if u is a distribution. If the equality
holds, we call u a weak solution. If u is twice continuously differentiable and
the equality holds, we call u a classical solution. By what we have just seen, a
classical solution is also a weak solution. From a physical point of view there
is no reason to prefer classical solutions to weak solutions.

Now consider a traveling wave u(x, t) = f(x − kt) for f ∈ L¹_loc(ℝ¹). It is
easy to see that u ∈ L¹_loc(ℝ²) so it defines a distribution. Is it a weak solution?

$$\left(\frac{\partial^2 u}{\partial t^2},\varphi\right) = \left(u, \frac{\partial^2\varphi}{\partial t^2}\right)$$

and

$$\left(\frac{\partial^2 u}{\partial x^2},\varphi\right) = \left(u, \frac{\partial^2\varphi}{\partial x^2}\right)$$

so

$$\left(\left(\frac{\partial^2}{\partial t^2} - k^2\frac{\partial^2}{\partial x^2}\right)u,\varphi\right) = \iint f(x-kt)\left(\frac{\partial^2}{\partial t^2} - k^2\frac{\partial^2}{\partial x^2}\right)\varphi(x,t)\,dx\,dt.$$

We want this to be zero. To see this we introduce new variables y = x − kt
and z = x + kt, so

$$\frac{\partial(y,z)}{\partial(x,t)} = \begin{pmatrix} 1 & -k\\ 1 & k\end{pmatrix}, \qquad dy\,dz = 2k\,dx\,dt.$$
Then

$$\frac{\partial}{\partial x} = \frac{\partial}{\partial y} + \frac{\partial}{\partial z} \qquad\text{and}\qquad \frac{\partial}{\partial t} = -k\frac{\partial}{\partial y} + k\frac{\partial}{\partial z}$$

so

$$\frac{\partial^2}{\partial t^2} - k^2\frac{\partial^2}{\partial x^2} = -4k^2\,\frac{\partial^2}{\partial y\,\partial z}.$$

Thus

$$\iint f(x-kt)\left(\frac{\partial^2}{\partial t^2} - k^2\frac{\partial^2}{\partial x^2}\right)\varphi(x,t)\,dx\,dt = -2k\iint f(y)\,\frac{\partial^2\varphi}{\partial y\,\partial z}(y,z)\,dz\,dy.$$

We claim the z integration already produces zero, for all y. To see this note that

$$\int_a^b \frac{\partial^2\varphi}{\partial y\,\partial z}(y,z)\,dz = \frac{\partial\varphi}{\partial y}(y,b) - \frac{\partial\varphi}{\partial y}(y,a).$$

Thus

$$\int_{-\infty}^{\infty} \frac{\partial^2\varphi}{\partial y\,\partial z}(y,z)\,dz = 0$$

since φ and hence ∂φ/∂y vanishes outside a bounded set. Thus u(x, t) =
f(x − kt) is a weak solution.
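This conclusion can be tested numerically even for a profile with a corner. The sketch below (mine; f(y) = |y|, k = 2, and a Gaussian in place of a compactly supported test function are all illustrative choices) approximates the double integral by a midpoint sum and finds a value near zero.

```python
import math

k = 2.0
f = lambda y: abs(y)   # locally integrable but not differentiable at 0

def wave_op_phi(x, t):
    # (d^2/dt^2 - k^2 d^2/dx^2) applied to phi = exp(-x^2 - t^2), analytically
    return ((4.0 * t * t - 2.0) - k * k * (4.0 * x * x - 2.0)) * math.exp(-x * x - t * t)

a, b, n = -6.0, 6.0, 300
h = (b - a) / n
total = 0.0
for i in range(n):
    x = a + (i + 0.5) * h
    for j in range(n):
        t = a + (j + 0.5) * h
        total += f(x - k * t) * wave_op_phi(x, t)
total *= h * h
print(total)   # near 0: u(x, t) = |x - kt| is a weak solution
```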
In contrast, let's see if log(x² + y²) is a weak solution to the Laplace equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$

For u to be a weak solution means

$$\left(\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)u(x,y), \varphi(x,y)\right) = \left(u, \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right)\varphi\right) = 0$$

for all test functions φ. It will simplify matters if we pass to polar coordinates,
in which case

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}$$

and dx dy = r dr dθ. Then for u = log(x² + y²) = log r² the question boils down
to

$$\int_0^{2\pi}\int_0^{\infty} \log r^2\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}\right)\varphi(r,\theta)\,r\,dr\,d\theta = 0\,?$$
To avoid the singularity of u at the origin we extend the r-integration from ε to
∞ for ε > 0 and let ε → 0. Before letting ε → 0 we integrate by parts to put
all the derivatives on u. Since u is harmonic away from the origin we expect
to get zero from the integral term, but the boundary terms will not be zero and
they will determine what happens. So let's compute.

$$\int_0^{2\pi} \frac{\partial^2\varphi}{\partial\theta^2}(r,\theta)\,d\theta = \frac{\partial\varphi}{\partial\theta}(r,2\pi) - \frac{\partial\varphi}{\partial\theta}(r,0) = 0$$

since ∂φ/∂θ is periodic. So that term is always zero. Next

$$\int_{\epsilon}^{\infty} \log r^2\,\frac{\partial^2\varphi}{\partial r^2}\,r\,dr = -\epsilon\log\epsilon^2\,\frac{\partial\varphi}{\partial r}(\epsilon,\theta) + (\log\epsilon^2 + 2)\,\varphi(\epsilon,\theta) + \int_{\epsilon}^{\infty}\frac{2}{r}\,\varphi(r,\theta)\,dr.$$

Finally

$$\int_{\epsilon}^{\infty} \log r^2\,\frac{1}{r}\,\frac{\partial\varphi}{\partial r}\,r\,dr = -\log\epsilon^2\,\varphi(\epsilon,\theta) - \int_{\epsilon}^{\infty}\frac{2}{r}\,\varphi(r,\theta)\,dr.$$

Now

$$\frac{\partial^2}{\partial r^2}\log r^2 = -\frac{2}{r^2} \qquad\text{and}\qquad \frac{\partial}{\partial r}\log r^2 = \frac{2}{r}$$

so adding everything up we obtain

$$\begin{aligned}
&\int_0^{2\pi}\int_{\epsilon}^{\infty} \log r^2\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}\right)\varphi(r,\theta)\,r\,dr\,d\theta\\
&\qquad= \int_0^{2\pi}\int_{\epsilon}^{\infty}\left(-\frac{2}{r} + \frac{2}{r}\right)\varphi(r,\theta)\,dr\,d\theta\\
&\qquad\quad+ \int_0^{2\pi}\left(-\log\epsilon^2 + \log\epsilon^2 + 2\right)\varphi(\epsilon,\theta)\,d\theta\\
&\qquad\quad+ \int_0^{2\pi}\left(-\epsilon\log\epsilon^2\right)\frac{\partial\varphi}{\partial r}(\epsilon,\theta)\,d\theta.
\end{aligned}$$
The first term is zero, as we expected, so

$$\left(\Delta\log r^2,\varphi\right) = \lim_{\epsilon\to 0}\left(\int_0^{2\pi} 2\,\varphi(\epsilon,\theta)\,d\theta + \int_0^{2\pi}\left(-\epsilon\log\epsilon^2\right)\frac{\partial\varphi}{\partial r}(\epsilon,\theta)\,d\theta\right).$$

Since φ is continuous, φ(ε, θ) approaches the value of φ at the origin as
ε → 0. Thus the first term approaches 4π(δ, φ). In the second term ∂φ/∂r
remains bounded while ε log ε² → 0 as ε → 0 so the limit is zero. Thus
Δ log(x² + y²) = 4πδ, so log(x² + y²) is not a weak solution of Δu = 0.

Actually the above computation is even more significant, for it will enable us
to solve the equation Δu = f for any function f. We will return to this later.
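The constant 4π in Δ log(x² + y²) = 4πδ can be recovered numerically. In the sketch below (mine; the Gaussian stands in for a test function, an illustrative choice) the pairing (log(x² + y²), Δφ) comes out close to 4πφ(0, 0) = 4π.

```python
import math

def lap_phi(x, y):
    # Laplacian of phi = exp(-x^2 - y^2), computed analytically:
    # (4 r^2 - 4) e^{-r^2} with r^2 = x^2 + y^2
    r2 = x * x + y * y
    return (4.0 * r2 - 4.0) * math.exp(-r2)

a, b, n = -6.0, 6.0, 400
h = (b - a) / n
total = 0.0
for i in range(n):
    x = a + (i + 0.5) * h    # midpoint grid, so the log singularity at 0 is never hit
    for j in range(n):
        y = a + (j + 0.5) * h
        total += math.log(x * x + y * y) * lap_phi(x, y)
total *= h * h
print(total, 4.0 * math.pi)
```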

2.6 Problems
1. Extend to distributions the dilations d_rφ(x) = φ(rx₁, rx₂, ..., rxₙ). (Hint:
∫(d_rψ(x))φ(x) dx = $r^{-n}$∫ψ(x)$d_{r^{-1}}$φ(x) dx.) Show that f(x) = $|x|^t$
for t > −n is homogeneous of degree t (meaning d_r f = $r^t$f). Show that
the distribution

$$(f,\varphi) = \left(\int_{-\infty}^{-1} + \int_{1}^{\infty}\right)\frac{\varphi(x)}{x}\,dx + \int_{-1}^{1}\frac{\varphi(x)-\varphi(0)}{x}\,dx$$

is homogeneous of degree −1, but

$$(g,\varphi) = \left(\int_{-\infty}^{-1} + \int_{1}^{\infty}\right)\frac{\varphi(x)}{|x|}\,dx + \int_{-1}^{1}\frac{\varphi(x)-\varphi(0)}{|x|}\,dx$$

is not homogeneous of degree −1.


2. Prove that δ is homogeneous of degree −n.
3. Prove that

for any distribution f.


4. Prove

$$\frac{\partial}{\partial x_j}(m\cdot f) = \frac{\partial m}{\partial x_j}\cdot f + m\,\frac{\partial f}{\partial x_j}$$

for any distribution f and C^∞ function m.
5. Show that if f is homogeneous of degree t then x_j·f is homogeneous of
degree t + 1 and (∂/∂x_j)f is homogeneous of degree t − 1.
6. Define φ̌(x) = φ(−x) and (f̌, φ) = (f, φ̌). Show this definition is consistent
for distributions defined by functions. What is δ̌? Call a distribution f
even if f̌ = f and odd if f̌ = −f. Show that every distribution is uniquely
the sum of an even and an odd distribution. (Hint: Find a formula for the even
and odd part.) Show that (f, φ) = 0 if f and φ have opposite parity (i.e.,
one is odd and the other is even).
7. Let h be a C^∞ function from ℝ¹ to ℝ¹ with h′(x) ≠ 0 for all x. How
would you define f ∘ h to be consistent with the usual definition for
functions?
8. Let

$$R_\theta = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$$

denote a rotation in the plane through angle θ. Define (f ∘ R_θ, φ) =
(f, φ ∘ R_{−θ}). Show that this is consistent with the definition of f ∘ R_θ for
functions. If (f, φ) = ∫_{−∞}^{∞} φ(x, 0) dx, what is f ∘ R_{π/2}?

9. Say that f is radial if f ∘ R_θ = f for all θ. Is δ radial? Show that

$$(f_R,\varphi) = \frac{1}{2\pi}\int_0^{2\pi}(f\circ R_\theta,\varphi)\,d\theta$$

defines a radial distribution. Show that f is radial if and only if f = f_R.

10. Show that if f is a radial locally integrable function then f is radial as a
distribution.
11. Show that if f is a radial distribution then

$$x\,\frac{\partial f}{\partial y} - y\,\frac{\partial f}{\partial x} = 0.$$
12. Show that if f is a homogeneous distribution of degree t on ℝⁿ then

$$\sum_{j=1}^{n} x_j\,\frac{\partial f}{\partial x_j} = t\,f.$$
13. Show that (δ^{(k)}, φ) = (−1)^k φ^{(k)}(0) in ℝ¹.

14. In ℝ² define (f, φ) = $\int_{-\infty}^{\infty}\int_0^{\infty}\varphi(x,y)\,dx\,dy$. Compute ∂f/∂x and
∂f/∂y.

15. In ℝ² define (f, φ) = $\int_0^{\infty}\int_0^{\infty}\varphi(x,y)\,dx\,dy$. Compute ∂²f/∂x∂y.
16. In ℝ¹ compute m·δ″ explicitly for m a C^∞ function.
17. Show that every test function φ ∈ D(ℝ¹) can be written φ = ψ′ + cφ₀
where φ₀ is a fixed test function (with ∫_{−∞}^{∞} φ₀(x) dx ≠ 0), ψ ∈ D and c a
constant. (Hint: Choose ψ(x) = ∫_{−∞}^{x} (φ(t) − cφ₀(t)) dt for the appropriate
choice of c.) Use this to prove: If f is a distribution on ℝ¹ satisfying
f′ = 0 then f is a constant.

18. Generalize problem 17 to show that if f is a distribution on ℝ¹ satisfying
f^{(k)} = 0 then f is a polynomial of degree < k.

19. Generalize problem 17 to show that if f is a distribution on ℝⁿ satisfying
(∂/∂x_k)f = 0, k = 1, ..., n then f is constant.
20. Show that every test function φ ∈ D(ℝ¹) can be written φ(x) = xψ(x) +
cφ₀(x) where φ₀ is a fixed test function (with φ₀(0) ≠ 0), ψ ∈ D, and c is
a constant. Use this to show that if f is a distribution satisfying xf = 0,
then f = cδ.

21. Generalize problem 20 to show that if f is a distribution on ℝⁿ satisfying
x_j f = 0, j = 1, ..., n, then f = cδ.
Fourier Transforms

3.1 From Fourier series to Fourier integrals


Let's recall the basic facts about Fourier series. Every continuous function f(x)
defined on ℝ¹ and periodic of period 2π (f(x) may take real or complex values)
has a Fourier series written

$$f(x) \sim \sum_{k=-\infty}^{\infty} a_k e^{ikx}.$$

(We write ∼ instead of = because without additional conditions on f, such as
differentiability, the series may diverge for particular values of x.) The coefficients
in this expansion are given by

$$a_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx$$

(the integral may be taken over any interval of length 2π since both f(x) and
e^{−ikx} are periodic) and satisfy Parseval's identity

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx = \sum_{k=-\infty}^{\infty}|a_k|^2.$$
Perhaps you are more familiar with these formulas in terms of sines and
cosines:

$$f \sim \frac{b_0}{2} + \sum_{k=1}^{\infty} b_k\cos kx + c_k\sin kx.$$

It is easy to pass back and forth from one to the other using the relation eikx =
cos kx + i sin kx. Although the second form is a bit more convenient for dealing
with real-valued functions (the coefficients bk and Ck must be real for f(x) to be


real, whereas for the coefficients $a_k$ the condition is $a_{-k} = \overline{a_k}$), the exponential
form is a lot more simple when it comes to dealing with derivatives.
Returning to the exponential form, we can obtain a Fourier series expansion
for functions periodic of arbitrary period T by changing variable. Indeed, if
f(x + T) = f(x) then setting

$$F(x) = f\left(\frac{T}{2\pi}x\right)$$

we have

$$F(x + 2\pi) = F(x)$$

so F is periodic of period 2π. The Fourier series for F is

$$F(x) \sim \sum_{k=-\infty}^{\infty} a_k e^{ikx}, \qquad a_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} F(x)e^{-ikx}\,dx.$$

Substituting

$$f(x) = F\left(\frac{2\pi}{T}x\right)$$

and making the change of variable x → (2π/T)x yields

$$f(x) \sim \sum_{k=-\infty}^{\infty} a_k e^{2\pi ikx/T}, \qquad a_k = \frac{1}{T}\int_{-T/2}^{T/2} f(x)e^{-2\pi ikx/T}\,dx.$$

Now suppose f(x) is not periodic. We still want an analogous expansion.


Let's approximate f(x) by restricting it to the interval −T/2 ≤ x ≤ T/2 and
extending it periodically of period T. In the end we will let T → ∞. Before

passing to the limit let's change notation. Let ~k = 21rk/T and let g(~k) =
(T/21r)ak. Then we may rewrite the equations as

1 jT/2 .
g(~k) = -2 f(x)e-'X~k dx,
1r -T/2

f(x) '" f
-00
g(~k)eixek ~ = f
-00
g(~k)eix~k (~k - ~k-d,

Written in this way the sums look like approximations to integrals. Notice
that the grid of points {ξ_k} gets finer as T → ∞, so formally passing to the
limit we obtain

$$g(\xi) \sim \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)e^{-ix\xi}\,dx, \qquad f(x) \sim \int_{-\infty}^{\infty} g(\xi)e^{ix\xi}\,d\xi, \qquad \int_{-\infty}^{\infty}|f(x)|^2\,dx \sim 2\pi\int_{-\infty}^{\infty}|g(\xi)|^2\,d\xi,$$

where I have written ∼ instead of = since we have not justified any of the
above steps. But it turns out that we can write = if we impose some conditions
on f. The above formulas are referred to as the Fourier transform, the Fourier
inversion formula, and the Plancherel formula, although there is some dispute
as to which name goes with which formula.
The analogy with Fourier series is clear. Here we have f(x) represented as
an integral of the "simple" functions e^{ixξ}, ξ ∈ ℝ, instead of a sum of "simple"
functions e^{ikx}, k ∈ ℤ, and the weighting factor g(ξ) is given by integrating
f(x) against the complex conjugate of the "simple" function e^{ixξ} just as the
coefficient a_k was given, only now the integral extends over the whole line
instead of over a period. The third formula expresses the mean-square of f(x)
in terms of the mean-square of the weighting factors g(ξ).
Notice the symmetry of the first two formulas (in contrast to the asymmetry
of Fourier series). We could also regard g(ξ) as the given function, the second
formula defining a weighting factor f(x) and the first giving g(ξ) as an integral
of "simple" functions e^{−ixξ}; here ξ is the variable and x is a parameter indexing
Let us take this point of view and simultaneously interchange the names of
some of the variables.

Definition (formal): Given a complex-valued function f(x) of a real
variable, define the Fourier transform of f, denoted Ff or f̂, by

$$\hat f(\xi) = \int_{-\infty}^{\infty} f(x)e^{ix\xi}\,dx$$

and the inverse Fourier transform, denoted F^{−1}f or f̌, by

$$\check f(\xi) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)e^{-ix\xi}\,dx$$

(note that f̌(ξ) = (1/2π) f̂(−ξ)).

Then the formal derivation given above indicates that we expect

$$\mathcal{F}^{-1}(\mathcal{F}f) = f, \qquad \mathcal{F}(\mathcal{F}^{-1}f) = f \qquad \text{(Fourier inversion formula)}$$

and

$$\int_{-\infty}^{\infty}|f(x)|^2\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\hat f(\xi)|^2\,d\xi \qquad \text{(Plancherel formula)}.$$

Warning: There is no general agreement on whether to put the minus sign with
f̂ or f̌, or on what to do with the 1/2π. Sometimes the 1/2π is put with F^{−1},
sometimes it is split up so that a factor of 1/√(2π) is put with both F and F^{−1},
and sometimes the 2π crops up in the exponential (∫ e^{2πixξ}f(x) dx), in which
case it disappears altogether as a factor. Therefore, in using Fourier transform
formulas from different sources, always check which definition is being used.
This is a great nuisance, for me as well as for you.
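To pin down the convention used here, the sketch below (mine, not the book's) approximates f̂(ξ) = ∫ f(x)e^{ixξ} dx by a Riemann sum for f(x) = e^{−x²} and checks it against the classical closed form √π e^{−ξ²/4}, which is what this convention produces for the Gaussian.

```python
import math, cmath

def fourier(f, xi, a=-8.0, b=8.0, n=4000):
    # Riemann-sum approximation of \hat f(xi) = \int f(x) e^{i x xi} dx
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * cmath.exp(1j * (a + (i + 0.5) * h) * xi)
               for i in range(n)) * h

f = lambda x: math.exp(-x * x)
for xi in (0.0, 1.0, 2.0):
    exact = math.sqrt(math.pi) * math.exp(-xi * xi / 4.0)
    print(xi, fourier(f, xi), exact)
```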
Returning now to our Fourier inversion formula FF^{−1}f = f (or F^{−1}Ff =
f; they are equivalent if you make a change of variable), we want to know for
which functions f it is valid. In the case of Fourier series there was a smoothness
condition (differentiability) on f for f(x) = Σ a_k e^{ikx} to hold. In this case
we will need more, for in order for the improper integral ∫_{−∞}^{∞} f(x)e^{ixξ} dx to
exist we must have f(x) tending to zero sufficiently rapidly. Thus we need a
combination of smoothness and decay at infinity. We now turn to a discussion
of a class of functions that has these properties (actually it has much more than
is needed for the Fourier inversion theorem).

3.2 The Schwartz class S


Eventually we will want to deal with functions of several variables f(x) =
f(x₁, ..., xₙ). We say such a function is rapidly decreasing if there are constants
M_N such that

$$|f(x)| \le M_N(1 + |x|)^{-N} \qquad\text{for } N = 1,2,3,\ldots.$$

Another way of saying this is that after multiplication by any polynomial
p(x), p(x)f(x) still goes to zero as x → ∞. A C^∞ function is of class S(ℝⁿ)
if f and all its partial derivatives are rapidly decreasing. Any function in D(ℝⁿ)
also belongs to S(ℝⁿ), but S(ℝⁿ) is a larger class of functions (it is sometimes
called "the Schwartz class"). A typical function in S(ℝⁿ) is e^{−|x|²}; this does
not have bounded support so it is not in D(ℝⁿ). (To verify that e^{−|x|²} ∈ S(ℝⁿ),

note that any derivative is a polynomial times e^{−|x|²}, and e^{−|x|²} decreases fast
enough as x → ∞ that it beats out the growth of any polynomial.)
We need the following elementary properties of the class S(]Rn):

1. S(]Rn) is a vector space; it is closed under linear combinations.


2. S(]Rn) is an algebra; the product of functions in S(]Rn) also belongs to
S(]Rn) (this follows from Leibniz' formula for derivatives of products).
3. S(ℝⁿ) is closed under multiplication by polynomials (this is not a special
case of 2, since polynomials are not in S).
4. S(]Rn) is closed under differentiation.
5. S(ℝⁿ) is closed under translations and multiplication by complex exponentials
e^{ix·ξ}.
6. S(ℝⁿ) functions are integrable: ∫_{ℝⁿ} |f(x)| dx < ∞ for f ∈ S(ℝⁿ). This
follows from the fact that |f(x)| ≤ M(1 + |x|)^{−(n+1)} and

$$\int_{\mathbb{R}^n}(1+|x|)^{-(n+1)}\,dx = \text{(polar coordinates)}\quad c\int_0^{\infty}(1+r)^{-n-1}r^{n-1}\,dr < \infty$$

(the integrand decreases like r^{−2} at infinity).

3.3 Properties of the Fourier transform on S


The reason S(ℝⁿ) is useful in studying Fourier transforms is that f̂ ∈ S(ℝⁿ)
whenever f ∈ S(ℝⁿ).
Actually I have only given you the definition of f̂ for ℝ¹, but for ℝⁿ the
situation is similar:

$$\hat f(\xi) = \int_{\mathbb{R}^n} f(x)e^{ix\cdot\xi}\,dx$$

where x, ξ ∈ ℝⁿ, x·ξ = x₁ξ₁ + ... + xₙξₙ and

$$\check f(\xi) = \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n} f(x)e^{-ix\cdot\xi}\,dx.$$

It takes a little bit of work to show that F preserves the class S(ℝⁿ). In
the process we will derive some very important formulas. First observe that the
Fourier transform is at least defined for f ∈ S(ℝⁿ) since the integrand f(x)e^{ix·ξ}
has absolute value |f(x)| and this has finite integral. Next we have to see how

f̂ changes as we do various things to f. As we do this let's keep in mind the
symmetry of the Fourier transform and its inverse. If doing "ping" to f does
"pong" to f̂, then we should expect doing "pong" to f does "ping" to f̂, except
for those factors of 2π and that minus sign.

1. Translation:

$$T_y f(x) = f(x + y)$$

$$\mathcal{F}(T_y f)(\xi) = \int_{\mathbb{R}^n} f(x+y)e^{ix\cdot\xi}\,dx = \int_{\mathbb{R}^n} f(x)e^{i(x-y)\cdot\xi}\,dx$$

(change of variable x → x − y).

But e^{i(x−y)·ξ} = e^{ix·ξ−iy·ξ} = e^{−iy·ξ}e^{ix·ξ} and we can take the factor e^{−iy·ξ}
outside the integral since it is independent of x. So

$$\mathcal{F}(T_y f)(\xi) = e^{-iy\cdot\xi}\,\mathcal{F}f(\xi)$$

(translation by the vector y goes over to multiplication by the exponential e^{−iy·ξ}
on the Fourier transform side).

2. Dually, we expect multiplication by an exponential to go over to translation
on the Fourier transform side. Note the minus sign is gone.

$$\mathcal{F}(e^{ix\cdot y}f(x))(\xi) = \int e^{ix\cdot y} f(x)e^{ix\cdot\xi}\,dx = \int f(x)e^{ix\cdot(\xi+y)}\,dx = \mathcal{F}f(\xi + y) = T_y\mathcal{F}f(\xi).$$

3. Differentiation:

$$\mathcal{F}\left(\frac{\partial}{\partial x_k}f\right)(\xi) = \int \frac{\partial f}{\partial x_k}(x)\,e^{ix\cdot\xi}\,dx = -\int f(x)\frac{\partial}{\partial x_k}\left(e^{ix\cdot\xi}\right)dx = -i\xi_k\int f(x)e^{ix\cdot\xi}\,dx = -i\xi_k\,\mathcal{F}f(\xi)$$

where we have integrated by parts in the x_k variable (the boundary terms are
zero because f is rapidly decreasing).
zero because f is rapidly decreasing).
32 Fourier Transforms

Exercise: Derive the same formula by using 1 above and

$$\frac{\partial}{\partial x_k}f = \lim_{h\to 0}\frac{1}{h}(T_{he_k}f - f).$$
Thus differentiation goes over to multiplication by a monomial. More generally,
by iterating this formula, partial derivatives of higher order go over to
multiplication by polynomials. We can write this concisely as

$$\mathcal{F}\left(p\left(\frac{\partial}{\partial x}\right)f\right) = p(-i\xi)\,\mathcal{F}f(\xi)$$

for any polynomial p, where p(∂/∂x) means we replace x_k by ∂/∂x_k in the
polynomial.

4. Dually, we expect multiplication by a polynomial to go over to differentiation:

$$\frac{\partial}{\partial\xi_k}\mathcal{F}f(\xi) = \frac{\partial}{\partial\xi_k}\int f(x)e^{ix\cdot\xi}\,dx = \int f(x)\frac{\partial}{\partial\xi_k}e^{ix\cdot\xi}\,dx = \int ix_k f(x)e^{ix\cdot\xi}\,dx = i\,\mathcal{F}(x_k f(x))$$

(the interchange of the derivative and integral is valid because f ∈ S). Iterating,
we obtain

$$\mathcal{F}(p(x)f(x))(\xi) = p\left(-i\frac{\partial}{\partial\xi}\right)\mathcal{F}f(\xi).$$


5. Convolution: If f, g ∈ S, define the convolution f * g by

$$f*g(x) = \int f(x-y)g(y)\,dy.$$

By a change of variable y → x − y this is equal to ∫ f(y)g(x − y) dy so
f * g = g * f.

Exercise: Verify the associative law for convolutions f * (g * h) = (f * g) * h
by similar manipulations.

Because of the rapid decrease of f and g, the convolution integral is defined.
It is possible (although tricky) to show that if f, g ∈ S then f * g ∈ S. One

important property of convolutions is that derivatives may be placed on either
factor (but not both):

$$\frac{\partial}{\partial x_k} f*g(x) = \frac{\partial}{\partial x_k}\int f(x-y)g(y)\,dy = \int \frac{\partial f}{\partial x_k}(x-y)\,g(y)\,dy = \frac{\partial f}{\partial x_k}*g(x)$$

but also

$$\frac{\partial}{\partial x_k} f*g(x) = \frac{\partial}{\partial x_k}\int f(y)g(x-y)\,dy = \int f(y)\,\frac{\partial g}{\partial x_k}(x-y)\,dy = f*\frac{\partial g}{\partial x_k}(x).$$

Convolutions are important because solutions to differential equations are
often given by convolutions where one factor is the given function and the
other is a special "kernel."

Under Fourier transforms the convolution goes over to multiplication. To see
this, start with the definition

$$\mathcal{F}(f*g)(\xi) = \iint f(x-y)g(y)\,dy\,e^{ix\cdot\xi}\,dx.$$

Write e^{ix·ξ} = e^{i(x−y)·ξ}e^{iy·ξ} and interchange the order of integration:

$$\mathcal{F}(f*g)(\xi) = \int\left(\int f(x-y)e^{i(x-y)\cdot\xi}\,dx\right)g(y)e^{iy\cdot\xi}\,dy.$$

For each fixed y make the change of variable x → x + y in the inner integral
and you have just the Fourier transform of f. Thus

$$\mathcal{F}(f*g)(\xi) = \int \mathcal{F}f(\xi)\,g(y)e^{iy\cdot\xi}\,dy = \mathcal{F}f(\xi)\,\mathcal{F}g(\xi).$$
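For Gaussians the convolution theorem can be checked end to end, since the convolution itself has a closed form, e^{−x²} * e^{−x²} = √(π/2) e^{−x²/2}. The sketch below (mine, not the book's) verifies F(f * g) = Ff · Fg numerically with f = g = e^{−x²}.

```python
import math, cmath

def fourier(f, xi, a=-12.0, b=12.0, n=6000):
    # Riemann-sum approximation of F f(xi) = \int f(x) e^{i x xi} dx
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * cmath.exp(1j * (a + (i + 0.5) * h) * xi)
               for i in range(n)) * h

f = lambda x: math.exp(-x * x)
# closed form for the Gaussian convolution f * f
conv = lambda x: math.sqrt(math.pi / 2.0) * math.exp(-x * x / 2.0)

for xi in (0.0, 1.0, 2.0):
    print(xi, fourier(conv, xi), fourier(f, xi) ** 2)   # the two columns agree
```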

6. Dually, we expect multiplication to go over to convolution. This cannot be
verified directly but must use the Fourier inversion formula! We can rewrite
Ff · Fg = F(f * g) as

$$\mathcal{F}^{-1}f\cdot\mathcal{F}^{-1}g = \frac{1}{(2\pi)^n}\mathcal{F}^{-1}(f*g),$$

since

$$\mathcal{F}^{-1}f(x) = \frac{1}{(2\pi)^n}\mathcal{F}f(-x).$$

Substituting f̂ for f and ĝ for g yields

$$f\cdot g = \mathcal{F}^{-1}\hat f\cdot\mathcal{F}^{-1}\hat g = \frac{1}{(2\pi)^n}\mathcal{F}^{-1}(\hat f*\hat g).$$

Thus

$$\mathcal{F}(f\cdot g) = \mathcal{F}\left(\frac{1}{(2\pi)^n}\mathcal{F}^{-1}(\hat f*\hat g)\right) = \frac{1}{(2\pi)^n}\hat f*\hat g.$$

Exercise: Use this kind of argument to pass from 1 to 2 above.


We now collect together all these identities in a "ping-pong" table:

| Function (x) | Fourier transform (ξ) |
| --- | --- |
| $f(x)$ | $\hat f(\xi)$ |
| $f(x+y)$ | $e^{-iy\cdot\xi}\hat f(\xi)$ |
| $e^{ix\cdot y}f(x)$ | $\hat f(\xi+y)$ |
| $\dfrac{\partial f}{\partial x_k}(x)$ | $-i\xi_k\hat f(\xi)$ |
| $p\left(\dfrac{\partial}{\partial x}\right)f(x)$ | $p(-i\xi)\hat f(\xi)$ |
| $x_k f(x)$ | $-i\dfrac{\partial\hat f}{\partial\xi_k}(\xi)$ |
| $p(x)f(x)$ | $p\left(-i\dfrac{\partial}{\partial\xi}\right)\hat f(\xi)$ |
| $f*g(x)$ | $\hat f(\xi)\hat g(\xi)$ |
| $f(x)g(x)$ | $\dfrac{1}{(2\pi)^n}\hat f*\hat g(\xi)$ |

Many of the above identities are valid for more general classes of functions
than S. We will return to this question later.
We should pause now to consider the significance of this table. Some of the
operations are simple, and some are complicated. For instance, differentiation is

a complicated operation, but the corresponding entry in the table is multiplication


by a polynomial, a simple operation. Thus the Fourier transform simplifies
operations we want to study. Say we want to solve a partial differential equation
p(∂/∂x)u = f, f given. Taking Fourier transforms this equation becomes

$$p(-i\xi)\hat u(\xi) = \hat f(\xi).$$

This is a simple equation to solve for û; we divide:

$$\hat u(\xi) = \frac{1}{p(-i\xi)}\hat f(\xi).$$

If we want u we invert the Fourier transform:

$$u = \mathcal{F}^{-1}\left(\frac{1}{p(-i\xi)}\hat f(\xi)\right).$$

From the table we know

$$\frac{1}{p(-i\xi)}\hat f(\xi) = \mathcal{F}(g*f)$$

if

$$\hat g(\xi) = \frac{1}{p(-i\xi)},$$

i.e., if

$$g = \mathcal{F}^{-1}\left(\frac{1}{p(-i\xi)}\right).$$

Thus u = g * f. Thus the problem is to compute

$$\mathcal{F}^{-1}\left(\frac{1}{p(-i\xi)}\right).$$
In many cases this can be done explicitly. Of course this is an oversimplification
because in general it is difficult to make sense of F^{−1}(1/p(−iξ)), especially
if p(−iξ) has zeroes for ξ ∈ ℝⁿ. At any rate, this is the basic idea of Fourier
analysis: by using the above table many diverse sorts of problems may be
reduced to the problem of computing Fourier transforms.
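As a concrete instance of this program (my illustration, not an example worked in the text): take p(d/dx) = 1 − d²/dx², so p(−iξ) = 1 + ξ². The kernel is g = F^{−1}(1/(1 + ξ²)), which works out to e^{−|x|}/2 under the conventions above, and the sketch below recovers that numerically; the solution of u − u″ = f is then u = g * f.

```python
import math, cmath

def inv_fourier(h_fn, x, L=300.0, n=60000):
    # Riemann-sum approximation of
    # F^{-1} h(x) = (1 / (2 pi)) \int h(xi) e^{-i x xi} d xi
    step = 2.0 * L / n
    total = sum(h_fn(-L + (i + 0.5) * step) *
                cmath.exp(-1j * x * (-L + (i + 0.5) * step))
                for i in range(n)) * step
    return total / (2.0 * math.pi)

h = lambda xi: 1.0 / (1.0 + xi * xi)   # 1 / p(-i xi) for p(d/dx) = 1 - d^2/dx^2

for x in (0.5, 1.0, 2.0):
    print(x, inv_fourier(h, x).real, 0.5 * math.exp(-abs(x)))
```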

3.4 The Fourier inversion formula on S


Now we return to our goal of showing that if f ∈ S then f̂ ∈ S. We have all
the ingredients ready except for the following simple estimate:

$$|\hat f(\xi)| \le \int |f(x)|\,dx$$

whenever f is integrable. This is because

$$|\hat f(\xi)| = \left|\int f(x)e^{ix\cdot\xi}\,dx\right| \le \int |f(x)e^{ix\cdot\xi}|\,dx$$

and |e^{ix·ξ}| = 1.
Now let f ∈ S. We want to show that f̂ is rapidly decreasing. That means that
p(ξ)f̂(ξ) remains bounded for any polynomial p. But by the table p(ξ)f̂(ξ) =
F(p(i ∂/∂x)f) and so

$$|p(\xi)\hat f(\xi)| \le \int \left|p\left(i\frac{\partial}{\partial x}\right)f(x)\right|\,dx.$$

But

$$p\left(i\frac{\partial}{\partial x}\right)f(x)$$

is rapidly decreasing so the integral is finite. To summarize: if f ∈ S then f̂ is
rapidly decreasing. To show f̂ ∈ S we must also show that all derivatives of f̂
are also rapidly decreasing. But by the table any derivative of f̂ is the Fourier
transform of a polynomial times f:

$$p\left(\frac{\partial}{\partial\xi}\right)\hat f(\xi) = \mathcal{F}(p(ix)f(x)).$$

But multiplying f by a polynomial produces another function in S: p(ix)f(x) =
g(x) ∈ S. By what we have just done ĝ(ξ) is rapidly decreasing so p(∂/∂ξ)f̂(ξ) =
ĝ(ξ) is rapidly decreasing. Thus f̂ ∈ S.
The essence of the above argument is the duality under the Fourier transform
of differentiability and decrease at infinity. Roughly speaking, the more derivatives
$f$ has, the faster $\hat f$ decreases at infinity, while the faster $f$ decreases at
infinity the more derivatives $\hat f$ has.
More precisely the estimates look like this:
(a) If $f$ has derivatives of orders up to $N$ that are all integrable, then $\hat f$
decreases at infinity like $|\xi|^{-N}$.
(b) If $f$ decreases at infinity like $|x|^{-N}$ then $\hat f$ has continuous and bounded
derivatives of all orders up to $N - n - 1$.
The trouble with estimates (a) and (b) is that they are not comparable: there
is a loss of $n+1$ derivatives in (b), and while in (a) the derivatives must be
integrable, in (b) they turn out to be just bounded. These defects disappear when
we consider the class $S$, because there we have an infinite number of derivatives and
a decrease at infinity faster than $|\xi|^{-N}$ for any $N$.

Now that we know $\hat f \in S$ whenever $f \in S$ we can be sure that $\mathcal F^{-1}\mathcal Ff$ is
defined, and so the Fourier inversion formula $\mathcal F^{-1}\mathcal Ff = f$ (and also $\mathcal F\mathcal F^{-1}f = f$)
is at least plausible. We have already seen a derivation of this formula from
Fourier series. At this point I want to give one more explanation of why it is
true, based on the idea of "summability," which is important in its own right.
For simplicity I will restrict the discussion to the case $n = 1$.
We could attempt to write the Fourier inversion formula as a double integral,
namely
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i(x-y)\cdot\xi}\,f(y)\,dy\,d\xi.$$
The trouble with this double integral is that it is not absolutely convergent; if
you take the absolute value of the integrand you eliminate the oscillating factor
$e^{-i(x-y)\cdot\xi}$, and the resultant integral
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |f(y)|\,dy\,d\xi$$
clearly diverges. This means we cannot interchange the order of integration in
the usual sense, although later we will do exactly this in the distribution sense.
Proceeding more cautiously, we throw a Gaussian factor $e^{-t|\xi|^2}$ into the integral:
$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i(x-y)\cdot\xi}f(y)e^{-t|\xi|^2}\,dy\,d\xi.$$
The result is no longer equal to $f(x)$, of course, but if we let $t \to 0$ the factor
$e^{-t|\xi|^2}$ tends to one, so we should obtain $f(x)$ in the limit. Of course nothing
has been proved yet. But the double integral is now absolutely convergent and
so we can evaluate it by taking the iterated integral in either order. If we take
the indicated order we obtain
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ix\cdot\xi}\hat f(\xi)e^{-t|\xi|^2}\,d\xi,$$
while if we reverse the order of integration we obtain
$$\int_{-\infty}^{\infty} f(y)G_t(x-y)\,dy$$
where
$$G_t(x-y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i(x-y)\cdot\xi}e^{-t|\xi|^2}\,d\xi,$$
which just means that $G_t$ is the inverse Fourier transform of the Gaussian
function $e^{-t|\xi|^2}$.

In a moment we will compute $G_t(x) = \frac{1}{\sqrt{4\pi t}}\,e^{-|x|^2/4t}$. Granted this, we have
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ix\cdot\xi}\hat f(\xi)e^{-t|\xi|^2}\,d\xi = \int_{-\infty}^{\infty} f(y)G_t(x-y)\,dy$$
for all $t > 0$, and we take the limit as $t \to 0$. The limit of the left side is
$\frac{1}{2\pi}\int e^{-ix\cdot\xi}\hat f(\xi)\,d\xi$, which is just $\mathcal F^{-1}\mathcal Ff(x)$. The limit on the right side is
$f(x)$ because it is what is called a convolution with an approximate identity,
$G_t * f(x)$. The properties of $G_t$ that make it an approximate identity are the
following:

1. $\int_{-\infty}^{\infty} G_t(x)\,dx = 1$ for all $t$
2. $G_t(x) \ge 0$
3. $\lim_{t\to 0} G_t(x) = 0$ if $x \ne 0$, uniformly in $|x| \ge \epsilon$ for any $\epsilon > 0$.

Indeed 2 and 3 are obvious, and 1 follows by a change of variable from the
formula
$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi},$$
which we will verify in the process of computing the Fourier transform of the
Gaussian.
Thus the Fourier inversion formula is a direct consequence of the approximate
identity theorem: if $G_t$ is an approximate identity then $G_t * f \to f$ as $t \to 0$
for any $f \in S$. To see why this theorem is plausible we have to observe that the
convolution $G_t * f$ can be interpreted as a weighted average of $f$ (by properties
1 and 2), with $G_t(x-y)$ as the weighting factor (here $x$ is fixed and $y$ is the
variable of integration). But by property 3 this weighting factor is concentrated
around $y = x$, and this becomes more pronounced as $t \to 0$, so that in the limit
we get $f(x)$.
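The approximate identity theorem is easy to watch numerically. The following sketch (Python with NumPy; the test function $f$ is just a convenient choice, not from the text) convolves $f$ with $G_t(x) = e^{-|x|^2/4t}/\sqrt{4\pi t}$ on a grid and shows the sup-norm error shrinking as $t \to 0$:

```python
import numpy as np

# G_t * f -> f as t -> 0, with G_t(x) = exp(-x^2/(4t)) / sqrt(4*pi*t).
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
f = 1.0 / (1.0 + x**2)

def Gt_conv(t):
    G = np.exp(-x**2 / (4 * t)) / np.sqrt(4 * np.pi * t)
    return np.convolve(f, G, mode="same") * dx   # Riemann-sum convolution

errs = [np.max(np.abs(Gt_conv(t) - f)) for t in (1.0, 0.1, 0.01)]
print(errs)   # sup-norm error shrinks as t -> 0
```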

3.5 The Fourier transform of a Gaussian


Now we compute the Fourier transform and inverse Fourier transform of a
Gaussian. We can do the computation in $n$ dimensions, and in fact there is an
advantage to allowing general $n$. You see, the Gaussian function $e^{-t|x|^2}$ (with
$t > 0$; we may even allow it to be complex provided $\operatorname{Re} t > 0$) is a function in
$S(\mathbb R^n)$ that is radial (it depends only on $|x|$) and is a product of one-dimensional
functions:
$$e^{-t|x|^2} = e^{-tx_1^2}e^{-tx_2^2}\cdots e^{-tx_n^2}.$$
It is possible to show that these two properties essentially characterize this
function (every radial function in $S(\mathbb R^n)$ that is a product of one-dimensional
functions is of the form $ce^{-t|x|^2}$ for some constant $c$ and complex $t$ with $\operatorname{Re} t >
0$). Since the Fourier transform preserves these properties we expect the Fourier
transform to have the same form. But we give a more direct computation.
The method used is the calculus of residues. We have $f(x) = e^{-t|x|^2}$ and
$$\hat f(\xi) = \int_{\mathbb R^n} e^{-t|x|^2}e^{ix\cdot\xi}\,dx
= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-tx_1^2+ix_1\xi_1}e^{-tx_2^2+ix_2\xi_2}\cdots e^{-tx_n^2+ix_n\xi_n}\,dx_1\cdots dx_n$$
$$= \left(\int_{-\infty}^{\infty} e^{-tx_1^2+ix_1\xi_1}\,dx_1\right)\left(\int_{-\infty}^{\infty} e^{-tx_2^2+ix_2\xi_2}\,dx_2\right)\cdots\left(\int_{-\infty}^{\infty} e^{-tx_n^2+ix_n\xi_n}\,dx_n\right),$$
so the problem is reduced to a one-dimensional Fourier transform
$$\int_{-\infty}^{\infty} e^{-tx^2+ix\xi}\,dx.$$
Now the exponent $-tx^2+ix\xi$ is a quadratic polynomial in $x$, so it is natural
to "complete the square":
$$-tx^2+ix\xi = -t\left(x-\frac{i\xi}{2t}\right)^2 - \frac{\xi^2}{4t}$$
so
$$\int_{-\infty}^{\infty} e^{-tx^2+ix\xi}\,dx = e^{-\xi^2/4t}\int_{-\infty}^{\infty} e^{-t(x-i\xi/2t)^2}\,dx.$$
Now it is tempting to make the change of variable $x \to x+i\xi/2t$ in this last
integral to eliminate the dependence on $\xi$. But this is a complex change of
variable and must be justified by a contour integral. Consider the contour $C_R$ in
the figure below for $\xi < 0$ (if $\xi > 0$ the rectangle will lie below the $x$-axis).

[FIGURE 3.1: the rectangular contour $C_R$, with horizontal sides from $-R$ to $+R$ along the $x$-axis and along the line $\operatorname{Im} z = -\xi/2t$, joined by vertical sides at $x = \pm R$.]

Since $e^{-tz^2}$ is analytic,
$$\int_{C_R} e^{-tz^2}\,dz = 0$$
by the Cauchy integral theorem. The integral along the $x$-axis side is
$$\int_{-R}^{R} e^{-tx^2}\,dx.$$

The integral along the parallel side is
$$-\int_{-R}^{R} e^{-t(x-i\xi/2t)^2}\,dx,$$
the minus sign coming from the fact that along $C_R$ this side is traced in the
negative direction. The integrals along the vertical sides tend to zero as $R \to \infty$:
we have $e^{-tz^2} = e^{-t(x^2-y^2)-2itxy}$ so $|e^{-tz^2}| \le e^{-t(x^2-y^2)}$; on the vertical sides
$e^{-tx^2} = e^{-tR^2}$, which tends to zero, while $e^{ty^2}$ is at most $e^{t(\xi/2t)^2}$, which is
bounded independent of $R$, so the absolute value of the integrand tends to zero
and the length of the sides is fixed.
Thus as $R \to \infty$ we get
$$\int_{-\infty}^{\infty} e^{-tx^2}\,dx - \int_{-\infty}^{\infty} e^{-t(x-i\xi/2t)^2}\,dx = 0.$$

Our computation will be complete once we evaluate $g(t) = \int_{-\infty}^{\infty} e^{-tx^2}\,dx$.
There is a famous trick for doing this:
$$g(t)^2 = \int_{-\infty}^{\infty} e^{-tx^2}\,dx\cdot\int_{-\infty}^{\infty} e^{-ty^2}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-t(x^2+y^2)}\,dx\,dy.$$
Switching to polar coordinates,
$$g(t)^2 = \int_0^{2\pi}\int_0^{\infty} e^{-tr^2}\,r\,dr\,d\theta = 2\pi\int_0^{\infty} e^{-tr^2}\,r\,dr = 2\pi\left[\frac{-e^{-tr^2}}{2t}\right]_0^{\infty} = \frac{\pi}{t},$$
so $g(t) = \sqrt{\pi/t}$. Thus we have
$$\int_{-\infty}^{\infty} e^{-tx^2+ix\xi}\,dx = \left(\frac{\pi}{t}\right)^{1/2}e^{-\xi^2/4t}$$
and so
$$\mathcal F(e^{-t|x|^2})(\xi) = \left(\frac{\pi}{t}\right)^{n/2}e^{-|\xi|^2/4t},$$
and
$$\frac{1}{(2\pi)^n}\int_{\mathbb R^n} e^{-t|\xi|^2}e^{-ix\cdot\xi}\,d\xi = \frac{1}{(4\pi t)^{n/2}}\,e^{-|x|^2/4t}.$$
An interesting special case is when $t = 1/2$; then if $f(x) = e^{-|x|^2/2}$ we have
$\mathcal Ff = (2\pi)^{n/2}f$.
This formula is the starting point for many Fourier transform computations,
and it is useful for solving the heat equation and Schrödinger's equation.
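The one-dimensional Gaussian formula can be checked by direct quadrature (a Python/NumPy sketch, not from the text), using the text's convention $\hat f(\xi) = \int f(x)e^{ix\xi}\,dx$:

```python
import numpy as np

# Check F(e^{-t x^2})(xi) = sqrt(pi/t) * e^{-xi^2/(4t)} by quadrature.
t = 0.7
x = np.linspace(-30, 30, 120001)
dx = x[1] - x[0]
f = np.exp(-t * x**2)
errs = []
for xi in (0.0, 0.5, 1.3, 2.0):
    fhat = np.sum(f * np.exp(1j * x * xi)) * dx
    exact = np.sqrt(np.pi / t) * np.exp(-xi**2 / (4 * t))
    errs.append(abs(fhat - exact))
print(max(errs))   # essentially machine precision
```

For a smooth, rapidly decreasing integrand the simple Riemann sum is extremely accurate, which is why the agreement is so close.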

3.6 Problems
1. Compute $e^{-s|x|^2} * e^{-t|x|^2}$ using the Fourier inversion formula.
2. Show that $\mathcal F(d_rf) = r^{-n}d_{1/r}\mathcal Ff$ for any $f \in S$, where $d_rf(x) =
f(rx_1,\ldots,rx_n)$.
3. Show that $f(x)$ is real valued if and only if $\hat f(-\xi) = \overline{\hat f(\xi)}$.
4. Compute $\mathcal F(xe^{-tx^2})$ in $\mathbb R^1$.
5. Compute $\mathcal F(e^{-ax^2+bx+c})$ in $\mathbb R^1$ for $a > 0$. (Hint: Complete the square
and use the ping-pong table.)
6. Let
$$f(x) = \begin{cases} 1 & \text{if } -a < x < a \\ 0 & \text{otherwise} \end{cases}$$
in $\mathbb R^1$. Compute $\mathcal Ff$ (note $f$ is not in $S$ but it is integrable).


7. Compute

in $\mathbb R^1$ in terms of $\mathcal Ff$.


8. Let
$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
be the rotation in $\mathbb R^2$ through angle $\theta$. Show $\mathcal F(f \circ R_\theta) = (\mathcal Ff) \circ R_\theta$.
Use this to show that the Fourier transform of a radial function is radial.

9. If $f \in S(\mathbb R^1)$ is even, show that $f(|x|) \in S(\mathbb R^n)$.
10. Show that $e^{-|x|^a}$ is in $S(\mathbb R^1)$ if and only if $a$ is an even positive integer.
11. Show that $\int_{\mathbb R^n} f*g(x)\,dx = \int_{\mathbb R^n} f(x)\,dx \cdot \int_{\mathbb R^n} g(x)\,dx$.
12. Compute the moments $\int_{-\infty}^{\infty} x^k f(x)\,dx$ of a function $f \in S(\mathbb R^1)$ in terms
of the Fourier transform $\hat f$.
13. Show that $\mathcal F\tilde f = \widetilde{\mathcal Ff}$ for $\tilde f(x) = f(-x)$.
14. Show that $\mathcal F^4f = (2\pi)^{2n}f$. What does this say about possible eigenvalues
$\lambda$ for $\mathcal F$ (i.e., $\mathcal Ff = \lambda f$ for nonzero $f \in S(\mathbb R^n)$)?
15. Show that $\left(\frac{d}{dx}+x\right)f = 0$ has $ce^{-x^2/2}$ as its solution. Using the uniqueness
theorem for ordinary differential equations, we can conclude that
there are no other solutions. What does this tell us about $\mathcal F(e^{-x^2/2})$?
16. What is the Fourier transform of $\left(\frac{d}{dx}-x\right)^k e^{-x^2/2}$?
17. Let $f \in S(\mathbb R^1)$ and let
$$g(x) = \sum_{k=-\infty}^{\infty} f(x+2\pi k).$$
Then $g$ is periodic of period $2\pi$. What is the relationship between the
Fourier series of $g$ and $\hat f$?
18. Show that $\int_{-\infty}^{\infty} x^2|f(x)|^2\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} |\hat f'(\xi)|^2\,d\xi$ and $\int_{-\infty}^{\infty} |f'(x)|^2\,dx =
\frac{1}{2\pi}\int_{-\infty}^{\infty} \xi^2|\hat f(\xi)|^2\,d\xi$.
19. Show that $\int_{\mathbb R^n} f(x)\overline{g(x)}\,dx = \frac{1}{(2\pi)^n}\int_{\mathbb R^n} \hat f(\xi)\overline{\hat g(\xi)}\,d\xi$.
20. If $f \in S$ is positive, show that $|\hat f(\xi)|^2$ achieves its maximum value at
$\xi = 0$.
21. Let $\varphi(x) = f(x,0)$ for $f \in S(\mathbb R^2)$. What is the relationship between $\hat\varphi$
and $\hat f$?
Fourier Transforms of
Tempered Distributions

4.1 The definitions


You may already have noticed a similarity between the spaces $S$ and $\mathcal D$. Since
distributions were defined to be linear functionals on $\mathcal D$, it seems plausible that
linear functionals on $S$ should be of interest. They are, and they are called
tempered distributions. As the nomenclature suggests, the class of tempered
distributions (denoted $S'(\mathbb R^n)$) should be a subclass of the distributions $\mathcal D'(\mathbb R^n)$.
This is in fact the case: any $f$ in $S'(\mathbb R^n)$ is a functional $(f,\varphi)$ defined for all
$\varphi \in S(\mathbb R^n)$. But since $\mathcal D(\mathbb R^n) \subseteq S(\mathbb R^n)$ this defines by restriction a functional
on $\mathcal D(\mathbb R^n)$, hence a distribution in $\mathcal D'(\mathbb R^n)$ (it is also true that the continuity of
$f$ on $S$ implies the continuity of $f$ on $\mathcal D$, but I have not defined these concepts
yet). Different functionals on $S$ define different functionals on $\mathcal D$ (in other
words, $(f,\varphi)$ is completely determined if you know it for $\varphi \in \mathcal D$) so we will be
sloppy and not make any distinction between a tempered distribution and the
associated distribution in $\mathcal D'(\mathbb R^n)$. We are thus thinking of $S'(\mathbb R^n) \subseteq \mathcal D'(\mathbb R^n)$.
What is not true is that every distribution in $\mathcal D'(\mathbb R^n)$ corresponds to a tempered
distribution. For example, the function $e^{x^2}$ on $\mathbb R^1$ defines a distribution
$$(f,\varphi) = \int_{-\infty}^{\infty} e^{x^2}\varphi(x)\,dx$$
(this is finite because the support of $\varphi$ is bounded). But $e^{-x^2/2} \in S(\mathbb R^1)$, and
if $f$ were tempered we would have
$$(f, e^{-x^2/2}) = \int_{-\infty}^{\infty} e^{x^2}e^{-x^2/2}\,dx = \int_{-\infty}^{\infty} e^{x^2/2}\,dx = +\infty,$$


so there is no way to define $f$ as a tempered distribution. (In fact, it can be
shown that a locally integrable function defines a tempered distribution if
$$\int_{|x|\le A} |f(x)|\,dx \le cA^N \quad\text{as } A \to \infty$$
for some constants $c$ and $N$, and this condition is necessary if $f$ is positive.)

Exercise: Verify that if $f$ satisfies this estimate then $\int_{\mathbb R^n} |f(x)\varphi(x)|\,dx < \infty$
for all $\varphi \in S(\mathbb R^n)$, so $\int_{\mathbb R^n} f(x)\varphi(x)\,dx$ is a tempered distribution.
Now why complicate things by introducing tempered distributions? The answer
is that it is possible to define the Fourier transform of a tempered distribution
as a tempered distribution, but it is impossible to define the Fourier
transform of all distributions in $\mathcal D'(\mathbb R^n)$ as distributions.
Recall that we were able to define operations on distributions via adjoint
identities. If $T$ and $S$ were linear operations that took functions in $\mathcal D$ to functions
in $\mathcal D$ such that
$$\int (T\psi(x))\varphi(x)\,dx = \int \psi(x)S\varphi(x)\,dx$$
for $\psi,\varphi \in \mathcal D$, we defined
$$(Tf,\varphi) = (f,S\varphi)$$
for any distribution $f \in \mathcal D'$. The same idea works for tempered distributions.
The adjoint identity for $\psi,\varphi \in S$ is usually no more difficult than for $\psi,\varphi \in \mathcal D$.
The only new twist is that the operations $T$ and $S$ must preserve the class $S$
instead of $\mathcal D$. This is true for the operations we discussed previously with one
exception: multiplication by a $C^\infty$ function $m(x)$ is allowed only if $m(x)$ does
not grow too fast at infinity; specifically, we require $m(x) \le c|x|^N$ as $x \to \infty$
for some $c$ and $N$. This includes polynomials but excludes $e^{|x|^2}$, for $e^{|x|^2}e^{-|x|^2/2}$
is not in $S$ while $e^{-|x|^2/2} \in S$.
But in dealing with the Fourier transform it is a real boon to have the class
$S$: if $\varphi \in S$ then $\mathcal F\varphi \in S$, while if $\varphi \in \mathcal D$ it may not be true that $\mathcal F\varphi \in \mathcal D$
(surprisingly, it turns out that if both $\varphi$ and $\mathcal F\varphi$ are in $\mathcal D$ then $\varphi = 0$!). So all
that remains is to discover an adjoint identity involving $\mathcal F$. Such an identity
should look like
$$\int \hat\psi(x)\varphi(x)\,dx = \int \psi(x)S\varphi(x)\,dx$$
where $S$ is an as-yet-to-be-discovered operation.



To get such an identity we substitute the definition $\hat\psi(x) = \int \psi(y)e^{ix\cdot y}\,dy$
and interchange the order of integration:
$$\int \hat\psi(x)\varphi(x)\,dx = \iint \psi(y)e^{ix\cdot y}\,dy\,\varphi(x)\,dx
= \int \psi(y)\left(\int \varphi(x)e^{ix\cdot y}\,dx\right)dy = \int \psi(y)\hat\varphi(y)\,dy.$$
We may rename the variable $y$ to obtain
$$\int \hat\psi(x)\varphi(x)\,dx = \int \psi(x)\hat\varphi(x)\,dx.$$
This is our adjoint identity.


In passing let us note that the Plancherel formula is a simple consequence
of this identity. Just take $\psi = \overline{\hat\varphi}$. We have $\psi(x) = \int \overline{\varphi(y)}\,e^{-ix\cdot y}\,dy =
(2\pi)^n\mathcal F^{-1}(\bar\varphi)(x)$. Then $\hat\psi(x) = (2\pi)^n\mathcal F\mathcal F^{-1}\bar\varphi = (2\pi)^n\overline{\varphi(x)}$, so the adjoint
identity reads
$$(2\pi)^n\int |\varphi(x)|^2\,dx = \int |\hat\varphi(x)|^2\,dx.$$
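The one-dimensional case of the Plancherel formula, $2\pi\int|\varphi|^2\,dx = \int|\hat\varphi|^2\,d\xi$, is easy to confirm numerically (a Python/NumPy sketch, not from the text; the choice of $\varphi$ is ours):

```python
import numpy as np

# Numerical check of (2*pi) int |phi|^2 dx = int |phi-hat|^2 d(xi),
# with phi-hat(xi) = int phi(x) e^{i x xi} dx as in the text.
x = np.linspace(-15, 15, 2001)
dx = x[1] - x[0]
phi = x * np.exp(-x**2 / 2)            # an odd Schwartz function
xi = np.linspace(-15, 15, 801)
dxi = xi[1] - xi[0]
phihat = (np.exp(1j * np.outer(xi, x)) * phi).sum(axis=1) * dx
lhs = 2 * np.pi * np.sum(np.abs(phi)**2) * dx
rhs = np.sum(np.abs(phihat)**2) * dxi
print(lhs, rhs)   # the two sides agree closely
```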

Now the adjoint identity allows us to define the Fourier transform of a tempered
distribution $f \in S'(\mathbb R^n)$: $(\hat f,\varphi) = (f,\hat\varphi)$. In other words, $\hat f$ is that
functional on $S$ that assigns to $\varphi$ the value $(f,\hat\varphi)$. If $f$ is actually a function
in $S$ then $\hat f$ is the tempered distribution identified with the function $\hat f(x)$. In
other words, this definition is consistent with the previous definition, since we
are identifying functions $f(x)$ with the distribution
$$(f,\varphi) = \int f(x)\varphi(x)\,dx.$$
In fact, more is true. If $f$ is any integrable function we could define the
Fourier transform of $f$ directly:
$$\hat f(\xi) = \int f(x)e^{ix\cdot\xi}\,dx.$$
Now $\hat f$ is a bounded continuous function, so both $f$ and $\hat f$ define tempered
distributions. The adjoint identity continues to hold:
$$\int \hat f(x)\varphi(x)\,dx = \int f(x)\hat\varphi(x)\,dx$$
for $\varphi \in S$, so that $\hat f$ is the distribution Fourier transform of $f$.


The Fourier inversion formula for tempered distributions takes the same
form as for functions in $S$: $\mathcal F^{-1}\mathcal Ff = f$ and $\mathcal F\mathcal F^{-1}f = f$, with $\mathcal F^{-1}f =
(2\pi)^{-n}(\mathcal Ff)^\sim$, where the operation $\tilde f$ for distributions corresponds to $\tilde f(x) =
f(-x)$ for functions and is defined by $(\tilde f,\varphi) = (f,\tilde\varphi)$. To establish the Fourier
inversion formula we just do some definition chasing: since $\varphi = \mathcal F\mathcal F^{-1}\varphi$ for
$\varphi \in S$ we have
$$(f,\varphi) = (f,\mathcal F\mathcal F^{-1}\varphi) = (\mathcal Ff,\mathcal F^{-1}\varphi)
= (2\pi)^{-n}(\mathcal Ff,(\mathcal F\varphi)^\sim) = (2\pi)^{-n}((\mathcal Ff)^\sim,\mathcal F\varphi)
= (\mathcal F^{-1}f,\mathcal F\varphi) = (\mathcal F\mathcal F^{-1}f,\varphi).$$
So $\mathcal F\mathcal F^{-1}f = f$, and similarly for the inversion in the reverse order.

4.2 Examples
1. Let $f = \delta$. What is $\hat\delta$? We must have $(\hat\delta,\varphi) = (\delta,\hat\varphi) = \hat\varphi(0)$. But by
definition $\hat\varphi(0) = \int \varphi(x)\,dx$, so $\hat\delta \equiv 1$. In this example $\delta$ is not at all smooth,
so $\hat\delta$ has no decay at infinity. But $\delta$ has rapid decay at infinity, so $\hat\delta$ is smooth.
2. Let $f = \delta'$ ($n = 1$). Since $\delta' = (d/dx)\delta$ and $\hat\delta = 1$ we would like to use our
"ping-pong" table to say $\hat f(\xi) = -i\xi\cdot\hat\delta(\xi) = -i\xi$. This is possible; in fact,
the entire table is essentially valid for tempered distributions (for convolutions
and products one factor must be in $S$). Let us verify for instance that
$$\mathcal F\left(\frac{\partial}{\partial x_k}f\right) = (-ix_k)\hat f$$
for any $f \in S'(\mathbb R^n)$. By definition
$$\left(\mathcal F\left(\frac{\partial}{\partial x_k}f\right),\varphi\right) = \left(\frac{\partial}{\partial x_k}f,\hat\varphi\right).$$
By definition
$$\left(\frac{\partial}{\partial x_k}f,\hat\varphi\right) = -\left(f,\frac{\partial}{\partial x_k}\hat\varphi\right).$$
But $\varphi \in S$, so by our table $(\partial/\partial x_k)\hat\varphi = \mathcal F(ix_k\varphi)$. Thus
$$\left(\mathcal F\left(\frac{\partial}{\partial x_k}f\right),\varphi\right) = -(f,\mathcal F(ix_k\varphi)).$$
Now use the definition of $\mathcal Ff$:
$$-(f,\mathcal F(ix_k\varphi)) = -(\hat f,ix_k\varphi).$$
Finally, use the definition of multiplication by $-ix_k$:
$$-(\hat f,ix_k\varphi) = (-ix_k\hat f,\varphi).$$
Altogether then,
$$\left(\mathcal F\left(\frac{\partial}{\partial x_k}f\right),\varphi\right) = (-ix_k\hat f,\varphi),$$
which is to say $\mathcal F\left(\frac{\partial}{\partial x_k}f\right) = -ix_k\hat f$. The other entries in the table are verified
by similar "definition chasing" arguments. Note that $\delta'$ is somewhat "rougher"
than $\delta$, so its Fourier transform grows at infinity.
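For functions the table entry just verified reads $\mathcal F(f')(\xi) = -i\xi\hat f(\xi)$, which can be checked by quadrature (a Python/NumPy sketch, not from the text):

```python
import numpy as np

# Check F(f')(xi) = -i * xi * f-hat(xi) for a Schwartz function,
# with the convention f-hat(xi) = int f(x) e^{i x xi} dx.
x = np.linspace(-20, 20, 80001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)
fprime = -x * np.exp(-x**2 / 2)
errs = []
for xi in (0.3, 1.0, 2.5):
    lhs = np.sum(fprime * np.exp(1j * x * xi)) * dx
    rhs = -1j * xi * np.sum(f * np.exp(1j * x * xi)) * dx
    errs.append(abs(lhs - rhs))
print(max(errs))   # ~ machine precision
```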

3. Let $f(x) = e^{is|x|^2}$, $s \ne 0$ real. Then $f$ is a bounded continuous function,
hence it defines a tempered distribution, although $f$ is not integrable, so that
$\int f(x)e^{ix\cdot\xi}\,dx$ is not defined.
In this example the definition of $\hat f$ is not very helpful. To be honest, you
almost never compute $\hat f$ from the definition; instead you use some other method
to find out what $\hat f$ is and then go back and show it satisfies $(\hat f,\varphi) = (f,\hat\varphi)$.
That is what we will do in this case.
Recall that we computed $\mathcal F(e^{-t|x|^2})(\xi) = (\pi/t)^{n/2}e^{-|\xi|^2/4t}$. We would like to
substitute $t = -is$. But note that there is an ambiguity when $n$ is odd, namely
which square root to take for $(\pi/(-is))^{n/2}$. This can be clarified by thinking of
$t$ as a complex variable $z$. We must keep $\operatorname{Re} z \ge 0$ in order that $e^{-z|x|^2}$
not grow too fast at infinity. But for $\operatorname{Re} z \ge 0$ we have $-\pi/2 \le \arg z \le \pi/2$,
and we can determine the square root $z^{1/2}$ uniquely by requiring $-\pi/4 \le
\arg z^{1/2} \le \pi/4$. This is consistent with taking the positive square root when
$z$ is real and positive. So $\pi/(-is) = \pi i/s$ becomes $e^{\pi i/2}(\pi/|s|)$ when $s > 0$
and $e^{-\pi i/2}(\pi/|s|)$ when $s < 0$, so
$$\left(\frac{\pi i}{s}\right)^{n/2} = \begin{cases} \left(\frac{\pi}{|s|}\right)^{n/2}e^{\pi ni/4} & s > 0 \\[4pt] \left(\frac{\pi}{|s|}\right)^{n/2}e^{-\pi ni/4} & s < 0. \end{cases}$$
With this choice we expect $\mathcal F(e^{is|x|^2}) = \left(\frac{\pi i}{s}\right)^{n/2}e^{-i|\xi|^2/4s}$.
Having first obtained the answer, how do we justify it from the definition?
We have to show
$$(\widehat{e^{is|x|^2}},\varphi) = (e^{is|x|^2},\hat\varphi),$$
which is to say
$$\left(\frac{\pi i}{s}\right)^{n/2}\int e^{-i|x|^2/4s}\varphi(x)\,dx = \int e^{is|x|^2}\hat\varphi(x)\,dx$$
for all $\varphi \in S$, both integrals being well defined. Now our starting point was the
fact that $\mathcal F(e^{-t|x|^2}) = \left(\frac{\pi}{t}\right)^{n/2}e^{-|x|^2/4t}$, which via the adjoint identity gives
$$\int e^{-t|x|^2}\hat\varphi(x)\,dx = \left(\frac{\pi}{t}\right)^{n/2}\int e^{-|x|^2/4t}\varphi(x)\,dx.$$
Now the substitution $t = -is$ may be accomplished by analytic continuation.
We consider
$$F(z) = \int e^{-z|x|^2}\hat\varphi(x)\,dx$$
and
$$G(z) = \left(\frac{\pi}{z}\right)^{n/2}\int e^{-|x|^2/4z}\varphi(x)\,dx$$
for fixed $\varphi \in S$. For $\operatorname{Re} z > 0$ the integrals converge (note that $1/z$ also has
real part $> 0$) and can be differentiated with respect to $z$, so they define analytic
functions in $\operatorname{Re} z > 0$.
We have seen that $F$ and $G$ are equal if $z$ is real (and $> 0$). But an analytic
function is determined by its values for $z$ real, so $F(z) = G(z)$ in $\operatorname{Re} z > 0$.
Finally, $F$ and $G$ are continuous up to the boundary $z = is$ for $s \ne 0$,
$$F(is) = \lim_{\epsilon\to 0^+} F(\epsilon+is),$$
and similarly for $G$ (this requires some justification since we are interchanging
a limit and an integral), hence $F(is) = G(is)$, which is the result we are after.
This example illustrates a very powerful method for computing Fourier transforms
via analytic continuation. It has to be used with care, however. The
question of when you can interchange limits and integrals is not just an academic
matter; it can lead to errors if it is misused.

4. Let $f(x) = e^{-t|x|}$, $t > 0$. This is a rapidly decreasing function but it is not in
$S$ because it fails to be differentiable at $x = 0$. For $n = 1$ it is easy to compute
the Fourier transform directly:
$$\hat f(\xi) = \int_{-\infty}^{\infty} e^{-t|x|}e^{ix\xi}\,dx
= \int_{-\infty}^{0} e^{tx+ix\xi}\,dx + \int_{0}^{\infty} e^{-tx+ix\xi}\,dx$$
$$= \left.\frac{e^{x(t+i\xi)}}{t+i\xi}\right|_{-\infty}^{0} + \left.\frac{e^{x(-t+i\xi)}}{-t+i\xi}\right|_{0}^{\infty}
= \frac{1}{t+i\xi} - \frac{1}{-t+i\xi} = \frac{2t}{t^2+\xi^2}.$$
From the Fourier inversion formula,
$$e^{-t|x|} = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{t}{t^2+\xi^2}\,e^{-ix\xi}\,d\xi.$$
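The one-dimensional formula just derived is easy to confirm by quadrature (a Python/NumPy sketch, not from the text):

```python
import numpy as np

# Check F(e^{-t|x|})(xi) = 2t/(t^2 + xi^2) in one dimension,
# with the convention f-hat(xi) = int f(x) e^{i x xi} dx.
t = 1.5
x = np.linspace(-60, 60, 240001)
dx = x[1] - x[0]
f = np.exp(-t * np.abs(x))
errs = []
for xi in (0.0, 0.7, 2.0):
    fhat = np.sum(f * np.exp(1j * x * xi)) * dx
    errs.append(abs(fhat - 2 * t / (t**2 + xi**2)))
print(max(errs))
```

Note the contrast with the Gaussian check: because of the kink at $x = 0$, $\hat f$ here only decays like $\xi^{-2}$, in keeping with estimate (a) of section 3.4.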
Exercise: Verify this directly using the calculus of residues.
For $n > 1$ we will use another general method which in outline goes as
follows: we try to write $e^{-t|x|}$ as an "average" of Gaussians $e^{-s|x|^2}$. In other
words we try to find an identity of the form
$$e^{-t|x|} = \int_0^{\infty} g(s)e^{-s|x|^2}\,ds \quad (g\text{ depends on }t).$$
If we can do this then (reasoning formally) we should have
$$\mathcal F(e^{-t|x|}) = \int_0^{\infty} g(s)\mathcal F(e^{-s|x|^2})\,ds = \int_0^{\infty} g(s)\left(\frac{\pi}{s}\right)^{n/2}e^{-|\xi|^2/4s}\,ds.$$
We then try to evaluate this integral (even if we cannot evaluate it explicitly, it
may give more information than the original Fourier transform formula).
Now the identity we seek is independent of the dimension because all that
appears in it is $|x|$, which is just a positive number (call it $\lambda$ for emphasis). So
we want
$$e^{-t\lambda} = \int_0^{\infty} g(s)e^{-s\lambda^2}\,ds$$

for all $\lambda > 0$. We will obtain such an identity from the one-dimensional Fourier
transform we just computed. We begin by computing
$$\int_0^{\infty} e^{-st^2}e^{-s\xi^2}\,ds = \left.\frac{e^{-s(t^2+\xi^2)}}{-(t^2+\xi^2)}\right|_0^{\infty} = \frac{1}{t^2+\xi^2}.$$
Since we know $e^{-t|x|} = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{t}{t^2+\xi^2}\,e^{-ix\xi}\,d\xi$ we may substitute in for $1/(t^2+\xi^2)$
and get
$$e^{-t|x|} = \frac{t}{\pi}\int_{-\infty}^{\infty}\int_0^{\infty} e^{-st^2}e^{-s\xi^2}\,ds\,e^{-ix\xi}\,d\xi,$$
so (interchanging the order of integration and using the Fourier transform of the
Gaussian)
$$e^{-t|x|} = \frac{t}{\pi}\int_0^{\infty} e^{-st^2}\left(\frac{\pi}{s}\right)^{1/2}e^{-x^2/4s}\,ds.$$
Putting in $\lambda$ for $|x|$ we have
$$e^{-t\lambda} = \frac{t}{\pi}\int_0^{\infty} e^{-st^2}\left(\frac{\pi}{s}\right)^{1/2}e^{-\lambda^2/4s}\,ds,$$
which is essentially what we wanted. (We could make the change of variable
$s \to 1/4s$ to obtain the exact form discussed above, but this is unnecessary.)
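Before using the identity, it is worth convincing oneself of it numerically (a Python/NumPy sketch, not from the text; the values of $t$ and $\lambda$ are arbitrary):

```python
import numpy as np

# Check e^{-t*lam} = (t/pi) int_0^inf e^{-s t^2} sqrt(pi/s) e^{-lam^2/4s} ds.
t, lam = 1.3, 2.0
s = np.linspace(1e-6, 60.0, 600001)
ds = s[1] - s[0]
integrand = np.exp(-s * t**2) * np.sqrt(np.pi / s) * np.exp(-lam**2 / (4 * s))
val = (t / np.pi) * np.sum(integrand) * ds
print(val, np.exp(-t * lam))   # the two agree closely
```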
Now we let $x$ vary in $\mathbb R^n$ and substitute $|x|$ for $\lambda$ to obtain
$$e^{-t|x|} = \frac{t}{\pi}\int_0^{\infty} e^{-st^2}\left(\frac{\pi}{s}\right)^{1/2}e^{-|x|^2/4s}\,ds,$$
so, using $\mathcal F(e^{-|x|^2/4s}) = (4\pi s)^{n/2}e^{-s|\xi|^2}$,
$$\mathcal F(e^{-t|x|})(\xi) = \frac{t}{\pi}\int_0^{\infty} e^{-st^2}\left(\frac{\pi}{s}\right)^{1/2}(4\pi s)^{n/2}e^{-s|\xi|^2}\,ds.$$
The last step is to evaluate this integral. We first try to remove the dependence
on $\xi$. Note that $e^{-st^2}e^{-s|\xi|^2} = e^{-s(t^2+|\xi|^2)}$. This suggests the change of variable
$s \to s/(t^2+|\xi|^2)$. Doing this we get
$$\mathcal F(e^{-t|x|})(\xi) = \frac{t}{(t^2+|\xi|^2)^{\frac{n+1}{2}}}\int_0^{\infty} \frac{1}{(\pi s)^{1/2}}(4\pi s)^{n/2}e^{-s}\,ds.$$
This last integral is just a constant depending on $n$, but not on $t$ or $\xi$. It
can be evaluated in terms of the $\Gamma$-function as $2^n\pi^{\frac{n-1}{2}}\Gamma\left(\frac{n+1}{2}\right)$ (when $n$ is odd
$\Gamma\left(\frac{n+1}{2}\right) = \left(\frac{n-1}{2}\right)!$, while if $n$ is even $\Gamma\left(\frac{n+1}{2}\right) = \frac{n-1}{2}\cdot\frac{n-3}{2}\cdots\frac{1}{2}\sqrt{\pi}$). Thus
$$\mathcal F(e^{-t|x|})(\xi) = \frac{2^n\pi^{\frac{n-1}{2}}\,\Gamma\left(\frac{n+1}{2}\right)t}{(t^2+|\xi|^2)^{\frac{n+1}{2}}}.$$
Note this agrees with our previous computation where $n = 1$. Once again we see
the decay at infinity of $e^{-t|x|}$ mirrored in the smoothness of its Fourier transform,
while the lack of smoothness of $e^{-t|x|}$ at $x = 0$ results in the polynomial decay
at infinity of the Fourier transform.
Actually we will need to know $\mathcal F^{-1}(e^{-t|\xi|})$, which is $\frac{1}{(2\pi)^n}\mathcal F(e^{-t|\xi|})(-x)$,
so
$$\mathcal F^{-1}(e^{-t|\xi|})(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\pi^{\frac{n+1}{2}}}\cdot\frac{t}{(t^2+|x|^2)^{\frac{n+1}{2}}}.$$
5. Let $f(x) = |x|^\alpha$. For $\alpha > -n$ (we may even take $\alpha$ complex provided $\operatorname{Re}
\alpha > -n$), $f$ is locally integrable (it is never integrable) and does not increase
too fast, so it defines a tempered distribution. To compute its Fourier transform
we use the same method as in the previous example. We have
$$|x|^\alpha = \frac{1}{\Gamma\left(-\frac{\alpha}{2}\right)}\int_0^{\infty} s^{-\frac{\alpha}{2}-1}e^{-s|x|^2}\,ds$$
(we have made the change of variable $s \to s|x|^{-2}$). Of course for this integral
to converge, the singularity at $s = 0$ must be better than $s^{-1}$, so we require
$\alpha < 0$. Thus we have imposed the conditions $-n < \alpha < 0$ to obtain this identity.
Now we may compute
$$\mathcal F(|x|^\alpha)(\xi) = \frac{1}{\Gamma\left(-\frac{\alpha}{2}\right)}\int_0^{\infty} s^{-\frac{\alpha}{2}-1}\left(\frac{\pi}{s}\right)^{n/2}e^{-|\xi|^2/4s}\,ds
= \frac{\pi^{n/2}}{\Gamma\left(-\frac{\alpha}{2}\right)}\int_0^{\infty} s^{-\frac{\alpha}{2}-\frac{n}{2}-1}e^{-|\xi|^2/4s}\,ds.$$
Now to evaluate the integral make the change of variable $s \to |\xi|^2/4s$, $ds \to
\frac{|\xi|^2}{4s^2}\,ds$, so
$$\mathcal F(|x|^\alpha)(\xi) = \frac{2^{n+\alpha}\pi^{n/2}\,\Gamma\left(\frac{n+\alpha}{2}\right)}{\Gamma\left(-\frac{\alpha}{2}\right)}\,|\xi|^{-n-\alpha}.$$
Note that $-\alpha-n$ satisfies the same conditions as $\alpha$, namely $-n < -n-\alpha < 0$.
We mention one special case that we will use later: $n = 3$ and $\alpha = -1$. Here
we have
$$\mathcal F(|x|^{-1}) = \frac{2^2\pi^{3/2}\,\Gamma(1)}{\Gamma\left(\frac{1}{2}\right)}\,|\xi|^{-2} = 4\pi|\xi|^{-2},$$
which we can write as
$$\mathcal F^{-1}(|\xi|^{-2}) = \frac{1}{4\pi|x|}.$$
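The $\Gamma$-function constant admits a nice consistency check. From the inversion formula, $\mathcal F^2f = (2\pi)^n\tilde f$, and $|x|^\alpha$ is even, so applying $\mathcal F$ twice must give $(2\pi)^n|x|^\alpha$ back; hence the constant $c(\alpha) = 2^{n+\alpha}\pi^{n/2}\Gamma(\frac{n+\alpha}{2})/\Gamma(-\frac{\alpha}{2})$ must satisfy $c(\alpha)\,c(-n-\alpha) = (2\pi)^n$. A quick numerical check (a Python sketch, not from the text; the function name `c` is ours):

```python
import math

def c(alpha, n):
    # constant in F(|x|^alpha) = c(alpha) |xi|^{-n-alpha}, for -n < alpha < 0
    return (2**(n + alpha) * math.pi**(n / 2)
            * math.gamma((n + alpha) / 2) / math.gamma(-alpha / 2))

results = []
for n in (1, 2, 3):
    a = -0.6 * n                      # some alpha in (-n, 0)
    results.append(c(a, n) * c(-n - a, n) / (2 * math.pi)**n)
print(results)   # each ratio should be 1
```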

4.3 Convolutions with tempered distributions

Many applications of the Fourier transform to solve differential equations lead
to convolutions where one factor is a tempered distribution. Recall that
$$\varphi * \psi(x) = \int \varphi(x-y)\psi(y)\,dy$$
for $\varphi,\psi \in S$ defines a function in $S$, and $\mathcal F(\varphi*\psi) = \hat\varphi\cdot\hat\psi$. Since products are
not defined for all distributions we cannot expect to define convolutions of two
tempered distributions. However, if one factor is in $S$ there is no problem. Fix
$\psi \in S$. Then convolution with $\psi$ is an operation that preserves $S$, so to define
$\psi * f$ for $f \in S'$ we need only find an adjoint identity. Now
$$\int \psi*\varphi_1(x)\,\varphi_2(x)\,dx = \iint \psi(x-y)\varphi_1(y)\varphi_2(x)\,dy\,dx.$$
If we do the $x$-integration first,
$$\int \psi(x-y)\varphi_2(x)\,dx = \tilde\psi*\varphi_2(y)$$
where $\tilde\psi(x) = \psi(-x)$. Thus
$$\int \psi*\varphi_1(x)\,\varphi_2(x)\,dx = \int \varphi_1(y)\,\tilde\psi*\varphi_2(y)\,dy,$$
which is our adjoint identity. Thus we define $\psi*f$ by
$$(\psi*f,\varphi) = (f,\tilde\psi*\varphi).$$
From this we obtain $\mathcal F(\psi*f) = \hat\psi\cdot\hat f$ by definition chasing:
$$(\mathcal F(\psi*f),\varphi) = (\psi*f,\hat\varphi) = (f,\tilde\psi*\hat\varphi) = (\hat f,\mathcal F^{-1}(\tilde\psi*\hat\varphi)).$$
Now $\mathcal F^{-1}(\tilde\psi*\hat\varphi) = ((2\pi)^n\mathcal F^{-1}\tilde\psi)\cdot\varphi$ and $(2\pi)^n\mathcal F^{-1}\tilde\psi = \hat\psi$, so $(\mathcal F(\psi*f),\varphi) =
(\hat f,\hat\psi\cdot\varphi) = (\hat\psi\cdot\hat f,\varphi)$, which shows $\mathcal F(\psi*f) = \hat\psi\cdot\hat f$.
There is another way to define the convolution, however, which is much more
direct. Remember that if $f \in S$ then
$$\psi*f(x) = \int \psi(x-y)f(y)\,dy.$$
It is suggestive to write this
$$\psi*f(x) = (f,\tau_x\tilde\psi)$$
where $\tau_x\tilde\psi(y) = \tilde\psi(y-x) = \psi(x-y)$ is still in $S$. But written this way it makes sense
for any tempered distribution $f$. Of course this defines $\psi*f$ as a function, in
fact a $C^\infty$ function, since we can put all derivatives on $\psi$. What ought to be
true is that the distribution defined by this function is tempered and agrees with
the previous definition. This is in fact the case. What you have to show is that
if we denote $g(x) = (f,\tau_x\tilde\psi)$ then $\int g(x)\varphi(x)\,dx = (f,\tilde\psi*\varphi)$. Formally
we can derive this by substituting
$$\int g(x)\varphi(x)\,dx = \int (f,\tau_x\tilde\psi)\varphi(x)\,dx = \left(f,\int \tau_x\tilde\psi\,\varphi(x)\,dx\right)$$
and then noting that
$$\int \tau_x\tilde\psi(y)\varphi(x)\,dx = \int \psi(x-y)\varphi(x)\,dx = \tilde\psi*\varphi(y).$$
What this shows is that convolution is a smoothing process. If you start with
any tempered distribution, no matter how rough, and take the convolution with
a test function, you get a smooth function.
Let us look at some simple examples. If $f = \delta$ then
$$\psi*\delta(x) = (\delta,\tau_x\tilde\psi) = \psi(x-y)\big|_{y=0} = \psi(x)$$
so $\psi*\delta = \psi$. This is consistent with $\mathcal F(\psi*\delta) = \hat\psi\cdot\hat\delta = \hat\psi$ since $\hat\delta = 1$. If we
differentiate this result we get
$$\frac{\partial}{\partial x_k}\psi(x) = \frac{\partial}{\partial x_k}(\psi*\delta) = \psi*\frac{\partial}{\partial x_k}\delta$$
(the derivative of a convolution with a distribution can be computed by putting
the derivative on either factor). This may also be computed directly. What it
shows is that differentiation is a special case of convolution: you convolve with
a derivative of the $\delta$-function.
We can also reinterpret the Fourier inversion formula as a convolution equation.
If we write the double integral for $\mathcal F^{-1}\mathcal Ff$ as an iterated integral in the
reverse order,
$$\mathcal F^{-1}\mathcal Ff(x) = \int_{-\infty}^{\infty} f(y)\left(\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i(x-y)\cdot\xi}\,d\xi\right)dy,$$
then it is just the convolution of $f$ with $\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ix\cdot\xi}\,d\xi$, which is the inverse
Fourier transform of the constant function 1. But we recognize from $\hat\delta = 1$ that
$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ix\cdot\xi}\,d\xi = \delta(x)$ (in the distribution sense, of course) so that
$$\mathcal F^{-1}\mathcal Ff = f*\delta = f.$$

In a sense, the identity
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ix\cdot\xi}\,d\xi = \delta(x)$$
is the Fourier inversion formula for the distribution $\delta$, and this single inversion
formula implies the inversion formula for all tempered distributions. We will
encounter a similar phenomenon in the next chapter: to solve a differential
equation $Pu = f$ for any $f$ it suffices (in many cases) to solve it for $f = \delta$. In
this sense $\delta$ is the first and best distribution!

4.4 Problems
1. Let
$$f(x) = \begin{cases} e^{-tx} & x > 0 \\ 0 & x \le 0. \end{cases}$$
Compute $\hat f(\xi)$.
2. Let $f(x) = x_1|x|^\alpha$ in $\mathbb R^n$ for $\alpha > -n-1$. Compute $\hat f(\xi)$.
(Hint: Compute $(\partial/\partial x_1)|x|^{\alpha+2}$.)
3. Let $f(x) = (1+|x|^2)^{-\alpha}$ in $\mathbb R^n$ for $\alpha > 0$. Show that
$$f(x) = c_\alpha\int_0^{\infty} t^{\alpha-1}e^{-t}e^{-t|x|^2}\,dt$$
where $c_\alpha$ is a positive constant. Conclude that $\hat f(\xi)$ is a positive function.
4. Express the integral
$$F(x) = \int_x^{\infty} f(t)\,dt \quad\text{for } f \in S$$
as the convolution of $f$ with a tempered distribution.


5. Let $f(x)$ be a continuous function on $\mathbb R^1$ periodic of period $2\pi$. Show
that $\hat f(\xi) = \sum_{n=-\infty}^{\infty} b_n\tau_n\delta$ and relate $b_n$ to the coefficients of the Fourier
series of $f$.
6. What is the Fourier transform of $x^k$ on $\mathbb R^1$?
7. Show that $\mathcal F(d_rf) = r^{-n}d_{1/r}\mathcal Ff$ for tempered distributions (cf. problem 3.6.2).
8. Show that if $f$ is homogeneous of degree $t$ then $\mathcal Ff$ is homogeneous of
degree $-n-t$.
9. Let
$$(f,\varphi) = \lim_{\epsilon\to 0}\int_{|x|>\epsilon} \frac{\varphi(x)}{x}\,dx \quad\text{in } \mathbb R^1.$$
Show that $\mathcal Ff(\xi) = c\operatorname{sgn}\xi$ because $\mathcal Ff$ is odd and homogeneous of degree
zero. Compute the constant $c$ by using $\frac{d}{d\xi}\operatorname{sgn}\xi = 2\delta$. (Convolution
with $f$ is called the "Hilbert transform.")
10. Compute the Fourier transform of $\operatorname{sgn}x\,e^{-t|x|}$ on $\mathbb R^1$. Take the limit as
$t \to 0$ to compute the Fourier transform of $\operatorname{sgn}x$. Compare the result with
problem 9.
11. Use the Fourier inversion formula to "evaluate" $\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx$ (this integral
converges as an improper integral
$$\lim_{N\to\infty}\int_{-N}^{N} \frac{\sin x}{x}\,dx$$
to the indicated value, although this formal computation is not a proof).
(Hint: See problem 3.6.6.) Use the Plancherel formula to evaluate
$$\int_{-\infty}^{\infty} \left(\frac{\sin x}{x}\right)^2 dx.$$
12. Use the Plancherel formula to evaluate
$$\int_{-\infty}^{\infty} \frac{1}{(1+x^2)^2}\,dx.$$
13. Compute the Fourier transform of $1/(1+x^2)^2$ in $\mathbb R^1$.
14. Compute the Fourier transform of
$$f(x) = \begin{cases} x^k e^{-tx} & x > 0 \\ 0 & x \le 0. \end{cases}$$
Can you do this even when $k$ is not an integer (but $k > 0$)?
Solving Partial Differential Equations

5.1 The Laplace equation


Recall that $\Delta$ stands for
$$\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2}$$
in $\mathbb R^2$ or
$$\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}$$
in $\mathbb R^3$. First we ask if there are solutions to the equation $\Delta u = f$ for a given
$f$. If there are, they are not unique, for we can always add a harmonic function
(a solution of $\Delta u = 0$) without changing the right-hand side.
Now suppose we could solve the equation $\Delta P = \delta$. Then
$$\Delta(P*f) = \Delta P*f = \delta*f = f,$$
so $P*f$ is a solution of $\Delta u = f$. Of course we have reasoned only formally,
but if $P$ turns out to be a tempered distribution and $f \in S$, then every step is
justified.
Such solutions $P$ are called fundamental solutions or potentials and have been
known for centuries. We have already found a potential when $n = 2$. Remember
we found $\Delta\log(x_1^2+x_2^2) = 4\pi\delta$, so that $\log(x_1^2+x_2^2)/4\pi$ is a potential (called
the logarithmic potential).
When $n = 3$ we can solve $\Delta P = \delta$ by taking the Fourier transform of both
sides. We get
$$\mathcal F(\Delta P) = -|\xi|^2\hat P(\xi) = \mathcal F\delta = 1.$$
So $\hat P(\xi) = -|\xi|^{-2}$ and we have computed (example 5 of section 4.2)
$$P(x) = -\mathcal F^{-1}(|\xi|^{-2}) = -\frac{1}{4\pi|x|}$$

(this is called the Newtonian potential). We could also verify directly using
Stokes' theorem that $\Delta P = \delta$ in this case.
Now there is one point that should be bothering you about the above computation.
We said that the solution was not unique, and yet we came up with just
one solution. There are two explanations for this. First, we did cheat a little.
From the equation $-|\xi|^2\hat P = 1$ we cannot conclude that $\hat P = -1/|\xi|^2$ because
the multiplication is not of two functions but a function times a distribution.
Now it is true that $-|\xi|^2\cdot(-|\xi|^{-2}) = 1$ regarding $-|\xi|^{-2}$ as a distribution. But
if we write $\hat P = -|\xi|^{-2}+g$ then $-|\xi|^2\hat P = 1$ is equivalent to $-|\xi|^2g = 0$, and
this equation has nonzero solutions. For instance, $g = \delta$ is a solution since
$$(-|\xi|^2\delta,\varphi) = (\delta,-|\xi|^2\varphi) = -|\xi|^2\varphi(\xi)\big|_{\xi=0} = 0.$$
This leads to the fundamental solution
$$-\frac{1}{4\pi|x|} + \frac{1}{(2\pi)^3}.$$
More generally we are allowed to take all possible solutions of $-|\xi|^2g = 0$.
It is apparent that such distributions must be concentrated at $\xi = 0$, and later
we will show they all are finite linear combinations of derivatives of the $\delta$-function
(note however that only some distributions of this form satisfy $-|\xi|^2g = 0$;
$g = (\partial^2/\partial\xi_1^2)\delta$ does not). Taking $\mathcal F^{-1}(-|\xi|^{-2}+g)$ we obtain the Newtonian
potential plus a polynomial that is harmonic.
This is still not the whole story, for we know that the general solution of
$\Delta u = \delta$ is the Newtonian potential plus a harmonic function. There are many
harmonic functions that are not polynomials. So how have these solutions
escaped us? To put the paradox more starkly, if we attempt to describe all
harmonic functions on $\mathbb R^3$ (the same argument works for $\mathbb R^2$ as well) by using
the Fourier transform to solve $\Delta u = 0$, we obtain $-|\xi|^2\hat u(\xi) = 0$, from which
we deduce that $u$ must be a polynomial. That seems to exclude functions that
are harmonic but are not polynomials, such as $e^{x_1}\cos x_2$.
But there is really no contradiction, because $e^{x_1}\cos x_2$ is not a tempered distribution;
it grows too fast as $x_1 \to \infty$, so its Fourier transform is not defined. In
fact what we have shown is that any harmonic function that is not a polynomial
must grow too fast at infinity to be a tempered distribution. Stating this in the
contrapositive form, if a harmonic function on $\mathbb R^2$ or $\mathbb R^3$ is of polynomial growth
$$(|u(x)| \le c|x|^N \text{ as } x \to \infty)$$
then it must be a polynomial. This is a generalization of Liouville's theorem (a
bounded entire analytic function is constant).
The two points I have just made bear repeating in a more general context:
(1) when solving an equation for a distribution by division, there will be extra
solutions at the zeroes of the denominator, and (2) when using Fourier transforms
to solve differential equations you will only obtain those solutions that do not
grow too rapidly at infinity.

Now we return to the Laplace equation. We have seen that $\Delta(P*f) = f$
if $f \in S$. Actually this solution is valid for more general functions $f$, as long
as the convolution can be reasonably defined. For instance, since $P$ is locally
integrable in both cases, $P*f(x) = \int P(x-y)f(y)\,dy$ makes sense for $f$
continuous and vanishing outside a bounded set, and $\Delta(P*f) = f$.
Usually one is interested in finding not just a solution to a differential equation,
but a solution that satisfies certain side conditions which determine it uniquely.
A typical example is the following: Let D be a bounded domain in ℝ² or ℝ³ with a smooth boundary B (in ℝ², B is a curve; in ℝ³, B is a surface). Let f be a continuous function on D (continuous up to the boundary) and g a continuous function on B. We then seek a solution of Δu = f in D with u = g on B.
To solve this problem first extend f so that it is defined outside of D. The
simplest way to do this is to set it equal to zero outside D-this results in a
discontinuous function, but that turns out not to matter. Call the extension F
and look at P * F. Since $\Delta(P * F) = F$ and F = f on D, we set v = P * F restricted to D, and so Δv = f on D. Calling w = u − v we see that w must satisfy

$$\Delta w = 0 \text{ on } D, \qquad w = g - h \text{ on } B,$$
where h = P * F restricted to B. Now it can be shown that h is continuous
so that the problem for w is the classical Dirichlet problem: find a harmonic
function on D with prescribed continuous values on B. This problem always
has a unique solution, and for some domains D it is given by explicit integrals.
Once you have the unique solution w to the Dirichlet problem, u = v + w is
the unique solution to the original problem.
Next we will use Fourier transforms to study the Dirichlet problem when
D is a half-plane (D is not bounded, of course, but it is the easiest domain
to study). It will be convenient to change notation here. We let t be a real variable that will always be ≥ 0, and we let $x = (x_1,\dots,x_n)$ be a variable in ℝⁿ (the cases of physical interest are n = 1, 2). We consider functions u(x,t) for $x \in \mathbb{R}^n$, $t \ge 0$ which are harmonic,

$$\left[\frac{\partial^2}{\partial t^2} + \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}\right]u = \left[\frac{\partial^2}{\partial t^2} + \Delta_x\right]u = 0,$$

and which take prescribed values on the boundary t = 0: u(x,0) = f(x). For now let us take $f \in \mathcal{S}(\mathbb{R}^n)$.
The solution is not unique, for we may always add ct, a harmonic function that vanishes on the boundary. To get a unique solution we must add a growth condition at infinity, say that u is bounded.
The method we use is to take Fourier transforms in the x-variables only (this
is sometimes called the partial Fourier transform). That is, for each fixed t ≥ 0, we regard u(x,t) as a function of x. Since it is bounded it defines a tempered distribution, and so it has a Fourier transform that we denote $\mathcal{F}_x u(\xi,t)$ (sometimes the more ambiguous notation $\hat u(\xi,t)$ is used). The differential equation

$$\frac{\partial^2}{\partial t^2}u(x,t) + \Delta_x u(x,t) = 0$$

becomes

$$\frac{\partial^2}{\partial t^2}\mathcal{F}_x u(\xi,t) - |\xi|^2\,\mathcal{F}_x u(\xi,t) = 0$$

and the boundary condition u(x,0) = f(x) becomes $\mathcal{F}_x u(\xi,0) = \hat f(\xi)$.


Now what have we gained by this? We have replaced a partial differential equation by an ordinary differential equation, since only t-derivatives are involved. And the ordinary differential equation is so simple it can be solved directly. For each fixed ξ (this is something of a cheat, since $\mathcal{F}_x u(\xi,t)$ is only a distribution, not a function of ξ; however, in this case we get the right answer in the end), the equation

$$\frac{\partial^2}{\partial t^2}\mathcal{F}_x u(\xi,t) - |\xi|^2\,\mathcal{F}_x u(\xi,t) = 0$$

has solutions $c_1 e^{t|\xi|} + c_2 e^{-t|\xi|}$ where $c_1$ and $c_2$ are constants. Since these constants can change with ξ we should write

$$\mathcal{F}_x u(\xi,t) = c_1(\xi)e^{t|\xi|} + c_2(\xi)e^{-t|\xi|}$$

for the general solution. Now we can simplify this formula by considering the fact that we want u(x,t) to be bounded. The term $c_1(\xi)e^{t|\xi|}$ is going to grow with t as t → ∞, unless $c_1(\xi) = 0$. So we are left with $\mathcal{F}_x u(\xi,t) = c_2(\xi)e^{-t|\xi|}$. From the boundary condition $\mathcal{F}_x u(\xi,0) = \hat f(\xi)$ we obtain $c_2(\xi) = \hat f(\xi)$, so $\mathcal{F}_x u(\xi,t) = \hat f(\xi)e^{-t|\xi|}$, hence

$$u(x,t) = \mathcal{F}_x^{-1}\left(\hat f(\xi)e^{-t|\xi|}\right) = \mathcal{F}_x^{-1}\left(e^{-t|\xi|}\right) * f(x).$$

Now in example (4) of 4.2 we computed

$$\mathcal{F}^{-1}\left(e^{-t|\xi|}\right) = \pi^{-\frac{n+1}{2}}\,\Gamma\!\left(\frac{n+1}{2}\right)\frac{t}{\left(t^2 + |x|^2\right)^{\frac{n+1}{2}}}$$

so

$$u(x,t) = \pi^{-\frac{n+1}{2}}\,\Gamma\!\left(\frac{n+1}{2}\right)\int_{\mathbb{R}^n}\frac{t\,f(y)}{\left(t^2 + |x-y|^2\right)^{\frac{n+1}{2}}}\,dy.$$

This is referred to as the Poisson integral formula for the half-space. The
integral is convergent as long as f is a bounded function and gives a bounded
harmonic function with boundary values f. The derivation we gave involved
some questionable steps; however, the validity of the result can be checked and
the uniqueness proved by other methods.

Exercise: Verify that u is harmonic by differentiating the integral. (Note that


the denominator is never zero since t > 0.)
The special case n = 1,

$$u(x,t) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{t\,f(y)}{t^2 + (x-y)^2}\,dy,$$

can be derived from the Poisson integral formula for the disk by conformal
mapping.
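A quick numerical sanity check of the n = 1 formula is possible. The following sketch (not from the book; the sample boundary data and all grid sizes are ad hoc choices) evaluates the half-plane Poisson integral by a Riemann sum and checks that the kernel has total integral one, that u(x,t) → f(x) as t → 0, and that u is harmonic to finite-difference accuracy.

```python
import numpy as np

def poisson_u(f, x, t, y_max=200.0, m=400001):
    # n = 1 half-plane Poisson integral:
    #   u(x,t) = (1/pi) * integral of t*f(y) / (t^2 + (x-y)^2) dy
    y = np.linspace(-y_max, y_max, m)
    dy = y[1] - y[0]
    kernel = t / (np.pi * (t**2 + (x - y)**2))
    return float(np.sum(f(y) * kernel) * dy)

f = lambda y: np.exp(-y**2)            # sample bounded boundary data

# the kernel integrates to 1, so constant data is reproduced
const = poisson_u(lambda y: np.ones_like(y), 0.0, 1.0)

# boundary values are recovered as t -> 0+
near_boundary = poisson_u(f, 0.5, 0.01)

# harmonicity: u_xx + u_tt should vanish; check by centered differences
h = 1e-3
x0, t0 = 0.3, 1.0
lap = (poisson_u(f, x0 + h, t0) + poisson_u(f, x0 - h, t0)
       + poisson_u(f, x0, t0 + h) + poisson_u(f, x0, t0 - h)
       - 4 * poisson_u(f, x0, t0)) / h**2
```

The truncation of the integral to |y| ≤ 200 accounts for the small deviation of `const` from 1.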

5.2 The heat equation


We retain the notation from the previous case: $t \ge 0$, $x \in \mathbb{R}^n$, u(x,t). In this case the heat equation is

$$\frac{\partial}{\partial t}u(x,t) = k\,\Delta_x u(x,t)$$

where k is a positive constant. You should think of t as time, t = 0 the initial


time, x a point in space (n = 1, 2, 3 are the physically interesting cases), and
u(x, t) temperature. The boundary condition u(x,O) = f(x), f given in S,
should be thought of as the initial temperature. From physical reasoning there
should be a unique solution. Actually for uniqueness we need some additional
growth condition on the solution-boundedness is more than adequate (although
it requires some work to exhibit a nonzero solution with zero initial conditions).
We can find the solution explicitly by the method of partial Fourier transform.
The differential equation becomes

$$\frac{\partial}{\partial t}\mathcal{F}_x u(\xi,t) = -k|\xi|^2\,\mathcal{F}_x u(\xi,t)$$

and the initial condition becomes $\mathcal{F}_x u(\xi,0) = \hat f(\xi)$. Solving the differential equation we have

$$\mathcal{F}_x u(\xi,t) = c(\xi)e^{-kt|\xi|^2}$$

and from the initial condition $c(\xi) = \hat f(\xi)$, so that

$$u(x,t) = \mathcal{F}^{-1}\left(e^{-kt|\xi|^2}\hat f(\xi)\right) = \mathcal{F}^{-1}\left(e^{-kt|\xi|^2}\right) * f = \frac{1}{(4\pi kt)^{n/2}}\int e^{-|x-y|^2/4kt}\,f(y)\,dy.$$
Exercise: Verify directly that this gives a solution of the heat equation.
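A numerical check of the heat-kernel formula is also easy. The sketch below (not from the book; the Gaussian data and grid parameters are ad hoc) applies the convolution formula in dimension n = 1 to $f(x) = e^{-x^2}$, for which the evolved solution is again a Gaussian, $u(x,t) = (1+4kt)^{-1/2}e^{-x^2/(1+4kt)}$, and also checks that the total heat $\int u(x,t)\,dx$ is conserved.

```python
import numpy as np

def heat_u(f, x, t, k=1.0, y_max=20.0, m=20001):
    # u(x,t) = (4*pi*k*t)^(-1/2) * integral of exp(-(x-y)^2/4kt) f(y) dy
    y = np.linspace(-y_max, y_max, m)
    dy = y[1] - y[0]
    kernel = np.exp(-(x - y)**2 / (4*k*t)) / np.sqrt(4*np.pi*k*t)
    return float(np.sum(kernel * f(y)) * dy)

f = lambda y: np.exp(-y**2)
k, t, x0 = 1.0, 0.25, 0.7

numeric = heat_u(f, x0, t, k)
exact = np.exp(-x0**2 / (1 + 4*k*t)) / np.sqrt(1 + 4*k*t)

# total heat is conserved: integral of u(.,t) equals integral of f
xs = np.linspace(-10.0, 10.0, 2001)
dx = xs[1] - xs[0]
mass_t = sum(heat_u(f, xi, t, k) for xi in xs) * dx
mass_0 = float(np.sum(f(xs)) * dx)
```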

One interesting aspect of this solution is the way it behaves with respect to
time. This is easiest to see on the Fourier transform side:

$$\mathcal{F}_x u(\xi,t) = e^{-kt|\xi|^2}\hat f(\xi)$$

decreases at infinity more rapidly as t increases. This decrease at infinity corresponds roughly to "smoothness" of u(x,t). Thus as time increases, so does the smoothness of the temperature. The other side of the coin is that if we try to reverse time we run into trouble. In other words, if we try to find the solution for negative t (corresponding to times before the initial measurement), the initial temperature f(x) must be very smooth (so that $\hat f(\xi)$ decreases so fast that $\hat f(\xi)e^{-kt|\xi|^2}$ is a tempered distribution). Even if the solution does exist for negative t, it is not given by a simple formula (the formula we derived is definitely nonsense for t < 0).
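The instability of running the heat equation backward is easy to exhibit numerically. In the sketch below (not from the book; all sizes are illustrative), the equation is solved on a periodic grid by multiplying Fourier coefficients by $e^{-kt\xi^2}$; attempting to undo this by multiplying by $e^{+kt\xi^2}$ amplifies a tiny high-frequency perturbation of the data astronomically.

```python
import numpy as np

N, k, t = 256, 1.0, 0.01
x = np.linspace(0.0, 2*np.pi, N, endpoint=False)
xi = np.fft.fftfreq(N, d=1.0/N)          # integer frequencies on the circle

forward = lambda u: np.real(np.fft.ifft(np.fft.fft(u) * np.exp(-k*t*xi**2)))
backward = lambda u: np.real(np.fft.ifft(np.fft.fft(u) * np.exp(+k*t*xi**2)))

f = np.exp(np.cos(x))                     # smooth initial temperature
u_t = forward(f)                          # forward in time: mild smoothing

noise = 1e-10 * np.cos(100 * x)           # tiny "measurement error"
recovered = backward(u_t + noise)         # try to recover f from noisy data
amplification = float(np.max(np.abs(recovered - f)))
```

Even machine rounding in `u_t` gets blown up by the factor $e^{kt\xi^2}$ at the highest grid frequencies, so `recovered` bears no resemblance to `f`.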
So far, the problems we have looked at involve solving a differential equation
on an unbounded region. Most physical problems involve bounded regions. For
the heat equation, the simplest physically realistic domain is to take n = 1 and
let x vary in a finite interval, so 0 ≤ x ≤ 1. This requires that we formulate
some sort of boundary conditions at x = 0 and x = 1. We will take periodic
boundary conditions

$$u(0,t) = u(1,t) \quad \text{for all } t,$$

which correspond to a circular piece of wire (insulated, so heat does not transfer
to the ambient space around the wire). In the problems you will encounter other
boundary conditions.
The word "periodic" is the key to the solution. We imagine the initial temperature f(x), which is defined on [0,1] (to be consistent with the periodic boundary conditions we must have f(0) = f(1)), extended to the whole line as a periodic function of x, f(x+1) = f(x) for all x, and similarly for u(x,t): u(x+1,t) = u(x,t) for all x (note that the periodic condition on u implies the boundary condition just by setting x = 0).
Now if we substitute our periodic function f in the solution for the whole
line, we obtain

$$u(x,t) = \frac{1}{(4\pi kt)^{1/2}}\int_{-\infty}^{\infty} e^{-(x-y)^2/4kt}\,f(y)\,dy = \sum_{j=-\infty}^{\infty}\frac{1}{(4\pi kt)^{1/2}}\int_0^1 e^{-(x+j-y)^2/4kt}\,f(y)\,dy$$

(break the line up into the intervals $j \le y \le j+1$ and make the change of variable $y \to y - j$ in each interval, using the periodicity of f to replace $f(y-j)$ by $f(y)$). We can write this

$$u(x,t) = \int_0^1\left(\frac{1}{(4\pi kt)^{1/2}}\sum_{j=-\infty}^{\infty} e^{-(x+j-y)^2/4kt}\right)f(y)\,dy$$

because the series converges rapidly. Observe that this formula does indeed produce a periodic function u(x,t), since the substitution $x \to x+1$ can be undone by the change of summation variable $j \to j-1$.
Perhaps you are more familiar with a solution to the same problem using
Fourier series. This, in fact, was one of the first problems that led Fourier to the
discovery of Fourier series. Since the problem has a unique solution, the two
solutions must be equal, but they are not the same. The Fourier series solution
looks like

$$u(x,t) = \sum_{n=-\infty}^{\infty} a_n\,e^{-4\pi^2 n^2 kt}\,e^{2\pi i nx},\qquad a_n = \int_0^1 f(x)\,e^{-2\pi i nx}\,dx.$$

Both solutions involve infinite series, but the one we derived has the advantage
that the terms are all positive (if f is positive).
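Since the problem has a unique solution, the periodized-kernel series and the Fourier series must agree numerically as well. The sketch below (not from the book; the data, truncation limits, and grid are ad hoc) evaluates both at one point.

```python
import numpy as np

k, t, x0 = 1.0, 0.02, 0.3
f = lambda y: np.sin(2*np.pi*y)**2        # smooth data with f(0) = f(1)

y = np.linspace(0.0, 1.0, 2001)
dy = y[1] - y[0]

# periodized Gaussian kernel, truncated to |j| <= 5
theta = sum(np.exp(-(x0 + j - y)**2 / (4*k*t)) for j in range(-5, 6))
u_kernel = float(np.sum(theta * f(y)) * dy) / np.sqrt(4*np.pi*k*t)

# Fourier series, truncated to |n| <= 20
u_series = 0.0
for n in range(-20, 21):
    a_n = np.sum(f(y[:-1]) * np.exp(-2j*np.pi*n*y[:-1])) * dy
    u_series += float((a_n * np.exp(-4*np.pi**2*n**2*k*t)
                       * np.exp(2j*np.pi*n*x0)).real)
```

Both truncations converge very fast here: the Gaussian terms with |j| > 5 and the Fourier terms with |n| > 20 are negligible at these parameter values.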

5.3 The wave equation


The equation is $(\partial^2/\partial t^2)u(x,t) = k^2\Delta_x u(x,t)$ where t is now any real number and $x \in \mathbb{R}^n$. We will consider the cases n = 1, 2, 3 only, which describe roughly vibrations of a string, a drum, and sound waves in air, respectively. The constant k is the maximum propagation speed, as we shall see shortly. The initial conditions we give are for both u and ∂u/∂t:

$$u(x,0) = f(x),\qquad \frac{\partial u}{\partial t}(x,0) = g(x).$$

As usual we take f and g in $\mathcal{S}$, although the solution we obtain allows a much more general choice. These initial conditions (called Cauchy data) determine a
unique solution without any growth conditions.
We solve by taking partial Fourier transforms. We obtain

$$\frac{\partial^2}{\partial t^2}\mathcal{F}_x u(\xi,t) = -k^2|\xi|^2\,\mathcal{F}_x u(\xi,t).$$

The general solution of the differential equation is

$$c_1(\xi)\cos kt|\xi| + c_2(\xi)\sin kt|\xi|$$

and from the initial conditions we obtain

$$\mathcal{F}_x u(\xi,t) = \hat f(\xi)\cos kt|\xi| + \hat g(\xi)\,\frac{\sin kt|\xi|}{k|\xi|}.$$

Before inverting the Fourier transform let us make some observations about the solution. First, it is clear that time is reversible: except for a minus sign in the second term, there is no difference between t and −t. So the past is determined by the present as well as the future.
Another thing we can see, although it requires work, is the conservation of energy. The energy of the solution u(x,t) at time t is defined as

$$E(t) = \frac{1}{2}\int\left(\left|\frac{\partial u}{\partial t}(x,t)\right|^2 + k^2\sum_{j=1}^n\left|\frac{\partial u}{\partial x_j}(x,t)\right|^2\right)dx.$$
The first term is kinetic energy and the second is potential energy. Conservation
of energy says that E(t) is independent of t.
To see this we express the energy in terms of $\mathcal{F}_x u(\xi,t)$. Since

$$\mathcal{F}_x\!\left(\frac{\partial u}{\partial x_j}\right)(\xi,t) = -i\xi_j\,\mathcal{F}_x u(\xi,t)$$

we have

$$E(t) = \frac{1}{2(2\pi)^n}\int\left(\left|\frac{\partial}{\partial t}\mathcal{F}_x u(\xi,t)\right|^2 + k^2|\xi|^2\left|\mathcal{F}_x u(\xi,t)\right|^2\right)d\xi$$

by the Plancherel formula. Now

$$\left|\mathcal{F}_x u(\xi,t)\right|^2 = \left(\hat f(\xi)\cos kt|\xi| + \hat g(\xi)\,\frac{\sin kt|\xi|}{k|\xi|}\right)\overline{\left(\hat f(\xi)\cos kt|\xi| + \hat g(\xi)\,\frac{\sin kt|\xi|}{k|\xi|}\right)}$$

and

$$\left|\frac{\partial}{\partial t}\mathcal{F}_x u(\xi,t)\right|^2 = \left(-k|\xi|\hat f(\xi)\sin kt|\xi| + \hat g(\xi)\cos kt|\xi|\right)\overline{\left(-k|\xi|\hat f(\xi)\sin kt|\xi| + \hat g(\xi)\cos kt|\xi|\right)}$$

so that

$$\left|\frac{\partial}{\partial t}\mathcal{F}_x u(\xi,t)\right|^2 + k^2|\xi|^2\left|\mathcal{F}_x u(\xi,t)\right|^2 = k^2|\xi|^2|\hat f(\xi)|^2 + |\hat g(\xi)|^2$$

(the cross terms cancel and the sin² + cos² terms add to one). Thus

$$E(t) = \frac{1}{2(2\pi)^n}\int\left(k^2|\xi|^2|\hat f(\xi)|^2 + |\hat g(\xi)|^2\right)d\xi = \frac{1}{2}\int\left(k^2\sum_{j=1}^n\left|\frac{\partial f}{\partial x_j}(x)\right|^2 + |g(x)|^2\right)dx,$$

independent of t.
Now to invert the Fourier transform. When n = 1 this is easy, since $\cos kt|\xi| = \frac{1}{2}\left(e^{ikt\xi} + e^{-ikt\xi}\right)$, so

$$\mathcal{F}^{-1}\left(\cos kt|\xi|\,\hat f(\xi)\right)(x) = \frac{1}{2}\bigl(f(x+kt) + f(x-kt)\bigr).$$

Similarly

$$\mathcal{F}^{-1}\left(\hat g(\xi)\,\frac{\sin kt|\xi|}{k|\xi|}\right) = \frac{1}{2k}\int_{-kt}^{kt} g(x+s)\,ds.$$
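The two inverse transforms combine into d'Alembert's formula, which can be checked directly. In the sketch below (not from the book; f, g, and the step sizes are ad hoc choices), for g(x) = sin x the integral term evaluates in closed form to $(1/k)\sin x\sin kt$, and the resulting u is tested against the wave equation and the initial conditions by finite differences.

```python
import numpy as np

k = 2.0
f = lambda x: np.exp(-x**2)
g = lambda x: np.sin(x)

def u(x, t):
    # d'Alembert: (f(x+kt)+f(x-kt))/2 + (1/2k) * int_{-kt}^{kt} g(x+s) ds,
    # with the integral done in closed form for g = sin
    return 0.5 * (f(x + k*t) + f(x - k*t)) + np.sin(x) * np.sin(k*t) / k

h = 1e-4
x0, t0 = 0.4, 0.7

u_tt = (u(x0, t0 + h) - 2*u(x0, t0) + u(x0, t0 - h)) / h**2
u_xx = (u(x0 + h, t0) - 2*u(x0, t0) + u(x0 - h, t0)) / h**2
wave_residual = u_tt - k**2 * u_xx                     # should be ~ 0

initial_pos = u(x0, 0.0) - f(x0)                       # should be 0
initial_vel = (u(x0, h) - u(x0, -h)) / (2*h) - g(x0)   # should be ~ 0
```

Note that the velocity check evaluates u at a negative time, which is legitimate here: as observed above, the formula is valid for all real t.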
When n = 3 the answer is given in terms of surface integrals over spheres. Let σ denote the distribution

$$(\sigma,\varphi) = \int_{|x|=1}\varphi(x)\,d\sigma(x)$$

where dσ(x) is the element of surface integration on the unit sphere. In terms of spherical coordinates $(x,y,z) = (\cos\theta_1,\ \sin\theta_1\cos\theta_2,\ \sin\theta_1\sin\theta_2)$ for the sphere, with $0 \le \theta_1 \le \pi$ and $0 \le \theta_2 \le 2\pi$, this is just

$$(\sigma,\varphi) = \int_0^{2\pi}\!\!\int_0^{\pi}\varphi(\cos\theta_1,\ \sin\theta_1\cos\theta_2,\ \sin\theta_1\sin\theta_2)\,\sin\theta_1\,d\theta_1\,d\theta_2.$$

Now to compute $\hat\sigma(\xi)$ we need to evaluate this integral when $\varphi(x) = e^{ix\cdot\xi}$. To make the computation easier we use the observation (from problem 3.8) that $\hat\sigma$ is radial, so it suffices to compute $\hat\sigma(\xi_1,0,0)$, for then $\hat\sigma(\xi_1,\xi_2,\xi_3) = \hat\sigma(|\xi|,0,0)$. But

$$(\sigma, e^{ix_1|\xi|}) = \int_0^{2\pi}\!\!\int_0^{\pi} e^{i|\xi|\cos\theta_1}\sin\theta_1\,d\theta_1\,d\theta_2 = 2\pi\int_0^{\pi} e^{i|\xi|\cos\theta_1}\sin\theta_1\,d\theta_1 = \left.\frac{-2\pi\,e^{i|\xi|\cos\theta_1}}{i|\xi|}\right|_0^{\pi} = \frac{4\pi\sin|\xi|}{|\xi|}$$

and so $\hat\sigma(\xi) = 4\pi\sin|\xi|/|\xi|$. Similarly, if $\sigma_r$ denotes the surface integral over the sphere |x| = r of radius r, then $\hat\sigma_r(\xi) = 4\pi r\sin r|\xi|/|\xi|$ and so

$$\mathcal{F}\left(\frac{1}{4\pi k^2 t}\,\sigma_{kt}\right) = \frac{\sin kt|\xi|}{k|\xi|}.$$

Thus

$$\mathcal{F}^{-1}\left(\hat g(\xi)\,\frac{\sin kt|\xi|}{k|\xi|}\right) = \frac{1}{4\pi k^2 t}\,\sigma_{kt} * g(x).$$

Furthermore, if we differentiate this identity with respect to t we find

$$\mathcal{F}^{-1}\left(\hat f(\xi)\cos kt|\xi|\right) = \frac{\partial}{\partial t}\left(\frac{1}{4\pi k^2 t}\,\sigma_{kt} * f(x)\right)$$

(renaming $\hat g$ as $\hat f$). Thus the solution to the wave equation in n = 3 dimensions is simply

$$u(x,t) = \frac{\partial}{\partial t}\left(\frac{1}{4\pi k^2 t}\,\sigma_{kt} * f(x)\right) + \frac{1}{4\pi k^2 t}\,\sigma_{kt} * g(x).$$
The convolution can be written directly as

$$\sigma_{kt} * f(x) = \int_{|y|=kt} f(x+y)\,d\sigma_{kt}(y),$$

or it can be expressed in terms of integration over the unit sphere

$$\sigma_{kt} * f(x) = (kt)^2\int_{|y|=1} f(x+kty)\,d\sigma(y) = (kt)^2\int_0^{2\pi}\!\!\int_0^{\pi} f(x_1 + kt\cos\theta_1,\ x_2 + kt\sin\theta_1\cos\theta_2,\ x_3 + kt\sin\theta_1\sin\theta_2)\,\sin\theta_1\,d\theta_1\,d\theta_2.$$
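The computation of $\hat\sigma$ above can be confirmed numerically for a ξ off the $x_1$-axis, which also checks the claim that $\hat\sigma$ is radial. The sketch below (not from the book; the particular ξ and grid sizes are arbitrary) evaluates the surface integral of $e^{ix\cdot\xi}$ in the spherical coordinates used above by the midpoint rule.

```python
import numpy as np

xi = np.array([0.9, -1.3, 0.4])           # a generic direction
r = float(np.linalg.norm(xi))

m1, m2 = 400, 400
t1 = (np.arange(m1) + 0.5) * np.pi / m1    # theta_1 midpoints in (0, pi)
t2 = (np.arange(m2) + 0.5) * 2*np.pi / m2  # theta_2 midpoints in (0, 2pi)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")

# points on the unit sphere in the parametrization of the text
x1 = np.cos(T1)
x2 = np.sin(T1) * np.cos(T2)
x3 = np.sin(T1) * np.sin(T2)

integrand = np.exp(1j * (x1*xi[0] + x2*xi[1] + x3*xi[2])) * np.sin(T1)
sigma_hat = np.sum(integrand) * (np.pi/m1) * (2*np.pi/m2)

exact = 4*np.pi * np.sin(r) / r            # 4*pi*sin(|xi|)/|xi|
```

The imaginary part vanishes (up to quadrature error) because σ is symmetric under $x \to -x$.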


When n = 2 the solution to the wave equation is easiest to obtain by the so-called "method of descent". We take our initial position and velocity $f(x_1,x_2)$ and $g(x_1,x_2)$ and pretend they are functions of three variables $(x_1,x_2,x_3)$ that happen to be independent of the third variable $x_3$. Fair enough. We then solve the 3-dimensional wave equation for this initial data. The solution will also be independent of the third variable and will be the solution of the original 2-dimensional problem. This gives us explicitly

$$u(x,t) = \frac{\partial}{\partial t}\left(\frac{t}{4\pi}\int_0^{2\pi}\!\!\int_0^{\pi} f(x_1 + kt\cos\theta_1,\ x_2 + kt\sin\theta_1\cos\theta_2)\,\sin\theta_1\,d\theta_1\,d\theta_2\right) + \frac{t}{4\pi}\int_0^{2\pi}\!\!\int_0^{\pi} g(x_1 + kt\cos\theta_1,\ x_2 + kt\sin\theta_1\cos\theta_2)\,\sin\theta_1\,d\theta_1\,d\theta_2.$$

There is another way to express this solution. The pair of variables $(\cos\theta_1, \sin\theta_1\cos\theta_2)$ describes the unit disk $y_1^2 + y_2^2 \le 1$ in a two-to-one fashion (two different values of $\theta_2$ give the same value to $\cos\theta_2$) as $(\theta_1,\theta_2)$ vary over $0 \le \theta_1 \le \pi$, $0 \le \theta_2 \le 2\pi$. Thus if we make the substitution $(y_1,y_2) = (\cos\theta_1, \sin\theta_1\cos\theta_2)$ then $dy_1\,dy_2 = \sin^2\theta_1|\sin\theta_2|\,d\theta_1\,d\theta_2$ and $\sin\theta_1|\sin\theta_2| = \sqrt{1-|y|^2}$, so

$$u(x,t) = \frac{\partial}{\partial t}\left(\frac{t}{2\pi}\int_{|y|\le 1}\frac{f(x+kty)}{\sqrt{1-|y|^2}}\,dy\right) + \frac{t}{2\pi}\int_{|y|\le 1}\frac{g(x+kty)}{\sqrt{1-|y|^2}}\,dy.$$

Note that these are improper integrals because $(1-|y|^2)^{-1/2}$ becomes infinite as $|y| \to 1$, but they are absolutely convergent. Still another way to write the integrals is to introduce polar coordinates:

$$u(x,t) = \frac{\partial}{\partial t}\left(\frac{t}{2\pi}\int_0^{2\pi}\!\!\int_0^1 f(x_1 + ktr\cos\theta,\ x_2 + ktr\sin\theta)\,\frac{r\,dr\,d\theta}{\sqrt{1-r^2}}\right) + \frac{t}{2\pi}\int_0^{2\pi}\!\!\int_0^1 g(x_1 + ktr\cos\theta,\ x_2 + ktr\sin\theta)\,\frac{r\,dr\,d\theta}{\sqrt{1-r^2}}.$$

The convergence of the integral is due to the fact that $\int_0^1 r\,dr/\sqrt{1-r^2}$ is finite.
There are several astounding qualitative facts that we can deduce from these elegant quantitative formulas. The first is that k is the maximum speed of propagation of signals. Suppose we make a "noise" located near a point y at time t = 0. Can this noise be "heard" at a point x at a later time t? Certainly not if the distance |x − y| from x to y exceeds kt, for the contribution to u(x,t) from f(y) and g(y) is zero until kt ≥ |x − y|. This is true in all dimensions, and it is a direct consequence of the fact that u(x,t) is expressed as a sum of convolutions of f and g with distributions that vanish outside the ball of radius kt about the origin. (Compare this with the heat equation, where the "speed of smell" is infinite!) Also, of course, there is nothing special about starting at t = 0. The finite speed of sound and light are well-known physical phenomena
(light is governed by a system of equations, called Maxwell's equations, but each
component of the system satisfies the wave equation). But something special
happens when n = 3 (it also happens when n is odd, n ≥ 3). After the noise
is heard, it moves away and leaves no reverberation (physical reverberations
of sound are due to reflections off walls, ground, and objects). This is called
Huyghens' principle and is due to the fact that distributions we convolve f and
9 with also vanish inside the ball of radius kt. Another way of saying this
is that signals propagate at exactly speed k. In particular, if f and 9 vanish

outside a ball of radius R, then after a time f(R + lxi), there will be a total
silence at point x. This is clearly not the case when n = 1,2 (when n = 1
it is true for the initial position f, but not the initial velocity g). This can be
thought of as a ripple effect: after the noise reaches point x, smaller ripples
continue to be heard. A physical model of this phenomenon is the ripples you
see on the surface of a pond, but this is in fact a rather unfair example, since
the differential equations that govern the vibrations on the surface of water are
nonlinear and therefore quite different from the linear wave equation we have
been studying. In particular, the rippling is much more pronounced than it is
for the 2-dimensional wave equation.
There is a weak form of Huyghens' principle that does hold in all dimensions: the singularities of the signal propagate at exactly speed k. This shows up in the convolution form of the solution when n = 2 in the smoothness of $(1-|y|^2)^{-1/2}$ everywhere except on the surface of the sphere |y| = 1.
Another interesting property is the focusing of singularities, which shows up
most strikingly when n = 3. Since the solution involves averaging over a sphere,
we can have relatively mild singularities in the initial data over the whole sphere
produce a sharp singularity at the center when they all arrive simultaneously.
Assume the initial data is radial: f(x) and g(x) depend only on |x| (we write f(x) = f(|x|), g(x) = g(|x|)). Then $u(0,t) = \frac{\partial}{\partial t}\bigl(tf(kt)\bigr) + tg(kt)$ since

$$\frac{1}{4\pi k^2 t}\int_{|y|=kt} f(y)\,d\sigma_{kt}(y) = \frac{1}{4\pi k^2 t}\cdot 4\pi(kt)^2 f(kt) = t\,f(kt),$$

etc.
It is the appearance of the derivative that can make u(0,t) much worse than f or g. For instance, take g(x) = 0 and

$$f(x) = \begin{cases} (1-|x|)^{1/2} & \text{if } |x| \le 1\\ 0 & \text{if } |x| > 1.\end{cases}$$

Then f is continuous, but not differentiable. But $u(0,t) = f(kt) + kt\,f'(kt)$ tends to infinity as t → 1/k, which is the instant when all the singularities are focused.
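This blow-up is concrete enough to tabulate. A small sketch (not from the book): with $f(x) = (1-|x|)^{1/2}$ for |x| ≤ 1 and g = 0, we have $f'(r) = -\frac{1}{2}(1-r)^{-1/2}$, so $u(0,t) = \sqrt{1-kt} - kt/(2\sqrt{1-kt})$, which is finite for t < 1/k but unbounded as t → 1/k, even though |f| ≤ 1 everywhere.

```python
import numpy as np

k = 1.0

def u0(t):
    # u(0,t) = f(kt) + k*t*f'(kt) for f(r) = sqrt(1-r), valid for t < 1/k
    return np.sqrt(1 - k*t) - k*t / (2*np.sqrt(1 - k*t))

t = np.array([0.0, 0.5, 0.9, 0.99, 0.9999])
values = u0(t)    # decreases without bound as t -> 1/k
```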

5.4 Schrödinger's equation and quantum mechanics


The quantum theory of a single particle is described by a complex-valued "wave function" φ(x) defined on ℝ³. The only restriction on φ is that $\int_{\mathbb{R}^3}|\varphi(x)|^2\,dx$ be finite. Since φ and any constant multiple of φ describe the same physical state, it is convenient, but not necessary, to normalize this integral to be one.

The wave function changes with time. If u(x,t) is the wave function at time t then

$$\frac{\partial}{\partial t}u(x,t) = ik\,\Delta_x u(x,t).$$

This is the free Schrödinger equation. There are additional terms if there is a potential or other physical interaction present. The constant k is related to the mass of the particle and Planck's constant.

The free Schrödinger equation is easily solved with initial condition u(x,0) = φ(x). We have $(\partial/\partial t)\mathcal{F}_x u(\xi,t) = ik|\xi|^2\,\mathcal{F}_x u(\xi,t)$ and $\mathcal{F}_x u(\xi,0) = \hat\varphi(\xi)$, so that

$$\mathcal{F}_x u(\xi,t) = e^{ikt|\xi|^2}\hat\varphi(\xi).$$

Referring to example (3) of 4.2 we find

$$u(x,t) = \left(\frac{1}{4\pi k|t|}\right)^{3/2} e^{\pm\frac{3}{4}\pi i}\int_{\mathbb{R}^3} e^{-i|x-y|^2/4kt}\,\varphi(y)\,dy$$

where the ± sign is the sign of t (of course the factor $e^{\pm\frac{3}{4}\pi i}$ has no physical significance, by our previous remarks).
Actually the expression for $\mathcal{F}_x u$ is more useful. Notice that $|\mathcal{F}_x u(\xi,t)| = |\hat\varphi(\xi)|$ is independent of t. Thus

$$\int_{\mathbb{R}^3}|u(x,t)|^2\,dx = \frac{1}{(2\pi)^3}\int_{\mathbb{R}^3}|\mathcal{F}_x u(\xi,t)|^2\,d\xi$$

is independent of t, so once the wave function is normalized at t = 0 it remains normalized.
The interpretation of the wave function is somewhat controversial, but the standard description is as follows: there is an imperfect coupling between physical measurement and the wave function, so that a position measurement of a particle with wave function φ will not always produce the same answer. Instead we have only a probabilistic prediction: the probability that the position vector will measure in a set A ⊆ ℝ³ is

$$\frac{\int_A |\varphi(x)|^2\,dx}{\int_{\mathbb{R}^3}|\varphi(x)|^2\,dx}.$$

We have a similar statement for measurements of momentum. If we choose units appropriately, the probability that the momentum vector will measure in a set B ⊆ ℝ³ is

$$\frac{\int_B |\hat\varphi(\xi)|^2\,d\xi}{\int_{\mathbb{R}^3}|\hat\varphi(\xi)|^2\,d\xi}.$$

Note that $\int_{\mathbb{R}^3}|\hat\varphi(\xi)|^2\,d\xi = (2\pi)^3\int_{\mathbb{R}^3}|\varphi(x)|^2\,dx$ so that the denominator is always finite.
Now what happens as time changes? The position probabilities change in a very complicated way, but $|\mathcal{F}_x u(\xi,t)| = |\hat\varphi(\xi)|$, so the momentum probabilities remain the same. This is the quantum mechanical analog of conservation of momentum.
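Norm conservation (and the spreading of the position probabilities) can be seen in a quick grid computation. The sketch below (not from the book; the 1-D periodic grid and Gaussian data are ad hoc, and the sign of the unit-modulus phase $e^{\mp ikt\xi^2}$ depends on the Fourier transform convention, which does not affect |u|) evolves the free Schrödinger equation by multiplying Fourier modes.

```python
import numpy as np

N, L, k, t = 512, 40.0, 0.5, 3.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
dx = L / N
xi = 2*np.pi * np.fft.fftfreq(N, d=dx)

phi = np.exp(-x**2).astype(complex)        # initial wave function
u_t = np.fft.ifft(np.fft.fft(phi) * np.exp(-1j*k*t*xi**2))

norm0 = float(np.sum(np.abs(phi)**2) * dx)
norm_t = float(np.sum(np.abs(u_t)**2) * dx)

# position probabilities change: the wave packet spreads out
spread0 = float(np.sum(x**2 * np.abs(phi)**2) * dx) / norm0
spread_t = float(np.sum(x**2 * np.abs(u_t)**2) * dx) / norm_t
```

The multiplier has modulus one, so the normalization is conserved exactly (up to rounding), while the position variance grows substantially.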

5.5 Problems
1. For the Laplace and the heat equation in the half-space prove via the Plancherel formula that

$$\int_{\mathbb{R}^n}|u(x,t)|^2\,dx \le \int_{\mathbb{R}^n}|u(x,0)|^2\,dx, \qquad t > 0.$$

What is the limit of this integral as t → 0 and as t → ∞?
2. For the same equations show $|u(x,t)| \le \sup_{y\in\mathbb{R}^n}|u(y,0)|$. (Hint: Write $u = G_t * f$ and observe that $G_t(x) \ge 0$. Then use the Fourier inversion formula to compute $\int_{\mathbb{R}^n} G_t(x)\,dx$ and estimate $|u(x,t)| \le \left[\int_{\mathbb{R}^n} G_t(y)\,dy\right]\sup_{y\in\mathbb{R}^n}|f(y)|$.)
3. Solve

$$\left[\frac{\partial^2}{\partial t^2} + \Delta_x\right]^2 u(x,t) = 0$$

for $t \ge 0$, $x \in \mathbb{R}^n$, given

$$u(x,0) = f(x), \qquad \frac{\partial}{\partial t}u(x,0) = g(x),$$

$f, g \in \mathcal{S}$, with $|u(x,t)| \le c(1+|t|)$. Hint: Show

$$\mathcal{F}_x u(\xi,t) = te^{-t|\xi|}\hat g(\xi) + \left(e^{-t|\xi|} + t|\xi|e^{-t|\xi|}\right)\hat f(\xi).$$

To invert note that $\mathcal{F}_x^{-1}\left(|\xi|e^{-t|\xi|}\right) = -\frac{\partial}{\partial t}\mathcal{F}_x^{-1}\left(e^{-t|\xi|}\right)$.
4. Solve $(\partial/\partial t)u(x,t) = k\Delta_x u + u$ for $t \ge 0$, $x \in \mathbb{R}^n$, with $u(x,0) = f(x) \in \mathcal{S}$.
5. Solve $(\partial^2/\partial t^2)u(x,t) + \Delta_x u(x,t) = 0$ for $0 \le t \le T$, $x \in \mathbb{R}^n$, with $u(x,0) = f(x)$, $u(x,T) = g(x)$, for $\mathcal{F}_x u(\xi,t)$ (do not attempt to invert the Fourier transform).
6. Solve the free Schrödinger equation with initial wave function $\varphi(x) = e^{-|x|^2}$.
7. In two dimensions show that the Laplacian factors as

$$\Delta = \left(\frac{\partial}{\partial x_1} + i\frac{\partial}{\partial x_2}\right)\left(\frac{\partial}{\partial x_1} - i\frac{\partial}{\partial x_2}\right)$$

and the factors commute. Deduce from this that an analytic function is harmonic.
8. Let f be a real-valued function in $\mathcal{S}(\mathbb{R}^1)$ and define g by $\hat g(\xi) = -i(\operatorname{sgn}\xi)\hat f(\xi)$. Show that g is real-valued, and that if u(x,t) and v(x,t) are the harmonic functions in the half-plane with boundary values f and g then they are conjugate harmonic functions: u + iv is analytic in z = x + it. (Hint: Verify the Cauchy-Riemann equations.) Find an expression for v in terms of f. (Hint: Evaluate $\mathcal{F}^{-1}\left(-i\operatorname{sgn}\xi\,e^{-t|\xi|}\right)$ directly.)
9. Show that a solution to the heat equation (or wave equation) that is inde-
pendent of time (stationary) is a harmonic function of the space variables.
10. Solve the initial value problem for the Klein-Gordon equation

$$\frac{\partial^2 u}{\partial t^2} = \Delta_x u - m^2 u, \qquad m > 0,$$

$$u(x,0) = f(x), \qquad \frac{\partial u}{\partial t}(x,0) = g(x),$$

for $\mathcal{F}_x u(\xi,t)$ (do not attempt to invert the Fourier transform).
11. Show that the energy

$$E(t) = \frac{1}{2}\int_{\mathbb{R}^n}\left(\left|\frac{\partial u}{\partial t}\right|^2 + \sum_{j=1}^n\left|\frac{\partial u}{\partial x_j}\right|^2 + m^2|u|^2\right)dx$$

is conserved for solutions of the Klein-Gordon equation.
12. Solve the heat equation on the interval [0,1] with Dirichlet boundary conditions

$$u(0,t) = 0, \qquad u(1,t) = 0.$$

(Hint: Extend all functions by odd reflection about the boundary points x = 0 and x = 1, and periodicity with period 2.)
13. Do the same as problem 12 for Neumann boundary conditions

$$\frac{\partial}{\partial x}u(0,t) = 0, \qquad \frac{\partial}{\partial x}u(1,t) = 0.$$

(Hint: This time use even reflections.)


14. Show that the inhomogeneous heat equation with homogeneous initial conditions

$$\frac{\partial}{\partial t}u(x,t) = k\Delta_x u(x,t) + F(x,t), \qquad u(x,0) = 0,$$

is solved by Duhamel's integral

$$u(x,t) = \int_0^t\left(\int_{\mathbb{R}^n} G_s(y)\,F(x-y,\,t-s)\,dy\right)ds$$

where

$$G_s(y) = \frac{1}{(4\pi ks)^{n/2}}\,e^{-|y|^2/4ks}$$

is the solution kernel for the homogeneous heat equation. Use this to solve the fully inhomogeneous problem

$$\frac{\partial}{\partial t}u(x,t) = k\Delta_x u(x,t) + F(x,t), \qquad u(x,0) = f(x).$$

15. Show that the inhomogeneous wave equation on $\mathbb{R}^n$ with homogeneous initial data

$$\frac{\partial^2}{\partial t^2}u(x,t) = k^2\Delta_x u(x,t) + F(x,t), \qquad u(x,0) = 0, \quad \frac{\partial}{\partial t}u(x,0) = 0,$$

is solved by Duhamel's integral

$$u(x,t) = \int_0^t \bigl(H_s * F(\cdot,\,t-s)\bigr)(x)\,ds,$$

where

$$H_s = \mathcal{F}^{-1}\left(\frac{\sin ks|\xi|}{k|\xi|}\right).$$

Show this is valid for negative t as well. Use this to solve the inhomogeneous wave equation with inhomogeneous initial data.
16. Interpret the solution in problem 15 in terms of finite propagation speed
and Huyghens' principle (n = 3) for the influence of the inhomogeneous
term F(x, t).
17. Show that if the initial temperature is a radial function then the temperature
at all later times is radial.
18. Maxwell's equations in a vacuum can be written

$$\frac{1}{c}\frac{\partial}{\partial t}E(x,t) = \operatorname{curl} H(x,t), \qquad \frac{1}{c}\frac{\partial}{\partial t}H(x,t) = -\operatorname{curl} E(x,t),$$

where the electric and magnetic fields E and H are vector-valued functions on ℝ³. Show that each component of these fields satisfies the wave equation with speed of propagation c.
19. Let u(x,t) be a solution of the free Schrödinger equation with initial wave function φ satisfying $\int_{\mathbb{R}^3}|\varphi(x)|\,dx < \infty$. Show that $|u(x,t)| \le ct^{-3/2}$ for some constant c. What does this tell you about the probability of finding a free particle in a bounded region of space as time goes by?
The Structure of Distributions

6.1 The support of a distribution


The idea of the support of an object (function, distribution, etc.) is the set on which it does something nontrivial. Roughly speaking, it is the complement of the set on which the object is zero, but this is not exactly correct. Consider, for example, the function f(x) = x on the line. This function is zero at x = 0, and so at first we might be tempted to say that the support is ℝ¹ minus the origin. But this function is not completely dead at x = 0. It vanishes there, but not at nearby points. For a function to be truly boring at a point, it should vanish in a whole neighborhood of the point. Therefore, we define the support of a function f (written supp(f)) to be the complement of the set of points x such that f vanishes in a neighborhood of x. In our example, the support is the whole line. By the nature of the definition, the support is always a closed set, since it is the complement of an open set. For a continuous function, the support of f is just the closure of the set where f is different from zero. It follows that supp(f) is a compact set (recall that the compact sets in ℝⁿ are characterized by two conditions: they must be closed and bounded) if and only if f vanishes outside a bounded set. Such functions are said to have compact support. The test functions in 𝒟(ℝⁿ) all have compact support. Thus we can describe the class of test functions succinctly as the C^∞ functions of compact support (the same is true if we consider test functions on an open set Ω ⊂ ℝⁿ).
Now we would like to make the same definition for distributions. We begin by defining what it means for a distribution to be identically zero in a neighborhood of a point. Since a neighborhood of a point just means an open set containing the point, it is the same thing to define T ≡ 0 on Ω for an open set Ω.

Definition 6.1.1
T ≡ 0 on Ω means (T,φ) = 0 for all test functions φ with supp φ ⊆ Ω. Then supp T is the complement of the set of points x such that T vanishes in a neighborhood of x.

Intuitively, if we cannot get a rise out of T by doing anything on Ω, then T is dead on Ω. However, it is important to understand what the definition does not say: since the support of φ is a closed set and Ω is an open set, the statement supp φ ⊆ Ω is stronger than saying φ vanishes at every point not in Ω; it says that φ vanishes in a neighborhood of every point not in Ω. For example, if Ω = (0,1) in ℝ¹ then φ must vanish outside (ε, 1−ε) for some ε > 0 in order that supp φ ⊆ Ω.

To understand the definition, we need to look at some examples. Consider first the δ-function. Intuitively, the support of this distribution should be the origin. Indeed, if Ω is any open set not containing the origin, then δ ≡ 0 on Ω, because (δ,φ) = φ(0) = 0 if supp φ ⊆ Ω. On the other hand, we certainly do not have δ ≡ 0 on a neighborhood of the origin, since every such neighborhood supports test functions with φ(0) = 1. Thus supp δ = {0} as expected.

What is the support of δ′ (n = 1)? Suppose Ω is an open set not containing the origin, and supp φ ⊆ Ω. Then φ vanishes in a neighborhood of the origin (not just at 0), and so (δ′,φ) = −φ′(0) = 0. (In other words, φ(0) = 0 does not imply φ′(0) = 0, but φ(x) = 0 for all x in a neighborhood of 0 does imply φ′(0) = 0.) It is also easy to see that δ′ does not vanish in any neighborhood of 0 (just construct φ supported in the neighborhood with φ′(0) = 1). Thus supp δ′ = {0}. Intuitively, both δ and δ′ have the same support, but δ′ "hangs out," at least infinitesimally, beyond its support. In other words, (δ′,φ) depends not just on the values of φ on the support {0}, but also on the values of the derivatives of φ on the support. This is true in general: the support of T is the smallest closed set E such that (T,φ) depends only on the values of φ and all its derivatives on E.
We can already use this concept to explain the finite propagation speed and Huyghens' principle for the wave equation discussed in section 5.3. The finite propagation speed says that the distributions $\mathcal{F}^{-1}(\cos kt|\xi|)$ and $\mathcal{F}^{-1}(\sin kt|\xi|/k|\xi|)$ are supported in the ball |x| ≤ kt (we say "supported in" when the support is known to be a subset of the given set), and Huyghens' principle says that the support is actually the sphere |x| = kt. Since supports add under convolution,

$$\operatorname{supp}(T * f) \subseteq \operatorname{supp} T + \operatorname{supp} f$$

where the sum set A + B is defined to be the set of all sums a + b with a ∈ A and b ∈ B.

Exercise: Verify this intuitively from the formula

$$T * f(x) = \int T(x-y)f(y)\,dy.$$
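A discrete analogue of the support-addition rule is easy to test. The sketch below (not from the book; the sequences are arbitrary) convolves two finitely supported sequences and checks that the nonzero entries of the result lie in the sum of the index ranges.

```python
import numpy as np

T = np.zeros(50)
T[10:15] = [1.0, -2.0, 0.5, 3.0, 1.0]      # "support" = indices 10..14

f = np.zeros(50)
f[20:24] = [2.0, 1.0, -1.0, 4.0]           # "support" = indices 20..23

conv = np.convolve(T, f)                   # (T*f)[n] = sum_m T[m] f[n-m]
nz = np.nonzero(conv)[0]
lo, hi = int(nz.min()), int(nz.max())      # should lie in [10+20, 14+23]
```

The containment can be strict in general (cancellation can kill interior entries), but the endpoint entries here are single products and so are nonzero.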



The properties of the solution

$$u(x,t) = \mathcal{F}^{-1}(\cos kt|\xi|) * f + \mathcal{F}^{-1}\left(\frac{\sin kt|\xi|}{k|\xi|}\right) * g$$

follow from these support statements.
As a more mundane example of the support of a distribution, consider the distribution $(T,\varphi) = \int f(x)\varphi(x)\,dx$ associated to a continuous function f. If the definition is to be consistent, supp(T) should equal the support of f as a function. This is indeed the case, and the containment supp(T) ⊆ supp(f) is easy to see from the contrapositive: if f vanishes in an open set Ω, then T vanishes there as well. Thus the complement of supp(f) is contained in the complement of supp(T). Conversely, if T vanishes on an open set Ω, then by considering test functions approximating a δ-function δ(x−y) for y ∈ Ω we can conclude that f vanishes at y.

A more interesting question arises if we drop the assumption that f be continuous. We still expect supp(T) to be equal to supp(f), but what do we mean by supp(f)? To understand why we have to be careful here, we need to look at a peculiar example. Let f be a function on the line that is zero for every x except x = 0, and let f(0) = 1. By our previous definition for continuous functions, the support of f should be the origin. But the distribution associated with f is the zero distribution, which has support equal to the empty set.
Of course this seems like a stupid example, since the same distribution could be given by the zero function, which would yield the correct support. Rather than invent some rule to bar such examples (actually, you won't succeed if you try!) it is better to reconsider what we mean by "f vanishes on Ω" if f is not necessarily continuous. What we want is that the integral of f should be zero on Ω, but not because of cancellation of positive and negative parts. That means we want

$$\int_\Omega |f(x)|\,dx = 0$$

as the condition for f to vanish on Ω. If this is so then

$$\int_\Omega f(x)\varphi(x)\,dx = 0$$

for every test function φ with support in Ω (in fact this is true for any function φ). This should seem plausible, and it follows from the inequalities

$$\left|\int_\Omega f(x)\varphi(x)\,dx\right| \le \int_\Omega |f(x)||\varphi(x)|\,dx \le M\int_\Omega |f(x)|\,dx = 0$$

where M is the maximum value of Icp(x)1 (if cp is unbounded this requires a


more complicated proof, but in the case at hand cp is a test function and so is
bounded). So we have f == 0 on n implies T == 0 on n, and once again supp
(T) = supp (I) if we define supp (I) as before (x is not in supp (I) if and only
if f vanishes in a neighborhood of x.)
The distributions on R^n with compact support form an important class of distributions. Since supp (T) is always closed, to say it is compact is the same thing as saying it is bounded. (For distributions on an open set Ω, compact support means that the support also stays away from the boundary of Ω.) The distributions of compact support are denoted E'(R^n) (or E', for short), and this notation should make you suspect that they can also be considered as continuous linear functionals on some space of test functions named E. In fact it turns out that E is just the space of all C^∞ functions, with no restriction on the growth at infinity. What we are claiming is the following:

1. if T is a distribution of compact support, then (T, φ) makes sense for every C^∞ function φ; and
2. if T is a distribution, and if (T, φ) makes sense for every C^∞ function φ, then T must have compact support.

It is easy to explain why (1) is true. Suppose the support of T is contained in the ball |x| ≤ R (this will be true for some R since the support is bounded). Choose a function ψ ∈ D that is identically one on some larger ball. If φ is C^∞ then φψ is in D and φψ = φ on a neighborhood of the support of T. Thus it makes sense to define

(T, φ) = (T, ψφ)

since T doesn't care what happens outside a neighborhood of its support. It is easy to see that the definition does not depend on the choice of ψ (a different choice ψ' would yield (T, ψ'φ) = (T, ψφ) + (T, (ψ' − ψ)φ) and the last term is zero because (ψ' − ψ)φ vanishes in a neighborhood of the support of T). We thus have associated a linear functional on E to each distribution of compact support, and it can be shown to be continuous in the required sense. The converse (2) is not as intuitive and requires the correct notion of continuity (see section 6.5). The idea is that if T did not have compact support we could construct a C^∞ function φ for which (T, φ) would have to be infinite, because no matter how quickly T tries to go to zero as x → ∞, we can construct φ to grow even more rapidly as x → ∞.
We can now consider all three spaces of test functions and distributions as a
kind of hierarchy. For the test functions we have the containments

D ⊆ S ⊆ E

(not alphabetical order, alas, even in French). For the distributions the containments are reversed:

E' ⊆ S' ⊆ D'.

In particular, since distributions of compact support are tempered, the Fourier transform theory applies to them. We will have a lot more to say about this in section 7.2.

6.2 Structure theorems


We have seen many distributions so far, but we have not said much about what
the most general distribution looks like. We know that there are distributions
that come from functions, and it is standard abuse of notation to say that they are
the functions they come from. Since the space of distributions is closed under
differentiation, there are all the derivatives of functions. It is almost true to say
"that's all." If we view distribution theory as the completion of the differential
calculus, then this would say that it is a minimal completion ... we have not
added anything we were not absolutely compelled to add.
But why did I say "almost" true? A more precise statement is that it is
"locally" true. If T is a distribution of compact support, or even a tempered
distribution, then it is true that T can be written as a finite sum of derivatives
of functions. If T is only a distribution, then T can be written as a possibly
infinite sum of derivatives of functions that is locally finite, meaning that for a
fixed test function 'P, only a finite number of terms are nonzero. For example,
there is
T = Σ_{n=1}^{∞} δ^{(n)}(x − n),

that means

(T, φ) = Σ_{n=1}^{∞} φ^{(n)}(n).

For each fixed φ, all but a finite number of terms will be zero. Of course we have not yet written T as a sum of derivatives of functions, but recall that δ = H' where H is the Heaviside function

H(x) = 1 if x > 0, and H(x) = 0 if x ≤ 0,

so
T = Σ_{n=1}^{∞} H^{(n+1)}(x − n)

does the job. In fact, we can go a step further and make the functions continuous, if we note that H = χ' where

χ(x) = x if x > 0, and χ(x) = 0 if x ≤ 0.

Thus we can also write

T = Σ_{n=1}^{∞} χ^{(n+2)}(x − n).
This is typical of what is true in general.
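As a quick sanity check on the identity δ = H', one can compute ⟨H', φ⟩ = −∫ H(x)φ'(x) dx numerically. In this sketch the choice φ(x) = e^{−x²} (a convenient rapidly decaying stand-in for a test function) and the trapezoid rule are arbitrary conveniences, not anything forced by the theory; the pairing should come out to φ(0) = 1.

```python
import math

def phi(x):
    # a rapidly decaying smooth function standing in for a test function
    return math.exp(-x * x)

def phi_prime(x):
    return -2.0 * x * math.exp(-x * x)

def pair_H_prime(R=10.0, n=100_000):
    """<H', phi> = -integral of H(x) phi'(x) dx = -integral_0^R phi'(x) dx."""
    h = R / n
    total = 0.5 * (phi_prime(0.0) + phi_prime(R))
    for i in range(1, n):
        total += phi_prime(i * h)
    return -h * total

value = pair_H_prime()
print(value, phi(0.0))  # the two numbers should agree closely
```

By the fundamental theorem of calculus the exact answer is φ(0) − φ(R), and φ(R) is negligible for R = 10.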
The results we are talking about can be thought of as "structure theorems"
since they describe the structure of the distribution classes E', S', and D'. On the
other hand, they are not as explicit as they might seem, since it is rather difficult
to understand exactly what the 27th derivative of a complicated function is.
Before stating the structure theorems in R^n, we need to introduce some notation. This multi-index notation will also be very useful when we discuss general partial differential operators. We use lowercase Greek letters α, β, ... to denote multi-indices, which are just n-tuples α = (α₁, ..., αₙ) of nonnegative integers. Then (∂/∂x)^α stands for (∂/∂x₁)^{α₁} ⋯ (∂/∂xₙ)^{αₙ}. Similarly x^α = x₁^{α₁} ⋯ xₙ^{αₙ}. We write |α| = α₁ + ⋯ + αₙ and call this the order of the multi-index. This is consistent with the usual definition of the order of a partial derivative. We also write α! = α₁! ⋯ αₙ!.

Structure Theorem for E': Let T be a distribution of compact support on R^n. Then there exist a finite number of continuous functions f_α such that T = Σ (∂/∂x)^α f_α (finite sum), and each term (∂/∂x)^α f_α has compact support.

Structure Theorem for S': Let T be a tempered distribution on R^n. Then there exist a finite number of continuous functions f_α(x) satisfying a polynomial order growth estimate

|f_α(x)| ≤ c(1 + |x|)^N

such that

T = Σ (∂/∂x)^α f_α (finite sum).

Structure Theorem for D': Let T be a distribution on R^n. Then there exist continuous functions f_α such that

T = Σ (∂/∂x)^α f_α (infinite sum)

where for every bounded open set Ω, all but a finite number of the distributions (∂/∂x)^α f_α vanish identically on Ω.

There is no uniqueness corresponding to these representations of T. In the first two structure theorems you can even consolidate the finite sum into a single term, T = (∂/∂x)^α f for some α and f. For example, if n = 1 and, say, T = f₀ + f₁', then also T = f' where f(x) = f₁(x) + ∫₀^x f₀(y) dy.

Exercise: Verify this using the fundamental theorem of the calculus.
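Here is a numerical version of the exercise, with hypothetical concrete choices f₀ = cos and f₁(x) = x³ (picked so the antiderivative of f₀ is available in closed form), paired against a rapidly decaying φ in place of a compactly supported test function. It checks that ⟨f₀ + f₁', φ⟩ = ⟨f', φ⟩ when f(x) = f₁(x) + ∫₀^x f₀(y) dy, with all distributional derivatives moved onto φ with a sign change.

```python
import math

# hypothetical concrete choices, just to test the identity numerically
f0 = math.cos
f1 = lambda x: x**3

def trap(g, a=-8.0, b=8.0, n=160_000):
    # simple trapezoid rule; the integrands below decay fast enough that
    # truncating at |x| = 8 is harmless
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        s += g(a + i * h)
    return h * s

phi = lambda x: math.exp(-x * x)           # stand-in test function
dphi = lambda x: -2.0 * x * math.exp(-x * x)

# f(x) = f1(x) + int_0^x f0(y) dy; here the antiderivative of cos is sin
f = lambda x: f1(x) + math.sin(x)

lhs = trap(lambda x: f0(x) * phi(x)) - trap(lambda x: f1(x) * dphi(x))  # <f0 + f1', phi>
rhs = -trap(lambda x: f(x) * dphi(x))                                   # <f', phi>
print(lhs, rhs)  # the two pairings should agree
```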


The trouble with this consolidation is that, when n > 1, it may increase the order of the derivatives. For example, to consolidate

(∂/∂x₁) f₁ + (∂/∂x₂) f₂

requires (∂²/∂x₁∂x₂) f.
The highest order of derivative (the maximum of |α| = α₁ + α₂ + ⋯ + αₙ) in a representation T = Σ (∂/∂x)^α f_α is called the order of the distribution. The trouble with this definition is that we really want to take the minimum value of this order over all such representations (remember the lack of uniqueness). Thus if we know T = Σ (∂/∂x)^α f_α with |α| ≤ m then we can only assert that the order of T is ≤ m, since there might be a better representation. (Also, there is no general agreement on whether or not to insist that the functions f_α in the representation be continuous; frequently one obtains such representations where the functions f_α are only locally bounded, or locally integrable, and for many purposes that is sufficient.) However, despite all these caveats, the imprecise notion of the order of a distribution is a widely used concept. One conclusion that is beyond question is that the first two structure theorems assert that every distribution in E' or S' has finite order. We have seen above an example of a distribution of infinite order.
One subtle point in the first structure theorem is that the functions f_α themselves need not have compact support. For example, the δ-function is not the derivative (of any order) of a function of compact support, even though δ has support at the origin. The reason for this is that (T', 1) = 0 if T has compact support, where 1 denotes the constant function. Indeed (T', 1) = (T', ψ) by definition, where ψ ∈ D is one on a neighborhood of the support of T'. But (T', ψ) = −(T, ψ') and ψ' vanishes on a neighborhood of the support of T, so (T', ψ) = 0. Notice that this argument does not work if T' has compact support but T does not.
The proofs of the structure theorems are not elementary. There are some
proofs that are short and tricky and entirely nonconstructive. There are con-
structive proofs, but they are harder. When n = 1, the idea behind the proof
is that you integrate the distribution many times until you obtain a continuous
function. Then that function differentiated as many times brings you back to
the original distribution. The problem is that it is tricky to define the integral
of a distribution (see problem 9). Once you have the right definition, it requires
the correct notion of continuity of a distribution to prove that for T in E' or S'

you eventually arrive at a continuous function after a finite number of integrals


(of course this may not be true for general distributions).
One aspect of the proofs that is very simple is that the structure theorem for D' follows easily from the structure theorem for E'. The argument uses the important idea of a partition of unity. This is by definition a sequence of test functions φ_j ∈ D such that Σ_j φ_j ≡ 1 and the sum is locally finite. It is easy to construct partitions of unity (see problem 10), and for an arbitrary distribution T we then have T = Σ_j φ_j T (locally finite sum). Note that φ_j T has compact support (contained in the support of φ_j). If we apply the first structure theorem to each φ_j T and sum, we obtain the third structure theorem.

6.3 Distributions with point support


We have seen that the δ-function and all its derivatives have support equal to the single point {0}. In view of the structure theorems of the last section, it is tempting to guess that there are no others; a distribution with support {0} must be of the form

T = Σ a_α (∂/∂x)^α δ (finite sum).

On the other hand, we might also conjecture that certain infinite sums might be allowed if the coefficients a_α tend to zero rapidly enough. It turns out that the first guess is right, and the second is wrong. In fact, the following is true: given any numbers c_α, there exists a test function φ in D such that (∂/∂x)^α φ(0) = c_α. In other words, there is no restriction on the Taylor expansion of a C^∞

function (recall that the Taylor expansion about 0 is

This means that the Taylor expansion does not have to converge (even if it does
converge, it does not have to converge to the function, unless <p is analytic).
Suppose we tried to define T = 2:: aa (a/ax) a 6 where infinitely many aa
are nonzero. Then (T, <p) = 2::( -I )Ialaaca and by choosing Ca appropriately
we can make this infinite. So such infinite sums of derivatives of 6 are not
distributions. Incidentally, it is not hard to write down a formula for <p given
Ca. It is just
where ψ ∈ D is equal to one in a neighborhood of the origin,

FIGURE 6.1

and λ_α → ∞ rapidly as α → ∞. If supp ψ is |x| ≤ 1, then supp ψ(λ_α x) is |x| ≤ λ_α^{−1}, which is shrinking rapidly. For any fixed x ≠ 0, only a finite number of terms in the sum defining φ will be nonzero, and for x = 0 only the first term is nonzero. If we are allowed to differentiate the series term by term then we get (∂/∂x)^α φ(0) = c_α because all the derivatives of ψ(λ_α x) vanish at the origin. It is in justifying the term-by-term differentiation that all the hard work lies, and in fact the rate at which λ_α must grow will be determined by the size of the coefficients c_α.
Just because there are no infinite sums Σ a_α (∂/∂x)^α δ does not in itself prove that every distribution supported at the origin is a finite sum of this sort. It is conceivable that there could be other, wilder distributions with one-point support. In fact, there are not. Although I cannot give the entire proof, I can indicate some of the ideas involved. If T has support {0}, then (T, φ) = 0 for every test function φ vanishing in a neighborhood of the origin. However, it then follows that (T, φ) = 0 for every test function φ that vanishes to order N at zero, (∂/∂x)^α φ(0) = 0 for all |α| ≤ N (here N depends on the distribution T). This is a consequence of the structure theorem, T = Σ (∂/∂x)^α f_α (finite sum), and an approximation argument: if φ vanishes to order N we approximate φ by φ(x)(1 − ψ(λx)) as λ → ∞. Then (T, φ(x)(1 − ψ(λx))) = 0 because φ(x)(1 − ψ(λx)) vanishes in a neighborhood of zero and

(T, φ(x)(1 − ψ(λx))) = ⟨Σ_α (∂/∂x)^α f_α, φ(x)(1 − ψ(λx))⟩
= Σ_α (−1)^{|α|} ∫ f_α(x) (∂/∂x)^α [φ(x)(1 − ψ(λx))] dx

which converges as λ → ∞ to

Σ_α (−1)^{|α|} ∫ f_α(x) (∂/∂x)^α φ(x) dx = (T, φ)

(the convergence as λ → ∞ follows from the fact that φ vanishes to order N at zero and we never take more than N derivatives; the value of N is the upper bound for |α| in the representation T = Σ (∂/∂x)^α f_α). Thus (T, φ) is the limit of terms that are always zero, so (T, φ) = 0. This is the tricky part of the argument, because for general test functions, say if φ(0) ≠ 0, the approximation of φ(x) by φ(x)(1 − ψ(λx)) is no good; it gives the wrong value at x = 0.
Once we know that (T, φ) = 0 for all φ vanishing to order N at the origin, it is just a matter of linear algebra to complete the proof. Let us write, temporarily, D_N for this space of test functions. It is then easy to construct a finite set of test functions φ_α (one for each α with |α| ≤ N) so that every test function φ can be written uniquely φ = η + Σ c_α φ_α with η ∈ D_N. Indeed we just take φ_α = (1/α!) x^α ψ(x), and then c_α = (∂/∂x)^α φ(0) does the trick. In terms of linear algebra, D_N has finite codimension in D. Now (T, φ) = Σ c_α (T, φ_α) for any φ ∈ D because (T, η) = 0. But the distribution Σ b_α (∂/∂x)^α δ satisfies

⟨Σ_α b_α (∂/∂x)^α δ, φ⟩ = Σ_α Σ_β c_β b_α ⟨(∂/∂x)^α δ, φ_β⟩ = Σ_α (−1)^{|α|} c_α b_α

since (∂/∂x)^α φ_β(0) is 0 if α ≠ β and 1 if α = β, so that ⟨(∂/∂x)^α δ, φ_β⟩ = (−1)^{|α|} when α = β and 0 otherwise. Thus we need only choose b_α = (−1)^{|α|}(T, φ_α) in order to have (T, φ) equal to

⟨Σ b_α (∂/∂x)^α δ, φ⟩

for every test function φ, or T = Σ b_α (∂/∂x)^α δ.


We have already seen an interesting application of this theorem: if a harmonic function has polynomial growth, then it must be a polynomial. If |u(x)| ≤ c(1 + |x|)^N then u may be regarded as a tempered distribution. Then Δu = 0 means −|ξ|² û(ξ) = 0, which easily implies supp û = {0} (see problem 16 for a more general result). So û(ξ) = Σ c_α (∂/∂ξ)^α δ, which implies u is a polynomial. Of course not every polynomial is harmonic, but for n ≥ 2 there is an interesting theory of harmonic polynomials called spherical harmonics.

6.4 Positive distributions


Mathematicians made an unfortunate choice when they decided that positive should mean strictly greater than zero, leaving the unwieldy term nonnegative to mean ≥ 0. That makes liars of us, because we like to say positive when we really mean nonnegative. Thus we will define a positive distribution to be one that takes nonnegative values on nonnegative test functions, (T, φ) ≥ 0 if φ ≥ 0. (Throughout this section we deal only with real-valued functions and distributions.) Clearly, if (T, φ) = ∫ f(x)φ(x) dx and f ≥ 0 then T is a positive distribution. Moreover, if f is continuous, then T is a positive distribution if and only if f ≥ 0. If f is not continuous then we encounter the same sort of difficulty that we did in the discussion of the support of f. The resolution of the difficulty is similar: if we write f = f₊ − f₋ as the unique decomposition into positive and negative parts (so f₊(x) = max(f(x), 0) and f₋(x) = max(−f(x), 0))

FIGURE 6.2

then T is a positive distribution if and only if ∫ f₋(x) dx = 0.


Thus the definition of positive distribution works the way it should for functions. Are there any other positive distributions? You bet! The δ-function is positive, since (δ, φ) = φ(0) ≥ 0 if φ ≥ 0. But δ' is not positive (nor is −δ'), since we cannot control the sign of the derivative by controlling the sign of the function.

A natural question arises whenever we define positivity for any class of objects: is everything representable as a difference of two positive objects? If so, we can ask if there is some canonical choice of positive and negative parts. But for distributions we never get to the second question, because the answer to the first is a resounding NO! Very few distributions can be written in the form T₁ − T₂ for T₁ and T₂ positive distributions. In particular, δ' cannot.
To understand why positive distributions are so special we need to turn the positivity condition into a slightly different, and considerably stronger, inequality. It is easier to present the argument for distributions of compact support. So we suppose supp (T) is contained in |x| ≤ R, and we choose a nonnegative test function ψ that is identically one on a neighborhood of supp (T). Then (T, ψ) is a positive number; call it M. We then claim that |(T, φ)| ≤ CM where C = sup_x |φ(x)|, for any test function φ.
To prove this claim we simply note that the test functions Cψ ± φ are nonnegative on a neighborhood of supp (T), and so (T, Cψ ± φ) ≥ 0 (here we have used the observation that if T is a positive distribution of compact support we only need nonnegativity of the test function φ on a neighborhood of the support of T to conclude (T, φ) ≥ 0). But then both (T, φ) ≤ (T, Cψ) and

−(T, φ) ≤ (T, Cψ),

which is the same as

|(T, φ)| ≤ (T, Cψ) = CM

as claimed.
What does this inequality tell us? It says that the size of (T, φ) is controlled by the size of φ (the maximum value of |φ(x)|). That effectively rules out anything that involves derivatives of φ, because φ can wiggle a lot (have large derivatives) while remaining small in absolute value.
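The wiggling can be made concrete. For the (hypothetical) family φ_k(x) = sin(kx)/k, the sup-norms shrink like 1/k while ⟨δ', φ_k⟩ = −φ_k'(0) stays at −1, so no inequality of the form |⟨δ', φ⟩| ≤ CM with C = sup|φ| can hold. (Strictly speaking one should multiply by a fixed cut-off to get compact support; that changes nothing near 0.)

```python
import math

def pairing_with_delta_prime(f, h=1e-6):
    # <delta', f> = -f'(0), computed by a central difference
    return -(f(h) - f(-h)) / (2.0 * h)

results = []
for k in (1, 10, 100, 1000):
    phi_k = lambda x, k=k: math.sin(k * x) / k
    sup_norm = 1.0 / k                        # max of |sin(kx)/k|
    val = pairing_with_delta_prime(phi_k)
    results.append((k, sup_norm, val))
    print(k, sup_norm, val)                   # val stays near -1 as sup_norm shrinks
```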
Now suppose T = T₁ − T₂ where T₁ and T₂ are positive distributions of compact support. Then if M₁ and M₂ are the associated constants we have

|(T, φ)| ≤ C(M₁ + M₂),

so T satisfies the same kind of inequality. Since we know δ' does not satisfy such an inequality, regardless of the constant, we have shown that it is not the difference of positive distributions of compact support. But it is easy to localize the argument to show that δ' ≠ T₁ − T₂ even if T₁ and T₂ do not have compact support: just choose ψ as before and observe that δ' = T₁ − T₂ would imply δ' = ψδ' = ψT₁ − ψT₂, and ψT₁ and ψT₂ would be positive distributions of compact support after all.
For readers who are familiar with measure theory, I can explain the story more completely. The positive distributions all come from positive measures. If μ is a positive measure that is locally finite (μ(A) < ∞ for every bounded set A), then (T, φ) = ∫ φ dμ defines a positive distribution. The converse is also true: every positive distribution comes from a locally finite positive measure. This follows from a powerful result known as the Riesz representation theorem and uses the inequality we derived above. If you are not familiar with measure theory, you should think of a positive measure as the most general kind of probability distribution, but without the assumption that the total probability adds up to one.

For an example of an "exotic" probability measure consider the usual construction of the Cantor set: take the unit interval, delete the middle third, then in each of the remaining intervals delete the middle third, and iterate the deletion process infinitely often.

FIGURE 6.3 (the first and second stages of the Cantor construction)

Now define a probability measure by saying each of the two intervals in stage 1 has probability 1/2, each of the four intervals in stage 2 has probability 1/4, and in general each of the 2^n intervals in the nth stage has probability 2^{−n}. The resulting measure is called the Cantor measure. It assigns total measure one to the Cantor set and zero to the complement. The usual length measure of the Cantor set is zero.
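Although the Cantor measure is singular, integrals against it are easy to approximate: at the nth stage put mass 2^{−n} at, say, the left endpoint of each of the 2^n intervals. The sketch below estimates ∫ x dμ and ∫ x² dμ this way; writing a Cantor point as Σ dₙ 3^{−n} with independent digits dₙ ∈ {0, 2} and taking expectations gives the exact values 1/2 and 3/8.

```python
def cantor_integral(phi, depth=12):
    """Approximate the integral of phi against the Cantor measure using the
    left endpoints of the 2^depth intervals at the depth-th stage."""
    pts = [0.0]
    scale = 1.0
    for _ in range(depth):
        scale /= 3.0
        # each surviving interval splits into two; the new digit is 0 or 2
        pts = [p for q in pts for p in (q, q + 2.0 * scale)]
    return sum(phi(p) for p in pts) / len(pts)

mean = cantor_integral(lambda x: x)
second = cantor_integral(lambda x: x * x)
print(mean, second)  # near 1/2 and 3/8
```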

6.5 Continuity of distribution


So far I have been deliberately vague about the meaning of continuity in the
definition of distribution, and I have described for you most of the theory of distributions without involving continuity. For a deeper understanding of the theory
you should know what continuity means, and it may also help to clarify some
of the concepts we have already discussed. There are at least three equivalent
ways of stating the meaning of continuity, namely

1. a small change in the input yields a small change in the output;

2. interchanging limits; and

3. estimates.

What does this mean in the context of distributions? We want to think of a distribution T as a function on test functions, (T, φ). So (1) would say that if φ₁ is close to φ₂ then (T, φ₁) is close to (T, φ₂); (2) would say that if lim_{j→∞} φ_j = φ then lim_{j→∞} (T, φ_j) = (T, φ); and (3) would say that |(T, φ)| is less than a constant times some quantity that measures the size of φ. Statements like (1) and (2) should be familiar from the definition of continuous function, although we have not yet explained what "φ₁ is close to φ₂" or "lim_{j→∞} φ_j = φ" should mean. Statement (3) has no analog for continuity of ordinary functions, and depends on the linearity of (T, φ) for its success. Readers who have seen some functional analysis will recognize this kind of estimate, which goes under the name "boundedness."
To make (1) precise we now list the conditions we would like to have in order to say that φ₁ is close to φ₂. We certainly want the values φ₁(x) close to φ₂(x), and this should hold uniformly in x. To express this succinctly it is convenient to introduce the "sup-norm" notation:

‖φ‖_∞ = sup_x |φ(x)|.

The meaning of "norm" is that

‖φ + ψ‖_∞ ≤ ‖φ‖_∞ + ‖ψ‖_∞, ‖cφ‖_∞ = |c| ‖φ‖_∞,

and 0 ≤ ‖φ‖_∞ with equality if and only if φ ≡ 0.


Exercise: Verify these conditions.
So we want ‖φ₁ − φ₂‖_∞ ≤ ε if φ₁ is close to φ₂. But we have to demand more, because in our work with test functions we often differentiate them. Thus we will demand also

‖φ₁' − φ₂'‖_∞ ≤ ε,

which is not a consequence of ‖φ₁ − φ₂‖_∞ ≤ ε even if we change ε (a small wiggly function φ may be close to zero in the sense that ‖φ − 0‖_∞ = ‖φ‖_∞ is small, but ‖φ'‖_∞ may be very large). To be more precise, we can only demand

‖(∂/∂x)^α φ₁ − (∂/∂x)^α φ₂‖_∞ ≤ ε

for a finite number of derivatives, because higher order derivatives tend to grow rapidly as the order of the derivative goes to infinity.

There is one more condition we will require in order to say φ₁ is close to φ₂, and this condition is not something you might think of at first: we want the supports of φ₁ and φ₂ to be fairly close. Actually it will be enough to demand that the supports be contained in a fixed bounded set, say |x| ≤ R for some given R. Without such a condition, we would have to say that a test function φ is close to 0 even if it has a huge support (of course it would have to be very small far out), and this would be very awkward for the kind of locally finite infinite sums we considered in section 6.2.

So, altogether, we will say that φ₁ is close to φ₂ if

‖(∂/∂x)^α φ₁ − (∂/∂x)^α φ₂‖_∞ ≤ δ

for all |α| ≤ m and supp φ₁, supp φ₂ ⊆ {|x| ≤ R}. Of course "close" is a qualitative word, but the three parameters δ, m, R give it a quantitative meaning. The statement that T is continuous at φ₁ is the statement: for every ε > 0 and R sufficiently large there exist parameters δ, m (depending on φ₁, R, and ε) such that if φ₁ is close to φ₂ with parameters δ, m, R then

|(T, φ₁) − (T, φ₂)| ≤ ε.

Continuity of T is defined to be continuity at every test function. Notice that this really says we can make (T, φ₁) close to (T, φ₂) if we make φ₁ close to φ₂, but we must make φ₁ close to φ₂ in the rather complicated way described above.
Now we can describe the second definition of continuity in similar terms. We need to explain what we mean by lim_{j→∞} φ_j = φ for test functions. It comes as no surprise that we want

lim_{j→∞} ‖(∂/∂x)^α φ_j − (∂/∂x)^α φ‖_∞ = 0

for every α (α = 0 takes care of the case of no derivatives), and the condition on the supports is that there exists a bounded set, say |x| ≤ R, that contains supp φ and all the supp φ_j. Again this is not an obvious condition, but it is necessary if we are to have everything localized when we take limits. This is the definition of the limit process for D. Readers who are familiar with the concepts will recognize that this makes D into a topological space, but not a metric space; there is no single notion of distance in any of our test function spaces D, S, or E.

With this definition of limits of test functions, it is easy to see that if T is continuous then (T, φ) = lim_{j→∞} (T, φ_j) whenever φ = lim_{j→∞} φ_j. The argument goes as follows: we use the continuity of T at φ. Given ε > 0 we take R as above and then find δ, m so that |(T, φ) − (T, ψ)| ≤ ε whenever φ and ψ are close with parameters δ, m, R. Then every φ_j has support in |x| ≤ R, and for j large enough

‖(∂/∂x)^α φ_j − (∂/∂x)^α φ‖_∞ ≤ δ

for all |α| ≤ m. This is the meaning of lim_{j→∞} φ_j = φ.

Thus, for j large enough, all φ_j are close to φ with parameters δ, m, R and so

|(T, φ_j) − (T, φ)| ≤ ε.

But this is what lim_{j→∞} (T, φ_j) = (T, φ) means. A similar argument from the contrapositive shows the converse: if T satisfies lim_{j→∞} (T, φ_j) = (T, φ) whenever lim_{j→∞} φ_j = φ (in the above sense) then T is continuous.
This form of continuity is extremely useful, since it allows us to take limiting processes inside the action of the distribution. For example, if φ_s(t) is a test function of the two variables (s, t), then

(∂/∂s)(T, φ_s) = (T, ∂φ_s/∂s).

The reason for this is first that the linearity allows us to bring the difference quotient inside the distribution,

(1/h)[(T, φ_{s+h}) − (T, φ_s)] = (T, (φ_{s+h} − φ_s)/h),

and then the continuity lets us take the limit as h → 0 because

(φ_{s+h} − φ_s)/h → ∂φ_s/∂s

in the sense of D convergence. Similarly with integrals:

(T, ∫_{−∞}^{∞} φ_s ds) = ∫_{−∞}^{∞} (T, φ_s) ds,

because the Riemann sums approximating ∫_{−∞}^{∞} φ_s ds converge in the sense of D, and the Riemann sums are linear. We will use this observation in section 7.2 when we discuss the Fourier transform of distributions of compact support.
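A small experiment makes the differentiation formula concrete. Take T = δ' (so ⟨T, f⟩ = −f'(0), with T acting in the t variable) and the hypothetical family φ_s(t) = e^{−t²} sin(st); both sides of (∂/∂s)(T, φ_s) = (T, ∂φ_s/∂s) can then be approximated by central differences.

```python
import math

def pair_delta_prime(f, h=1e-5):
    # <delta', f> = -f'(0), via a central difference in t
    return -(f(h) - f(-h)) / (2.0 * h)

phi = lambda s: (lambda t: math.exp(-t * t) * math.sin(s * t))

s, ds = 0.7, 1e-5
# left side: differentiate the pairing with respect to the parameter s
lhs = (pair_delta_prime(phi(s + ds)) - pair_delta_prime(phi(s - ds))) / (2.0 * ds)
# right side: pair with the s-derivative of phi_s, computed by hand
dphi_ds = lambda t: t * math.exp(-t * t) * math.cos(s * t)
rhs = pair_delta_prime(dphi_ds)
print(lhs, rhs)  # both sides should be close to -1
```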
The third form of continuity, the boundedness of T, involves estimates that generalize the inequality

|(T, φ)| ≤ M ‖φ‖_∞

that a positive distribution of compact support must satisfy. The generalization involves allowing derivatives on the right side and also restricting the support of φ. Here is the exact statement: for every R there exist M and m such that

|(T, φ)| ≤ M Σ_{|α| ≤ m} ‖(∂/∂x)^α φ‖_∞

for all φ with support in |x| ≤ R. It is easy to see that this boundedness implies
continuity in the first two forms. For example, if φ₁ and φ₂ both have support in |x| ≤ R then

|(T, φ₁) − (T, φ₂)| = |(T, φ₁ − φ₂)| ≤ M Σ_{|α| ≤ m} ‖(∂/∂x)^α φ₁ − (∂/∂x)^α φ₂‖_∞,

so if

‖(∂/∂x)^α φ₁ − (∂/∂x)^α φ₂‖_∞ ≤ δ

for all |α| ≤ m then

|(T, φ₁) − (T, φ₂)| ≤ c_m M δ,

where c_m is the number of multi-indices α with |α| ≤ m. Since c_m and M are known in advance, we just take δ = ε/c_m M.
But conversely, we can show that continuity in the first sense implies boundedness by using the linearity of T in one of the basic principles of functional analysis. The continuity of T at the zero function means that given any ε > 0, say ε = 1, and R, there exist δ and m such that if supp φ ⊆ {|x| ≤ R} and ‖(∂/∂x)^α φ − 0‖_∞ ≤ δ for all |α| ≤ m then |(T, φ)| ≤ 1. Now the trick is that for any φ with supp φ ⊆ {|x| ≤ R} we can multiply it by a small enough constant c so that we have ‖(∂/∂x)^α cφ‖_∞ ≤ δ for all |α| ≤ m. Just take c to be the smallest of δ ‖(∂/∂x)^α φ‖_∞^{−1} for |α| ≤ m. Then |(T, cφ)| ≤ 1, or |(T, φ)| ≤ c^{−1}, and

c^{−1} ≤ δ^{−1} Σ_{|α| ≤ m} ‖(∂/∂x)^α φ‖_∞,

so we have the required boundedness condition with M = δ^{−1}.


The boundedness form of continuity explains why positive distributions are so special: they satisfy the boundedness estimate with m = 0, or no derivatives. More generally, the number of derivatives involved in the boundedness estimate is related to the number of derivatives in the structure theorem. Recall that for T ∈ D' we had T = Σ (∂/∂x)^α f_α for f_α continuous, where only a finite number of terms are nonzero on any bounded set. Suppose that on |x| ≤ R we only have terms with |α| ≤ m. In other words, if φ is supported in |x| ≤ R, then

(T, φ) = Σ_{|α| ≤ m} ⟨(∂/∂x)^α f_α, φ⟩ = Σ_{|α| ≤ m} (−1)^{|α|} ∫ f_α(x) (∂/∂x)^α φ(x) dx.

But we have the estimate

|∫ f_α(x) (∂/∂x)^α φ(x) dx| ≤ ‖(∂/∂x)^α φ‖_∞ ∫_{|x| ≤ R} |f_α(x)| dx,

hence

|(T, φ)| ≤ M Σ_{|α| ≤ m} ‖(∂/∂x)^α φ‖_∞,

where M is the maximum value of

∫_{|x| ≤ R} |f_α(x)| dx

(since f_α is continuous, these integrals are all finite).


In fact, what we have just argued is that the structure theorem implies boundedness, which is putting the cart before the horse. In the proof of the structure theorem, the boundedness is used in a crucial way. However, the above converse argument at least makes the structure theorem plausible.
There are similar descriptions of continuity for the two other classes of distributions, S' and E', and in fact they are somewhat simpler to state because they do not involve supports. I will give the definition in terms of boundedness, and leave the first two continuity forms as an exercise. For distributions of compact support, the boundedness condition is just that there exist one estimate

|(T, φ)| ≤ M Σ_{|α| ≤ m} ‖(∂/∂x)^α φ‖_∞

for some m and M, for all test functions φ, regardless of support. For tempered distributions, the boundedness condition is again just a single estimate, but this time of the form

|(T, φ)| ≤ M Σ_{|α| ≤ m} ‖(1 + |x|)^N (∂/∂x)^α φ‖_∞

for some m, M, and N, and all test functions (it turns out to be the same whether you require φ ∈ D or just φ ∈ S, for reasons that will be discussed in the next section). This kind of condition is perhaps not so surprising if you recall that the finiteness of

‖(1 + |x|)^N (∂/∂x)^α φ‖_∞

for all N and α is the defining property of S.

So why didn't I tell you about the continuity conditions, at least in the form of boundedness, right from the start? It is usually not difficult to establish the boundedness estimate for any particular distribution. I claim that essentially all you have to do is examine the proof that the distribution is defined for all test functions, and throw in some obvious estimates. (I'll illustrate this in a moment.) But it is sometimes burdensome, and what's more, it makes the whole exposition of the theory burdensome. For example, when we defined (∂/∂x_j)T by

⟨(∂/∂x_j) T, φ⟩ = −⟨T, (∂/∂x_j) φ⟩,


we would have been required to show that (∂/∂x_j)T is also bounded. That argument looks like this: since T is bounded, for every R there exist m and M such that

|(T, φ)| ≤ M Σ_{|α| ≤ m} ‖(∂/∂x)^α φ‖_∞

if supp φ ⊆ {|x| ≤ R}. For the same R, then, we have

|((∂/∂x_j) T, φ)| = |(T, (∂/∂x_j) φ)| ≤ M Σ_{|α| ≤ m+1} ‖(∂/∂x)^α φ‖_∞

if supp φ ⊆ {|x| ≤ R}, which is of the required form with m increased to m + 1. The same goes for all the other operations on distributions. Thus, by postponing the discussion of boundedness until now, I have saved both of us a lot of trouble.
Here is an example of how the boundedness is implicit in the proof of existence. Remember the distribution

(T, φ) = ∫_{−∞}^{−1} (φ(x)/|x|) dx + ∫_{1}^{∞} (φ(x)/|x|) dx + ∫_{−1}^{1} ((φ(x) − φ(0))/|x|) dx

that was concocted out of the function 1/|x|, which is not locally integrable? To
show that it was defined for every test function φ, you had to invoke the mean value theorem,

φ(x) − φ(0) = x φ'(y)

for some y between 0 and x. Since y depends on x we will write it explicitly as y(x). Then

∫_{−1}^{1} ((φ(x) − φ(0))/|x|) dx = ∫_{−1}^{1} (x/|x|) φ'(y(x)) dx,

and both integrals exist and are finite. To get the boundedness we simply use a
standard inequality to estimate the integrals:

|(T, φ)| ≤ ∫_{−∞}^{−1} (|φ(x)|/|x|) dx + ∫_{1}^{∞} (|φ(x)|/|x|) dx + ∫_{−1}^{1} |φ'(y(x))| dx.

If φ is supported in |x| ≤ R then

∫_{−∞}^{−1} (|φ(x)|/|x|) dx + ∫_{1}^{∞} (|φ(x)|/|x|) dx = ∫_{−R}^{−1} (|φ(x)|/|x|) dx + ∫_{1}^{R} (|φ(x)|/|x|) dx ≤ 2 log R ‖φ‖_∞

and

∫_{−1}^{1} |φ'(y(x))| dx ≤ 2 ‖φ'‖_∞.

Thus altogether

|(T, φ)| ≤ 2 log R ‖φ‖_∞ + 2 ‖φ'‖_∞,

which is an estimate of the required form with m = 1. Perhaps you are surprised
that the estimate involves a derivative, while the original formula for T did not
appear to involve any derivatives. But, of course, that derivative does show up
in the existence proof.
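The estimate is pleasant to check numerically. The sketch below uses the standard bump φ(x) = exp(−1/(1 − (x/2)²)) supported in |x| ≤ 2 (so R = 2; an arbitrary choice), computes (T, φ) by the trapezoid rule, and compares it with 2 log R ‖φ‖_∞ + 2‖φ'‖_∞.

```python
import math

def bump(x):
    # test function supported in |x| <= 2
    t = x / 2.0
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def trap(g, a, b, n=100_000):
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        s += g(a + i * h)
    return h * s

def middle(x):
    # (phi(x) - phi(0)) / |x|, extended by 0 at x = 0
    return 0.0 if x == 0.0 else (bump(x) - bump(0.0)) / abs(x)

T_phi = (trap(lambda x: bump(x) / abs(x), -2.0, -1.0)
         + trap(lambda x: bump(x) / abs(x), 1.0, 2.0)
         + trap(middle, -1.0, 1.0))

R = 2.0
grid = [-2.0 + i * 0.001 for i in range(4001)]
sup_phi = max(abs(bump(x)) for x in grid)
sup_dphi = max(abs(bump(x + 1e-6) - bump(x - 1e-6)) / 2e-6 for x in grid)
bound = 2.0 * math.log(R) * sup_phi + 2.0 * sup_dphi
print(T_phi, bound)  # |T_phi| should sit below the bound
```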
Finally, you should beware of confusing the continuity of a single distribution with the notion of limits of a sequence of distributions. In the first case we hold the distribution fixed and vary the test functions, while in the second case we hold the test function fixed and vary the distributions, because we defined lim_{j→∞} T_j = T to mean lim_{j→∞} (T_j, φ) = (T, φ) for every test function. This is actually a much simpler definition, but of course each distribution T_j and T must be continuous in the sense we have discussed.

6.6 Approximation by test functions


When we first discussed how to define various operations on distributions, we
began with the important theorem that all distributions, no matter how rough
or how rapidly growing, may be approximated by test functions, which are
infinitely smooth and have compact support. How is this possible? In this
section I will explain the procedure. As you will see, it involves two operations
on distributions, convolution and multiplication by a smooth function, so that
you might well raise the objection of circular reasoning: we used the theorem
to define the operations, and now we use the operations to prove the theorem.
In fact, to give a strictly logical exposition, you would have to use the method
of adjoint identities to define the operations, then prove the theorem, and finally
use the theorem to show that operations defined by the adjoint identities are the
same as the operations defined by approximation by test functions. However, for

this account, the important point to understand is the approximation procedure,


because the same ideas are used in proving many other approximation theorems.
The two components of the approximation procedure are
1. multiplication by a smooth cut-off function, and
2. convolution with an approximate identity.
They can be performed in either order. Since we have already discussed (2),
let's begin with (1), which is in many ways much simpler. A cut-off function
is a test function whose graph looks like

FIGURE 6.4

It is identically one on a large set and then falls off to zero. Actually we
want to think about a family of cut-off functions, with the property that the set
on which the function is one gets larger and larger. To be specific, let ψ be one
fixed test function, say with support in |x| ≤ 2, such that ψ(x) = 1 for |x| ≤ 1.
Then look at the family ψ(εx) as ε → 0. We have ψ(εx) = 1 if |εx| ≤ 1,
or equivalently, if |x| ≤ ε⁻¹, and ε⁻¹ → ∞ as ε → 0. Since ψ(εx) ∈ D, we
can form ψ(εx)T for any distribution T, and it should come as no surprise that
ψ(εx)T → T as ε → 0 as distributions. Indeed if φ is any test function, then
φ has support in |x| ≤ R for some R, and then φ(x)ψ(εx) = φ(x) if ε < 1/R
(for then ψ(εx) = 1 on the support of φ). Thus

⟨ψ(εx)T, φ⟩ = ⟨T, ψ(εx)φ⟩ = ⟨T, φ⟩

for ε < 1/R, hence trivially lim_{ε→0} ⟨ψ(εx)T, φ⟩ = ⟨T, φ⟩, which is what we
mean by ψ(εx)T → T as ε → 0. But it is equally trivial to show that ψ(εx)T
is a distribution of compact support: in fact its support is contained in |x| ≤ 2/ε
because ψ(εx) = 0 if |x| ≥ 2/ε. To summarize: by the method of multiplication
by cut-off functions, every distribution may be approximated by distributions of
compact support. To show that every distribution can be approximated by test
functions, it remains to show that every distribution of compact support can be
approximated by test functions.
Now if T is a distribution of compact support and φ is a test function, then
φ ∗ T is also a test function. In fact we have already seen that supp(φ ∗ T) ⊆
supp φ + supp T (in the sense of sum sets) so φ ∗ T has compact support, and φ ∗ T
is the function φ ∗ T(x) = ⟨T, τ_x φ̃⟩ (recall that φ̃(x) = φ(−x)), which is C^∞
because all derivatives can be put on φ. Let φ_ε(x) be an approximate identity (so
φ_ε(x) = ε⁻ⁿ φ(ε⁻¹x) where ∫ φ(x) dx = 1). Then we claim lim_{ε→0} φ_ε ∗ T = T.
Indeed, if ψ is any test function then ⟨φ_ε ∗ T, ψ⟩ = ⟨T, φ̃_ε ∗ ψ⟩ and φ̃_ε is also
an approximate identity, so φ̃_ε ∗ ψ → ψ hence ⟨T, φ̃_ε ∗ ψ⟩ → ⟨T, ψ⟩.
Actually, this argument requires that φ̃_ε ∗ ψ → ψ in D, as described in the last
section. That means we need to show that all the supports of φ̃_ε ∗ ψ remain in a
fixed compact set, and that all derivatives (∂/∂x)^α (φ̃_ε ∗ ψ) converge uniformly
to (∂/∂x)^α ψ. But both properties are easy to see, since supp(φ̃_ε ∗ ψ) ⊆
supp φ̃_ε + supp ψ and supp φ̃_ε shrinks as ε → 0, while (∂/∂x)^α (φ̃_ε ∗ ψ) =
φ̃_ε ∗ (∂/∂x)^α ψ and the approximate identity theorem shows that this converges
uniformly to (∂/∂x)^α ψ.
We can combine the two approximation processes into a single, two-step
procedure. Starting with any distribution T, form φ_ε ∗ (ψ(εx)T). This is a test
function, and it converges to T as ε → 0 (to get a sequence, set ε = 1/k, k =
1, 2, …). It is also true that we can do it in the reverse order: ψ(εx)(φ_ε ∗ T)
is a test function (not the same one as before), and these also approximate T as
ε → 0. The proof is similar, but slightly different.
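The mollification step can be imitated on a grid (a sketch, not from the text; the grid, the kernel normalization, and the choice f(x) = |x| are illustrative assumptions). Convolving the non-smooth f(x) = |x| with a normalized bump φ_ε shows the uniform error shrinking as ε → 0:

```python
import numpy as np

# Sketch of phi_eps * f -> f: mollify f(x) = |x| with the normalized bump
# phi_eps(x) = eps^{-1} phi(x / eps); the sup-norm error shrinks with eps.
def bump(x):
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

h = 1e-3
x = np.arange(-2.0, 2.0 + h, h)
f = np.abs(x)                      # continuous but not smooth at 0

def mollify(f, eps):
    t = np.arange(-eps, eps + h, h)
    k = bump(t / eps)
    k = k / (k.sum() * h)          # normalize: discrete integral of kernel = 1
    return np.convolve(f, k, mode="same") * h

# sup error on the interior (away from convolution edge effects)
errs = [np.abs(mollify(f, eps) - f)[600:-600].max() for eps in (0.4, 0.2, 0.1)]
print(errs)
```

The largest error sits at the kink x = 0 and is proportional to ε, which is the discrete shadow of the uniform convergence used in the proof.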
The same procedure shows that functions in D can approximate functions in
S or E, and distributions in S′ and E′, where in each case the approximation
is in the sense appropriate to the object being approximated. For example, if
f ∈ S then we can regard f as a distribution, and so φ_ε ∗ (ψ(εx)f(x)) → f as
a distribution; but we are asserting more, that the approximation is in terms of
S convergence, i.e.,

(1 + |x|)^N (∂/∂x)^α [φ_ε ∗ (ψ(εx)f(x))] → (1 + |x|)^N (∂/∂x)^α f(x)

uniformly as ε → 0.
Now we can explain more clearly why we are allowed to think of tempered
distributions as a subclass of distributions. Suppose T is a distribution (so ⟨T, φ⟩
is defined initially only for φ ∈ D), and T satisfies an estimate of the form

|⟨T, φ⟩| ≤ M Σ_{|α|≤m} sup_x (1 + |x|)^N |(∂/∂x)^α φ(x)|

for all φ ∈ D. Then I claim there is a natural way to extend T so that ⟨T, φ⟩
is defined for all φ ∈ S. Namely, we approximate φ ∈ S by test functions
φ_k ∈ D, in the above sense. Because of the estimate that T satisfies, it follows
that ⟨T, φ_k⟩ converges as k → ∞, and we define ⟨T, φ⟩ to be this limit. It is
not hard to show that the limit is unique: it depends only on φ and not on the
particular approximating sequence. And, of course, T satisfies the same estimate
as before, but now for all φ ∈ S. So the extended T is indeed continuous on
S, hence is a tempered distribution. The extension is unique, so we are justified
in saying the original distribution is a tempered distribution, and the existence
of an estimate of the given form is the condition that characterizes tempered

distributions among all distributions. In practice it is usually not difficult to


decide if a given distribution satisfies such an estimate, although there are some
general existence theorems for distributions where it is either unknown, or very
difficult, to decide. This is not just an academic question, since, as we have
seen, Fourier transform methods require that the distribution be tempered.

6.7 Local theory of distributions


From the beginning, we have emphasized that distributions can be defined on
any open set Ω in ℝⁿ. Nevertheless, we have devoted most of our attention to
distributions defined on all of ℝⁿ. In this section we will consider more closely
the distinction, and some of the subtler issues that arise.
Recall that the space of test functions D(Ω) is defined to be the C^∞ functions
φ with compact support in Ω. In other words, φ must be zero outside a compact
set K which lies in the open set Ω, so K must have positive distance to the
complement of Ω. Of course D(Ω) ⊆ D(ℝⁿ). Then D′(Ω), the distributions
on Ω, are the continuous linear functionals on D(Ω), where continuity means
exactly the same thing as for distributions on ℝⁿ, only restricted to test functions
in D(Ω). The two natural questions we need to answer are
1. what is the relationship between D′(Ω) and D′(ℝⁿ), and
2. what is the relationship between D′(Ω) and the distributions of compact
support whose support is contained in Ω?
Suppose T ∈ D′(ℝⁿ). Then ⟨T, φ⟩ is defined for any φ ∈ D(ℝⁿ), and since
D(Ω) ⊆ D(ℝⁿ) we obtain a distribution in D′(Ω) by restriction. Let RT denote
this restriction:

⟨RT, φ⟩ = ⟨T, φ⟩ for φ ∈ D(Ω).

Although RT is defined by the same formula as T, it is different from T since it
forgets about what T does outside Ω. In particular, R is not one-to-one because
we can easily have RT₁ = RT₂ with T₁ ≠ T₂ (just add a distribution supported
outside Ω to T₁ to get T₂). Do we get all distributions in D′(Ω) by restriction of
distributions in D′(ℝⁿ)? In other words, is the mapping R : D′(ℝⁿ) → D′(Ω)
onto? Absolutely not! This is what makes the local theory of distributions
interesting.
Let's give an example of a local distribution that is not the restriction of a
global distribution. Let Ω = (0, 1) ⊆ ℝ¹. Consider T ∈ D′(Ω) given by

⟨T, φ⟩ = Σ_{k=1}^{∞} φ^{(k)}(1/k).

For any φ ∈ D(0, 1) this is a finite sum, since φ must be zero in a neighborhood

of 0, and it is easy to see that T is linear and continuous on D(0, 1). But clearly
T is not the restriction of any distribution in D′(ℝ¹) because the structure
theorem would imply that such a distribution has locally finite order, which T
manifestly does not. Another example is

⟨T, φ⟩ = Σ_{k=1}^{∞} φ(1/k),

which is not a restriction since we can construct φ ∈ D(ℝ¹) supported in [0, 1]
for which the sum diverges.
What these examples show, and what is true in general, is that local
distributions in D′(Ω) can behave arbitrarily as you approach the boundary of Ω
(because all the test functions in D(Ω) vanish before you reach the boundary), but
restrictions cannot have too rapid growth near the boundary. But now suppose
T is a local distribution that is a restriction, T = RT₁ for some T₁ ∈ D′(ℝⁿ).
Can we find an extension? We know that T does not uniquely determine T₁, but is
there some smallest or best extension? Could we, for example, find an extension
T₁ that has support in Ω̄ (the closure of Ω)? Again the answer is no.
As an example, consider

⟨T, φ⟩ = ∫_{0}^{1} φ(x)/x dx

for φ ∈ D(0, 1). Since φ vanishes near zero the integral is finite (i.e., 1/x is
locally integrable on (0, 1)). We have seen how to extend T to a distribution
on ℝ¹, as for example

(T, cp) = lim


,->0
1Ixl~'
cp(x) dx.
x
But if we try to make the support of the extension [0, 1] we would have to take
a definition

The trouble is that the finite sum L:.f=o Cjcp(j) (0) is always well defined, while
if we take cp with cp(O) =1= 0 the integral will diverge.
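The contrast between the symmetric limit and the one-sided integral is easy to see numerically (a sketch; the stand-in test function φ(x) = (1 + x)e^{−x²} and the grids are illustrative assumptions, and this φ is not compactly supported):

```python
import numpy as np

# Sketch: the symmetric (principal value) integral of phi(x)/x over
# eps <= |x| <= 1 stabilizes as eps -> 0, while the one-sided integral
# over [eps, 1] diverges like log(1/eps) because phi(0) != 0.
# phi(x) = (1 + x) exp(-x^2) is an illustrative stand-in test function.
def phi(x):
    return (1.0 + x) * np.exp(-x * x)

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

def symmetric(eps, n=200001):
    xr = np.linspace(eps, 1.0, n)
    # phi(x)/x + phi(-x)/(-x): the odd 1/x parts cancel pointwise
    return trapezoid(phi(xr) / xr - phi(-xr) / xr, xr)

def one_sided(eps, n=200001):
    xr = np.linspace(eps, 1.0, n)
    return trapezoid(phi(xr) / xr, xr)

sym_vals = [symmetric(10.0 ** -k) for k in (2, 3, 4)]
one_vals = [one_sided(10.0 ** -k) for k in (2, 3, 4)]
print(sym_vals)   # stabilizes
print(one_vals)   # keeps growing as eps shrinks
```

For this φ the symmetric combination collapses to 2e^{−x²}, so the principal-value limit exists, while the one-sided values climb by roughly log 10 ≈ 2.3 each time ε shrinks by a factor of ten.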
On the other hand, it is always possible to find an extension whose support is
a small neighborhood of Ω, provided of course that T is a restriction. Indeed,
if T = RT₁ then also T = R(ψT₁) where ψ is any C^∞ function equal to 1
on Ω. We then have only to construct such a ψ that vanishes outside a small
neighborhood of Ω.
Turning to the second question, we write E′(Ω) for the distributions of compact
support whose support is contained in Ω. These are global distributions,
but it is also natural to consider them as local distributions. The point is that if
T₁ ≠ T₂ are distinct distributions in E′(Ω), then also RT₁ ≠ RT₂, so that we do
not lose any information by identifying distributions in E′(Ω) ⊆ D′(ℝⁿ) with
their restrictions in D′(Ω). Thus E′(Ω) ⊆ D′(Ω). Actually E′(Ω) is a smaller
space than the space of restrictions, since the support of a distribution in E′(Ω)
must stay away from the boundary of Ω. Thus, for example,

⟨T, φ⟩ = ∫_{0}^{1} φ(x) dx

is a distribution in E′(ℝ) which restricts to a distribution in D′(0, 1), but it is
not in E′(0, 1) since its support is the closed interval [0, 1].
We can also regard E′(Ω) as a space of continuous linear functionals on E(Ω),
the space of all C^∞ functions on Ω (no restriction on the growth at the boundary).
In summary, we have the containments

D(Ω) ⊆ E′(Ω) ⊆ D′(Ω) and D(Ω) ⊆ E(Ω),

and also E(Ω) ⊆ D′(Ω), but no other containments are valid.


The theory of distributions can be extended to smooth surfaces in ℝⁿ and,
more generally, smooth manifolds. To keep the discussion on a concrete level, I
will consider the case of the sphere S^{n−1} ⊆ ℝⁿ given by the equation |x| = 1.
Note that S¹ ⊆ ℝ² is just the unit circle in the plane, and S² ⊆ ℝ³ is the usual
unit sphere in space. What we would like to get at is a theory that is intrinsic
to the sphere itself, rather than the way it sits in Euclidean space. However,
we will shamelessly exploit the presentation of S^{n−1} in ℝⁿ in order to simplify
and clarify the discussion.
Before beginning, I should point out that we already have a theory of
distributions connected with the sphere, namely, the distributions on ℝⁿ whose
support is contained in S^{n−1}. For example, we have already considered the
distribution

⟨T, φ⟩ = ∫_{S²} φ(x) dσ(x)

in connection with the wave equation. A simpler example is

⟨T, φ⟩ = (∂/∂x₁) φ(1, 0, …, 0).

However, this class of distributions really does depend on the embedding of
S^{n−1} in ℝⁿ, and so is extrinsic rather than intrinsic. Later we will see that it
is a larger class than the distributions D′(S^{n−1}) that we are going to define.
The idea is quite simple. In imitation of the definition of D′(ℝⁿ), we want
to define D′(S^{n−1}) to be the continuous linear functionals on a space of test
functions D(S^{n−1}). These test functions should be just the C^∞ functions on
the sphere (since the sphere is compact, we do not have to restrict the support
of the test functions in any way). One way to define a C^∞ function on S^{n−1} is
just as the restriction to the sphere of a C^∞ function on the ambient space ℝⁿ;
equivalently, φ defined on S^{n−1} is C^∞ if it can be extended to a C^∞ function

on ℝⁿ. Of course this definition appears to be extrinsic, and the extension is by
no means unique (in fact there can also be many extensions that are not C^∞).
To make matters better, it turns out that we can specify the extension. Let ψ(r)
be a C^∞ function of one variable whose graph looks like this

FIGURE 6.5

Then, given φ defined on S^{n−1}, let

Eφ(x) = ψ(|x|) φ(x/|x|)

be the extension that multiplies the angular behavior of φ by the radial behavior
of ψ (for any x ≠ 0, the point x/|x| belongs to S^{n−1}, so φ(x/|x|) is well
defined, and we set Eφ(0) = 0 since ψ(0) = 0). Then φ has a C^∞ extension
if and only if Eφ is C^∞. So we can define D(S^{n−1}) as the space of functions
φ on S^{n−1} such that Eφ ∈ D(ℝⁿ).
Still, we have not overcome the objection that this is an extrinsic definition.
To get an equivalent intrinsic definition we need to consider the idea of local
coordinates on the sphere. You are no doubt familiar with polar coordinates in
ℝ² and spherical coordinates in ℝ³:

x = r cos θ        x = r cos θ sin ϕ
y = r sin θ        y = r sin θ sin ϕ
                   z = r cos ϕ

We obtain points on the sphere by setting r = 1, which leaves us one (θ) and
two (θ, ϕ) coordinates for S¹ and S². In other words, (cos θ, sin θ) gives a point
on S¹ for any choice of θ, and (cos θ sin ϕ, sin θ sin ϕ, cos ϕ) gives a point on
the sphere for any choice of θ, ϕ (you can think of θ as the longitude and π/2 − ϕ
as the latitude). Of course we must impose some restrictions on the coordinates
if we want a unique representation of each point, and here is where the locality
of the coordinate system comes in.

o < () < 3; and -7r < () < ~, each given by an open interval in the parameter
space and each wrapping ~ of the way around the circle, with overlaps:

3n
0< ()< T

7t
-n< ()<-
2

FIGURE 6.6

Each local coordinate system describes a portion of the circle by a one-to-one


smooth mapping from the coordinate () to the circle, () --+ (cos (), sin ()). In this
case, the mapping formula is the same for both local coordinate systems, but
this is a coincidence, so we do not want to emphasize the fact.
For the sphere, we can restrict θ to 0 ≤ θ < 2π and ϕ to 0 ≤ ϕ ≤ π to get
a representation that is almost unique (when ϕ = 0 or π, at the north and south
poles, θ can assume any value: what time is it at the north pole?). But the
situation here is even worse, because the north and south poles are singularities
of the coordinate system (not of the sphere!). If we consider the local coordinate
system given by spherical coordinates with say 0 < θ < 3π/2 and π/6 < ϕ < 5π/6,
then we will get a smooth one-to-one parametrization of a portion of the sphere.
For a second coordinate system we could take −π < θ < π/2 and π/6 < ϕ < 5π/6,
and together they cover all of the sphere except for small circular caps around
the north and south poles. We can cover these by similar rotated spherical
coordinate systems (just interchange y and z, for example) centered about east
and west poles. The point is that there are many such coordinate systems, and
we can think of them as giving an atlas of local "maps" of portions of the
sphere. These local coordinate systems are intrinsic to the sphere, and they give
the sphere its structure as a two-dimensional manifold (the dimension is the
number of coordinates).
Now we can state the intrinsic definition of a C^∞ function on the sphere: it
is a function that is C^∞ as a function of the coordinates (the coordinates vary
in an open set in a Euclidean space of the same dimension as the sphere) for
every local coordinate system. In fact it is enough to verify this for a set of
local coordinate systems that covers the sphere, and then it is true for all local
coordinate systems. It turns out that this intrinsic definition is equivalent to the
extrinsic definition given before.

Once we have D(S^{n−1}) defined, we can define D′(S^{n−1}) as the continuous
linear functionals on D(S^{n−1}). Continuity requires that we define convergence
in D(S^{n−1}). From the extrinsic viewpoint, lim_{j→∞} φ_j = φ in D(S^{n−1}) just
means lim_{j→∞} Eφ_j = Eφ in D(ℝⁿ). Intrinsically, it means φ_j converges
uniformly to φ and all derivatives with respect to the coordinate variables also
converge (a minor technicality arises in that the uniformity of convergence is
only local, away from the boundary of the local coordinate system). Because
the sphere is compact we do not have to impose the requirement of a common
compact support.
Now if T ∈ D′(S^{n−1}), we can also consider it as a distribution on ℝⁿ, say
T̃, via the identity

⟨T̃, φ⟩ = ⟨T, φ|_{S^{n−1}}⟩,

where φ|_{S^{n−1}} denotes the restriction of φ ∈ D(ℝⁿ) to the sphere, and of course
φ|_{S^{n−1}} ∈ D(S^{n−1}). It comes as no surprise that T̃ has support contained in
S^{n−1}. But not every distribution supported on the sphere comes from a
distribution in D′(S^{n−1}) in this way. For example ⟨T, φ⟩ = (∂/∂x₁)φ(1, 0, …, 0)
involves a derivative in a direction perpendicular to the sphere

FIGURE 6.7

and so cannot be computed in terms of the restriction φ|_{S^{n−1}}. In fact, we have
the following structure theorem.

Theorem 6.7.1
Every distribution T ∈ D′(ℝⁿ) supported on the sphere S^{n−1} can be written

T = Σ_{k=0}^{N} (∂/∂r)^k T̃_k

where T̃_k is a distribution arising from T_k ∈ D′(S^{n−1}) by the above
identification, and

∂/∂r = Σ_{j=1}^{n} x_j ∂/∂x_j

(in general we have to write

∂/∂r = Σ_{j=1}^{n} (x_j/|x|) ∂/∂x_j,

but |x| = 1 on the sphere, and of course ∂/∂r is undefined at the origin.)

The same idea can be used to define distributions D′(Σ) for any smooth
surface Σ ⊆ ℝⁿ. If Σ is compact (e.g., an ellipsoid Σ_{j=1}^{n} A_j x_j² = 1, A_j > 0)
there is virtually no change, while if Σ is not compact (e.g., a hyperboloid
x₁² + x₂² − x₃² = 1) it is necessary to insist that test functions in D(Σ) have
compact support. (Readers who are familiar with the definition of an abstract
C^∞ manifold can probably guess how to define test functions and distributions
on a manifold.)
Here is one more interesting structure theorem involving D′(S^{n−1}) for n ≥ 2.
A function that is homogeneous of degree α (remember this means f(λx) =
λ^α f(x) for all λ > 0) can be written f(x) = |x|^α F(x/|x|) where F is a
function on the sphere. Could the same be true for homogeneous distributions?
It turns out that it is true if α > −n (or even complex α with Re α > −n).
Let T ∈ D′(S^{n−1}). We want to make sense of the symbol |x|^α T(x/|x|). If T
were a function on the sphere we could use polar coordinates to write

∫ |x|^α T(x/|x|) φ(x) dx = ∫_{0}^{∞} (∫_{S^{n−1}} T(y) φ(ry) dσ(y)) r^{α+n−1} dr

for φ ∈ D(ℝⁿ). Thus |x|^α T(x/|x|) should be given in the general case by

∫_{0}^{∞} ⟨T, φ(ry)⟩ r^{α+n−1} dr.

Theorem 6.7.2
If Re α > −n then this expression defines a distribution on ℝⁿ that is
homogeneous of degree α, for any T ∈ D′(S^{n−1}). Conversely, every distribution on
ℝⁿ that is homogeneous of degree α is of this form, for some T ∈ D′(S^{n−1}).
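As a quick check of the theorem in its simplest case (a gloss, not from the text): taking T = 1, the constant function on the sphere, the recipe reproduces the locally integrable homogeneous function |x|^α, since by polar coordinates

```latex
\langle |x|^{\alpha}, \varphi \rangle
  = \int_{\mathbb{R}^n} |x|^{\alpha}\varphi(x)\,dx
  = \int_0^{\infty}\Big(\int_{S^{n-1}} \varphi(ry)\,d\sigma(y)\Big)\,
      r^{\alpha+n-1}\,dr,
```

and the r-integral converges near r = 0 precisely when Re α + n − 1 > −1, i.e. Re α > −n, matching the hypothesis of the theorem.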

The situation when Re α ≤ −n is more complicated, because the r-integral
may be divergent. Nevertheless, there is a way of interpreting the integral that
makes the analog of the theorem true for all α except α = −n, −n − 1, …. To
see what happens at the exceptional points, let's look at α = −n. Then the
integral is

∫_{0}^{∞} ⟨T, φ(ry)⟩ dr/r

which will certainly diverge if ⟨T, φ(ry)⟩ does not tend to zero as r → 0. But
if T ∈ D′(S^{n−1}) has the property that ⟨T, 1⟩ = 0 (1 is the constant function on
S^{n−1}, which is a test function) then ⟨T, φ(ry)⟩ = ⟨T, φ(ry) − φ(0)1⟩ and since
φ(ry) − φ(0)1 → 0 as r → 0 (in the sense of D(S^{n−1}) convergence) we do
have ⟨T, φ(ry)⟩ → 0 as r → 0, and the integral converges. Thus the theorem
generalizes to say ∫_{0}^{∞} ⟨T, φ(ry)⟩ dr/r is a distribution homogeneous of degree
−n provided ⟨T, 1⟩ = 0. However, the converse says that every homogeneous
distribution of degree −n can be written as a sum of such a distribution and a
multiple of the delta distribution.
In a sense, there is a perfect balance in this result: we pick up a one-dimensional
space, the multiples of δ, which have nothing to do with the sphere; at
the same time we give up a one-dimensional space on the sphere by imposing
the condition ⟨T, 1⟩ = 0. An analogous balancing happens at α = −n − k, but
it involves spaces of higher dimension.
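For n = 1 this balance can be made completely explicit (a gloss, not from the text). Here S⁰ = {1, −1}, a distribution T on S⁰ is just a pair of numbers, ⟨T, φ⟩ = aφ(1) + bφ(−1), and the condition ⟨T, 1⟩ = 0 forces b = −a. The recipe then produces a multiple of the principal value:

```latex
\int_0^{\infty} \langle T, \varphi(ry)\rangle\,\frac{dr}{r}
  = a\int_0^{\infty} \frac{\varphi(r)-\varphi(-r)}{r}\,dr
  = a\,\Big\langle \mathrm{p.v.}\,\frac{1}{x},\,\varphi \Big\rangle,
```

so on ℝ¹ every distribution homogeneous of degree −1 is c₁ p.v.(1/x) + c₂ δ, in agreement with the count above.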

6.8 Problems
1. Show that supp φ′ ⊆ supp φ for every test function φ on ℝ¹. Give an
example where supp φ′ ≠ supp φ.

2. Show that supp((∂/∂x)^α T) ⊆ supp T for every distribution T.


3. Show that supp(fT) ⊆ supp T for every C^∞ function f.
4. Show that ∫ |f(x)| dx = 0 if and only if both ∫ |f₊(x)| dx = 0 and
∫ |f₋(x)| dx = 0.
5. Suppose T is a distribution on ℝ¹, and define a distribution S on ℝ² by
⟨S, φ⟩ = ⟨T, φ₀⟩ where φ₀(x) = φ(x, 0). Show that supp S is contained
in the x-axis. Give an example of a distribution whose support is contained
in the x-axis which is not of this form. (Hint: (∂/∂y)δ.)
6. Let f₁ and f₂ be continuous functions on ℝ². Show that there exists a
continuous function f such that

7. Let ⟨T, φ⟩ = lim_{ε→0} [∫_{−∞}^{−ε} + ∫_{ε}^{∞}] φ(x)/x dx on ℝ¹. Show that

8. Give an example of a distribution T on ℝ¹ that does not have compact
support but for which T′ does have compact support.
9. Define an "integral" IT of a distribution T on ℝ¹ as follows: ⟨IT, φ⟩ =
−⟨T, Iφ⟩ where Iφ(x) = ∫_{−∞}^{x} φ(y) dy provided ∫_{−∞}^{∞} φ(y) dy = 0 (this
condition is needed to have Iφ ∈ D), and more generally ⟨IT, φ + aφ₀⟩ =
−⟨T, Iφ⟩ where φ is as before and φ₀ is any fixed test function with
∫_{−∞}^{∞} φ₀(y) dy = 1. Show that every test function is of this form and that
IT is a well defined distribution. Show that (d/dx)IT = T. Show
that any other choice of φ₀ yields a distribution differing by a constant
from IT.
10. Let ψ_j be any sequence of nonnegative test functions such that Σ ψ_j(x)
is everywhere positive. Show that φ_j(x) = ψ_j(x)/Σ_k ψ_k(x) gives a
partition of unity.
11. Let f(x) = sin(eˣ). Show that f′(x) is a tempered distribution, even
though we do not have |f′(x)| ≤ c(1 + |x|)^N for any c and N.
12. Give the form of the general distribution whose support is the single
point y.
13. Give the form of the general distribution whose support is a pair of points
y and z.
14. Show that there exists a test function φ whose Taylor expansion can be
prescribed arbitrarily at two distinct points y and z.
15. Show that the real and imaginary parts of (x + iy)^k are harmonic
polynomials on ℝ², for any positive integer k.
16. Show that if fT = 0 for a C^∞ function f, then supp T is contained in
the zero-set of f (i.e., {x : f(x) = 0}). (Hint: the zero-set is closed.)
17. Show that the distribution

⟨T, φ⟩ = ∫_{−∞}^{−1} φ(x)/|x| dx + ∫_{1}^{∞} φ(x)/|x| dx + ∫_{−1}^{1} (φ(x) − φ(0))/|x| dx

is not positive. (Hint: choose φ with a maximum at x = 0.)
18. Show that supp f = supp f₊ ∪ supp f₋ for continuous f. Does the same
hold even if f is not continuous?
19. Show that ⟨T, φ⟩ ≥ 0 if T is a positive distribution of compact support
and φ ≥ 0 on a neighborhood of the support of T.

20. Give an example of a test function φ such that φ₊ and φ₋ are not test
functions.
21. Show that any real-valued test function φ can be written φ = φ₁ − φ₂
where φ₁ and φ₂ are nonnegative test functions. Why does this not
contradict problem 20?
22. Define the Cantor function f to be the continuous, nondecreasing function
that is zero on (−∞, 0], one on [1, ∞), and on the unit interval is constant
on each of the deleted thirds in the construction of the Cantor set, filling
in half-way between

FIGURE 6.8

(so f(x) = 1/2 on the first stage middle third [1/3, 2/3], f(x) = 1/4 and 3/4
on the second stage middle thirds, and so on). Show that the Cantor measure
is equal to the distributional derivative of the Cantor function.
23. Show that the total length of the deleted intervals in the construction of
the Cantor set is one. In this sense the total length of the Cantor set is
zero.
24. Give an example of a function φ on the line for which ‖φ‖∞ ≤ 10⁻⁶,
‖φ′‖∞ ≤ 10⁻⁶ but ‖φ″‖∞ ≥ 10⁶. (Hint: Try a sin bx for suitable
constants a and b.)
25. Let φ be a nonzero test function on ℝ¹. Does the sequence φ_n(x) =
(1/n)φ(x − n) tend to zero in the sense of D? Can you find an example of a
distribution T for which lim_{n→∞} ⟨T, φ_n⟩ ≠ 0? Can you find a tempered
distribution T with lim_{n→∞} ⟨T, φ_n⟩ ≠ 0?
26. Let T be a distribution and φ a test function. Show that

(∂/∂x_j)(T ∗ φ) = T ∗ (∂φ/∂x_j)

by bringing the derivative inside the distribution.


27. For Re α < 2, α ≠ 1, define a distribution T_α concocted from |x|^{−α} by

⟨T_α, φ⟩ = ∫_{−∞}^{−1} φ(x)|x|^{−α} dx + ∫_{1}^{∞} φ(x)|x|^{−α} dx
+ ∫_{−1}^{1} (φ(x) − φ(0))|x|^{−α} dx + 2φ(0)/(1 − α).


Show that ⟨T_α, φ⟩ = ∫_{−∞}^{∞} φ(x)|x|^{−α} dx if φ(0) = 0 or if Re α < 1.
Show that T_α is defined for all test functions and satisfies a boundedness
condition with m = 1.
28. Show that T_α is homogeneous of degree −α. Would the same be true if
we omitted the term 2φ(0)/(1 − α)?
29. Show that the family of distributions T_α is analytic as a function of α
(i.e., ⟨T_α, φ⟩ is analytic for every test function φ) in Re α < 2, α ≠ 1.
30. Show that if T is a distribution and f a C^∞ function, the definition
⟨fT, φ⟩ = ⟨T, fφ⟩ yields a distribution (in the sense that fT satisfies a
boundedness condition).
31. Let ⟨T, φ⟩ = ∫ f(x)φ(x) dx for a function f(x) satisfying |f(x)| ≤ c(1 + |x|)^N.
What kind of boundedness condition as a tempered distribution
does T satisfy?
32. Let T be a distribution of compact support and f a polynomial. Show
that T ∗ f is also a polynomial. (Hint: Polynomials can be characterized
by the condition (∂/∂x)^α f = 0 for some α.) What can you say about
the degree of the polynomial T ∗ f?
33. Show that ⟨T, φ⟩ = Σ_{n=1}^{∞} φ(n) is a tempered distribution on ℝ¹.
34. Show that every distribution in D′(Ω) can be approximated by test functions
in D(Ω) (you will have to use the fact that there exists a sequence ψ_j
of test functions in D(Ω) such that ψ_j = 1 on a set K_j with Ω = ∪K_j).
35. For Ω = (0, 1) ⊆ ℝ¹ construct ψ_j as called for in problem 34.
36. Suppose Ω₁ ⊆ Ω₂. Show that distributions in D′(Ω₂) can be restricted to
distributions in D′(Ω₁).
37. Give an example of a function in E(0, 1) that is not the restriction of any
function in E(ℝ¹).
38. Show that the product φT is well defined for T ∈ D′(Ω) and φ ∈ E(Ω).
39. Show that differentiation is well defined in D′(Ω).
40. Show that distributions on the circle can be identified with periodic
distributions on the line with period 2π.
41. Show that if T ∈ D′(S^{n−1}) then

(x_j ∂/∂x_k − x_k ∂/∂x_j) T

is a well defined distribution in D′(S^{n−1}). Show the same is true for
D′(|x| < 1).
42. Compute the dimension of the space of distributions supported at the origin
that are homogeneous of degree −n − k.
Fourier Analysis

7.1 The Riemann-Lebesgue lemma


One of the basic estimates for the Fourier transform we have used is

|f̂(ξ)| ≤ ∫ |f(x)| dx.

It shows that f̂ is bounded if ∫ |f(x)| dx is finite. But it turns out that something
more is true: f̂(ξ) goes to zero as ξ → ∞. What do we mean by this? In one
dimension it means lim_{ξ→+∞} f̂(ξ) = 0 and lim_{ξ→−∞} f̂(ξ) = 0. In more than
one dimension it means something slightly stronger than that the limit of f̂(ξ) is
zero along every curve tending to infinity.

Definition 7.1.1
We say a function g(ξ) defined on ℝⁿ vanishes at infinity, written lim_{ξ→∞} g(ξ) = 0,
if for every ε > 0 there exists N such that |g(ξ)| ≤ ε for all |ξ| ≥ N.

In other words, g is uniformly small outside a bounded set.

For a continuous function, vanishing at infinity is a stronger condition than
boundedness (if f is continuous then |f(x)| ≤ M on |x| ≤ N, and |f(x)| ≤ ε
on |x| ≥ N). The Riemann-Lebesgue lemma asserts that f̂ vanishes at infinity if
∫ |f(x)| dx is finite. Riemann proved this for integrals in the sense of Riemann,
and Lebesgue extended the result to his more general theory of integration.
Whatever theory of integration you have learned, you can interpret the result in
that theory. We will say that f is integrable if ∫ |f(x)| dx is finite.
Let me sketch two different proofs, omitting certain details that involve
integration theory. The most straightforward proof combines three ideas:

1. We already know the vanishing at infinity for f̂ if f ∈ S, since this implies
f̂ ∈ S.


2. S is dense in the integrable functions.

3. The property of vanishing at infinity is preserved under uniform limits.

The density of S (or even D) in various distribution spaces was discussed in
section 6.6. Essentially the same construction, multiplication by a cut-off function and
convolution with an approximate identity, yields the existence of a sequence of
functions f_n ∈ S such that

lim_{n→∞} ∫ |f_n(x) − f(x)| dx = 0

for any integrable function f. This is the meaning of (2) above, and we will not
go into the details because it involves integration theory. Notice that by using
our basic estimate we may conclude that f̂_n converges uniformly to f̂,

lim_{n→∞} sup_ξ |f̂_n(ξ) − f̂(ξ)| = 0.

Thus to conclude the proof of the Riemann-Lebesgue lemma we need to show:

Lemma 7.1.2
If g_n vanishes at infinity and g_n → g uniformly, then g vanishes at infinity.

The proof of this lemma is easy. Given the error ε > 0, we break it in half and
find g_n such that |g_n(ξ) − g(ξ)| ≤ ε/2 for all ξ (this is the uniform convergence).
Then for that particular g_n, using the fact that it vanishes at infinity, we can find
N such that |g_n(ξ)| ≤ ε/2 for |ξ| ≥ N. But then for |ξ| ≥ N we have

|g(ξ)| ≤ |g(ξ) − g_n(ξ)| + |g_n(ξ)| ≤ ε/2 + ε/2 = ε,

so g vanishes at infinity.
This first proof is conceptually clear, but it uses the machinery of the Schwartz
class S. Riemann did not know about S, so instead he used a clever trick. He
observed that you can make the change of variable x → x + π/ξ in the definition
of the Fourier transform (this is in one dimension, but a simple variant works
in higher dimensions) to obtain

f̂(ξ) = ∫ f(x) e^{ixξ} dx
     = ∫ f(x + π/ξ) e^{ixξ} e^{iπ} dx
     = −∫ f(x + π/ξ) e^{ixξ} dx.

So what? Well not much, until you get the even cleverer idea to average this
with the original definition to obtain

j(~) = ~ /U(X) - f(x + 'Tr/O)eixf. dx.


Now take absolute values to estimate
$$|\hat f(\xi)| \le \frac{1}{2}\int |f(x) - f(x + \pi/\xi)|\,dx.$$
What happens when $\xi \to \infty$? Clearly $\pi/\xi \to 0$, and it is not surprising that
$$\int |f(x) - f(x + \pi/\xi)|\,dx \to 0$$


(this again requires some integration theory).
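Riemann's averaging trick can be illustrated numerically. A minimal sketch in Python, using the illustrative choice $f = \chi_{[0,1]}$ (an assumption, not an example from the text): the transform is computed by a midpoint sum, compared with the closed form $(e^{i\xi}-1)/(i\xi)$, and checked against the bound $\tfrac12\int|f(x)-f(x+\pi/\xi)|\,dx = \pi/\xi$ that the averaged formula gives for this particular $f$.

```python
import cmath

# f = chi_[0,1] (integrable, not smooth): fhat(xi) = integral_0^1 e^{i x xi} dx,
# approximated by a midpoint Riemann sum.
def fhat(xi, n=20000):
    h = 1.0 / n
    return h * sum(cmath.exp(1j * (k + 0.5) * h * xi) for k in range(n))

for xi in (10.0, 100.0, 1000.0):
    # closed form: fhat(xi) = (e^{i xi} - 1)/(i xi)
    exact = (cmath.exp(1j * xi) - 1) / (1j * xi)
    assert abs(fhat(xi) - exact) < 1e-3
    # Riemann's bound: |fhat(xi)| <= (1/2) int |f(x) - f(x + pi/xi)| dx = pi/xi here
    assert abs(fhat(xi)) <= cmath.pi / xi + 1e-3
```

The decay $|\hat f(\xi)| \le \pi/\xi$ visible here is special to this $f$; the lemma itself guarantees only vanishing at infinity, with no rate.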
The Riemann-Lebesgue lemma is also true for Fourier sine and cosine transforms, namely
$$\lim_{\xi\to\infty} \int f(x)\sin x\xi\,dx = 0$$
and
$$\lim_{\xi\to\infty} \int f(x)\cos x\xi\,dx = 0$$
if $f$ is integrable. In fact, these forms follow by considering $\hat f(\xi) \pm \hat f(-\xi)$.
As an application of the Riemann-Lebesgue lemma we prove the equipartition
of energy for solutions of the wave equation. Remember that in section 5.3 we showed that the energy
$$E = \frac{1}{2}\int \left|\frac{\partial u}{\partial t}\right|^2 dx + \frac{1}{2}\int k^2|\nabla_x u|^2\,dx$$
was a conserved quantity for solutions of the wave equation $u_{tt} - k^2\Delta_x u = 0$, where the first integral is interpreted as kinetic energy and the second as potential energy. The equipartition says that as $t \to \pm\infty$, the kinetic and potential energies each tend to be half the total energy. To see this we repeat the argument we used to prove the conservation of energy, but retain just one of the two terms, say kinetic energy. This is equal to
$$\frac{1}{2}\,\frac{1}{(2\pi)^n}\int \left|\frac{\partial}{\partial t}\mathcal{F}_x u(\xi, t)\right|^2 d\xi$$
by the Plancherel formula, and using the formula


$$\mathcal{F}_x u(\xi, t) = \hat f(\xi)\cos kt|\xi| + \hat g(\xi)\,\frac{\sin kt|\xi|}{k|\xi|}$$

for the solution to the wave equation we compute

$$\begin{aligned}
\left|\frac{\partial}{\partial t}\mathcal{F}_x u(\xi, t)\right|^2
&= k^2|\xi|^2\sin^2 kt|\xi|\,|\hat f(\xi)|^2 + \cos^2 kt|\xi|\,|\hat g(\xi)|^2 \\
&\quad - k|\xi|\sin kt|\xi|\cos kt|\xi|\left(\hat f(\xi)\overline{\hat g(\xi)} + \overline{\hat f(\xi)}\hat g(\xi)\right) \\
&= \frac{1}{2}k^2|\xi|^2|\hat f(\xi)|^2 - \frac{1}{2}k^2|\xi|^2\cos 2kt|\xi|\,|\hat f(\xi)|^2 \\
&\quad + \frac{1}{2}|\hat g(\xi)|^2 + \frac{1}{2}\cos 2kt|\xi|\,|\hat g(\xi)|^2 \\
&\quad - \frac{1}{2}k|\xi|\sin 2kt|\xi|\left(\hat f(\xi)\overline{\hat g(\xi)} + \overline{\hat f(\xi)}\hat g(\xi)\right)
\end{aligned}$$
using the familiar trigonometric identities for double angles. The point is that
when we take the $\xi$-integral we find that the kinetic energy is equal to the sum of
$$\frac{1}{4}\,\frac{1}{(2\pi)^n}\int \left(k^2|\xi|^2|\hat f(\xi)|^2 + |\hat g(\xi)|^2\right)d\xi,$$
which is exactly half the total energy, plus the terms
$$\frac{1}{4}\,\frac{1}{(2\pi)^n}\left[\int \cos 2kt|\xi|\,|\hat g(\xi)|^2\,d\xi - k^2\int \cos 2kt|\xi|\,|\xi|^2|\hat f(\xi)|^2\,d\xi - k\int \sin 2kt|\xi|\left(|\xi|\hat f(\xi)\overline{\hat g(\xi)} + |\xi|\overline{\hat f(\xi)}\hat g(\xi)\right)d\xi\right]$$
which we claim must go to zero as t -+ ±oo by the Riemann-Lebesgue lemma.
This argument is perhaps simpler to understand if we consider the case $n = 1$
first. Consider the term

$$\int \cos 2kt|\xi|\,|\hat g(\xi)|^2\,d\xi.$$
We may drop the absolute values in $|\xi|$ because cosine is even. Now the requirement that the initial energy is finite implies $\int |\hat g(\xi)|^2\,d\xi$ is finite, hence $|\hat g(\xi)|^2$ is integrable. Then $\lim_{t\to\pm\infty}\int \cos 2kt\xi\,|\hat g(\xi)|^2\,d\xi = 0$ is the Riemann-Lebesgue lemma for the Fourier cosine transform of $|\hat g(\xi)|^2$. Similarly, we must have $|\xi\hat f(\xi)|^2$ integrable, hence the vanishing of $\int \cos 2kt\xi\,|\xi\hat f(\xi)|^2\,d\xi$ as
$t \to \pm\infty$. Finally, the function $\xi\hat f(\xi)\overline{\hat g(\xi)}$ is also integrable (this follows from the Cauchy-Schwarz inequality for integrals,
$$\int |F(x)G(x)|\,dx \le \left(\int |F(x)|^2\,dx\right)^{1/2}\left(\int |G(x)|^2\,dx\right)^{1/2},$$
which is one of the standard estimates of integration theory), and so
$$\int \sin 2kt|\xi|\left(|\xi|\hat f(\xi)\overline{\hat g(\xi)}\right)d\xi = \int \sin 2kt\xi\left(\xi\hat f(\xi)\overline{\hat g(\xi)}\right)d\xi$$
vanishes as $t \to \pm\infty$.
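The one-dimensional equipartition can be watched happening numerically. A sketch in Python under assumed data not taken from the text ($\hat f(\xi) = \hat g(\xi) = e^{-\xi^2}$, $k = 1$; the common factor $\tfrac12(2\pi)^{-1}$ is dropped since it cancels in the ratio): the kinetic energy computed from $|\partial_t \mathcal{F}_x u|^2$ approaches half the conserved total as $t$ grows.

```python
import math

# Assumed spectral data fhat(xi) = ghat(xi) = e^{-xi^2} (real), wave speed k = 1.
def kinetic(t, h=1e-3, L=8.0):
    # midpoint rule for  int |d/dt F_x u(xi,t)|^2 dxi, using
    # |d/dt F_x u|^2 = xi^2 sin^2(t|xi|) fhat^2 + cos^2(t|xi|) ghat^2
    #                  - 2 |xi| sin(t|xi|) cos(t|xi|) fhat ghat
    total, n = 0.0, int(2 * L / h)
    for i in range(n):
        xi = -L + (i + 0.5) * h
        fh = math.exp(-xi * xi)
        gh = fh
        s, c = math.sin(t * abs(xi)), math.cos(t * abs(xi))
        total += (xi * xi * s * s * fh * fh + c * c * gh * gh
                  - 2.0 * abs(xi) * s * c * fh * gh) * h
    return total

def total_energy(h=1e-3, L=8.0):
    # int (xi^2 fhat^2 + ghat^2) dxi -- conserved in t
    n = int(2 * L / h)
    return sum(((-L + (i + 0.5) * h) ** 2 + 1.0)
               * math.exp(-2.0 * (-L + (i + 0.5) * h) ** 2) * h
               for i in range(n))

# As t -> infinity the oscillatory terms die out (Riemann-Lebesgue) and the
# kinetic energy tends to half the total.
assert abs(kinetic(40.0) / total_energy() - 0.5) < 1e-4
```

At $t = 0$ the ratio is not $1/2$ (here it is $\int e^{-2\xi^2}d\xi / E \approx 0.8$); only the large-$t$ limit is equipartitioned.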
The argument in $\mathbb{R}^n$ is similar. In fact, we appeal to the one-dimensional Riemann-Lebesgue lemma! The idea is to do the integration in polar coordinates, so that, for example,
$$\int_{\mathbb{R}^n} \cos 2kt|\xi|\,|\hat g(\xi)|^2\,d\xi = \int_0^\infty \cos 2ktr\left(\int_{S^{n-1}} |\hat g(r\omega)|^2\,d\omega\right) r^{n-1}\,dr.$$
The fact that $\int |\hat g(\xi)|^2\,d\xi$ is finite is equivalent to $r^{n-1}\int_{S^{n-1}} |\hat g(r\omega)|^2\,d\omega$ being integrable as a function of the single variable $r$, and then the one-dimensional Riemann-Lebesgue lemma gives
$$\lim_{t\to\pm\infty}\int_0^\infty \cos 2ktr\left(\int_{S^{n-1}} |\hat g(r\omega)|^2\,d\omega\right) r^{n-1}\,dr = 0.$$
The other terms follow by the same sort of reasoning.


The Riemann-Lebesgue lemma is a qualitative result: it tells you that $\hat f$ vanishes at infinity, but it does not specify the rate of decay. For many purposes it is desirable to have a quantitative estimate, such as $|\hat f(\xi)| \le c|\xi|^{-\alpha}$ for $|\xi| \ge 1$ for some constants $c$ and $\alpha$, or some other comparison $|\hat f(\xi)| \le ch(\xi)$ where $h$ is an explicit function that vanishes at infinity. It turns out, however, that we cannot conclude that any such estimate holds from the mere fact that $f$ is integrable; in fact, for any fixed function $h$ vanishing at infinity, it is possible to find $f$ which is continuous and has compact support, such that $|\hat f(\xi)| \le ch(\xi)$ for all $\xi$ does not hold for any constant $c$ (this is the same as saying the ratio $\hat f(\xi)/h(\xi)$ is unbounded). This negative result is rather difficult to prove, but it is easy to give examples of integrable functions that do not satisfy $|\hat f(\xi)| \le c|\xi|^{-\alpha}$ for all $|\xi| \ge 1$ for any constant $c$, where $\alpha > 0$ is specified
in advance. Recall that we showed the Fourier transform of the distribution $|x|^{-\beta}$ was equal to a multiple of $|\xi|^{\beta-n}$, for $0 < \beta < n$ (the constant depends on $\beta$, but for this argument it is immaterial). Now $|x|^{-\beta}$ is never integrable, because we need $\beta < n$ for the finiteness of the integral near zero and $\beta > n$ for the finiteness of the integral near infinity. Since it is the singularity near zero that interests us, we form the function $f(x) = |x|^{-\beta}e^{-|x|^2/2}$. Multiplying by the Gaussian factor makes the integral finite at infinity, regardless of $\beta$. Thus our function is integrable as long as $\beta < n$. Now the Fourier transform is a multiple of the convolution of $|\xi|^{\beta-n}$ and $e^{-|\xi|^2/2}$. Since these functions are
positive, it is easy to show that their convolution decays exactly at the rate $|\xi|^{\beta-n}$. In fact we have the estimate
$$\hat f(\xi) = c\int |\xi - \eta|^{\beta-n}e^{-|\eta|^2/2}\,d\eta \ge c\int_{|\eta|\le 1} |\xi - \eta|^{\beta-n}e^{-|\eta|^2/2}\,d\eta \ge c\,e^{-1/2}\int_{|\eta|\le 1} |\xi - \eta|^{\beta-n}\,d\eta \ge c'|\xi|^{\beta-n} \quad\text{if } |\xi| \ge 2.$$

Once we take $\beta$ to satisfy $n - \alpha < \beta < n$ we have our counterexample to the estimate
$$|\hat f(\xi)| \le c|\xi|^{-\alpha} \quad\text{for all } |\xi| \ge 1.$$
These examples (and the general negative result) show that the Riemann-Lebesgue lemma is best possible, or sharp, meaning that it cannot be improved, at least in the direction we have considered. Of course all such claims must be taken with a grain of salt, because there are always other possible directions in which a result can be improved. It is also important to realize that it is easy to put conditions on $f$ that will imply specific decay rates on the Fourier transform; however, as far as we know now, all such conditions involve some sort of smoothness on $f$, such as differentiability or Hölder or Lipschitz conditions. The point of the Riemann-Lebesgue lemma is that we are only assuming a condition on the size of the function.
Another theorem that relates the size of a function and its Fourier transform
is the Hausdorff-Young inequality, which says
$$\left(\int |\hat f(\xi)|^q\,d\xi\right)^{1/q} \le c_p\left(\int |f(x)|^p\,dx\right)^{1/p}$$
whenever $|f|^p$ is integrable, where $p$ and $q$ are fixed numbers satisfying $\frac{1}{p} + \frac{1}{q} = 1$, and also $1 < p \le 2$. In fact it is even known what the best constant $c_p$ is (this is Beckner's inequality, and you compute the rather complicated formula for $c_p$ by taking $f(x)$ to be a Gaussian and making the inequality an equality). Note that a special case of the Hausdorff-Young inequality, $p = q = 2$, is a consequence of the Plancherel formula; in fact in this case we have an equality rather than an inequality. Also, the estimate $|\hat f(\xi)| \le \int |f(x)|\,dx$ is the limiting case of the Hausdorff-Young inequality as $p \to 1$. The proof of the Hausdorff-Young inequality is obtained from these two endpoint results by a method known as "interpolation," which is beyond the scope of this book.

7.2 Paley-Wiener theorems


We have seen that there is a relationship between decay at infinity for a function
(or tempered distribution) and smoothness of the Fourier transform. We may
think of compact support as the ultimate in decay at infinity, and analyticity as
the ultimate in smoothness. Then it should come as no surprise that a function
(or distribution) of compact support has a Fourier transform that is analytic. We
use the term "Paley-Wiener theorem" to refer to any result that characterizes
support information about f in terms of analyticity conditions on j (note that
some Paley-Wiener theorems concern supports that are not compact). Generally
speaking, it is rather easy to pass from support information about f to analyticity
conditions on j, but it is rather tricky to go in the reverse direction.
You can get the general idea rather quickly by considering the simplest example. Let $f$ be a function on $\mathbb{R}$, and let's ask if it is possible to extend the definition of $\hat f$ from $\mathbb{R}$ to $\mathbb{C}$ so as to obtain a complex analytic function. The obvious idea is to replace the real variable $\xi$ with the complex variable $\zeta = \xi + i\eta$ and substitute this in the definition:

$$\hat f(\xi + i\eta) = \int_{-\infty}^{\infty} e^{ix(\xi+i\eta)}f(x)\,dx = \int_{-\infty}^{\infty} e^{ix\xi}e^{-x\eta}f(x)\,dx.$$
We see that this is just the ordinary Fourier transform of the function $e^{-x\eta}f(x)$.
The trouble is, if we are not careful, this may not be well defined because $e^{-x\eta}$ grows too rapidly at infinity. But there is no problem if $f$ has compact support, for $e^{-x\eta}$ is bounded on this compact set. Then it is easy to see that $\hat f(\zeta)$ is analytic, either by verifying the Cauchy-Riemann equations or by computing the complex derivative
$$\frac{d}{d\zeta}\hat f(\zeta) = \int_{-\infty}^{\infty} ix\,e^{ix\zeta}f(x)\,dx = \mathcal{F}_x\bigl(ix\,e^{-x\eta}f(x)\bigr)(\xi).$$
In fact, the same reasoning shows that if $f$ is a distribution of compact support, we can define $\hat f(\zeta) = \langle f, e^{ix\zeta}\rangle$ and this is analytic with derivative $(d/d\zeta)\hat f(\zeta) = \langle f, ix\,e^{ix\zeta}\rangle$.
What kind of analytic functions do we obtain? Observe that $\hat f(\zeta)$ is an entire analytic function (defined and analytic on the entire complex plane). But not every entire analytic function can be obtained in this way, because $\hat f(\zeta)$ satisfies some growth conditions. Say, to be more precise, that $f$ is bounded and supported on $|x| \le A$. Then
$$|\hat f(\zeta)| = \left|\int_{-\infty}^{\infty} f(x)e^{ix\xi}e^{-x\eta}\,dx\right| \le e^{A|\eta|}\int_{-\infty}^{\infty} |f(x)|\,dx = ce^{A|\eta|}$$
because $e^{-x\eta}$ achieves its maximum value $e^{A|\eta|}$ at one of the endpoints $x = \pm A$. Thus the growth of $|\hat f(\zeta)|$ is at most exponential in the imaginary part of $\zeta$. Such functions are called entire functions of exponential type (or more precisely exponential type $A$). Note that it is not really necessary to assume that $f$ is bounded; we could get away with the weaker assumption that $f$ is integrable.
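For a concrete instance of exponential type, one can test the bound numerically. A sketch in Python with the assumed example $f = \chi_{[-1,1]}$ (not from the text), whose transform extends to the entire function $\hat f(\zeta) = 2\sin\zeta/\zeta$ of exponential type $A = 1$:

```python
import math, cmath

# f = chi_[-1,1]:  fhat(zeta) = int_{-1}^{1} e^{i x zeta} dx = 2 sin(zeta)/zeta
def fhat(zeta):
    return (cmath.exp(1j * zeta) - cmath.exp(-1j * zeta)) / (1j * zeta)

C = 2.0  # int |f(x)| dx
for xi in (-5.0, 0.5, 3.0):
    for eta in (-4.0, -1.0, 2.0, 6.0):
        zeta = complex(xi, eta)
        # exponential type A = 1:  |fhat(zeta)| <= e^{A|eta|} * int |f| dx
        assert abs(fhat(zeta)) <= C * math.exp(abs(eta)) + 1e-9
```

The bound is saturated in order of growth: along $\xi = 0$, $|\hat f(i\eta)| = 2\sinh\eta/\eta$ really does grow like $e^{|\eta|}$.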
We can now state two Paley-Wiener theorems.

Theorem 7.2.1
P.W. 1: Let $f$ be supported in $[-A, A]$ and satisfy $\int_{-A}^{A} |f(x)|^2\,dx < \infty$. Then $\hat f(\zeta)$ is an entire function of exponential type $A$, and $\int_{-\infty}^{\infty} |\hat f(\xi)|^2\,d\xi < \infty$. Conversely, if $F(\zeta)$ is an entire function of exponential type $A$, and $\int_{-\infty}^{\infty} |F(\xi)|^2\,d\xi < \infty$, then $F = \hat f$ for some such function $f$.

Theorem 7.2.2
P.W. 2: Let $f$ be a $C^\infty$ function supported in $[-A, A]$. Then $\hat f(\zeta)$ is an entire function of exponential type $A$, and $\hat f(\xi)$ is rapidly decreasing, i.e., $|\hat f(\xi)| \le c_N(1 + |\xi|)^{-N}$ for all $N$. Conversely, if $F(\zeta)$ is an entire function of exponential type $A$, and $F(\xi)$ is rapidly decreasing, then $F = \hat f$ for some such function $f$.

One thing to keep in mind with these and other Paley-Wiener theorems is
that since you are dealing with analytic functions, some estimates imply other
estimates. Therefore it is possible to characterize the Fourier transforms in
several different, but ultimately equivalent, ways.
The argument we gave for $\hat f(\zeta)$ to be entire of exponential type $A$ is valid under the hypotheses of P.W. 1 because we have
$$\int_{-A}^{A} |f(x)|\,dx \le \left(\int_{-A}^{A} |f(x)|^2\,dx\right)^{1/2}\left(\int_{-A}^{A} 1\,dx\right)^{1/2} = (2A)^{1/2}\left(\int_{-A}^{A} |f(x)|^2\,dx\right)^{1/2} < \infty$$
by the Cauchy-Schwarz inequality. Of course $\int_{-\infty}^{\infty} |\hat f(\xi)|^2\,d\xi < \infty$ by the Plancherel formula, so the first part of P.W. 1 is proved. Similarly, we have proved the first part of P.W. 2, since $f \in \mathcal{D} \subset \mathcal{S}$, so $\hat f \in \mathcal{S}$ is rapidly decreasing.
The converse statements are more difficult, and we will only give a hint of the
proof. In both cases we know by the Fourier inversion formula that there exists $f(x)$, namely
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\xi)e^{-ix\xi}\,d\xi,$$
such that $\hat f(\xi) = F(\xi)$. The hard step is to show that $f$ is supported in $[-A, A]$.
In other words, if $|x| > A$ we need to show $f(x) = 0$. To do this we argue that it is permissible to make the change of variable $\xi \to \xi + i\eta$ in the Fourier inversion formula, for any fixed $\eta$. Of course this requires a contour integration argument. The result is
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\xi + i\eta)e^{-ix(\xi+i\eta)}\,d\xi = \frac{e^{x\eta}}{2\pi}\int_{-\infty}^{\infty} F(\xi + i\eta)e^{-ix\xi}\,d\xi$$
for any fixed $\eta$. Then we let $\eta$ go to infinity; specifically, if $x > A$, we let $\eta \to -\infty$ and if $x < -A$ we let $\eta \to +\infty$, so that $e^{x\eta}$ goes to zero. The fact that $F$ is of exponential type $A$ means $F(\xi + i\eta)$ grows at worst like $e^{A|\eta|}$, but $e^{x\eta}$ goes to zero faster, so
$$e^{x\eta}e^{A|\eta|} \to 0,$$
and hence $f(x) = 0$ as required. This is of course an oversimplification of the argument, because we also have the $\xi$ integral to contend with.
In describing the Paley-Wiener theorem characterizing Fourier transforms of
distributions of compact support, we need a growth estimate that is a little more
complicated than exponential type, because these Fourier transforms may have
polynomial growth in the real variable; for example, they may be polynomials.

Theorem 7.2.3
P.W. 3: Let $f$ be a distribution supported in $[-A, A]$. Then $\hat f$ is an entire function satisfying a growth estimate
$$|\hat f(\zeta)| \le c(1 + |\zeta|)^N e^{A|\eta|}$$
for some $c$ and $N$. Conversely, if $F$ is an entire function satisfying such a growth estimate then it is the Fourier transform of a distribution with support in $[-A, A]$.

As an example, take $f = \delta'(x - A)$. Then we know $\hat f(\xi) = -i\xi e^{iA\xi}$, hence $\hat f(\zeta) = -i\zeta e^{iA\zeta}$ (since it is analytic) and this satisfies the growth estimate with $N = 1$.
The argument for P.W. 3 is a bit more complicated than before. Suppose we choose a cut-off function $\psi$ that is identically one on a neighborhood of $[-A, A]$ and is supported in $[-A - \epsilon, A + \epsilon]$. Then $\hat f(\zeta) = \langle f, \psi(x)e^{ix\zeta}\rangle$ and we can apply the boundedness estimate discussed in section 6.5 to obtain
$$|\hat f(\zeta)| \le c\sum_{|\alpha| \le N}\,\sup_x\left|\left(\frac{d}{dx}\right)^{\alpha}\bigl(\psi(x)e^{ix\zeta}\bigr)\right|.$$
Since differentiation of $e^{ix\zeta}$ produces powers of $\zeta$ up to $N$, and $|e^{ix\zeta}| = e^{-x\eta}$ is bounded by $e^{(A+\epsilon)|\eta|}$ on the support of $\psi$, we obtain the estimate
$$|\hat f(\zeta)| \le c(1 + |\zeta|)^N e^{(A+\epsilon)|\eta|},$$
which is not quite what we wanted, because of the $\epsilon$. (We cannot eliminate $\epsilon$ by letting it tend to zero because the constant $c$ depends on $\epsilon$ and may blow up in the process.) The trick for eliminating the $\epsilon$ is to make $\epsilon$ and $\psi$ depend on $\eta$, so that the product $\epsilon|\eta|$ remains constant. The argument for the converse involves convolving with an approximate identity and using P.W. 2.
All three theorems extend in a natural way to $n$ dimensions. Perhaps the only new idea is that we must explain what is meant by an analytic function of several complex variables. It turns out that there are many equivalent definitions, perhaps the simplest being that the function is continuous and analytic in each of the $n$ complex variables separately. If $f$ is a function on $\mathbb{R}^n$ supported in $|x| \le A$, then the Fourier transform extends to $\mathbb{C}^n$,
$$\hat f(\zeta) = \int e^{ix\cdot\zeta}f(x)\,dx,$$
as an entire analytic function, and again it has exponential type $A$, meaning
$$|\hat f(\zeta)| \le ce^{A|\eta|},$$
since $e^{-x\cdot\eta}$ has maximum value $e^{A|\eta|}$ on $|x| \le A$ (take $x = -A\eta/|\eta|$). Then P.W. 1, 2, 3 are true as stated with the ball $|x| \le A$ replacing the interval $[-A, A]$.
For a different kind of Paley-Wiener theorem, which does not involve compact support, let's return to one dimension and consider the factor $e^{-x\eta}$ in the formula for $\hat f(\zeta)$. Note that this either blows up or decays, depending on the sign of $x\eta$. In particular, if $\eta \ge 0$, then $e^{-x\eta}$ will be bounded for $x \ge 0$. So if $f$ is supported on $[0, \infty)$, then $\hat f(\zeta)$ will be well defined, and analytic, on the upper half-space $\eta > 0$ (similarly for $f$ supported on $(-\infty, 0]$ and the lower half-space $\eta < 0$).

Theorem 7.2.4
P.W. 4: Let $f$ be supported on $[0, \infty)$ with $\int_0^\infty |f(x)|^2\,dx < \infty$. Then $\hat f(\zeta)$ is an analytic function on $\eta > 0$ and satisfies the growth estimate
$$\sup_{\eta > 0}\int_{-\infty}^{\infty} |\hat f(\xi + i\eta)|^2\,d\xi < \infty.$$
Furthermore, the usual Fourier transform $\hat f(\xi)$ is the limit of $\hat f(\xi + i\eta)$ as $\eta \to 0^+$ in the following sense:
$$\lim_{\eta\to 0^+}\int_{-\infty}^{\infty} |\hat f(\xi + i\eta) - \hat f(\xi)|^2\,d\xi = 0.$$
Conversely, suppose $F(\zeta)$ is analytic in $\eta > 0$ and satisfies the growth estimate
$$\sup_{\eta > 0}\int_{-\infty}^{\infty} |F(\xi + i\eta)|^2\,d\xi < \infty.$$
Then $F = \hat f$ for such a function $f$.

As an application of the Paley-Wiener theorems (specifically P.W. 3), let's show that the wave equation in $n$ dimensions has the same maximum speed of propagation that we observed in section 5.3 for $n = 1, 2, 3$. This amounts to showing that the distributions
$$\mathcal{F}^{-1}(\cos kt|\xi|) \quad\text{and}\quad \mathcal{F}^{-1}\left(\frac{\sin kt|\xi|}{k|\xi|}\right)$$
are supported in $|x| \le k|t|$. To apply P.W. 3 we need first to understand how the functions $\cos kt|\xi|$ and $\sin kt|\xi|/k|\xi|$ may be extended to entire analytic functions. Note that it will not do to replace $|\xi|$ by $|\zeta|$, since $|\zeta|$ is not analytic.
The key observation is that since both $\cos z$ and $\sin z/z$ are even functions of $z$, we can take $\cos\sqrt{z}$ and $\sin\sqrt{z}/\sqrt{z}$ and these will be entire analytic functions of $z$, even though $\sqrt{z}$ is not entire (it is not single-valued). Then the desired analytic functions are
$$\cos kt(\zeta\cdot\zeta)^{1/2} \quad\text{and}\quad \frac{\sin kt(\zeta\cdot\zeta)^{1/2}}{k(\zeta\cdot\zeta)^{1/2}}.$$
Indeed they are entire analytic, being the composition of analytic functions ($\zeta\cdot\zeta$ is clearly analytic), and they assume the correct values when $\zeta$ is real.
To apply P.W. 3 we need to estimate the size of these analytic functions; in particular we will show $|F(\zeta)| \le ce^{k|t||\eta|}$ for each of them. The starting point is the observation
$$|\cos(u + iv)| \le e^{|v|}$$
and similarly
$$\left|\frac{\sin(u + iv)}{u + iv}\right| \le e^{|v|}$$
(this requires a separate argument for $|u + iv| \le 1$ and $|u + iv| \ge 1$). To apply these estimates to our functions we need to write $(\zeta\cdot\zeta)^{1/2} = u + iv$ (this does not determine $u$ and $v$ uniquely, since we can multiply both by $-1$, but it does determine $|v|$ uniquely), and then we have $|F(\zeta)| \le ce^{|ktv|}$ for both functions.
To complete the argument we have to show $|v| \le |\eta|$. This follows by algebra, but the argument is tricky. We have $\zeta\cdot\zeta = (u + iv)^2$, which means
$$u^2 - v^2 = |\xi|^2 - |\eta|^2, \qquad uv = \xi\cdot\eta$$
by equating real and imaginary parts, and of course $|\xi\cdot\eta| \le |\xi|\,|\eta|$, so we can replace the second equation by the inequality
$$u^2v^2 \le |\xi|^2|\eta|^2.$$
Using the first equation to eliminate $u^2$, we obtain the quadratic inequality
$$v^4 + (|\xi|^2 - |\eta|^2)v^2 - |\xi|^2|\eta|^2 \le 0.$$
Completing the square transforms this into
$$\left(v^2 + \frac{|\xi|^2 - |\eta|^2}{2}\right)^2 \le \left(\frac{|\xi|^2 + |\eta|^2}{2}\right)^2,$$
and taking the square root yields $v^2 \le |\eta|^2$ as desired.


So P.W. 3 says that $\mathcal{F}^{-1}(\cos kt|\xi|)$ and $\mathcal{F}^{-1}(\sin kt|\xi|/k|\xi|)$ are distributions supported in $|x| \le k|t|$, confirming our calculations for $n = 1, 2, 3$. In fact, in all cases these distributions have been calculated explicitly, but the same argument works for more general hyperbolic differential equations.
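The algebraic estimate $|v| \le |\eta|$ at the heart of this argument is easy to stress-test numerically. A sketch in Python (random sample points in $\mathbb{C}^2$ are an illustrative assumption): for $\zeta = \xi + i\eta$, write $(\zeta\cdot\zeta)^{1/2} = u + iv$ and check $|v| \le |\eta|$.

```python
import cmath, random

random.seed(0)
# For zeta = xi + i*eta in C^2, check |Im (zeta . zeta)^{1/2}| <= |eta|.
for _ in range(1000):
    xi  = [random.uniform(-5, 5), random.uniform(-5, 5)]
    eta = [random.uniform(-5, 5), random.uniform(-5, 5)]
    dot = sum(complex(a, b) ** 2 for a, b in zip(xi, eta))  # zeta.zeta = zeta1^2 + zeta2^2
    v = cmath.sqrt(dot).imag   # one choice of square root; the other branch gives -v
    assert abs(v) <= (eta[0] ** 2 + eta[1] ** 2) ** 0.5 + 1e-9
```

The choice of branch for the square root is immaterial, exactly as in the text: only $|v|$ is determined, and that is what the inequality controls.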

7.3 The Poisson summation formula


We have computed the Fourier transform of $\delta$ to be the function that is identically 1. We can also compute the Fourier transform of any translate of $\delta$,
$$\mathcal{F}(\delta(x - y))(\xi) = e^{iy\cdot\xi}.$$
For simplicity let's assume $n = 1$. Let $y$ vary over the integers (call it $k$) and sum:
$$\mathcal{F}\left(\sum_{k=-\infty}^{\infty}\delta(x - k)\right)(\xi) = \sum_{k=-\infty}^{\infty} e^{ik\xi}.$$

The sum on the left (before taking the Fourier transform) defines a legitimate
tempered distribution, which we can think of as an infinite "comb" of evenly
spaced point masses. The sum on the right looks like a complicated function.
In fact, it isn't. What it turns out to be is almost exactly the same as the sum
on the left! This incredible fact is the Poisson summation formula.
Of course the sum $\sum_{k=-\infty}^{\infty} e^{ik\xi}$ does not exist in the usual sense, so we will not discover the Poisson summation formula by a direct attack. We will take a more circular route, starting with the idea of periodization. Given a function $f$ on the line, we can create a periodic function (of period $2\pi$) by the recipe
$$Pf(x) = \sum_{k=-\infty}^{\infty} f(x - 2\pi k),$$
assuming $f$ has compact support or decays rapidly enough at infinity.


The key question to ask is the following: what is the relationship between the coefficients of the Fourier series expansion of $Pf$, $Pf(x) = \sum c_k e^{ikx}$, and the Fourier transform of $f$? Once you ask this question, the answer is not hard to find. We know
$$c_k = \frac{1}{2\pi}\int_0^{2\pi} Pf(x)e^{-ikx}\,dx.$$
If we substitute the definition of $Pf$, and interchange the sum and the integral (this is certainly valid if $f$ has compact support), we find
$$c_k = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\int_0^{2\pi} f(x - 2\pi j)e^{-ikx}\,dx = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\int_{-2\pi j}^{-2\pi(j-1)} f(x)e^{-ikx}\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)e^{-ikx}\,dx = \frac{1}{2\pi}\hat f(-k),$$
since $e^{-2\pi ikj} = 1$. So the Fourier series coefficients of $Pf$ are essentially obtained by restricting $\hat f$ to the comb of integer points.
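The relation $c_k = \frac{1}{2\pi}\hat f(-k)$ can be verified numerically. A sketch in Python, using the illustrative choice $f(x) = e^{-x^2/2}$ (an assumption, with $\hat f(\xi) = \sqrt{2\pi}\,e^{-\xi^2/2}$): the periodization is truncated to a few terms and its Fourier coefficients computed by a periodic midpoint rule.

```python
import math, cmath

# Periodize f(x) = e^{-x^2/2}; f is even, so fhat(-k) = fhat(k) = sqrt(2 pi) e^{-k^2/2}.
def Pf(x):
    return sum(math.exp(-(x - 2 * math.pi * j) ** 2 / 2) for j in range(-6, 7))

def coeff(k, n=4000):
    # c_k = (1/2pi) int_0^{2pi} Pf(x) e^{-ikx} dx, midpoint rule
    h = 2 * math.pi / n
    return sum(Pf((m + 0.5) * h) * cmath.exp(-1j * k * (m + 0.5) * h)
               for m in range(n)) * h / (2 * math.pi)

for k in (0, 1, 3):
    exact = math.sqrt(2 * math.pi) * math.exp(-k ** 2 / 2) / (2 * math.pi)
    assert abs(coeff(k) - exact) < 1e-6
```

The truncation $|j| \le 6$ is harmless because the Gaussian tail beyond $|x| \approx 12\pi$ is far below machine precision.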
The Poisson summation formula now emerges if we compute $Pf(0)$ in two ways, from the definition, and by summing the Fourier series:
$$\sum_{k=-\infty}^{\infty} f(2\pi k) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\hat f(k).$$
The left side is just $\left\langle\sum_{k=-\infty}^{\infty}\delta(x - 2\pi k), f\right\rangle$ and the right side is $\frac{1}{2\pi}\left\langle\sum_{k=-\infty}^{\infty}\delta(x - k), \hat f\right\rangle$ for $f \in \mathcal{S}$, so we have
$$\mathcal{F}\left(\sum_{k=-\infty}^{\infty}\delta(x - k)\right) = 2\pi\sum_{k=-\infty}^{\infty}\delta(\xi - 2\pi k).$$

In $n$ dimensions the argument is similar. Let $\mathbb{Z}^n$ denote the lattice of points $(k_1, \ldots, k_n)$ with integer coordinates. Then we have
$$\mathcal{F}\left(\sum_{k\in\mathbb{Z}^n}\delta(x - k)\right) = (2\pi)^n\sum_{k\in\mathbb{Z}^n}\delta(\xi - 2\pi k)$$
and
$$\sum_{k\in\mathbb{Z}^n} f(2\pi k) = \frac{1}{(2\pi)^n}\sum_{k\in\mathbb{Z}^n}\hat f(k).$$

For which functions $f$ does this second form hold? (Of course this is the form that Poisson discovered more than a century before distribution theory.) Certainly for $f \in \mathcal{S}$, but for many applications it is necessary to apply it to more general functions, and this usually requires a limiting argument. Unfortunately, it is not universally valid; there are examples of functions $f$ for which both series converge absolutely, but to different numbers. A typical number theory application involves taking $\hat f$ to be the characteristic function of the ball $|\xi| \le R$. Then the right side gives the number of lattice points inside the ball or, equivalently, the number of representations of all integers $\le R^2$ as the sum of $n$ squares. This number is, to first-order approximation, the volume of the ball. By using the Poisson summation formula, it is possible to find estimates for the difference.
The Poisson summation formula yields all sorts of fantastic results simply by substituting functions $f$ whose Fourier transform can be computed explicitly. For example, taking $f$ to be the Gaussian $f(x) = e^{-tx^2/2}$, with $\hat f(\xi) = \sqrt{2\pi/t}\,e^{-\xi^2/2t}$, gives
$$\sum_{k=-\infty}^{\infty} e^{-2\pi^2 tk^2} = \frac{1}{\sqrt{2\pi t}}\sum_{k=-\infty}^{\infty} e^{-k^2/2t}.$$
This is an important identity in the theory of $\theta$-functions.
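A quick numerical check of the Poisson summation formula with a Gaussian. A sketch in Python (the normalization $f(x) = e^{-x^2/2}$, $\hat f(\xi) = \sqrt{2\pi}\,e^{-\xi^2/2}$ is an illustrative choice): both sides of $\sum_k f(2\pi k) = \frac{1}{2\pi}\sum_k \hat f(k)$ are summed over a large symmetric range.

```python
import math

# Poisson summation: sum_k f(2 pi k) = (1/2pi) sum_k fhat(k)
# for f(x) = e^{-x^2/2}, fhat(xi) = sqrt(2 pi) e^{-xi^2/2}.
lhs = sum(math.exp(-(2 * math.pi * k) ** 2 / 2) for k in range(-50, 51))
rhs = sum(math.sqrt(2 * math.pi) * math.exp(-k ** 2 / 2)
          for k in range(-50, 51)) / (2 * math.pi)
assert abs(lhs - rhs) < 1e-12
```

Both sides equal $1 + 2e^{-2\pi^2} + \cdots \approx 1.0000000054$; the two very differently shaped sums agree to machine precision, which is the charm of the identity.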


I would like to consider now a crystallographic interpretation of the Poisson summation formula. We can regard $\sum_{k\in\mathbb{Z}^n}\delta(x - k)$ as a model of a cubic crystal, each term $\delta(x - k)$ representing an atom located at the lattice point $k$. The Fourier transform is then a model of the output of an X-ray diffraction experiment. Such experiments do in fact produce the characteristic pattern of bright spots on a cubic lattice, even though the real crystal is of finite spatial extent. Since not all crystals are cubic, this point of view leads us to seek a slight generalization of the Poisson summation formula: we want to compute the Fourier transform of $\sum_{\gamma\in\Gamma}\delta(x - \gamma)$ where $\Gamma$ is an arbitrary lattice in $\mathbb{R}^n$. By definition, a lattice is a set represented by $\gamma = k_1v_1 + \cdots + k_nv_n$ where the $k_j$ are integers and $v_1, \ldots, v_n$ are linearly independent vectors in $\mathbb{R}^n$. From this description we see that $\Gamma = A\mathbb{Z}^n$ where $A$ is an invertible $n \times n$ matrix. (A small point that may cause confusion is that the vectors $v_1, \ldots, v_n$ are not unique, nor is the matrix $A$. For example, the lattice in $\mathbb{R}^2$ generated by $v_1 = (0,1)$ and $v_2 = (1,1)$ is the same as $\mathbb{Z}^2$.)
Given any lattice $\Gamma$, the dual lattice $\Gamma'$ is defined to be the set of vectors $x \in \mathbb{R}^n$ such that $x\cdot\gamma = 2\pi\cdot\text{integer}$ for every $\gamma \in \Gamma$. For example, if $\Gamma = \mathbb{Z}^n$, then $\Gamma' = 2\pi\mathbb{Z}^n$. More generally, if $\Gamma = A\mathbb{Z}^n$ then $\Gamma' = 2\pi(A^{tr})^{-1}\mathbb{Z}^n$ since $(A^{tr})^{-1}x\cdot Ay = x\cdot y$. We claim that
$$\mathcal{F}\left(\sum_{\gamma\in\Gamma}\delta(x - \gamma)\right) = c_\Gamma\sum_{\gamma'\in\Gamma'}\delta(x - \gamma')$$
is the Poisson summation formula for general lattices, where the constant $c_\Gamma$ can be identified with the volume of the fundamental domain of $\Gamma'$. The idea is that we have
$$\mathcal{F}\left(\sum_{\gamma\in\Gamma}\delta(x - \gamma)\right)(\xi) = \sum_{\gamma\in\Gamma} e^{i\gamma\cdot\xi} = \sum_{k\in\mathbb{Z}^n} e^{i\xi\cdot Ak} = \sum_{k\in\mathbb{Z}^n} e^{i(A^{tr}\xi)\cdot k} = (2\pi)^n\sum_{k\in\mathbb{Z}^n}\delta(A^{tr}\xi - 2\pi k)$$
by the usual Poisson summation formula. But $\delta(A^{tr}\xi - 2\pi k) = |\det A|^{-1}\delta(\xi - 2\pi(A^{tr})^{-1}k)$, which proves our formula with $c_\Gamma = (2\pi)^n|\det A|^{-1}$.
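The defining property of the dual lattice is easy to sanity-check numerically. A sketch in Python for a hypothetical basis matrix $A$ (chosen only for illustration): every vector of $2\pi(A^{tr})^{-1}\mathbb{Z}^2$ pairs with every vector of $A\mathbb{Z}^2$ to give a multiple of $2\pi$.

```python
import math

# Hypothetical lattice Gamma = A.Z^2, with the columns of A as basis vectors.
A = [[2.0, 1.0],
     [0.0, 1.0]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
# (A^tr)^{-1} = (1/det) [[A11, -A10], [-A01, A00]]
At_inv = [[ A[1][1] / det, -A[1][0] / det],
          [-A[0][1] / det,  A[0][0] / det]]

# Check: x . gamma is an integer multiple of 2 pi for x in Gamma', gamma in Gamma.
for k1 in range(-3, 4):
    for k2 in range(-3, 4):
        g = (A[0][0] * k1 + A[0][1] * k2, A[1][0] * k1 + A[1][1] * k2)
        for m1 in range(-3, 4):
            for m2 in range(-3, 4):
                x = (2 * math.pi * (At_inv[0][0] * m1 + At_inv[0][1] * m2),
                     2 * math.pi * (At_inv[1][0] * m1 + At_inv[1][1] * m2))
                t = (x[0] * g[0] + x[1] * g[1]) / (2 * math.pi)
                assert abs(t - round(t)) < 1e-9
```

The check works because $x\cdot\gamma = 2\pi\,((A^{tr})^{-1}m)\cdot(Ak) = 2\pi\,m\cdot k$, exactly the identity $(A^{tr})^{-1}x\cdot Ay = x\cdot y$ used in the text.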
We can now give a brief introduction to the notion of quasicrystals. The idea is that there exist nonperiodic arrangements of atoms that produce discrete Fourier transforms. The particular arrangement we describe is obtained by strip-projection from a higher dimensional lattice. To be specific, take a lattice $\Gamma$ in $\mathbb{R}^2$ obtained from $\mathbb{Z}^2$ by rotation through an "irrational" angle (more precisely, it is the tangent of the angle that must be an irrational number). We take a strip $|y| \le b$ parallel to the $x$-axis and then project the lattice points in this strip onto the $x$-axis, as shown in the figure.

FIGURE 7.1

The result is just $\sum_{|\gamma_2|\le b}\delta(x - \gamma_1)$ where $\gamma = (\gamma_1, \gamma_2)$ varies over $\Gamma$. This
is clearly a sum of discrete "atoms," and it will not be periodic. The Fourier transform is $\sum_{|\gamma_2|\le b} e^{i\gamma_1\xi}$, and this can be given a form similar to the Poisson summation formula. We start with $\sum_\Gamma e^{i\gamma_1\xi}e^{i\gamma_2\eta} = (2\pi)^2\sum_{\Gamma'}\delta(\xi - \gamma_1', \eta - \gamma_2')$, multiply both sides by a function $g(\eta)$, and integrate to obtain
$$\sum_{\gamma\in\Gamma}\hat g(\gamma_2)e^{i\gamma_1\xi} = (2\pi)^2\sum_{\gamma'\in\Gamma'} g(\gamma_2')\,\delta(\xi - \gamma_1').$$
Clearly we want to choose the function $g$ such that $\hat g(t) = \chi(|t| \le b)$. Since we know $\int_{-b}^{b} e^{its}\,dt = 2\sin sb/s$, it follows from the Fourier inversion formula that $g(s) = (\sin sb)/\pi s$ is the correct choice. This yields
$$\sum_{|\gamma_2|\le b} e^{i\gamma_1\xi} = 4\pi\sum_{\gamma'\in\Gamma'}\frac{\sin b\gamma_2'}{\gamma_2'}\,\delta(\xi - \gamma_1')$$

as the Fourier transform. This is a weighted sum of $\delta$-functions centered at the points $\gamma_1'$, which are the first coordinates of the dual lattice.
Despite the superficial resemblance, there are some differences between this quasi-periodic Poisson summation formula and the usual one. The main difference is that the points $\gamma_1'$ are not isolated; in fact they form a dense subset of the line. Of course the contribution from lattice points with large $\gamma_2'$ will be multiplied by the small coefficient $(\sin b\gamma_2')/\gamma_2'$, so we can imagine the Fourier
transform as a discrete set of relatively bright stars amid a Milky Way of stars too dim to perceive individually. The trouble is that this image is not quite accurate, because of the slow decay of $g(s)$. The sum defining the Fourier transform is not absolutely convergent, so our Milky Way is not infinitely bright only because of cancellation between positive and negative stars. This difficulty can be overcome by choosing a smoother cut-off function as you approach the boundary of the strip. This leads to a different choice of $g$, one with faster decay.

7.4 Probability measures and positive definite functions


A probability measure $\mu$ on $\mathbb{R}^n$ is an assignment of a probability to all reasonable subsets of $\mathbb{R}^n$ which is positive and additive: $0 \le \mu(A) \le 1$ and $\mu(A \cup B) = \mu(A) + \mu(B)$ if $A$ and $B$ are disjoint. In addition we require the conditions $\mu(\emptyset) = 0$, $\mu(\mathbb{R}^n) = 1$, where $\emptyset$ denotes the empty set, and a technical condition called countable additivity:
$$\mu\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty}\mu(A_j)$$
if the sets $A_j$ are disjoint. All the intuitive examples of probability distributions on $\mathbb{R}^n$ satisfy these conditions, although it is not always easy to verify them.
Associated to a probability measure $\mu$ is an integral $\int f\,d\mu$ or $\int f(x)\,d\mu(x)$, defined for continuous functions (and more general functions as well), which gives the "expectation" of the function with respect to the probability measure. For continuous functions with compact support, the integral may be formed in the usual way, as the limit of sums $\sum f(x_j)\mu(I_j)$ where $x_j \in I_j$ and $\{I_j\}$ is a partition of $\mathbb{R}^n$ into small boxes. Another way of thinking about it is to use a "Monte Carlo" approximation: choose points $y_1, y_2, \ldots$ at random according to the probability measure and take $\lim_{N\to\infty}\frac{1}{N}\left(f(y_1) + \cdots + f(y_N)\right)$.
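The Monte Carlo description can be made concrete in a few lines. A sketch in Python under illustrative assumptions (the uniform measure on $[0,1]$ and the test function $f(x) = x^2$, neither taken from the text): the sample average approaches the expectation $\int x^2\,d\mu = 1/3$.

```python
import random

random.seed(0)
# Monte Carlo expectation under the uniform probability measure on [0,1]:
# (1/N) (f(y_1) + ... + f(y_N))  ->  int f dmu = 1/3  for f(x) = x^2.
N = 200000
est = sum(random.random() ** 2 for _ in range(N)) / N
assert abs(est - 1 / 3) < 0.01
```

The error decreases like $N^{-1/2}$, which is slow but dimension-independent; that is the usual trade-off of the Monte Carlo viewpoint.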
A probability measure can be regarded as a distribution, namely $(\mu, \varphi) = \int\varphi\,d\mu$, and as we have seen in section 6.4, it is a positive distribution, meaning $(\mu, \varphi) \ge 0$ if $\varphi \ge 0$. The condition $\mu(\mathbb{R}^n) = 1$ is equivalent to $\int 1\,d\mu = 1$, which we can write $(\mu, 1) = 1$ by abuse of notation (the constant function 1 is not in $\mathcal{D}$, so $(\mu, 1)$ is not defined a priori). Of course this means $\lim_{k\to\infty}(\mu, \psi_k) = 1$ where $\psi_k$ is a suitable sequence of test functions approximating the constant function 1. The converse statement is also true: any positive distribution $\mu$ with $(\mu, 1) = 1$ is associated to a probability measure in this way. If $f$ is a positive, integrable function with $\int f(x)\,dx = 1$ (ordinary integration), then there is an associated probability measure, denoted $f(x)\,dx$, defined by $\mu(A) = \int_A f(x)\,dx$, whose associated integral is just $\int\varphi\,d\mu = \int\varphi(x)f(x)\,dx$.
Such measures are called absolutely continuous. Other examples are discrete measures $\mu = \sum p_j\delta(x - a_j)$, where the $\{p_j\}$ are discrete probabilities ($0 \le p_j \le 1$ and $\sum p_j = 1$) distributed on the points $a_j$ (the sum may be finite or countable), and the more exotic Cantor measures described in section 6.4.
Probability measures have Fourier transforms (the associated distributions are tempered), which are given as functions directly by the formula
$$\hat\mu(\xi) = \int e^{ix\cdot\xi}\,d\mu(x).$$
These functions are continuous (even uniformly continuous) and bounded; in fact,
$$|\hat\mu(\xi)| \le \int 1\,d\mu(x) = 1$$
and
$$\hat\mu(0) = \int 1\,d\mu(x) = 1.$$
If $\mu = f\,dx$ then $\hat\mu = \hat f$ in the usual sense. In general, $\hat\mu$ does not vanish at infinity, as for example $\hat\delta = 1$.
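These two basic properties are immediate to verify for a discrete measure. A sketch in Python (the weights and atom locations are illustrative assumptions): for $\mu = \sum p_j\delta(x - a_j)$ the transform is the explicit trigonometric sum $\hat\mu(\xi) = \sum p_j e^{ia_j\xi}$, which satisfies $\hat\mu(0) = 1$ and $|\hat\mu(\xi)| \le 1$ but does not vanish at infinity.

```python
import cmath

# Discrete measure mu = sum p_j delta(x - a_j):  muhat(xi) = sum p_j e^{i a_j xi}
p = [0.2, 0.5, 0.3]      # probabilities, sum to 1
a = [-1.0, 0.0, 2.5]     # atom locations
def mu_hat(xi):
    return sum(pj * cmath.exp(1j * aj * xi) for pj, aj in zip(p, a))

assert abs(mu_hat(0.0) - 1.0) < 1e-12
for xi in (-7.3, 0.4, 12.0, 100.0):
    assert abs(mu_hat(xi)) <= 1.0 + 1e-12
```

Since one atom sits at $a = 0$ with weight $0.3$, $|\hat\mu(\xi)|$ keeps returning near values $\ge 0.3$ arbitrarily far out, illustrating the failure of Riemann-Lebesgue decay for measures.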
It turns out that we can exactly characterize the Fourier transforms of probability measures. This characterization is called Bochner's theorem and involves the notion of positive definiteness. Before describing it, let me review the analogous concept for matrices.
Let $A_{jk}$ denote an $N \times N$ matrix with complex entries. Associated to this matrix is a quadratic form on $\mathbb{C}^N$, which we define by $(Au, u) = \sum_{j,k} A_{jk}u_j\overline{u_k}$ for $u = (u_1, \ldots, u_N)$ a vector in $\mathbb{C}^N$. We say the matrix is positive definite if the quadratic form is always nonnegative, $(Au, u) \ge 0$ for all $u$. Note that, a priori, the quadratic form is not necessarily real, but this is required when we write $(Au, u) \ge 0$. Also note that the term nonnegative definite is sometimes used, with "positive definite" being reserved for the stronger condition
$$(Au, u) > 0 \quad\text{if } u \ne 0.$$
For our purposes it is better not to insist on this strict positivity. Also, it is sometimes assumed that the matrix is Hermitian, meaning $A_{jk} = \overline{A_{kj}}$. In the situation we are considering this will always be the case, although it is not necessary to insist on it. Now we can define what is meant by a positive definite function on $\mathbb{R}^n$.

Definition 7.4.1
A continuous (complex-valued) function $F$ on $\mathbb{R}^n$ is said to be positive definite if the matrix $A_{jk} = F(x_j - x_k)$ is positive definite for any choice of points $x_1, \ldots, x_N$ in $\mathbb{R}^n$, and any $N$. Specifically, this means
$$\sum_j\sum_k F(x_j - x_k)u_j\overline{u_k} \ge 0$$
for any choice of $u \in \mathbb{C}^N$.

Now we claim that the Fourier transform of a probability measure is a positive definite function. This is very easy to see, since
$$\sum_j\sum_k\hat\mu(\xi_j - \xi_k)u_j\overline{u_k} = \sum_j\sum_k\left(\int e^{ix\cdot(\xi_j - \xi_k)}\,d\mu(x)\right)u_j\overline{u_k} = \int\left(\sum_j\sum_k e^{ix\cdot\xi_j}u_j\,\overline{e^{ix\cdot\xi_k}u_k}\right)d\mu(x) = \int\left|\sum_k e^{ix\cdot\xi_k}u_k\right|^2 d\mu(x)$$
and the integral of a nonnegative function is nonnegative (note that it might be zero if the probability measure is concentrated on the points where the function $\sum_k e^{ix\cdot\xi_k}u_k$ vanishes). Thus we have established the easy half of

Theorem 7.4.2
Bochner's Theorem: A function F is the Fourier transform of a probability
measure, if and only if
1. F is continuous.
2. F(O) = 1.
3. F is positive definite.

To prove the converse, we first need to transform the positive definite condition from a discrete to a continuous form. The idea is that, in the condition defining positive definiteness, we can think of the values $u_j$ as weights associated to the points $x_j$. Then the sum approximates a double integral. The continuous form should thus be
$$\iint F(x - y)\varphi(x)\overline{\varphi(y)}\,dx\,dy \ge 0$$
for any test function $\varphi \in \mathcal{D}$. A technical lemma, whose proof we will omit, asserts that this condition is in fact equivalent to positive definiteness (for continuous functions).
So let us start with a continuous, positive definite function $F$. The positive definite condition implies that $F$ is bounded, so that $F = \hat T$ for some tempered distribution $T$. We are going to prove that $T$ is in fact a positive distribution. In fact, a change of variable in the double integral shows that the continuous form of positive definiteness says exactly
$$(F, \tilde\varphi * \varphi) \ge 0$$
for every $\varphi \in \mathcal{D}$, where $\tilde\varphi(x) = \overline{\varphi(-x)}$, and the same holds for every $\varphi \in \mathcal{S}$ because $\mathcal{D}$ is dense in $\mathcal{S}$. Since $F = \hat T$ we have
$$(F, \tilde\varphi * \varphi) = (\hat T, \tilde\varphi * \varphi) = (T, \mathcal{F}(\tilde\varphi * \varphi)) = (T, |\hat\varphi|^2).$$
So the positive definiteness of $F$ says that $T$ is nonnegative on functions of the form $|\hat\varphi|^2$ for all $\varphi \in \mathcal{S}$, which is the same as $|\varphi|^2$ for all $\varphi \in \mathcal{S}$, since the Fourier transform is an isomorphism of $\mathcal{S}$.
We have almost proved what we claimed, because to say $T$ is a positive distribution is to say $(T, \psi) \ge 0$ for any test function $\psi$ that is nonnegative. It is merely a technical matter to pass from functions of the form $|\varphi|^2$ to general nonnegative functions. Once we know that $T$ is a positive distribution, by the results of section 6.4 it comes from a positive measure $\mu$ (not necessarily finite, however). From $F(0) = 1$ and the continuity of $F$ we conclude that $\mu(\mathbb{R}^n) = F(0) = 1$, so $\mu$ is a probability measure.
Positive definiteness is not always an easy condition to verify, but here is one example where it is possible: take $F$ to be a Gaussian. Since we know $F$ is the Fourier transform of a Gaussian, which is a positive function, we know from Bochner's theorem that $e^{-t|x|^2}$ is positive definite. We can verify this directly, first in one dimension. Then
$$\sum_j\sum_k F(x_j - x_k)\,\bar u_j u_k = \sum_j\sum_k e^{-t|x_j - x_k|^2}\,\bar u_j u_k = \sum_j\sum_k e^{-tx_j^2}\,e^{-tx_k^2}\,e^{2tx_jx_k}\,\bar u_j u_k.$$

The term $e^{2tx_jx_k}$ seems to present difficulties, so let's expand it in a power series,
$$e^{2tx_jx_k} = \sum_{m=0}^{\infty}\frac{(2t)^m}{m!}\,x_j^m x_k^m.$$
When we substitute this into our computation we can take the $m$-summation out front to obtain
$$\sum_{m=0}^{\infty}\frac{(2t)^m}{m!}\Big|\sum_k e^{-tx_k^2}x_k^m u_k\Big|^2,$$
which is clearly positive. The argument in $n$ dimensions is similar, but we have to first break up $e^{2tx_j\cdot x_k}$ into a product of $n$ component functions (the notation is awkward here, since $x_j$ and $x_k$ are vectors in $\mathbb{R}^n$) and then take the power series expansion of each exponential factor.
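The easy half of Bochner's theorem just established can be sanity-checked numerically. The sketch below is my own illustration (the points, the parameter $t$, and the tolerance are arbitrary choices, not from the text): it forms the matrix $F(x_j - x_k)$ for the Gaussian $F(x) = e^{-tx^2}$ and confirms it has no negative eigenvalues, which is exactly the discrete positive definiteness condition.

```python
import numpy as np

# Illustration (not from the text): positive definiteness of the Gaussian
# F(x) = exp(-t x^2) means the matrix M[j,k] = F(x_j - x_k) is positive
# semidefinite for any choice of points x_1, ..., x_N.
rng = np.random.default_rng(0)
t = 0.7
x = rng.normal(size=8)                            # arbitrary points x_j
M = np.exp(-t * (x[:, None] - x[None, :]) ** 2)   # M[j,k] = F(x_j - x_k)
eigs = np.linalg.eigvalsh(M)                      # M is real symmetric
print(bool(eigs.min() >= -1e-12))                 # True: no negative eigenvalues
```

Any continuous positive definite function would pass the same check; a function that fails it cannot be the Fourier transform of a probability measure.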

7.5 The Heisenberg uncertainty principle


One of the most famous and paradoxical predictions of quantum theory is the statement that it is impossible to determine simultaneously both the position and momentum of a particle with extreme accuracy. The quantitative version of this statement is the Heisenberg uncertainty principle, which may be formulated as an inequality in Fourier analysis (it can also be expressed in terms of operator theory, as we will discuss later). To describe this inequality, let us recall briefly the basic principles of quantum mechanics. A particle is described by a wave function $\psi(x)$, which is a complex-valued function on $\mathbb{R}^n$ satisfying $\int|\psi(x)|^2\,dx = 1$ (strictly speaking, the wave function is only determined up to a phase, i.e., $\psi(x)$ and $e^{i\alpha}\psi(x)$ represent the same particle). The probability measure $|\psi(x)|^2\,dx$ gives the probability of finding the particle in some region of space: $\mathrm{prob}(\text{particle lies in }A) = \int_A|\psi(x)|^2\,dx$. The mean position of the particle is then $\bar x$, given by $\bar x_j = \int x_j|\psi(x)|^2\,dx$, and the variance of the probability measure,
$$\mathrm{Var}(\psi) = \int|x - \bar x|^2|\psi(x)|^2\,dx,$$
is a good index of the amount of "spread" of the measure, hence the "uncertainty" in measuring the position of the particle. We will write $\mathrm{Var}(\psi)$ to indicate the dependence on the wave function. An important but elementary observation is that the variance can be expressed directly as
$$\mathrm{Var}(\psi) = \min_{y}\int|x - y|^2|\psi(x)|^2\,dx,$$
without first defining the mean. To see this, write $y = \bar x + z$. Then
$$\int|x - \bar x - z|^2|\psi(x)|^2\,dx = \int|x - \bar x|^2|\psi(x)|^2\,dx + |z|^2\int|\psi(x)|^2\,dx - 2\int(x - \bar x)\cdot z\,|\psi(x)|^2\,dx.$$
We claim the last integral vanishes, because $(x - \bar x)\cdot z = \sum_{j=1}^{n}(x_j - \bar x_j)z_j$ and $\int(x_j - \bar x_j)|\psi(x)|^2\,dx = 0$ by the definition of $\bar x_j$ (remember $\int|\psi(x)|^2\,dx = 1$). Thus
$$\int|x - \bar x - z|^2|\psi(x)|^2\,dx = \mathrm{Var}(\psi) + |z|^2,$$
and the minimum is clearly assumed when $z = 0$.
In quantum mechanics, observables are represented by operators (strictly speaking, Hermitian operators). If $A$ is such an operator, then $\langle A\psi, \psi\rangle = \int A\psi(x)\overline{\psi(x)}\,dx$¹ represents the expected value of the corresponding observable on the particle represented by $\psi$. In other words, this is the average value you would actually measure for the observable on a collection of identical particles described by $\psi$. If we write $\bar A = \langle A\psi, \psi\rangle$, then
$$\mathrm{Var}(A) = \langle (A - \bar A)\psi, (A - \bar A)\psi\rangle = \int|A\psi(x) - \bar A\psi(x)|^2\,dx$$
represents the variance, or uncertainty, in the measurement of $A$. For the observable "position," the operator $A$ is multiplication by $x$ (actually this is a vector of operators, since position is a vector observable). You can easily verify that $\bar A$ and $\mathrm{Var}(A)$ correspond to $\bar x$ and $\mathrm{Var}(\psi)$, which we defined before.
For the observable "momentum," the operator is a multiple of
$$i\frac{\partial}{\partial x_j}.$$
We will not be concerned here with the constants in the theory, including the famous Planck's constant. For this operator, the computation of mean and variance can be moved to the Fourier transform side (this was the way we described it in section 5.4). For the mean, we have
$$\int i\frac{\partial\psi}{\partial x_j}(x)\,\overline{\psi(x)}\,dx = \frac{1}{(2\pi)^n}\int\Big(i\frac{\partial\psi}{\partial x_j}\Big)^{\widehat{\ }}(\xi)\,\overline{\hat\psi(\xi)}\,d\xi = \frac{1}{(2\pi)^n}\int \xi_j\,|\hat\psi(\xi)|^2\,d\xi$$
by the Plancherel formula and the fact that $(\partial\psi/\partial x_j)^{\widehat{\ }}(\xi) = -i\xi_j\hat\psi(\xi)$. Note that the constant $(2\pi)^{-n}$ is easily interpreted as follows: $(2\pi)^n = \int|\hat\psi(\xi)|^2\,d\xi$, so the function $(2\pi)^{-n/2}\hat\psi(\xi)$ has the correct normalization
$$\int|(2\pi)^{-n/2}\hat\psi(\xi)|^2\,d\xi = 1,$$
and so the mean of the momentum observable for $\psi$ is the same as the mean of the position variable for $(2\pi)^{-n/2}\hat\psi$. Similarly, the variance for momentum is $\mathrm{Var}((2\pi)^{-n/2}\hat\psi)$.

¹Note that in this section we are using the complex conjugate in the inner product, contrary to previous usage.

By choosing $\psi$ very concentrated around $\bar x$, we can make $\mathrm{Var}(\psi)$ as small as we please. In other words, quantum mechanics does not preclude the existence of particles whose position is very localized. Similarly, there are also particles whose momentum is very localized ($\hat\psi$ is concentrated). What the Heisenberg uncertainty principle asserts is that we cannot achieve both localizations simultaneously for the same wave function. The reason is that the product of the variances is bounded below by $n^2/4$; in other words,
$$\mathrm{Var}(\psi)\,\mathrm{Var}\big((2\pi)^{-n/2}\hat\psi\big) \ge n^2/4.$$
Thus if we try to make $\mathrm{Var}(\psi)$ very small, we are forced to make $\mathrm{Var}((2\pi)^{-n/2}\hat\psi)$ large, and vice versa.
To prove the Heisenberg uncertainty principle, consider first the case $n = 1$. Let us write $A$ for the operator $i(d/dx)$ and $B$ for the operator of multiplication by $x$. Note that these operators do not commute. If we write $[A,B] = AB - BA$ for the commutator, then we find
$$[A,B]\psi = i\frac{d}{dx}(x\psi) - x\,i\frac{d}{dx}\psi = i\psi,$$
or simply $-i[A,B] = I$ (the identity operator). Also, both operators are Hermitian, $\langle A\psi_1, \psi_2\rangle = \langle\psi_1, A\psi_2\rangle$ by integration by parts (the $i$ factor produces a minus sign under complex conjugation, canceling the minus sign from integration by parts), and similarly
$$\langle B\psi_1, \psi_2\rangle = \langle\psi_1, B\psi_2\rangle.$$
The commutation identity thus yields
$$1 = \langle -i(AB - BA)\psi, \psi\rangle = -i\langle B\psi, A\psi\rangle + i\langle A\psi, B\psi\rangle = 2\,\mathrm{Re}\big(i\langle A\psi, B\psi\rangle\big).$$

Then the Cauchy–Schwarz inequality gives
$$1 = 2\,\mathrm{Re}\big(i\langle A\psi, B\psi\rangle\big) \le 2\,\langle A\psi, A\psi\rangle^{1/2}\langle B\psi, B\psi\rangle^{1/2},$$
or
$$\langle A\psi, A\psi\rangle\,\langle B\psi, B\psi\rangle \ge \tfrac{1}{4},$$
which is exactly the Heisenberg uncertainty principle in the special case where the means $\bar A$ and $\bar B$ are both zero, for then $\langle A\psi, A\psi\rangle = \int|A\psi|^2\,dx$ is the variance of momentum, and $\langle B\psi, B\psi\rangle = \int|B\psi|^2\,dx$ is the variance of position.
However, the general case is easily reduced to this special case. One way to see this is to observe that translating $\psi$ moves the mean position without changing the momentum calculation ($\hat\psi$ is multiplied by an exponential of absolute value one, so $|\hat\psi|^2$ is unchanged). Similarly, we may translate $\hat\psi$ (multiply $\psi$ by $e^{-iax}$) without changing the position computation. Thus applying the above argument to $e^{-iax}\psi(x-b)$ for $a = \bar A$ and $b = \bar B$ yields a wave function with both means zero, and the above argument applied to this wave function yields the desired Heisenberg uncertainty principle for $\psi$. Another way to see the same thing is to observe that the operators $A - \bar AI$ and $B - \bar BI$ satisfy the same commutation relation, so the above argument applied to these operators yields
$$\langle (A - \bar A)\psi, (A - \bar A)\psi\rangle\,\langle (B - \bar B)\psi, (B - \bar B)\psi\rangle \ge \tfrac{1}{4},$$
which is the general form of the uncertainty principle.
Now there are two important lessons to be drawn from our argument. The first is that this is really an operator theoretic result. We only used the facts that $A$ and $B$ were Hermitian operators whose commutator satisfied the identity $-i[A,B] = I$. In fact, the same argument works if we have the inequality $-i[A,B] \ge I$, which is sometimes useful in applications (this just means $\langle -i[A,B]\psi, \psi\rangle \ge \langle\psi, \psi\rangle$). Two such operators are called complementary pairs in quantum mechanics. Thus, the operator version of the uncertainty principle asserts $\mathrm{Var}(A)\cdot\mathrm{Var}(B) \ge \tfrac{1}{4}$ for complementary pairs. In fact, to get the Fourier analytic form of the uncertainty principle we needed to use the identity
$$\mathrm{Var}(A) = \mathrm{Var}\big((2\pi)^{-1/2}\hat\psi\big)$$
for the momentum operator $A = i(d/dx)$.
The second important lesson that emerges from the proof is that we can also determine exactly when the inequality in the uncertainty principle is an equality. In the special case that both operators have mean zero, the only place we used an inequality was in the estimate
$$2\,\mathrm{Re}\big(i\langle A\psi, B\psi\rangle\big) \le 2\,\langle A\psi, A\psi\rangle^{1/2}\langle B\psi, B\psi\rangle^{1/2},$$
which involved the Cauchy–Schwarz inequality. But we know that this is an equality if and only if the two functions $A\psi$ and $B\psi$ are proportional to each other (in order to have equality above we also need to know that $i\langle A\psi, B\psi\rangle$ is real and positive). But this leads to a first-order differential equation $\psi' = cx\psi$, which has exactly the Gaussians as solutions, and it is easy to check that the condition $i\langle A\psi, B\psi\rangle \ge 0$ is satisfied for any Gaussian $ce^{-tx^2}$.
Thus we conclude that there is equality in the uncertainty principle exactly in the case that $\psi$ is a Gaussian of arbitrary variance, translated arbitrarily in both space and momentum, i.e., $(2t/\pi)^{1/4}e^{iax}e^{-t|x-b|^2}$ (the constant is chosen to make $\int|\psi|^2\,dx = 1$). One could also compute the variances exactly in this case. An interesting point about these functions is that by varying the parameter $t$ we may obtain all possible values for $\mathrm{Var}(A)$ and $\mathrm{Var}(B)$ whose product equals $\tfrac{1}{4}$.
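The uncertainty inequality can be probed numerically. The sketch below is my own illustration (the grid, test functions, and tolerances are arbitrary choices, not from the text); for mean-zero wave functions the momentum variance is computed as $\int|\psi'|^2\,dx$, which equals $\mathrm{Var}((2\pi)^{-1/2}\hat\psi)$ by the Plancherel formula.

```python
import numpy as np

# Illustration (not from the text): for a wave function with both means
# zero, position variance is  ∫ x^2 |psi|^2 dx  and momentum variance
# equals  ∫ |psi'|^2 dx  (Plancherel).  Their product should be >= 1/4,
# with equality exactly for Gaussians.
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]

def uncertainty_product(psi):
    psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)   # normalize in L^2
    var_pos = np.sum(x ** 2 * np.abs(psi) ** 2) * dx
    var_mom = np.sum(np.abs(np.gradient(psi, dx)) ** 2) * dx
    return var_pos * var_mom

gaussian = np.exp(-0.7 * x ** 2)      # equality case
cusp = np.exp(-np.abs(x))             # non-Gaussian: strict inequality
print(round(uncertainty_product(gaussian), 4))   # 0.25
print(bool(uncertainty_product(cusp) > 0.25))    # True
```

Varying the width of the Gaussian changes the two variances separately, but their product stays pinned at $\tfrac{1}{4}$, in line with the equality case identified above.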
For the Heisenberg uncertainty principle in $n$ dimensions, we consider the $n$ complementary pairs $A_j = i(\partial/\partial x_j)$ and $B_j =$ multiplication by $x_j$ (it is easy to see $-i[A_j, B_j] = I$), and hence we have
$$\mathrm{Var}(A_j)\,\mathrm{Var}(B_j) \ge \tfrac{1}{4} \quad\text{for each } j,$$
and it follows that
$$\mathrm{Var}(\psi)\,\mathrm{Var}\big((2\pi)^{-n/2}\hat\psi\big) = \Big(\sum_{j=1}^{n}\mathrm{Var}(B_j)\Big)\Big(\sum_{j=1}^{n}\mathrm{Var}(A_j)\Big) \ge \frac{n^2}{4}$$
by elementary inequalities; this is the $n$-dimensional inequality. Again, equality holds exactly when $\psi$ is a Gaussian, translated in space and momentum.
Although the proof of the uncertainty principle we gave was not Fourier analytic, the interpretation in Fourier analytic terms is extremely interesting. For one thing, it gives an interpretation of the uncertainty principle in classical physics. Suppose $\psi(t)$ is a sound wave. Then $\mathrm{Var}(\psi)$ is an index of its concentration in time. We interpret $\hat\psi$ as the frequency distribution of the sound wave, so $\mathrm{Var}((2\pi)^{-1/2}\hat\psi)$ is an index of the concentration of the pitch. The uncertainty principle then says that a sound cannot be very concentrated in both time and pitch. In particular, short duration tones will have a poorly determined pitch. String players and singers take advantage of this: in very rapid passages, errors in intonation will not be noticeable.
The Fourier analytic version of the uncertainty principle also has an important interpretation in signal processing. The idea is that one would love to have a basic signal that is very localized in both time and frequency. The uncertainty principle forbids this, and moreover tells you that the Gaussian is the best compromise. This leads to the Gabor transform. Take a fixed Gaussian, say $\psi(t) = \pi^{-1/4}e^{-t^2/2}$, translate it in time and frequency and take the inner product with a general signal $f(t)$ to be analyzed:
$$G(a,b) = \int_{-\infty}^{\infty} f(t)\,e^{-iat}\,\psi(t-b)\,dt.$$
This gives a "snapshot" of the strength of the signal in the vicinity of the time $t = b$ and in the vicinity of the frequency $a$. The rough localization achieved by the Gaussian is the best we can do in this regard, according to the uncertainty principle.
We can always recover the signal $f$ from its Gabor transform, via the inversion formula
$$f(t) = \frac{1}{2\pi}\iint G(a,b)\,e^{iat}\,\psi(t-b)\,da\,db.$$
To establish this formula, say for $f \in \mathcal{S}$ (it holds more generally, in a suitable sense), we just substitute into the right side the definition of $G(a,b)$ (renaming the dummy variable $s$):
$$\frac{1}{2\pi}\iint\!\!\int f(s)\,e^{-ias}\,\psi(s-b)\,e^{iat}\,\psi(t-b)\,ds\,da\,db,$$
and using the Fourier inversion formula for the $s$ and $a$ integrals we obtain
$$\int f(t)\,\psi(t-b)^2\,db = \pi^{-1/2}\int_{-\infty}^{\infty} f(t)\,e^{-(t-b)^2}\,db.$$
Since $\int_{-\infty}^{\infty} e^{-(t-b)^2}\,db = \pi^{1/2}$ we have $f(t)$ as claimed.


The inversion formula is perhaps not so surprising because the Gabor trans-
form is highly redundant. We start with a signal that is a function of one
variable and create a transform that is a function of two variables. One of the
interesting questions in signal processing is whether we can recover the signal
by "sampling" the Gabor transform; i.e., restricting the variables a and b to
certain discrete choices.
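The inversion formula can be tested on a grid. The following sketch is my own illustration (the test signal, the grid sizes, and the truncation of the $a$-integral are arbitrary choices, not from the text); it computes $G(a,b)$ by quadrature and reconstructs the signal with the constant $1/2\pi$.

```python
import numpy as np

# Illustration (not from the text): grid check of the Gabor inversion
# formula  f(t) = (1/2π) ∬ G(a,b) e^{iat} ψ(t-b) da db
# with the window ψ(t) = π^{-1/4} e^{-t²/2}.
t = np.linspace(-8.0, 8.0, 161); dt = t[1] - t[0]
a = np.linspace(-12.0, 12.0, 241); da = a[1] - a[0]
db = dt                                            # b runs over the t grid
psi = lambda u: np.pi ** -0.25 * np.exp(-u ** 2 / 2)
f = np.exp(-t ** 2) * np.cos(2 * t)                # test signal

window = psi(t[:, None] - t[None, :])              # window[i,j] = ψ(t_i - b_j)
# G[m,j] = ∫ f(s) e^{-i a_m s} ψ(s - b_j) ds  (s runs over the t grid)
G = (np.exp(-1j * np.outer(a, t)) * f) @ window * dt
# f(t_i) ≈ (1/2π) Σ_m Σ_j G[m,j] e^{i a_m t_i} ψ(t_i - b_j) da db
rec = np.sum((np.exp(1j * np.outer(t, a)) @ G) * window, axis=1) * da * db / (2 * np.pi)
print(bool(np.max(np.abs(rec.real - f)) < 1e-3))   # True
```

The redundancy mentioned above is visible here: the one-dimensional signal is encoded by a two-dimensional array $G$, and the reconstruction survives the coarse discretization of both $a$ and $b$.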

7.6 Hermite functions


Gaussian functions have played a central role in our description of Fourier analysis, in part because the Fourier transform of a Gaussian is again a Gaussian. Specifically, if
$$f(x) = e^{-x^2/2}$$
then
$$\mathcal{F}f = \sqrt{2\pi}\,f.$$
From a certain point of view, the Gaussian is just the first of a sequence of functions, called Hermite functions, all of them satisfying an analogous condition of being an eigenfunction of the Fourier transform (recall that an eigenfunction for any operator $A$ is defined to be a nontrivial solution of $Af = \lambda f$ for a constant $\lambda$, called the eigenvalue, where nontrivial means $f$ is not identically zero). Because the Fourier inversion formula can be written $\mathcal{F}^2\varphi(x) = 2\pi\varphi(-x)$ it follows that $\mathcal{F}^4\varphi = (2\pi)^2\varphi$, so the only allowable eigenvalues must satisfy $\lambda^4 = (2\pi)^2$, hence
$$\lambda = \pm\sqrt{2\pi} \quad\text{or}\quad \pm i\sqrt{2\pi}.$$

From a certain point of view, the problem of finding eigenfunctions for the Fourier transform has a trivial solution. If we start with any function $\varphi$, then the function
$$f = \varphi + (2\pi)^{-1/2}\mathcal{F}\varphi + (2\pi)^{-1}\mathcal{F}^2\varphi + (2\pi)^{-3/2}\mathcal{F}^3\varphi$$
satisfies $\mathcal{F}f = (2\pi)^{1/2}f$, and analogous linear combinations give the other eigenvalues. You might worry that $f$ is identically zero, but we can recover $\varphi$ as a linear combination of the 4 eigenfunctions, so at least one must be nontrivial. More to the point, such an expression gives us very little information about the eigenfunctions, so it is not considered very interesting.
It is best to present the Hermite functions not as eigenfunctions of the Fourier transform, but rather as eigenfunctions of the operator $H = -(d^2/dx^2) + x^2$. This operator is known as the harmonic oscillator, and the question of its eigenfunctions and eigenvalues (spectral theory) is important in the quantum mechanical theory of this system. Here we will work in one dimension, since the $n$-dimensional theory of Hermite functions involves nothing more than taking products of Hermite functions of each of the coordinate variables.
The spectral theory of the harmonic oscillator $H$ is best understood in terms of what the physicists call creation and annihilation operators, $A^* = (d/dx) - x$ and $A = -(d/dx) - x$. It is easy to see that the creation operator $A^*$ is the adjoint operator to the annihilation operator $A$. Now we need to do some algebra involving $A^*$, $A$, and $H$. First we need
$$A^*A = H - 1 \quad\text{and}\quad AA^* = H + 1,$$
as you can easily check. Then $AH = AA^*A + A$ while $HA = A^*AA + A$, so $[A, H] = (AA^* - A^*A)A = 2A$, and similarly $A^*H = A^*A^*A + A^*$ while $HA^* = A^*AA^* + A^*$, so $[A^*, H] = -2A^*$.
So what are the possible eigenvalues $\lambda$ for $H$? Suppose $H\varphi = \lambda\varphi$. Then $\lambda$ is real because $H$ is self-adjoint:
$$\lambda\langle\varphi, \varphi\rangle = \langle H\varphi, \varphi\rangle = \langle\varphi, H\varphi\rangle = \bar\lambda\langle\varphi, \varphi\rangle,$$
hence $\lambda = \bar\lambda$. Furthermore, we claim $\lambda \ge 1$. To see this use $A^*A = H - 1$ to get
$$\lambda\langle\varphi, \varphi\rangle = \langle H\varphi, \varphi\rangle = \langle A^*A\varphi, \varphi\rangle + \langle\varphi, \varphi\rangle = \langle A\varphi, A\varphi\rangle + \langle\varphi, \varphi\rangle \ge \langle\varphi, \varphi\rangle,$$
and furthermore $\lambda = 1$ is possible only if $A\varphi = 0$. But $A\varphi = 0$ is a first-order differential equation whose unique solution (up to a constant multiple) is the Gaussian $e^{-x^2/2}$. Since $Ae^{-x^2/2} = 0$ we have
$$He^{-x^2/2} = e^{-x^2/2} + A^*Ae^{-x^2/2} = e^{-x^2/2}.$$
So we have identified 1 as the lowest eigenvalue (the bottom of the spectrum) with multiplicity one, and the Gaussian as the unique corresponding eigenfunction (the groundstate). To get a hold of the rest of the spectrum we need a little lemma that describes the action of the creation and annihilation operators on eigenfunctions:

Lemma 7.6.1
Suppose $\varphi$ is an eigenfunction of $H$ with eigenvalue $\lambda$. Then $A^*\varphi$ is an eigenfunction with eigenvalue $\lambda + 2$, and $A\varphi$ is an eigenfunction (as long as $\lambda \ne 1$) with eigenvalue $\lambda - 2$.

The proof is a simple computation involving the commutation identities we already computed. Of course we need also to observe that $A^*\varphi$ and $A\varphi$ are not identically zero, but we have already seen that $A\varphi = 0$ means $\lambda = 1$, and the solutions to $A^*\varphi = 0$ are $ce^{x^2/2}$, which are ruled out by growth conditions (we need at least $\int|\varphi(x)|^2\,dx < \infty$ for spectral theory to work). Now since $AH - HA = 2A$ we have $AH\varphi - HA\varphi = 2A\varphi$, hence
$$H(A\varphi) = AH\varphi - 2A\varphi = (\lambda - 2)A\varphi,$$
and similarly
$$A^*H\varphi - HA^*\varphi = -2A^*\varphi,$$
hence
$$H(A^*\varphi) = A^*H\varphi + 2A^*\varphi = (\lambda + 2)A^*\varphi,$$
proving the lemma.
So the creation operator boosts the eigenvalue by 2 and the annihilation operator decreases it by 2. We immediately obtain an infinite ladder of eigenfunctions with eigenvalues $1, 3, 5, \ldots$ by applying powers of the creation operator to the groundstate. We write
$$h_n = c_n(A^*)^n e^{-x^2/2},$$
where the positive constants $c_n$ are chosen so that $\int h_n(x)^2\,dx = 1$ (in fact $c_n = \pi^{-1/4}(2^nn!)^{-1/2}$, but we will not use this formula). Then $Hh_n = (2n+1)h_n$ by the lemma. We claim that there are no other eigenvalues, and each positive odd integer has multiplicity one (the space of eigenfunctions is one-dimensional, consisting of multiples of $h_n$). The reasoning goes as follows: if we start with any $\lambda$ not a positive odd integer, then by applying a high enough power of the annihilation operator we would end up with an eigenvalue less than 1 (note that we never pass through the eigenvalue 1), which contradicts our observation that 1 is the bottom of the spectrum. Similarly, the fact that the eigenspace of 1 has multiplicity one is inherited inductively for the eigenspaces of $2n+1$ because the annihilation operator is a one-to-one mapping down the ladder (it only fails to be one-to-one on the eigenspace of 1, which it does annihilate).
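The eigenvalue relation $Hh_n = (2n+1)h_n$ can be verified by finite differences. The sketch below is my own illustration (grid and tolerance are arbitrary choices); it uses the physicists' Hermite polynomials from NumPy, which produce the $h_n$ defined here up to sign and normalization, neither of which affects the eigenvalue relation.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

# Illustration (not from the text): check  H h_n = (2n+1) h_n  for the
# harmonic oscillator H = -d^2/dx^2 + x^2, realizing h_n (up to sign and
# normalization) as H_n(x) e^{-x^2/2} with the physicists' polynomials.
x = np.linspace(-10.0, 10.0, 100001)
dx = x[1] - x[0]
core = slice(20000, 80001)            # x in [-6, 6], away from grid edges
errs = []
for n in range(4):
    c = np.zeros(n + 1); c[n] = 1.0
    h = hermval(x, c) * np.exp(-x ** 2 / 2)
    Hh = -np.gradient(np.gradient(h, dx), dx) + x ** 2 * h
    errs.append(np.max(np.abs(Hh[core] - (2 * n + 1) * h[core])) / np.max(np.abs(h)))
print(bool(max(errs) < 1e-4))         # True: eigenvalue relation holds numerically
```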
The functions $h_n(x)$ are called the Hermite functions. Explicitly,
$$h_n(x) = c_n\Big(\frac{d}{dx} - x\Big)^n e^{-x^2/2},$$
and it is easy to see that $h_n(x) = c_nH_n(x)e^{-x^2/2}$ where $H_n(x)$ is a polynomial of degree $n$ (in fact $H_n$ is even or odd depending on the parity of $n$), called the Hermite polynomial. It is clear from this formula that $h_n \in \mathcal{S}$.
Now the Hermite functions form an orthonormal system (this is a general property of eigenfunctions of self-adjoint operators), namely $\langle h_n, h_m\rangle = \int h_n(x)h_m(x)\,dx = 0$ if $n \ne m$ (and $= 1$ for $n = m$). Note that since the $h_n(x)$ are real-valued functions, we can omit complex conjugates. This follows from the identity
$$(2n+1)\langle h_n, h_m\rangle = \langle Hh_n, h_m\rangle = \langle h_n, Hh_m\rangle = (2m+1)\langle h_n, h_m\rangle,$$
which is impossible for $n \ne m$ unless $\langle h_n, h_m\rangle = 0$. They are also a complete system, so any function $f$ (with $\int|f|^2\,dx < \infty$) has an expansion
$$f = \sum_{n=0}^{\infty}\langle f, h_n\rangle h_n,$$
called the Hermite expansion of $f$. The coefficients $\langle f, h_n\rangle$ satisfy the Parseval identity
$$\sum_{n=0}^{\infty}|\langle f, h_n\rangle|^2 = \int|f|^2\,dx.$$
The expansion is very well suited to the spaces $\mathcal{S}$ and $\mathcal{S}'$. We have $f \in \mathcal{S}$ if and only if the coefficients are rapidly decreasing, $|\langle f, h_n\rangle| \le c_N(1+n)^{-N}$ for all $N$. Notice that $\langle f, h_n\rangle$ is well defined for $f \in \mathcal{S}'$. In that case we have $|\langle f, h_n\rangle| \le c(1+n)^N$ for some $N$, and conversely $\sum a_nh_n$ represents a tempered distribution if the coefficients satisfy such a polynomial bound.
Now it is not hard to see that the Hermite functions are eigenfunctions for the Fourier transform. Indeed, we only have to check (from the ping-pong table) the behavior of the creation and annihilation operators on the Fourier transform side, namely
$$\mathcal{F}A^*\varphi = iA^*\mathcal{F}\varphi$$
and
$$\mathcal{F}A\varphi = -iA\mathcal{F}\varphi.$$

From the first of these we find

Fh n = cn F(A*)n e -x 2 /2
= cn(i)n(A*)n Fe- x2 /2
= i n J2;c n (A*)n e -x 2 /2 = inJ2;hn

so h n is an eigenfunction with eigenvalue in"j2;. We also observe that

FHcp = F(A* A + l)cp = (A* A + l)Fcp = HFcp

so H commutes with the Fourier transform. It is a general principle that com-


muting operators have common eigenfunctions, which explains why the spectral
theory of Hand F are so closely related.
Note that we can interpret this fact as saying that the Hermite expansion
diagonalizes the Fourier transform. If f = L anh n then F f = L inV27i anhn'
so that F is represented as a diagonal matrix if we use the basis {h n }.
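The eigenvalue property $\mathcal{F}h_n = i^n\sqrt{2\pi}\,h_n$ can be checked by direct quadrature. The sketch below is my own illustration (grids and tolerance are arbitrary choices); as before, the physicists' Hermite polynomials realize the $h_n$ up to sign and normalization, which does not affect the eigen-relation.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

# Illustration (not from the text): verify  F h_n = i^n sqrt(2π) h_n  for
# the convention (F f)(ξ) = ∫ e^{ixξ} f(x) dx, with h_n realized (up to
# sign and normalization) as H_n(x) e^{-x^2/2}.
x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]
xi = np.linspace(-3.0, 3.0, 61)
errs = []
for n in range(4):
    c = np.zeros(n + 1); c[n] = 1.0
    h = lambda u: hermval(u, c) * np.exp(-u ** 2 / 2)
    Fh = np.exp(1j * np.outer(xi, x)) @ h(x) * dx          # (F h)(ξ) by quadrature
    errs.append(np.max(np.abs(Fh - (1j) ** n * np.sqrt(2 * np.pi) * h(xi))))
print(bool(max(errs) < 1e-6))                              # True for n = 0, 1, 2, 3
```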
We also have another representation for all the eigenfunctions of the Fourier transform; for example, $\mathcal{F}f = \sqrt{2\pi}\,f$ if and only if $f$ has a Hermite expansion $f = \sum a_{4n}h_{4n}$ containing only Hermite functions of order $\equiv 0 \bmod 4$. Again, this is not a very illuminating condition. For example, the Poisson summation formula implies that the distribution $f = \sum_{k=-\infty}^{\infty}\delta(x - \sqrt{2\pi}\,k)$ satisfies $\mathcal{F}f = \sqrt{2\pi}\,f$. I do not know how to compute the Hermite expansion of this distribution.
Using further properties of Hermite functions, it is possible to find explicit solutions to various differential equations involving $H$, such as the Schrödinger equation
$$\frac{\partial}{\partial t}u = ikHu$$
(and the $n$-dimensional analog) which is important in quantum mechanics.

7.7 Radial Fourier transforms and Bessel functions


The Fourier transform of a radial function is again a radial function. We ob-
served this before indirectly, because the Fourier transform commutes with ro-
tations, and radial functions are characterized by invariance under rotation. In
this section we give a more direct explanation, including a formula for the ra-
dial Fourier transform. The formula involves a class of special functions called
Bessel functions. We begin, as usual, with the one-dimensional case, where the
result is very banal. In one dimension, a radial function is just an even function,
$f(x) = f(-x)$. The Fourier transform of an even function is also even, and is given by the Fourier cosine formula:
$$\hat f(\xi) = \int_{-\infty}^{\infty}e^{ix\xi}f(x)\,dx = \int_{-\infty}^{\infty}(\cos x\xi + i\sin x\xi)f(x)\,dx = 2\int_0^{\infty}f(x)\cos x\xi\,dx,$$
the sine term vanishing because it is the integral of an odd function.
There is a similar formula for $n = 3$, but we have to work harder to derive it. We work in a spherical coordinate system. If $f$ is radial we write $f(x) = f(|x|)$ by abuse of notation. Suppose we want to compute $\hat f(0,0,R)$. Then we take the $z$-axis as the central axis and the spherical coordinates are
$$x = r\sin\phi\cos\theta, \quad y = r\sin\phi\sin\theta, \quad z = r\cos\phi$$
with $0 \le r < \infty$, $0 \le \theta \le 2\pi$, $0 \le \phi \le \pi$, and the element of integration is $dx\,dy\,dz = r^2\sin\phi\,dr\,d\phi\,d\theta$ ($r^2\sin\phi$ is the determinant of the Jacobian matrix $\partial(x,y,z)/\partial(r,\phi,\theta)$, a computation you should do if you have not seen it before). The Fourier transform formula is then
$$\hat f(0,0,R) = \int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^{\infty} e^{iRr\cos\phi}f(r)\,r^2\sin\phi\,dr\,d\phi\,d\theta.$$

Now the $\theta$-integral produces a factor of $2\pi$ since nothing depends on $\theta$. The $\phi$-integral can also be done explicitly, since
$$\int_0^{\pi}e^{iRr\cos\phi}\sin\phi\,d\phi = -\frac{e^{iRr\cos\phi}}{iRr}\bigg|_0^{\pi} = \frac{2\sin rR}{rR}.$$
Thus we have altogether
$$\hat f(0,0,R) = 4\pi\int_0^{\infty}\frac{\sin rR}{rR}\,f(r)\,r^2\,dr.$$
Since $f$ is radial, this is the formula for $\hat f(R)$ (or, if you prefer, given any $\xi$ with $|\xi| = R$, set up a spherical coordinate system with principal axis in the direction of $\xi$, and the computation is identical).
Superficially, the formula for $n = 3$ resembles the formula for $n = 1$, with the cosine replaced by the sine. But the appearance of $rR$ in the denominator has some implications concerning the decay of Fourier transforms of radial functions. For example, suppose $f$ is radial and integrable. We can almost conclude that $\hat f(R)$ decays faster than $R^{-1}$ as $R \to \infty$. I say "almost" because we have to assume something about the behavior of $f$ near zero; it suffices to have $f$ continuous at zero, or even just bounded in a neighborhood of zero. The fact that $f$ is integrable is equivalent to the finiteness of $\int_0^{\infty}|f(r)|r^2\,dr$. This certainly implies $\int_1^{\infty}|f(r)|r\,dr < \infty$, and the assumption about the boundedness of $f$ near zero implies $\int_0^1|f(r)|r\,dr < \infty$, so
$$\int_0^{\infty}|f(r)|r\,dr < \infty.$$
Then the Riemann–Lebesgue lemma for the Fourier sine transform of the one-dimensional function $f(r)r$ implies
$$\lim_{R\to\infty}\int_0^{\infty}\sin rR\,f(r)\,r\,dr = 0.$$
Since $\hat f(R) = 4\pi R^{-1}\int_0^{\infty}\sin rR\,f(r)\,r\,dr$ we obtain the desired conclusion. Of course this is much stronger than the conclusion of the 3-dimensional Riemann–Lebesgue lemma. The moral of the story is that Fourier transforms of radial functions are unusually well behaved.
So what is the story for other dimensions? $n = 2$ will illustrate the difficulties. If we use polar coordinates
$$x = r\cos\theta, \quad y = r\sin\theta$$
for $0 \le r < \infty$, $0 \le \theta \le 2\pi$, the element of integration is $dx\,dy = r\,dr\,d\theta$, and our computation as before becomes
$$\hat f(R,0) = \int_0^{2\pi}\!\!\int_0^{\infty}e^{irR\cos\theta}f(r)\,r\,dr\,d\theta.$$
This time there is no friendly $\sin\theta$ factor, so we cannot do the $\theta$-integration in terms of elementary functions. It turns out that $\int_0^{2\pi}e^{is\cos\theta}\,d\theta$ is a new kind of special function, a Bessel function of order zero. As the result of historical accident, the exact notation is
$$J_0(s) = \frac{1}{2\pi}\int_0^{2\pi}e^{is\cos\theta}\,d\theta.$$
Thus the 2-dimensional radial Fourier transform is given by
$$\hat f(R) = 2\pi\int_0^{\infty}J_0(rR)f(r)\,r\,dr.$$
This would fit the pattern of the other two cases if we could convince ourselves that $J_0$ behaves something like a cosine or sine times a power.
Now there are whole books devoted to the properties of $J_0$ and its cousins, the other Bessel functions, and precise numerical computations are available. Here I will only give a few salient facts. First we observe that by substituting the power series for the exponential we can integrate term by term, using the elementary fact
$$\frac{1}{2\pi}\int_0^{2\pi}\cos^{2k}\theta\,d\theta = \frac{(2k)!}{2^{2k}(k!)^2}$$
to obtain
$$J_0(s) = \frac{1}{2\pi}\int_0^{2\pi}\sum_{k=0}^{\infty}\frac{(is\cos\theta)^k}{k!}\,d\theta = \sum_{k=0}^{\infty}\frac{(-1)^k s^{2k}}{(2k)!}\cdot\frac{1}{2\pi}\int_0^{2\pi}\cos^{2k}\theta\,d\theta$$
(the integrals of the odd powers are zero by cancelation). This power series converges everywhere, showing that $J_0$ is an even entire analytic function. Also $J_0(0) = 1$. Notice that these properties are shared by $\cos s$ and $\sin s/s$, which are the analogous functions for $n = 1$ and $n = 3$.
The behavior of $J_0(s)$ as $s \to \infty$ is more difficult to discern. First let's make a change of variable to reduce the integral defining $J_0$ to a Fourier transform: $t = \cos\theta$. We obtain
$$J_0(s) = \frac{1}{\pi}\int_{-1}^{1}(1-t^2)^{-1/2}e^{ist}\,dt.$$
Thus $J_0$ is the 1-dimensional Fourier transform of the function $\frac{1}{\pi}(1-t^2)_+^{-1/2}$ (the $+$ subscript means we set the function equal to zero when $1-t^2$ becomes negative, i.e., outside $|t| < 1$). This function has compact support and is integrable, even though it has singularities at $t = \pm 1$. Note that the Paley–Wiener theorems imply that $J_0$ is an entire function, as we observed already, and in fact it has exponential type 1. Also the Riemann–Lebesgue lemma implies $J_0$ vanishes at infinity, but we are seeking much more precise information. Now we claim that only the part of the integral near the singularities $t = \pm 1$ contributes to the interesting behavior of $J_0$ near infinity. If we were to multiply $(1-t^2)_+^{-1/2}$ by a cut-off function vanishing in a neighborhood of the singularities $t = \pm 1$, the result would be a function in $\mathcal{D}$, whose Fourier transform is in $\mathcal{S}$ hence decays faster than any polynomial rate. Now we can factor the function $(1-t^2)_+^{-1/2} = (1-t)_+^{-1/2}(1+t)_+^{-1/2}$. In a neighborhood of $t = +1$, the function $(1+t)_+^{-1/2}$ is $C^\infty$, and to first approximation is just the constant $2^{-1/2}$. Similarly, in a neighborhood of $t = -1$, the function $(1-t)_+^{-1/2}$ is $C^\infty$ and is approximately $2^{-1/2}$. Thus, to first approximation, we should have
$$J_0(s) \approx \frac{1}{\sqrt{2}\,\pi}\int_{-\infty}^{\infty}e^{ist}(1-t)_+^{-1/2}\,dt + \frac{1}{\sqrt{2}\,\pi}\int_{-\infty}^{\infty}e^{ist}(1+t)_+^{-1/2}\,dt = \frac{e^{is}}{\sqrt{2}\,\pi}\int_0^{\infty}e^{-isx}x^{-1/2}\,dx + \frac{e^{-is}}{\sqrt{2}\,\pi}\int_0^{\infty}e^{isx}x^{-1/2}\,dx$$
(we made the change of variable $x = 1 - t$ in the first integral and $x = 1 + t$ in the second). However, by a modification of the argument in example 5 of 4.2 we have
$$\int_0^{\infty}e^{\pm isx}x^{-1/2}\,dx = \sqrt{\frac{\pi}{s}}\,e^{\pm i\pi/4}.$$
Thus for $s > 0$ large, we have
$$J_0(s) \approx \sqrt{\frac{2}{\pi s}}\,\cos\Big(s - \frac{\pi}{4}\Big).$$
This is exactly what we were looking for: the product of a power and a sine function. Of course our approximate calculation only suggests this answer, but a more careful analysis of the error shows that this is correct; in fact the error is of order $s^{-3/2}$. In fact, this is just the first term of an asymptotic expansion (as $s \to \infty$) involving powers of $s$ and translated sines.
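Both the integral definition and the asymptotic formula lend themselves to a numerical check. The sketch below is my own illustration (the sample points and the error bound $s^{-3/2}$, matching the stated order of the error, are my choices, not from the text).

```python
import numpy as np

# Illustration (not from the text): J_0 computed from its integral
# definition versus the asymptotic term sqrt(2/(πs)) cos(s - π/4);
# the discrepancy should be of order s^{-3/2}.
theta = np.linspace(0.0, 2.0 * np.pi, 200000, endpoint=False)

def J0(s):
    # real part of (1/2π) ∫ e^{is cos θ} dθ; the imaginary part cancels
    return np.mean(np.cos(s * np.cos(theta)))

def J0_asym(s):
    return np.sqrt(2.0 / (np.pi * s)) * np.cos(s - np.pi / 4)

print(round(J0(0.0), 6))                                   # 1.0
print(bool(abs(J0(10.0) - J0_asym(10.0)) < 10.0 ** -1.5))  # True
print(bool(abs(J0(50.0) - J0_asym(50.0)) < 50.0 ** -1.5))  # True
```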
To unify the three special cases $n = 1, 2, 3$ we have considered, we need to give the definition of the Bessel function $J_\alpha$ of arbitrary (at least $\alpha > -\frac{1}{2}$) order:
$$J_\alpha(s) = \gamma_\alpha\,s^{\alpha}\int_{-1}^{1}(1-t^2)^{\alpha-1/2}e^{ist}\,dt$$
(the constant $\gamma_\alpha$ is given by $\gamma_\alpha = 1/\big(\sqrt{\pi}\,2^{\alpha}\Gamma(\alpha + \frac{1}{2})\big)$). Note that when $\alpha = \frac{1}{2}$ it is easy to evaluate
$$J_{1/2}(s) = \sqrt{\frac{2}{\pi}}\,s^{-1/2}\sin s.$$
Using integration by parts we can prove the following recursion relations for Bessel functions of different orders:
$$\frac{d}{ds}\big(s^{\alpha}J_\alpha(s)\big) = s^{\alpha}J_{\alpha-1}(s),$$
$$\frac{d}{ds}\big(s^{-\alpha}J_\alpha(s)\big) = -s^{-\alpha}J_{\alpha+1}(s).$$
Using these we can compute
$$J_{3/2}(s) = \sqrt{\frac{2}{\pi s}}\Big(\frac{\sin s}{s} - \cos s\Big),$$
and more generally, if $\alpha = k + \frac{1}{2}$, $k$ an integer, then $J_{k+\frac{1}{2}}$ is expressible as a finite sum of powers and sines and cosines. The function $s^{-\alpha}J_\alpha(s)$ is clearly an entire analytic function. Also, the asymptotic behavior of $J_\alpha(s)$ as $s \to +\infty$ is given by
$$J_\alpha(s) \approx \sqrt{\frac{2}{\pi s}}\,\cos\Big(s - \frac{\alpha\pi}{2} - \frac{\pi}{4}\Big).$$

Now each of the formulas for the radial Fourier transform for $n = 1, 2, 3$ can be written
$$\hat f(R) = c_n\int_0^{\infty}\frac{J_{\frac{n-2}{2}}(rR)}{(rR)^{\frac{n-2}{2}}}\,f(r)\,r^{n-1}\,dr,$$
and in fact the same is true for general $n$. The general form of the decay rate for radial Fourier transforms is the following: if $f$ is a radial function that is integrable, and bounded near the origin, then $\hat f(R)$ decays faster than $R^{-\frac{n-1}{2}}$. Many radial Fourier transforms can be computed explicitly, using properties of Bessel functions. For example, the Fourier transform of $(1-|x|^2)_+^{\alpha}$ in $\mathbb{R}^n$ is equal to
$$c(n,\alpha)\,\frac{J_{\frac{n}{2}+\alpha}(R)}{R^{\frac{n}{2}+\alpha}}$$
for the appropriate constants $c(n,\alpha)$.
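The general radial formula can be spot-checked in two dimensions, where the constants are explicit. The sketch below is my own illustration; the test function $e^{-|x|^2}$, whose transform is $\pi e^{-|\xi|^2/4}$ under the convention $\hat f(\xi) = \int e^{ix\cdot\xi}f(x)\,dx$, and the grids are arbitrary choices, not from the text.

```python
import numpy as np

# Illustration (not from the text): check the n = 2 radial formula
#   f̂(R) = 2π ∫_0^∞ J_0(rR) f(r) r dr
# against the known transform of f(x) = e^{-|x|^2} in R^2, π e^{-R^2/4}.
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
r = np.linspace(0.0, 10.0, 2001)
dr = r[1] - r[0]
f = np.exp(-r ** 2)
diffs = []
for R in (0.5, 1.0, 2.0):
    J0_vals = np.mean(np.cos(np.outer(r * R, np.cos(theta))), axis=1)
    radial = 2.0 * np.pi * np.sum(J0_vals * f * r) * dr
    diffs.append(abs(radial - np.pi * np.exp(-R ** 2 / 4)))
print(bool(max(diffs) < 1e-3))        # True
```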



7.8 Haar functions and wavelets


Fourier analysis shows us how to expand an arbitrary function in terms of the simple basic functions $e^{ix\cdot\xi}$. These functions are precisely determined in frequency, but have no localization in space. We have seen that the Heisenberg uncertainty principle prohibits a function from simultaneously being very localized in space and frequency. The Gabor transform and its inversion formula gives a way of generating all functions out of a single prototype, the Gaussian, which is translated in both space and frequency. The Gaussian is reasonably localized on both accounts, and we cannot do any better from that point of view. Wavelet theory is a different attempt to expand a general function in terms of simple building blocks that are reasonably localized in space and frequency. The difference is that we do not translate on the frequency side; instead we take both translates and dilates of a single wavelet.
The simplest example of such a wavelet is the Haar function, defined as
$$\psi(x) = \begin{cases} 1 & \text{if } 0 < x \le \tfrac{1}{2} \\ -1 & \text{if } \tfrac{1}{2} < x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$

FIGURE 7.2 (graph of the Haar function $\psi$: $+1$ on $(0,\tfrac12]$, $-1$ on $(\tfrac12,1]$)

The support of $\psi$ is $[0,1]$. We dilate $\psi$ by powers of 2, so $\psi(2^jx)$ is supported on $[0, 2^{-j}]$ ($j$ is an integer, not necessarily positive) and we translate the dilate by $2^{-j}$ times an integer, to obtain
$$\psi(2^jx - k).$$
This function is supported on $[2^{-j}k,\,2^{-j}(k+1)]$. It is also convenient to normalize this function by multiplying by $2^{j/2}$. The family of functions $2^{j/2}\psi(2^jx - k) = \psi_{j,k}(x)$ is an orthonormal family, meaning $\int\psi_{j,k}(x)\psi_{j',k'}(x)\,dx = 0$ unless $j = j'$ and $k = k'$, in which case the integral is one (the condition $\int\psi_{j,k}(x)^2\,dx = 1$ is satisfied exactly because of the normalization factor $2^{j/2}$ we chose). We can understand the orthogonality condition very simply. If $j = j'$ then the functions $\psi_{j,k}$ and $\psi_{j,k'}$ have disjoint support if $k \ne k'$, so their product is identically zero. If $j \ne j'$ then $\psi_{j,k}$ and $\psi_{j',k'}$ may have overlapping support, but we claim that the integral of their product must still be zero, because of cancellation. Say $j > j'$. Then $\psi_{j,k}$ changes sign on neighboring intervals of length $2^{-j-1}$. But $\psi_{j',k'}$ is constant on dyadic intervals of length $2^{-j'-1}$, in particular on the support of $\psi_{j,k}$, so
$$\int\psi_{j,k}(x)\psi_{j',k'}(x)\,dx = \mathrm{const}\int\psi_{j,k}(x)\,dx = 0.$$
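The orthonormality claims are easy to confirm on a grid. The sketch below is my own illustration (the particular pairs $(j,k)$ and the grid resolution are arbitrary choices, not from the text).

```python
import numpy as np

# Illustration (not from the text): check orthonormality of a few members
# of the Haar family psi_{j,k}(x) = 2^{j/2} psi(2^j x - k) on a fine grid.
x = np.linspace(0.0, 1.0, 2 ** 16, endpoint=False)
dx = x[1] - x[0]

def haar(u):
    # psi: +1 on (0, 1/2], -1 on (1/2, 1], 0 otherwise
    return np.where((u > 0) & (u <= 0.5), 1.0, 0.0) - np.where((u > 0.5) & (u <= 1.0), 1.0, 0.0)

def psi_jk(j, k):
    return 2.0 ** (j / 2) * haar(2.0 ** j * x - k)

pairs = [(0, 0), (1, 0), (1, 1), (2, 3)]
gram = np.array([[np.sum(psi_jk(*p) * psi_jk(*q)) * dx for q in pairs] for p in pairs])
print(bool(np.allclose(gram, np.eye(4), atol=1e-3)))   # True: Gram matrix ≈ identity
```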

Because we have an orthonormal family, we can attempt to expand an arbitrary function according to the recipe

    f = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} ⟨f, ψ_{j,k}⟩ ψ_{j,k}.

We refer to this as the Haar series expansion of f. The system is called complete if the expansion is valid. Of course this is something of an oversimplification for two reasons:

1. We cannot expect to represent a completely arbitrary function, the usual minimal restriction being ∫ |f(x)|² dx < ∞.
2. The convergence of the infinite series representation may not be valid in the pointwise sense and may not have certain intuitively plausible properties.

Since both these problems already occur with ordinary Fourier series, we should not be surprised to meet them again here.
We claim in the case of Haar functions that we do have completeness. Before demonstrating this, we should point out one paradoxical consequence of this claim. Each of the functions ψ_{j,k} has total integral zero, ∫_{−∞}^{∞} ψ_{j,k}(x) dx = 0, so any linear combination will also have zero integral. If we start out with a function f whose total integral is nonzero, how can we expect to write it as a series in functions with zero integral?
The resolution of the paradox rests on the fact that an infinite series Σ_{j,k} c_{jk} ψ_{j,k}(x) is not quite the same as a finite linear combination. The argument that

    ∫ Σ_{j,k} c_{jk} ψ_{j,k}(x) dx = Σ_{j,k} c_{jk} ∫ ψ_{j,k}(x) dx = Σ_{j,k} c_{jk} · 0 = 0
Haar functions and wavelets 143

requires the interchange of an integral and an infinite sum, and such an interchange is not always valid. In particular, the convergence of the Haar series expansion does not allow term-by-term integration.
We can explain the existence of Haar series expansions if we accept the following general criterion for completeness of an orthonormal system: the system is complete if and only if any function whose expansion is identically zero (i.e., ⟨f, ψ_{j,k}⟩ = 0 for all j and k) must be the zero function (in the sense discussed in section 2.1). We can verify this criterion rather easily in our case, at least under the assumption that f is integrable (this is not quite the correct assumption, which should be that |f|² is integrable).
So suppose f is integrable and ⟨f, ψ_{j,k}⟩ = 0 for all j and k. First we claim ∫_0^{1/2} f(x) dx = 0. Why? Since ⟨f, ψ_{0,0}⟩ = 0 we know

    ∫_0^{1/2} f(x) dx = ∫_{1/2}^1 f(x) dx,

so

    ∫_0^1 f(x) dx = 2 ∫_0^{1/2} f(x) dx.

But then since ⟨f, ψ_{−1,0}⟩ = 0 we know

    ∫_0^1 f(x) dx = ∫_1^2 f(x) dx,

so

    ∫_0^2 f(x) dx = 4 ∫_0^{1/2} f(x) dx.

By repeating this argument we obtain

    ∫_0^{2^k} f(x) dx = 2^{k+1} ∫_0^{1/2} f(x) dx,

and this would contradict the integrability condition unless ∫_0^{1/2} f(x) dx = 0.
However, there is nothing special about the interval [0, 1/2]. By the same sort of argument we can show ∫_I f(x) dx = 0 for any interval I of the form [2^{-j}k, 2^{-j}(k+1)] for any j and k, and by additivity of the integral, for any I of the form [2^{-j}k, 2^{-j}m] for any integers j, k, m. But an arbitrary interval can be approximated by intervals of this form, so we obtain ∫_I f(x) dx = 0 for any interval I. This implies that f is the zero function in the correct sense.
There is a minor variation of the Haar series expansion that has a nicer interpretation and does not suffer from the total integral paradox. To explain it, we need to introduce another function, denoted φ, which is simply the characteristic function of the interval [0, 1]. We define φ_{j,k} by dilation and translation as before: φ_{j,k}(x) = 2^{j/2} φ(2^j x − k). We no longer claim that the φ_{j,k} form an orthonormal system, since we have no cancellation properties. However, all the functions φ_{j,k} with one fixed value of j are orthonormal, because they have disjoint supports. Also, if j' ≥ j, then the previous argument shows ∫ φ_{j,k}(x) ψ_{j',k'}(x) dx = 0. Thus, in particular, the system consisting of φ_{0,k} for all k and ψ_{j,k} for all k and just j ≥ 0 is orthonormal, and a variant of the previous argument shows that it is also complete. The associated expansion is

    f = Σ_{k=−∞}^{∞} ⟨f, φ_{0,k}⟩ φ_{0,k} + Σ_{j=0}^{∞} Σ_{k=−∞}^{∞} ⟨f, ψ_{j,k}⟩ ψ_{j,k}.

Now here is the interpretation of this expansion. The first series,

    Σ_{k=−∞}^{∞} ⟨f, φ_{0,k}⟩ φ_{0,k},

is just a coarse approximation to f by a step function with steps of length one. Each of the subsequent series

    Σ_{k=−∞}^{∞} ⟨f, ψ_{j,k}⟩ ψ_{j,k}

for j = 0, 1, 2, ... adds finer and finer detail on the scale of 2^{-j}. One of the advantages of this expansion is its excellent space localization. Each of the Haar functions is supported on a small interval; the larger j, the smaller the interval. If the function f being expanded vanishes on an interval I, then the coefficients of f will be zero for all ψ_{j,k} whose support lies in I.
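The coarse-plus-detail structure can be illustrated numerically. In the sketch below (mine, not the book's; the sample function f(x) = x² and the grid size are arbitrary choices), adding the j = 0 detail coefficient to the coarse coefficient reproduces the average of f on the left half-interval, exactly as a two-step approximation should.

```python
# Sketch (not from the text): the coarse coefficient plus the j = 0 detail
# reproduces the average of f on the half-interval; f(x) = x^2 is an
# arbitrary sample function.
def phi(x):
    # scaling function: characteristic function of (0, 1]
    return 1.0 if 0.0 < x <= 1.0 else 0.0

def psi(x):
    # Haar wavelet
    if 0.0 < x <= 0.5:
        return 1.0
    if 0.5 < x <= 1.0:
        return -1.0
    return 0.0

def inner(f, g, n=2 ** 12):
    # midpoint-rule approximation of the integral of f*g over [0, 1]
    h = 1.0 / n
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n)) * h

f = lambda x: x * x
c0 = inner(f, phi)                             # <f, phi_{0,0}>, i.e. 1/3
d0 = inner(f, psi)                             # <f, psi_{0,0}>
avg_left = 2 * inner(f, lambda x: phi(2 * x))  # average of f on [0, 1/2]
print(abs(c0 + d0 - avg_left) < 1e-6)          # True
```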
A grave disadvantage of the Haar series expansions is that the Haar functions are discontinuous. Thus, no matter how smooth the function f may be, the approximations obtained by taking a finite number of terms of the expansion will be discontinuous. This defect is remedied in the smooth wavelet expansions we will mention shortly.
Another closely related defect in the Haar series expansion is the lack of localization in frequency. We can compute directly the Fourier transform of φ:

    φ̂(ξ) = e^{iξ/2} sin(ξ/2) / (ξ/2).

Aside from the oscillatory factors, this decays only like |ξ|^{-1}, which means it is not integrable (of course we should have known this in advance: a discontinuous function cannot have an integrable Fourier transform). The behavior of ψ̂ is similar. In fact, since

    ψ(x) = φ(2x) − φ(2x − 1)

we have

    ψ̂(ξ) = (1/2)(1 − e^{iξ/2}) φ̂(ξ/2),

which has the same |ξ|^{-1} decay rate.


The Haar system has been around for more than half a century, but has not been used extensively because of the defects we have mentioned. In the last five years, a number of related expansions, called wavelet expansions, have been introduced. Generally speaking, there are two functions, φ, called the scaling function, and ψ, called the wavelet, for which the identical formulas hold. The particular system I will describe, called Daubechies wavelets, is compactly supported and differentiable a finite number of times. Other wavelet systems have somewhat different support and smoothness properties. The key to generating wavelets is the observation that the Haar system possesses a kind of self-similarity. Not only is ψ expressible as

    ψ(x) = φ(2x) − φ(2x − 1),

a linear combination of translates and dilates of φ, but so is φ itself:

    φ(x) = φ(2x) + φ(2x − 1).

This identity actually characterizes φ, up to a constant multiple. It is referred to as a scaling identity or a dilation equation. Because of the factor of 2 on the right side, it says that φ on a larger scale is essentially the same as φ on a smaller scale. The scaling function for the Daubechies wavelets satisfies an analogous but more complicated scaling identity,

    φ(x) = Σ_{k=0}^{N} a_k φ(2x − k).

The coefficients a_k must be chosen with extreme care, and we will not be able to say more about them here. The number of terms, N + 1, has to be taken fairly large to get smoother wavelets (approximately 5 times the number of derivatives desired).
The scaling identity determines φ, up to a constant multiple. If we restrict φ to the integers, then the scaling identity becomes an eigenvalue equation for a finite matrix, which can be solved by linear algebra. Once we know φ on the integers, we can obtain the values of φ on the half-integers from the scaling identity (if x = m/2 on the left side, then the values 2x − k = m − k on the right are all integers). Proceeding inductively, we can obtain the values of φ at dyadic rationals m2^{-j}, and by continuity this determines φ(x) for all x.

FIGURE 7.3

Then the wavelet ψ is determined from φ by the identity

    ψ(x) = Σ_{k=0}^{N} (−1)^k a_{N−k} φ(2x − k).
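The two steps just described, solving the eigenvalue problem on the integers and then refining to dyadic rationals, can be carried out directly. The sketch below is not from the text; it assumes the standard four-coefficient Daubechies (D4) values a_k, written in the normalization Σ a_k = 2 so that p(0) = 1.

```python
# Sketch (not from the text): cascade construction of the Daubechies D4
# scaling function, assuming the standard D4 coefficients normalized so
# that sum(a) == 2.
s3 = 3 ** 0.5
a = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]

# Step 1: on the integers the scaling identity is an eigenvalue equation.
# phi is supported on [0, 3], so only phi(1), phi(2) can be nonzero:
#   phi(1) = a1*phi(1) + a0*phi(2),   phi(2) = a3*phi(1) + a2*phi(2).
# Power iteration finds the eigenvector for eigenvalue 1, normalized so
# that phi(1) + phi(2) = 1.
p1, p2 = 1.0, 0.0
for _ in range(60):
    p1, p2 = a[1] * p1 + a[0] * p2, a[3] * p1 + a[2] * p2
    t = p1 + p2
    p1, p2 = p1 / t, p2 / t

# Step 2: refine to dyadic rationals m * 2^-j using the scaling identity
# phi(x) = sum_k a_k phi(2x - k); each argument 2x - k lies on the
# previously computed coarser grid.
vals = {0.0: 0.0, 1.0: p1, 2.0: p2, 3.0: 0.0}
for j in range(1, 8):
    step = 2.0 ** -j
    for m in range(3 * 2 ** j + 1):
        x = m * step
        if x not in vals:
            vals[x] = sum(a[k] * vals.get(2 * x - k, 0.0) for k in range(4))

print(round(p1, 6), round(p2, 6))  # (1 + sqrt(3))/2 and (1 - sqrt(3))/2
```

One standard sanity check on the computed values is the partition of unity Σ_k φ(x − k) = 1, which the cascade values satisfy at every dyadic point.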

These scaling functions and wavelets should be thought of as new types of special functions. They are not given by any simple formulas in terms of more elementary functions, and their graphs do not look familiar (see figures 7.3 and 7.4).
However, there exist very efficient algorithms for computing the coefficients of wavelet expansions, and there are many practical purposes for which wavelet expansions are now being used.
It turns out that we can compute the Fourier transforms of scaling functions and wavelets rather easily. This is because the scaling identity implies

    φ̂(ξ) = (1/2) Σ_{k=0}^{N} a_k e^{ikξ/2} φ̂(ξ/2),

which we can write

    φ̂(ξ) = p(ξ/2) φ̂(ξ/2)

FIGURE 7.4

where

    p(ξ) = (1/2) Σ_{k=0}^{N} a_k e^{ikξ}.

By iterating this identity we obtain

    φ̂(ξ) = p(ξ/2) p(ξ/4) ··· p(ξ/2^m) φ̂(ξ/2^m).

It is convenient to normalize φ so that φ̂(0) = 1. Then since ξ/2^m → 0 as m → ∞ we obtain the infinite product representation

    φ̂(ξ) = Π_{j=1}^{∞} p(ξ/2^j).

(Note: the coefficients are always chosen to satisfy (1/2) Σ_{k=0}^{N} a_k = 1, so p(0) = 1, which makes the infinite product converge; in fact it converges so rapidly that only 10 or 20 terms are needed to compute φ̂(ξ) for small values of ξ.) Then ψ̂ is expressible in terms of φ̂ as

    ψ̂(ξ) = q(ξ/2) φ̂(ξ/2)

where

    q(ξ) = (1/2) Σ_{k=0}^{N} (−1)^k a_{N−k} e^{ikξ}.
It is interesting to compare the infinite product form of φ̂ and the direct computation of φ̂ in the case of the Haar functions. Here p(ξ) = (1/2)(1 + e^{iξ}) = e^{iξ/2} cos(ξ/2), so

    φ̂(ξ) = Π_{j=1}^{∞} e^{iξ/2^{j+1}} cos(ξ/2^{j+1}) = e^{iξ/2} Π_{j=2}^{∞} cos(ξ/2^j).

This is the same result as before because

    Π_{k=1}^{∞} cos(x/2^k) = sin x / x,

an identity known to Euler, and in special cases, to François Viète in the 1590s.
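The rapid convergence of the product is easy to see numerically. The following sketch is not from the text; it multiplies out a truncated product for the Haar case and checks it against the closed form e^{iξ/2} sin(ξ/2)/(ξ/2). The evaluation point ξ = 2 and the number of factors are arbitrary choices.

```python
# Sketch (not from the text): truncated infinite product for the Haar
# scaling function versus the closed form e^{i xi/2} sin(xi/2)/(xi/2).
import cmath, math

def p(xi):
    # p(xi) = (1/2)(1 + e^{i xi}), from the Haar coefficients a0 = a1 = 1
    return 0.5 * (1 + cmath.exp(1j * xi))

def phi_hat(xi, terms=40):
    # partial product  prod_{j=1}^{terms} p(xi / 2^j)
    out = 1 + 0j
    for j in range(1, terms + 1):
        out *= p(xi / 2 ** j)
    return out

xi = 2.0
direct = cmath.exp(1j * xi / 2) * math.sin(xi / 2) / (xi / 2)
print(abs(phi_hat(xi) - direct) < 1e-10)  # True
```

The same loop, with the cosine factors alone, reproduces the Euler/Viète identity Π cos(x/2^k) = sin x / x.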

7.9 Problems
1. Show that

    lim_{t→∞} ∫_a^b f(x) sin(t g(x)) dx = 0

if f is integrable and g is C¹ on [a, b] with nonvanishing derivative.


2. Show that an estimate

    |f̂(ξ)| ≤ C ‖f‖₁ h(ξ)

cannot hold for all integrable functions, where h(ξ) is any fixed function vanishing at infinity, by considering the family of functions f(x)e^{iηx} for fixed f and η varying.

3. Show that an estimate of the form ‖f̂‖_q ≤ C ‖f‖_p for all f with ‖f‖_p < ∞ cannot hold unless 1/p + 1/q = 1, by considering the dilates of a fixed function.

4. Let f(x) ∼ Σ c_n e^{inx} be the Fourier series of a periodic function with ∫_0^{2π} |f(x)| dx < ∞. Show that lim_{n→±∞} c_n = 0.

5. Suppose |f(x)| ≤ c e^{-a|x|}. Prove that f̂ extends to an analytic function on the region |Im ζ| < a.
6. Show that the distributions whose Fourier transforms are

    cos(t√(|ξ|² + m²))   and   sin(t√(|ξ|² + m²)) / √(|ξ|² + m²)

are supported in |x| ≤ |t|. Use this to prove a finite speed of propagation for solutions of the Klein-Gordon equation

    ∂²u/∂t² = Δ_x u − m²u.

7. Let f be an integrable function supported in the half-plane ax + by ≥ 0 in ℝ². What can you say about analyticity of the Fourier transform f̂? What can you say if f is supported in the first quadrant (x ≥ 0 and y ≥ 0)?

8. What can you say about the product of two entire functions of exponential type? How does this relate to the support of the convolution of two functions of compact support?

9. Characterize the Fourier transforms of test functions on ℝ¹ supported in an arbitrary interval [a, b].

10. Characterize the Fourier transforms of test functions on ℝ² supported in the rectangle |x| ≤ a, |y| ≤ b.

11. Let f ∈ E′(ℝ¹). Show that f̂(ζ) = 0 if and only if g(x) = e^{iζx} is a solution of the convolution equation f ∗ g = 0. Show that ζ has multiplicity at least m (as a zero of f̂) if and only if g(x) = x^{m−1} e^{iζx} is a solution.

12. Show that if f and g are distributions of compact support and not identically zero then f ∗ g is not identically zero.

13. Evaluate Σ_{m=−∞}^{∞} (1 + tm²)^{−1} using the Poisson summation formula.

14. Compute the Fourier transform of Σ_{k=−∞}^{∞} δ′(x − k). What does this say about Σ_{k=−∞}^{∞} f′(k) for suitable functions f?

15. Let Γ be the equilateral lattice in ℝ² generated by the vectors (1, 0) and (1/2, √3/2). What is the dual lattice? Sketch both lattices.

16. What is the Fourier transform of Σ_{k=−∞}^{∞} δ(x − x₀ − k) for fixed x₀?

17. Compute

    Σ_{k=−∞}^{∞} (sin tk / tk)²

using the Poisson summation formula.
18. Is a translate of a positive definite function necessarily positive definite? What about a dilate?

19. Show that e^{-|x|} is positive definite. (Hint: What is its Fourier transform?)

20. Show that the product of continuous positive definite functions is positive definite.

21. Show that if f and g are continuous positive definite functions on ℝ¹, then f(x)g(y) is positive definite on ℝ².

22. Show that f̃ ∗ f is always positive definite, where f̃(x) = f(−x) and f ∈ S.

23. Let μ be a probability measure on [0, 2π). Characterize the coefficients of the Fourier series of μ in terms of positive definiteness.

24. Compute Var(ψ) when ψ is a Gaussian. Use this to check that the uncertainty principle inequality is an equality in this case.

25. Show that if a distribution and its Fourier transform both have compact support, then it is the zero distribution.

26. If ψ has compact support, obtain an upper bound for Var(ψ).

27. Let ψ_t(x) = t^{-n/2} ψ(x/t), for a fixed function ψ. How does Var(ψ_t) depend on t? How does Var(ψ_t) Var((2π)^{-n/2} ψ̂_t) depend on t?

28. Show that the Hermite polynomials satisfy

29. Show that

    H_n(x) = Σ_{k=0}^{[n/2]} ((−1)^k n! / (k! (n − 2k)!)) (2x)^{n−2k}.

30. Prove the generating function identity

    Σ_{n=0}^{∞} H_n(x) tⁿ / n! = e^{2xt − t²}.

31. Show that H_n′(x) = 2n H_{n−1}(x).

32. Prove the recursion relation

    H_{n+1}(x) = 2x H_n(x) − 2n H_{n−1}(x).

33. Compute the Fourier transform of the characteristic function of the ball |x| ≤ b in ℝ³.

34. Prove the recursion relations

    d/ds (s^α J_α(s)) = s^α J_{α−1}(s)

and

    d/ds (s^{-α} J_α(s)) = −s^{-α} J_{α+1}(s).

Use this to compute J_{3/2} explicitly.

35. Show that J_α(s) is a solution of Bessel's differential equation

    f″(s) + (1/s) f′(s) + (1 − α²/s²) f(s) = 0.

36. Show that

for n a positive integer.


37. Show that the integer translates of a fixed function 1 on Jm.1 are orthonor-
mal, i.e.,

l:/(X-k)/(x-m)dx={ ~ ~::
if and only if j satisfies
00

l: Ij(~ + 27rkW == 1.
k=-oo
38. Show that the three families offunctions 'Pj,k(X)'l/Jj,k' (y), 'l/Jj,k(X)'Pj,k' (y),
and 'l/Jj,k(X)'l/Jj,k'(Y) for j,k, and k' varying over the integers, form a
complete orthonormal system in Jm.2, where 'I/J is the Haar function and
'P the associated scaling function (the same for any wavelet and scaling
function).
39. Show that a wavelet satisfies the vanishing moment conditions

m=O,I, ... ,M

provided
I N
q(~) = "2l:(-I)kaN_keik€
k=O
has a zero of order M + I at ~ = o.

40. Let V_j denote the linear span of the functions φ_{j,k}(x) as k varies over the integers, where φ satisfies a scaling identity

    φ(x) = Σ_{k=0}^{N} a_k φ(2x − k).

Show that V_j ⊆ V_{j+1}. Also show that f(x) ∈ V_j if and only if f(2x) ∈ V_{j+1}.

41. Let φ and ψ be defined by φ̂(ξ) = χ_{[−π,π]}(ξ) and

    ψ̂(ξ) = χ_{[−2π,−π]}(ξ) + χ_{[π,2π]}(ξ).

Compute φ and ψ explicitly. Show that ψ_{j,k} is a complete orthonormal system. (Hint: Work on the Fourier transform side.)
Sobolev Theory and Microlocal Analysis

8.1 Sobolev inequalities


Throughout this work I have stressed the revolutionary quality of distribution theory: instead of asking "does this problem have any function solutions?," we ask "does this problem have any distribution solutions?" Nevertheless, there are times when you really want function solutions. We claim that even then distribution theory can be very useful. First we find the distribution solutions, and then we pose the question "when are the distribution solutions actually function solutions?" In order to answer such a question, we need a technique for showing that certain distributions are in fact functions. This technique is Sobolev theory.
Now in fact, Sobolev theory does not work miracles. We can never make the δ-distribution into a function. So in Sobolev theory, the hypotheses usually include the assumption that the distribution in question does come from a function, but we do not make any smoothness assumptions on the function. The conclusion will be that the function does have some smoothness, say it is C^k. This means that if the distribution is a solution of a differential equation of order less than or equal to k, in the distribution sense, then it satisfies the differential equation in the usual, pointwise sense. In this way, Sobolev theory allows you to obtain conventional solutions using distribution theory. Sobolev theory actually predates distribution theory by more than a decade, so our placement of this topic is decidedly counter-historical.
The concepts of Sobolev theory involve a blend of smoothness and size measurements of functions. The size measurements involve the integrals of powers of the function, ∫ |f(x)|^p dx, where p is a real parameter satisfying p ≥ 1. We have already encountered the two most important special cases, p = 1 and p = 2. The finiteness of ∫ |f(x)| dx is what we have called "integrability" of f. The quantity ∫ |f(x)|² dx occurs in the Plancherel formula. If |f(x)| > 1 then |f(x)|^p increases with p, so that as far as the singularities of f are concerned, the larger p is, the more information is contained in controlling the size of ∫ |f(x)|^p dx. But because the situation is reversed when |f(x)| < 1, there is no necessary relationship between the values of ∫ |f(x)|^p dx for different values of p.
The standard terminology is to call (∫ |f(x)|^p dx)^{1/p} the L^p norm of f, written ‖f‖_p. The conditions implied by the use of the word "norm" are as follows:

1. ‖f‖_p ≥ 0, with equality only for the zero function (positivity)
2. ‖cf‖_p = |c| ‖f‖_p (homogeneity)
3. ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p (triangle inequality).

The first two conditions are evident (with f = 0 taken in the appropriate sense) but the proof of the triangle inequality is decidedly tricky (except when p = 1) and we will not discuss it. The case p = ∞, called the L^∞ norm, is defined separately by

    ‖f‖_∞ = sup_x |f(x)|.

(Exercise: Verify conditions 1, 2, 3 for this norm.) The justification for this definition is the fact that

    lim_{p→∞} ‖f‖_p = ‖f‖_∞,

which is valid under suitable restrictions so that both sides exist (a simple "counterexample" is f ≡ 1, for which ‖f‖_∞ = 1 but ‖f‖_p = +∞ for all p < ∞). Suppose that f is continuous with compact support. Say f vanishes outside |x| ≤ R and ‖f‖_∞ = M. Then

    ‖f‖_p = (∫ |f(x)|^p dx)^{1/p} ≤ M (cR^n)^{1/p},

where the constant c is the volume of the unit ball in ℝⁿ. Since lim_{p→∞} a^{1/p} = 1 for any positive number a, we have lim sup_{p→∞} ‖f‖_p ≤ M. On the other hand, ‖f‖_∞ = M means that for every ε > 0 there is an open set U on which |f(x)| ≥ M − ε; say U has volume A. Then

    ‖f‖_p ≥ (∫_U |f(x)|^p dx)^{1/p} ≥ (M − ε) A^{1/p},

so lim inf_{p→∞} ‖f‖_p ≥ M − ε. Since this is true for all ε > 0, we have shown lim_{p→∞} ‖f‖_p = ‖f‖_∞.
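This limit is easy to watch numerically. The sketch below is mine, not the text's; the tent function and the particular exponents are arbitrary choices.

```python
# Sketch (not from the text): L^p norms of a tent function approach the
# sup-norm as p grows; the function and exponents are arbitrary choices.
def f(x):
    # tent function: sup-norm 1, supported in [-1, 1]
    return max(0.0, 1.0 - abs(x))

def lp_norm(p, n=20000):
    # midpoint-rule approximation of (integral of |f|^p)^(1/p) over [-1, 1]
    h = 2.0 / n
    return (sum(f(-1.0 + (i + 0.5) * h) ** p for i in range(n)) * h) ** (1.0 / p)

for p in (2, 8, 32, 128, 512):
    print(p, round(lp_norm(p), 4))  # the values climb toward ||f||_inf = 1
```

For this f the exact value is (2/(p+1))^{1/p}, which indeed tends to 1 as p → ∞.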
Now the way we blend smoothness with L^p norm measurements of size is to consider L^p norms of derivatives. These are distributional derivatives, and a priori do not imply the existence of ordinary derivatives. Sobolev theory allows you to "trade in" a certain number of these L^p derivatives and get honest derivatives in return. For example, it will cost you exactly n derivatives in L¹ to get honest derivatives (n is the dimension of the space). We can state the result precisely as follows:

Theorem 8.1.1
(L¹ Sobolev inequality): Let f be an integrable function on ℝⁿ. Suppose the distributional derivatives (∂/∂x)^α f are also integrable functions for all |α| ≤ n. Then f is continuous and bounded, and

    ‖f‖_∞ ≤ C Σ_{|α|≤n} ‖(∂/∂x)^α f‖₁.

More generally, if the (∂/∂x)^α f are integrable functions for |α| ≤ n + k, then f is C^k.

The terminology "Sobolev inequality" may seem strange. Notice there is an inequality in the middle of our stated theorem; this is the L¹ Sobolev inequality. It turns out that this is the key idea in the proof, and the remaining parts of the theorem follow from it in fairly routine fashion (once again we will omit most of this technical routine). Sometimes the term "Sobolev embedding theorem" is used. We will explain what this is about in the next section.
As usual, it is easiest to understand the case n = 1. Let us suppose, at first, that the function f is very well behaved, say even f ∈ 𝒟, and let's see if we can prove the inequality. The fundamental theorem of the calculus allows us to write

    f(x) = ∫_{−∞}^x f′(t) dt

(because f has compact support the lower endpoint is really finite and f vanishes there). Then we have

    |f(x)| ≤ ∫_{−∞}^x |f′(t)| dt ≤ ∫_{−∞}^{∞} |f′(t)| dt,

and taking the supremum over x we obtain

    ‖f‖_∞ ≤ ‖f′‖₁.

This result looks somewhat better than the L¹ Sobolev inequality, but it was purchased by two strong hypotheses: differentiability and compact support. It is clearly nonsense without compact support (or at least vanishing at infinity). The constant function f ≡ 1 has ‖f‖_∞ = 1 and ‖f′‖₁ = 0. To remedy this, we derive a consequence of our inequality by applying it to ψ·f where ψ is a cut-off function. Since ψ has compact support, we can drop that assumption about f. As long as f is C^∞, ψ·f ∈ 𝒟 and so

    ‖ψf‖_∞ ≤ ‖(ψf)′‖₁.

Now

    (ψf)′ = ψ′f + ψf′

and

    ‖ψ′f + ψf′‖₁ ≤ ‖ψ′f‖₁ + ‖ψf′‖₁

by the triangle inequality (which is easy to establish for the L¹ norm). We also have the elementary observation

    ∫ |f(x)g(x)| dx ≤ ‖g‖_∞ ∫ |f(x)| dx,

which can be written ‖fg‖₁ ≤ ‖g‖_∞ ‖f‖₁. Thus, altogether, we have

    ‖ψf‖_∞ ≤ ‖ψ′‖_∞ ‖f‖₁ + ‖ψ‖_∞ ‖f′‖₁.

If we create a family of cut-off functions ψ_t(x) = ψ(tx) so that ψ_t(x) → 1 as t → 0, then ‖f‖_∞ = lim_{t→0} ‖ψ_t f‖_∞ and ‖ψ_t‖_∞ ≤ 1 and ‖ψ_t′‖_∞ ≤ t ≤ 1, so we have

    ‖f‖_∞ ≤ ‖f‖₁ + ‖f′‖₁,

which is exactly the L¹ Sobolev inequality.
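The inequality is easy to check on a concrete function. In the sketch below (mine, not the text's; the Gaussian is an arbitrary choice of smooth integrable function), we have ‖f‖_∞ = 1, ‖f‖₁ = √π, and ‖f′‖₁ = 2, so the inequality holds with plenty of room to spare.

```python
# Sketch (not from the text): check ||f||_inf <= ||f||_1 + ||f'||_1 for the
# Gaussian f(x) = exp(-x^2); here ||f||_inf = 1, ||f||_1 = sqrt(pi),
# ||f'||_1 = 2.
import math

def f(x):
    return math.exp(-x * x)

def fp(x):
    return -2.0 * x * math.exp(-x * x)

def l1(g, a=-10.0, b=10.0, n=100000):
    # midpoint-rule approximation of the integral of |g| over [a, b]
    h = (b - a) / n
    return sum(abs(g(a + (i + 0.5) * h)) for i in range(n)) * h

sup_f = max(f(-10.0 + 20.0 * i / 100000) for i in range(100001))
l1_f, l1_fp = l1(f), l1(fp)
print(sup_f <= l1_f + l1_fp)  # True: 1 <= sqrt(pi) + 2
```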
While we have succeeded in establishing the desired inequality (at least for smooth functions), we seem to have gotten less than nowhere in proving the more interesting parts of the theorem. We would like to show that if f and f′ (in the distributional sense) are integrable, then f is continuous, but we started out assuming that f is differentiable! However, there is a tremendous power concealed in inequalities such as the L¹ Sobolev inequality, which will be illustrated here. Start with f integrable. Regard it as a distribution, and apply the convolution with approximate identity and multiplication by cut-off function argument we gave in section 6.6 to approximate it by test functions. If f and f′ are integrable, then the L¹ Sobolev inequality implies that the test functions converge to f uniformly, and uniform limits of continuous functions are continuous. In fact we have already given the cut-off function argument, so let us just consider the convolution argument. Let φ_ε denote an approximate identity in 𝒟. Then φ_ε ∗ f is C^∞. Furthermore φ_ε ∗ f is integrable with ‖φ_ε ∗ f‖₁ ≤ ‖f‖₁. To see this just estimate

    ‖φ_ε ∗ f‖₁ = ∫ |∫ φ_ε(x − y) f(y) dy| dx ≤ ∫∫ φ_ε(x − y) |f(y)| dy dx

and interchange the order of integration, using ∫ φ_ε(x − y) dx = 1. What is more, (φ_ε ∗ f)′ = φ_ε ∗ f′ is also integrable with ‖(φ_ε ∗ f)′‖₁ ≤ ‖f′‖₁. Thus

    ‖φ_ε ∗ f‖_∞ ≤ ‖φ_ε ∗ f‖₁ + ‖(φ_ε ∗ f)′‖₁ ≤ ‖f‖₁ + ‖f′‖₁,

so in the limit we obtain that f is bounded and satisfies the L¹ Sobolev inequality, and then

    ‖φ_ε ∗ f − f‖_∞ ≤ ‖φ_ε ∗ f − f‖₁ + ‖φ_ε ∗ f′ − f′‖₁.

The right side goes to zero as ε → 0 by the approximate identity theorem, and so φ_ε ∗ f → f uniformly, as claimed.
This essentially completes the proof when n = 1, except for the observation that if (d/dx)^j f is integrable for j = 0, 1, ..., k + 1 then by applying the previous argument to the functions (d/dx)^j f, j = 0, ..., k, we can show these are continuous, hence f ∈ C^k.
The argument in higher dimensions is similar, except that we have to apply the one-dimensional fundamental theorem of calculus n times. The case n = 2 will illustrate the idea without requiring complicated notation. Assume f ∈ 𝒟(ℝ²). We begin with ∂²f/∂x∂y and integrate in the y-variable to obtain

    ∂f/∂x (x, y) = ∫_{−∞}^y ∂²f/∂x∂y (x, t) dt

for any fixed x and y. We then integrate in the x-variable to obtain

    f(x, y) = ∫_{−∞}^x ∫_{−∞}^y ∂²f/∂x∂y (s, t) dt ds.

We can then estimate

    |f(x, y)| ≤ ∫_{−∞}^x ∫_{−∞}^y |∂²f/∂x∂y (s, t)| dt ds ≤ ∫_{−∞}^{∞} ∫_{−∞}^{∞} |∂²f/∂x∂y (s, t)| dt ds

and take the supremum over x and y to obtain

    ‖f‖_∞ ≤ ‖∂²f/∂x∂y‖₁.

This argument required compact support for f, and we remove this hypothesis
as before by multiplying by a cut-off function ψ. Now

    ∂²(ψf)/∂x∂y = ψ ∂²f/∂x∂y + (∂ψ/∂x)(∂f/∂y) + (∂ψ/∂y)(∂f/∂x) + (∂²ψ/∂x∂y) f,

so we end up with the inequality

    ‖f‖_∞ ≤ C (‖f‖₁ + ‖∂f/∂x‖₁ + ‖∂f/∂y‖₁ + ‖∂²f/∂x∂y‖₁).

This is our L¹ Sobolev inequality for n = 2. Notice that it is slightly better than advertised, because it only involves the mixed second derivative ∂²f/∂x∂y, and neither of the pure second derivatives ∂²f/∂x², ∂²f/∂y². The same argument in n dimensions yields the L¹ Sobolev inequality

    ‖f‖_∞ ≤ C Σ_{α∈A} ‖(∂/∂x)^α f‖₁,

where A is the set of all multi-indexes α = (α₁, ..., α_n) where each α_j assumes the value 0 or 1.
The proof of the rest of the theorem in n dimensions is the same as before, except that we have to trade in n derivatives because of the nth-order derivative on the right in the L¹ Sobolev inequality.
The L¹ Sobolev inequality is sharp. It is easy to give examples of functions with fewer derivatives in L¹ which are unbounded. Nevertheless, if we have fewer than n derivatives to trade in, we can obtain the same conclusion if these derivatives have finite L^p norm with larger values of p. In fact the rule is that we need more than n/p derivatives. We illustrate this with the case p = 2.

Theorem 8.1.2
(L² Sobolev inequality): Suppose

    ∫ |(∂/∂x)^α f(x)|² dx

is finite for all α satisfying |α| ≤ N, for N equal to the smallest integer greater than n/2 (so N = (n+1)/2 if n is odd and N = (n+2)/2 if n is even). Then f is continuous and bounded, with

    ‖f‖_∞ ≤ C Σ_{|α|≤N} ‖(∂/∂x)^α f‖₂.

More generally, if

    ∫ |(∂/∂x)^α f(x)|² dx

is finite for all α satisfying |α| ≤ N + k, then f is C^k.


We can give a rather quick proof using the Fourier transform. We will show in fact that f̂ is integrable, which implies that f is continuous and bounded by the Fourier inversion formula. Since

    F((∂/∂x)^α f)(ξ) = (−iξ)^α f̂(ξ)

and, by the Plancherel formula, ‖(∂/∂x)^α f‖₂ is a constant multiple of ‖ξ^α f̂‖₂, the hypotheses of the theorem imply ∫ |ξ^α|² |f̂(ξ)|² dξ is finite for all |α| ≤ N. Just taking the cases ξ^α = 1 and ξ^α = ξ_j^N and summing, we obtain

    ∫ (1 + Σ_{j=1}^n |ξ_j|^{2N}) |f̂(ξ)|² dξ < ∞.

Now we are going to apply the Cauchy-Schwartz inequality to ∫ |f̂(ξ)| dξ. We obtain

    ∫ |f̂(ξ)| dξ ≤ (∫ (1 + Σ_{j=1}^n |ξ_j|^{2N}) |f̂(ξ)|² dξ)^{1/2} · (∫ (1 + Σ_{j=1}^n |ξ_j|^{2N})^{−1} dξ)^{1/2}.

We have already seen that the first integral on the right is finite. We claim that the second integral on the right is also finite, because 2N > n. The idea is that Σ_{j=1}^n |ξ_j|^{2N} ≥ c|ξ|^{2N}, so

    ∫ (1 + Σ_{j=1}^n |ξ_j|^{2N})^{−1} dξ ≤ c ∫ (1 + |ξ|^{2N})^{−1} dξ,

and when we compute the integral in polar coordinates we obtain c ∫_0^∞ (1 + r^{2N})^{−1} r^{n−1} dr, which is convergent at infinity when 2N > n. Thus we have shown that f̂ is integrable as claimed, and in fact we have the estimate

    ∫ |f̂(ξ)| dξ ≤ C Σ_{|α|≤N} ‖(∂/∂x)^α f‖₂,

which also proves the L² Sobolev inequality since ‖f‖_∞ ≤ c ‖f̂‖₁.


Notice that we also have the conclusion that f vanishes at infinity by the Riemann-Lebesgue lemma, but this is of relatively minor interest. In fact, the Sobolev inequalities are often used in local form. Suppose we want to show that f is C^k on some bounded open set U. Then it suffices to show

    ∫_V |(∂/∂x)^α f(x)|² dx < ∞

on all slightly smaller sets V, for |α| ≤ N + k. The reason is that for each x ∈ U we may apply the L² Sobolev inequality to ψf where ψ ∈ 𝒟 is supported on V and ψ ≡ 1 on a neighborhood of x. Similarly, if

    ∫_{|x|≤R} |(∂/∂x)^α f(x)|² dx < ∞

for all R < ∞ and all |α| ≤ N + k, then f is C^k on ℝⁿ. Of course the same remark applies to the L¹ Sobolev inequality.
The statement of the L^p Sobolev inequality for p > 1 is similar to the case p = 2, but the proof is more complicated and we will omit it.

Theorem 8.1.3
(L^p Sobolev inequality): Let N_p be the smallest integer greater than n/p, for fixed p > 1. If ‖(∂/∂x)^α f‖_p is finite for all |α| ≤ N_p, then f is bounded and continuous, and

    ‖f‖_∞ ≤ C Σ_{|α|≤N_p} ‖(∂/∂x)^α f‖_p.

More generally, if ‖(∂/∂x)^α f‖_p is finite for all |α| ≤ N_p + k, then f is C^k.

The L^p Sobolev inequalities are sharp in the sense that we cannot eliminate the requirement N_p > n/p, and we should emphasize that the inequality must be strict. But there is another sense in which they are flabby: if we are required to trade in strictly more than n/p derivatives, what have we gotten in return for the excess N_p − n/p? It turns out that we do get something, and we can make a precise statement as long as β = N_p − n/p < 1. We get Hölder continuity of order β:

    |f(x) − f(y)| ≤ C|x − y|^β.

The proof of this is not too difficult when p = 2 and n is odd, so N₂ = (n+1)/2 and β = 1/2 (if n is even then β = 1 and the result is false). We will in fact show

    |f(x) − f(y)| / |x − y|^{1/2} ≤ c Σ_{|α|≤N₂} ‖(∂/∂x)^α f‖₂.

We simply write out the Fourier inversion formula

    f(x) − f(y) = ∫ (e^{−ix·ξ} − e^{−iy·ξ}) f̂(ξ) dξ

and estimate

    |f(x) − f(y)| ≤ (∫ (1 + Σ_{j=1}^n |ξ_j|^{2N₂}) |f̂(ξ)|² dξ)^{1/2} · (∫ |e^{−ix·ξ} − e^{−iy·ξ}|² (1 + Σ_{j=1}^n |ξ_j|^{2N₂})^{−1} dξ)^{1/2}

using the Cauchy-Schwartz inequality. Since the first integral is finite (dominated by

    c Σ_{|α|≤N₂} ‖(∂/∂x)^α f‖₂²

as before), it suffices to show that the second integral is less than a multiple of |x − y|. To do this we use

    (1 + Σ_{j=1}^n |ξ_j|^{2N₂})^{−1} ≤ c |ξ|^{−n−1}

(this is actually a terrible estimate for small values of |ξ|, but it turns out not to matter). Then to estimate

    ∫ |e^{−ix·ξ} − e^{−iy·ξ}|² |ξ|^{−n−1} dξ

we break the integral into two pieces at |ξ| = |x − y|^{−1}. If |ξ| ≥ |x − y|^{−1} we dominate |e^{−ix·ξ} − e^{−iy·ξ}|² by 4, and obtain

    ∫_{|ξ|≥|x−y|^{−1}} |e^{−ix·ξ} − e^{−iy·ξ}|² |ξ|^{−n−1} dξ ≤ 4 ∫_{|ξ|≥|x−y|^{−1}} |ξ|^{−n−1} dξ = c|x − y|

after a change of variable ξ → |x − y|^{−1}ξ (the integral converges because −n − 1 < −n). On the other hand, if |ξ| ≤ |x − y|^{−1} we use the mean-value theorem to estimate

    |e^{−ix·ξ} − e^{−iy·ξ}| ≤ |x − y| |ξ|,

hence

    ∫_{|ξ|≤|x−y|^{−1}} |e^{−ix·ξ} − e^{−iy·ξ}|² |ξ|^{−n−1} dξ ≤ |x − y|² ∫_{|ξ|≤|x−y|^{−1}} |ξ|^{−n+1} dξ = c|x − y|

after the same change of variable (the integral converges this time because −n + 1 > −n). Adding the two estimates we obtain the desired Hölder continuity of order 1/2.
If you try the same game in even dimensions you will get stuck with a divergent integral for |ξ| ≤ |x − y|^{−1}. What is true in that case is the Zygmund class estimate

    |f(x + 2y) − 2f(x + y) + f(x)| ≤ c|y|,

which is somewhat weaker than the Lipschitz condition |f(x + y) − f(x)| ≤ c|y|, which does not hold.

8.2 Sobolev spaces


Because of the importance of the Sobolev inequalities, it is convenient to consider the collection of functions satisfying the hypotheses of such theorems as forming a space of functions, appropriately known as Sobolev spaces. Although the definition of these spaces has become standard, there seems to be a lack of agreement on how to denote them. Here I will write L^p_k to denote functions with derivatives of order up to k having finite L^p norm. The case p = 2 is especially simple, and it is customary to write L²_k = H^k; I don't know why. The trouble is that H is an overworked letter in mathematical nomenclature, and the symbol H^k could be easily interpreted as meaning something completely different (homology, Hardy space, hyperbolic space, for example, all with a good excuse for the H). The L in our notation stands for Lebesgue. Other common notation for Sobolev spaces includes W^{p,k}, and the placement of the two indices p and k is subject to numerous changes.

Definition 8.2.1
The Sobolev space L~ (Jm.n) is defined to be the space of functions on Jm.n such
that

is finite for all lal ::; k. Here k is a nonnegative integer and 1 ::; p < 00. The
Sobolev space norm is defined by

When k = 0 we write simply LP.

Of course, when we call it a Sobolev space norm we imply that it satisfies the three conditions of a norm described in section 8.1. The Sobolev spaces are examples of Banach spaces, which means they are complete with respect to their norms. When $p = 2$, $L_k^2$ is in fact a Hilbert space (this explains the positioning of the $p$th power in the definition). If you are not familiar with these concepts, don't worry; I won't be using them in what follows. On the other hand, they are important concepts, and these are good examples to keep in mind if you decide to study functional analysis, as the study of such spaces is called.
The Sobolev inequality theorems can now be stated succinctly as containment relationships between Sobolev spaces and $C^m$ spaces (the spaces of $C^m$ functions). They say $L_k^p(\mathbb{R}^n) \subseteq C^m(\mathbb{R}^n)$ provided
1. $p = 1$ and $k - n = m$, or
2. $p > 1$ and $k - n/p > m$.
For this reason they are sometimes called the Sobolev embedding theorems. There is another kind of Sobolev embedding theorem which tells you what happens when $k < n/p$, when you do not have enough derivatives to trade in to get continuity. Instead, what you get is a boost in the value of $p$.

Theorem 8.2.2
$L_k^p(\mathbb{R}^n) \subseteq L_m^q(\mathbb{R}^n)$ provided $1 \le p < q < \infty$ and $k - n/p \ge m - n/q$. We have the corresponding inequality $\|f\|_{L_m^q} \le c\|f\|_{L_k^p}$.
The theorem is most interesting in the case when we have equality $k - n/p = m - n/q$, in which case it is sharp.
164 Sobolev Theory and Microlocal Analysis

The $L^2$ Sobolev spaces are easily characterized in terms of Fourier transforms, and this point of view allows us to extend the definition to allow the parameter $k$ to assume noninteger values, and even negative values. We have already used this point of view in the proof of the $L^2$ Sobolev inequalities. The idea is that since

$$\left( \left( \frac{\partial}{\partial x} \right)^\alpha f \right)^\wedge (\xi) = (-i\xi)^\alpha \hat f(\xi)$$

we have

$$\left\| \left( \frac{\partial}{\partial x} \right)^\alpha f \right\|_{L^2} = (2\pi)^{-n/2} \big\| \xi^\alpha \hat f(\xi) \big\|_{L^2}$$

by the Plancherel formula; hence

$$\|f\|_{L_k^2} = (2\pi)^{-n/2} \left( \int \Big( \sum_{|\alpha| \le k} |\xi^\alpha|^2 \Big) |\hat f(\xi)|^2 \, d\xi \right)^{1/2}.$$

Now the function $\sum_{|\alpha| \le k} |\xi^\alpha|^2$ is a bit complicated, so it is customary to replace it by $(1 + |\xi|^2)^k$. Note that these two functions are of comparable size: there exist constants $c_1$ and $c_2$ such that

$$c_1 (1 + |\xi|^2)^k \le \sum_{|\alpha| \le k} |\xi^\alpha|^2 \le c_2 (1 + |\xi|^2)^k.$$

This means that $f \in L_k^2$ if and only if $\int (1 + |\xi|^2)^k |\hat f(\xi)|^2\, d\xi$ is finite. Also the square root of this integral is comparable in size to $\|f\|_{L_k^2}$. We say it gives an equivalent norm.
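The comparability of the two weights is easy to check numerically. Here is a small sketch (mine, not from the book) for $n = 2$, $k = 2$, where the ratio of $\sum_{|\alpha| \le k} |\xi^\alpha|^2$ to $(1 + |\xi|^2)^k$ stays between roughly $2/3$ and $1$:

```python
import numpy as np
from itertools import product

# A numerical sketch (mine, not from the book): compare the weight
# sum_{|alpha| <= k} |xi^alpha|^2 with (1 + |xi|^2)^k for n = 2, k = 2.
k = 2
grid = np.linspace(-50, 50, 201)
xi1, xi2 = np.meshgrid(grid, grid)

# all multi-indices alpha = (a1, a2) with a1 + a2 <= k
alphas = [(a1, a2) for a1, a2 in product(range(k + 1), repeat=2) if a1 + a2 <= k]
weight_sum = sum(np.abs(xi1 ** a1 * xi2 ** a2) ** 2 for a1, a2 in alphas)
weight_std = (1 + xi1 ** 2 + xi2 ** 2) ** k

ratio = weight_sum / weight_std
print(ratio.min(), ratio.max())   # stays between roughly 2/3 and 1
```

The upper bound $c_2 = 1$ here reflects the multinomial expansion of $(1 + |\xi|^2)^k$, which contains every term $|\xi^\alpha|^2$ with a coefficient at least one.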
There is no necessity for $k$ to be an integer in this condition. If $s \ge 0$ is real we may consider those functions in $L^2$ such that $\int (1 + |\xi|^2)^s |\hat f(\xi)|^2\, d\xi$ is finite to define the Sobolev space $L_s^2$. Notice that since $1 \le (1 + |\xi|^2)^s$, the finiteness of this integral implies $\int |\hat f(\xi)|^2\, d\xi < \infty$, which implies $f \in L^2$ by the Plancherel theorem. Thus there is no loss of generality in taking $f \in L^2$ in the first place. If we wish to consider $s < 0$, however, this is not the case. The Sobolev space $L_s^2$ will then be a space of distributions, not necessarily functions. We say that a tempered distribution $f$ is in $L_s^2$ for $s < 0$ if $\hat f$ corresponds to a function and $\int (1 + |\xi|^2)^s |\hat f(\xi)|^2\, d\xi$ is finite. The spaces $L_s^2$ decrease in size as $s$ increases. Since $(1 + |\xi|^2)^{s_1} \le (1 + |\xi|^2)^{s_2}$ if $s_1 \le s_2$, it follows that $L_{s_2}^2 \subseteq L_{s_1}^2$ (this is true regardless of the sign of $s_1$ or $s_2$). Functions in $L_s^2$ as $s$ increases become increasingly smooth, by the $L^2$ Sobolev inequalities. Similarly, as $s$ decreases, the distributions in $L_s^2$ may become increasingly rough.
At least locally, the Sobolev spaces give a scale of smoothness-roughness that describes every distribution. More precisely, if $f$ is a distribution of compact support, there must be some $s$ for which $f \in L_s^2$. We write this as

$$\mathcal{E}' \subseteq \bigcup_s L_s^2.$$

At the other extreme, if $f \in L_s^2$ for every $s$, then $f$ is $C^\infty$, or

$$\bigcap_s L_s^2 \subseteq C^\infty.$$

The meaning of the Sobolev spaces $L_s^2$ for noninteger $s$ can be given directly in terms of certain integrated Hölder conditions. For simplicity suppose $0 < s < 1$. A function $f \in L^2$ belongs to $L_s^2$ if and only if the integral

$$\int_{\mathbb{R}^n} \int_{\mathbb{R}^n} |f(x+y) - f(x)|^2\, \frac{dy\, dx}{|y|^{n+2s}}$$

is finite. The idea is that this integral is equal to a multiple of

$$\int |\hat f(\xi)|^2 |\xi|^{2s}\, d\xi,$$

and it is not hard to see that $\int |\hat f(\xi)|^2 (1 + |\xi|^2)^s\, d\xi$ is finite if and only if $\int |\hat f(\xi)|^2\, d\xi$ and $\int |\hat f(\xi)|^2 |\xi|^{2s}\, d\xi$ are finite.
We begin with the observation

$$\mathcal{F}_x\big( f(x+y) - f(x) \big)(\xi) = \hat f(\xi)\big( e^{-iy\cdot\xi} - 1 \big)$$

and so by the Plancherel formula

$$\int |f(x+y) - f(x)|^2\, dx = (2\pi)^{-n} \int |\hat f(\xi)|^2\, \big| e^{-iy\cdot\xi} - 1 \big|^2\, d\xi.$$

Thus, after interchanging the orders of integration, we have

$$\int\!\!\int |f(x+y) - f(x)|^2\, \frac{dx\, dy}{|y|^{n+2s}} = (2\pi)^{-n} \int \left( \int \frac{|e^{-iy\cdot\xi} - 1|^2}{|y|^{n+2s}}\, dy \right) |\hat f(\xi)|^2\, d\xi.$$

To complete the argument we have to show

$$\int \frac{|e^{-iy\cdot\xi} - 1|^2}{|y|^{n+2s}}\, dy = c|\xi|^{2s},$$

which follows from the following three facts:
1. the integral is finite,
2. it is homogeneous of degree $2s$ in $\xi$, and
3. it is radial,
because any finite function satisfying 2 and 3 must be a multiple of $|\xi|^{2s}$. The finiteness of the integral follows by analyzing separately the behavior of the integrand near $0$ and $\infty$. Near $\infty$, just use the boundedness of $|e^{-iy\cdot\xi} - 1|^2$ and $s > 0$. Near $0$, use $|e^{-iy\cdot\xi} - 1|^2 \le c|y|^2$ by the mean value theorem, and $s < 1$.

The homogeneity follows by the change of variable $y \to |\xi|^{-1} y$, and the fact that the integral is radial (rotation invariant) follows from the observation that the same rotation applied to $y$ and $\xi$ leaves the dot product $y \cdot \xi$ unchanged.
For $s > 1$ we write $s = k + s'$ where $k$ is an integer and $0 < s' < 1$. Then functions in $L_s^2$ are functions in $L_k^2$ whose derivatives of order $k$ are in $L_{s'}^2$, and then we can use the previous characterization.
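The key integral identity can be checked numerically in one dimension. The following sketch (an illustration of mine, with the arbitrary choices $s = 0.3$ and cutoff $Y = 1000$) uses $|e^{-iy\xi} - 1|^2 = 4\sin^2(y\xi/2)$ and verifies the homogeneity by comparing $I(2)/I(1)$ with $2^{2s}$:

```python
import numpy as np

# Numerical sketch (mine, not from the book) of the key identity in one
# dimension with s = 0.3: since |e^{-i y xi} - 1|^2 = 4 sin^2(y xi / 2),
# I(xi) = int |e^{-i y xi} - 1|^2 |y|^{-1-2s} dy should equal c |xi|^{2s}.
s = 0.3
Y = 1000.0                                 # cutoff for the numerical integral
y = np.linspace(1e-9, Y, 2_000_001)
dy = y[1] - y[0]

def I(xi):
    g = 4 * np.sin(y * xi / 2) ** 2 / y ** (1 + 2 * s)
    main = dy * (g.sum() - 0.5 * (g[0] + g[-1]))     # trapezoid rule on [0, Y]
    tail = Y ** (-2 * s) / s   # int_Y^inf 2 y^{-1-2s} dy; mean of 4 sin^2 is 2
    return 2 * (main + tail)   # factor 2: the integrand is even in y

ratio_hom = I(2.0) / I(1.0)
print(ratio_hom, 2 ** (2 * s))             # homogeneity of degree 2s in xi
```

The tail of the integral is replaced by its average, which is legitimate here because the oscillatory correction decays faster than the main term.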
An interpretation of $L_s^2$ for $s$ negative can be given in terms of duality, in the same sense that $\mathcal{D}'$ is the dual of $\mathcal{D}$. If say $s > 0$, then $L_{-s}^2$ is exactly the space of continuous linear functionals on $L_s^2$. If $f \in L_{-s}^2$ and $\varphi \in L_s^2$, the value of the linear functional $\langle f, \varphi \rangle$ can be expressed on the Fourier transform side by

$$\langle f, \varphi \rangle = (2\pi)^{-n} \int \hat f(\xi)\, \overline{\hat\varphi(\xi)}\, d\xi,$$

at least formally by the Plancherel formula (the appearance of the complex conjugate is slightly different from the convention we adopted for distributions, but it is not really significant). Of course the Plancherel formula is not really valid since $f$ is not in $L^2$, but we can use the Fourier transform integral as a definition of $\langle f, \varphi \rangle$. The crucial point is that $f \in L_{-s}^2$ and $\varphi \in L_s^2$ exactly implies that the integral exists and is finite. This follows by the Cauchy-Schwartz inequality, if we write

$$\hat f(\xi)\, \overline{\hat\varphi(\xi)} = \Big( \hat f(\xi)(1 + |\xi|^2)^{-s/2} \Big) \Big( \overline{\hat\varphi(\xi)}(1 + |\xi|^2)^{s/2} \Big).$$

Then we have

$$\left| \int \hat f(\xi)\, \overline{\hat\varphi(\xi)}\, d\xi \right| \le \left( \int |\hat f(\xi)|^2 (1 + |\xi|^2)^{-s}\, d\xi \right)^{1/2} \left( \int |\hat\varphi(\xi)|^2 (1 + |\xi|^2)^{s}\, d\xi \right)^{1/2} \le c \|f\|_{L_{-s}^2} \|\varphi\|_{L_s^2}.$$

This inequality also proves the continuity of the linear functional.

8.3 Elliptic partial differential equations (constant coefficients)


The Laplacian

$$\Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}$$

on $\mathbb{R}^n$ is an example of a partial differential operator that belongs to a class of operators called elliptic. A number of remarkable properties of the Laplacian

are shared by other elliptic operators. To explain what is going on we restrict attention to constant coefficient operators

$$P = \sum_{|\alpha| \le m} a_\alpha \left( \frac{\partial}{\partial x} \right)^\alpha$$

where the coefficients $a_\alpha$ are constant (we allow them to be complex). The number $m$, the highest order of the derivatives involved, is called the order of the operator. As we have seen, the Fourier transform of $Pu$ is obtained from $\hat u$ by multiplication by a polynomial,

$$(Pu)^\wedge(\xi) = p(\xi)\hat u(\xi)$$

where

$$p(\xi) = \sum_{|\alpha| \le m} a_\alpha (-i\xi)^\alpha.$$

This polynomial is called the full symbol of $P$, while

$$p_m(\xi) = \sum_{|\alpha| = m} a_\alpha (-i\xi)^\alpha$$

is called the top-order symbol. (To make matters confusing, the term symbol is sometimes used for one or the other of these; I will resist this temptation.) For example, the Laplacian has $-|\xi|^2$ as both full and top-order symbol. The operator $I - \Delta$ has full symbol $1 + |\xi|^2$ and top-order symbol $|\xi|^2$. Notice that the top-order symbol is always homogeneous of degree $m$.
We have already observed that the nonvanishing of the full symbol is extremely useful, for then we can solve the equation $Pu = f$ by taking Fourier transforms and dividing,

$$p(\xi)\hat u(\xi) = \hat f(\xi) \quad \text{so} \quad \hat u(\xi) = \frac{1}{p(\xi)}\hat f(\xi).$$

If the full symbol vanishes, this creates problems with the division. However, it turns out that the problems caused by zeroes for small $\xi$ are much more tractable than the problems caused by zeroes for large $\xi$. We will define the class of elliptic operators so that $p(\xi)$ has zeroes only for small $\xi$.

Definition 8.3.1
An operator of order $m$ is called elliptic if the top-order symbol $p_m(\xi)$ has no real zeroes except $\xi = 0$. Equivalently, if the full symbol satisfies $|p(\xi)| \ge c|\xi|^m$ for $|\xi| \ge A$, for some positive constants $c$ and $A$.

The equivalence is not immediately apparent, but is not difficult to establish. If $p_m(\xi) \ne 0$ for $\xi \ne 0$, then by homogeneity $|p_m(\xi)| \ge c_1 |\xi|^m$ where $c_1$ is the minimum of $|p_m(\xi)|$ on the sphere $|\xi| = 1$ (since the sphere is compact, the minimum value is assumed, so $c_1 > 0$). Since $p(\xi) = p_m(\xi) + q(\xi)$ where $q$ is a polynomial of degree $\le m - 1$, we have $|q(\xi)| \le c_2 |\xi|^{m-1}$ for $|\xi| \ge 1$, so $|p(\xi)| \ge |p_m(\xi)| - |q(\xi)| \ge \frac{1}{2} c_1 |\xi|^m$ if $|\xi| \ge A$ for $A = 2c_2/c_1$. Conversely, if $|p(\xi)| \ge c|\xi|^m$ for $|\xi| \ge A$ then a similar argument shows $|p_m(\xi)| \ge \frac{1}{2} c |\xi|^m$ for $|\xi| \ge A'$, and by homogeneity $p_m(\xi) \ne 0$ for $\xi \ne 0$.
From the first definition, it is clear that ellipticity depends only on the highest order derivatives; for an $m$th order operator, the terms of order $\le m - 1$ can be modified at will. We will now give examples of elliptic operators in terms of the highest order terms. For $m = 2$, $P = \sum_{|\alpha|=2} a_\alpha (\partial/\partial x)^\alpha$ with $a_\alpha$ real will be elliptic if the quadratic form $\sum_{|\alpha|=2} a_\alpha \xi^\alpha$ is positive (or negative) definite. For $n = 2$, this means the level sets $\sum_{|\alpha|=2} a_\alpha \xi^\alpha = \text{constant}$ are ellipses, hence the etymology of the term "elliptic." For $m = 1$, the Cauchy-Riemann operator

$$\frac{\partial}{\partial \bar z} = \frac{1}{2} \left( \frac{\partial}{\partial x} + i \frac{\partial}{\partial y} \right)$$

in $\mathbb{R}^2 = \mathbb{C}$ is elliptic, because the symbol is $-\frac{i}{2}(\xi + i\eta)$. It is not hard to see that for $m = 1$ there are no examples with real coefficients or for $n \ge 3$. Thus both harmonic and holomorphic functions are solutions to homogeneous elliptic equations $Pu = 0$. For our last example, we claim that when $n = 1$ every operator is elliptic, because $p_m(\xi) = a_m(-i\xi)^m$ vanishes only at $\xi = 0$.
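These examples are easy to test against Definition 8.3.1: by homogeneity it suffices to check whether the top-order symbol vanishes on the unit sphere. A quick numerical sketch (mine, not from the book):

```python
import numpy as np

# Numerical check (mine) of Definition 8.3.1: by homogeneity an operator is
# elliptic iff its top-order symbol has no zeroes on the unit sphere.
theta = np.linspace(0, 2 * np.pi, 10001)
xi1, xi2 = np.cos(theta), np.sin(theta)            # the unit sphere in R^2

laplacian      = -(xi1 ** 2 + xi2 ** 2)            # symbol of the Laplacian
cauchy_riemann = -0.5j * (xi1 + 1j * xi2)          # symbol of d/d(z-bar)
d_dx           = -1j * xi1                         # symbol of d/dx in R^2

for name, p in [("Laplacian", laplacian),
                ("Cauchy-Riemann", cauchy_riemann),
                ("d/dx", d_dx)]:
    print(name, "elliptic" if np.abs(p).min() > 1e-6 else "not elliptic")
```

The first two symbols stay bounded away from zero on the sphere, while the symbol of $\partial/\partial x$ vanishes at $\xi = (0, \pm 1)$, so it is not elliptic in $\mathbb{R}^2$.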
For elliptic operators, we would like to implement the division on the Fourier transform side algorithm for solving $Pu = f$. We have no problem with division by $p(\xi)$ for large $\xi$, but we still have possible problems if $p(\xi)$ has zeroes for small $\xi$. Rather than deal with this problem head on, we resort to a strategy that might be cynically described as "defining the problem away." We pose an easier problem that can be easily solved by the technique at hand. Instead of seeking a fundamental solution, which is a distribution $E$ satisfying $PE = \delta$ (so $E * f$ solves $Pu = f$, at least for $f$ having compact support, or some other condition that makes the convolution well defined), we seek a parametrix (the correct pronunciation of this almost unpronounceable word puts the accent on the first syllable), which is defined to be a distribution $F$ which solves $PF \approx \delta$, for the appropriate meaning of approximate equality. What should this be?
Let us write $PF = \delta + R$. Then the remainder $R$ should be small in some sense, you might think. But this is not quite the right idea. Instead of "small," we want the remainder to be "smooth." The reason is that we are mainly interested in the singularities of solutions, and convolution with a smooth function produces a smooth function, hence it does not contribute at all to the singularities. This point of view is one of the key ideas in microlocal analysis and might be described as the doctrine of microlocal myopia: pay attention to the singularities, and other issues will take care of themselves.

So we define a distribution $F$ to be a parametrix for $P$ if $PF = \delta + R$ where $R$ is a $C^\infty$ function. It is easy to produce a parametrix if $P$ is elliptic. Simply define $F$ via its Fourier transform:

$$\hat F(\xi) = \begin{cases} 1/p(\xi) & \text{if } |\xi| \ge A \\ 0 & \text{if } |\xi| < A. \end{cases}$$

Clearly $F$ is a tempered distribution since $|1/p(\xi)| \le c|\xi|^{-m}$ for $|\xi| \ge A$. Also $(PF)^\wedge = \chi(|\xi| \ge A)$, so $\hat R = (PF)^\wedge - 1 = -\chi(|\xi| < A)$ has compact support, hence $R$ is $C^\infty$.
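This construction can be imitated on a periodic grid, where the Fourier transform becomes the FFT. The following discrete stand-in (mine, not from the book, for $P = d^2/dx^2$ in one dimension with $A = 1$) confirms that the remainder $R * f$ in $P(F * f) = f + R * f$ is supported in $|\xi| < A$ on the Fourier side, i.e., it is band-limited and hence smooth:

```python
import numpy as np

# Discrete stand-in (mine, not from the book) on a periodic grid, for
# P = d^2/dx^2 with full symbol p(xi) = -xi^2 and cutoff A = 1.
N = 256
xi = np.fft.fftfreq(N, d=1.0 / N)          # integer frequencies on the circle
p = -(xi ** 2)
A = 1.0

Fhat = np.zeros(N)
mask = np.abs(xi) >= A
Fhat[mask] = 1.0 / p[mask]                 # Fhat = 1/p for |xi| >= A, else 0

rng = np.random.default_rng(0)
f = rng.standard_normal(N)
fhat = np.fft.fft(f)

PFf_hat = p * Fhat * fhat                  # Fourier side of P(F * f)
Rf_hat = PFf_hat - fhat                    # Fourier side of the remainder R * f

# R * f is supported in |xi| < A on the Fourier side: band-limited, hence smooth
print(np.abs(Rf_hat[mask]).max())          # ~ 0 (machine precision)
```

On the grid the only frequency with $|\xi| < 1$ is $\xi = 0$, so the remainder is simply minus the mean of $f$, a constant (and certainly smooth) function.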
So what good is this parametrix? Well, for one thing, it gives local solutions to the equation $Pu = f$. We cannot just set $u = F * f$, for then $P(F * f) = PF * f = f + R * f$. So instead we try $u = F * g$, where $g$ is closely related to $f$. Since $P(F * g) = g + R * g$, we need to solve $g + R * g = f$, or rather the local version $g + R * g = f$ on $U$, for some small open set $U$. This problem does not involve the differential operator $P$ anymore, only the integral operator of convolution with $R$. Here is how we solve it:
To localize to $U$, we choose a cut-off function $\psi \in \mathcal{D}$ which is one on a neighborhood of $U$ and is supported in a slightly larger open set $V$. If we can find $g$ such that $g + \psi R * g = \psi f$ on $\mathbb{R}^n$, then on $U$ (where $\psi \equiv 1$) this equation is $g + R * g = f$ as desired. We also need a second, larger cut-off function $\varphi \in \mathcal{D}$ which is one on a neighborhood of $V$, so that $\varphi\psi = \psi$. If $g$ has support in $V$, then $\varphi g = g$, so we can write our equation as

$$g + \psi R * (\varphi g) = \psi f.$$

Any solution of this equation with support in $V$ will give $u = F * g$ as a solution of $Pu = f$ on $U$.
Now let $\mathcal{R}$ stand for the integral operator $\mathcal{R}g = \psi R * (\varphi g)$. Specifically, this is an integral operator

$$\mathcal{R}g(x) = \int \psi(x) R(x-y) \varphi(y) g(y)\, dy$$

with kernel $\psi(x)R(x-y)\varphi(y)$, and $\mathcal{R}g$ always has support in $V$ since $\psi(x)$ has support in $V$. We are tempted to solve the equation $g + \mathcal{R}g = \psi f$ by the perturbation series $g = \sum_{k=0}^\infty (-1)^k \mathcal{R}^k(\psi f)$. In fact, if we take the neighborhoods $U$ and $V$ sufficiently small, this series converges to a solution (and the solution is supported in $V$). The reason for this is that the kernel $\psi(x)R(x-y)\varphi(y)$ actually does become small. The exact condition we need is

$$\int\!\!\int |\psi(x) R(x-y) \varphi(y)|^2\, dx\, dy < 1$$

and so the size of the sets $U$ and $V$ depends only on the size of $R$.
In summary, the parametrix leads to local solutions because the smooth remainder term, when localized, becomes small. Of course this argument did not really use the smoothness of $R$; it would be enough to have $R$ locally bounded.
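A discrete toy model (mine; the dimensions and random seed are arbitrary) shows the perturbation series at work: replace the kernel by a matrix $K$ whose Hilbert-Schmidt (Frobenius) norm is less than one, the analogue of the condition above.

```python
import numpy as np

# Toy discrete model (mine): solve g + K g = h by the perturbation series
# g = sum_k (-1)^k K^k h, valid when the Hilbert-Schmidt (Frobenius) norm
# of the kernel is < 1, the analogue of the condition in the text.
rng = np.random.default_rng(1)
N = 50
K = rng.standard_normal((N, N))
K *= 0.9 / np.linalg.norm(K)               # Frobenius norm 0.9 < 1
h = rng.standard_normal(N)

g = np.zeros(N)
term = h.copy()
for _ in range(200):                       # partial sums of the series
    g += term
    term = -K @ term

residual = np.linalg.norm(g + K @ g - h)
print(residual)                            # ~ 0: the series solves (I + K) g = h
```

The Hilbert-Schmidt bound dominates the operator norm, so each successive term of the series shrinks geometrically and the partial sums converge.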

For a more subtle application of the parametrix we return to the question of location of singularities. If $Pu = f$ and $f$ is singular on a closed set $K$, what can we say about the singularities of $u$? More precisely, we say that a distribution $f$ is $C^\infty$ on an open set $U$ if there exists a $C^\infty$ function $F$ on $U$ such that $\langle f, \varphi \rangle = \int F(x)\varphi(x)\, dx$ for all test functions supported in $U$, and we define the singular support of $f$ (written sing supp $f$) to be the complement of the union of all open sets on which $f$ is $C^\infty$. (Notice the analogy with the definition of "support," which is the complement of the union of all open sets on which $f$ vanishes.) By definition, the singular support is always a closed set. When we ask about the location of the singularities of a distribution, we mean: what is its singular support?
So we can rephrase our question: what is the relationship between sing supp $Pu$ and sing supp $u$? We claim that there is one obvious containment:

sing supp $Pu \subseteq$ sing supp $u$.

The reason for this is that if $u$ is $C^\infty$ on $U$, then so is $Pu$, so the complement of sing supp $Pu$ contains the complement of sing supp $u$; then taking complements reverses the containment. However, it is possible that applying the differential operator $P$ might "erase" some of the singularities of $u$. For example, if $u(x, y) = h(y)$ for some rough function $h$, then $u$ is not $C^\infty$ on any open set, so sing supp $u$ is the whole plane. But $(\partial/\partial x)u = 0$ so sing supp $(\partial/\partial x)u$ is empty. This example is made possible because $\partial/\partial x$ is not elliptic in $\mathbb{R}^2$. For elliptic operators we have the identity sing supp $Pu$ = sing supp $u$. This property is called hypoellipticity (a very unimaginative term that means "weaker than elliptic"). It says exactly that every solution to $Pu = f$ will be smooth whenever $f$ is smooth ($f$ is $C^\infty$ on $U$ implies $u$ is $C^\infty$ on $U$). In particular, if $f$ is $C^\infty$ everywhere then $u$ is $C^\infty$ everywhere. Note that this implies that harmonic and holomorphic functions are $C^\infty$. Even more, it says that if a distribution satisfies the Cauchy-Riemann equations in the distribution sense, then it corresponds to a holomorphic function. This can be thought of as a generalization of the classical theorem that a function that is complex differentiable at every point is continuously differentiable.
To prove hypoellipticity we have to show that if $Pu = f$ is $C^\infty$ on an open set $U$, then so is $u$. Now we will immediately localize by multiplying $u$ by $\varphi \in \mathcal{D}$ which is supported in $U$ and identically one on a slightly smaller open set $V$. Observe that $\varphi u = u$ on $V$ so $P(\varphi u) = g$ with $g = f$ on $V$. So $g$ is $C^\infty$ on $V$, and we would like to conclude $\varphi u$ is $C^\infty$ on $V$, which implies $u$ is $C^\infty$ on $V$ (because $u = \varphi u$ on $V$). Since $V$ can be varied in this argument, it follows that $u$ is $C^\infty$ on all of $U$. The point of the localization is that $\varphi u$ and $g$ have compact support.
Reverting to our old notation, it suffices to show that if $u$ and $f$ have compact support and $Pu = f$, then $f$ being $C^\infty$ on an open set $U$ implies $u$ is $C^\infty$ on $U$. Since $u$ and $f$ have compact support, we can convolve with the parametrix $F$ to obtain $F * Pu = F * f$. Since $u$ has compact support, we have $F * Pu =$

$P(F * u) = PF * u$ (if $u$ did not have compact support $F * u$ might not be defined). Now we can substitute $PF = \delta + R$ to obtain

$$u + R * u = F * f$$

(this is an approximate inverse equation in the reverse order). Since $R * u$ is $C^\infty$ everywhere, the singularities of $u$ and $F * f$ are the same. To complete the proof of hypoellipticity we need to show that convolution with $F$ preserves singularities: if $f$ is $C^\infty$ on $U$ then $F * f$ is $C^\infty$ on $U$. In terms of singular supports, this will follow if we can show that sing supp $F$ is the origin, i.e., that $F$ coincides with a $C^\infty$ function away from the origin. Already in this argument we have used the smoothness of the remainder $R$, and now we need an additional property of the parametrix itself.
Now we have an explicit formula for $\hat F$, and this yields a formula for $F$ by the Fourier inversion formula

$$F(x) = \frac{1}{(2\pi)^n} \int_{|\xi| \ge A} e^{-ix\cdot\xi}\, p(\xi)^{-1}\, d\xi.$$

The trouble with this formula is that we only know $|p(\xi)^{-1}| \le c|\xi|^{-m}$, and this in general leads to a divergent integral. Of course this is the way it would have to be, because $F(x)$ is usually singular at the origin, and we have not done anything yet to rule out setting $x = 0$. The trick is to multiply by $|x|^{2N}$ for a large value of $N$ (this will tend to mask singularities at the origin). We then have essentially

$$|x|^{2N} F(x) = \frac{1}{(2\pi)^n} \int_{|\xi| \ge A} e^{-ix\cdot\xi} (-\Delta_\xi)^N \big( p(\xi)^{-1} \big)\, d\xi$$

(we are ignoring the boundary terms at $|\xi| = A$, which are all $C^\infty$ functions). Now from the fact that $P$ is elliptic we can conclude that

$$\big| (-\Delta_\xi)^N \big( p(\xi)^{-1} \big) \big| \le c|\xi|^{-m-2N}$$

(for $p_m$ in place of $p$ this follows by homogeneity, each derivative reducing the homogeneity degree by one, and the lower order terms are too small to spoil such an estimate). This decay is fast enough to make the integral converge for $m + 2N > n$, and if $m + 2N > n + k$ it implies that $|x|^{2N} F(x)$ is $C^k$. This, of course, tells us that $F(x)$ is $C^k$ away from the origin (but not at $x = 0$), and by taking $N$ large we can take $k$ as large as we like. This shows sing supp $F$ is contained in the origin; hence it completes the proof of hypoellipticity.
We can think of hypoellipticity as a kind of qualitative property, since it involves an infinite number of derivatives. There are related quantitative properties, involving finite numbers of derivatives measured in terms of Sobolev spaces. These properties are valid for $L^p$ Sobolev spaces for all $p$ with $1 < p < \infty$, but we will discuss only the case $p = 2$. The idea is that the elliptic operators are honestly of order $m$ in all directions, so there can be no cancellation among derivatives. Since applying an operator of order $m$ loses $m$ derivatives, undoing it should gain $m$ derivatives. This idea is actually false for ordinary derivatives, but it does hold for Sobolev space derivatives.
The properties we are going to describe go under the unlikely name of a priori estimates (this Latin mouthful is pronounced ay pree oree).

Theorem 8.3.2
For $P$ an elliptic operator of order $m$, if $u \in L^2$ and $Pu \in L_k^2$ then $u \in L_{k+m}^2$ with

$$\|u\|_{L_{k+m}^2} \le c \big( \|u\|_{L^2} + \|Pu\|_{L_k^2} \big).$$

This is true for any $k \ge 0$ (including noninteger values).

To prove this a priori estimate we work entirely on the Fourier transform side. We are justified in taking Fourier transforms because $u \in L^2$. Now to show $u \in L_{k+m}^2$ we need to show that $\int |\hat u(\xi)|^2 (1+|\xi|^2)^{k+m}\, d\xi$ is finite. We break the integral into two parts, $|\xi| \le A$ and $|\xi| \ge A$. For $|\xi| \le A$ we use the fact that $u \in L^2$ and the bound $(1+|\xi|^2)^{k+m} \le (1+A^2)^{k+m}$ to estimate

$$\int_{|\xi| \le A} |\hat u(\xi)|^2 (1+|\xi|^2)^{k+m}\, d\xi \le (1+A^2)^{k+m} \int_{|\xi| \le A} |\hat u(\xi)|^2\, d\xi \le c \|u\|_{L^2}^2.$$

For $|\xi| \ge A$ we use $Pu \in L_k^2$ and $|p(\xi)| \ge c|\xi|^m$ to estimate

$$\int_{|\xi| \ge A} |\hat u(\xi)|^2 (1+|\xi|^2)^{k+m}\, d\xi \le c \int_{|\xi| \ge A} |p(\xi)\hat u(\xi)|^2 (1+|\xi|^2)^k\, d\xi \le c \int_{\mathbb{R}^n} |\widehat{Pu}(\xi)|^2 (1+|\xi|^2)^k\, d\xi = c \|Pu\|_{L_k^2}^2.$$

Together, these two estimates prove that $u \in L_{k+m}^2$ and give the desired a priori estimate.
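The proof translates directly into a finite computation on a periodic grid. This sketch (mine, not from the book) checks the estimate for $P = d^2/dx^2$, where running the argument with $A = 1$ gives the explicit constant $c = 2^{(k+m)/2}$:

```python
import numpy as np

# Discrete check (mine) of Theorem 8.3.2 for P = d^2/dx^2 on the circle
# (m = 2, p(xi) = -xi^2), with k = 1.  Following the proof with A = 1
# yields the explicit constant c = 2^{(k+m)/2}.
N, k, m = 512, 1, 2
xi = np.fft.fftfreq(N, d=1.0 / N)          # integer frequencies
rng = np.random.default_rng(2)
uhat = rng.standard_normal(N) / (1 + np.abs(xi)) ** 4   # some decaying u-hat

def sobolev_norm(vhat, s):
    return np.sqrt(np.sum((1 + xi ** 2) ** s * np.abs(vhat) ** 2))

Puhat = -(xi ** 2) * uhat
lhs = sobolev_norm(uhat, k + m)
rhs = 2 ** ((k + m) / 2) * (sobolev_norm(uhat, 0) + sobolev_norm(Puhat, k))
print(lhs <= rhs)                          # True
```

On the integer frequency grid every $\xi$ satisfies $|\xi| = 0$ or $|\xi| \ge 1$, so the two regimes of the proof cover all frequencies exactly.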
The hypothesis $u \in L^2$ perhaps seems unnatural, but the theorem is clearly false without it. There are plenty of global harmonic functions, $\Delta u = 0$ so $\Delta u \in L_k^2$ for all $k$, but none of them are even in $L^2$. You should think of $u \in L^2$ as a kind of minimal smoothness hypothesis (it could be weakened to just $u \in L_{-N}^2$ for some $N$); once you have this minimal smoothness, the exact Sobolev space $L_{k+m}^2$ for $u$ is given by the exact Sobolev space $L_k^2$ for $Pu$, and we get the gain of $m$ derivatives as predicted.
The a priori estimates can also be localized, but the story is more complicated. Suppose $u \in L^2$, $Pu = f$, and $f$ is in $L_k^2$ on an open set $U$ (just restrict all integrals to $U$). Then $u$ is in $L_{k+m}^2$ on a smaller open set $V$. The proof of this result is not easy, however, because when we try to localize by multiplying $u$ by $\varphi$, we lose control of $P(\varphi u)$. On the set where $\varphi$ is identically one we have $P(\varphi u) = Pu = f$, but in general the product rule for derivatives produces many terms, $P(\varphi u) = \varphi Pu +$ terms involving lower order derivatives of $u$. The idea of the proof is that the lower order terms are controlled by $Pu$ also, but we cannot give the details of this argument.
We have seen that the parametrix, and the ideas associated with it, yield a lot of interesting information. However, I will now show that it is possible to construct a fundamental solution after all. We will solve $Pu = f$ for $f \in \mathcal{D}$ as $u = \mathcal{F}^{-1}(p^{-1}\hat f)$ with the appropriate modifications to take care of the zeroes of $p$. It can be shown that the solution is of the form $E * f$ for a distribution $E$ (not necessarily tempered) satisfying $PE = \delta$, and then that $u = E * f$ solves $Pu = f$ for any $f \in \mathcal{E}'$. Thus we really are constructing a fundamental solution.
We recall that the Paley-Wiener theorem tells us that $\hat f$ is actually an analytic function. We use this observation to shift the integral in the Fourier inversion formula into the complex domain to get around the zeroes of $p$. We write $\xi' = (\xi_2, \ldots, \xi_n)$ so $\xi = (\xi_1, \xi')$. Then we will write the Fourier inversion formula as

$$u(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^{n-1}} \left( \int_{-\infty}^{\infty} \frac{\hat f(\xi_1, \xi')}{p(\xi_1, \xi')}\, e^{-ix_1\xi_1}\, d\xi_1 \right) e^{-ix'\cdot\xi'}\, d\xi'$$
and we do all the modifications on the inner, one-dimensional integral. We first


fix $\xi'$. Then $p(\zeta, \xi')$ is a polynomial of degree $m$ in the single complex variable $\zeta$. It has at most $m$ distinct complex zeroes, and if $\gamma$ is a path in the complex plane that does not contain any of these zeroes, and that coincides with the real axis outside the interval $[-A, A]$, such as

FIGURE 8.1 (a contour following the real axis except for a semicircular detour; the points $-B$, $-A$, $A$, $B$ are marked on the axis)

then the integral

$$\int_{\gamma} \frac{\hat f(\zeta, \xi')}{p(\zeta, \xi')}\, e^{-ix_1\zeta}\, d\zeta$$

converges absolutely. Indeed, for the portion of the integral that coincides with the real axis we may use the estimate $|p(\xi)^{-1}| \le c|\xi|^{-m}$ and the rapid decay of $\hat f(\xi)$ to estimate the integrand by $c_N(1 + |\xi_1|)^{-N}$ for any $N$. For the remainder of the contour (the semicircle) we encounter no zeroes of $p$, so the integrand is bounded and the path is finite.
The particular contour we choose will depend on $\xi'$, so we denote it by $\gamma(\xi')$. The formula for $u$ is thus

$$u(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^{n-1}} \left( \int_{\gamma(\xi')} \frac{\hat f(\zeta, \xi')}{p(\zeta, \xi')}\, e^{-ix_1\zeta}\, d\zeta \right) e^{-ix'\cdot\xi'}\, d\xi'.$$

When $|\xi'| \ge A$ we choose $\gamma(\xi')$ to be just the real axis, while for $|\xi'| \le A$ we choose $\gamma(\xi')$ among $m+1$ contours as described above with $B = A, A+1, \ldots, A+m$. Since the semicircles are all of distance at least one apart, at least one of them must be at a distance at least $\frac{1}{2}$ from all the $m$ zeroes of $p(\zeta, \xi')$. We choose that contour (or the one with the smallest semicircle if there is a choice). This implies not only that $p(\zeta, \xi')$ is not zero on the contours chosen, but there is a universal bound for $|p(\zeta, \xi')^{-1}|$ over all the semicircular arcs, and there is a universal bound for the lengths of these arcs (they are chosen from a finite number) and for the terms $\hat f(\zeta, \xi')$ and $e^{-ix_1\zeta}$ along these arcs. Thus the integral defining $u(x)$ converges and may be differentiated with respect to the $x$ variables. When we do this differentiation to compute $Pu$, we produce exactly a factor of $p(\zeta, \xi')$, which cancels the same factor in the denominator:

$$Pu(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^{n-1}} \left( \int_{\gamma(\xi')} \hat f(\zeta, \xi')\, e^{-ix_1\zeta}\, d\zeta \right) e^{-ix'\cdot\xi'}\, d\xi'.$$

But because $\hat f(\zeta, \xi')\, e^{-ix_1\zeta}$ is an analytic function in $\zeta$, we can use Cauchy's theorem to replace the contour $\gamma(\xi')$ by the real axis:

$$\int_{\gamma(\xi')} \hat f(\zeta, \xi')\, e^{-ix_1\zeta}\, d\zeta = \int_{-\infty}^{\infty} \hat f(\xi_1, \xi')\, e^{-ix_1\xi_1}\, d\xi_1,$$

and hence we have

$$Pu(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \hat f(\xi)\, e^{-ix\cdot\xi}\, d\xi,$$

which is $f(x)$ by the Fourier inversion formula.



Observe that we were able to shift the contour $\gamma(\xi')$ only after we had applied $P$ to eliminate $p$ in the denominator. For the original integral defining $u$, the zeroes of $p$ prevent you from shifting contours at will. Because we move into the complex domain, the fundamental solution $E$ constructed may not be given by a tempered distribution. More elaborate arguments show that fundamental solutions can be constructed that are in $\mathcal{S}'$, in fact for any constant coefficient operator (not necessarily elliptic). Although we have constructed a fundamental solution explicitly, the result is too complicated to have any algorithmic significance, except in very special cases.

8.4 Pseudodifferential operators


Our discussion of differential operators, up to this point, has been restricted to constant coefficient operators. For elliptic operators, everything we have done remains true for variable coefficient operators, if properly interpreted. However, the proofs are not as easy, so in this section we will only be able to give the broad outlines of the arguments. As a reward for venturing beyond the comfortable confines of the constant coefficient case, we will get a peek at the theory of pseudodifferential operators, which is one of the glorious achievements of mathematical analysis in the last quarter century.
A variable coefficient linear partial differential operator of order $m$ has the form

$$Pu(x) = \sum_{|\alpha| \le m} a_\alpha(x) \left( \frac{\partial}{\partial x} \right)^\alpha u(x)$$

with the coefficients $a_\alpha(x)$ being functions. We will assume $a_\alpha(x)$ are $C^\infty$ functions. (There is life outside the $C^\infty$ category, but it is considerably harder.) One very naive way of thinking of such an operator is to "freeze the coefficients." We fix a point $x$ and evaluate the functions $a_\alpha$ at $x$ to obtain the constant coefficient operator

$$\sum_{|\alpha| \le m} a_\alpha(x) \left( \frac{\partial}{\partial x} \right)^\alpha.$$

As $x$ varies we obtain a family of constant coefficient operators, and we are tempted to think that the behavior of the variable coefficient operator should be an amalgam of the behaviors of the constant coefficient operators. This kind of wishful thinking is very misleading; not only does it lead to incorrect conjectures, but it makes us overlook some entirely new phenomena that only show up in the variable coefficient setting. However, for elliptic operators this approach works very well.

We define the full symbol and top-order symbol by

$$p(x, \xi) = \sum_{|\alpha| \le m} a_\alpha(x) (-i\xi)^\alpha$$

and

$$p_m(x, \xi) = \sum_{|\alpha| = m} a_\alpha(x) (-i\xi)^\alpha.$$

Now these are functions of $2n$ variables, which are polynomials of degree $m$ in the $\xi$ variables and $C^\infty$ functions in both $x$ and $\xi$ variables. The operator is called elliptic if $p_m(x, \xi)$ does not vanish if $\xi \ne 0$, and uniformly elliptic if there exists a positive constant $c$ such that

$$|p_m(x, \xi)| \ge c|\xi|^m \quad \text{for all } x \text{ and } \xi$$

(such an estimate always holds for fixed $x$, and in fact for $x$ in any compact set).
Suppose we pursue the frozen coefficient paradigm and attempt to construct a parametrix. In the variable coefficient case we cannot expect convolution operators, so we will define a parametrix to be an operator $F$ such that

$$PFu = u + Ru$$

where $R$ is an integral operator with $C^\infty$ kernel $R(x, y)$:

$$Ru(x) = \int R(x, y) u(y)\, dy.$$

Since we are in a noncommutative setting we will also want

$$FPu = u + R_1 u$$

where $R_1$ is an operator of the same type as $R$. We could try to take for $F$ the parametrix for the frozen coefficient operator at each point:

$$Fu(x) = \frac{1}{(2\pi)^n} \int_{|\xi| \ge A(x)} \frac{\hat u(\xi)}{p(x, \xi)}\, e^{-ix\cdot\xi}\, d\xi$$

where $A(x)$ is chosen large enough that $p(x, \xi)$ has no zeroes in $|\xi| \ge A(x)$. It turns out that this guess is not too far off the mark.
The formula we have guessed for a parametrix is essentially an example of what is called a pseudodifferential operator (abbreviated $\psi$DO). By definition, a $\psi$DO is an operator of the form

$$Pu(x) = \frac{1}{(2\pi)^n} \int \sigma(x, \xi)\, \hat u(\xi)\, e^{-ix\cdot\xi}\, d\xi$$

where $\sigma(x, \xi)$, the symbol, belongs to a suitable symbol class. There are actually many different symbol classes in use; we will describe one that is known as the

classical symbols. (I will not attempt to justify the use of the term "classical" in mathematics; in current usage it seems to mean anything more than five years old.) A classical symbol of order $r$ (any real number) is a $C^\infty$ function $\sigma(x, \xi)$ that has an asymptotic expansion

$$\sigma(x, \xi) \sim \sigma_r(x, \xi) + \sigma_{r-1}(x, \xi) + \sigma_{r-2}(x, \xi) + \cdots$$

where each $\sigma_{r-k}(x, \xi)$ is a homogeneous function of degree $r - k$ in the $\xi$ variables away from the origin. Except for polynomials, it is impossible for a function to be both $C^\infty$ and homogeneous near the origin; thus we require $\sigma_{r-k}(x, t\xi) = t^{r-k} \sigma_{r-k}(x, \xi)$ only for $|\xi| \ge 1$ and $t \ge 1$. The meaning of the asymptotic expansion is that if we take a finite number of terms, they approximate $\sigma(x, \xi)$ in the following sense:

$$\left| \sigma(x, \xi) - \sum_{k=0}^{m} \sigma_{r-k}(x, \xi) \right| \le c|\xi|^{r-m-1} \quad \text{for } |\xi| \ge 1.$$

In other words, the difference decays at infinity in $\xi$ as fast as the next order in the asymptotic expansion. (There is also a related estimate for derivatives, with the rate of decay increasing with each $\xi$ derivative, but unchanged by $x$ derivatives.)
The simplest example of a classical symbol is a polynomial in $\xi$ of degree $r$ ($r$ a positive integer), in which case $\sigma_{r-k}(x, \xi)$ is just the homogeneous terms of degree $r - k$. The asymptotic sum is just a finite sum in this case, and we have equality

$$\sigma(x, \xi) = \sigma_r(x, \xi) + \sigma_{r-1}(x, \xi) + \cdots + \sigma_0(x, \xi).$$

The associated $\psi$DO is just the differential operator with full symbol $\sigma$: if $\sigma(x, \xi) = \sum_{|\alpha| \le r} a_\alpha(x)(-i\xi)^\alpha$ then

$$\frac{1}{(2\pi)^n} \int \sigma(x, \xi)\, \hat u(\xi)\, e^{-ix\cdot\xi}\, d\xi = \sum_{|\alpha| \le r} a_\alpha(x) \left( \frac{\partial}{\partial x} \right)^\alpha u(x).$$

Thus the class of pseudodifferential operators is a natural generalization of the differential operators.
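This identity can be verified on the circle, where the quantization integral becomes a finite sum over integer frequencies. The following sketch (mine; the choices $u = \sin 3x$ and $a(x) = 2 + \cos x$ are arbitrary) applies the formula with $\sigma(x, \xi) = a(x)(-i\xi)$ and compares the result with $a(x)u'(x)$:

```python
import numpy as np

# Discrete sketch (mine) of the quantization formula on the circle, checking
# that the psi-DO with symbol sigma(x, xi) = a(x)(-i xi) is a(x) d/dx.
# Conventions follow the book: u(x) = (2 pi)^{-1} int uhat(xi) e^{-i x xi} dxi.
N = 128
x = 2 * np.pi * np.arange(N) / N
xi = np.fft.fftfreq(N, d=1.0 / N)              # integer frequencies

u = np.sin(3 * x)                              # arbitrary test function
a = 2 + np.cos(x)                              # arbitrary smooth coefficient

uhat = 2 * np.pi * np.fft.ifft(u)              # uhat(xi) = int u(x) e^{i x xi} dx
sigma = a[:, None] * (-1j * xi[None, :])       # full symbol of a(x) d/dx
kernel = np.exp(-1j * np.outer(x, xi))         # e^{-i x xi}
Pu = (sigma * uhat[None, :] * kernel).sum(axis=1) / (2 * np.pi)

exact = a * 3 * np.cos(3 * x)                  # a(x) u'(x)
print(np.abs(Pu - exact).max())                # ~ 0
```

Since $u$ is band-limited, the discrete sum reproduces the integral exactly up to rounding error.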
The theory of pseudodifferential operators embraces the doctrine of microlocal myopia in the following way: two operators are considered equivalent if they differ by an integral operator with $C^\infty$ kernel ($\int R(x, y) u(y)\, dy$ for $R$ a $C^\infty$ function). Such an operator produces a $C^\infty$ output regardless of the input ($u$ may be a distribution) and is considered trivial. The portion of the symbol corresponding to small values of $\xi$ produces such a trivial operator. For this reason, in describing a $\psi$DO one frequently just specifies the symbol for $|\xi| \ge 1$. (In what follows we will ignore behavior for $|\xi| \le 1$, so when we say a function is "homogeneous," this means just for $|\xi| \ge 1$.) Also, from this point of view,

a parametrix is a two-sided inverse: $PF$ and $FP$ are equivalent to the identity operator.
We now describe, without proof, the basic properties of $\psi$DO's and their symbols:
1. (Closure under composition) If $P$ and $Q$ are $\psi$DO's of order $r$ and $s$ then $PQ$ is a $\psi$DO of order $r + s$. (Some technical assumptions are needed to ensure that the composition exists.) If $p(x, \xi)$ and $q(x, \xi)$ are the symbols of $P$ and $Q$, then the symbol of $PQ$ has the asymptotic expansion

$$\sum_\alpha \frac{1}{\alpha!} \left( \frac{\partial}{\partial x} \right)^\alpha p(x, \xi) \left( -i \frac{\partial}{\partial \xi} \right)^\alpha q(x, \xi).$$

Note that $(\partial/\partial x)^\alpha p(x, \xi)$ is a symbol of order $r$, and $(-i\, \partial/\partial \xi)^\alpha q(x, \xi)$ is a symbol of order $s - |\alpha|$, and their product has order $r + s - |\alpha|$. In particular, the top-order parts of the symbol multiply: $p_r(x, \xi) q_s(x, \xi)$ is the $r + s$ order term in the symbol of $PQ$. Since multiplication is commutative, the commutator $[P, Q] = PQ - QP$ has order $r + s - 1$. This is a familiar fact for differential operators.
2. (Symbolic completeness) Given any formal sum of symbols $\sum_{k=0}^\infty p_{r-k}(x, \xi)$ where $p_{r-k}(x, \xi)$ is homogeneous of order $r - k$, there exists a symbol $p(x, \xi)$ of order $r$ that has this asymptotic expansion. This property explains the meaning of the symbolic product for compositions.
3. (Pseudolocal property) If $u$ is $C^\infty$ in an open set, then so is $Pu$. Thus sing supp $Pu \subseteq$ sing supp $u$. This property is obvious for differential operators and was one of the key properties of the parametrix constructed in section 8.3. Another way of saying this is that a $\psi$DO is represented as an integral operator with a kernel that is $C^\infty$ away from the diagonal.
At least formally,

Pu(x) = (2π)^{−n} ∫ a(x,ξ) (∫ e^{iy·ξ} u(y) dy) e^{−ix·ξ} dξ
      = ∫ ((2π)^{−n} ∫ a(x,ξ) e^{−i(x−y)·ξ} dξ) u(y) dy

is an integral operator with kernel

(2π)^{−n} ∫ a(x,ξ) e^{−i(x−y)·ξ} dξ.
The interchange of the orders of integration is not valid in general (it is
valid if the order r of the operator satisfies r < −n). However, away from the
diagonal, x − y ≠ 0, and so the exponential e^{−i(x−y)·ξ} provides enough
cancellation that the kernel actually exists and is C∞, if the integral is
suitably interpreted (as the Fourier transform of a tempered distribution,
for example). As x approaches y, the kernel may blow up, although it
can be interpreted as a distribution on ℝ^{2n}.
4. (Invariance under change of variable) If h : ℝⁿ → ℝⁿ is C∞ with a
C∞ inverse h^{−1}, then (P(u ∘ h^{−1})) ∘ h is also a ψDO of the same
order, whose symbol is σ(h(x), (h′(x)ᵗ)^{−1}ξ) + lower order terms. This
shows that the class of pseudodifferential operators does not depend on
the coordinate system, and it leads to a theory of ψDO's on manifolds,
where the top-order symbol is interpreted as a function on the cotangent
bundle.
5. (Closure under adjoint) The adjoint of a ψDO is a ψDO of the same
order, whose symbol is the complex conjugate of σ(x,ξ), plus lower order terms.
6. (Sobolev space estimates) A ψDO of order r takes the L² Sobolev space L²_s
into L²_{s−r}, at least locally. If u ∈ L²_s and u has compact support, then
ψPu ∈ L²_{s−r} if ψ ∈ D. With additional hypotheses on the behavior of
the symbol in the x-variable, we have the global estimate

‖Pu‖_{L²_{s−r}} ≤ c‖u‖_{L²_s}.
Using these properties of ψDO's, we can now easily construct a parametrix
for an elliptic partial differential operator of order m, which is a ψDO of order
−m. If

p(x,ξ) = p_m(x,ξ) + p_{m−1}(x,ξ) + ⋯ + p_0(x,ξ)

is the symbol of the given differential operator P, write

q(x,ξ) ∼ q_{−m}(x,ξ) + q_{−m−1}(x,ξ) + q_{−m−2}(x,ξ) + ⋯

for the symbol of the desired parametrix Q. We want the composition QP to be


equivalent to the identity operator, so its symbol should be 1. But the symbol
of QP is given by the asymptotic formula

Σ_α (1/α!) (∂/∂x)^α q(x,ξ) (−i ∂/∂ξ)^α p(x,ξ).

Note that this is actually a finite sum, since

(−i ∂/∂ξ)^α p(x,ξ) = 0

if |α| > m, because p(x,ξ) is a polynomial of degree m in ξ. Substituting the
asymptotic expansion for q and the sum for p yields the asymptotic formula

Σ_{|α|≤m} Σ_{k=0}^∞ Σ_{j=0}^m (1/α!) (∂/∂x)^α q_{−m−k}(x,ξ) (−i ∂/∂ξ)^α p_{m−j}(x,ξ),



where each term is homogeneous of degree −k − j − |α|. Thus there are only a
finite number of terms of any fixed degree of homogeneity, the largest degree
being zero, for which there is just the one term q_{−m}(x,ξ)p_m(x,ξ). Since we
want the symbol to be 1, which has homogeneity zero, we set q_{−m}(x,ξ) =
1/p_m(x,ξ), which is well defined because P is elliptic, and has the correct
homogeneity −m. All the other terms involving q_{−m} have lower order homogeneity.
We next choose q_{−m−1} to kill the sum of the terms of homogeneity
−1. These terms are just

q_{−m−1}p_m + q_{−m}p_{m−1} + Σ_{|α|=1} (1/α!) (∂/∂x)^α q_{−m} (−i ∂/∂ξ)^α p_m,

and we can set this equal to zero and solve for q_{−m−1} (with the correct homogeneity),
again because we can divide by p_m. Continuing this process inductively,
we can solve for q_{−m−k} by setting the sum of terms of order −k equal
to zero, since this sum contains q_{−m−k}p_m plus terms already determined.
This process gives us an asymptotic expansion of q(x,ξ). By the symbolic
completeness of ψDO symbols, this implies that there actually is a symbol
with this expansion and a corresponding ψDO Q. By our construction QP has
symbol 1, hence it is equivalent to the identity as desired. A similar computation
shows that PQ has symbol 1, so Q is our desired parametrix.
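The first two steps of this recursion can be run symbolically for a simple model. The sketch below is an illustration only: it assumes the one-dimensional operator P = a(x) d²/dx², so that p₂(x,ξ) = −a(x)ξ² under the convention that ∂/∂x has symbol −iξ, and p₁ = p₀ = 0; it computes q₋₂ and q₋₃ and checks the homogeneity of the result.

```python
from sympy import symbols, Function, I, diff, simplify

x = symbols('x')
xi, t = symbols('xi t', positive=True)
a = Function('a')

p2 = -a(x) * xi**2        # top-order symbol of the model operator a(x) d^2/dx^2
q_m2 = 1 / p2             # leading term q_{-2} = 1/p_2

# the homogeneity -1 terms are q_{-3} p_2 + (d/dx q_{-2})(-i d/dxi p_2), since p_1 = 0;
# setting their sum to zero and dividing by p_2 gives q_{-3}
q_m3 = simplify(-diff(q_m2, x) * (-I * diff(p2, xi)) / p2)

# q_{-3} equals 2i a'(x) / (a(x)^2 xi^3), homogeneous of degree -3 in xi
assert simplify(q_m3 - 2 * I * diff(a(x), x) / (a(x)**2 * xi**3)) == 0
assert simplify(q_m3.subs(xi, t * xi) - t**-3 * q_m3) == 0
```

Each further step divides by p₂ once more, which is possible precisely because the operator is elliptic (a(x) never vanishes in this model).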
Once we have the existence of the parametrix, we may deduce the equivalents
of the properties of constant coefficient elliptic operators given in section 8.3 by
essentially the same reasoning. The local existence follows exactly as before,
because once we localize to a sufficiently small neighborhood, the remainder term
becomes small. The hypoellipticity follows from the pseudolocal property of
the parametrix. Local a priori estimates follow from the Sobolev space estimates
for 'ljJDO's (global a priori estimates require additional global assumptions on the
coefficients). However, there is no analog of the fundamental solution. Elliptic
equations are subject to the "index" phenomenon: existence and uniqueness in
expected situations may fail by finite-dimensional spaces. We will illustrate this
shortly.
Most situations in which elliptic equations arise involve either closed manifolds
(such as spheres, tori, etc.) or bounded regions with boundary. If Ω is a
bounded open set in ℝⁿ with a regular boundary Γ, a typical problem would
be to solve Pu = f on Ω with certain boundary conditions, the values of u
and some of its derivatives on the boundary, as given. Usually, the number of
boundary conditions is half the order of P, but clearly this expectation runs
into difficulties for the Cauchy–Riemann operator. We will not be able to discuss
such boundary value problems here, except to point out that one way to
approach them is to reduce the problem to solving certain pseudodifferential
equations on the boundary.
One example of a boundary value problem in which the index phenomenon
shows up is the Neumann problem for the Laplacian. Here we seek solutions of
Δu = 0 on Ω with ∂u/∂n = g given on Γ (∂/∂n refers to the normal derivative,
usually in the outward direction). To be specific, suppose Ω is the disk |x| < 1 in
ℝ² and Γ the unit circle, so ∂/∂n is just the radial derivative in polar coordinates.
It follows from Green's theorem that ∫₀^{2π} g(θ) dθ = 0 is a necessary and
sufficient condition for the solution to exist, and the solution is not unique since
we can always add a constant. Here we have a one-dimensional obstruction
to both existence and uniqueness, and the index is zero (it is defined to be the
difference of the two dimensions). The reason the index is considered, rather
than the two dimensions separately, is that it is relatively robust. It is unchanged
by lower order terms, and in fact depends only on certain topological properties
of the top-order symbol. The famous Atiyah–Singer Index Theorem explains the
relationship between the analytically defined index and the topological quantities
in the general situation.

8.5 Hyperbolic operators


Another important class of partial differential operators is the class of hyperbolic
operators. Before we can describe this class we need to introduce the
notion of characteristics. Suppose P = Σ_{|α|≤m} a_α(x)(∂/∂x)^α is an operator
of order m. We may ask in which directions it is really of order m. For ordinary
differential operators a_m(x)(d/dx)^m + ⋯ + a_0(x), the obvious condition
a_m(x) ≠ 0 is all that we need to be assured that the operator is everywhere of
order m. But in higher dimensions, the answer is more subtle. An operator like
the Laplacian (∂²/∂x²) + (∂²/∂y²) in ℝ² is clearly of order 2 in both the x and
y directions, since it involves pure second derivatives in both variables, while
the operator ∂²/∂x∂y is not of order 2 in either of these directions. Still, it is a
second-order operator, and if we take any slanted direction it will be of second
order. For example, the change of variable x + y = t, x − y = s transforms
∂/∂x into

(∂t/∂x)(∂/∂t) + (∂s/∂x)(∂/∂s) = ∂/∂t + ∂/∂s

and ∂/∂y into

(∂t/∂y)(∂/∂t) + (∂s/∂y)(∂/∂s) = ∂/∂t − ∂/∂s,

hence ∂²/∂x∂y into (∂²/∂t²) − (∂²/∂s²), which is clearly of order 2 in both
variables.
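This change of variables can be sanity-checked on plane waves with a short sympy sketch (the exponent parameters a, b are illustrative): for u = e^{a(x+y)+b(x−y)}, the operator ∂²/∂x∂y acts as multiplication by a² − b², exactly as ∂²/∂t² − ∂²/∂s² does in the coordinates t = x + y, s = x − y.

```python
from sympy import symbols, exp, diff, simplify

x, y, a, b = symbols('x y a b')
u = exp(a * (x + y) + b * (x - y))   # a plane wave in the coordinates t = x+y, s = x-y

lhs = diff(u, x, y)                  # d^2 u / dx dy
rhs = (a**2 - b**2) * u              # (d^2/dt^2 - d^2/ds^2) u for this u
assert simplify(lhs - rhs) == 0
```

Taking a ≠ ±b shows the operator really is of second order in the t and s directions, as claimed.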
In general, given an operator P, a point x, and a direction v, we make an
orthogonal change of variable so that v lies along one of the axes, say the x₁
axis. If the coefficient of (∂/∂x₁)^m (in the new coordinate system) is nonzero
at the point x, then it is reasonable to say that P is of order m at the point x in
the direction v. We use the term noncharacteristic to describe this situation,
so characteristic means the coefficient of (∂/∂x₁)^m is zero at x. From this
point of view, the notation ∂/∂x_j for partial derivatives is unfortunate. It is
incomplete in the sense that ∂/∂x₁ means the rate of change as x₁ is varied
while x₂, …, x_n are held fixed. It is much better to think in terms of directional
derivatives, say

d_v f(x) = lim_{h→0} (1/h)(f(x + hv) − f(x)).

If v₁, …, v_n is any linearly independent basis of ℝⁿ (not necessarily orthogonal),
then the chain rule implies that any first-order partial derivative is a linear
combination of d_{v₁}, …, d_{v_n}. Thus any linear partial differential operator of
order m can be written as Σ_{|α|≤m} b_α(x)(d_v)^α, where

(d_v)^α = (d_{v₁})^{α₁}(d_{v₂})^{α₂} ⋯ (d_{v_n})^{α_n}.

At a given point x, the characteristic directions v are those for which b_α(x) = 0
for α = (m, 0, …, 0) and v₁ = v, while the noncharacteristic directions are those
for which b_α(x) ≠ 0. It is easy to see by the chain rule that this definition only
depends on the direction v = v₁ and not on the other directions v₂, …, v_n in
the basis.
Still, it is rather awkward to have to recompute the operator in terms of a new
basis for every direction v in order to settle the issue. Fortunately, there is a
much simpler approach. We claim that the characteristic directions are exactly
those for which the top-order symbol vanishes, p_m(x,v) = 0. This is obvious if
v = (1, 0, …, 0), because p_m(x,v) = Σ_{|α|=m} a_α(x)(−iv)^α, and for this choice
of v we have (−iv)^α = 0 if any factor v₂, …, v_n appears to a nonzero power.
Thus the single term corresponding to α = (m, 0, …, 0) survives (lower order
terms are not part of the top-order symbol), p_m(x,v) = (−i)^m a_α(x), and we
are back to the previous definition. More generally, if

Σ_{|α|=m} a_α(x)(∂/∂x)^α = Σ_{|β|=m} b_β(x)(d_v)^β

for an orthonormal basis v₁, …, v_n with v₁ = v, then

Σ_{|α|=m} a_α(x)(−iξ)^α = Σ_{|β|=m} b_β(x)(−iv₁·ξ)^{β₁} ⋯ (−iv_n·ξ)^{β_n},

and so p_m(x,v) = (−i)^m b_β(x) for β = (m, 0, …, 0); thus b_β(x) ≠ 0 if and only
if p_m(x,v) ≠ 0.
With this criterion it is easy to compute characteristics. For example, the
definition of an elliptic operator is exactly that there are no characteristic directions.
For the operator ∂²/∂x∂y the x and y axes are the only characteristic
directions. For the wave operator (∂²/∂t²) − k²Δ_x, the characteristic directions
are of the form (τ,ξ) where τ = ±k|ξ|, and these form a double cone. For
constant coefficient operators the characteristic directions do not depend on the
point x, but this is not true in general.
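As a quick illustration of the criterion (a sketch; the sample directions are arbitrary), the top-order symbols of the Laplacian and of ∂²/∂x∂y in ℝ² can be evaluated directly, using the convention that ∂/∂x_j contributes a factor −iξ_j:

```python
# top-order symbols in R^2, with d/dx_j contributing -i*xi_j
p_laplace = lambda xi1, xi2: -(xi1**2) - xi2**2   # (-i*xi1)^2 + (-i*xi2)^2
p_mixed = lambda xi1, xi2: -(xi1 * xi2)           # (-i*xi1) * (-i*xi2)

# the Laplacian symbol never vanishes on nonzero real directions: no characteristics
for v in [(1, 0), (0, 1), (3, -4), (0.6, 0.8)]:
    assert p_laplace(*v) != 0

# d^2/(dx dy) is characteristic exactly on the coordinate axes
assert p_mixed(1, 0) == 0 and p_mixed(0, 1) == 0
assert p_mixed(1, 1) != 0
```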
Why is it important to know which directions are characteristic? For one
thing, it helps us understand the nature of boundary conditions. Suppose S is
a smooth hypersurface in ℝⁿ (hypersurface means of dimension n − 1, hence
codimension 1). The simplest example is the flat hypersurface x_n = 0. In fact,
the implicit function theorem says that locally every smooth hypersurface can
be given by the equation x_n = 0 in the appropriate coordinate system (of course
for this we must allow curvilinear coordinate systems). At each point of the
surface we have a normal direction (we have to choose among two opposites)
and n − 1 tangential directions. For the surface x_n = 0 the normal direction
is (0, …, 0, 1). We say that the hypersurface S is noncharacteristic for P if at
each point x in S, the normal direction is noncharacteristic. This means exactly
that the differential operator is of order m in the normal direction.
For noncharacteristic surfaces, it makes sense to consider the Cauchy problem:
find solutions to Pu = f that satisfy

u|_S = g₀, (∂/∂n)u|_S = g₁, …, (∂/∂n)^{m−1}u|_S = g_{m−1},

where f (defined on ℝⁿ) and g₀, g₁, …, g_{m−1} (defined on S) are called the Cauchy
data. The rationale for giving this sort of data is the following. Once we know
the value of u on S, we can compute all tangential first derivatives, so the
only first derivative it makes sense to specify is the normal one. Once all first
derivatives are known on S, we can again differentiate in tangential directions.
Thus, among second derivatives, only (∂/∂n)²u remains to be specified on S.
If we continue in this manner, we see that there are no obvious relationships
among the Cauchy data, and together they determine all derivatives of order
≤ m − 1 on S. Now we bring in the differential equation. Because S is
noncharacteristic, we can solve for (∂/∂n)^m u on S in terms of derivatives
already known on S. By repeatedly differentiating the differential equation, we
eventually obtain the value of all derivatives of u on S, with no consistency
conditions arising from two different computations of the same derivative. In
other words, the Cauchy data gives exactly the amount of information, with no
redundancy, needed to determine u to infinite order on S.
There is still the issue of whether knowing u to infinite order on S allows
us to solve the differential equation off S. In the real analytic case this is the
content of the famous Cauchy–Kovalevskaya Theorem. We have to assume that
all the Cauchy data, and the coefficients of P as well, are real analytic functions
(that means they have convergent power series expansions locally). The theorem
says that then there exist solutions of the Cauchy problem locally (in a small
enough neighborhood of any point of S), the solutions being real analytic and
unique. This is an example of a theorem that is too powerful for its own good.
Because it applies to such a large class of operators, its conclusions are too
weak to be very useful. In general, the solution may exist in only a very small
neighborhood of S, may not depend continuously on the data, and may fail
to exist altogether if the data is not analytic (even for C∞ data).
For this reason, it seems worthwhile to try to find a smaller class of operators
for which the Cauchy problem is well-posed. (By well-posed we mean that a
solution exists, is unique, and depends continuously on the data.)
This is the structural motivation for the class of hyperbolic equations. Of
course, to be useful, the definition of "hyperbolic" must be formalistic, only
involving properties of the symbol that are easily checked. In order to simplify
the discussion, we begin by assuming that the operator has constant coefficients
and contains only terms of the highest order m. In other words, the full symbol
is a polynomial p(ξ) that is homogeneous of degree m, hence equal to its top-order
symbol.
To give the definition of hyperbolic in this context, we look at the polynomial
in one complex variable z, p(ξ + zv), for each fixed ξ and a given direction v.
This is a polynomial of degree ≤ m, and is exactly of degree m when v is
noncharacteristic, the coefficient of z^m being p(v). In that case we can factor
the polynomial

p(ξ + zv) = p(v) Π_{j=1}^m (z − z_j(ξ)),

where the z_j(ξ) are the complex roots. We say P is hyperbolic in the direction v
if v is a noncharacteristic direction and all the roots z_j(ξ) are real, for all ξ.
The prototype example is the wave equation, which is hyperbolic in the t-direction.
The notation is slightly different, in that p(τ,ξ) = −τ² + k²|ξ|², so
v = (1, 0, …, 0) is the t-direction. Then

p((τ,ξ) + zv) = −(z + τ)² + k²|ξ|² = −(z + τ − k|ξ|)(z + τ + k|ξ|),

which clearly shows the two real zeroes.
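The factorization can be reproduced symbolically; in this sketch ξ is taken one-dimensional for simplicity and k > 0 is an assumed constant.

```python
from sympy import symbols, simplify, Abs

tau, xi, z = symbols('tau xi z', real=True)
k = symbols('k', positive=True)

# p((tau, xi) + z*(1, 0)) for the wave symbol p(tau, xi) = -tau^2 + k^2 |xi|^2
p_shifted = -(tau + z)**2 + k**2 * Abs(xi)**2

# both claimed zeroes z = -tau +- k|xi| are real and annihilate the polynomial
for root in (-tau - k * Abs(xi), -tau + k * Abs(xi)):
    assert simplify(p_shifted.subs(z, root)) == 0
```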


For hyperbolic operators of the type we are considering, it is easy to write
down a fundamental solution:

u(x) = (2π)^{−n} ∫ (f̂(ξ + iλv)/p(ξ + iλv)) e^{−ix·(ξ+iλv)} dξ

for λ ≠ 0 real is well defined and solves Pu = f for any f ∈ D. The idea is
that we do not encounter any zeroes of the symbol, and in fact we can estimate
p(ξ + iλv) from below: since p(ξ + iλv) = p(v) Π_{j=1}^m (iλ − z_j(ξ)) and z_j(ξ) is
real, so that |iλ − z_j(ξ)| ≥ |λ|, we obtain |p(ξ + iλv)| ≥ |p(v)||λ|^m. This estimate and
the Paley–Wiener estimates for f̂(ξ + iλv) guarantee that the integral defining u
converges and that we can differentiate with respect to x any number of times
inside the integral. Thus

Pu(x) = (2π)^{−n} ∫ f̂(ξ + iλv) e^{−ix·(ξ+iλv)} dξ.

By using the Cauchy integral formula we can shift the contour (one dimension
at a time) back to ℝⁿ, and then Pu = f by the Fourier inversion formula. In
fact, the same Cauchy integral formula argument can be applied to the integral
defining u to shift the value of λ, provided we do not try to cross the λ = 0
divide (where there are zeroes of the symbol). Thus our construction really
only produces two different fundamental solutions, corresponding to λ > 0 and
λ < 0. We denote them by E+ and E−.
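The key lower bound |p(ξ + iλv)| ≥ |p(v)||λ|^m can be spot-checked numerically for the wave symbol, where m = 2 and |p(v)| = 1 for v the t-direction; the random sampling below is only an illustration.

```python
import random

k = 2.0
random.seed(0)

def p_shifted(tau, xi, lam):
    # p((tau, xi) + i*lam*v) for v = (1, 0, 0, 0) and p = -tau^2 + k^2 |xi|^2
    return -(tau + 1j * lam)**2 + k**2 * sum(c * c for c in xi)

ok = True
for _ in range(1000):
    tau = random.uniform(-10, 10)
    xi = [random.uniform(-10, 10) for _ in range(3)]
    lam = random.uniform(-5, 5)
    # claimed bound: |p| >= |p(v)| * |lam|^m = lam^2
    ok = ok and abs(p_shifted(tau, xi, lam)) >= lam**2 - 1e-9
assert ok
```

The bound holds because p factors as (k|ξ| − τ − iλ)(k|ξ| + τ + iλ) and each factor has modulus at least |λ|.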
These turn out to be very special fundamental solutions. First, by a variant of
the Paley–Wiener theorem, we can show that E+ is supported in the half-space
x·v ≥ 0, and E− is supported in the other half-space x·v ≤ 0. (By "support"
of E± we mean the support of the distributions that give E± by convolution.)
But we can say even more. There is an open cone Γ that contains the direction
v, with the property that P is hyperbolic with respect to every direction in Γ.
The fundamental solutions E± are the same for every v ∈ Γ, so in fact the
support of E+ is contained in the dual cone Γ* defined by {x : x·v ≥ 0 for
all v ∈ Γ}, and the support of E− is contained in −Γ*. I use the word cone
here to mean a subset of ℝⁿ that contains an entire ray λx (λ > 0) for every
point x in the cone. It can also be shown that the cones Γ and Γ* are convex.
The cone Γ* is referred to as the forward light cone (or sometimes this term
is reserved for the boundary of Γ*, and Γ* is called the interior of the forward
light cone).
There are two ways to describe the cone Γ. The first is to take ℝⁿ and
remove all the characteristic directions; what is left breaks up into a number
of connected components, and Γ is the component containing v. For the other
description, we again look at the zeroes of the polynomial p(ξ + zv), which are
all real. Then ξ is in Γ if all the real roots are negative (note that v is in Γ
according to this definition, since p(v + zv) = p(v)(z + 1)^m, which has z = −1
as the only root). It is not obvious that these two descriptions coincide, nor that
P is hyperbolic with respect to all directions in Γ, but these facts can be proved
using the algebra of polynomials in ℝⁿ.
The fact that Γ is an open cone implies that the dual cone Γ* is proper, meaning
that it is properly contained in a half-space. In particular, this means that if
we slice the light cone by intersecting it with any hyperplane perpendicular to
v, we get a bounded set. This observation will have an interesting interpretation
concerning solutions of the hyperbolic equation.
We can illustrate these ideas with the example of the wave equation. The
characteristic directions were given by the equation τ² − k²|ξ|² = 0 in ℝ^{n+1} (τ ∈
ℝ, ξ ∈ ℝⁿ). The complement breaks up into 3 regions (or 4 regions if n = 1).
Γ is the region where τ > 0 and k|ξ| < |τ|. The other two regions are −Γ (τ < 0
and k|ξ| < |τ|) and the outer region where k|ξ| > |τ|.
Notice how the second description of Γ works in this example. The two
roots were computed to be −τ − k|ξ| and −τ + k|ξ|. If these are both to be
negative, then by summing we see τ > 0. Then −τ − k|ξ| < 0 is automatic
and −τ + k|ξ| < 0 is equivalent to k|ξ| < |τ|.

FIGURE 8.2

We can also compute Γ* for this example. For (t, x) to be in Γ* we must
have tτ + x·ξ ≥ 0 for all (τ,ξ) in Γ. Taking (τ,ξ) = (1, 0) shows that we
must have t ≥ 0. Now since |x·ξ| ≤ |x||ξ| ≤ |x|τ/k for (τ,ξ) in Γ, we have

tτ + x·ξ ≥ tτ − |x|τ/k = τ(t − |x|/k),

which will be nonnegative if |x| ≤ kt. However, if |x| > kt we can choose ξ in the direction
opposite x to make the first inequality an equality, and then we get a negative value.
Thus Γ* is given exactly by the conditions t ≥ 0 and |x| ≤ kt.
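This description of Γ* can be probed numerically. The sketch below uses one space dimension, where Γ = {τ > 0, k|ξ| < τ} and the claim is Γ* = {t ≥ 0, |x| ≤ kt}; the sample values are arbitrary.

```python
import random

k = 3.0
random.seed(1)

def pairing(t, x, tau, xi):
    return t * tau + x * xi

# points with t >= 0 and |x| <= k*t pair nonnegatively with the closure of Gamma
ok = True
for _ in range(500):
    t = random.uniform(0.0, 5.0)
    x = random.uniform(-k * t, k * t)
    tau = random.uniform(0.1, 5.0)
    xi = random.uniform(-tau / k, tau / k)
    ok = ok and pairing(t, x, tau, xi) >= -1e-9
assert ok

# a point with |x| > k*t fails for a suitable (tau, xi) in Gamma
assert pairing(1.0, 2 * k, 1.0, -0.99 / k) < 0
```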
Using the fundamental solutions E±, we can solve the Cauchy problem for
any hypersurface S that is spacelike. The definition of spacelike is that the
normal direction must lie in Γ. The simplest example is the flat hyperplane
x·v = 0, whose normal direction is v. In the example of the wave equation,
if the surface S is defined by the equation t = h(x), then the normal direction
is (1, −∇h(x)), so the condition that S be spacelike is that |∇h(x)| < 1/k
everywhere. We will assume that the spacelike surface is complete (it cannot
be extended further) and smooth, and that it divides the remaining points of
ℝⁿ into two pieces, a "past" and a "future," with the cone Γ pointing in the
future direction. In this situation it can be shown that the Cauchy problem is
well-posed. Because the argument is intricate we only give a special case.
Suppose we want to solve Pu = f with zero Cauchy data,

(∂/∂n)^k u|_S = 0, k = 0, …, m − 1,

where f is a C∞ function with support in the future. We claim that the solution
is simply u = E+ * f. We have seen that u solves Pu = f if f has compact
support. However, at any fixed point x, E+ * f only involves the values of f in
the set x − Γ*. For x in S or in the past, x − Γ* lies entirely in the past where
f is zero, so u vanishes on S and the past, hence vanishes to infinite order on
S. But for x in the future, x − Γ* intersects the future only in a bounded set,
as shown in Figure 8.3.

FIGURE 8.3 (the future portion of x − Γ*, bounded between S and the point x)
The convolution is well defined even if f does not have compact support. In
fact, this argument shows that the solution u at x in the future depends only
on the values of f in the future portion of x − Γ*. More generally, for the full
Cauchy problem Pu = f, (∂/∂n)^k u|_S = g_k, k = 0, …, m − 1, the solution
u(x) for x in the future depends only on the values of f in the future portion of
x − Γ*, and the values of g_k on the portion of S that lies in x − Γ*. (A similar
statement holds for x in the past, with the past portion of x + Γ* in place of
the future portion of x − Γ*.) This is the finite speed of propagation property
of hyperbolic equations, which was already discussed for the wave equation in
section 5.3.
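For the one-dimensional wave equation u_tt = k²u_xx this finite speed of propagation is visible directly in the d'Alembert formula; the sketch below (with an assumed bump-function initial datum g supported in [−1, 1] and zero initial velocity) checks that the solution vanishes outside |x| ≤ 1 + kt.

```python
import math

k = 2.0

def g(x):
    # smooth bump supported in [-1, 1]
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def u(x, t):
    # d'Alembert solution of u_tt = k^2 u_xx with u(., 0) = g, u_t(., 0) = 0
    return 0.5 * (g(x - k * t) + g(x + k * t))

t = 3.0
for x in (1 + k * t + 0.01, -(1 + k * t + 0.01), 50.0):
    assert u(x, t) == 0.0          # nothing propagates faster than speed k
assert u(k * t, t) > 0             # the signal has reached x = k*t
```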
We have described completely the definition of hyperbolicity for constant
coefficient equations with no lower order terms. The story in general is too
complicated to describe here. However, there is a special class of hyperbolic
operators, called strictly hyperbolic, which has the property that we can ignore
lower order terms and which can be defined in a naive frozen coefficients fashion.
Recall that for a noncharacteristic direction v, we required that the roots of the
polynomial p(ξ + zv) be real. For a strictly hyperbolic operator, we require in
addition that the roots be distinct, for ξ not a multiple of v. Since we want
to deal with the case of variable coefficient operators, we take the top-order
symbol p_m(x,ξ) and we say P is strictly hyperbolic at x in the direction v
if p_m(x,v) ≠ 0 and the roots of p_m(x, ξ + zv) are real and distinct for all
ξ not a multiple of v. We then define a strictly hyperbolic operator P to be
one for which there is a smooth vector field v(x) such that P is strictly hyperbolic
at x in the direction v(x). It is easy to see that the wave operator is strictly

hyperbolic because the two roots −τ − k|ξ| and −τ + k|ξ| are distinct if ξ ≠ 0
(this is exactly the condition that (τ,ξ) not be a multiple of (1,0)). For strictly
hyperbolic operators we have a theory very much like what was described above,
at least locally. Of course the cone Γ may vary from point to point, and the
analog of the light cone is a more complicated geometric object. In particular,
we can perturb the wave operator by adding terms of order 1 and 0, even with
variable coefficients, without essentially altering what was said above. We can
even handle a variable speed wave operator (∂²/∂t²) − k(x)²Δ_x where k(x) is
a smooth function, bounded and nonvanishing.
To say that the Cauchy problem is well-posed means, in addition to existence
and uniqueness, that the solution depends continuously on the data. This is
expressed in terms of a priori estimates involving L² Sobolev spaces. A typical
estimate (for strictly hyperbolic P) is

‖ψu‖_{L²_{m+k}} ≤ c(‖f‖_{L²_k} + Σ_{j=0}^{m−1} ‖g_j‖_{L²_{m+k−j−1/2}}),

where ψ ∈ D(ℝⁿ) is a localizing factor. Here, of course, the Sobolev norms
for u and f refer to the whole space ℝⁿ, while the Sobolev norms for g_j refer
to the (n − 1)-dimensional space S (we have only discussed this for the case of
flat S). This estimate shows the continuous dependence on data when applied
to the difference of two solutions. If u and v have data that are close together in
the appropriate Sobolev spaces, then u and v are close in Sobolev norm. Using
the Sobolev inequalities, we can get the conclusion that u is close to v in a
pointwise sense, at least locally.
If u is in L²_{k+m}, then Pu will be in L²_k, since P involves m derivatives, and
so the appearance of the L²_k norm for f on the right side of the a priori estimate
is quite natural. However, the Sobolev norms of the boundary terms seem to
be off by −1/2. In other words, since g_j = (∂/∂n)^j u|_S, we would expect
g_j to land in L²_{m+k−j}, not L²_{m+k−j−1/2}. But we can explain this discrepancy
because we are restricting to the hypersurface S. There is a general principle
that restriction of L² Sobolev spaces reduces the number of derivatives by 1/2
times the codimension (the codimension of a hypersurface being 1).
To illustrate this principle, consider the restriction of f ∈ L²_s(ℝⁿ) to the flat
hypersurface x_n = 0. We can regard this as a function on ℝ^{n−1}, and we want
to show that it is in L²_{s−1/2}(ℝ^{n−1}) if s > 1/2. Write Rf(x₁, …, x_{n−1}) =
f(x₁, …, x_{n−1}, 0). By the one-dimensional Fourier inversion formula we have

Rf(x₁, …, x_{n−1}) = (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ f(x) e^{ix_nξ_n} dx_n dξ_n,

and from this we obtain

(Rf)^(ξ₁, …, ξ_{n−1}) = (1/2π) ∫_{−∞}^∞ f̂(ξ₁, …, ξ_n) dξ_n.

So the Fourier transform of the restriction Rf is obtained from the Fourier transform of f by integrating
out the extra variable.

Now we apply the Cauchy–Schwarz inequality to the integral for (Rf)^,
writing f̂(ξ) = [f̂(ξ)(1 + |ξ|²)^{s/2}][(1 + |ξ|²)^{−s/2}]. We obtain

|(Rf)^(ξ₁, …, ξ_{n−1})|² ≤ (1/2π)² (∫_{−∞}^∞ |f̂(ξ)|²(1 + |ξ|²)^s dξ_n)(∫_{−∞}^∞ (1 + |ξ|²)^{−s} dξ_n).

Now a direct computation shows

∫_{−∞}^∞ (1 + |ξ|²)^{−s} dξ_n = c(1 + ξ₁² + ⋯ + ξ_{n−1}²)^{1/2−s}

if s > 1/2 (this follows by the change of variable ξ_n → (1 + ξ₁² + ⋯ +
ξ_{n−1}²)^{1/2} ξ_n). Thus we have

(1 + ξ₁² + ⋯ + ξ_{n−1}²)^{s−1/2} |(Rf)^(ξ₁, …, ξ_{n−1})|² ≤ c ∫_{−∞}^∞ |f̂(ξ)|²(1 + |ξ|²)^s dξ_n,

and integrating with respect to ξ₁, …, ξ_{n−1} we obtain

∫_{ℝ^{n−1}} (1 + ξ₁² + ⋯ + ξ_{n−1}²)^{s−1/2} |(Rf)^(ξ₁, …, ξ_{n−1})|² dξ₁ ⋯ dξ_{n−1}
≤ c ∫_{ℝⁿ} |f̂(ξ)|²(1 + |ξ|²)^s dξ.

This says exactly ‖Rf‖_{L²_{s−1/2}} ≤ c‖f‖_{L²_s}. Actually, the pointwise restriction may
not initially be well defined. To have f continuous, by the Sobolev inequalities,
we would have to assume s > n/2. With just s > 1/2, neither f nor Rf need
be continuous. Nevertheless, the a priori estimate ‖Rf‖_{L²_{s−1/2}} ≤ c‖f‖_{L²_s} shows
that the restriction operator can be consistently defined.
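The ξ_n-integral above can be confirmed for a concrete exponent with sympy (a sketch; here s = 2, and A stands for 1 + ξ₁² + ⋯ + ξ²_{n−1}), exhibiting the claimed power A^{1/2−s}:

```python
from sympy import symbols, integrate, oo, pi, Rational, simplify

xi_n = symbols('xi_n', real=True)
A = symbols('A', positive=True)   # A = 1 + xi_1^2 + ... + xi_{n-1}^2

# s = 2 > 1/2: integrate (1 + |xi|^2)^(-s) = (A + xi_n^2)^(-2) in xi_n
val = integrate((A + xi_n**2)**-2, (xi_n, -oo, oo))

# the result is a constant times A^(1/2 - s) = A^(-3/2)
assert simplify(val - pi / (2 * A**Rational(3, 2))) == 0
```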

8.6 The wave front set


If a function fails to be smooth, we can locate the singularity in space, and we
can further analyze the direction of the singularity. The key idea for doing this
is the wave front set. This is one place where it is better to think about spaces
of dimension greater than one right away; in one dimension there are only two
directions, and the ideas do not have immediate intuitive appeal.
For example, consider a function on the plane that is identically one on one
side of a smooth curve and zero on the other side.
The function is clearly smooth everywhere except along the curve, where it
has a jump discontinuity. If one had to choose a direction for the singularity of
FIGURE 8.4 (f = 1 on one side of the curve, f = 0 on the other, with the normal directions marked)

the function at a point on the curve, it would seem to be the normal direction at
that point. It would also seem reasonable to say that the function is smooth in
the tangential direction. There is less intuitive justification for deciding about all
the other directions. We are actually going to give a definition that decides that
this function is smooth in all directions except the two normal directions. Here
is one explanation for that decision. Choose a direction v, and try to smooth
the function out by averaging it in the directions perpendicular to v (in ℝⁿ this
will be an (n − 1)-dimensional space, but for n = 2 it is just a line). If you
get a smooth function, then the original function must have been smooth in the
v direction. Applying this criterion to our example, we see that averaging in
any direction other than the tangential direction will smooth out the singularity,
and so the function is declared to be smooth in all directions except the normal
ones. (Actually, this criterion is a slight oversimplification, because it does not
distinguish between v and -v.)
Let's consider another example, the function Ixl. The only singularity is at
the origin. Since the function is radial, all directions are equivalent at the origin,
so the function cannot be smooth in any direction there.
How do we actually define microlocal smoothness for a function (or distribution)?
First we localize in space. We take the function f and multiply by a
test function φ supported near the point x (we require φ(x) ≠ 0 to preserve the
behavior of the function f near x). If f were smooth near x, then φf would
be a C∞ function of compact support, and so (φf)^(ξ) would be rapidly
decreasing,

|(φf)^(ξ)| ≤ C_N(1 + |ξ|)^{−N} for all N.

Then we localize in frequency. There are many different directions to let ξ → ∞.
In some directions we may find rapid decrease, in others not. If we find an open
cone Γ containing the direction v such that

|(φf)^(ξ)| ≤ C_N(1 + |ξ|)^{−N} for all ξ ∈ Γ

for all N, then we say f is smooth microlocally at (x, v).

FIGURE 8.5

More precisely, the order of quantifiers is the following: f is microlocally
smooth at (x, v) if there exists φ ∈ D with φ(x) ≠ 0 and an open cone Γ
containing v such that for all N there exists C_N such that

|(φf)^(ξ)| ≤ C_N(1 + |ξ|)^{−N} for all ξ ∈ Γ.

The wave front set of f, denoted WF(f), is defined to be the complement of
the set of all (x, v) where f is microlocally smooth.
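The jump-function example can be probed numerically with a discrete Fourier transform (an illustrative numpy sketch; the grid size, the cutoff φ(x,y) = e^{−x²−y²}, and the test frequency are arbitrary choices): the transform of φf decays slowly in the direction normal to the jump and rapidly in the tangential direction.

```python
import numpy as np

# f jumps across the line x = 0; phi(x, y) = exp(-x^2 - y^2) is the cutoff
n, L = 256, 16.0
grid = np.linspace(-L / 2, L / 2, n, endpoint=False)
dx = L / n
X, Y = np.meshgrid(grid, grid, indexing="ij")
g = (X > 0) * np.exp(-X**2 - Y**2)          # phi * f, with phi nonzero at the origin

G = np.abs(np.fft.fft2(g))
freqs = 2 * np.pi * np.fft.fftfreq(n, d=dx)
i = int(np.argmin(np.abs(freqs - 10.0)))    # a frequency of magnitude about 10

amp_normal = G[i, 0]    # xi along the x-axis: normal to the jump, decays like 1/|xi|
amp_tangent = G[0, i]   # xi along the y-axis: tangential, Gaussian decay
assert amp_normal > 1000 * amp_tangent
```

The enormous ratio between the two amplitudes reflects the fact that (0, 0) together with the normal directions (±1, 0) lies in the wave front set, while the tangential directions do not.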
From the definition it is clear that the points where f is microlocally smooth
form an open set, since it contains all (x,ξ) where φ(x) ≠ 0 and ξ ∈ Γ.
Thus the wave front set is closed. It is also clearly conical in the ξ variable:
if (x,ξ) ∈ WF(f) then also (x, λξ) ∈ WF(f) for any λ > 0. There is one
consistency condition that should be checked in the definition. We have to allow
φ to be variable, because there might be some singularities of f that are very
near x that will be cut away by multiplying by φ with very small support, but
would not be cut away if the support of φ is too large. However, once we have
found a φ that works for showing f is microlocally smooth at (x, v), we would
like to know that all "smaller" cut-off functions would also work. This is in
fact true if we interpret "smaller" to mean ψφ for some other test function ψ.
The idea here is to use

(ψφf)^(ξ) = (2π)^{−n} ψ̂ * (φf)^(ξ).

Suppose we know |(φf)^(ξ)| ≤ C_N(1 + |ξ|)^{−N} for all ξ ∈ Γ. Then if we take
any smaller open cone Γ₁ ⊂ Γ, but still containing v, we will have a similar
estimate,

|(ψφf)^(ξ)| ≤ C′_N(1 + |ξ|)^{−N} for all ξ ∈ Γ₁.

(The constant C′_N in this estimate depends on Γ₁, and blows up if we try to take
FIGURE 8.6 (a ball of radius A about a far-out point of Γ₁ stays inside Γ, away from the complement of Γ)

Γ₁ = Γ.) The "proof by pictures" is to observe that as you go out to infinity in
Γ₁, the distance to the complement of Γ also goes to infinity. In the convolution
formula

ψ̂ * (φf)^(ξ) = ∫ ψ̂(η)(φf)^(ξ − η) dη

the function ψ̂(η) is rapidly decreasing in all directions, so it is very close to
having compact support. If we knew that ψ̂(η) were supported in |η| ≤ A, then
for ξ large enough in Γ₁ we would have ξ − η in Γ, so we could use the estimate
for (φf)^ to get

|ψ̂ * (φf)^(ξ)| ≤ c ∫_{|η|≤A} (1 + |ξ − η|)^{−N} dη,

and this is dominated by a multiple of (1 + |ξ|)^{−N} once |ξ| ≥ 2A. Since ψ̂ does
not actually have compact support, the argument is a little more complicated,
but this is the essential idea.
Although the definition of the wave front set is complicated, the concept turns
out to be very useful. We can already give a typical application. Multiplication
of two distributions is not defined in general. We have seen that it can be
defined if one or the other factor is a C^∞ function, and it should come as
no surprise that it can be defined if the singular supports are disjoint. If T₁
and T₂ are distributions with disjoint singular support, then T₁ is equal to a
C^∞ function on the complement of sing supp T₂, so T₁ · T₂ is defined there.
Similarly T₂ · T₁ is defined on the complement of sing supp T₁ because T₂ is
equal to a C^∞ function there, and by piecing the two together we get a product
defined everywhere. Now I claim that we can even define the product if the
singular supports overlap, provided that the wave front sets satisfy a separation
condition. The condition we need is the following: we never have both (x, v)
in WF(T₁) and (x, −v) in WF(T₂). The occurrence of the minus sign will be
clear from the proof.
To define T₁ · T₂ it suffices to define φ²T₁ · T₂ for test functions φ of sufficiently
small support, for then we can piece together T₁ · T₂ = Σ φ_j² T₁ · T₂ from such
test functions with Σ φ_j² ≡ 1. Now we will try to define φ²T₁ · T₂ as the inverse
Fourier transform of (2π)^{−n} (φT₁)^ * (φT₂)^. The problem is that the integral
defining the convolution may not make sense. Since φT₁ and φT₂ have compact
support, we do know that they satisfy estimates

    |(φT₁)^(ξ)| ≤ c(1 + |ξ|)^{N₁}

    |(φT₂)^(ξ)| ≤ c(1 + |ξ|)^{N₂}

for some N₁ and N₂. But

    (φT₁)^ * (φT₂)^(ξ) = ∫ (φT₁)^(ξ − η)(φT₂)^(η) dη

and these estimates will not make the integral converge. However, for any fixed
v, either (φT₂)^(η) is rapidly decreasing in an open cone Γ containing v, or
(φT₁)^(−η) is rapidly decreasing in Γ, if φ is taken with small enough support,
by the separation condition on the wave front sets. This is good enough to make
the portion of the integral over Γ converge absolutely (for every ξ). In the first
alternative

    ∫_Γ |(φT₁)^(ξ − η)(φT₂)^(η)| dη ≤ c ∫_Γ (1 + |ξ − η|)^{N₁} (1 + |η|)^{−N} dη

for any N, which will converge if N₁ − N < −n; in the second alternative we
need the observation that for fixed ξ, ξ − η will belong to −Γ if η is in a slightly
smaller cone Γ₁, for η large enough, so

    ∫_{Γ₁} |(φT₁)^(ξ − η)(φT₂)^(η)| dη ≤ c ∫_{Γ₁} (1 + |ξ − η|)^{−N} (1 + |η|)^{N₂} dη

and this converges if N₂ − N < −n. By piecing together these estimates, using
a compactness argument to show that we can cover ℝⁿ by a finite collection
of such cones, we see that the convolution is well defined. These estimates
also show that (φT₁)^ * (φT₂)^ is of polynomial growth in ξ, so the inverse
Fourier transform is also defined.


A more careful examination of the argument shows that the wave front set of
the product T₁ · T₂ must lie in the union of three sets: the wave front sets of T₁ and
T₂, and the set of (x, ξ₁ + ξ₂) where (x, ξ₁) ∈ WF(T₁) and (x, ξ₂) ∈ WF(T₂).
We will have more applications of the wave front set shortly, but now let's
go through the computation of the wave front set for the examples we discussed
from an intuitive point of view before. To be specific, let f(x, y) = χ(y > 0)
be the characteristic function of the upper half-plane. For any point not on the
x-axis, we can choose φ equal to one at the point and supported in the upper
or lower half-plane, so φf is C^∞. For a point (x₀, 0) on the x-axis, choose
φ(x, y) of the form φ₁(x)φ₂(y) where φ₁(x₀) = 1 and φ₂(0) = 1. Then
f(x, y)φ(x, y) = φ₁(x)h(y) where

    h(y) = φ₂(y) if y > 0,  h(y) = 0 if y ≤ 0.

(Since we are free to choose φ rather arbitrarily, we make a choice that will
simplify the computations.) The Fourier transform of φ · f is then φ̂₁(ξ)ĥ(η).
Now φ̂₁(ξ) is rapidly decreasing, but ĥ(η) is not, because h has a jump dis-
continuity. (In fact, ĥ(η) decays exactly at the rate O(|η|^{−1}), but this is not
significant.) Now suppose we choose a direction v = (v₁, v₂) and let (ξ, η) go to
infinity in this direction; say (ξ, η) = (tv₁, tv₂) with t → +∞. Then the Fourier
transform restricted to this ray will be φ̂₁(tv₁)ĥ(tv₂). As long as v₁ ≠ 0, the
rapid decay of φ̂₁ will make this product decay rapidly (for this it is enough
to have |ĥ(η)| ≤ c(1 + |η|)^N for some N, which holds automatically since h
has compact support). But if v₁ = 0 then φ̂₁ does not contribute any decay and
ĥ(tv₂) does not decay rapidly. This shows that f is microlocally smooth in all
directions except the y-axis, so WF(f) consists of the pairs ((x, 0), (0, v₂)): the
boundary curve and its normal directions. This agrees with our intuitive guess.
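The directional decay just described can be checked numerically. The sketch below is not from the text: the grid, the particular bump function, and the test frequencies 20 and 200 are arbitrary choices. It samples φ₁ as a standard C^∞ bump and h as the bump cut off at 0, approximates φ̂₁ and ĥ by the FFT, and compares their decay: φ̂₁ decays rapidly, while ĥ decays only like 1/|η| because of the jump.

```python
import numpy as np

def bump(t):
    # the standard C-infinity bump exp(-1/(1 - t^2)) on (-1, 1)
    out = np.zeros_like(t)
    m = np.abs(t) < 1
    out[m] = np.exp(-1.0 / (1.0 - t[m] ** 2))
    return out

n = 4096
t = np.linspace(-10.0, 10.0, n, endpoint=False)
dt = t[1] - t[0]
phi1 = bump(t)           # smooth, compact support
h = bump(t) * (t > 0)    # jump discontinuity at 0

omega = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dt))
phi1_hat = np.abs(np.fft.fftshift(np.fft.fft(phi1))) * dt
h_hat = np.abs(np.fft.fftshift(np.fft.fft(h))) * dt

def near(F, w, width=5.0):
    # largest value of |F| on the frequency band [w - width, w + width]
    return F[np.abs(omega - w) < width].max()

# phi1-hat decays faster than any power; h-hat only like 1/|eta|.
ratio_smooth = near(phi1_hat, 200.0) / near(phi1_hat, 20.0)
ratio_jump = near(h_hat, 200.0) / near(h_hat, 20.0)
```

Along a ray (tv₁, tv₂) with v₁ ≠ 0 the product φ̂₁(tv₁)ĥ(tv₂) inherits the rapid decay of φ̂₁; along (0, 1) only the slow 1/t decay of ĥ survives, which is exactly the wave front direction found above.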
Of course we took a very simple special case, where the boundary curve was
a straight line, and one of the axes in addition. For a more general curve the
computations are more difficult, and we prefer a roundabout approach. Any
smooth curve can be straightened out by a change of coordinates, so we would
like to know what happens to the wave front set under such a mapping. This
is an important question in its own right, because the answer will help explain
what kind of space the wave front set lies in.
Suppose we apply a linear change of variable y = Ax where A is an invertible
n × n matrix. Recall that if T is a distribution we defined T∘A as a distribution
by

    ⟨T∘A, φ⟩ = |det A|^{−1} ⟨T, φ∘A^{−1}⟩,

and if T is a distribution of compact support then

    (T∘A)^(ξ) = |det A|^{−1} ⟨T, e^{i(A^{−1}x)·ξ}⟩ = |det A|^{−1} T̂((A^{−1})^{tr} ξ)

because A^{−1}x · ξ = x · (A^{−1})^{tr} ξ. Now fix a point x and a direction ξ. If we
multiply T∘A by φ where φ(x) ≠ 0, this is the same as (ψT)∘A where
ψ = φ∘A^{−1} and ψ(y) = φ(A^{−1}y) = φ(x) ≠ 0 where y = Ax. Then

    (φ(T∘A))^(tξ) = ((ψT)∘A)^(tξ) = |det A|^{−1} (ψT)^(t(A^{−1})^{tr} ξ),

so (x, ξ) ∈ WF(T∘A) if and only if (Ax, (A^{−1})^{tr} ξ) ∈ WF(T). Note that if A
is a rotation, then (A^{−1})^{tr} = A and so the same rotation acts on both variables.
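The two identities used here, A^{−1}x · ξ = x · (A^{−1})^{tr} ξ in general and (A^{−1})^{tr} = A for a rotation, are easy to spot-check numerically. The matrix, vectors, and rotation angle below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)  # generically invertible
x = rng.normal(size=3)
xi = rng.normal(size=3)

Ainv = np.linalg.inv(A)
lhs = np.dot(Ainv @ x, xi)        # (A^{-1} x) . xi
rhs = np.dot(x, Ainv.T @ xi)      # x . (A^{-1})^{tr} xi

theta = 0.6                       # a rotation of the plane
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotation_case = np.allclose(np.linalg.inv(R).T, R)
```

The first identity is just the definition of the transpose, so `lhs` and `rhs` agree to rounding error, and `rotation_case` records that the inverse-transpose of an orthogonal matrix is the matrix itself.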
Also a rotation preserves perpendicularity, so our conclusion that the wave front
set of the characteristic function of the upper half-plane points perpendicular to
the boundary carries over to any half-plane. But to understand regions bounded
by curves we need to consider nonlinear coordinate changes.
Let g : ℝⁿ → ℝⁿ be a smooth mapping with smooth inverse g^{−1}. Then for
any distribution T we may define T∘g by

    ⟨T∘g, φ⟩ = ⟨T, J(g^{−1})(φ∘g^{−1})⟩,

where J(g) = |det g′| is the Jacobian. The analogous statement for the wave
front set is the following: (x, ξ) ∈ WF(T∘g) if and only if (g(x), (g′(x)^{−1})^{tr} ξ)
∈ WF(T). Note that this is consistent with the linear case because a linear
map is equal to its derivative.
This transformation law for wave front sets preserves perpendicularity to a
curve. Suppose x(t) is a curve and (x(t₀), ξ) is a point in WF(f∘g) with ξ
perpendicular to the curve, x′(t₀) · ξ = 0. Then (g(x(t₀)), (g′(x(t₀))^{−1})^{tr} ξ) is a
point in WF(f) with (g′(x(t₀))^{−1})^{tr} ξ perpendicular to the curve g(x(t)). This
follows because g′(x(t₀))x′(t₀) is the tangent to the curve at g(x(t₀)), and

    g′(x(t₀))x′(t₀) · (g′(x(t₀))^{−1})^{tr} ξ = (g′(x(t₀))^{−1} g′(x(t₀)) x′(t₀)) · ξ
                                            = x′(t₀) · ξ = 0.

So if f is the characteristic function of a region bounded by a curve γ(t), we
choose g to map the upper half-plane to this region, and the x-axis to the curve
γ(t). Since we know that WF(f∘g) consists of pairs (x, ξ) where x lies on
the x-axis and ξ is perpendicular to it, it follows that WF(f) consists of pairs
(x, ξ) where x lies on the curve γ(t) and ξ is perpendicular to it.

FIGURE 8.7
(the map g carries the upper half-plane, where f∘g = 1 above the x-axis and f∘g = 0 below, to the region bounded by the curve γ(t), where f = 1 inside and f = 0 outside)

The transformation property of the wave front set indicates that the second
variable ξ should be thought of as a cotangent vector, not a tangent vector. In
the more general context of manifolds, the wave front set is a subset of the
cotangent bundle. The distinction between tangents and cotangents is subtle,
but worth understanding. Tangents may be thought of as directions, or arrows.
The tangents at a fixed point (the base of the arrows) form a vector space. (Of
course if we are working in Euclidean space then the space is already a vector
space, so it is better to think about a surface, say the unit sphere in ℝ³. At
every point on the sphere, the tangent space is a two-dimensional vector space
of arrows tangent to the sphere at that point.) The cotangent space is the dual
vector space. Although a finite-dimensional vector space may be identified with
its dual, this identification is not canonical, but requires the choice of an inner
product (in the manifold context this is a Riemannian metric). In our case, the
ξ gives rise to a linear function on tangent directions v → v · ξ, but is not itself
a tangent direction. This explains why the intuitive motivation for the definition
of wave front set in terms of singularities in "directions" is so complicated.
The "directions" are not really tangent directions after all. (The same is true of
characteristic "directions" for differential operators, by the way.) However, it
does make sense to talk about the subspace of directions perpendicular to ξ (in
ℝ² this is a one-dimensional space), since this is defined by v · ξ = 0.
Now we consider another application of the wave front set, this time to the
question of restrictions of distributions to lower dimensional surfaces. This is
even a problem for functions. In the examples we have been considering of
a function with a jump discontinuity along a curve, it does not make sense to
restrict the function to the curve, since its value is in some sense undefined
along the whole curve. However, it does make sense to restrict it to a different
curve that cuts across it at an angle, for then there is only one ambiguous point
along the curve. In fact, it turns out that restrictions can be defined, even for
distributions, as long as the normal directions to the surface are not in the wave
front set of the distribution.
We illustrate this principle in the special case of restrictions from the plane
to the x-axis. Suppose T is a distribution on ℝ² whose wave front set does
not contain any pair ((x, 0), (0, t)) consisting of a point on the x-axis and a
cotangent perpendicular to it. Then we claim there is a natural way to define
the restriction T(x, 0) as a distribution on the line. It suffices to define the
restriction of φT for φ a test function of small support, for then we can piece
together the restrictions of φ_j T where Σ φ_j ≡ 1 and get the restriction of T.
Now for any point on the x-axis, if we take the support of φ in a small enough
neighborhood of the point, then (φT)^ will be rapidly decreasing in an open
cone Γ containing the vector (0, 1), and similarly for −Γ containing (0, −1),
since by assumption T is microlocally smooth in these cotangent directions. To
define the restriction of φT we will define its Fourier transform. But we have
seen that the Fourier transform of the restriction should be given by integrating
the Fourier transform of φT. In other words, if R denotes the restriction, we
should have

    (R(φT))^(ξ) = (1/2π) ∫_{−∞}^{∞} (φT)^(ξ, η) dη.



FIGURE 8.8

What we claim is that this integral converges absolutely. This will enable us
to define (R(φT))^ by this formula, and that is the definition of R(φT) as a
distribution on ℝ¹. The proof of the convergence of the integral follows from
the picture, which shows that for any fixed ξ, the vertical line (ξ, η) eventually lies
in Γ or −Γ as η → ±∞.
Once inside Γ or −Γ, the rapid decay guarantees the convergence, and the
finite part of the line not in these two cones just produces a finite contribution
to the integral, since (φT)^ is continuous. A more careful estimate shows that
(R(φT))^(ξ) has polynomial growth, so the inverse Fourier transform in 𝒮′ is
well defined.
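For a smooth, rapidly decreasing f (so the wave front set is empty and the restriction certainly exists) the restriction formula can be tested on a grid; in the discrete setting the identity holds exactly, by discrete Fourier inversion in the η variable. The grid size and the particular Gaussian below are arbitrary choices, not from the text.

```python
import numpy as np

N, L = 256, 16.0
dx = L / N
x = -L / 2 + dx * np.arange(N)
X, Y = np.meshgrid(x, x, indexing="ij")
f = np.exp(-(X**2 + X * Y + Y**2))     # smooth and rapidly decreasing

# 2D Fourier transform on the grid; the phase factor accounts for the
# grid starting at -L/2 rather than 0.
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)
F = np.fft.fft2(f) * dx * dx
F = F * np.exp(-1j * (xi[:, None] * x[0] + xi[None, :] * x[0]))

# (R f)^(xi) = (1/2 pi) * integral of f^(xi, eta) d eta
d_eta = 2 * np.pi / (N * dx)
R_hat = F.sum(axis=1) * d_eta / (2 * np.pi)

# Fourier transform of the restriction f(x, 0), computed directly.
G_hat = np.fft.fft(f[:, N // 2]) * dx * np.exp(-1j * xi * x[0])

err = np.max(np.abs(R_hat - G_hat))
```

The exact agreement here is a discrete artifact (summing the FFT over one index inverts the transform in that variable); on the continuum side the same computation is just Fourier inversion in y evaluated at y = 0.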
To prove the restriction property for a general curve in the plane, we simply
apply a change of coordinates to straighten out the curve. The hypothesis that the
normal cotangents not be in the wave front set is preserved by the transformation
property of wave front sets, as we have already observed. A similar argument
works for the restriction from ℝⁿ to any smooth k-dimensional surface. Note
that in this case the normal directions to a point on the surface form a subspace
of dimension n − k of the cotangent space.
A more refined version of this argument allows us to locate the wave front set
of the restriction. Suppose x is a point on the surface S to which we restrict T.
If we take the cotangents ξ at x for which (x, ξ) ∈ WF(T) and project them
orthogonally onto the cotangent directions to S, then we get the cotangents that
may lie in the wave front set of the restriction of T to S at x. In other words,
if we have a cotangent η to S at x that is not the projection of any cotangent
in the wave front set of T at x, then RT is microlocally smooth at (x, η).

8.7 Microlocal analysis of singularities


Where do the singularities go when you solve a differential equation? We can
now begin to answer this question on a microlocal level. The first observation
is that the wave front set is not increased by applying a differential operator,

    WF(Pu) ⊆ WF(u).

This is analogous to the fact that P is a local operator (the singular support is
not increased), and so we say that any operator with this property is a microlocal
operator. In fact it is true that pseudodifferential operators are also microlocal.
We illustrate this principle by considering the case of a constant coefficient
differential operator. The microlocal property is equivalent to the statement that
if u is microlocally smooth at (x, ξ) then so is Pu. Now the idea is that if
(φu)^ is rapidly decreasing in an open cone Γ, then so is (P(φu))^, since this
is just the product of (φu)^ with the polynomial p. This is almost the entire
story, except that what we have to show is that (φPu)^ is rapidly decreasing
in Γ, and φPu ≠ P(φu). In this case this is just a minor inconvenience,
since φPu = P(φu) + Σ φ_j Q_j u where the Q_j are lower order constant coefficient
differential operators and the φ_j are test functions with the same support as φ. Thus
we prove the assertion by induction on the order of the differential operator, and
the induction hypothesis takes care of the φ_j Q_j u terms.
Since applying a differential operator does not increase the wave front set,
the next natural question to ask is, can it decrease it? For elliptic operators,
the answer is no. This is the microlocal analog of hypoellipticity and so is
called microlocal hypoellipticity. It is easy to understand this fact in terms of
the parametrix. If P is an elliptic operator and Q is the ψDO parametrix, so
QPu = u + Ru with Ru a C^∞ function, then the microlocal nature of Q implies

    WF(u) = WF(QPu) ⊆ WF(Pu).

So for elliptic operators WF(Pu) = WF(u). For nonelliptic operators we can
say where WF(u) may be located, in terms of the characteristics of P. Recall
that we said the cotangent ξ is characteristic for P at x if P_m(x, ξ) = 0, where
P_m is the top-order symbol of P. Let's call char(P) the set of all pairs (x, ξ)
where this happens.

Theorem 8.7.1
WF(u) ⊆ WF(Pu) ∪ char(P). Another way of stating this theorem is that if
Pu is microlocally smooth at (x, ξ), and ξ is noncharacteristic at x, then u is
microlocally smooth at (x, ξ).

The proof of this fact is just a variation on the construction of a parametrix
for an elliptic operator. In this case we construct a ψDO of order −m which
serves as a microlocal parametrix, so QP is not equal to the identity plus R, but
rather QP = L + R where L is a microlocalizer, restricting to a neighborhood
of the point x and an open cone Γ containing ξ. By its construction, Lu and
u will have the same microlocal behavior at (x, ξ), Ru is always smooth, and
QPu is microlocally smooth at (x, ξ) because of the microlocal property of Q.
In constructing Q we simply use the nonvanishing and the homogeneity of P_m
in a microlocal neighborhood of (x, ξ).
We can use this theorem to make sense of the boundary values in the Cauchy
problem. If S is a noncharacteristic surface for P, then none of the normal
directions to S are contained in char(P). If Pu = 0 (or Pu = f for smooth
f) then WF(u) ⊆ char(P) by the theorem. Thus WF(u) contains none of the
normals to S, so the restriction to S is well defined. Since the wave front set
of any derivative of u is contained in WF(u), the same argument shows that
boundary values of derivatives of any order are also well defined.
We are now going to consider a more refined statement about the singularities
of solutions of Pu = 0 (or Pu = f with smooth f). We know they lie in
char(P), but we would like to be able to say more, since char(P) may be
a very large set. We will break char(P) up into a disjoint union of curves
called bicharacteristics. Then we will have an all-or-nothing dichotomy: either
WF(u) contains the whole bicharacteristic or no points of it. This is paraphrased
by saying singularities propagate along bicharacteristics. To do this we have to
restrict attention to a class of operators called operators of principal type. This is
a rather broad class of operators that includes both elliptic operators and strictly
hyperbolic operators. The definition of principal type is that the characteristics
be simple zeroes of the top-order symbol: if P_m(x, ξ) = 0 for ξ ≠ 0 then
∇_ξ P_m(x, ξ) ≠ 0. Note that this definition depends only on the highest order
terms, and it has a "frozen coefficient" nature in the dependence on x. The
intuition is that operators of principal type have the property that the highest
order terms "dominate" the lower order terms. We will also assume that the
operator has real-valued coefficients, so i^m P_m(x, ξ) is a real-valued function.
The definition of bicharacteristic in general (for operators with real-valued
coefficients) is a curve (x(t), ξ(t)) which satisfies the Hamilton equations

    dx_j(t)/dt = i^m (∂P_m/∂ξ_j)(x(t), ξ(t)),   j = 1, …, n

    dξ_j(t)/dt = −i^m (∂P_m/∂x_j)(x(t), ξ(t)),   j = 1, …, n.

It is important to realize that this is a system of first-order ordinary differential
equations in the 2n unknown functions x_j(t) and ξ_j(t) (the partial derivatives on
the right are applied to the known function P_m(x, ξ)). Therefore, the existence
and uniqueness theorem for ordinary differential equations implies that there is
a unique solution for every initial value of x and ξ. Now a simple computation
using the chain rule shows that P_m is constant along a bicharacteristic curve,
because

    (d/dt) P_m(x(t), ξ(t)) = Σ_{j=1}^{n} ( (∂P_m/∂x_j)(x(t), ξ(t)) dx_j(t)/dt
                                          + (∂P_m/∂ξ_j)(x(t), ξ(t)) dξ_j(t)/dt )

    = (−i)^m Σ_{j=1}^{n} ( −(dξ_j(t)/dt)(dx_j(t)/dt) + (dx_j(t)/dt)(dξ_j(t)/dt) ) = 0.
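This conservation can be watched numerically by integrating the Hamilton system. The sketch below is not from the text: the variable speed k(x) = 1 + (1/4) sin x, the initial data, and the step size are made-up choices, and the factor i^m is dropped, since it only rescales the parameter along the curve. It integrates the bicharacteristic system for the variable speed wave operator with top-order symbol P₂ = −τ² + k(x)²ξ², starting on the characteristic variety, and checks that P₂ stays at zero.

```python
import numpy as np

def k(x):                      # a made-up variable propagation speed
    return 1.0 + 0.25 * np.sin(x)

def kp(x):                     # its derivative
    return 0.25 * np.cos(x)

def p2(z):                     # top-order symbol P_2 = -tau^2 + k(x)^2 xi^2
    _, x, tau, xi = z
    return -tau**2 + k(x) ** 2 * xi**2

def hamilton(z):               # Hamilton's equations for z = (t, x, tau, xi)
    _, x, tau, xi = z
    return np.array([
        -2.0 * tau,                       #  dP_2/dtau
        2.0 * k(x) ** 2 * xi,             #  dP_2/dxi
        0.0,                              # -dP_2/dt (coefficients t-independent)
        -2.0 * k(x) * kp(x) * xi**2,      # -dP_2/dx
    ])

def rk4(z, h, steps):          # classical fourth-order Runge-Kutta
    for _ in range(steps):
        s1 = hamilton(z)
        s2 = hamilton(z + 0.5 * h * s1)
        s3 = hamilton(z + 0.5 * h * s2)
        s4 = hamilton(z + h * s3)
        z = z + (h / 6.0) * (s1 + 2 * s2 + 2 * s3 + s4)
    return z

# Start on the characteristic variety: tau = k(x0) xi0 makes P_2 = 0.
z0 = np.array([0.0, 0.3, k(0.3) * 1.0, 1.0])
z1 = rk4(z0, 1.0e-3, 1000)
drift = abs(p2(z1) - p2(z0))   # P_2 should be conserved along the curve
```

The drift in P₂ is only the Runge-Kutta truncation error, so the numerically computed curve stays on the characteristic variety, in line with the chain rule computation above.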

We will only be interested in the null bicharacteristics, which are those for
which P_m is zero. Thus the set char(P) splits into a union of null bicharacteristic
curves. Note that the assumption that P is of principal type means that the x-
velocity is nonzero along null bicharacteristic curves. This guarantees that they
really are curves (they do not degenerate to a point), and the projection onto the
x coordinates is also a curve. We will follow the standard convention in deleting
the adjective "null"; all our bicharacteristic curves will be null bicharacteristic
curves.
If P is a constant coefficient operator, then ∂P_m/∂x_j ≡ 0 and so the bichar-
acteristic curves have a constant ξ value. Also ∂P_m/∂ξ_j is independent of x,
and so dx_j(t)/dt = (∂P_m/∂ξ_j)(ξ) is a constant; hence x(t) is a straight line in
the direction ∇P_m(ξ). We call these projections of bicharacteristic curves light
rays.
For example, suppose P is the wave operator (∂²/∂t²) − k²Δ_x. Then
P_m(τ, ξ) = −τ² + k²|ξ|², and the characteristics are τ = ±k|ξ|, or in other
words the cotangents (±k|ξ|, ξ) for ξ ≠ 0. Starting at a point (t̄, x̄), and choos-
ing a characteristic cotangent, the corresponding light ray (with parameter s) is
given by t = t̄ ± 2k|ξ|s and x = x̄ + 2k²sξ. If we use the first equation to
eliminate the parameter s and reparametrize the light ray in terms of t, the result
is

    x = x̄ ± k(t − t̄) ξ/|ξ|,

which is in fact a light ray traveling at speed k in the direction ±ξ/|ξ|.


Now we can state the propagation of singularities principle.

Theorem 8.7.2
Let P be an operator of principal type, with real coefficients, and suppose
Pu = f is C^∞. For each null bicharacteristic curve, either all points are in
the wave front set of u or none are.

Let's apply this theorem to the wave equation. Suppose we know the solution
in a neighborhood of (t̄, x̄) and we can compute which cotangents (±k|ξ|, ξ)
are in the wave front set at that point. Then we know the singularity will
propagate along the light rays in the ±ξ/|ξ| direction passing through that point.
If (±k|ξ|, ξ) is a direction of microlocal smoothness, then it will remain so along
the entire light ray. Thus the "direction" of the singularity dictates the direction
it will travel.
In dealing with the wave equation, it is usually more convenient to describe
the singularities in terms of Cauchy data. Suppose we are given u(x, 0) = f(x)
and u_t(x, 0) = g(x) for u a solution of

    ∂²u/∂t² − k²Δ_x u = 0.

What can we say about the singularities of u at a later time in terms of the
singularities of f and g? Fix a point x̄ in space ℝⁿ, and consider the wave front
set of u at the point (0, x̄) in spacetime ℝ^{n+1}. We know that the cotangents
(τ, ξ) must be characteristics of the equation, so must be of the form (±k|ξ|, ξ).
Note that exactly two of these characteristic spacetime cotangents project onto
each space cotangent ξ. So if (x̄, ξ) belongs to WF(f) or WF(g), then ei-
ther ((0, x̄), (k|ξ|, ξ)) or ((0, x̄), (−k|ξ|, ξ)) belongs to WF(u) (usually both,
although it is possible to construct examples where only one does). Conversely,
if both f and g are microlocally smooth at (x̄, ξ), then it can be shown that u
is microlocally smooth at both points ((0, x̄), (±k|ξ|, ξ)).
Now fix a point in spacetime (t̄, x̄). Given a characteristic cotangent
(±k|ξ|, ξ), we want to know if u is microlocally smooth there. The light ray
passing through (t̄, x̄) corresponding to this cotangent intersects the t = 0 sub-
space at the point x̄ ∓ kt̄(ξ/|ξ|). So if f and g are microlocally smooth in the
direction ξ at this point, we know that u is microlocally smooth along the whole
bicharacteristic curve, hence at ((t̄, x̄), (±k|ξ|, ξ)). In particular, if we want to
know that u is smooth in a neighborhood of (t̄, x̄), we have to show that f and
g are microlocally smooth at all points (x̄ ∓ kt̄(ξ/|ξ|), ξ). This requires that we
examine the sphere of radius kt̄ about x̄, but it does not require that f and g
have no singularities on this sphere; it just requires that the singularities not lie
in certain directions.
To illustrate this phenomenon further, let's look at a specific choice of Cauchy
data. Let γ be a smooth closed curve in the plane, and let f be the characteristic
function of the interior of γ. Also take g = 0. So the singular support of f is
γ, and the wave front set consists of (x, ξ) where x is in γ and ξ is normal to γ.
This leads to singularities at a later time t̄ at the points x ± kt̄(ξ/|ξ|).
Clearly it suffices to consider ξ of unit length, and in view of the choice
of sign we may restrict ξ to be the unit normal pointing outward. Thus the
singularities at the time t̄ lie on the two curves x ± kt̄ n where n denotes this
unit normal vector.

FIGURE 8.9

In other words, starting from the curve γ, we travel in the perpendicular
direction a distance kt̄ in both directions to get the two new curves. Furthermore,
at each point on the two new curves, the wave front set of u(t̄, x) as a function
of x is exactly in the n direction.
Incidentally, we can verify in this situation that the direction n is still per-
pendicular to the new curves. If x(s) traces out γ with parameter s, then
x′(s) is the tangent direction, so n(s) satisfies x′(s) · n(s) = 0. Also we have
n(s) · n(s) = 1 because it is of unit length. Differentiating this identity we
obtain n′(s) · n(s) = 0. But now the new curves are parametrized by s as
x(s) ± kt̄ n(s) and so their tangent directions are given by x′(s) ± kt̄ n′(s).
Since (x′(s) ± kt̄ n′(s)) · n(s) = x′(s) · n(s) ± kt̄ n′(s) · n(s) = 0, we see that
n(s) is still normal to the new curves.
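This little computation is easy to run numerically for a non-circular example. In the sketch below (the ellipse semi-axes and the value of kt̄, called kt here, are arbitrary choices, not from the text) we parametrize an ellipse, form its unit outward normal n(s), and check that the tangents of the propagated curves x(s) ± kt̄ n(s) are still perpendicular to n(s):

```python
import numpy as np

a, b, kt = 2.0, 1.0, 0.7       # kt plays the role of k * t-bar
s = np.linspace(0.0, 2 * np.pi, 400, endpoint=False)

x_prime = np.stack([-a * np.sin(s), b * np.cos(s)])   # tangent x'(s) of the ellipse

v = np.stack([b * np.cos(s), a * np.sin(s)])          # outward normal direction
v_prime = np.stack([-b * np.sin(s), a * np.cos(s)])
norm = np.sqrt((v**2).sum(axis=0))
n = v / norm                                          # unit normal n(s)
# n'(s) by the quotient rule; note n'(s).n(s) = 0 automatically
n_prime = v_prime / norm - v * (v * v_prime).sum(axis=0) / norm**3

# Tangents of the two propagated curves, dotted against n(s):
dots_plus = ((x_prime + kt * n_prime) * n).sum(axis=0)
dots_minus = ((x_prime - kt * n_prime) * n).sum(axis=0)
```

Both dot products vanish to rounding error, exactly as the identity x′ · n ± kt̄ n′ · n = 0 predicts.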

8.8 Problems

1. Show that if f ∈ L^{p_1} and f ∈ L^{p_2} with p_1 < p_2, then f ∈ L^p for
p_1 ≤ p ≤ p_2. (Hint: Split f in two pieces according to whether or not
|f| ≤ 1.)

2. Let

for 0 < a < 1/2. Show that f ∈ L^2_1(ℝ²) but f is unbounded.


3. If f ∈ L^2_1(ℝ²), show that f satisfies the Zygmund class estimate

    |f(x + 2y) − 2f(x + y) + f(x)| ≤ c|y|.

4. If f_1 ∈ L^2_1(ℝ¹) and f_2 ∈ L^2_1(ℝ¹), show that f_1(x_1)f_2(x_2) ∈ L^2_1(ℝ²).

5. If f ∈ L^2_1(ℝ²), show that (∂/∂r)f ∈ L²(ℝ²). Give an example to show
(∂/∂θ)f is not necessarily in L²(ℝ²).

6. For which negative values of s does δ ∈ L^2_s(ℝⁿ)?

7. Show that f ∈ L^2_2(ℝⁿ) if and only if f ∈ L²(ℝⁿ) and Δf ∈ L²(ℝⁿ).
If f ∈ L²(ℝ²) and ∂²f/∂x_1∂x_2 ∈ L²(ℝ²), does it necessarily follow that
f ∈ L^2_2(ℝ²)? (Hint: Work on the Fourier transform side.)

8. If 0 < s < 2, show that f ∈ L^2_s(ℝⁿ) if and only if f ∈ L²(ℝⁿ) and
∫_{ℝⁿ}∫_{ℝⁿ} |f(x + 2y) − 2f(x + y) + f(x)|² |y|^{−n−2s} dy dx < ∞.

9. For which values of p and k does e^{−|x|} belong to L^p_k(ℝⁿ)?

10. If f ∈ L^2_1(ℝⁿ) and φ ∈ 𝒟(ℝⁿ), show that φ · f ∈ L^2_1(ℝⁿ).
11. Show that the heat operator (∂/∂t) − kΔ_x is not elliptic, but its full
symbol has no real zeroes except the origin. Show that it satisfies the a
priori estimate

12. Show that the heat operator is hypoelliptic by showing

    |((∂²/∂τ²) + Δ_ξ)^N P(τ, ξ)^{−1}| ≤ c(τ² + |ξ|²)^{−(N+1)/2}

for τ² + |ξ|² ≥ 1.

13. Show that if P is any linear differential operator of order m, then

    P(uv) = Σ_{|α| ≤ m} (1/α!) ((∂/∂x)^α u) P^{(α)} v

where P^{(α)} denotes the differential operator whose symbol is

    (−i ∂/∂ξ)^α p(x, ξ).
14. Show that

    Σ_{j=1}^{n} (∂/∂x_j)^m

is an elliptic operator on ℝⁿ if m is even, but not if m is odd.


15. Show that Δu = λu has no solutions of polynomial growth if λ > 0, but
does have such solutions if λ < 0.

16. Let P_1, …, P_k be first-order differential operators on ℝⁿ. Show that
P_1² + ⋯ + P_k² is never elliptic if k < n.

17. Show that a first-order differential operator on ℝⁿ cannot be elliptic if
either n ≥ 3 or the coefficients are real-valued.

18. Show that a C^∞ function that is homogeneous of any degree must be a
polynomial. (Hint: A high enough derivative would be homogeneous of
negative degree, hence not continuous at the origin.)

19. Show that the Hilbert transform Hf = ℱ^{−1}(sgn(ξ) f̂(ξ)) is a ψDO of order
zero in ℝ¹. Show the same for the Riesz transforms

in ℝⁿ.

20. Verify that the asymptotic formula for the symbol of a composition of
ψDO's gives the exact symbol of the composition of two partial differ-
ential operators.

21. Verify that the commutator of two partial differential operators of orders
m_1 and m_2 is of order m_1 + m_2 − 1.

22. If P is a ψDO of order r < −n with symbol p(x, ξ), show that k(x, y) =
(2π)^{−n} ∫ p(x, ξ) e^{−i(x−y)·ξ} dξ is a convergent integral and Pu(x) =
∫ k(x, y) u(y) dy if u is a test function.

23. If σ(x, ξ) is a classical symbol of order r and h : ℝⁿ → ℝⁿ is a C^∞ map-
ping with C^∞ inverse, show that σ(h(x), (h′(x)^{tr})^{−1} ξ) is also a classical
symbol of order r.

24. Show that σ(x, ξ) = (1 + |ξ|²)^{a/2} is a classical symbol of order a. (Hint:
Write (1 + |ξ|²)^{a/2} = |ξ|^a (1 + |ξ|^{−2})^{a/2} and use the binomial expansion
for |ξ| > 1.)

25. If P is a ψDO of order zero with symbol p(ξ) that is independent of x,
show that ‖Pu‖_{L²} ≤ c‖u‖_{L²}. (Hint: Use the Plancherel formula.)

26. If h is a C^∞ function, show that hΔ is elliptic if and only if h is never
zero.

27. Find a fundamental solution for the Hermite operator −(d²/dx²) + x² in
ℝ¹ using Hermite expansions. Do the same for −Δ + |x|² in ℝⁿ.

28. Let P be any second-order elliptic operator with real coefficients on ℝⁿ.
Show that (∂²/∂t²) − P is strictly hyperbolic in the t direction on ℝ^{n+1}.
29. In which directions is the operator ∂²/∂x∂y hyperbolic in ℝ²?

30. Show that the surface y = 0 in ℝ³ is noncharacteristic for the wave
operator

    ∂²/∂t² − ∂²/∂x² − ∂²/∂y²

but is not spacelike.

31. For a smooth surface of the form t = h(x), what are the conditions on h
in order that the surface be spacelike for the variable speed wave operator
(∂²/∂t²) − k(x)²Δ_x?

32. Show that if f ∈ L^2_s(ℝⁿ) with s > 1 then Rf(x_1, …, x_{n−2}) =
f(x_1, …, x_{n−2}, 0, 0) is well defined in L^2_{s−1}(ℝ^{n−2}). (Hint: Iterate the
codimension-one restrictions.)

33. Show that if n ≥ 2, no operator can be both elliptic and hyperbolic.

34. Show that the equation Pu = 0 can never have a solution of compact
support for P a constant coefficient differential operator. (Hint: What
would that say about û?)

35. If u is a distribution solution of ∂²u/∂x∂y = 0 and v is a distribution
solution of

    ∂²v/∂x² − ∂²v/∂y² = 0

in ℝ², show that the product uv is well defined.

36. For the anisotropic wave operator

    ∂²/∂t² − k_1² ∂²/∂x_1² − k_2² ∂²/∂x_2²

in ℝ², compute the characteristics, the light cone, the bicharacteristic
curves, and the light rays.

37. Show that the operator

    ((∂²/∂t²) − a² ∂²/∂x²)((∂²/∂t²) − b² ∂²/∂x²)

in ℝ² is of principal type if and only if a² ≠ b².

38. Show that the operator

    Σ_{j=1}^{n} a_j(x) ∂²/∂x_j²

in ℝⁿ is of principal type if the real functions a_j(x) are never zero. When
is it elliptic?

39. Let P = a(x, y)(∂/∂x) + b(x, y)(∂/∂y) be a first-order operator on ℝ² with real
coefficients, and assume that the coefficients a(x, y) and b(x, y) do not
both vanish at any point. Compute the characteristic and the bicharac-
teristic curves, and show that the light rays are solutions of the system
x′(t) = a(x(t), y(t)), y′(t) = b(x(t), y(t)).
Suggestions for Further Reading

For a more thorough and rigorous treatment of distribution theory and Fourier
transforms:

"Generalized functions, vol. 1" by I.M. Gelfand and G.E. Shilov, Academic
Press, 1964.

For a deeper discussion of Fourier analysis:

"An introduction to Fourier series and integrals" by R.T. Seeley, Benjamin,
1966.
"Fourier series and integrals" by H. Dym and H.P. McKean, Academic Press,
1985.
"Fourier analysis" by T.W. Körner, Cambridge Univ. Press, 1988.
"Introduction to Fourier analysis on Euclidean spaces" by E.M. Stein and G.
Weiss, Princeton Univ. Press, 1971.

For more about special functions:

"Special functions and their applications" by N.N. Lebedev, Dover Publica-
tions, 1972.

For a rigorous account of microlocal analysis and pseudodifferential operators:

"Pseudodifferential operators" by M.E. Taylor, Princeton Univ. Press, 1981.

Expository articles on wavelets and quasicrystals:

"How to make wavelets" by R.S. Strichartz, Amer. Math. Monthly 100
(1993), 539-556.
"Quasicrystals: the view from Les Houches" by M. Senechal and J. Taylor,
Math. Intelligencer 12(2) (1990), 54-64.

For a complete account of wavelets:

"Ten Lectures on Wavelets" by I. Daubechies, SIAM, 1992.

Index

a priori estimates, 172, 180, 188, 203
absolutely continuous, 122
adjoint, 179
adjoint identities, 17, 44
analytic, 105, 112
analytic continuation, 48
analytic functions, 48
anisotropic wave operator, 205
annihilation operator, 132
approximate, 13
approximate identity, 38, 93
approximate identity theorem, 38
approximation by test functions, 92
asymptotic expansion, 139, 177
Atiyah-Singer Index Theorem, 181
atlas, 99
average, 1
average of Gaussians, 49
ball, 3
Banach spaces, 163
Beckner's inequality, 111
Bessel functions, 135
Bessel's differential equation, 151
bicharacteristic, 199
Bochner's Theorem, 123, 124
bottom of the spectrum, 133
boundary conditions, 61, 180, 183
boundary value problems, 180
boundedness, 86, 88
calculus of residues, 39
Cantor function, 104
Cantor measure, 85, 104
Cantor set, 85, 104
Cauchy data, 62, 183, 201
Cauchy integral theorem, 39
Cauchy problem, 183, 199
Cauchy-Kovalevska Theorem, 183
Cauchy-Riemann equations, 70, 112, 170
Cauchy-Riemann operator, 168, 180
Cauchy-Schwartz inequality, 128
chain rule, 182
change of variable, 179, 194
characteristic, 181, 196
classical solution, 21
classical symbol, 177, 204
coefficients, 26
comb, 118
commutation identities, 133
commutator, 128
compact set, 73
compact support, 73
complementary pairs, 129
complete system, 134
completeness, 142
completion, 77
composition, 178
cone, 185
conjugate harmonic functions, 70
conservation of energy, 63
conservation of momentum, 69
constant coefficient operators, 167, 175
continuity of, 85

contour integral, 39
convolution, 32, 33, 38, 52, 93
cotangent, 195
cotangent bundle, 195
creation and annihilation operators, 132
crystal, 120
cut-off function, 93
𝒟, 3
𝒟′, 4
Daubechies wavelets, 145
decay at infinity, 29, 50
decrease at infinity, 36
degree, 101
derivative of a convolution, 53
diagonal matrix, 135
diagonalizes, 135
differentiability, 36
differential equations, 21, 33
differentiation, 31, 32
dilation equation, 145
dilations, 23
Dirac δ-function, 5, 14
directional derivatives, 182
Dirichlet boundary conditions, 70
Dirichlet problem, 58
distribution theory, vii, 209
distributions, 4, 5, 7, 11, 12, 43, 192
distributions of compact support, 76
distributions with point support, 80
doctrine of microlocal myopia, 168, 177
dual cone, 185
dual lattice, 120
duality, 166
Duhamel's integral, 71
ℰ, 76
ℰ′, 76
eigenfunction, 131
eigenvalue, 42, 131
elliptic, 166, 176
elliptic operator, 182
elliptic partial differential equations, 166
energy, 63
entire functions, 113
equipartition of energy, 108
equivalent, 12, 177
equivalent norm, 164
Euler, 148
even, 24
expectation, 122
exponential type, 113
extension, 96, 98
ℱ, 29
finite order, 79
finite propagation speed, 71, 74
finite speed of propagation, 149, 187
focusing of singularities, 67
forward light cone, 185
Fourier analysis, 209
Fourier cosine formula, 136
Fourier integrals, 26
Fourier inversion formula, 28, 29, 35, 46, 53
Fourier series, 26, 62
Fourier sine transform, 137
Fourier transform, viii, 28, 209
Fourier transform of a Gaussian, 38
Fourier transform of a tempered distribution, 44, 45
François Viète, 148
free Schrödinger equation, 68
freeze the coefficients, 175
full symbol, 167, 176
functional analysis, 163
fundamental domain, 120
fundamental solution, 56, 168, 173, 180, 184
Gabor transform, 130
Gaussian, 37
generalized functions, 1, 2
generating function identity, 150
groundstate, 133

Hölder conditions, 165
Hölder continuity, 160
Haar function, 141
Haar series expansion, 142
harmonic, 22, 56, 168
harmonic oscillator, 132
Hausdorff-Young inequality, 111
heat equation, 41, 60
heat operator, 203
Heaviside function, 9, 77
Heisenberg uncertainty principle, 126
Henri Lebesgue, vii
Hermite expansion, 134
Hermite functions, 131, 134
Hermite operator, 204
Hermite polynomial, 134
Hermitian, 123
Hermitian operators, 127
Hilbert space, 17, 163
Hilbert transform, 55, 204
homogeneous, 24, 54, 101
Huyghens' principle, 66, 71, 74
hyperbolic, 184
hyperbolic differential equations, 117
hyperbolic equations, 184
hyperbolic operators, 181
hypersurface, 183
hypoellipticity, 170, 180
index, 180
infinite order, 79
infinite product, 147
inhomogeneous heat equation, 70
initial conditions, 62
integrability, 153
integrable, 12, 106
integral, 103
integral operator, 169, 176
interpolation, 111
inverse Fourier transform, 29
Jacobian, 195
kernel, 33
kinetic energy, 63, 108
Klein-Gordon equation, 70, 149
ladder of eigenfunctions, 133
Laplace equation, 8, 22, 56, 58
Laplacian, 8, 166, 180, 181
lattice, 119
Laurent Schwartz, vii, 7
Lebesgue, 163
Lebesgue integration theory, vii, 11
Leibniz' formula, 30
light cone, 185
light rays, 200
limits of a sequence of distributions, 92
linear functionals, 4
linear operator, 13
linear transform, 13
linearity, 2
Liouville's theorem, 57
Lipschitz condition, 162
local coordinate system, 99
local operator, 198
local theory, 95
localization, 141
locally integrable, 12
location of singularities, 170
logarithmic potential, 56
Lᵖ norm, 154
manifold, 99, 180
mathematical physics, viii
maximum speed of propagation, 66, 116
Maxwell's equations, 66, 71
mean-square, 28
measure theory, vii
method of descent, 65
microlocal hypoellipticity, 198
microlocal analysis, 168, 209
microlocal operator, 198
microlocal parametrix, 199
microlocal smoothness, 190
moments, 42

momentum, 68, 127
Monte Carlo approximation, 122
multi-index notation, 78
multiplication, 18, 31, 32
multiplication of, 192
multiplicity, 133
Neumann boundary conditions, 70
Neumann problem, 180
Newtonian potential, 57
noncharacteristic, 182, 183
nonnegative, 12, 83
nonnegative definite, 123
norm, 86, 154
normal direction, 183, 190
null bicharacteristics, 200
observables, 127
odd, 24
open set, 3
operation, 12, 44
operations on, 12
order, 78, 79, 167
order of a distribution, 79
orthogonality, 142
orthonormal family, 142
orthonormal system, 134
Paley-Wiener theorems, 112
parametrix, 168, 176, 180, 198
Parseval identity, 134
Parseval's identity, 26
partial differential equations, viii
partial Fourier transform, 58
partition of unity, 80
periodic, 27
periodic boundary conditions, 61
periodization, 118
perturbation series, 169
ping-pong table, 34
pitch, 130
Plancherel formula, 28, 29, 45
Planck's constant, 68, 127
Poisson integral formula, 59
Poisson summation formula, 117, 135
polar coordinates, 98
positive, 83
positive definite, 123
positive definite function, 123
positive distribution, 122
positive measures, 84
potential, 8, 56
potential energy, 63, 108
principal type, 199
probability measure, 122
product, 18
product rule, 173
propagation of singularities, 200
proper, 185
pseudodifferential operator, 175, 176, 209
pseudolocal property, 178
ψDO, 177
quantum mechanics, 67, 126, 135
quantum theory, 67, 126
quasicrystals, 120, 209
radial, 25, 41
radial function, 135
rapidly decreasing, 29
real analytic, 183
recursion relation, 140, 150
restriction, 95, 97, 188
restrictions of, 196
Riemann, vii
Riemann-Lebesgue lemma, 106
Riemannian metric, 196
Riesz representation theorem, 84
Riesz transforms, 204
rotation, 24, 41
𝒮, 29
𝒮′, 43
sampling, 131
scaling function, 145
scaling identity, 145
Schrödinger's equation, 41, 67, 135

Schwartz class, 29
self-adjoint, 132
signal processing, 130
singular support, 170, 192
singularities, 168, 198
smooth manifolds, 97
smooth surfaces, 97
smoothing process, 53
smoothness, 29, 50
Sobolev embedding theorem, 155, 163
Sobolev inequalities, 153
Sobolev space, 162, 179
Sobolev space norm, 163
Sobolev theory, 153
spacelike, 186
special functions, 209
spectral theory, 132
spectrum, 133
sphere, 97
spherical coordinates, 98
spherical harmonics, 82
strictly hyperbolic, 187
strip-projection, 120
structure theorem, 77, 90
structure theorem for 𝒟′, 78
structure theorem for ℰ′, 78
structure theorem for 𝒮′, 78
summability, 37
sup-norm, 86
support, 73, 170
surface integrals, 64
symbol, 167
symbolic completeness, 178
tangential direction, 190
tangential directions, 183
Taylor expansion, 80
temperature, 1, 60
tempered, 7, 43
test functions, 1, 2
θ-functions, 119
top-order symbol, 167, 176
translation, 31
translations of, 13
traveling wave, 21
triangle inequality, 154
uniformly elliptic, 176
vanishes at infinity, 106
variable coefficient operators, 175
variance, 126
vibrating string equation, 8, 9, 21
wave equation, 62, 108, 184, 185, 201
wave front set, 189, 191
wave function, 67, 126
wave operator, 187, 200
wavelet, 141, 145, 209
weak solution, 21
well-posed, 184
x-ray diffraction, 120
Zygmund class, 162, 203
