Ye - Lecture Notes On Real Analysis

This document outlines lecture notes on real analysis. It covers topics including sets, functions, cardinality, topology of metric spaces, measure theory, measurable functions, Lebesgue integrals, signed measures, differentiation, Lp spaces, and probability theory. The notes are provided to students taking a class on real analysis and will be continuously updated by the instructor.


Lecture Notes on Real Analysis

Xiaojing Ye

Contents
1 Preliminaries
1.1 Basics of sets
1.2 Functions
1.3 Cardinality of sets
1.4 Topology of metric spaces

2 Measure and Measurable Sets
2.1 σ-algebra
2.2 Outer measure
2.3 Measurable sets and Lebesgue measure
2.4 Non-measurable sets

3 Measurable Functions
3.1 Extended real numbers
3.2 Simple functions
3.3 Convergence almost everywhere
3.4 Convergence in measure
3.5 Measurable functions and continuous functions
3.6 Measurability of composite functions

4 Lebesgue Integrals
4.1 Integral of simple nonnegative functions
4.2 Integral of general nonnegative functions
4.3 Integral of general functions
4.4 Relation between Riemann and Lebesgue integrals
4.5 Iterated integrals
4.6 Convolution

5 Signed Measures and Differentiations
5.1 Signed measure and decomposition
5.2 Radon-Nikodym theorem
5.3 Differentiation
5.4 Functions of bounded variation
5.5 Absolute continuity

6 L^p Spaces
6.1 Important inequalities
6.2 L^p space
6.3 L^2 space and inner product
6.4 Dual space of L^p

7 Probability Theory
7.1 Basic concepts
7.2 The law of large numbers
7.3 Central limit theorem

These notes outline the materials covered in class. Detailed derivations and
explanations are given in lectures and/or the referenced books. The notes will
be continuously updated with additional content and corrections. Questions
and comments can be addressed to xye@gsu.edu.

1 Preliminaries
1.1 Basics of sets
A set $A$ is a collection of elements with a certain property $P$, commonly written as $A = \{x : x \text{ satisfies } P\}$ (e.g., $A = \{x \in \mathbb{R} : x^2 > 1\}$). Recall the following definitions: subset $A \subset B$, union $A \cup B$, intersection $A \cap B$, complement $A^c$, empty set $\emptyset$, equal sets $A = B$, set minus $A \setminus B = A \cap B^c$.
Example 1.1. The following statements hold:
• $A \cap B \subset A \subset A \cup B$ for any $A, B$.
• $A \subset B$ iff $B^c \subset A^c$.
• $A \cap B = \emptyset$ iff $A \subset B^c$.
• Let $A, B \subset X$. If $E \cap A = E \cup B$ for any $E \subset X$, then $A = X$ and $B = \emptyset$.
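We frequently consider a set of sets, and call it a family (or collection) of sets: $\mathcal{F} = \{A_\alpha : \alpha \in I\}$, where $I$ is the index set. Here $I$ can be finite $\{1, \dots, n\}$, countably infinite $\mathbb{N}$, or uncountably infinite. We also work with unions and intersections of multiple (often infinitely many) sets:

\[ \bigcup_{\alpha \in I} A_\alpha = \{x : \exists\, \alpha \in I \text{ s.t. } x \in A_\alpha\} \quad \text{and} \quad \bigcap_{\alpha \in I} A_\alpha = \{x : x \in A_\alpha,\ \forall\, \alpha \in I\}. \]

The union and intersection satisfy the distributive law:

\[ A \cap \Big( \bigcup_{\alpha \in I} B_\alpha \Big) = \bigcup_{\alpha \in I} (A \cap B_\alpha) \quad \text{and} \quad A \cup \Big( \bigcap_{\alpha \in I} B_\alpha \Big) = \bigcap_{\alpha \in I} (A \cup B_\alpha). \]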

Example 1.2. Let $A_k = [a + \frac{1}{k}, b]$, then $\bigcup_{k=1}^\infty A_k = (a, b]$. Let $A_k = (a, b + \frac{1}{k})$, then $\bigcap_{k=1}^\infty A_k = (a, b]$.
Example 1.3. Let $A_\alpha = [0, -\log \alpha)$ where $\alpha \in I = (0, 1) \subset \mathbb{R}$, then $\bigcup_{\alpha \in I} A_\alpha = [0, \infty)$ and $\bigcap_{\alpha \in I} A_\alpha = \{0\}$.
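Example 1.4. Suppose $f : [a, b] \to \mathbb{R}$. Show that $\{x \in [a, b] : |f(x)| > 0\} = \bigcup_{n=1}^\infty \{x \in [a, b] : |f(x)| > \frac{1}{n}\}$.

Theorem 1.5 (De Morgan’s law). $(\bigcap_{\alpha \in I} A_\alpha)^c = \bigcup_{\alpha \in I} A_\alpha^c$ and $(\bigcup_{\alpha \in I} A_\alpha)^c = \bigcap_{\alpha \in I} A_\alpha^c$.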
Example 1.6. Some basic tricks in proofs.
• Use of Venn diagram. For example, define the symmetric difference of A
and B by A△B = (A \ B) ∪(B \ A), show A△B = (A ∪ B) \ (A ∩ B).
• A ⊂ B iff x ∈ A ⇒ x ∈ B.
• A = B iff A ⊂ B and B ⊂ A.
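De Morgan's law and the symmetric-difference identity above can be sanity-checked on finite sets. The following is only an illustrative sketch: the universe `X` and the three sets in `family` are arbitrary choices made here, not taken from the notes, and a finite check is of course not a proof.

```python
# Illustrative finite universe and family of sets (assumptions for this sketch).
X = set(range(10))
family = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 7, 9}]

def complement(S, universe=X):
    """Complement of S relative to the universe X."""
    return universe - S

union_all = set().union(*family)
inter_all = set(X).intersection(*family)

# De Morgan's law (Theorem 1.5): complement of an intersection / of a union.
de_morgan_1 = complement(inter_all) == set().union(*(complement(S) for S in family))
de_morgan_2 = complement(union_all) == set(X).intersection(*(complement(S) for S in family))

# Symmetric difference identity from Example 1.6.
A, B = family[0], family[1]
sym_diff_ok = (A - B) | (B - A) == (A | B) - (A & B)

print(de_morgan_1, de_morgan_2, sym_diff_ok)
```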
Definition 1.7 (Limit of a sequence of monotone sets). Suppose $A_1 \supset A_2 \supset \cdots \supset A_k \supset \cdots$, then we say $\{A_k\}$ is non-increasing or simply decreasing (to be distinguished from strictly decreasing where $A_{k+1} \subsetneq A_k$ for all $k$), and $\bigcap_{k=1}^\infty A_k$ is called the limit of $\{A_k\}$, denoted by $\lim_{k \to \infty} A_k$ or simply $\lim_k A_k$. Similarly, suppose $A_1 \subset \cdots \subset A_k \subset \cdots$, then $\{A_k\}$ is non-decreasing or simply increasing, and $\bigcup_{k=1}^\infty A_k$ is the limit of $\{A_k\}$, also denoted by $\lim_k A_k$.

Example 1.8. Let $A_k = [k, \infty) \subset \mathbb{R}$ for $k = 1, 2, \dots$, then $\lim_k A_k = \emptyset$.
Example 1.9. Suppose $\{f_k\}$ is a sequence of real-valued functions defined on $\mathbb{R}$, with $f_1(x) \le f_2(x) \le \cdots \le f_k(x) \le \cdots$ and $f_k(x) \to f(x)$ as $k \to \infty$ for every $x \in \mathbb{R}$. For any $t \in \mathbb{R}$, define $A_k = \{x \in \mathbb{R} : f_k(x) > t\}$. Show that $\{A_k\}$ is increasing, and $\lim_k A_k = \{x \in \mathbb{R} : f(x) > t\}$.
Proof. It is clear that $\{A_k\}$ is increasing and, since $f_k \le f$, that $\lim_k A_k \subset A := \{x \in \mathbb{R} : f(x) > t\}$. For every $x \in A$, we have $f(x) > t$ and $f_k(x) \uparrow f(x)$ as $k \to \infty$. Let $\epsilon = (f(x) - t)/2 > 0$; then there exists $k'$ such that $f_{k'}(x) > f(x) - \epsilon = (f(x) + t)/2 > t$, and therefore $x \in A_{k'} \subset \bigcup_{k=1}^\infty A_k = \lim_k A_k$.

Definition 1.10 (Upper and lower limit of a sequence of sets). Suppose $\{A_k\}$ is a sequence of sets. Denote $B_j = \bigcup_{k \ge j} A_k$, then $\{B_j\}$ is non-increasing. The upper limit of $\{A_k\}$ is denoted by

\[ \limsup_{k \to \infty} A_k = \lim_{j \to \infty} B_j = \bigcap_{j=1}^\infty B_j = \bigcap_{j=1}^\infty \bigcup_{k=j}^\infty A_k. \]

Similarly, the lower limit of $\{A_k\}$ is denoted by

\[ \liminf_{k \to \infty} A_k = \bigcup_{j=1}^\infty \bigcap_{k=j}^\infty A_k. \]

Note that $x \in \limsup_{k \to \infty} A_k = \bigcap_{j=1}^\infty \bigcup_{k=j}^\infty A_k$ means that: $\forall\, j \ge 1$, $\exists\, k \ge j$ such that $x \in A_k$. Similarly for the lower limit.
Example 1.11. Show that $\liminf_{k \to \infty} A_k \subset \limsup_{k \to \infty} A_k$.
Proof. If $x \in \liminf_{k \to \infty} A_k$, then there exists $j \ge 1$ such that $x \in A_k$ for all $k \ge j$, which obviously implies that $x \in \limsup_{k \to \infty} A_k$.
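The upper and lower limits can be computed explicitly for a simple alternating sequence. The sketch below uses a hypothetical sequence ($A_k = \{0, 2\}$ for even $k$ and $\{1, 2\}$ for odd $k$) and a finite truncation; the truncation gives the true limits here only because the sequence is periodic, which is an assumption of this example.

```python
def tail_union(sets, j):
    """B_j = union of A_k for k >= j, over the finite list supplied."""
    out = set()
    for A in sets[j:]:
        out |= A
    return out

def tail_intersection(sets, j):
    """Intersection of A_k for k >= j, over the finite list supplied."""
    out = set(sets[j])
    for A in sets[j + 1:]:
        out &= A
    return out

# Illustrative periodic sequence: A_k = {0, 2} for even k, {1, 2} for odd k.
N = 40
seq = [{0, 2} if k % 2 == 0 else {1, 2} for k in range(N)]
half = N // 2  # use only tails starting before N/2, so each tail sees both kinds of sets

lim_sup = set.intersection(*(tail_union(seq, j) for j in range(half)))
lim_inf = set.union(*(tail_intersection(seq, j) for j in range(half)))
print(lim_sup, lim_inf)
```

Here the element 2 belongs to every $A_k$, so it lies in the lower limit, while 0 and 1 each appear infinitely often but not eventually always, so they lie only in the upper limit.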
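Example 1.12. Suppose $f_n, f : \mathbb{R} \to \mathbb{R}$. Show that

\[ \{x \in \mathbb{R} : f_n(x) \not\to f(x) \text{ as } n \to \infty\} = \bigcup_{k=1}^\infty \bigcap_{N=1}^\infty \bigcup_{n=N}^\infty \Big\{ x \in \mathbb{R} : |f_n(x) - f(x)| \ge \frac{1}{k} \Big\}. \]

Proof. Note that $f_n(x) \not\to f(x)$ at $x$ means that there exists $\epsilon_0 > 0$ (or $k \in \mathbb{N}$ such that $\frac{1}{k} \le \epsilon_0$), such that for any $N \in \mathbb{N}$, we have $|f_n(x) - f(x)| \ge \epsilon_0 \ge \frac{1}{k}$ for some $n \ge N$. Therefore, “there exists $k \ge 1$ ($\bigcup_{k=1}^\infty$), such that for any $N \ge 1$ ($\bigcap_{N=1}^\infty$), there exists an $n \ge N$ ($\bigcup_{n=N}^\infty$) for which $|f_n(x) - f(x)| \ge \frac{1}{k}$.”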

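Example 1.13. Suppose $f_n(x) \to f(x)$ for every $x \in \mathbb{R}$. Show that, for any $t \in \mathbb{R}$,

\[ \{x \in \mathbb{R} : f(x) \le t\} = \bigcap_{k=1}^\infty \bigcup_{N=1}^\infty \bigcap_{n=N}^\infty \Big\{ x \in \mathbb{R} : f_n(x) \le t + \frac{1}{k} \Big\}. \]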
Definition 1.14 (Cartesian product). The Cartesian product of A and B is
A × B = {(a, b) : a ∈ A, b ∈ B}.
Example 1.15. A few examples of Cartesian products:
• A = {1, 2, 3}, B = {4, 5}, then A×B = {(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)}.
• [0, 1] × [0, 1] = {(x, y) : 0 ≤ x, y ≤ 1}.
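For finite sets the Cartesian product can be enumerated directly. A minimal sketch using the standard-library `itertools.product`, with the sets $A$ and $B$ from the first example above:

```python
from itertools import product

A = {1, 2, 3}
B = {4, 5}

# A x B as a set of ordered pairs (a, b) with a in A and b in B.
AxB = set(product(A, B))
print(sorted(AxB))
```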

1.2 Functions
Definition 1.16. We have a series of definitions regarding functions:
• Let $X$ and $Y$ be two sets. $f : X \to Y$ is called a function (or mapping, or transformation) if $f$ assigns to every $x \in X$ exactly one element $y \in Y$.
• Let $A \subset X$, then $f(A) = \{y \in Y : y = f(x) \text{ for some } x \in A\}$ is called the image of $A$ under $f$. Let $B \subset Y$, then $f^{-1}(B) = \{x \in X : f(x) \in B\}$ is called the inverse image (or pre-image) of $B$ under $f$.
• $X$ is called the domain of $f$. $f(X)$ is the range of $f$.
• If $f(X) = Y$ then $f$ is called a mapping from $X$ onto $Y$ (or $f$ is surjective). If $x_1 \ne x_2$ implies $f(x_1) \ne f(x_2)$ then $f$ is called one-to-one (or $f$ is injective).
• If $f$ is both injective and surjective, then $f$ is called bijective, or a one-to-one correspondence between $X$ and $Y$. In this case $f^{-1}$ exists and is also a one-to-one correspondence.
• Suppose $f : X \to Y$ and $g : Y \to Z$, then $g \circ f : X \to Z$ is called the composition of $f$ and $g$, defined by $(g \circ f)(x) = g(f(x))$.

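Example 1.17 (Characteristic function). Suppose $A \subset X$. Define the characteristic function of $A$ by

\[ \chi_A(x) = \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \in A^c. \end{cases} \]

Then one can verify the following statements for every $x \in X$:
• $A \subset B \Rightarrow \chi_A(x) \le \chi_B(x)$
• $\chi_{A \cup B}(x) = \chi_A(x) + \chi_B(x) - \chi_{A \cap B}(x)$
• $\chi_{A \cap B}(x) = \chi_A(x) \chi_B(x)$
• $\chi_{A \setminus B}(x) = \chi_A(x)(1 - \chi_B(x))$
• $\chi_{A \triangle B}(x) = |\chi_A(x) - \chi_B(x)|$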
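The identities for characteristic functions listed above can be checked pointwise on a small finite universe. This is a sketch under illustrative assumptions: the universe `X` and the sets `A`, `B` are arbitrary choices, and `chi` is a helper name introduced here.

```python
X = range(8)
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

def chi(S):
    """Characteristic function of S, returned as a Python function."""
    return lambda x: 1 if x in S else 0

def identities_hold(x):
    """Check the union, intersection, difference and symmetric-difference identities at x."""
    a, b = chi(A)(x), chi(B)(x)
    return (
        chi(A | B)(x) == a + b - chi(A & B)(x)
        and chi(A & B)(x) == a * b
        and chi(A - B)(x) == a * (1 - b)
        and chi(A ^ B)(x) == abs(a - b)
    )

all_hold = all(identities_hold(x) for x in X)
print(all_hold)
```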

1.3 Cardinality of sets


We denote by $|A|$ the cardinal number of $A$ (informally the “number of elements in $A$”). This is clear if $A$ is finite, but it is not obvious if $A$ is infinite. We need the help of functions to “count” $|A|$.

Definition 1.18. $X$ and $Y$ are said to have the same cardinal number if there exists a one-to-one correspondence $f : X \to Y$. In this case, we write $X \sim Y$. It is easy to check that $\sim$ is an equivalence relation: (i) $A \sim A$; (ii) $A \sim B$ iff $B \sim A$; (iii) if $A \sim B$ and $B \sim C$, then $A \sim C$.

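Example 1.19. Sets with the same cardinality.

• $\mathbb{N} \sim \mathbb{Z} \sim \{0, 1, \dots\} \sim \{2n : n \in \mathbb{N}\}$.
• $\mathbb{N} \times \mathbb{N} \sim \mathbb{N}$ by setting $f((i, j)) = 2^{i-1} \cdot (2j - 1)$ (since every positive integer $n$ can be uniquely represented as $n = 2^p \cdot q$ for some nonnegative integer $p$ and odd positive integer $q$).
• $\mathbb{Q} \sim \mathbb{N}$.
• $(-1, 1) \sim \mathbb{R}$ by setting $f(x) = \frac{x}{1 - x^2}$ for $x \in (-1, 1)$. [Or $f(x) = \tan(\frac{\pi}{2} x)$.]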
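The bijection $\mathbb{N} \times \mathbb{N} \to \mathbb{N}$, $f((i, j)) = 2^{i-1}(2j - 1)$, from the example above can be written out together with its inverse. The function names `pair` and `unpair` are introduced here for illustration, and the check covers only a finite range.

```python
def pair(i, j):
    """f((i, j)) = 2^(i-1) * (2j - 1) for integers i, j >= 1."""
    return 2 ** (i - 1) * (2 * j - 1)

def unpair(n):
    """Inverse map: write n >= 1 as 2^(i-1) * (2j - 1) and return (i, j)."""
    i = 1
    while n % 2 == 0:
        n //= 2
        i += 1
    return i, (n + 1) // 2

# Round trips on a finite range: every n is hit exactly once.
round_trip_n = all(pair(*unpair(n)) == n for n in range(1, 1000))
round_trip_ij = all(unpair(pair(i, j)) == (i, j)
                    for i in range(1, 8) for j in range(1, 8))
print(round_trip_n, round_trip_ij)
```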
Lemma 1.20 (Decomposition of sets by functions). Suppose f : X → Y and
g : Y → X. Then there exist A1 , A2 ⊂ X and B1 , B2 ⊂ Y , such that f (A1 ) =
B1 , g(B2 ) = A2 , A1 ∩ A2 = ∅, B1 ∩ B2 = ∅, A1 ∪ A2 = X, and B1 ∪ B2 = Y .

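Proof. Define $\Gamma := \{E \subset X : E \cap g(Y \setminus f(E)) = \emptyset\}$, $A_1 := \bigcup_{E \in \Gamma} E$, $B_1 := f(A_1)$, $B_2 := Y \setminus B_1 = Y \setminus f(A_1)$, and $A_2 := g(B_2)$. Then it remains to show that $A_1 \cap A_2 = \emptyset$ and $A_1 \cup A_2 = X$.
For any $E \in \Gamma$, we know $E \subset A_1$ and hence $E \cap g(Y \setminus f(A_1)) \subset E \cap g(Y \setminus f(E)) = \emptyset$. Therefore $A_1 \cap g(Y \setminus f(A_1)) = \bigcup_{E \in \Gamma} (E \cap g(Y \setminus f(A_1))) = \emptyset$, i.e., $A_1 \in \Gamma$. Hence $A_1 \cap A_2 = \emptyset$.
If there exists $x_0 \in X \setminus (A_1 \cup A_2)$, then define $A = A_1 \cup \{x_0\}$, so that $B_1 = f(A_1) \subset f(A)$. This implies that $Y \setminus f(A) \subset B_2$, and hence $g(Y \setminus f(A)) \subset g(B_2) = A_2$ and $A \cap g(Y \setminus f(A)) = \emptyset$, which means $A \in \Gamma$. Then $x_0 \in A \subset A_1$, which contradicts $x_0 \notin A_1$.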
Theorem 1.21 (Cantor-Bernstein). If $U \subsetneq X$ and $V \subsetneq Y$, and $X \sim V$ and $U \sim Y$, then $X \sim Y$.
Proof. Let $f : X \to V$ and $g : Y \to U$ be one-to-one correspondences. By Lemma 1.20, there exist $A_1, A_2 \subset X$ and $B_1, B_2 \subset Y$, such that $A_1 \cap A_2 = \emptyset$, $B_1 \cap B_2 = \emptyset$, $A_1 \cup A_2 = X$, $B_1 \cup B_2 = Y$, $f(A_1) = B_1$ and $g(B_2) = A_2$ (which are still one-to-one correspondences as they are restrictions of $f$ and $g$ to $A_1$ and $B_2$ respectively). Define

\[ h(x) = \begin{cases} f(x), & \text{if } x \in A_1, \\ g^{-1}(x), & \text{if } x \in A_2. \end{cases} \]

Then it is clear that $h$ is a one-to-one correspondence between $X$ and $Y$, so $X \sim Y$.
Corollary 1.22. If C ⊂ A ⊂ B and C ∼ B, then C ∼ A ∼ B.

Proof. If $C = A$ or $A = B$, the claim is trivial. Otherwise $C \subsetneq A \subsetneq B$, and setting $X = A$, $Y = B$, $U = C$ and $V = A$ in Theorem 1.21 yields $A \sim B$.
Example 1.23. (−1, 1) ∼ (−1, 1] ∼ [−1, 1] ∼ R.
Definition 1.24 (Cardinality of $\mathbb{N}$). $\mathbb{N}$ is said to have cardinality $\aleph_0$ (pronounced “aleph zero”). A set of cardinality $\aleph_0$ is called countable; an infinite set that is not countable is called uncountable.
Theorem 1.25 ($\aleph_0$ is the smallest cardinality of infinite sets). Every infinite set contains a countable set.
Proof. Suppose $E$ is infinite. Then we can pick $a_1, a_2, \dots$ one by one from $E$, such that $a_{n+1} \in E \setminus \{a_1, \dots, a_n\} \ne \emptyset$, to get $\{a_k : k \in \mathbb{N}\} \subset E$.

Example 1.26. A few examples of sets of cardinality $\aleph_0$.
• If $A \sim \mathbb{N}$ and $B \sim \mathbb{N}$ then $A \cup B \sim \mathbb{N}$.
• If $A_n \sim \mathbb{N}$ for every $n \ge 1$, then $\bigcup_{n=1}^\infty A_n \sim \mathbb{N}$.
• $\mathbb{Q} \sim \mathbb{N}$. (Note that this only means that it is possible to list the elements of $\mathbb{Q}$ in some order, but not necessarily by their values.)
Example 1.27. Any collection of mutually disjoint open intervals in $\mathbb{R}$ is at most countable.
Proof. Define the function $f$ that maps each interval to a rational number $r$ in that interval. Since the intervals are mutually disjoint, $f$ is an injection into $\mathbb{Q}$.
Example 1.28. If $f : \mathbb{R} \to \mathbb{R}$ is monotone, then $\{x \in \mathbb{R} : \lim_{y \to x^-} f(y) \ne \lim_{y \to x^+} f(y)\}$ is at most countable.
Proof. WLOG, assume $f$ is non-decreasing. Then for each point $x$ in the set above, there exists $r_x \in \mathbb{Q}$ such that $\lim_{y \to x^-} f(y) < r_x < \lim_{y \to x^+} f(y)$. Define $g : x \mapsto r_x$, then $g$ is injective.
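Example 1.29. If $E$ is a countable subset of $\mathbb{R}$, then $\exists\, x_0 \in \mathbb{R}$ such that $E \cap (E + \{x_0\}) = \emptyset$. [Hint: consider $A = \{r_n - r_m : r_n, r_m \in E\}$ (allowing $n = m$, so that $0 \in A$), which is countable, hence $\exists\, x_0 \in \mathbb{R} \setminus A$.]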
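Theorem 1.30. If $A$ is an infinite set and $B$ is at most countable, then $A \sim A \cup B$.
Proof. Suppose $B = \{b_1, b_2, \dots\}$ is countably infinite and disjoint from $A$ (otherwise replace $B$ by $B \setminus A$; the finite case is similar). Extract a countable set $A_1 = \{a_1, a_2, \dots\}$ from $A$, and denote $A_2 = A \setminus A_1$. Then define

\[ f(x) = \begin{cases} a_{2i-1}, & \text{if } x = b_i \in B, \\ a_{2i}, & \text{if } x = a_i \in A_1, \\ a, & \text{if } x = a \in A_2. \end{cases} \]

Hence $f : A \cup B \to A$ is a one-to-one correspondence.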


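Theorem 1.31. $X$ is infinite iff $X \sim A$ for some $A \subsetneq X$.
Proof. The sufficiency is obvious, since a finite set cannot be put in one-to-one correspondence with a proper subset of itself. For the necessity, extract a nonempty finite set $B$ from $X$ and define $A = X \setminus B$; then $A \subsetneq X$ is infinite, and $A \sim A \cup B = X$ by Theorem 1.30.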
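Definition 1.32 (Cardinality of $\mathbb{R}$). $\mathbb{R}$ is said to have cardinality $\aleph_1$, also called the cardinality of the continuum, $c = \aleph_1 = 2^{\aleph_0}$. [Here $\aleph_1$ is used as a symbol for $2^{\aleph_0}$; whether $2^{\aleph_0}$ is the smallest cardinal above $\aleph_0$ is the continuum hypothesis.]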
We consider the cardinality of $(0, 1] \sim \mathbb{R}$. Every $x \in (0, 1]$ can be written as $x = \sum_{n=1}^\infty \frac{a_n}{2^n}$ with $a_n \in \{0, 1\}$ and infinitely many $a_n$'s being 1. To see this, note that every irrational number is a limit of rational numbers, and if $a_n = 1$ and $a_k = 0$ for all $k > n$ then we can instead set $a_n = 0$ and $a_k = 1$ for all $k > n$. We can show that $A = \{(a_1, a_2, \dots) : a_n \in \{0, 1\}\}$ is an uncountable set, and $(0, 1] \sim A$ (we only removed a subset of $A$, consisting of those sequences with finitely many 1's, which correspond to some rational numbers and are collectively at most countable). We can interpret $|A| = 2^{\aleph_0}$, as $A$ is the set of binary sequences.

Example 1.33. The following statements hold:
• If $|A_n| = \aleph_1$ for all $n \ge 1$, then $|\bigcup_{n=1}^\infty A_n| = \aleph_1$. [Hint: $A_k \sim (k, k + 1]$.]
• $|\mathbb{R}^n| = |\mathbb{R}| = \aleph_1$. [Hint: $A_k = \mathbb{R}$ for $k = 1, \dots, n$.]
Theorem 1.34 (There is no “cap” on cardinal number). Suppose $A \ne \emptyset$, then $A \not\sim 2^A := \{E : E \subset A\}$.
Proof. If not, then there exists a one-to-one correspondence $f : A \to 2^A$. Let $B = \{x \in A : x \notin f(x)\}$. Since $B \in 2^A$, there exists $y \in A$ such that $f(y) = B$. If $y \in B$, then $y \notin f(y) = B$; if $y \notin B = f(y)$, then $y \in B$. Both yield contradictions.
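For a small finite set the diagonal argument above can be verified exhaustively: for every map $f : A \to 2^A$, the set $B = \{x \in A : x \notin f(x)\}$ is missing from the image of $f$. The sketch below assumes the illustrative choice $A = \{0, 1, 2\}$ and enumerates all $8^3 = 512$ maps; the helper name `diagonal_set` is introduced here.

```python
from itertools import combinations, product

A = [0, 1, 2]
# The power set 2^A, as a list of frozensets.
power_set = [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]

def diagonal_set(f):
    """B = {x in A : x not in f(x)} for a map f given as a dict from A to 2^A."""
    return frozenset(x for x in A if x not in f[x])

# Enumerate every function f : A -> 2^A as a tuple of images (f(0), f(1), f(2)).
missed_every_time = all(
    diagonal_set(dict(zip(A, images))) not in images
    for images in product(power_set, repeat=len(A))
)
print(missed_every_time)
```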

1.4 Topology of metric spaces


Definition 1.35 (Euclidean space and norm). We denote by $\mathbb{R}^n = \{(x_1, \dots, x_n) : x_i \in \mathbb{R},\ \forall\, i\}$ the $n$-dimensional Euclidean space. The norm of $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ is defined by $|x| = (x_1^2 + \cdots + x_n^2)^{1/2}$.
One can verify the following properties of norms:
• $|x| \ge 0$; $|x| = 0$ iff $x = (0, \dots, 0)$.
• $|ax| = |a||x|$ for any $a \in \mathbb{R}$.
• $|x + y| \le |x| + |y|$. [Use the Cauchy-Schwarz inequality below.]
Theorem 1.36 (Cauchy-Schwarz). Let $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n) \in \mathbb{R}^n$, then

\[ \sum_{i=1}^n x_i y_i \le \Big( \sum_{i=1}^n x_i^2 \Big)^{1/2} \cdot \Big( \sum_{i=1}^n y_i^2 \Big)^{1/2}. \]

In addition, the equality holds iff $x = ay$ or $y = ax$ for some $a \ge 0$.
Proof. Note that $a\lambda^2 + b\lambda + c \ge 0$ for all $\lambda$ (with $a > 0$) iff $b^2 \le 4ac$. Use this fact and that the quadratic function $f(\lambda) = \sum_{i=1}^n (x_i + \lambda y_i)^2 = |y|^2 \lambda^2 + 2 \big( \sum_{i=1}^n x_i y_i \big) \lambda + |x|^2 \ge 0$ for all $\lambda$.
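The discriminant argument in the proof can be checked numerically. The following is a sketch on random vectors (the dimension 5, the sample size, the seed and the tolerance `1e-12` are arbitrary choices made here); it tests both the inequality itself and the sign of the discriminant of $f(\lambda) = |y|^2 \lambda^2 + 2 (x \cdot y) \lambda + |x|^2$.

```python
import math
import random

random.seed(0)

def dot(x, y):
    """Inner product sum_i x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm |x|."""
    return math.sqrt(dot(x, x))

def discriminant(x, y):
    # f(lam) = |y|^2 lam^2 + 2 (x.y) lam + |x|^2 has discriminant
    # (2 x.y)^2 - 4 |y|^2 |x|^2, which must be <= 0 because f >= 0 everywhere.
    return (2 * dot(x, y)) ** 2 - 4 * dot(y, y) * dot(x, x)

trials = [([random.uniform(-1, 1) for _ in range(5)],
           [random.uniform(-1, 1) for _ in range(5)]) for _ in range(1000)]

inequality_holds = all(dot(x, y) <= norm(x) * norm(y) + 1e-12 for x, y in trials)
disc_nonpositive = all(discriminant(x, y) <= 1e-12 for x, y in trials)

# Equality case: y = 2x is a nonnegative multiple of x.
x0 = trials[0][0]
y0 = [2 * t for t in x0]
equality_case = math.isclose(dot(x0, y0), norm(x0) * norm(y0))
print(inequality_holds, disc_nonpositive, equality_case)
```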
Definition 1.37 (Metric space). Let $X$ be a set. Then $d : X \times X \to \mathbb{R}$ is called a distance (or a metric) on $X$ if the following hold for all $x, y, z \in X$:
• $d(x, y) \ge 0$; and $d(x, y) = 0$ iff $x = y$.
• $d(x, y) = d(y, x)$.
• $d(x, y) \le d(x, z) + d(y, z)$.
A set $X$ with a distance $d$ is called a metric space, denoted by $(X, d)$ or simply $X$. Throughout this class, we set $d(x, y) = |x - y|$ for $x, y \in \mathbb{R}^n$ by default.
Definition 1.38. We collect a series of definitions for a metric space $(X, d)$ and $E \subset X$:
• $\operatorname{diam}(E) := \sup\{d(x, y) : x, y \in E\}$ is the diameter of $E$. $E$ is said to be bounded if $\operatorname{diam}(E) < \infty$.
• For any $x \in X$ and $\delta > 0$, $B(x, \delta) := \{y \in X : d(x, y) < \delta\}$ is called the open ball with center $x$ and radius $\delta$. $\bar{B}(x, \delta) := \{y \in X : d(x, y) \le \delta\}$ is the closed ball.
• $x$ is called an interior point of $E$ if there exists an open ball $B(x, \delta) \subset E$ (i.e., there exists $\delta > 0$ such that $B(x, \delta) \subset E$).
• $E$ is called open if every point of $E$ is an interior point. $E$ is called closed if $E^c$ is open. [It is easy to show from the definition that an open ball $B(x, \delta)$ is indeed open, and a closed ball is closed.]
• (Only in $\mathbb{R}^n$) Suppose $a_i < b_i$ for $i = 1, \dots, n$, then $I = (a_1, b_1) \times \cdots \times (a_n, b_n)$ is called an open box in $\mathbb{R}^n$. The volume of $I$ is denoted by $|I| = \prod_{i=1}^n (b_i - a_i)$.
• A sequence $\{x_k\}$ in $X$ is said to converge to $x$ if $\lim_{k \to \infty} d(x_k, x) = 0$ (or simply denoted by $x_k \to x$).
• A sequence $\{x_k\}$ in $X$ is said to be Cauchy if for any $\epsilon > 0$, there exists $N$ such that $d(x_n, x_m) < \epsilon$ for all $n, m \ge N$.
• Let $E$ be an infinite subset of $X$. If there exists a sequence of distinct points $\{x_k\} \subset E$ such that $x_k \to x$, then $x$ is called a limit point (or accumulation point) of $E$. [Note that a limit point of $E$ need not be in $E$.]
• The set of limit points of $E$ is denoted by $E'$. The union $\bar{E} := E \cup E'$ is called the closure of $E$. [$\bar{E}$ is a closed set; see below.]
• If $A \subset B$ and $\bar{A} = B$, then $A$ is called dense in $B$, or $A$ is a dense subset of $B$.
• If $x \in E$ and $x$ is not a limit point of $E$, then $x$ is called an isolated point of $E$ (i.e., $\exists\, \delta > 0$ such that $B(x, \delta) \cap E = \{x\}$).
• If $G_\alpha$ is open for every $\alpha \in I$ and $E \subset \bigcup_{\alpha \in I} G_\alpha$, then $\{G_\alpha : \alpha \in I\}$ is called an open cover of $E$.
• $E$ is called compact if every open cover of $E$ contains a finite subcover. [In $\mathbb{R}^n$, $E$ is compact iff $E$ is closed and bounded; see below.]
Theorem 1.39. $x \in E'$ iff for any $\delta > 0$ we have $(B(x, \delta) \setminus \{x\}) \cap E \ne \emptyset$.
Proof. Necessity is clear. For sufficiency, let $\delta_1 = 1$ and select $x_1 \in (B(x, \delta_1) \setminus \{x\}) \cap E$. Then for any $k \ge 1$, let $\delta_{k+1} = \frac{1}{2} d(x_k, x)$ and select $x_{k+1} \in (B(x, \delta_{k+1}) \setminus \{x\}) \cap E$. We obtain a sequence $\{x_k\}$ of distinct points with $x_k \to x$, i.e., $x \in E'$.
Theorem 1.40. $E$ is closed iff $E' \subset E$.
Proof. Suppose $E$ is closed, then $E^c$ is open. If $x \in E' \setminus E$, then $x \in E^c$ and there exists $\{x_k\} \subset E$ with $x_k \to x$. But this is a contradiction since $x$ is an interior point of $E^c$.
Suppose $E' \subset E$. For any $x \in E^c$, we know $x \notin E'$, i.e., there exists $\delta > 0$ such that $B(x, \delta) \cap E = \emptyset$. Hence $B(x, \delta) \subset E^c$, i.e., $x$ is an interior point of $E^c$. As $x$ is arbitrary, we know $E^c$ is open, and hence $E$ is closed.
Theorem 1.41. $\bar{E}$ is closed.

Proof. For any $x \notin \bar{E} = E \cup E'$, there exists $\delta > 0$ such that $B(x, \delta) \cap E = \emptyset$. If there exists $y \in E'$ such that $y \in B(x, \delta)$, then there exist $\delta' > 0$ and $x' \in E$ with $x' \in B(y, \delta') \subset B(x, \delta)$, a contradiction. Hence $B(x, \delta) \cap E' = \emptyset$. Therefore $B(x, \delta) \subset (E \cup E')^c$, implying that $(E \cup E')^c$ is open.
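Example 1.42. A few examples of limit points.
• Let $E = \{\frac{1}{n} : n \in \mathbb{N}\}$. Then $E' = \{0\}$. All points in $E$ are isolated points.
• Let $E = \{\sqrt{m} - \sqrt{n} : m, n \in \mathbb{N}\}$. Then $E' = \mathbb{R}$. [Hint: for any $x \in \mathbb{R}$, let $x_n = \sqrt{\lfloor (x + n)^2 \rfloor} - n$ (where $\lfloor y \rfloor := \max\{k \in \mathbb{Z} : k \le y\}$). Then $\sqrt{(x + n)^2 - 1} - n < x_n \le x$ for all large $n$, and $x_n \to x$.]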
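The approximating sequence in the hint can be evaluated numerically. This is a sketch in floating-point arithmetic for one arbitrarily chosen target $x$ (an assumption of the example); it checks that $0 \le x - x_n < \frac{1}{n}$, which follows from the two-sided bound in the hint when $x > 0$.

```python
import math

def approx(x, n):
    """x_n = sqrt(floor((x + n)^2)) - n, i.e. sqrt(m) - sqrt(n^2) with m = floor((x + n)^2)."""
    m = math.floor((x + n) ** 2)
    return math.sqrt(m) - n

x = math.pi / 7  # arbitrary illustrative target, x > 0
ns = (10, 100, 1000, 10000)
errors = [x - approx(x, n) for n in ns]
print(errors)
```

The errors shrink roughly like $\frac{1}{2n}$ at worst, in line with $x - x_n < (x + n) - \sqrt{(x + n)^2 - 1}$.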

Theorem 1.43. Let $E_1, E_2 \subset \mathbb{R}^n$. Then $(E_1 \cup E_2)' = E_1' \cup E_2'$.
Proof. It is clear that $E_j' \subset (E_1 \cup E_2)'$ for $j = 1, 2$. If $x \in (E_1 \cup E_2)'$, then there exists a sequence of distinct points $\{x_k\} \subset E_1 \cup E_2$ such that $x_k \to x$. Then at least one of $E_1$ and $E_2$ contains a subsequence of $\{x_k\}$, which also converges to $x$. Hence $(E_1 \cup E_2)' \subset E_1' \cup E_2'$.

Theorem 1.44 (Bolzano-Weierstrass). Every bounded infinite set $E$ of $\mathbb{R}^n$ has at least one limit point.
Proof. Let $E$ be contained in $\prod_{i=1}^n [a_i, b_i]$, and pick a sequence of distinct points in $E$. By focusing on the first components of these points we can extract a subsequence whose first components converge in $[a_1, b_1]$ (by the Bolzano-Weierstrass theorem on $\mathbb{R}$) with limit $c_1$; then we focus on the second components of this subsequence and extract a further subsequence whose second components converge to $c_2$, and so on, until we finish the $n$-th component with $c_n$. The resulting sequence has distinct points and its limit is $c = (c_1, \dots, c_n)$.
Theorem 1.45. $f \in C(\mathbb{R}^n)$ iff for every $t \in \mathbb{R}$ the sets $E_1 = \{x \in \mathbb{R}^n : f(x) > t\}$ and $E_2 = \{x \in \mathbb{R}^n : f(x) < t\}$ are open.

Proof. The necessity is clear. To show the sufficiency, suppose that for every $t$ both $E_1^c$ and $E_2^c$ are closed. If $f$ is not continuous at $x_0$, then there exist $\epsilon_0 > 0$ and a sequence $x_k \to x_0$ such that $|f(x_k) - f(x_0)| \ge \epsilon_0$ for all $k$. WLOG (passing to a subsequence; the other case is symmetric and uses $E_2$), suppose $f(x_k) \le f(x_0) - \epsilon_0$ for all $k$. Set $t = f(x_0) - \epsilon_0$; then $E_1^c = \{x \in \mathbb{R}^n : f(x) \le t\}$ is closed, which is a contradiction since $x_k \to x_0$ and $\{x_k\} \subset E_1^c$ but $x_0 \notin E_1^c$.
Theorem 1.46 (Operations on open and closed sets). A union of (finitely or infinitely many) open sets is open; an intersection of finitely many open sets is open. The reverse holds for closed sets. Namely,
• If $F_\alpha$ is closed and $G_\alpha$ is open for every $\alpha \in I$, then $\bigcap_{\alpha \in I} F_\alpha$ is closed, and $\bigcup_{\alpha \in I} G_\alpha$ is open.
• If $F_k$ is closed and $G_k$ is open for $k = 1, \dots, n$, then $\bigcup_{k=1}^n F_k$ is closed, and $\bigcap_{k=1}^n G_k$ is open.
Proof. We only show this for open sets. Then applying De Morgan’s law implies the statements for closed sets.
For every $x \in \bigcup_{\alpha \in I} G_\alpha$, there exists $\alpha' \in I$ such that $x \in G_{\alpha'}$, and hence $\exists\, B(x, \delta) \subset G_{\alpha'} \subset \bigcup_{\alpha \in I} G_\alpha$.
For every $x \in \bigcap_{k=1}^n G_k$, there exist $\delta_k > 0$ such that $B(x, \delta_k) \subset G_k$ for every $k = 1, \dots, n$. Let $\delta = \min\{\delta_k : k = 1, \dots, n\} > 0$ (this requires finiteness!), then $B(x, \delta) \subset G_k$ for all $k$.
Definition 1.47 ($G_\delta$-set and $F_\sigma$-set). We call $H = \bigcap_{k=1}^\infty G_k$ a $G_\delta$-set if $G_k$ is open for all $k$, and $K = \bigcup_{k=1}^\infty F_k$ an $F_\sigma$-set if $F_k$ is closed for all $k$.

Theorem 1.48 (Compact sets are closed). If $K$ is a compact set, then $K$ is closed.

Proof. It suffices to show that $K^c$ is open, i.e., every point of $K^c$ is an interior point. For every $y \in K^c$ and $x \in K$, denote $\delta_x = \frac{1}{2} d(x, y) > 0$. Then $\{B(x, \delta_x) : x \in K\}$ is an open cover of $K$, and hence has a finite subcover $\{B(x_i, \delta_{x_i}) : 1 \le i \le k\}$ of $K$. Let $\delta := \min_{1 \le i \le k} \delta_{x_i}$ (which is $> 0$), then $B(y, \delta) \cap B(x_i, \delta_{x_i}) = \emptyset$ for all $i$. Hence $B(y, \delta) \subset K^c$, i.e., $y$ is an interior point of $K^c$.

Theorem 1.49 (Closed subsets of a compact set are compact). If $F \subset K$, $F$ is closed and $K$ is compact, then $F$ is compact.
Proof. Let $\{G_\alpha : \alpha \in I\}$ be any open cover of $F$, then $\{G_\alpha, F^c : \alpha \in I\}$ is an open cover of $X$ and hence also of $K$. As $K$ is compact, there is a finite subcover $\{G_{\alpha_i}, F^c : 1 \le i \le k\}$ of $K$. Therefore $\{G_{\alpha_i} : 1 \le i \le k\}$ is a finite subcover of $F$, noting that $F^c$ does not help in covering $F$.
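Theorem 1.50 (Cantor). Suppose $F_k \ne \emptyset$ is compact for every $k$, and $F_1 \supset F_2 \supset \cdots \supset F_k \supset \cdots$, then $\bigcap_{k=1}^\infty F_k \ne \emptyset$.

Proof. Suppose not, then $F_1 \subset X = (\bigcap_{k=1}^\infty F_k)^c = \bigcup_{k=1}^\infty F_k^c$. Therefore $\{F_k^c\}$ is an open cover of $F_1$ (each $F_k$ is compact, hence closed). Hence there exists a finite subcover $\{F_{k_i}^c : 1 \le i \le l\}$ of $F_1$, i.e., $F_1 \subset \bigcup_{i=1}^l F_{k_i}^c$. Therefore $\bigcap_{i=1}^l F_{k_i} \subset F_1^c$. But $\bigcap_{i=1}^l F_{k_i} \subset F_1$. Hence $\bigcap_{i=1}^l F_{k_i} = \emptyset$, which is a contradiction: the sets are nested, so $\bigcap_{i=1}^l F_{k_i} = F_{\max_i k_i} \ne \emptyset$.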
Lemma 1.51. $I = [a_1, b_1] \times \cdots \times [a_n, b_n]$ is compact in $\mathbb{R}^n$.

Proof. Suppose not, then $I$ has an open cover $\mathcal{A} = \{G_\alpha\}$ which does not have a finite subcover. Bisecting each side of $I$, we obtain $2^n$ closed boxes, and at least one of them cannot be covered by finitely many open sets in $\mathcal{A}$. We bisect this box again and obtain a smaller closed box that does not have a finite subcover either, and so on, without end. The size of the boxes shrinks to 0 and the boxes converge to a point $x \in I$, which must be an interior point of some $G_{\alpha'}$, i.e., $\exists\, \delta > 0$ such that $B(x, \delta) \subset G_{\alpha'}$. Then the bisection should have stopped within finitely many iterations, once the box is contained in $B(x, \delta)$ and is therefore covered by $G_{\alpha'}$ alone, a contradiction.
Theorem 1.52 (Heine-Borel). Bounded closed sets in $\mathbb{R}^n$ are compact.

Proof. Let $F$ be closed and bounded. Then there exists a bounded closed box $I$ such that $F \subset I$. Since $I$ is compact (Lemma 1.51) and $F$ is closed, we know $F$ is compact (Theorem 1.49).
Example 1.53. Suppose $F \subset \mathbb{R}^n$ is closed and bounded, $G \subset \mathbb{R}^n$ is open, and $F \subset G$. Then $\exists\, \delta > 0$ such that for every $x \in B(0, \delta)$, we have $F + \{x\} := \{y + x : y \in F\} \subset G$.
Proof. Since every $y \in F$ is an interior point of $G$, we know $\exists\, \delta_y > 0$ such that $B(y, \delta_y) \subset G$. Also $\{B(y, \frac{\delta_y}{2}) : y \in F\}$ is an open cover of $F$, and hence (as $F$ is compact) has a finite subcover $\{B(y_i, \frac{\delta_{y_i}}{2}) : 1 \le i \le k\}$. Namely, for every $y \in F$, we know $y \in B(y_i, \frac{\delta_{y_i}}{2})$ for some $i \in \{1, \dots, k\}$. Set $\delta = \min_{1 \le i \le k} \frac{\delta_{y_i}}{2}$. Then for any $x \in B(0, \delta)$ and $y \in F$, $x + y \in B(y_i, \delta_{y_i})$ for some $i \in \{1, \dots, k\}$, and hence $x + y \in G$.
Before closing this section, we consider several examples of distances between
sets.

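Definition 1.54 (Distance between sets). $d(x, E) := \inf\{d(x, y) : y \in E\}$ and $d(E_1, E_2) := \inf\{d(x, y) : x \in E_1, y \in E_2\}$.
Example 1.55. Suppose $E_1 = \{x = (x_1, x_2) \in \mathbb{R}^2 : x_2 = 0\}$ and $E_2 = \{x = (x_1, x_2) \in \mathbb{R}^2 : x_1 x_2 = 1\}$. Then $d(E_1, E_2) = 0$.
Theorem 1.56. If $F \subset \mathbb{R}^n$ is nonempty and closed, and $x_0 \in \mathbb{R}^n$, then $\exists\, y \in F$ such that $d(x_0, F) = d(x_0, y)$.
Proof. Choose $\delta > 0$ large enough such that $K = \bar{B}(x_0, \delta) \cap F$ is nonempty, where $\bar{B}(x_0, \delta)$ is the closed ball. Define $f : \mathbb{R}^n \to \mathbb{R}$ by $f(x) = d(x_0, x)$ for any $x \in \mathbb{R}^n$. Then $f$ is continuous. Since $K$ is closed and bounded, hence compact, $f$ attains its minimum on $K$ at some $y \in K$. Points of $F$ outside $K$ are at distance greater than $\delta$ from $x_0$, so $d(x_0, y) = d(x_0, F)$.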
Theorem 1.57. Suppose $E \subset \mathbb{R}^n$ is nonempty. Then $f : \mathbb{R}^n \to \mathbb{R}$ defined by $f(x) = d(x, E)$ is uniformly continuous.
Proof. We can even show that $f$ is Lipschitz continuous on $\mathbb{R}^n$, which implies uniform continuity. To this end, for any $x, y \in \mathbb{R}^n$ and $\epsilon > 0$, there exists $z \in E$ such that $d(y, z) - \epsilon < d(y, E) = f(y) \le d(y, z)$. Since $f(x) \le d(x, z)$, we get $f(x) - f(y) < d(x, z) - (d(y, z) - \epsilon) = d(x, z) - d(y, z) + \epsilon \le d(x, y) + \epsilon$. Since $\epsilon > 0$ is arbitrary, we know $f(x) - f(y) \le d(x, y)$. Similarly $f(y) - f(x) \le d(x, y)$. Hence $|f(x) - f(y)| \le d(x, y)$, i.e., $f$ is 1-Lipschitz.
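The 1-Lipschitz bound $|d(x, E) - d(y, E)| \le d(x, y)$ can be spot-checked numerically. The sketch below assumes a small finite set $E \subset \mathbb{R}^2$ (so the infimum is a minimum) and random test points; the set, seed, sample size and tolerance are illustrative choices, and `math.dist` requires Python 3.8 or later.

```python
import math
import random

random.seed(1)
E = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]  # illustrative finite set in R^2

def d(p, q):
    """Euclidean distance between two points."""
    return math.dist(p, q)

def dist_to_set(p, S=E):
    """d(p, S) = inf over s in S of d(p, s); a minimum because S is finite."""
    return min(d(p, s) for s in S)

pairs = [((random.uniform(-5, 5), random.uniform(-5, 5)),
          (random.uniform(-5, 5), random.uniform(-5, 5))) for _ in range(1000)]

lipschitz_ok = all(
    abs(dist_to_set(x) - dist_to_set(y)) <= d(x, y) + 1e-12 for x, y in pairs
)
print(lipschitz_ok)
```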
Corollary 1.58. If $F_1, F_2 \subset \mathbb{R}^n$ are nonempty and closed, and at least one of them is bounded, then there exist $x_1 \in F_1$ and $x_2 \in F_2$ such that $d(x_1, x_2) = d(F_1, F_2)$.
Proof. If $F_1 \cap F_2 \ne \emptyset$, the claim is trivial. Otherwise, WLOG, suppose $F_1$ is bounded and hence compact. Define $f(x) := d(x, F_2)$, then $f : \mathbb{R}^n \to \mathbb{R}$ is continuous and hence attains its minimum over $F_1$ at some $x_1 \in F_1$. Note that there exists $\delta > 0$ such that $K = \bar{B}(x_1, \delta) \cap F_2$ is nonempty. Since $K$ is compact (being closed and bounded) and $g : \mathbb{R}^n \to \mathbb{R}$ defined by $g : x \mapsto d(x_1, x)$ is continuous, we know $g$ attains its minimum over $K$ at some $x_2 \in K$. Hence $d(x_1, x_2) = g(x_2) = d(x_1, K) = d(x_1, F_2) = f(x_1) = d(F_1, F_2)$.

Example 1.59. Suppose $F_1, F_2 \subset \mathbb{R}^n$ are nonempty, closed, and disjoint. Then there exists a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ such that $0 \le f \le 1$, $F_1 = \{x : f(x) = 1\}$ and $F_2 = \{x : f(x) = 0\}$. [Hint: $f(x) = \frac{d(x, F_2)}{d(x, F_1) + d(x, F_2)}$.]
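The function in the hint can be evaluated in a one-dimensional case. The sketch below assumes the illustrative closed sets $F_1 = [0, 1]$ and $F_2 = [2, 3]$ on the real line, for which the distance to an interval has a closed form; the helper name `dist_to_interval` is introduced here.

```python
def dist_to_interval(x, a, b):
    """d(x, [a, b]) on the real line: zero inside the interval, distance to the nearer endpoint outside."""
    return max(a - x, 0.0, x - b)

F1 = (0.0, 1.0)  # illustrative closed set F1 = [0, 1]
F2 = (2.0, 3.0)  # illustrative closed set F2 = [2, 3]

def f(x):
    """f(x) = d(x, F2) / (d(x, F1) + d(x, F2))."""
    d1 = dist_to_interval(x, *F1)
    d2 = dist_to_interval(x, *F2)
    return d2 / (d1 + d2)  # denominator > 0 because F1 and F2 are closed and disjoint

samples = [f(x) for x in (-1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 4.0)]
print(samples)
```

The value is 1 exactly on $F_1$, 0 exactly on $F_2$, and strictly between 0 and 1 elsewhere.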

2 Measure and Measurable Sets
2.1 σ-algebra
Definition 2.1 (σ-algebra). Let $X$ be a nonempty set. Then $\Gamma \subset 2^X$ (i.e., $\Gamma$ is a collection of subsets of $X$) is called a σ-algebra of $X$ if:
1. $\emptyset \in \Gamma$;
2. If $A \in \Gamma$ then $A^c \in \Gamma$;
3. If $A_n \in \Gamma$ for $n = 1, 2, \dots$, then $\bigcup_{n=1}^\infty A_n \in \Gamma$.

Given the definition above, it is also easy to verify that the following statements hold if $\Gamma$ is a σ-algebra of $X$:
1. $X \in \Gamma$;
2. If $A, B \in \Gamma$, then $A \setminus B \in \Gamma$;
3. If $A_k \in \Gamma$ for $k = 1, \dots, n$, then $\bigcup_{k=1}^n A_k \in \Gamma$;
4. If $A_k \in \Gamma$ for $k = 1, 2, \dots$, then $\bigcap_{k=1}^\infty A_k$, $\limsup_k A_k$, $\liminf_k A_k \in \Gamma$.

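Definition 2.2 (Generated σ-algebra). Suppose $\Sigma \subset 2^X$, and consider $\mathcal{A} = \{\Gamma : \Sigma \subset \Gamma, \text{ and } \Gamma \text{ is a σ-algebra of } X\}$ (obviously $2^X \in \mathcal{A}$ and hence $\mathcal{A} \ne \emptyset$). Then $\Gamma(\Sigma) := \bigcap_{\Gamma \in \mathcal{A}} \Gamma$ is called the σ-algebra of $X$ generated by $\Sigma$.
Remarks. It is easy to verify that $\Gamma(\Sigma)$ is a σ-algebra of $X$ (i.e., $\Gamma(\Sigma) \in \mathcal{A}$): for example, if $A \in \Gamma(\Sigma)$, then $A \in \Gamma$ for all $\Gamma \in \mathcal{A}$. Since every $\Gamma$ is a σ-algebra, $A^c \in \Gamma$. Therefore $A^c \in \Gamma(\Sigma) = \bigcap_{\Gamma \in \mathcal{A}} \Gamma$. Similarly for the other two conditions. Hence $\Gamma(\Sigma)$ is the “smallest” σ-algebra of $X$ containing $\Sigma$.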
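On a finite set $X$ the generated σ-algebra can be computed explicitly. The sketch below is an illustration only, and it builds $\Gamma(\Sigma)$ "from below" by repeatedly closing the generators under complement and union until nothing new appears, rather than by the intersection used in the definition; the two agree when $X$ is finite, since countable unions then reduce to finite ones. The function name and the example generators are assumptions made here.

```python
from itertools import combinations

def generated_sigma_algebra(X, generators):
    """Close `generators` under complement and pairwise union inside the finite set X."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)  # snapshot, since `family` grows during the pass
        for A in current:
            if X - A not in family:
                family.add(X - A)
                changed = True
        for A, B in combinations(current, 2):
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family

X = {1, 2, 3, 4}
gamma = generated_sigma_algebra(X, [{1}, {1, 2}])
print(sorted(sorted(S) for S in gamma))
```

With these generators the "atoms" are $\{1\}$, $\{2\}$ and $\{3, 4\}$, so the result has $2^3 = 8$ members and cannot separate 3 from 4.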
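Definition 2.3 (Borel σ-algebra of $\mathbb{R}^n$). Let $\Sigma = \{G \subset \mathbb{R}^n : G \text{ is open}\}$. Then the σ-algebra $\Gamma(\Sigma)$ generated by $\Sigma$, formally denoted by $\mathcal{B}(\mathbb{R}^n)$ or simply $\mathcal{B}$, is called the Borel σ-algebra of $\mathbb{R}^n$. A set $B \in \mathcal{B}$ is called a Borel set.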

2.2 Outer measure


Definition 2.4 (Outer measure). Let $\{I_k : k \in \mathbb{N}\}$ be a countable set of open boxes in $\mathbb{R}^n$. We call $\{I_k\}_k$ an open-box-cover of $E \subset \mathbb{R}^n$ if $E \subset \bigcup_{k=1}^\infty I_k$. Then the outer measure of $E$ is defined by

\[ \mu^*(E) = \inf \Big\{ \sum_{k=1}^\infty |I_k| : \{I_k\}_k \text{ is an open-box-cover of } E \Big\}. \]

Remarks. If $\sum_k |I_k| = \infty$ for every open-box-cover of $E$, then we define $\mu^*(E) = \infty$; otherwise $\mu^*(E) < \infty$. Note that $\mu^*(E) \ge 0$ for any $E$.
Example 2.5. For any $x \in \mathbb{R}^n$, $\mu^*(\{x\}) = 0$. For any $t \in \mathbb{R}$ and $E = \{x = (x_1, \dots, x_n) : x_i = t,\ x_j \in \mathbb{R},\ \forall\, j \ne i\}$, we have $\mu^*(E) = 0$.
Theorem 2.6. Let $I$ be an open box, then $\mu^*(\bar{I}) = |I|$. [Hint: WLOG consider $|I| < \infty$. For any $\epsilon > 0$, there exists an open box $J$ such that $I \subset \bar{I} \subset J$ and $|J| < |I| + \epsilon$.]

Theorem 2.7 (Properties of outer measure). The outer measure $\mu^*$ in $\mathbb{R}^n$ has the following properties:
1. $\mu^*(E) \ge 0$; $\mu^*(\emptyset) = 0$. [Note that $\mu^*(E) = 0$ does not imply $E = \emptyset$.]
2. If $E_1 \subset E_2$ then $\mu^*(E_1) \le \mu^*(E_2)$.
3. (Sub-additivity) $\mu^*(\bigcup_{k=1}^\infty E_k) \le \sum_{k=1}^\infty \mu^*(E_k)$.

Proof. The first two are trivial. We only show the sub-additivity. For any $\epsilon > 0$ and any $E_k$, there exists an open-box-cover $\{I_{k,l} : l \in \mathbb{N}\}$ of $E_k$ such that

\[ \sum_{l=1}^\infty |I_{k,l}| < \mu^*(E_k) + \frac{\epsilon}{2^k}. \]

Then $\{I_{k,l} : k, l \in \mathbb{N}\}$ is an open-box-cover of $E := \bigcup_{k=1}^\infty E_k$, and hence

\[ \mu^*(E) \le \sum_{k=1}^\infty \sum_{l=1}^\infty |I_{k,l}| \le \sum_{k=1}^\infty \Big( \mu^*(E_k) + \frac{\epsilon}{2^k} \Big) = \sum_{k=1}^\infty \mu^*(E_k) + \epsilon. \]

Since $\epsilon > 0$ is arbitrary, we have $\mu^*(E) \le \sum_{k=1}^\infty \mu^*(E_k)$.
Example 2.8 (Countable sets have measure 0). If $E = \{x_k \in \mathbb{R}^n : k \in \mathbb{N}\}$, then $\mu^*(E) = 0$.
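The $\epsilon / 2^k$ device used for sub-additivity also gives this example directly: cover the $k$-th point by an open interval of length $\epsilon / 2^k$, so the total length stays below $\epsilon$. The sketch below works in $\mathbb{R}$ with exact rational arithmetic on a finite list of points (an illustrative stand-in for a countable set); the point list, `eps` and the function name `cover` are assumptions made here.

```python
from fractions import Fraction

def cover(points, eps):
    """Cover the k-th point (k = 1, 2, ...) by an open interval of length eps / 2^k."""
    intervals = []
    for k, x in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (k + 1)  # half-length, so the full length is eps / 2^k
        intervals.append((x - half, x + half))
    return intervals

# Some rationals in [0, 1], listed in an arbitrary order (repetitions are harmless).
points = [Fraction(p, q) for q in range(1, 8) for p in range(0, q + 1)]
eps = Fraction(1, 100)

intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
print(float(total_length), float(eps))
```

The total length equals $\epsilon (1 - 2^{-N})$ for $N$ points, so it stays below $\epsilon$ no matter how many points are listed.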
Example 2.9. Suppose E ⊂ [a, b] ⊂ R and µ∗ (E) > 0. Then for any t ∈
(0, µ∗ (E)), there exists A ⊂ E such that µ∗ (A) = t. [Hint: Define f (x) =
µ∗ ([a, x) ∩ E) for every x ∈ R. Then show that f is 1-Lipschitz on R, i.e.,
|f (x + ∆x) − f (x)| ≤ |∆x| for all x, ∆x ∈ R. Then apply the Intermediate Value
Theorem of Continuous Functions to f .]
Theorem 2.10 (Outer measure is translation invariant). For any $x \in \mathbb{R}^n$ and $E \subset \mathbb{R}^n$, we have $\mu^*(E) = \mu^*(E + x)$.
Example 2.11. For any λ ∈ R and E ⊂ R, there is µ∗ (λE) = |λ|µ∗ (E).
Example 2.12. If µ∗ (A) = 0, then µ∗ (A ∪ B) = µ∗ (B) = µ∗ (B \ A).
Proof. It follows from µ∗ (B) ≤ µ∗ (A ∪ B) ≤ µ∗ (A) + µ∗ (B) = µ∗ (B) and
µ∗ (B \ A) ≤ µ∗ (B) ≤ µ∗ (B \ A) + µ∗ (A) = µ∗ (B \ A).

2.3 Measurable sets and Lebesgue measure


The problem with the outer measure is that there exist mutually disjoint $E_k$ for $k = 1, 2, \dots$, but $\mu^*(\bigcup_{k=1}^\infty E_k) < \sum_{k=1}^\infty \mu^*(E_k)$. We will restrict $\mu^*$ to the so-called measurable sets, such that $\mu^*(\bigcup_{k=1}^\infty E_k) = \sum_{k=1}^\infty \mu^*(E_k)$, i.e., $\mu^*$ is countably additive. Note that countable additivity implies finite additivity.
Definition 2.13 (Measurable set). A set $E \subset \mathbb{R}^n$ is called a measurable set, or simply $E$ is measurable, if for any $T \subset \mathbb{R}^n$ we have

\[ \mu^*(T) = \mu^*(T \cap E) + \mu^*(T \cap E^c). \]

We call $T$ a test set (note that it can be any set). The collection of all measurable sets in $\mathbb{R}^n$ is denoted by $\mathcal{M}(\mathbb{R}^n)$, or simply $\mathcal{M}$ when there is no danger of confusion.

Remarks. In order to show E ∈ M, it suffices to show µ∗ (T ) ≥ µ∗ (T ∩ E) +
µ∗ (T ∩ E c ) for any test set T , since µ∗ (T ) ≤ µ∗ (T ∩ E) + µ∗ (T ∩ E c ) is already
implied by the sub-additivity of µ∗ .
Example 2.14. If µ∗ (E) = 0, then E ∈ M. [Hint: µ∗ (T ) ≥ µ∗ (T ∩ E c ) =
µ∗ (T ∩ E) + µ∗ (T ∩ E c ) as 0 ≤ µ∗ (T ∩ E) ≤ µ∗ (E) = 0.]

Example 2.15. Let E1 , E2 ⊂ Rn (not necessarily measurable). If there exists


S ∈ M such that E1 ⊂ S and E2 ⊂ S c , then µ∗ (E1 ∪ E2 ) = µ∗ (E1 ) + µ∗ (E2 ).
Proof. Let E1 ∪ E2 be the test set for S, then µ∗ (E1 ∪ E2 ) = µ∗ ((E1 ∪ E2 ) ∩
S) + µ∗ ((E1 ∪ E2 ) ∩ S c ) = µ∗ (E1 ) + µ∗ (E2 ).

Theorem 2.16 (Properties of M). The following statements hold for M:


1. ∅ ∈ M.
2. If E ∈ M, then E c ∈ M.
3. If E1 , E2 ∈ M, then E1 ∪ E2 , E1 ∩ E2 , E1 \ E2 ∈ M.
4. If $E_k \in \mathcal{M}$ for all $k = 1, 2, \dots$, then $\cup_{k=1}^{\infty} E_k \in \mathcal{M}$. If in addition all $E_k$ are mutually disjoint, then $\mu^*(\cup_{k=1}^{\infty} E_k) = \sum_{k=1}^{\infty} \mu^*(E_k)$.

Proof. Items 1 and 2 are trivial. For Item 3, we only need to show E1 ∪E2 ∈ M,
since E1 ∩ E2 = (E1c ∪ E2c )c and E1 \ E2 = E1 ∩ E2c . For any test set T ,
consider T ∩ (E1 ∪ E2 ), which can be partitioned into three mutually disjoint
sets: T ∩ E1 ∩ E2 , T ∩ E1 ∩ E2c , and T ∩ E1c ∩ E2 (use Venn diagram). Also note
that T ∩ (E1 ∪ E2 )c = T ∩ E1c ∩ E2c . Therefore

µ∗ (T ) = µ∗ (T ∩ E1 ) + µ∗ (T ∩ E1c )
= [µ∗ (T ∩ E1 ∩ E2 ) + µ∗ (T ∩ E1 ∩ E2c )] + [µ∗ (T ∩ E1c ∩ E2 )
+ µ∗ (T ∩ E1c ∩ E2c )]
≥ µ∗ (T ∩ (E1 ∪ E2 )) + µ∗ (T ∩ E1c ∩ E2c )

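where the first equality is due to $E_1 \in \mathcal{M}$ with $T$ as test set, the second equality is due to $E_2 \in \mathcal{M}$ with $T \cap E_1$ and $T \cap E_1^c$ as test sets, and the inequality is due to the sub-additivity of $\mu^*$ applied to the first three terms. Note that it is then easy to show by induction that $\cup_{i=1}^{k} E_i \in \mathcal{M}$ for any $E_1, \dots, E_k \in \mathcal{M}$, and that $\mu^*(\cup_{i=1}^{k} E_i) = \sum_{i=1}^{k} \mu^*(E_i)$ if in addition $E_1, \dots, E_k$ are mutually disjoint.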
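For Item 4, WLOG, we assume $E_k$ are mutually disjoint; otherwise replace $E_k$ by $F_k := E_k \setminus (\cup_{i=1}^{k-1} E_i)$ for $k \ge 2$, which are also in $\mathcal{M}$. Denote $S_k = \cup_{i=1}^{k} E_i$ and $S = \cup_{i=1}^{\infty} E_i$. Note that $S_k \in \mathcal{M}$ due to Item 3, and $S^c \subset S_k^c$. Therefore, for any $T$, there is

\[ \mu^*(T) = \mu^*(T \cap S_k) + \mu^*(T \cap S_k^c) \ge \sum_{i=1}^{k} \mu^*(T \cap E_i) + \mu^*(T \cap S^c). \]

Letting $k \to \infty$, we obtain $\mu^*(T) \ge \sum_{i=1}^{\infty} \mu^*(T \cap E_i) + \mu^*(T \cap S^c) \ge \mu^*(T \cap S) + \mu^*(T \cap S^c)$. Hence $S \in \mathcal{M}$. Letting $T = S$ yields $\mu^*(S) \ge \sum_{k=1}^{\infty} \mu^*(S \cap E_k) = \sum_{k=1}^{\infty} \mu^*(E_k)$. The reverse inequality is the sub-additivity of $\mu^*$.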

The theorem above implies that M, the collection of measurable sets in Rn ,
is a σ-algebra on Rn , and that µ∗ is countably additive on M. The restriction of µ∗ to M is called the Lebesgue measure on Rn .
Definition 2.17 (Lebesgue measure). For any E ∈ M, the Lebesgue measure
µ of E is defined by µ(E) = µ∗ (E). The space Rn , the set M, and µ, constitute
the Lebesgue measure space (Rn , M, µ) on Rn .
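Remarks. In general, for any set $X$, a σ-algebra $\mathcal{A}$ on $X$, and an extended real-valued function $\mu : \mathcal{A} \to \mathbb{R} \cup \{\infty\}$, we call $(X, \mathcal{A}, \mu)$ a measure space if (i) $0 \le \mu(E) \le \infty$ for any $E \in \mathcal{A}$; (ii) $\mu(\emptyset) = 0$; and (iii) $\mu$ is countably additive, i.e., $\mu(\cup_{k=1}^{\infty} E_k) = \sum_{k=1}^{\infty} \mu(E_k)$ for any countable collection of mutually disjoint sets $\{E_k \in \mathcal{A} : k \in \mathbb{N}\}$. If $\mu(X) = \infty$, but $X = \cup_{k=1}^{\infty} E_k$ where $E_k \in \mathcal{A}$ and $\mu(E_k) < \infty$ for all $k$, then $\mu$ is called σ-finite.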
Theorem 2.18 (Continuity of µ from below). Suppose Ek ∈ M for all k =
1, 2, . . . , and E1 ⊂ E2 ⊂ . . . , then there is µ(limk Ek ) = limk µ(Ek ).
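Proof. If $\lim_k \mu(E_k) = \infty$ then trivial. Denote $E_0 = \emptyset$ and $D_k = E_k \setminus E_{k-1} \in \mathcal{M}$ for all $k = 1, 2, \dots$. Then the $D_k$ are mutually disjoint and $\lim_k E_k = \cup_{k=1}^{\infty} E_k = \cup_{k=1}^{\infty} D_k$. Hence,

\[ \mu\Big(\lim_{k\to\infty} E_k\Big) = \mu\Big(\bigcup_{k=1}^{\infty} D_k\Big) = \sum_{k=1}^{\infty} \mu(D_k) = \lim_{k} \sum_{j=1}^{k} \mu(D_j) = \lim_{k} \mu(E_k), \]

where we used the countable additivity of measures in the second equality.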


Corollary 2.19 (Continuity of µ from above). Suppose Ek ∈ M for all k =
1, 2, . . . , µ(E1 ) < ∞, and E1 ⊃ E2 ⊃ . . . , then µ(limk Ek ) = limk µ(Ek ).
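Proof. Denote $F_k = E_1 \setminus E_k$ for all $k$. Then $\emptyset = F_1 \subset F_2 \subset \cdots$. Therefore,

\[ \begin{aligned} \mu(E_1) - \lim_{k\to\infty} \mu(E_k) &= \lim_{k\to\infty} \big(\mu(E_1) - \mu(E_k)\big) = \lim_{k\to\infty} \mu(F_k) = \mu\Big(\lim_{k\to\infty} F_k\Big) \\ &= \mu\Big(\bigcup_{k=1}^{\infty} F_k\Big) = \mu\Big(\bigcup_{k=1}^{\infty} (E_1 \cap E_k^c)\Big) = \mu\Big(E_1 \cap \Big(\bigcup_{k=1}^{\infty} E_k^c\Big)\Big) \\ &= \mu\Big(E_1 \cap \Big(\bigcap_{k=1}^{\infty} E_k\Big)^c\Big) = \mu(E_1) - \mu\Big(\lim_{k\to\infty} E_k\Big). \end{aligned} \]

Hence $\mu(\lim_k E_k) = \lim_k \mu(E_k)$.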


Remarks. We need the finiteness condition µ(E1 ) < ∞ in Corollary 2.19: for example, let Ek =
[k, ∞) for every k, then limk Ek = ∅ and µ(limk Ek ) = 0 ̸= ∞ = limk µ(Ek ). So
we need µ(Ek ) to be finite starting from some k (WLOG, k = 1 in Corollary
2.19). This condition is not required in Theorem 2.18 since µ(Ek ) can grow
unbounded, and we will just get ∞ on both sides.
Example 2.20. Suppose $E_k \in \mathcal{M}$ for all $k$ and $\sum_{k=1}^{\infty} \mu(E_k) < \infty$, then $\mu(\limsup_k E_k) = 0$.
Proof. Let $B_k = \cup_{j=k}^{\infty} E_j$, then $B_k$ is non-increasing and $\mu(B_k) \le \sum_{j=k}^{\infty} \mu(E_j) \to 0$ as $k \to \infty$. Hence there is $\mu(\limsup_k E_k) = \mu(\lim_k B_k) = \lim_k \mu(B_k) = 0$.

Corollary 2.21 (Fatou’s lemma for measures). Suppose Ek ∈ M for all k.
Then there is µ(lim inf k Ek ) ≤ lim inf k µ(Ek ).
Proof. Let Bk = ∩j≥k Ej for every k ∈ N. Then Bk is non-decreasing, Bk ⊂ Ek
for all k, and hence µ(lim inf k Ek ) = µ(limk Bk ) = limk µ(Bk ) ≤ lim inf k µ(Ek ).

Remarks. We have two remarks regarding Fatou's lemma for measures.
• In general we do not have $\mu(\liminf_k E_k) = \liminf_k \mu(E_k)$. For example, let $E_k = [0, \frac{1}{2}]$ if $k$ is odd and $(\frac{1}{2}, 1]$ if $k$ is even, then $\liminf_k E_k = \emptyset$, and $\mu(\liminf_k E_k) = 0 < \frac{1}{2} = \liminf_k \mu(E_k)$.
• If $E_k \subset E$ for all $k$ and $\mu(E) < \infty$, then we also have $\mu(\limsup_k E_k) \ge \limsup_k \mu(E_k)$ by substituting $E_k$ with $E \setminus E_k$ for every $k$ and observing that $E \setminus \limsup_k E_k = \liminf_k (E \setminus E_k)$.
Lemma 2.22 (Carathéodory). Suppose $G$ is an open proper subset of $\mathbb{R}^n$, $E \subset G$ ($E$ need not be in $\mathcal{M}$). Let $E_k = \{x \in E : d(x, G^c) \ge \frac{1}{k}\}$ for every $k \in \mathbb{N}$. Then $\lim_k \mu^*(E_k) = \mu^*(E)$.
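Proof. It is clear that $E_1 \subset E_2 \subset \cdots \subset E_k \subset \cdots \subset E$. Hence $\mu^*(E_k) \le \mu^*(E)$ for all $k$, so $\lim_k \mu^*(E_k)$ exists and $\lim_k \mu^*(E_k) \le \mu^*(E)$. For any $x \in E \subset G$, $x$ is an interior point of $G$, and hence there exists $\delta > 0$ such that $B(x, \delta) \subset G$, i.e., $d(x, G^c) \ge \delta$. Hence $E = \cup_{k=1}^{\infty} E_k$. WLOG, assume $\lim_k \mu^*(E_k) < \infty$ (otherwise the claim is trivial).
Let $D_k = E_{k+1} \setminus E_k$ for $k = 1, 2, \dots$, then we have

\[ \infty > \lim_{k} \mu^*(E_k) \ge \mu^*(E_{2k+1}) \ge \mu^*\Big(\bigcup_{j=1}^{k} D_{2j}\Big) = \sum_{j=1}^{k} \mu^*(D_{2j}), \quad \forall\, k \in \mathbb{N}, \]

where we used the fact that $d(D_{2i}, D_{2j}) > 0$ for all $i < j \le k$ to obtain the equality ($\mu^*$ is additive if two sets are separated by a positive distance). Hence the series $\sum_j \mu^*(D_{2j})$ converges, and $\sum_{j=k+1}^{\infty} \mu^*(D_{2j}) \to 0$ as $k \to \infty$. Similarly, $\sum_{j=k}^{\infty} \mu^*(D_{2j+1}) \to 0$ as $k \to \infty$. Therefore, for any $\epsilon > 0$, there exists $k$ large enough, such that $\sum_{j=k+1}^{\infty} \mu^*(D_{2j}) < \frac{\epsilon}{2}$ and $\sum_{j=k}^{\infty} \mu^*(D_{2j+1}) < \frac{\epsilon}{2}$. On the other hand, noting that $E = E_{2k+1} \cup (\cup_{j=k+1}^{\infty} D_{2j}) \cup (\cup_{j=k}^{\infty} D_{2j+1})$, we know that

\[ \mu^*(E) \le \mu^*(E_{2k+1}) + \sum_{j=k+1}^{\infty} \mu^*(D_{2j}) + \sum_{j=k}^{\infty} \mu^*(D_{2j+1}) < \mu^*(E_{2k+1}) + \epsilon \le \lim_{k\to\infty} \mu^*(E_k) + \epsilon. \]

Since $\epsilon > 0$ is arbitrary, we have $\mu^*(E) \le \lim_k \mu^*(E_k)$.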


Theorem 2.23 (Closed sets are measurable). If F ⊂ Rn is closed, then F ∈ M.


Proof. For any test set $T \subset \mathbb{R}^n$, consider $T \cap F^c$, which is a subset of the open set $F^c$. Denote $F_k = \{x \in T \cap F^c : d(x, F) \ge \frac{1}{k}\}$. Then by Lemma 2.22, $\lim_k \mu^*(F_k) = \mu^*(T \cap F^c)$. Therefore

\[ \mu^*(T) = \mu^*\big((T \cap F) \cup (T \cap F^c)\big) \ge \mu^*\big((T \cap F) \cup F_k\big) = \mu^*(T \cap F) + \mu^*(F_k) \to \mu^*(T \cap F) + \mu^*(T \cap F^c) \]

as $k \to \infty$, where we used the fact that $d(T \cap F, F_k) \ge \frac{1}{k} > 0$ to obtain the last equality above. Hence $\mu^*(T) \ge \mu^*(T \cap F) + \mu^*(T \cap F^c)$. So $F \in \mathcal{M}$.
Corollary 2.24 (Open sets are measurable). If G ⊂ Rn is open, then G ∈ M.
Corollary 2.25 (Borel sets are measurable). If A ∈ B(Rn ), then A ∈ M.
Proof. Since B(Rn ) is the smallest σ-algebra generated by the family of open
sets, and M is a σ-algebra also containing all open sets due to Corollary 2.24,
we know B(Rn ) ⊂ M.
Theorem 2.26. Suppose E ∈ M. For any ϵ > 0, there exist an open set G
such that E ⊂ G and µ(G \ E) < ϵ, and a closed set F such that F ⊂ E and
µ(E \ F ) < ϵ.
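Proof. First assume $\mu(E) < \infty$. Then there exists an open-box-cover $\{I_k\}$ of $E$ such that the open set $G = \cup_{k=1}^{\infty} I_k$ satisfies $\mu(G) < \mu(E) + \epsilon$. Hence $0 \le \mu(G \setminus E) = \mu(G) - \mu(E) < \epsilon$.
If $\mu(E) = \infty$, then denote $E_k = E \cap B(0, k)$ for every $k \in \mathbb{N}$. Then $\mu(E_k) < \infty$ for all $k$, and $E = \cup_{k=1}^{\infty} E_k$. For every $k$, there exists an open set $G_k$ such that $E_k \subset G_k$ and $\mu(G_k \setminus E_k) < \frac{\epsilon}{2^k}$. Now let $G = \cup_{k=1}^{\infty} G_k$, then $G$ is open, $E \subset G$ and $G \setminus E = \cup_{k=1}^{\infty} (G_k \setminus E)$. Hence

\[ \mu(G \setminus E) \le \sum_{k=1}^{\infty} \mu(G_k \setminus E) \le \sum_{k=1}^{\infty} \mu(G_k \setminus E_k) < \sum_{k=1}^{\infty} \frac{\epsilon}{2^k} = \epsilon. \]

For the closed set, note that for any $E \in \mathcal{M}$, we know $E^c \in \mathcal{M}$. Hence there exists an open set $G \supset E^c$ with $\mu(G \setminus E^c) < \epsilon$, and therefore a closed set $F = G^c \subset E$, such that $\mu(E \setminus F) = \mu(E \cap F^c) = \mu(E \cap G) = \mu(G \setminus E^c) < \epsilon$.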
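Theorem 2.27. Suppose $E \in \mathcal{M}$. Then there exists a $G_\delta$-set $H$ such that $E \subset H$ and $\mu(H \setminus E) = 0$. Similarly, there exists an $F_\sigma$-set $K$ such that $K \subset E$ and $\mu(E \setminus K) = 0$.
Proof. By Theorem 2.26, there exists an open set $G_k$ such that $E \subset G_k$ and $\mu(G_k \setminus E) < \frac{1}{k}$ for every $k \in \mathbb{N}$. Let $H = \cap_{k=1}^{\infty} G_k$, which is a $G_\delta$-set, then $E \subset H$ and $\mu(H \setminus E) \le \mu(G_k \setminus E) < \frac{1}{k}$ for all $k$. Hence $\mu(H \setminus E) = 0$. The $F_\sigma$-set $K$ is obtained similarly as a countable union of the closed sets given by Theorem 2.26.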
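Theorem 2.28. For any $E \subset \mathbb{R}^n$ (need not be in $\mathcal{M}$), there exists a $G_\delta$-set $H$ such that $E \subset H$ and $\mu(H) = \mu^*(E)$.
Proof. If $\mu^*(E) = \infty$, take $H = \mathbb{R}^n$. Otherwise, there exists an open set $G_k$ such that $E \subset G_k$ and $\mu(G_k) < \mu^*(E) + \frac{1}{k}$ for every $k \in \mathbb{N}$. Let $H = \cap_{k=1}^{\infty} G_k$, then $E \subset H$ and $\mu^*(E) \le \mu(H) \le \mu(G_k) < \mu^*(E) + \frac{1}{k}$ for all $k$. Hence $\mu(H) = \mu^*(E)$.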
Remarks. However, there may not exist an $F_\sigma$-set $K$ such that $K \subset E$ and $\mu(K) = \mu^*(E)$ if $E \notin \mathcal{M}$. See Example 2.35 below.
Theorem 2.29. Suppose Ek ⊂ Rn but need not be in M for all k ∈ N. Then
µ∗ (lim inf k Ek ) ≤ lim inf k µ∗ (Ek ).

Proof. By Theorem 2.28, for every $E_k$, there exists a $G_\delta$-set $H_k$ (hence $H_k \in \mathcal{M}$), such that $E_k \subset H_k$ and $\mu(H_k) = \mu^*(E_k)$. Hence, by Corollary 2.21, $\mu^*(\liminf_k E_k) \le \mu(\liminf_k H_k) \le \liminf_k \mu(H_k) = \liminf_k \mu^*(E_k)$.
Corollary 2.30. Suppose Ek ⊂ Rn but need not be in M, and E1 ⊂ E2 ⊂
· · · ⊂ Ek ⊂ · · · , then µ∗ (limk Ek ) = limk µ∗ (Ek ).

Proof. Note that µ∗ (limk Ek ) ≤ limk µ∗ (Ek ) by Theorem 2.29. The converse is
obvious because Ek ⊂ limk Ek for all k.
Example 2.31 (Cantor set). Consider the closed interval $[0, 1] \subset \mathbb{R}$. Let $C_1 = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$ be obtained by removing the open middle third $(\frac{1}{3}, \frac{2}{3})$ from $[0, 1]$. Then let $C_2 = [0, \frac{1}{9}] \cup [\frac{2}{9}, \frac{1}{3}] \cup [\frac{2}{3}, \frac{7}{9}] \cup [\frac{8}{9}, 1]$, by removing the open middle thirds from both of $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$, and so on. Then $C_k$ is compact for every $k$, and $C_1 \supset C_2 \supset \cdots$. Then $C = \cap_{k=1}^{\infty} C_k$ is called the Cantor set. In addition, $C$ has the following properties: (i) $C$ is compact, nowhere dense, totally disconnected, and has no isolated point; (ii) $\mu(C) = 0$; and (iii) $|C| = c$.
Proof. (i) Note that $C$ is closed and bounded, hence compact. The remaining three are easy to check; (ii) The total length of intervals removed is $\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = \sum_{k=1}^{\infty} \frac{2^{k-1}}{3^k} = 1$, hence $\mu(C) = 1 - 1 = 0$; (iii) It can be shown that $C = \{\sum_{k=1}^{\infty} \frac{a_k}{3^k} : a_k \in \{0, 2\}\}$, hence $|C| = c$ (see comment below Definition 1.32).
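The construction of the sets C_k in Example 2.31 is easy to carry out exactly with rational arithmetic. The following sketch (function names are illustrative choices made here) builds C_k as a list of closed intervals and prints the number of intervals and their total length, which equals (2/3)^k and therefore tends to 0, consistent with µ(C) = 0.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def cantor_level(k):
    """Return the closed intervals whose union is C_k (with C_0 = [0, 1])."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        intervals = cantor_step(intervals)
    return intervals

def total_length(intervals):
    """Sum of the lengths of a list of disjoint intervals."""
    return sum(b - a for a, b in intervals)

for k in range(1, 6):
    level = cantor_level(k)
    print(k, len(level), total_length(level))
```

Since C ⊂ C_k for every k, monotonicity of µ gives µ(C) ≤ (2/3)^k for all k, which is an alternative route to property (ii).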

2.4 Non-measurable sets


We provide examples of non-measurable sets. We have shown that E ∈ M if
µ∗ (E) = 0. Hence a non-measurable set must have positive outer measure. To
show that there are non-measurable sets, we consider R in this subsection, and
recall that λ + E = {λ + x : x ∈ E} is the translation set of E ⊂ R by λ ∈ R.
Note that µ∗ (λ + E) = µ∗ (E) since (outer) measure is translation invariant.

Lemma 2.32. Suppose E ∈ M and µ(E) < ∞. If there exists a bounded


countably infinite set Λ ⊂ R such that the collection of translation sets {λ + E :
λ ∈ Λ} are disjoint, then µ(E) = 0.
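Proof. Suppose $\mu(E) > 0$. Replacing $E$ by $E \cap [-r, r]$ for some large $r$ (the translates stay disjoint, and the measure stays positive by continuity from below), we may assume that $E$ is bounded. Since $\{\lambda + E : \lambda \in \Lambda\}$ is a collection of countably many disjoint measurable sets, by the countable additivity and translation invariance of measures, we know $\mu(\cup_{\lambda \in \Lambda} (\lambda + E)) = \sum_{\lambda \in \Lambda} \mu(\lambda + E) = \sum_{\lambda \in \Lambda} \mu(E) = \infty$, because $\Lambda$ is infinite. On the other hand, since both $\Lambda$ and $E$ are bounded, we know $\cup_{\lambda \in \Lambda} (\lambda + E)$ is bounded and hence $\mu(\cup_{\lambda \in \Lambda} (\lambda + E)) < \infty$, which is a contradiction. Hence $\mu(E) = 0$.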
For any E ⊂ R, we define the equivalence relation ∼ for any x, y ∈ E: x ∼ y
iff x − y ∈ Q (it is easy to verify that ∼ is an equivalence relation by checking
that it is reflexive, symmetric, and transitive). Then E is a disjoint union of
its equivalence classes. Let CE ⊂ E be the set that contains exactly one point
from each equivalence class in E. Then we know the two properties below hold:
1. For any $x, y \in C_E$ with $x \ne y$, $x - y \notin \mathbb{Q}$ since $x, y$ belong to different equivalence classes of $E$.

2. For any x ∈ E, there exists c ∈ CE and q ∈ Q such that x = c + q, since
CE contains a point from the same equivalence class which x belongs to.
Note that the first property also implies that {λ + CE : λ ∈ Λ} is a collection
of disjoint sets provided that Λ ⊂ Q. Now the following theorem shows that
non-measurable sets exist.

Theorem 2.33 (Vitali: non-measurable sets exist). Any set E ⊂ R with µ∗ (E) > 0
contains a non-measurable subset.
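Proof. WLOG, we assume $0 < \mu^*(E) < \infty$ and $E \subset [-b, b] \subset \mathbb{R}$ (otherwise we take $B(0, b) \cap E$ for some $b \in \mathbb{N}$). Let $\Lambda = [-2b, 2b] \cap \mathbb{Q}$. Let $C_E$ be the subset of $E$ containing exactly one point from each equivalence class, then $\{\lambda + C_E : \lambda \in \Lambda\}$ is a collection of countably many disjoint sets.
If $C_E \in \mathcal{M}$, then by Lemma 2.32, we know $\mu(C_E) = 0$. Now for any $x \in E$, there exist $c \in C_E$ and $q \in \mathbb{Q}$, such that $q = x - c$. Since $C_E \subset E \subset [-b, b]$, we know $q \in [-2b, 2b]$ and hence $q \in \Lambda$. Therefore $x \in \cup_{\lambda \in \Lambda} (\lambda + C_E)$. As $x$ is arbitrary, we know $E \subset \cup_{\lambda \in \Lambda} (\lambda + C_E)$. However $0 < \mu^*(E) \le \mu(\cup_{\lambda \in \Lambda} (\lambda + C_E)) = \sum_{\lambda \in \Lambda} \mu(\lambda + C_E) = \sum_{\lambda \in \Lambda} \mu(C_E) = 0$, which is a contradiction. Hence $C_E \notin \mathcal{M}$.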
Theorem 2.34. There exist disjoint sets A, B ⊂ R such that µ∗ (A ∪ B) <
µ∗ (A) + µ∗ (B).
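Proof. If not, then $\mu^*(A \cup B) \ge \mu^*(A) + \mu^*(B)$ for all disjoint $A, B \subset \mathbb{R}$. In particular, for any $E \notin \mathcal{M}$ and $T \subset \mathbb{R}$, there is $\mu^*(T) \ge \mu^*(T \cap E) + \mu^*(T \cap E^c)$ since $T \cap E$ and $T \cap E^c$ are disjoint. This implies $E \in \mathcal{M}$ by definition, a contradiction.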
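Example 2.35. Let $E = [0, 1) \subset \mathbb{R}$ and $C_E$ be the subset of $E$ containing exactly one point from each equivalence class in $E$ as above. Let $D_\lambda = \{x + \lambda \ (\mathrm{mod}\ 1) : x \in C_E\}$ for every $\lambda \in \Lambda := [0, 1) \cap \mathbb{Q}$. Then $E = \cup_{\lambda \in \Lambda} D_\lambda$ is a countable disjoint union, $\mu^*(D_\lambda) = \mu^*(C_E) > 0$ for all $\lambda \in \Lambda$, and

\[ 1 = \mu^*([0, 1)) = \mu^*(E) = \mu^*\Big(\bigcup_{\lambda \in \Lambda} D_\lambda\Big) < \sum_{\lambda \in \Lambda} \mu^*(D_\lambda) = \infty. \]

Moreover, for any measurable subset $K \subset C_E$, we know $\{\lambda + K : \lambda \in \Lambda\}$ is a family of measurable and disjoint sets, $\mu(\lambda + K) = \mu(K)$, and $\infty > \mu^*(\cup_{\lambda \in \Lambda} (\lambda + C_E)) \ge \mu(\cup_{\lambda \in \Lambda} (\lambda + K)) = \sum_{\lambda \in \Lambda} \mu(\lambda + K)$. Hence $\mu(K) = 0 < \mu^*(C_E)$ (see remark after Theorem 2.28).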

3 Measurable Functions
3.1 Extended real numbers
Recall the following properties of the extended real numbers R̄ = R ∪ {±∞}:
1. For any x ∈ R, there is −∞ < x < ∞.
2. For any x ∈ R, there are

x + (+∞) = (+∞) + x = (+∞) + (+∞) = +∞


x − (+∞) = (−∞) + x = (−∞) + (−∞) = −∞
x − (+∞) = (−∞) − (+∞) = −∞
x − (−∞) = (+∞) − (−∞) = +∞
±(+∞) = ±∞
±(−∞) = ∓∞
±(∓∞) = −∞
| ± ∞| = +∞

3. For any x ∈ R and x ̸= 0, we denote sign(x) = 1 if x > 0 and −1 if x < 0.


Then there are x · (±∞) = ±sign(x) · ∞, and

(±∞) · (±∞) = +∞, (±∞) · (∓∞) = −∞

4. (±∞) − (±∞) and (±∞) + (∓∞) are not defined. We define 0 · (±∞) = 0
in this course.
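The conventions above differ from IEEE floating-point arithmetic in one place: there, 0 · ∞ evaluates to NaN rather than 0. The sketch below is a hypothetical pair of helpers (`ext_mul` and `ext_add` are names introduced here for illustration only) that follow the course's rules on top of Python floats.

```python
import math

INF = math.inf

def ext_mul(a, b):
    """Product in the extended reals with the course convention 0 * (+-inf) = 0."""
    if a == 0 or b == 0:
        return 0.0
    return a * b

def ext_add(a, b):
    """Sum in the extended reals; (+inf) + (-inf) is left undefined."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("(+inf) + (-inf) is not defined")
    return a + b

print(ext_mul(0, INF))      # follows the convention 0 * inf = 0
print(0 * INF)              # plain float arithmetic gives nan instead
print(ext_mul(-2.0, INF))   # sign(x) * inf with x = -2
print(ext_add(3.0, -INF))   # x + (-inf)
```

Raising an exception for (+∞) + (−∞) mirrors the statement that this expression is not defined, instead of silently propagating NaN.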
Definition 3.1 (Measurable function). Suppose E ⊂ Rn . We call the function
f : E → R̄ measurable, or f is a measurable function on E, if {x ∈ E : f (x) >
t} ∈ M for any t ∈ R.
It turns out that we only need to show {x ∈ E : f (x) > r} ∈ M for all r in
a dense set D of R, for example D = Q.
Theorem 3.2. Suppose f : E → R̄ and D is a dense subset of R. If {x ∈ E :
f (x) > r} ∈ M for any r ∈ D, then f is measurable.
Proof. For any $t \in \mathbb{R}$, there exist $\{r_k\} \subset D$ such that $r_k > t$ and $r_k \downarrow t$. Hence $\{x \in E : f(x) > t\} = \cup_{k=1}^{\infty} \{x \in E : f(x) > r_k\} \in \mathcal{M}$.

Example 3.3. If f : [a, b] → R̄ is monotone, then f is measurable.


Proof. WLOG, assume f is non-decreasing. Then for any t ∈ R, {x ∈ [a, b] :
f (x) > t} is either the empty set, or a single point, or a subinterval of [a, b],
which are all in M.
Theorem 3.4. If f : E → R̄ is measurable, then the sets on the left hand side
below are all measurable sets for any t ∈ R:
1. $\{x \in E : f(x) \le t\} = E \setminus \{x \in E : f(x) > t\}$.
2. $\{x \in E : f(x) \ge t\} = \cap_{k=1}^{\infty} \{x \in E : f(x) > t - \frac{1}{k}\}$.
3. $\{x \in E : f(x) < t\} = E \setminus \{x \in E : f(x) \ge t\}$.
4. $\{x \in E : f(x) = t\} = \{x \in E : f(x) \ge t\} \cap \{x \in E : f(x) \le t\}$.
5. $\{x \in E : f(x) < \infty\} = \cup_{k=1}^{\infty} \{x \in E : f(x) < k\}$.
6. $\{x \in E : f(x) = +\infty\} = E \setminus \{x \in E : f(x) < +\infty\}$.
7. $\{x \in E : f(x) > -\infty\} = \cup_{k=1}^{\infty} \{x \in E : f(x) > -k\}$.
8. $\{x \in E : f(x) = -\infty\} = E \setminus \{x \in E : f(x) > -\infty\}$.
Remarks. We can use any of the first three as the definition of measurable
functions.
Theorem 3.5. If f : E1 ∪ E2 → R̄, and f is measurable on E1 and E2 , then f
is measurable on E1 ∪ E2 .
Proof. {x ∈ E1 ∪ E2 : f (x) > t} = {x ∈ E1 : f (x) > t} ∪ {x ∈ E2 : f (x) > t} ∈
M.
Theorem 3.6. If f : E → R̄ is measurable, and A ⊂ E is measurable, then f is measurable
on A.
Proof. {x ∈ A : f (x) > t} = A ∩ {x ∈ E : f (x) > t} ∈ M.
Example 3.7. Suppose E ∈ M. Then χE : Rn → R̄ is measurable.
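Proof. Note that all of the three sets below are measurable:

\[ \{x \in \mathbb{R}^n : \chi_E(x) > t\} = \begin{cases} \mathbb{R}^n & \text{if } t < 0, \\ E & \text{if } 0 \le t < 1, \\ \emptyset & \text{if } t \ge 1. \end{cases} \]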

Theorem 3.8. Suppose $f, g : E \to \mathbb{R}$ are measurable. Then $cf$, $f + g$, $f \cdot g$ are measurable for any $c \in \mathbb{R}$.
Proof. If $c = 0$ then trivial. If $c > 0$ then $\{x \in E : cf(x) > t\} = \{x \in E : f(x) > t/c\} \in \mathcal{M}$. Similarly for $c < 0$.
Let $\mathbb{Q} = \{r_k\}$, then $\{x : f + g > t\} = \cup_{k=1}^{\infty} \big( \{x : f > r_k\} \cap \{x : g > t - r_k\} \big) \in \mathcal{M}$.
We first show that $f^2$ is measurable: $\{x : f^2 > t\} = \{x : f > \sqrt{t}\} \cup \{x : f < -\sqrt{t}\}$ if $t \ge 0$ and $\{x : f^2 > t\} = E$ if $t < 0$. In either case, $\{x : f^2 > t\} \in \mathcal{M}$, hence $f^2$ is measurable. Then note that $f \cdot g = \frac{1}{4} \big( (f + g)^2 - (f - g)^2 \big)$.
The result above can be generalized to functions f : E → R̄.
Theorem 3.9. If fk : E → R̄ is measurable for every k, then supk fk (x),
inf k fk (x), lim supk fk (x), lim inf k fk (x) are all measurable functions.
Proof. Denote $f = \sup_k f_k$. Then for any $t \in \mathbb{R}$, $\{x : f > t\} = \cup_{k=1}^{\infty} \{x : f_k > t\} \in \mathcal{M}$. The other three can be verified by noting that $\inf_k f_k = -\sup_k (-f_k)$, $\limsup_k f_k = \lim_j \sup_{k \ge j} f_k = \inf_j \sup_{k \ge j} f_k$ and $\liminf_k f_k = -\limsup_k (-f_k)$.

Corollary 3.10. If fk , f : E → R̄, fk is measurable, and fk → f for every
x ∈ E, then f is measurable.
Example 3.11. Suppose that f : E → R̄ is measurable. Define f + (x) =
max(f (x), 0) and f − (x) = max(−f (x), 0) for all x ∈ E. Then f + , f − are
measurable.

It is clear that f is measurable iff f + , f − are measurable.


Example 3.12. If $f : E \to \bar{\mathbb{R}}$ is measurable, then $|f|$ is measurable. However the converse may not be true. [Hint: $f(x) = 2(\chi_E(x) - \frac{1}{2})$ for some $E \notin \mathcal{M}$.]
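Example 3.13. Suppose $f : \mathbb{R}^2 \to \mathbb{R}$, where $f(x, y)$ is continuous in $y$ for every $x$ and measurable in $x$ for every $y$. Then $f$ is measurable on $\mathbb{R}^2$.
Proof. For every $k \in \mathbb{N}$, define $f_k(x, y) = f(x, \frac{j}{k})$ for $\frac{j-1}{k} < y \le \frac{j}{k}$, $j \in \mathbb{Z}$. Then for every $t \in \mathbb{R}$, there is

\[ \{(x, y) : f_k(x, y) > t\} = \bigcup_{j=-\infty}^{\infty} \Big\{ x : f\Big(x, \frac{j}{k}\Big) > t \Big\} \times \Big( \frac{j-1}{k}, \frac{j}{k} \Big], \]

which implies that $f_k$ is measurable. Then $\lim_k f_k(x, y) = f(x, y)$ for all $(x, y)$, which follows from the continuity of $f$ in $y$, implies that $f(x, y)$ is measurable.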

Example 3.14 (Continuous functions are measurable). If E ∈ M and f ∈


C(E), then f is measurable. [Hint: for every t, {x : f (x) > t} is open.]
Definition 3.15. Let E ⊂ Rn . We say a property P holds almost everywhere
(a.e. in short) if there exists A ⊂ E such that µ(E \ A) = 0 and P (x) holds for
every x ∈ A.

Example 3.16. If f, g : E → R̄ and µ({x ∈ E : f (x) ̸= g(x)}) = 0, then f = g


a.e. E.
Definition 3.17. If f : E → R̄ and µ({x ∈ E : |f (x)| = +∞}) = 0, then
|f | < ∞ a.e. E, or f is finite a.e. E.

Remarks. Note that this is different from “bounded by M a.e. E”, which is
µ({x ∈ E : |f (x)| > M }) = 0.
Theorem 3.18. If f, g : E → R̄, f = g a.e. E, and f is measurable, then g is
measurable.
Proof. Let A = {x : f = g}, then µ(E \ A) = 0 and A ∈ M. Therefore
{x ∈ E : g(x) > t} = {x ∈ A : g(x) > t} ∪ {x ∈ Ac : g(x) > t} = {x ∈ A :
f (x) > t} ∪ {x ∈ Ac : g(x) > t} ∈ M since both sets are measurable.
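Example 3.19. Suppose $E \in \mathcal{M}$ and $0 < \mu(E) < \infty$. If $f$ is measurable and $0 < f < \infty$ a.e. $E$, then for any $\delta > 0$, there exist $E_\delta \subset E$ and $K > 0$, such that $\mu(E_\delta) < \delta$ and $\frac{1}{K} \le f(x) \le K$ for all $x \in E \setminus E_\delta$.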

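Proof. Define $A_k = \{x \in E : \frac{1}{k} \le f(x) \le k\}$. Then it is clear that $A_k$ is non-decreasing. Let $A = \cup_{k=1}^{\infty} A_k$. Then $E = A \cup Z_0 \cup Z_1$ where $Z_0 = \{x \in E : f(x) \le 0\}$ and $Z_1 = \{x \in E : f(x) = \infty\}$. Note that $\mu(Z_0) = \mu(Z_1) = 0$. Hence $\mu(E) = \mu(E \setminus (Z_0 \cup Z_1)) = \mu(A) = \mu(\cup_{k=1}^{\infty} A_k) = \lim_k \mu(A_k)$. Since $\mu(E) < \infty$, there exists $K$ such that $\mu(E \setminus A_K) = \mu(E) - \mu(A_K) < \delta$. Thus for $E_\delta = Z_0 \cup Z_1 \cup (E \setminus A_K) = E \setminus A_K$, there are $\mu(E_\delta) < \delta$ and $\frac{1}{K} \le f(x) \le K$ for all $x \in A_K = E \setminus E_\delta$.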

3.2 Simple functions


Definition 3.20 (Simple function). A function $f : E \to \bar{\mathbb{R}}$ is called simple if the range set $f(E)$ is finite. That is, $E$ can be partitioned into $p$ mutually disjoint sets $E_1, \dots, E_p$, such that $f(x) = \sum_{i=1}^{p} c_i \chi_{E_i}(x)$ for $p$ distinct values $c_1, \dots, c_p \in \bar{\mathbb{R}}$.
It is clear that, if Ei ∈ M for all i, then the simple function f is measurable.
Theorem 3.21 (Pointwise convergence to a nonnegative measurable function
by a monotone sequence of simple functions). If f : E → R̄, f is measurable,
and f ≥ 0, then there exists a sequence of non-decreasing simple functions {fk }
such that limk fk (x) = f (x) for every x ∈ E.
Proof. For every $k \in \mathbb{N}$, we partition $[0, \infty]$ into $[0, k)$ and $[k, \infty]$, and further partition $[0, k)$ into $k \cdot 2^k$ segments of length $\frac{1}{2^k}$. Then define $E_{k,j} = \{x \in E : \frac{j-1}{2^k} \le f(x) < \frac{j}{2^k}\}$ for $j = 1, \dots, k \cdot 2^k$ and $E_k = \{x \in E : f(x) \ge k\}$, and a simple function $f_k$ as follows:

\[ f_k(x) = \begin{cases} \dfrac{j-1}{2^k}, & \text{if } x \in E_{k,j},\ j = 1, \dots, k \cdot 2^k, \\ k, & \text{if } x \in E_k. \end{cases} \]

Then it is clear that $f_k \le f$ and $f_k$ is non-decreasing in $k$. For any $x \in E$, if $f(x) < \infty$, then there exists an integer $K > f(x)$, and hence $0 \le f(x) - f_k(x) \le \frac{1}{2^k}$ for all $k \ge K$, and hence $\lim_k f_k(x) = f(x)$; if $f(x) = \infty$, then $f_k(x) = k$ and hence $\lim_k f_k(x) = \infty = f(x)$.
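The dyadic construction in the proof of Theorem 3.21 translates directly into code. The sketch below evaluates f_k at a point from the value f(x) alone; the test function f(x) = 1/x on [0, 1] (with f(0) = +∞) and the helper names are illustrative assumptions, not taken from the notes.

```python
import math

def simple_approx(f_value, k):
    """Value f_k(x) of the k-th dyadic simple function, given f(x) = f_value >= 0."""
    if f_value >= k:
        return float(k)
    # f_value lies in [(j-1)/2^k, j/2^k); return the left endpoint (j-1)/2^k.
    return math.floor(f_value * 2**k) / 2**k

def f(x):
    """Hypothetical test function: f(x) = 1/x on (0, 1], with f(0) = +inf."""
    return math.inf if x == 0 else 1.0 / x

for x in (0.0, 0.3, 1.0):
    # Each row is non-decreasing in k and approaches f(x) (or grows like k if f(x) = inf).
    print(x, [simple_approx(f(x), k) for k in range(1, 8)])
```

Scaling by 2^k before taking the floor is what makes the sequence monotone: the grid at level k + 1 refines the grid at level k, so the left endpoint can only move up.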
Theorem 3.22 (Pointwise (uniform) convergence to a (uniformly bounded) measurable function by a sequence of simple functions). If $f : E \to \bar{\mathbb{R}}$ is measurable, then there exists a sequence of simple functions $\{f_k\}$ such that $|f_k| \le |f|$ and $f_k \to f$. If in addition $|f| \le M$, then $f_k \Rightarrow f$.
Proof. The first statement can be verified by noting that $f = f^+ - f^-$ and applying Theorem 3.21 to $f^+$ and $f^-$. If $|f| \le M$, then for any $\epsilon > 0$, by the same construction above, there exists an integer $K \ge M$, such that $|f(x) - f_k(x)| \le \frac{1}{2^k} \le \frac{1}{2^K} < \epsilon$ for all $x \in E$ and $k \ge K$.
Definition 3.23 (Support of a function). Suppose f : E → R̄. Then the support
of f , denoted by supp(f ), is defined by the closure of {x ∈ E : f (x) ̸= 0} (hence
a support is always closed). If supp(f ) is bounded, then f is said to have a
compact support.

Example 3.24. Let gk (x) = fk (x)χB(0,k) (x) where fk is the simple function
in the previous theorem, then gk is also simple and has compact support, and
gk (x) → f (x) for all x ∈ E.

3.3 Convergence almost everywhere


Definition 3.25 (Convergence almost everywhere). Suppose fk , f : E → R̄.
Then fk is said to converge to f almost everywhere on E, denoted by fk → f
a.e. E, if there exists Z ⊂ E, such that µ(Z) = 0 and limk fk (x) = f (x) for all
x ∈ E \ Z.
It is clear that if fk is measurable for every k and fk → f a.e. E, then f is
measurable.
Lemma 3.26. Suppose $f_k, f : E \to \bar{\mathbb{R}}$ where $\mu(E) < \infty$, and $f_k \to f$ a.e. $E$. For any $\epsilon > 0$, define the set $E_k(\epsilon) = \{x \in E : |f_k(x) - f(x)| \ge \epsilon\}$. Then there is $\lim_k \mu(\cup_{j=k}^{\infty} E_j(\epsilon)) = \mu(\limsup_k E_k(\epsilon)) = 0$.

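Proof. Note that $f_k \to f$ a.e. $E$ means that $\mu(\cup_{i=1}^{\infty} \cap_{k=1}^{\infty} \cup_{j=k}^{\infty} E_j(\frac{1}{i})) = 0$ (To see this, recall that $\lim_k f_k(x) \ne f(x)$ means that there exists $i \in \mathbb{N}$, such that for any $k \in \mathbb{N}$ there exists $j \ge k$ with $|f_j(x) - f(x)| \ge \frac{1}{i}$. But the set of such "non-convergent" points has measure 0). Also note that $E_k(\epsilon_1) \subset E_k(\epsilon_2)$ for any $0 < \epsilon_2 < \epsilon_1$. Hence for any $\epsilon > 0$, there exists $i_0 \in \mathbb{N}$, such that $\frac{1}{i_0} < \epsilon$ and

\[ \mu\Big(\bigcap_{k=1}^{\infty} \bigcup_{j=k}^{\infty} E_j(\epsilon)\Big) \le \mu\Big(\bigcap_{k=1}^{\infty} \bigcup_{j=k}^{\infty} E_j\big(\tfrac{1}{i_0}\big)\Big) \le \mu\Big(\bigcup_{i=1}^{\infty} \bigcap_{k=1}^{\infty} \bigcup_{j=k}^{\infty} E_j\big(\tfrac{1}{i}\big)\Big) = 0. \]

Noting that $\limsup_k E_k(\epsilon) = \cap_{k=1}^{\infty} \cup_{j=k}^{\infty} E_j(\epsilon)$ and $\cup_{j=k}^{\infty} E_j(\epsilon)$ is non-increasing in $k$, we obtain that $\lim_k \mu(\cup_{j=k}^{\infty} E_j(\epsilon)) = \mu(\lim_k \cup_{j=k}^{\infty} E_j(\epsilon)) = \mu(\limsup_k E_k(\epsilon)) = 0$, where we needed $\mu(E) < \infty$ to obtain the first equality.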
Theorem 3.27 (Egorov: almost everywhere convergence + bounded domain
⇒ “nearly” uniform convergence). Suppose fk , f : E → R̄ where µ(E) < ∞. If
fk and f are finite and fk → f a.e. E, then for any δ > 0, there exists Eδ ⊂ E,
such that µ(Eδ ) < δ and fk ⇒ f on E \ Eδ .
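Proof. For any $\delta > 0$ and $i \in \mathbb{N}$, denote $E_k(\frac{1}{i}) = \{x \in E : |f_k(x) - f(x)| \ge \frac{1}{i}\}$, then by the lemma above, we know that there exists $j_i \in \mathbb{N}$ such that $\mu(\cup_{k=j_i}^{\infty} E_k(\frac{1}{i})) < \frac{\delta}{2^i}$.
Now consider $E_\delta = \cup_{i=1}^{\infty} \cup_{k=j_i}^{\infty} E_k(\frac{1}{i})$, there is $\mu(E_\delta) \le \sum_{i=1}^{\infty} \mu(\cup_{k=j_i}^{\infty} E_k(\frac{1}{i})) < \sum_{i=1}^{\infty} \frac{\delta}{2^i} = \delta$. Moreover, there is

\[ E \setminus E_\delta = \bigcap_{i=1}^{\infty} \bigcap_{k=j_i}^{\infty} \Big( E \setminus E_k\big(\tfrac{1}{i}\big) \Big) = \bigcap_{i=1}^{\infty} \bigcap_{k=j_i}^{\infty} \Big\{ x \in E : |f_k(x) - f(x)| < \frac{1}{i} \Big\}. \]

Therefore, for any $i$, there exists $j_i$, such that $|f_k(x) - f(x)| < \frac{1}{i}$ for all $x \in E \setminus E_\delta$ and $k \ge j_i$. This means $f_k \Rightarrow f$ on $E \setminus E_\delta$.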
Remarks. A few remarks are in order:
1. The finiteness of $\mu(E)$ is necessary. [Hint: consider $f_k(x) = \chi_{[0,k]}(x)$ and $f(x) = 1$ for all $x \in [0, \infty)$. Or consider $f_k(x) = \frac{x}{k}$ and $f(x) = 0$ for $x \in \mathbb{R}$.]
2. If $\mu(E) = \infty$, we can still show that for any $M > 0$, there exists $E_M \subset E$ such that $\mu(E_M) > M$ and $f_k \Rightarrow f$ on $E_M$. [Hint: choose any measurable set $F_M \subset E$ with $\mu(F_M) = M + \delta$, and $E_M = F_M \setminus E_\delta$ by applying Theorem 3.27 (Egorov) to $E = F_M$.]
3. There exists a sequence of sets $\{E_j\}$ with measures non-decreasing to $\mu(E)$, such that $f_k \Rightarrow f$ on $E_j$ for every $j$, and $\mu(E \setminus (\cup_{j=1}^{\infty} E_j)) = 0$.
4. We can choose $E_\delta$ such that $E \setminus E_\delta$ is also closed. To this end, just choose $E_{\delta/2}$ in the first place such that $\mu(E_{\delta/2}) < \delta/2$ and $f_k \Rightarrow f$ on $E \setminus E_{\delta/2}$, and choose a closed set $F \subset E \setminus E_{\delta/2}$ such that $\mu((E \setminus E_{\delta/2}) \setminus F) < \delta/2$ (by Theorem 2.26), then $\mu(E \setminus F) < \delta$, and $f_k \Rightarrow f$ on $F$.
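Example 3.28. Let $f_k(x) = x^k$ on $[0, 1]$, so that $f_k \to f$ with $f(x) = 0$ for $x \in [0, 1)$ and $f(1) = 1$. Then for any $\delta > 0$ we can show $f_k \Rightarrow f$ on $[0, 1 - \delta]$.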
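A quick numerical check of Example 3.28, as a rough sketch: the grid-based supremum below is only an approximation, and the function name and grid size are arbitrary choices made here. It compares the uniform error of x^k on [0, 1 − δ] with the error on an interval reaching almost up to 1.

```python
def sup_error(k, right_end, n_grid=10_000):
    """Approximate sup of |x**k - 0| over [0, right_end] on a uniform grid."""
    return max((right_end * i / n_grid) ** k for i in range(n_grid + 1))

delta = 0.1
for k in (1, 10, 100, 1000):
    # On [0, 1 - delta] the sup equals (1 - delta)**k, which tends to 0,
    # while arbitrarily close to 1 the sup of x**k stays near 1 for each fixed k.
    print(k, sup_error(k, 1 - delta), sup_error(k, 1 - 1e-9))
```

The removed set (1 − δ, 1] plays the role of E_δ in Theorem 3.27: it has measure δ, and outside of it the convergence is uniform.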

3.4 Convergence in measure


Definition 3.29 (Convergence in measure). Suppose $f_k, f : E \to \bar{\mathbb{R}}$ and $f_k, f$ are finite a.e. $E$. We say that $f_k$ converges to $f$ in measure, denoted by $f_k \xrightarrow{\mu} f$, if for any $\epsilon > 0$, there is $\lim_{k\to\infty} \mu(\{x \in E : |f_k - f| \ge \epsilon\}) = 0$. [Using the $E_k(\epsilon)$ notation above, this definition can be stated as: $f_k \xrightarrow{\mu} f$ on $E$ if for any $\epsilon > 0$ there is $\lim_k \mu(E_k(\epsilon)) = 0$.]
Note that µ({x ∈ E : |fk | = ∞}) = 0 for all k, so it does not affect the
convergence in measure.
Theorem 3.30 (Convergence in measure ⇒ unique limit in the sense of a.e.). If $f_k \xrightarrow{\mu} f$ and $f_k \xrightarrow{\mu} g$, then $f = g$ a.e.
Proof. For any $\epsilon > 0$, denote $E_k(\epsilon) = \{x : |f_k(x) - f(x)| \ge \epsilon\}$ and $F_k(\epsilon) = \{x : |f_k(x) - g(x)| \ge \epsilon\}$. Note that $|f(x) - g(x)| \le |f_k(x) - f(x)| + |f_k(x) - g(x)|$. Hence, there is $\{x : |f - g| \ge \epsilon\} \subset E_k(\frac{\epsilon}{2}) \cup F_k(\frac{\epsilon}{2})$. Since $\mu(E_k(\frac{\epsilon}{2})), \mu(F_k(\frac{\epsilon}{2})) \to 0$ as $k \to \infty$, we know $\mu(\{x : |f - g| \ge \epsilon\}) = 0$. Setting $\epsilon = \frac{1}{n}$ and taking the countable union over $n \in \mathbb{N}$ yield that $\mu(\{x : |f - g| > 0\}) = 0$.
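Theorem 3.31 (Convergence a.e. + bounded domain ⇒ convergence in µ). Suppose $f_k, f : E \to \bar{\mathbb{R}}$ where $\mu(E) < \infty$, $f_k, f$ finite a.e. $E$, and $f_k \to f$ a.e. $E$, then $f_k \xrightarrow{\mu} f$.
Proof. Since $f_k \to f$ a.e. $E$, we know that for any $\epsilon > 0$ there is $\mu(\cap_{j=1}^{\infty} \cup_{k=j}^{\infty} E_k(\epsilon)) = 0$ where $E_k(\epsilon) = \{x \in E : |f_k(x) - f(x)| \ge \epsilon\}$. Noting that $A_k(\epsilon) = \cup_{j=k}^{\infty} E_j(\epsilon)$ is non-increasing in $k$ and $E_k(\epsilon) \subset A_k(\epsilon)$, we have $\limsup_k \mu(E_k(\epsilon)) \le \lim_k \mu(A_k(\epsilon)) = \mu(\lim_k A_k(\epsilon)) = 0$ (the first equality requires $\mu(E) < \infty$). Hence $f_k \xrightarrow{\mu} f$ on $E$.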
Remarks. We have two remarks regarding Theorem 3.31.
• An alternative proof: If $f_k \to f$ a.e. $E$, then for any $\epsilon > 0$, there is $\mu(\limsup_k E_k(\epsilon)) = 0$. By Fatou's lemma for sets and $\mu(E) < \infty$, we have $0 = \mu(\limsup_k E_k(\epsilon)) \ge \limsup_k \mu(E_k(\epsilon)) \ge 0$. Hence $\lim_k \mu(E_k(\epsilon)) = 0$.
• The finiteness of $\mu(E)$ in Theorem 3.31 is again necessary: consider $f_k(x) = \chi_{[0,k]}(x)$ and $f(x) = 1$ for all $x \in [0, \infty)$. Or consider $f_k(x) = \frac{x}{k}$ and $f(x) = 0$ for $x \in \mathbb{R}$.
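Theorem 3.32 (Almost uniform convergence ⇒ convergence in µ). Suppose $f_k, f : E \to \bar{\mathbb{R}}$, $f_k, f$ finite a.e. $E$. If for any $\delta > 0$, there exists $E_\delta \subset E$ such that $\mu(E_\delta) < \delta$ and $f_k \Rightarrow f$ on $E \setminus E_\delta$, then $f_k \xrightarrow{\mu} f$.
Proof. For any $\delta > 0$, there is $E_\delta \subset E$, $\mu(E_\delta) < \delta$, and $f_k \Rightarrow f$ on $E \setminus E_\delta$. Hence for any $\epsilon > 0$, there exists an integer $K$ depending on $\epsilon$ and $\delta$, such that $|f_k(x) - f(x)| < \epsilon$ for all $x \in E \setminus E_\delta$ and $k \ge K$. Therefore $E_k(\epsilon) = \{x \in E : |f_k(x) - f(x)| \ge \epsilon\} \subset E_\delta$, i.e., $\mu(E_k(\epsilon)) \le \mu(E_\delta) < \delta$ for all $k \ge K$. Therefore $\lim_k \mu(E_k(\epsilon)) = 0$, i.e., $f_k \xrightarrow{\mu} f$ on $E$.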

Definition 3.33 (Cauchy in measure). Suppose fk : E → R̄. We say {fk } is


Cauchy in measure if for any ϵ > 0, there is µ({x ∈ E : |fk (x)−fj (x)| ≥ ϵ}) → 0
as k, j → ∞. In other words, for any ϵ, δ > 0, there exists K ∈ N, such that
µ({x ∈ E : |fk (x) − fj (x)| ≥ ϵ}) < δ for all k, j ≥ K.
Theorem 3.34 (Cauchy in measure ⇒ convergence in measure). Suppose $\{f_k\}$ is Cauchy in measure on $E$. Then there exists $f : E \to \bar{\mathbb{R}}$ finite a.e. $E$ such that $f_k \xrightarrow{\mu} f$ on $E$.
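Proof. We first show that there exists a subsequence of $\{f_k\}$ that converges to $f$ in measure; then we show that the entire sequence $f_k \xrightarrow{\mu} f$.
Since $\{f_k\}$ is Cauchy in measure, we know that for every $i \in \mathbb{N}$ (and letting $\epsilon = \delta = \frac{1}{2^i}$), there exists $k_i$, such that $\mu(\{x \in E : |f_l(x) - f_j(x)| \ge \frac{1}{2^i}\}) < \frac{1}{2^i}$ for all $l, j \ge k_i$. WLOG, we assume $k_i < k_{i+1}$, hence we have a subsequence $\{f_{k_i}\}$, denoted by $\{g_i\}$ for short, of $\{f_k\}$, such that $\mu(E_i) < \frac{1}{2^i}$ where $E_i = \{x \in E : |g_i(x) - g_{i+1}(x)| \ge \frac{1}{2^i}\}$. Now consider $S = \cap_{l=1}^{\infty} \cup_{i=l}^{\infty} E_i$. Since $\cup_{i=l}^{\infty} E_i$ is decreasing in $l$ and has finite measure, we have

\[ \mu(S) = \mu\Big(\lim_{l\to\infty} \bigcup_{i=l}^{\infty} E_i\Big) = \lim_{l\to\infty} \mu\Big(\bigcup_{i=l}^{\infty} E_i\Big) \le \lim_{l\to\infty} \sum_{i=l}^{\infty} \mu(E_i) \le \lim_{l\to\infty} \frac{1}{2^{l-1}} = 0. \]

Also, for any $x \in S^c = \cup_{l=1}^{\infty} \cap_{i=l}^{\infty} E_i^c$, there exists $l$ such that $\sum_{i=l}^{\infty} |g_i(x) - g_{i+1}(x)| \le \sum_{i=l}^{\infty} \frac{1}{2^i} = \frac{1}{2^{l-1}}$, which tends to $0$ as $l \to \infty$. This means that the series $\sum_i (g_i(x) - g_{i+1}(x))$ is absolutely convergent and $\{g_i(x)\}$ is Cauchy. Let $f(x)$ be the limit of $\{g_i(x)\}$ for every $x \in S^c$ and arbitrary in $S$ (it does not matter as $\mu(S) = 0$). Hence $g_i \to f$ on $S^c$, and $g_i \to f$ a.e. $E$.
Now we show $g_i \xrightarrow{\mu} f$ on $E$ (note that we cannot directly get this from $g_i \to f$ a.e. $E$ using Theorem 3.31 since $\mu(E)$ may be infinite). Note that $|g_l(x) - f(x)| \le \sum_{i=l}^{\infty} |g_i(x) - g_{i+1}(x)|$ a.e. $E$. Denote $F_l(\epsilon) = \{x \in E : |g_l(x) - f(x)| \ge \epsilon\}$. Then for any $\epsilon, \delta > 0$, there exists $l$ large enough, such that $\frac{1}{2^{l-1}} < \min(\epsilon, \delta)$, and $\mu(F_l(\epsilon)) \le \mu(F_l(\frac{1}{2^{l-1}})) \le \mu(\cup_{i=l}^{\infty} E_i) \le \sum_{i=l}^{\infty} \mu(E_i) < \frac{1}{2^{l-1}} < \delta$. Hence $g_i \xrightarrow{\mu} f$.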

Finally, we will show $f_k \xrightarrow{\mu} f$. To this end, for any $\epsilon > 0$, consider

\[ \{x : |f_k(x) - f(x)| \ge \epsilon\} \subset \Big\{ x : |f_k(x) - f_{k_i}(x)| \ge \frac{\epsilon}{2} \Big\} \cup \Big\{ x : |f_{k_i}(x) - f(x)| \ge \frac{\epsilon}{2} \Big\}. \]

Note that the two sets on the right hand side have measure approaching 0 as $k, i \to \infty$. Hence $\mu(\{x : |f_k(x) - f(x)| \ge \epsilon\}) \to 0$ as $k \to \infty$.
Remarks. It is easy to show the converse of Theorem 3.34. Hence $\{f_k\}$ is Cauchy in measure iff $f_k \xrightarrow{\mu} f$ for some $f$ finite a.e. $E$.
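Theorem 3.35 (Riesz: Convergence in µ ⇒ ∃ subsequence converges a.e.). If $f_k \xrightarrow{\mu} f$ on $E$, then there exists a subsequence $f_{k_i} \to f$ a.e. $E$.
Proof. Since $f_k \xrightarrow{\mu} f$, the sequence $\{f_k\}$ is Cauchy in measure. From the proof of Theorem 3.34, there exists a subsequence $\{f_{k_i}\}$ of $\{f_k\}$, such that $f_{k_i} \to g$ a.e. $E$ for some $g$ and $f_{k_i} \xrightarrow{\mu} g$. On the other hand, $f_{k_i} \xrightarrow{\mu} f$. Hence $f = g$ a.e. $E$ by Theorem 3.30. Therefore $f_{k_i} \to f$ a.e. $E$.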
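Passing to a subsequence in Theorem 3.35 cannot be avoided in general. A standard illustration, which is not taken from these notes, is the so-called "typewriter" sequence on E = [0, 1]: for k = 2^n + m with 0 ≤ m < 2^n, let f_k be the indicator of [m/2^n, (m+1)/2^n]. The sketch below (all names are illustrative) shows that the measure of {f_k ≥ ϵ} shrinks to 0, that f_k(x) keeps returning to 1 at a fixed point x, and that the subsequence k_i = 2^i settles at 0.

```python
from fractions import Fraction

def typewriter_interval(k):
    """Interval [m/2^n, (m+1)/2^n] for k = 2^n + m with 0 <= m < 2^n, k >= 1."""
    n = k.bit_length() - 1
    m = k - 2**n
    return Fraction(m, 2**n), Fraction(m + 1, 2**n)

def f(k, x):
    """k-th function of the sequence: the indicator of typewriter_interval(k)."""
    a, b = typewriter_interval(k)
    return 1 if a <= x <= b else 0

x = Fraction(1, 3)
# mu({f_k >= eps}) = 2^(-n) -> 0 for any 0 < eps <= 1, so f_k -> 0 in measure.
print([float(b - a) for a, b in map(typewriter_interval, (1, 2, 4, 8, 16))])
# In every block 2^n <= k < 2^(n+1) some f_k(x) equals 1, so f_k(x) does not converge to 0.
print([max(f(k, x) for k in range(2**n, 2**(n + 1))) for n in range(6)])
# Along the subsequence k_i = 2^i (intervals [0, 2^-i]) the values at x > 0 are eventually 0.
print([f(2**i, x) for i in range(6)])
```

The same reasoning applies at every x in [0, 1], so the full sequence converges nowhere, while the subsequence converges to 0 everywhere except at x = 0.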

3.5 Measurable functions and continuous functions


Theorem 3.36 (Lusin: finite a.e. ⇒ “nearly” continuous). Suppose f : E → R̄
is measurable and finite a.e. E, then for any δ > 0, there exists F ⊂ E where F is closed and
µ(E \ F ) < δ, such that f is continuous on F .
Proof. WLOG, assume that $f : E \to \mathbb{R}$ (since $\mu(\{x : |f| = \infty\}) = 0$).
We first prove the case where $f$ is simple, i.e., $f(x) = \sum_{i=1}^{p} c_i \chi_{E_i}(x)$ for mutually disjoint $E_1, \dots, E_p$. To this end, for every $E_i$, there exists a closed subset $F_i \subset E_i$ with $\mu(E_i \setminus F_i) < \frac{\delta}{p}$ (by Theorem 2.26). Let $F = \cup_{i=1}^{p} F_i$. Then $F$ is closed, and $\mu(E \setminus F) < \delta$. Moreover $F_1, \dots, F_p$ are also mutually disjoint, and $f$ is constant on each $F_i$. Hence $f$ is continuous on $F$.
Next we prove the case where $f$ is a general measurable function. WLOG, assume $|f| \le 1$ (note the transformation $g(x) = f(x)/(1 + |f(x)|)$ and its inverse $f(x) = g(x)/(1 - |g(x)|)$). Then by Theorem 3.22 there exists a sequence of simple functions $\{f_k\}$, such that $f_k \Rightarrow f$ on $E$. For every $k$, there exists a closed subset $F_k$ of $E$ with $\mu(E \setminus F_k) < \frac{\delta}{2^k}$, such that $f_k$ is continuous on $F_k$. Let $F = \cap_{k=1}^{\infty} F_k$, then $F$ is closed, and $\mu(E \setminus F) = \mu(\cup_{k=1}^{\infty} (E \setminus F_k)) < \delta$. Since $f_k \Rightarrow f$ on $F$ and every $f_k$ is continuous on $F$, we know $f$ is continuous on $F$.
Corollary 3.37. Suppose f : E → R̄ is finite a.e. E. Then for any δ > 0, there
exists a continuous function g : E → R̄ such that µ({x : f (x) ̸= g(x)}) < δ.
Corollary 3.38. Suppose f : E → R̄ is finite a.e. E. Then there exists a
sequence of continuous functions {gk } such that gk → f a.e. E.
Proof. Consider sequences $\epsilon_k, \delta_k \downarrow 0$. Then for every $k$, there exists $g_k$ such that $\mu(\{x : |f(x) - g_k(x)| \ge \epsilon_k\}) < \delta_k$. Hence $g_k \xrightarrow{\mu} f$. By the Riesz theorem above, there exists a subsequence of $\{g_k\}$, still denoted by $\{g_k\}$, such that $g_k \to f$ a.e. $E$.

3.6 Measurability of composite functions
Lemma 3.39. Suppose $f : \mathbb{R}^n \to \mathbb{R}$. Then $f$ is measurable iff $f^{-1}(G) \in \mathcal{M}$ for any open set $G \subset \mathbb{R}$.
Proof. Necessity is obvious. Now suppose $f$ is measurable. Then for any $(a, b) \subset \mathbb{R}$, we know $f^{-1}((a, b)) = f^{-1}((a, \infty)) \setminus f^{-1}([b, \infty)) \in \mathcal{M}$. Recall that any open set $G \subset \mathbb{R}$ can be written as the union of countably many disjoint open intervals, $G = \cup_{k=1}^\infty (a_k, b_k)$. Hence $f^{-1}(G) = \cup_{k=1}^\infty f^{-1}((a_k, b_k)) \in \mathcal{M}$.
Theorem 3.40. Suppose f : R → R is continuous and g : Rn → R is measur-
able, then f ◦ g : Rn → R is measurable.

Proof. Since f is continuous, we know that for any open G, f −1 (G) is open.
Hence (f ◦ g)−1 (G) = g −1 (f −1 (G)) ∈ M.
Remarks. Note that if f is measurable and g is continuous, f ◦ g is not neces-
sarily measurable.

Theorem 3.41. Suppose $T : \mathbb{R}^n \to \mathbb{R}^n$ is continuous, $\mu(T^{-1}(Z)) = 0$ for any $Z \subset \mathbb{R}^n$ with $\mu(Z) = 0$, and $f : \mathbb{R}^n \to \mathbb{R}$ is measurable. Then $f \circ T$ is measurable.
Proof. For any open set G in R, f −1 (G) ∈ M. Hence there exist a Gδ -set H
and a measure zero set Z such that f −1 (G) = H \ Z. Therefore T −1 (f −1 (G)) =
T −1 (H) \ T −1 (Z) ∈ M since T −1 (H), T −1 (Z) ∈ M.
Corollary 3.42. Suppose f : Rn → R is measurable and T : Rn → Rn is a
linear nondegenerate transformation, then f ◦ T is measurable.
Example 3.43. Suppose f : R2 → R is measurable. Show that g(x, y) =
f (x − y) : R2 × R2 → R is measurable.

Proof. Define h(x, y) = f (x). Then for any t ∈ R, {(x, y) : h(x, y) > t} = {x :
f (x) > t} × R ∈ M(R2 ). Hence h is measurable. Let T (x, y) = (x − y, x + y)
then T is a linear nondegenerate transformation. Therefore g(x, y) = h(T (x, y))
is measurable.

4 Lebesgue Integrals
4.1 Integral of simple nonnegative functions
Definition 4.1 (Integral of simple nonnegative function). Suppose $f : \mathbb{R}^n \to \bar{\mathbb{R}}_+$ is a simple measurable function such that $f(x) = \sum_{i=1}^p c_i \chi_{A_i}(x)$, where $\{A_i\}$ are mutually disjoint and $\mathbb{R}^n = \cup_{i=1}^p A_i$. Then for any $E \in \mathcal{M}$, the integral of $f$ on $E$ is defined by
\[ \int_E f(x)\, d\mu(x) = \sum_{i=1}^p c_i\, \mu(E \cap A_i). \]

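Recall that we define $0 \cdot \infty = 0$. Hence a term contributes nothing to the integral if $c_i = 0$ or $\mu(E \cap A_i) = 0$, even when the other factor is $\infty$. For notational simplicity, with $X = \mathbb{R}^n$, we write the integral of $f$ over $E$ in any of the ways below:
\[ \int_E f(x)\, d\mu(x) = \int_X f(x)\chi_E(x)\, d\mu(x) = \int_E f\, d\mu = \int_E f, \quad \text{and} \quad \int_X f = \int f. \]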
Example 4.2. Let $f = \sum_{i=1}^p c_i \chi_{A_i}$ be a simple nonnegative function. Show that if $\int f < \infty$, then $\mu(A_i) < \infty$ for every $i$ with $c_i > 0$.
Example 4.3. Consider the function $f(x) = \chi_{\mathbb{Q}}(x) = 1$ if $x \in \mathbb{Q}$ and $0$ otherwise. Then $\int_0^1 f = 1 \cdot \mu(\mathbb{Q} \cap [0, 1]) = 0$.
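The definition above can be sketched numerically. The snippet below is only an illustration under stated assumptions: the helper name `integrate_simple` is hypothetical, and the caller is assumed to supply the values $c_i$ together with the measures $\mu(E \cap A_i)$ directly (the code does not compute Lebesgue measure itself). It mainly shows how the convention $0 \cdot \infty = 0$ enters.

```python
import math

def integrate_simple(coeffs, measures):
    """Return sum_i c_i * mu(E ∩ A_i) with the convention 0 * inf = 0.

    coeffs[i]   : the value c_i >= 0 taken on A_i (may be math.inf)
    measures[i] : mu(E ∩ A_i) >= 0, supplied by the caller (may be math.inf)
    """
    total = 0.0
    for c, m in zip(coeffs, measures):
        if c < 0 or m < 0:
            raise ValueError("coefficients and measures must be nonnegative")
        if c == 0 or m == 0:
            continue  # 0 * inf is defined to be 0, so the term is skipped
        total += c * m
    return total

# chi_Q on [0, 1]: value 1 on the rationals (measure 0), value 0 elsewhere (measure 1).
dirichlet = integrate_simple([1.0, 0.0], [0.0, 1.0])

# A zero coefficient on a set of infinite measure contributes nothing.
infinite_piece = integrate_simple([0.0, 2.0], [math.inf, 3.0])
```

Skipping the term explicitly avoids the floating-point result `nan` that `0.0 * math.inf` would otherwise produce.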
Theorem 4.4 (Linearity of integral). Suppose $f, g : \mathbb{R}^n \to \bar{\mathbb{R}}_+$ are simple nonnegative measurable functions. Then for any $E \in \mathcal{M}$ and $c \ge 0$, there are $\int_E cf = c \int_E f$ and $\int_E (f + g) = \int_E f + \int_E g$.
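Proof. The first one is trivial to show. Now suppose $f(x) = \sum_{i=1}^p a_i \chi_{A_i}(x)$ and $g(x) = \sum_{j=1}^q b_j \chi_{B_j}(x)$. Then it is clear that $f(x) + g(x) = \sum_{i=1}^p \sum_{j=1}^q (a_i + b_j) \chi_{A_i \cap B_j}(x)$. Hence
\begin{align*}
\int_E (f + g) &= \sum_{i=1}^p \sum_{j=1}^q (a_i + b_j)\, \mu(E \cap A_i \cap B_j) \\
&= \sum_{i=1}^p a_i\, \mu(E \cap A_i) + \sum_{j=1}^q b_j\, \mu(E \cap B_j) \\
&= \int_E f + \int_E g,
\end{align*}
where the second equality uses $\sum_{j=1}^q \mu(E \cap A_i \cap B_j) = \mu(E \cap A_i)$ and $\sum_{i=1}^p \mu(E \cap A_i \cap B_j) = \mu(E \cap B_j)$. This proves the linearity.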


Theorem 4.5. Suppose $E_k \in \mathcal{M}$ is increasing in $k$. Let $E = \cup_{k=1}^\infty E_k$. If $f : \mathbb{R}^n \to \mathbb{R}$ is a simple nonnegative measurable function, then $\int_E f = \lim_k \int_{E_k} f$.

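Proof. Suppose $f(x) = \sum_{i=1}^p a_i \chi_{A_i}(x)$. Then there is
\begin{align*}
\lim_{k\to\infty} \int_{E_k} f &= \lim_{k\to\infty} \sum_{i=1}^p a_i\, \mu(E_k \cap A_i) = \sum_{i=1}^p a_i\, \mu\Big(\lim_{k\to\infty} (E_k \cap A_i)\Big) \\
&= \sum_{i=1}^p a_i\, \mu\Big(\big(\lim_{k\to\infty} E_k\big) \cap A_i\Big) = \sum_{i=1}^p a_i\, \mu(E \cap A_i) = \int_E f,
\end{align*}
which completes the proof.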

4.2 Integral of general nonnegative functions


Definition 4.6. Suppose $f : E \to \bar{\mathbb{R}}$ is a nonnegative measurable function. The integral of $f$ on $E$ is defined by
\[ \int_E f = \sup\Big\{ \int_E h : h \text{ is simple nonnegative},\ h(x) \le f(x)\ \forall\, x \in E \Big\}. \]
We call $f$ integrable if $\int_E f < \infty$.
Theorem 4.7 (Properties of integral). The following statements hold:
1. If $f, g : E \to \bar{\mathbb{R}}_+$ and $f \le g$, then $\int_E f \le \int_E g$. Hence if $g$ is integrable then $f$ is integrable. If $f \le M$ and $\mu(E) < \infty$, then $f$ is integrable.
2. If $A \subset E$ is measurable, then $\int_A f = \int_E f \chi_A$.
3. If $f = 0$ a.e. $E$, then $\int_E f = 0$.
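Proof. Item 1 is trivial. Item 2 can be verified by
\begin{align*}
\int_A f &= \sup\Big\{ \int_A h : h \text{ is simple nonnegative},\ h \le f \text{ on } A \Big\} \\
&= \sup\Big\{ \int_E h\chi_A : h \text{ is simple nonnegative},\ h\chi_A \le f\chi_A \Big\} \\
&= \int_E f\chi_A.
\end{align*}
For Item 3, if $f = 0$ a.e. $E$, then every simple nonnegative $h \le f$ also vanishes a.e. $E$, so $\int_E h = 0$ and hence $\int_E f = 0$. The converse also holds: suppose $\int_E f = 0$ and let $E_k = \{x \in E : f(x) \ge \frac{1}{k}\}$. Then $E_k$ is non-decreasing. Moreover, $\frac{1}{k}\mu(E_k) \le \int_{E_k} f \le \int_E f = 0$, which implies that $\mu(E_k) = 0$ for all $k$. Since $\{x \in E : f(x) > 0\} = \cup_{k=1}^\infty E_k$, we get $\mu(\{x \in E : f(x) > 0\}) \le \sum_{k=1}^\infty \mu(E_k) = 0$.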
Theorem 4.8 (Integrable functions are finite a.e.). Suppose f : E → R̄+ is
integrable. Then f is finite a.e. E.
Proof. Let $E_k = \{x : f(x) > k\}$. Then $E_k$ is decreasing and $A := \{x : f(x) = \infty\} = \cap_{k=1}^\infty E_k$. Note that $k \cdot \mu(E_k) \le \int_{E_k} f \le \int_E f < \infty$, so $\mu(E_k) < \infty$ for all $k$. Hence $\mu(A) = \lim_k \mu(E_k) \le \lim_k \frac{1}{k} \int_E f = 0$, i.e., $\mu(A) = 0$.

Theorem 4.9 (Beppo Levi: $f_k \uparrow f \Rightarrow \int f_k \uparrow \int f$). Suppose $f_k : E \to \bar{\mathbb{R}}_+$, $f_k(x) \le f_{k+1}(x)$ for every $x \in E$ and $k$, and $\lim_k f_k(x) = f(x)$ for all $x \in E$. Then $\lim_k \int_E f_k = \int_E f$.
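Proof. It is clear that $\int_E f$ is well defined. Since $f_k \uparrow f$, we know $\int_E f_k$ is non-decreasing in $k$ and $\lim_k \int_E f_k \le \int_E f$.
For any simple function $h$ with $0 \le h \le f$ and any $c \in (0, 1)$, consider $E_k = \{x \in E : f_k(x) \ge c\,h(x)\}$. Then $E_k \uparrow E$. Hence
\[ \lim_{k\to\infty} \int_E f_k \ge \lim_{k\to\infty} \int_{E_k} f_k \ge \lim_{k\to\infty} \int_{E_k} c\,h = \int_E c\,h = c \int_E h, \]
where the first equality follows from Theorem 4.5. Letting $c \to 1$, we have $\lim_k \int_E f_k \ge \int_E h$. Taking the supremum over all such $h$ yields $\lim_k \int_E f_k \ge \int_E f$.
Remarks. If $\int_E f_1 < \infty$ and $f_k \downarrow f \ge 0$, then $\int_E f_k \downarrow \int_E f$.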
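Theorem 4.10 (Linearity of integral). Suppose $f, g : E \to \bar{\mathbb{R}}_+$ are measurable and $\alpha, \beta \ge 0$. Then $\int_E (\alpha f + \beta g) = \alpha \int_E f + \beta \int_E g$.
Proof. Only need to show this for $\alpha = \beta = 1$. Let $\{f_k\}$ and $\{g_k\}$ be simple nonnegative functions with $f_k \uparrow f$ and $g_k \uparrow g$; then $f_k + g_k \uparrow f + g$. Hence, by Theorem 4.9 and Theorem 4.4,
\[ \int_E (f + g) = \lim_{k\to\infty} \int_E (f_k + g_k) = \lim_{k\to\infty} \Big\{ \int_E f_k + \int_E g_k \Big\} = \int_E f + \int_E g. \]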

Example 4.11. Suppose $f, g : E \to \bar{\mathbb{R}}_+$ and $f = g$ a.e. $E$. Then $\int_E f = \int_E g$.
[Hint: $f = f\chi_{E_1} + f\chi_{E_2}$ where $E_1 = \{x : f = g\}$ and $E_2 = E \setminus E_1$.]
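Theorem 4.12. Suppose $f_k : E \to \bar{\mathbb{R}}_+$. Then $\int_E \sum_{k=1}^\infty f_k = \sum_{k=1}^\infty \int_E f_k$.
Proof. Let $s_k(x) = \sum_{i=1}^k f_i(x)$ and $s(x) = \sum_{k=1}^\infty f_k(x)$ for every $x$ and $k$. Then $s_k \uparrow s$. Hence $\lim_k \int_E s_k = \int_E s$ by Theorem 4.9, and $\int_E s_k = \sum_{i=1}^k \int_E f_i$ by Theorem 4.10.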

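Corollary 4.13. Suppose $\{E_k\}_k \subset \mathcal{M}$ are mutually disjoint. If $f$ is integrable on $E = \cup_{k=1}^\infty E_k$, then $\int_E f = \sum_{k=1}^\infty \int_{E_k} f$.

Proof. Let $f_k = f\chi_{E_k}$. Then
\[ \sum_{k=1}^\infty \int_{E_k} f = \sum_{k=1}^\infty \int_E f\chi_{E_k} = \int_E \Big( \sum_{k=1}^\infty f\chi_{E_k} \Big) = \int_E f \]
by the theorem above.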


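Remarks. If $f \equiv 1$, then the corollary above reduces to $\mu(E) = \sum_{k=1}^\infty \mu(E_k)$.
Example 4.14. Suppose $E_1, \dots, E_p$ are measurable subsets of $[0, 1]$, and every point of $[0, 1]$ is covered by at least $k$ of $E_1, \dots, E_p$, where $k \le p$. Then $\mu(E_i) \ge \frac{k}{p}$ for some $i$.
Proof. $\sum_{i=1}^p \mu(E_i) = \int_0^1 \sum_{i=1}^p \chi_{E_i}(x) \ge \int_0^1 k = k$. If $\mu(E_i) < \frac{k}{p}$ held for every $i$, the left-hand side would be strictly less than $k$, a contradiction.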

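Lemma 4.15 (Fatou's lemma for integrals). Suppose $f_k : E \to \bar{\mathbb{R}}_+$. Then $\int_E \liminf_k f_k \le \liminf_k \int_E f_k$.
Proof. Define $g_k(x) = \inf_{j \ge k} f_j(x)$ for every $x \in E$ and $k$. Then $g_k$ is non-decreasing and nonnegative. Hence
\[ \int_E \liminf_{k\to\infty} f_k = \int_E \lim_{k\to\infty} g_k = \lim_{k\to\infty} \int_E g_k \le \liminf_{k\to\infty} \int_E f_k, \]
where the second equality is Theorem 4.9 and the inequality holds since $g_k(x) \le f_k(x)$ on $E$.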


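Remarks. If $\int_E f_k \le M$ for all $k$, then $\int_E \liminf_{k\to\infty} f_k \le M$.
Example 4.16. We may have “<” hold in Lemma 4.15 in some cases: let
\[ f_k(x) = \begin{cases} 0, & \text{if } x = 0, \\ k, & \text{if } 0 < x < \frac{1}{k}, \\ 0, & \text{if } \frac{1}{k} \le x \le 1, \end{cases} \]
for every $k$. Then $f_k \to 0$ on $[0, 1]$. But $\int_0^1 \lim_k f_k = 0 < 1 = \lim_k \int_0^1 f_k$.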
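The strict inequality in Example 4.16 can be checked with exact rational arithmetic. This is a small sketch, not part of the notes: the function names `f` and `integral_f` are hypothetical, and the integral is evaluated from the simple-function formula (value $k$ times the measure $1/k$ of $(0, 1/k)$) rather than by numerical quadrature.

```python
from fractions import Fraction

def f(k, x):
    """f_k(x) = k on (0, 1/k) and 0 elsewhere on [0, 1]."""
    return k if 0 < x < Fraction(1, k) else 0

def integral_f(k):
    """Integral of the simple function f_k over [0, 1]: k * mu((0, 1/k))."""
    return k * Fraction(1, k)

x = Fraction(1, 7)
# For a fixed x > 0 the values f_k(x) vanish as soon as 1/k <= x.
tail_values = [f(k, x) for k in range(7, 200)]
# Every integral equals 1, so the limit of the integrals is 1, not 0.
integrals = [integral_f(k) for k in range(1, 200)]
```

The mass of $f_k$ concentrates near $0$ while escaping every fixed point, which is why the pointwise limit has integral $0$ but the integrals stay at $1$.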

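Theorem 4.17. Suppose $f : E \to \bar{\mathbb{R}}_+$ is measurable and finite a.e. $E$, and $\mu(E) < \infty$. Let $[0, \infty)$ be partitioned by $0 = y_0 < y_1 < \cdots$ with $y_k \to \infty$ and $y_{k+1} - y_k < \delta$ for all $k$, and define $E_k = \{x \in E : y_k \le f(x) < y_{k+1}\}$. Then $f$ is integrable iff $\sum_{k=0}^\infty y_k \mu(E_k) < \infty$ (the $k = 0$ term vanishes since $y_0 = 0$). Moreover $\lim_{\delta\to0} \sum_{k=0}^\infty y_k \mu(E_k) = \int_E f$.
Proof. For every $k$, there is $y_k \mu(E_k) \le \int_{E_k} f \le y_{k+1} \mu(E_k) \le y_k \mu(E_k) + \delta \mu(E_k)$. Since $f$ is finite a.e., the disjoint sets $E_k$ cover $E$ up to a null set, so summing over $k$ (Theorem 4.12 applied to $f\chi_{E_k}$) gives
\[ \sum_{k=0}^\infty y_k \mu(E_k) \le \int_E f \le \sum_{k=0}^\infty y_k \mu(E_k) + \delta \mu(E). \]
As $\mu(E) < \infty$, $f$ is integrable iff $\sum_{k=0}^\infty y_k \mu(E_k) < \infty$. Taking $\delta \to 0$ completes the proof.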
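To make the "partition the range" idea of Theorem 4.17 concrete, here is a minimal numerical sketch for one assumed example, $f(x) = x^2$ on $E = [0, 1]$ with the uniform partition $y_k = k\delta$. For this $f$ the level sets are intervals, $E_k = [\sqrt{y_k}, \min(1, \sqrt{y_{k+1}}))$ up to a single point, so their measures are known in closed form; the function name `lebesgue_sum` is hypothetical.

```python
import math

def lebesgue_sum(delta):
    """Sum of y_k * mu(E_k) for f(x) = x^2 on [0, 1] with y_k = k * delta."""
    total = 0.0
    k = 0
    while k * delta <= 1.0:
        lo = math.sqrt(k * delta)                     # left end of E_k
        hi = min(1.0, math.sqrt((k + 1) * delta))     # right end of E_k, clipped to [0, 1]
        total += k * delta * (hi - lo)                # y_k * mu(E_k)
        k += 1
    return total

sums = {delta: lebesgue_sum(delta) for delta in (0.1, 0.01, 0.001)}
```

By the two-sided estimate in the proof, each value lies between $\frac{1}{3} - \delta$ and $\frac{1}{3} = \int_0^1 x^2\,dx$, so the sums approach $\frac{1}{3}$ from below as $\delta \to 0$.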

4.3 Integral of general functions


Definition 4.18. Suppose $f : E \to \bar{\mathbb{R}}$ is measurable. If at least one of $\int_E f^+$ and $\int_E f^-$ is finite, then the integral of $f$ is defined by $\int_E f = \int_E f^+ - \int_E f^-$. If both are finite, then $f$ is called integrable, denoted by $f \in L(E)$.
Remarks. Note that $\int_E |f| = \int_E f^+ + \int_E f^-$. Hence $f \in L(E)$ iff $|f| \in L(E)$. In addition, $|\int_E f| \le \int_E |f|$.

Example 4.19. If $f : E \to \mathbb{R}$ is measurable and bounded, and $\mu(E) < \infty$, then $f \in L(E)$. [Hint: There exists $M > 0$ such that $|f| \le M$. Hence $\int_E |f| \le M \cdot \mu(E) < \infty$.]
Theorem 4.20 (Properties of integral). The following statements hold:
1. If $f \in L(E)$, then $|f|$ is finite a.e. $E$.
2. If $E \in \mathcal{M}$ and $f = 0$ a.e. $E$, then $\int_E f = 0$.
3. If $f : E \to \mathbb{R}$ is measurable, $g \in L(E)$, and $|f| \le g$, then $f \in L(E)$.
4. If $f \in L(\mathbb{R}^n)$, then $\lim_k \int_{\mathbb{R}^n \setminus B(0,k)} f = 0$.

Proof. Item 1 follows from Theorem 4.8. Item 2 follows from $f^\pm = 0$ a.e. $E$. Item 3 follows from $0 \le f^\pm \le g$ due to $|f| \le g$. Item 4 follows from $\int f^\pm \le \int |f| < \infty$ and $f_k^\pm \downarrow 0$, where $f_k^\pm := f^\pm \chi_{\mathbb{R}^n \setminus B(0,k)}$.
Theorem 4.21 (Linearity of integral). Suppose $f, g \in L(E)$. For any $c \in \mathbb{R}$, $\int_E cf = c \int_E f$ and $\int_E (f + g) = \int_E f + \int_E g$.
Example 4.22. Suppose $f : [0, 1] \to \mathbb{R}$ is measurable. If $\int_0^1 |f(x)| \log(1 + |f(x)|) < \infty$, then $f \in L([0, 1])$.
Proof. Let $E_1 = \{x : |f(x)| \ge e - 1\}$; then $|f(x)| \le |f(x)| \log(1 + |f(x)|)$ for all $x \in E_1$. Let $E_2 = [0, 1] \setminus E_1$; then $|f(x)| \le e - 1$ on $E_2$. Hence $\int_0^1 |f| \le \int_{E_1} |f| \log(1 + |f|) + \int_{E_2} (e - 1) < \infty$.
Example 4.23 (Jensen's inequality). Suppose $w : E \to \mathbb{R}_+$ is measurable and $\int_E w = 1$. If $\phi : [a, b] \to \mathbb{R}$ is convex, $f : E \to [a, b]$ is measurable and $f \in L(E)$, then $\phi(\int_E f w) \le \int_E \phi(f) w$.
Proof. Denote $y_0 = \int_E f w$. Then $a \le y_0 \le b$. Since $\phi$ is convex, there exists $z \in \mathbb{R}$ such that $\phi(y) \ge \phi(y_0) + z \cdot (y - y_0)$ for all $y \in [a, b]$. Hence by setting $y = f(x)$, multiplying by $w(x)$ on both sides, and integrating over $E$, we obtain
\[ \int_E \phi(f(x)) w(x) \ge \phi(y_0) \int_E w(x) + z \cdot \Big( \int_E f w - y_0 \int_E w \Big) = \phi(y_0), \]
which is the claimed inequality.


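Theorem 4.24. Suppose $E_k \in \mathcal{M}$ are mutually disjoint. If $f \in L(E)$ where $E = \cup_{k=1}^\infty E_k$, then $\int_E f = \sum_{k=1}^\infty \int_{E_k} f$.
Proof. Note that $\sum_{k=1}^\infty \int_{E_k} f^\pm = \int_E f^\pm \le \int_E |f| < \infty$. Hence $\sum_{k=1}^\infty \int_{E_k} f = \sum_{k=1}^\infty \int_{E_k} f^+ - \sum_{k=1}^\infty \int_{E_k} f^- = \int_E f^+ - \int_E f^- = \int_E f$.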
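Theorem 4.25 (Absolute continuity of integral). Suppose $f \in L(E)$. Then for any $\epsilon > 0$, there exists $\delta > 0$, such that for any $E_\delta \subset E$ satisfying $\mu(E_\delta) < \delta$, there is $|\int_{E_\delta} f| \le \int_{E_\delta} |f| < \epsilon$.
Proof. WLOG, assume $f : E \to \mathbb{R}_+$. Then there exists a simple function $g$ such that $0 \le g \le f$ and $0 \le \int_E f - \int_E g < \frac{\epsilon}{2}$. Let $M > 0$ be a bound of $g$ and $\delta = \frac{\epsilon}{2M}$. Then for any $E_\delta$ satisfying $\mu(E_\delta) < \delta = \frac{\epsilon}{2M}$, there is
\[ \int_{E_\delta} f = \int_{E_\delta} (f - g) + \int_{E_\delta} g \le \int_E (f - g) + M \cdot \mu(E_\delta) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \]
which completes the proof.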


Example 4.26 (Intermediate value theorem of integrals). Suppose $f \in L(E)$ where $E \subset \mathbb{R}$ and $0 < C := \int_E f < \infty$. Then for any $c \in (0, C)$, there exists $t \in \mathbb{R}$ such that $\int_{E \cap (-\infty, t]} f = c$.

Proof. Define $g(t) = \int_{E \cap (-\infty, t]} f(x)\,dx$. From the theorem above, we know that for any $\epsilon > 0$, there exists $\delta > 0$, such that for any $|\Delta t| < \delta$, there is
\[ |g(t + \Delta t) - g(t)| \le \int_{E \cap [t, t + \Delta t)} |f(x)|\,dx < \epsilon. \]
Hence $g(t)$ is continuous. Since $g(-\infty) = 0$ and $g(+\infty) = C$, by the intermediate value theorem of continuous functions, there exists $t$ such that $g(t) = c \in (0, C)$.
Theorem 4.27. If $f \in L(\mathbb{R}^n)$, then for any $y_0 \in \mathbb{R}^n$, there is $\int_{\mathbb{R}^n} f(x + y_0) = \int_{\mathbb{R}^n} f(x)$.
Theorem 4.28 (Lebesgue dominated convergence theorem). Suppose $f_k \in L(E)$ and $f_k \to f$ a.e. $E$. If there exists $g \in L(E)$ such that $|f_k| \le g$, then $\int_E f_k \to \int_E f$.

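Proof. Define $h_k = |f_k - f|$; then $h_k \to 0$ a.e. $E$ and $0 \le h_k \le 2g$ a.e. for all $k$. Hence $h_k, 2g \in L(E)$. Moreover, since $2g - h_k \ge 0$, by Fatou's Lemma 4.15,
\[ \int_E 2g = \int_E \lim_{k\to\infty} (2g - h_k) \le \liminf_{k\to\infty} \int_E (2g - h_k) = \int_E 2g - \limsup_{k\to\infty} \int_E h_k. \]
Since $\int_E 2g < \infty$, we get $\limsup_k \int_E h_k \le 0$ and therefore $|\int_E f_k - \int_E f| \le \int_E |f_k - f| = \int_E h_k \to 0$ as $k \to \infty$.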

Remarks. The Lebesgue dominated convergence theorem actually implies a stronger result: $\int_E |f_k - f| \to 0$ as $k \to \infty$.
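Example 4.29. In general, $\int_E f_k \to \int_E f$ does not imply $\int_E |f_k - f| \to 0$. For example, on $E = [0, 1)$ let $f_k(x) = 1$ if $x \in [\frac{j}{2^k}, \frac{j+1}{2^k})$ and $j$ is odd, and $0$ otherwise. Let $f(x) = \frac{1}{2}$. Then $\int_E f_k = \int_E f = \frac{1}{2}$, but $\int_E |f_k - f| = \frac{1}{2}$ for every $k$.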
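The two integrals in Example 4.29 can be verified exactly. The sketch below assumes the dyadic reading of the example, i.e. $E = [0, 1)$ and cells $[\frac{j}{2^k}, \frac{j+1}{2^k})$ with $f_k = 1$ on the odd-indexed cells; the function name `integrals` is hypothetical.

```python
from fractions import Fraction

def integrals(k):
    """Return (integral of f_k, integral of |f_k - 1/2|) over [0, 1)."""
    width = Fraction(1, 2 ** k)          # measure of each dyadic cell
    half = Fraction(1, 2)
    int_fk = Fraction(0)
    int_abs_diff = Fraction(0)
    for j in range(2 ** k):
        value = 1 if j % 2 == 1 else 0   # f_k is 1 on odd cells, 0 on even cells
        int_fk += value * width
        int_abs_diff += abs(value - half) * width
    return int_fk, int_abs_diff

results = [integrals(k) for k in range(1, 9)]
```

Both entries equal $\frac{1}{2}$ for every $k$: the integrals of $f_k$ agree with $\int f$, while the $L^1$ distance never shrinks because $|f_k - f| = \frac{1}{2}$ everywhere.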
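Example 4.30. In general, $\int_E |f_k - f| \to 0$ does not imply $f_k \to f$ a.e. For example, let $f_k(x) = 1$ if $x \in [\frac{i}{2^j}, \frac{i+1}{2^j})$ and $0$ elsewhere for $k = 2^j + i$, where $j \ge 1$ and $i = 0, \dots, 2^j - 1$, and let $f = 0$. Then $\int |f_k - f| = 2^{-j} \to 0$ and $f_k \xrightarrow{\mu} f$, but we do not have $f_k \to f$ a.e., since every $x \in [0, 1)$ satisfies $f_k(x) = 1$ for infinitely many $k$.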
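The "sliding interval" sequence of Example 4.30 is easy to tabulate. The following sketch assumes the indexing $k = 2^j + i$ with $0 \le i < 2^j$ and uses hypothetical helper names `typewriter` and `support_measure`; it checks, for one sample point, that the sequence returns to the value $1$ once in every dyadic generation while the support shrinks.

```python
from fractions import Fraction

def typewriter(k, x):
    """f_k(x) for k = 2^j + i (k >= 2): indicator of [i / 2^j, (i + 1) / 2^j)."""
    j = k.bit_length() - 1      # largest j with 2^j <= k
    i = k - 2 ** j              # offset inside generation j
    return 1 if Fraction(i, 2 ** j) <= x < Fraction(i + 1, 2 ** j) else 0

def support_measure(k):
    """Measure of the set where f_k = 1, which equals the integral of |f_k|."""
    j = k.bit_length() - 1
    return Fraction(1, 2 ** j)

x = Fraction(1, 3)
# Number of indices k in generation j (2^j <= k < 2^(j+1)) with f_k(x) = 1.
hits_per_generation = [
    sum(typewriter(k, x) for k in range(2 ** j, 2 ** (j + 1)))
    for j in range(1, 10)
]
```

Each generation contributes exactly one hit, so $f_k(x)$ takes the value $1$ infinitely often and the value $0$ infinitely often, while `support_measure(k)` tends to $0$.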
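Theorem 4.31. If $\int_E |f_k - f| \to 0$, then $f_k \xrightarrow{\mu} f$. Moreover, there exists a subsequence $f_{k_j} \to f$ a.e.
Proof. For any $\epsilon > 0$, denote $E_k(\epsilon) = \{x \in E : |f_k(x) - f(x)| \ge \epsilon\}$. Then there is
\[ \epsilon\, \mu(E_k(\epsilon)) = \int_{E_k(\epsilon)} \epsilon \le \int_{E_k(\epsilon)} |f_k - f| \le \int_E |f_k - f| \to 0. \]
Hence $\mu(E_k(\epsilon)) \to 0$ as $k \to \infty$, i.e., $f_k \xrightarrow{\mu} f$. The subsequence then exists by Theorem 3.35.
Remarks. The converse is not true in general. See Example 4.16.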

Theorem 4.32 (Bounded convergence theorem). If $|f_k| \le M$, $f_k \to f$ a.e. $E$, and $\mu(E) < \infty$, then $\int_E f_k \to \int_E f$.

Proof. Set $g(x) = M$ and note that $g \in L(E)$ since $\mu(E) < \infty$. Then by the Dominated Convergence Theorem, $\int_E f_k \to \int_E f$.
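Theorem 4.33 (Dominated convergence theorem for $f_k \xrightarrow{\mu} f$). Suppose $f_k \in L(\mathbb{R}^n)$, $f_k \xrightarrow{\mu} f$, and there exists $g \in L(\mathbb{R}^n)$ such that $|f_k| \le g$. Then $f \in L(\mathbb{R}^n)$ and $\int_{\mathbb{R}^n} f_k \to \int_{\mathbb{R}^n} f$.
Proof. Since $f_k \xrightarrow{\mu} f$, there exists a subsequence $f_{k_j} \to f$ a.e. and $\int f_{k_j} \to \int f$. Hence $f \in L(\mathbb{R}^n)$, and $|f| \le g$ a.e. It remains to show that $\int_{\mathbb{R}^n} f_k \to \int_{\mathbb{R}^n} f$.
For any $\epsilon > 0$, we first choose $R > 0$ large enough, and denote $B = B(0, R)$ for short, such that $2\int_{\mathbb{R}^n \setminus B} g < \frac{\epsilon}{3}$ (by Theorem 4.20 Item 4). Then
\[ \int_{\mathbb{R}^n \setminus B} |f_k - f| \le 2 \int_{\mathbb{R}^n \setminus B} g < \frac{\epsilon}{3}. \]
Now we work on $B$, which is bounded. We choose $\delta > 0$ small enough, such that for any $E_\delta \subset B$ satisfying $\mu(E_\delta) < \delta$, there is $2\int_{E_\delta} g < \frac{\epsilon}{3}$ (by Theorem 4.25). Hence $\int_{E_\delta} |f_k - f| \le 2\int_{E_\delta} g < \frac{\epsilon}{3}$. Since $f_k \xrightarrow{\mu} f$, there exists an integer $K$ large enough, such that $\mu(C_k) < \delta$ for all $k \ge K$, where $C_k := \{x \in B : |f_k - f| \ge \frac{\epsilon}{3m}\} \subset B$ and $m := \mu(B)$. Then there is
\[ \int_{\mathbb{R}^n} |f_k - f| = \int_{\mathbb{R}^n \setminus B} |f_k - f| + \int_{C_k} |f_k - f| + \int_{B \setminus C_k} |f_k - f| < \frac{\epsilon}{3} + \frac{\epsilon}{3} + m \cdot \frac{\epsilon}{3m} = \epsilon \]
for all $k \ge K$. Hence $\int |f_k - f| \to 0$.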
Example 4.34. Suppose $f \in C([0, \infty))$, and $f(x) \to l$ as $x \to \infty$. Then for any $A > 0$, there is $\lim_k \int_0^A f(kx)\,dx = Al$.
Proof. Since $f(x) \to l$, we know there exists $X > 0$ such that $|f(x)| < |l| + 1$ for all $x \ge X$. Since $f$ is continuous, there exists $m = \max_{0 \le x \le X} |f(x)| < \infty$. Then $|f(x)| \le M$ for all $x \ge 0$ where $M = \max(m, |l| + 1)$. Define $f_k(x) = f(kx)$; then $|f_k(x)| \le M$ on $[0, A]$ and $f_k(x) \to l$ for every $x \in (0, A]$. By the bounded convergence theorem, there is
\[ \lim_{k\to\infty} \int_0^A f(kx)\,dx = \lim_{k\to\infty} \int_0^A f_k(x)\,dx = \int_0^A \lim_{k\to\infty} f_k(x)\,dx = \int_0^A l\,dx = Al. \]

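Example 4.35. For any $\alpha > 1$, show that $\int_0^1 \frac{kx \sin x}{1 + (kx)^\alpha}\,dx \to 0$ as $k \to \infty$.
Proof. Denote $f_k(x) = \frac{kx \sin x}{1 + (kx)^\alpha}$. Then $|f_k(x)| \le \frac{kx}{1 + (kx)^\alpha} \le \frac{1}{(kx)^{\alpha - 1}} \to 0$ for every $x \in (0, 1]$. Moreover $|f_k(x)| \le 1$ on $[0, 1]$, since $t \le 1 + t^\alpha$ for all $t \ge 0$. By DCT (Theorem 4.28), $\lim_k \int_0^1 f_k = \int_0^1 \lim_k f_k = 0$.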
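A quick numerical sanity check of Example 4.35 is sketched below for the assumed choice $\alpha = 2$, using a plain midpoint rule (the helper names `integrand` and `midpoint_integral`, the number of nodes, and the sampled values of $k$ are all illustrative choices, not part of the notes). For $\alpha = 2$ the integrand satisfies $0 \le f_k(x) \le \frac{(kx)^2}{k(1 + (kx)^2)} < \frac{1}{k}$ because $\sin x \le x$, so every approximation must stay below $\frac{1}{k}$.

```python
import math

def integrand(k, x, alpha=2.0):
    """f_k(x) = k x sin(x) / (1 + (k x)^alpha)."""
    t = k * x
    return t * math.sin(x) / (1.0 + t ** alpha)

def midpoint_integral(k, n=20000, alpha=2.0):
    """Midpoint-rule approximation of the integral of f_k over [0, 1]."""
    h = 1.0 / n
    return h * sum(integrand(k, (i + 0.5) * h, alpha) for i in range(n))

values = {k: midpoint_integral(k) for k in (1, 10, 100, 1000)}
```

The computed values shrink roughly like a constant over $k$, consistent with the limit $0$ obtained from the dominated convergence theorem.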
Example 4.36. For any $a > 0$, show that $\int_a^\infty \frac{k^2 x e^{-k^2 x^2}}{1 + x^2}\,dx \to 0$ as $k \to \infty$.

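Proof. Consider the change of variable $u = kx$; then
\[ \int_a^\infty \frac{k^2 x e^{-k^2 x^2}}{1 + x^2}\,dx = \int_{ka}^\infty \frac{u e^{-u^2}}{1 + (u/k)^2}\,du = \int_0^\infty \chi_{[ka, \infty)}(u) \frac{u e^{-u^2}}{1 + (u/k)^2}\,du. \]
Denote $f_k(u) = \chi_{[ka, \infty)}(u) \frac{u e^{-u^2}}{1 + (u/k)^2}$; then it is clear that $f_k(u) \to 0$ for every $u$ and $|f_k(u)| \le u e^{-u^2} \in L([0, \infty))$. Hence by DCT there is $\lim_k \int_0^\infty f_k = 0$.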
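Corollary 4.37. Let $f_k \in L(E)$. If $\sum_{k=1}^\infty \int_E |f_k| < \infty$, then $\sum_{k=1}^\infty f_k(x)$ converges a.e. $E$. Define $f(x) = \sum_{k=1}^\infty f_k(x)$ wherever the series converges. Then $f \in L(E)$ and $\int_E f = \sum_{k=1}^\infty \int_E f_k$.
Proof. Let $s_k(x) = \sum_{i=1}^k |f_i(x)|$ and $s(x) = \sum_{k=1}^\infty |f_k(x)|$ for every $x \in E$. Then $\int_E s = \lim_k \int_E s_k = \sum_{k=1}^\infty \int_E |f_k| < \infty$ by Theorem 4.12. Hence $s \in L(E)$ and $s$ is finite a.e. $E$, so $\sum_k f_k(x)$ converges absolutely a.e. $E$. Define $g_k(x) = \sum_{i=1}^k f_i(x)$; then $|g_k(x)| \le s(x)$ and $g_k \to f$ a.e. $E$. By Theorem 4.28 (DCT), there is $\int_E f = \int_E \lim_k g_k = \lim_k \int_E g_k = \sum_{k=1}^\infty \int_E f_k$.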
Theorem 4.38 (Interchange derivative and integral). Suppose $f : E \times (a, b) \to \mathbb{R}$, $f(\cdot, y) \in L(E)$ for every $y \in (a, b)$, and $f(x, \cdot)$ is differentiable on $(a, b)$ for every $x \in E$. If there exists $g \in L(E)$ such that $|\partial_y f(x, y)| \le g(x)$ for any $(x, y) \in E \times (a, b)$, then for any $y \in (a, b)$, there is
\[ \frac{d}{dy} \Big( \int_E f(x, y)\,d\mu(x) \Big) = \int_E \partial_y f(x, y)\,d\mu(x). \]
Proof. For any $y \in (a, b)$, consider any nonzero sequence $e_k \to 0$ with $y + e_k \in (a, b)$, and define $f_k(x) = \frac{1}{e_k}(f(x, y + e_k) - f(x, y))$ for every $x \in E$. By the mean value theorem of derivatives, there exists $\xi_k = \xi_k(x)$ between $y$ and $y + e_k$ such that $|f_k(x)| = |\partial_y f(x, \xi_k)| \le g(x) \in L(E)$. Note that $\lim_k f_k(x) = \partial_y f(x, y)$. Hence by Theorem 4.28 (DCT), there is
\[ \frac{d}{dy} \Big( \int_E f(x, y)\,d\mu(x) \Big) = \lim_{k\to\infty} \int_E f_k = \int_E \lim_{k\to\infty} f_k = \int_E \partial_y f(x, y)\,d\mu(x), \]
which completes the proof.


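Example 4.39. Let $f \in L(\mathbb{R})$ and $\sum_{k=1}^\infty \frac{1}{a_k} < \infty$ where $a_k > 0$ for all $k$. Then $\lim_k f(a_k x) = 0$ a.e.
Proof. Denote $f_k(x) = f(a_k x)$. Then $\sum_{k=1}^\infty \int |f_k| = \sum_{k=1}^\infty \int \frac{|f(a_k x)|}{a_k}\,d(a_k x) = \sum_{k=1}^\infty \frac{1}{a_k} \int |f(x)|\,dx < \infty$. Hence $\sum_{k=1}^\infty |f_k|$ is finite a.e. (Corollary 4.37), which implies $f_k \to 0$ a.e.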
Theorem 4.40. If $f \in L(E)$, then for any $\epsilon > 0$, there exists $g \in C(\mathbb{R}^n)$ with compact support, such that $\int_E |f - g| < \epsilon$.

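Proof. For any $\epsilon > 0$, there exists a compact set $K$ such that $\int_{E \setminus K} |f| < \epsilon/4$ and a simple function $\tilde{h}$ with $\mathrm{supp}(\tilde{h}) \subset K$ such that $\int_K |f - \tilde{h}| < \epsilon/4$. Define $h = \tilde{h}\chi_K$; then
\[ \int_E |f - h| = \int_E |f - \tilde{h}\chi_K| = \int_K |f - \tilde{h}| + \int_{E \setminus K} |f| < \frac{\epsilon}{2}. \]
Let $M > 0$ be such that $|h| \le M$. By Theorem 3.36 (Lusin), for $\delta = \epsilon/(4M)$, there exist a closed set $F \subset K$ and a continuous function $g$ with $|g| \le M$ and $\mathrm{supp}(g) \subset K$, such that $\mu(K \setminus F) < \delta = \epsilon/(4M)$ and $h|_F = g|_F$. Then
\[ \int_E |h - g| = \int_K |h - g| = \int_{K \setminus F} |h - g| \le 2M \mu(K \setminus F) < 2M \cdot \frac{\epsilon}{4M} = \frac{\epsilon}{2}. \]
Hence $\int_E |f - g| \le \int_E |f - h| + \int_E |h - g| < \epsilon$.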
Corollary 4.41. Suppose $f \in L(E)$. Then there exists a sequence of continuous functions $\{g_k\}$, each with bounded support, such that $\int_E |g_k - f| \to 0$ and $g_k \to f$ a.e. $E$.
Proof. The first claim follows from the theorem above. Since $\int_E |g_k - f| \to 0$, we know $g_k \xrightarrow{\mu} f$ and there exists a subsequence of $\{g_k\}$, still denoted by $\{g_k\}$, such that $g_k \to f$ a.e. $E$.

Example 4.42. Suppose $f \in L(\mathbb{R}^n)$. If $\int_{\mathbb{R}^n} f\phi = 0$ for any continuous function $\phi$ with compact support, then $f = 0$ a.e.

Proof. Suppose not, i.e., $\mu(\{x : f(x) \ne 0\}) > 0$. WLOG, assume $\mu(\{x : f(x) > 0\}) > 0$; then there exist $c > 0$ and $E \subset \mathbb{R}^n$ such that $\mu(E) > 0$ and $f(x) \ge c$ on $E$ (because $\{x : f(x) > 0\} = \cup_{k=1}^\infty E_k$ where $E_k = \{x : f(x) \ge \frac{1}{k}\}$ and $0 < \mu(\cup_{k=1}^\infty E_k) \le \sum_{k=1}^\infty \mu(E_k)$, which means $\mu(E_k) > 0$ for some $k$). Then there exists a sequence of continuous functions $\{\phi_k\}$ with compact supports such that $\phi_k \uparrow \chi_E$ and $\int |\chi_E - \phi_k| \to 0$. Note that $|f\phi_k| \le |f\chi_E| \le |f| \in L(\mathbb{R}^n)$; we have
\[ 0 < c\,\mu(E) \le \int f\chi_E = \int \lim_{k\to\infty} f\phi_k = \lim_{k\to\infty} \int f\phi_k = 0, \]
where we applied Theorem 4.28 (DCT) to obtain the second equality. Contradiction.
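Example 4.43. Suppose $f \in L([a, b])$. If $\int_a^b f\phi' = 0$ for any differentiable function $\phi$ with support $\mathrm{supp}(\phi) \subset (a, b)$, then $f \equiv c$ a.e. for some $c$.

Proof. For any continuous function $g$ with $\mathrm{supp}(g) \subset (a, b)$ and continuous function $h$ with $\mathrm{supp}(h) \subset (a, b)$ and $\int_a^b h = 1$, define
\[ \phi(x) = \int_a^x g(t)\,dt - \int_a^b g(t)\,dt \cdot \int_a^x h(t)\,dt. \]
Note that $\mathrm{supp}(\phi) \subset (a, b)$. Then $\phi'(x) = g(x) - \int_a^b g(t)\,dt \cdot h(x)$ and
\[ \int_a^b f\phi' = \int_a^b fg - \int_a^b fh \int_a^b g = \int_a^b \Big( f - \int_a^b fh \Big) g(x)\,dx. \]
Since $g$ is arbitrary, by Example 4.42 we have $f(x) = \int_a^b fh$ a.e. for all continuous $h$ with $\mathrm{supp}(h) \subset (a, b)$ and $\int_a^b h = 1$; the right-hand side is the constant $c$.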
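Theorem 4.44. If $f \in L(\mathbb{R}^n)$, then $\lim_{h\to0} \int_{\mathbb{R}^n} |f(x + h) - f(x)|\,dx = 0$.

Proof. By Theorem 4.40 above, for any $\epsilon > 0$ we can decompose $f = f_1 + f_2$, where $f_1$ is continuous and has compact support, and $f_2$ is such that $\int_{\mathbb{R}^n} |f_2| < \epsilon/4$. Let $K$ be a compact set containing all points within distance $1$ of $\mathrm{supp}(f_1)$, so that $f_1(x + h) - f_1(x)$ vanishes outside $K$ whenever $|h| \le 1$. Since $f_1$ is continuous with compact support, $f_1$ is uniformly continuous, and there exists $\delta \in (0, 1]$ such that $|f_1(x + h) - f_1(x)| < \epsilon/(2\mu(K))$ for any $|h| < \delta$. Therefore
\[ \int_{\mathbb{R}^n} |f_1(x + h) - f_1(x)|\,dx = \int_K |f_1(x + h) - f_1(x)|\,dx < \mu(K) \cdot \frac{\epsilon}{2\mu(K)} = \frac{\epsilon}{2}. \]
Therefore, we obtain that
\[ \int_{\mathbb{R}^n} |f(x + h) - f(x)| \le \int_{\mathbb{R}^n} |f_1(x + h) - f_1(x)| + \int_{\mathbb{R}^n} |f_2(x + h) - f_2(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon, \]
where we used the fact that $\int_{\mathbb{R}^n} |f_2(x + h) - f_2(x)| \le 2\int_{\mathbb{R}^n} |f_2| < \epsilon/2$ (by Theorem 4.27).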
Corollary 4.45. If $f \in L(E)$, then there exists a sequence of simple functions $\{\phi_k\}$ where $\mathrm{supp}(\phi_k)$ is compact for every $k$, such that $\phi_k \to f$ a.e. $E$ and $\int_E |\phi_k - f| \to 0$.

Proof. For any $\epsilon > 0$, there exists a continuous $g$ with bounded $\mathrm{supp}(g)$ such that $\int_E |f - g| < \frac{\epsilon}{2}$. For $g$, there exists a simple function $\phi$ such that $\int |g - \phi| < \frac{\epsilon}{2}$. Hence $\int |f - \phi| < \epsilon$. Let $\epsilon_k = \frac{1}{k}$; then there exists $\phi_k$ such that $\int |f - \phi_k| < \epsilon_k \to 0$. Hence $\phi_k \xrightarrow{\mu} f$ and there exists a subsequence of $\{\phi_k\}$, still denoted by $\{\phi_k\}$, such that $\phi_k \to f$ a.e.

4.4 Relation between Riemann and Lebesgue integrals


We consider the integrals of bounded functions defined on $I = [a, b]$ only. Recall the definition of Riemann integral: consider a partition $\Delta^{(n)} : a = x_0^{(n)} < x_1^{(n)} < \cdots < x_{k_n}^{(n)} = b$ of $I$ into $k_n$ segments. Denote $|\Delta^{(n)}| = \max_{1 \le i \le k_n} |x_i^{(n)} - x_{i-1}^{(n)}|$. For such a partition $\Delta^{(n)}$, denote
\[ M_i^{(n)} = \sup\{f(x) : x_{i-1}^{(n)} \le x \le x_i^{(n)}\}, \qquad m_i^{(n)} = \inf\{f(x) : x_{i-1}^{(n)} \le x \le x_i^{(n)}\}. \]
Then the Darboux upper and lower integrals are defined by the two limits below as $|\Delta^{(n)}| \to 0$ and $n \to \infty$:
\[ \overline{\int_a^b} f = \lim_{|\Delta^{(n)}| \to 0} \sum_{i=1}^{k_n} M_i^{(n)} (x_i^{(n)} - x_{i-1}^{(n)}), \qquad \underline{\int_a^b} f = \lim_{|\Delta^{(n)}| \to 0} \sum_{i=1}^{k_n} m_i^{(n)} (x_i^{(n)} - x_{i-1}^{(n)}). \]
Definition 4.46 (Riemann integral). We call a function $f : I \to \mathbb{R}$ Riemann integrable on $I$ if $\overline{\int_a^b} f = \underline{\int_a^b} f$.
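The Darboux sums above can be computed exactly for a simple assumed example, $f(x) = x^2$ on $[0, 1]$ with the uniform partition into $n$ cells (the function name `darboux_sums` is hypothetical). Because this $f$ is increasing, the supremum and infimum on each cell are attained at the right and left endpoints, which keeps the sketch free of any numerical search.

```python
from fractions import Fraction

def darboux_sums(n):
    """Upper and lower Darboux sums of f(x) = x^2 on [0, 1], uniform partition."""
    h = Fraction(1, n)
    # f is increasing: M_i = f(x_i) and m_i = f(x_{i-1}) on [x_{i-1}, x_i].
    upper = sum((i * h) ** 2 * h for i in range(1, n + 1))
    lower = sum(((i - 1) * h) ** 2 * h for i in range(1, n + 1))
    return upper, lower

gaps = {n: darboux_sums(n)[0] - darboux_sums(n)[1] for n in (1, 10, 100)}
```

For this example the gap between the upper and lower sums telescopes to $(f(1) - f(0)) / n = 1/n$, so both sums squeeze the common value $\frac{1}{3}$ as the mesh tends to $0$.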

Consider a sequence of partitions $\{\Delta^{(n)} : n \in \mathbb{N}\}$ where $\Delta^{(n+1)}$ is a refinement of $\Delta^{(n)}$ for every $n$ (i.e., $\Delta^{(n+1)}$ retains all the partition points in $\Delta^{(n)}$ and may add new points). Define $w_n(x) = \sum_{i=1}^{k_n} (M_i^{(n)} - m_i^{(n)}) \chi_{[x_{i-1}^{(n)}, x_i^{(n)})}(x) \ge 0$; then $w_{n+1}(x) \le w_n(x)$ for all $n$ and $x$. Suppose $|f| \le M$ for some $M > 0$; then $|w_n(x)| \le 2M$.
We also define the oscillation of the function $f$ at every point $x \in I$ by
\[ w_f(x) = \lim_{\delta\to0} \sup\{|f(x') - f(x'')| : x', x'' \in B(x, \delta)\}. \]
It is easy to verify that $f$ is continuous at $x$ iff $w_f(x) = 0$: sufficiency is trivial; for necessity, for any $\epsilon > 0$, there exists $\delta_0 > 0$ such that $|f(x') - f(x)| < \epsilon/4$ for all $x' \in B(x, \delta_0)$. Hence $|f(x') - f(x'')| < \epsilon/2$ for all $x', x'' \in B(x, \delta_0)$. Denote $W_f(x, \delta) := \sup\{|f(x') - f(x'')| : x', x'' \in B(x, \delta)\}$; then for all $\delta \in (0, \delta_0)$, there is $W_f(x, \delta) \le W_f(x, \delta_0) \le \epsilon/2 < \epsilon$. Hence $w_f(x) = \lim_{\delta\to0} W_f(x, \delta) = 0$.

Lemma 4.47. The oscillation $w_f : (a, b) \to \mathbb{R}$ is a measurable function.

Proof. It suffices to show that $E_t := w_f^{-1}((-\infty, t))$ is measurable (we actually can show it is open) for any $t \in \mathbb{R}$. For any $x \in E_t$, we know $w_f(x) < t$, and hence there exists $\delta_0 > 0$ such that $W_f(x, \delta_0) < t$. For any $y \in B(x, \delta_0)$, there exists $\delta_y > 0$ such that $B(y, \delta_y) \subset B(x, \delta_0)$, and hence $W_f(y, \delta) \le W_f(y, \delta_y) \le W_f(x, \delta_0) < t$ for all $\delta < \delta_y$. This implies that $w_f(y) = \lim_{\delta\to0} W_f(y, \delta) < t$, i.e., $y \in E_t$. As $y$ is arbitrary, we know $B(x, \delta_0) \subset E_t$, which means $E_t$ is open.
It is trivial to extend the domain of $w_f$ to $I = [a, b]$ and keep its measurability. We hereafter consider its domain on $I$. Moreover, it is easy to verify that $w_n \to w_f$ at every $x$ that is not a partition point (hence a.e. on $I$) as $|\Delta^{(n)}| \to 0$ and $n \to \infty$. Since $|w_n| \le 2M$ and $\mu(I) < \infty$, we know $\int w_n \to \int w_f$ by DCT. Due to the definition of the Darboux upper and lower integrals, we also have $\int w_f = \overline{\int_a^b} f - \underline{\int_a^b} f$.

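Theorem 4.48. Suppose $f : [a, b] \to \mathbb{R}$ is bounded. Then $f$ is Riemann integrable iff $\mu(\{x \in [a, b] : f \text{ is not continuous at } x\}) = 0$.
Proof. Necessity. If $f$ is Riemann integrable, then $\int w_f = \overline{\int_a^b} f - \underline{\int_a^b} f = 0$. Since $w_f \ge 0$, this gives $w_f = 0$ a.e., i.e., $f$ is continuous a.e.
Sufficiency. Suppose $f$ is continuous a.e. Then $w_f = 0$ a.e. Then $0 = \int w_f = \overline{\int_a^b} f - \underline{\int_a^b} f$. Hence $f$ is Riemann integrable.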

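Theorem 4.49. If $f$ is Riemann integrable on $[a, b]$, then $f \in L([a, b])$.

Proof. From the theorem above, $f$ is continuous a.e. $[a, b]$; hence $f$ is measurable, and being bounded on a set of finite measure, $f \in L([a, b])$. For any partition $\Delta : a = x_0 < x_1 < \cdots < x_n = b$, there is $\int_I f = \sum_{i=1}^n \int_{x_{i-1}}^{x_i} f$. Hence $\sum_{i=1}^n m_i (x_i - x_{i-1}) \le \int_a^b f \le \sum_{i=1}^n M_i (x_i - x_{i-1})$, where $\int_a^b f$ denotes the Lebesgue integral. Since $f$ is Riemann integrable, by taking the limit $|\Delta| \to 0$ we obtain $\underline{\int_a^b} f = \int_a^b f = \overline{\int_a^b} f$, so the Riemann and Lebesgue integrals coincide.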
4.5 Iterated integrals
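We denote by $\mathcal{F}$ the set of all nonnegative measurable functions $f : \mathbb{R}^n = \mathbb{R}^p \times \mathbb{R}^q \to \mathbb{R}_+$ that satisfy the following three properties:
1. For a.e. $x \in \mathbb{R}^p$, $f(x, \cdot) \ge 0$ is measurable on $\mathbb{R}^q$.
2. Let $F_f(x) := \int_{\mathbb{R}^q} f(x, y)\,dy$. Then $F_f$ is measurable and $F_f \ge 0$ a.e. $\mathbb{R}^p$.
3. There is $\int_{\mathbb{R}^p} F_f\,dx = \int_{\mathbb{R}^p} \big( \int_{\mathbb{R}^q} f(x, y)\,dy \big)\,dx = \int_{\mathbb{R}^n} f\,dx\,dy$.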
Lemma 4.50. The following statements hold for the set F:
(i) If f ∈ F and a ≥ 0, then af ∈ F.
(ii) If f1 , f2 ∈ F, then f1 + f2 ∈ F.
(iii) If f, g ∈ F, f − g ≥ 0, and g ∈ L(Rn ), then f − g ∈ F.
(iv) If fk ∈ F, fk ≤ fk+1 for all k, and limk fk = f , then f ∈ F.
Proof. It is trivial to verify (i) and (ii). For (iii), since g ∈ F and g ∈ L(Rn ),
we know Fg is finite a.e. Rp . For every x ∈ Rp where Fg (x) < ∞, we know
g(x, ·) ∈ L(Rq ) and hence g(x, ·) is finite a.e. Rq . Hence g(x, y) is finite a.e. Rn .
Then it is easy to verify the three properties of f − g, which implies f − g ∈ F.
For (iv), it is easy to verify Property 1 of $\mathcal{F}$ for $f$. By Theorem 4.9 (Beppo Levi), we know
\[ F_f(x) = \int_{\mathbb{R}^q} f(x, y)\,dy = \lim_{k\to\infty} \int_{\mathbb{R}^q} f_k(x, y)\,dy = \lim_{k\to\infty} F_{f_k}(x), \]
which implies that $F_f \ge 0$ is measurable (as the limit of a sequence of measurable functions). Moreover, as $F_{f_k} \uparrow F_f$, we know
\[ \int_{\mathbb{R}^p} F_f(x)\,dx = \lim_{k\to\infty} \int_{\mathbb{R}^p} F_{f_k}(x)\,dx = \lim_{k\to\infty} \int_{\mathbb{R}^n} f_k\,dx\,dy = \int_{\mathbb{R}^n} f\,dx\,dy, \]
where we used Theorem 4.9 (Beppo Levi) to obtain the first equality, Property 3 of $f_k \in \mathcal{F}$ for the second equality, and Beppo Levi again for the last equality. This verifies Property 3 of $f$. Hence $f \in \mathcal{F}$.
Theorem 4.51 (Tonelli). If f : Rn = Rp ×Rq → R+ is measurable, then f ∈ F.
Proof. By Lemma 4.50(iv) and that every measurable function is a limit of a
sequence of simple functions, it suffices to prove the case where f is simple. Due
to Lemma 4.50(ii), it suffices to prove χE ∈ F where E is measurable. As E
can be written as a disjoint union of an Fσ -set K and measure zero set Z, we
prove the claim in the following steps.
Firstly, it is easy to verify that χE ∈ F if E is a (possibly) half-open half-
closed box (an open box plus some of its 2n facets) in Rn .
Secondly, for any open $G \subset \mathbb{R}^n$, we can rewrite $G$ as a disjoint union $G = \cup_{k=1}^\infty I_k$ where each $I_k$ is a half-open half-closed box. Let $E_k = \cup_{j=1}^k I_j$; then we know that $\chi_{E_k} \in \mathcal{F}$ from Lemma 4.50(ii), and then $\chi_G \in \mathcal{F}$ from $\chi_{E_k} \uparrow \chi_G$ and Lemma 4.50(iv).
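Thirdly, we show that $\chi_F \in \mathcal{F}$ if $F \subset \mathbb{R}^n$ is bounded and closed. To this end, we first know $F \subset G_1 := B(0, k)$ for some $k \in \mathbb{N}$. Hence $G_2 = G_1 \setminus F = G_1 \cap F^c$ is open. Since $\chi_{G_1} - \chi_{G_2} \ge 0$ and $\chi_{G_2} \in L(\mathbb{R}^n)$, we know by Lemma 4.50(iii) that $\chi_F = \chi_{G_1} - \chi_{G_2} \in \mathcal{F}$.
Fourthly, we show that if $E_k \downarrow E$, $\mu(E_1) < \infty$, and $\chi_{E_k} \in \mathcal{F}$ for all $k$, then $\chi_E \in \mathcal{F}$. To this end, let $D_k = E_1 \setminus E_k$ and $D = E_1 \setminus E$; then $0 \le \chi_{D_k} \uparrow \chi_D$ and $\chi_D \in L(\mathbb{R}^n)$ (since $\mu(E_1 \setminus E) \le \mu(E_1) < \infty$). Each $\chi_{D_k} = \chi_{E_1} - \chi_{E_k} \in \mathcal{F}$ by Lemma 4.50(iii); hence by Lemma 4.50(iv) and Theorem 4.28 (DCT) we can see that $\chi_D \in \mathcal{F}$. Hence $\chi_E = \chi_{E_1} - \chi_D \in \mathcal{F}$ by Lemma 4.50(iii).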
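Fifthly, we show that if $\mu(E) = 0$ then $\chi_E \in \mathcal{F}$. To this end, consider a sequence of non-increasing open sets $G_k$ such that $E \subset G_k$ and $\mu(G_k) \to 0$. Let $H = \cap_{k=1}^\infty G_k$; then $E \subset H$ and $\mu(H) = 0$. From the second and fourth points above, we know $\chi_H \in \mathcal{F}$ and hence $\chi_H(x, \cdot) = 0$ a.e. $\mathbb{R}^q$ for a.e. $x \in \mathbb{R}^p$, and $\int_{\mathbb{R}^n} \chi_H = 0$. As $0 \le \chi_E \le \chi_H$, we know $\chi_E$ satisfies all three properties of $\mathcal{F}$ and hence $\chi_E \in \mathcal{F}$.
Sixthly, we show that if $K$ is an $F_\sigma$-set and $\mu(K) < \infty$, then $\chi_K \in \mathcal{F}$. Suppose $K = \cup_{k=1}^\infty F_k$ where $F_k$ is closed and bounded for all $k$. Let $D_k = S_k \setminus S_{k-1}$ where $S_k := \cup_{j=0}^k F_j$ (assume $F_0 = \emptyset$). Note that both $S_k$ and $S_{k-1}$ are closed bounded sets, and hence by Lemma 4.50(iii) and the third point above, we know that $\chi_{D_k} = \chi_{S_k} - \chi_{S_{k-1}} \in \mathcal{F}$. Therefore, by Lemma 4.50(ii), $\chi_{\cup_{j=1}^k D_j} = \chi_{S_k} \in \mathcal{F}$. Since $S_k \uparrow K$, we know $\chi_K \in \mathcal{F}$ by Lemma 4.50(iv).
Finally, let $E = K \cup Z$ where $K$ is an $F_\sigma$-set and $Z$ is a measure zero set, and $K \cap Z = \emptyset$. Then $\chi_E = \chi_K + \chi_Z \in \mathcal{F}$.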
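Theorem 4.52 (Fubini). If $f \in L(\mathbb{R}^n)$, then the following statements hold:
1. For a.e. $x \in \mathbb{R}^p$, $f(x, \cdot) \in L(\mathbb{R}^q)$.
2. Let $F_f(x) = \int_{\mathbb{R}^q} f(x, y)\,dy$; then $F_f \in L(\mathbb{R}^p)$.
3. There is $\int_{\mathbb{R}^n} f = \int_{\mathbb{R}^p} \big( \int_{\mathbb{R}^q} f(x, y)\,dy \big)\,dx = \int_{\mathbb{R}^q} \big( \int_{\mathbb{R}^p} f(x, y)\,dx \big)\,dy$.
Proof. Let $f = f^+ - f^-$. Since $f \in L(\mathbb{R}^n)$, we know $f^\pm \in L(\mathbb{R}^n)$. From Theorem 4.51 we know $f^\pm \in \mathcal{F}$, which implies the claims as all integrals are finite.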
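Example 4.53. Suppose $f \in L([0, \infty))$ and $a > 0$, then
\[ \int_0^\infty \Big( \int_0^\infty f(y) e^{-xy}\,dy \Big) \sin(ax)\,dx = a \int_0^\infty \frac{f(y)}{a^2 + y^2}\,dy. \]
Proof. Note that for any fixed $y > 0$, there is $\int_0^\infty e^{-xy} \sin(ax)\,dx = \frac{a}{a^2 + y^2}$ (apply integration by parts twice). Hence we just need to justify exchanging the order of integration. The integrability condition of Theorem 4.52 (Fubini) for $\sin(ax) f(y) e^{-xy}$ is not evident on all of $[0, \infty)^2$, so we first work on a strip.
Note that $|\sin(ax) f(y) e^{-xy}| \le |f(y)| \in L([\delta, X] \times (0, \infty))$ for any $0 < \delta < X < \infty$ (the integral over the strip is bounded by $(X - \delta) \int_0^\infty |f(y)|\,dy$). Hence, by Theorem 4.52 (Fubini) on $[\delta, X] \times (0, \infty)$, there is
\[ \int_\delta^X \Big( \int_0^\infty f(y) e^{-xy}\,dy \Big) \sin(ax)\,dx = \int_0^\infty \Big( \int_\delta^X \sin(ax) f(y) e^{-xy}\,dx \Big)\,dy. \]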

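For any fixed $y > 0$, $e^{-xy}$ is decreasing in $x$ and hence the second mean value theorem for integrals implies that there exists $c \in (\delta, X)$ such that
\[ \int_\delta^X e^{-xy} \sin(ax)\,dx = e^{-\delta y} \int_\delta^c \sin(ax)\,dx + e^{-Xy} \int_c^X \sin(ax)\,dx. \]
Therefore, we can show that
\[ \Big| \int_\delta^X e^{-xy} \sin(ax)\,dx \Big| \le \Big| \int_\delta^c \sin(ax)\,dx \Big| + \Big| \int_c^X \sin(ax)\,dx \Big| \le \frac{4}{a} \]
for all $0 < \delta < X < \infty$. Let $\delta_k \to 0$ and $X_k \to \infty$ as $k \to \infty$, and denote $G_k(y) := \int_{\delta_k}^{X_k} e^{-xy} \sin(ax) f(y)\,dx$; then we know $G_k(y) \to \int_0^\infty e^{-xy} \sin(ax) f(y)\,dx$ as $k \to \infty$, and $|G_k(y)| \le 4|f(y)|/a \in L((0, \infty))$. Then we have
\begin{align*}
\int_0^\infty \Big( \int_0^\infty f(y) e^{-xy}\,dy \Big) \sin(ax)\,dx &= \lim_{k\to\infty} \int_{\delta_k}^{X_k} \Big( \int_0^\infty f(y) e^{-xy}\,dy \Big) \sin(ax)\,dx \\
&= \lim_{k\to\infty} \int_0^\infty \Big( \int_{\delta_k}^{X_k} f(y) e^{-xy} \sin(ax)\,dx \Big)\,dy \\
&= \lim_{k\to\infty} \int_0^\infty G_k(y)\,dy \\
&= \int_0^\infty \Big( \int_0^\infty e^{-xy} \sin(ax) f(y)\,dx \Big)\,dy,
\end{align*}
where the second equality is due to Theorem 4.52 (Fubini) on $[\delta_k, X_k] \times (0, \infty)$, the third equality is due to the definition of $G_k$, and the last equality is due to Theorem 4.28 applied to the sequence $G_k(y)$.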
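Example 4.54. Show that $\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$.
Proof. Consider $\int_0^\infty \int_0^\infty y e^{-y^2 x^2} e^{-y^2}\,dx\,dy$. Then by Theorem 4.51 (Tonelli), we have
\begin{align*}
\int_0^\infty \int_0^\infty y e^{-y^2 x^2} e^{-y^2}\,dx\,dy &= \int_0^\infty e^{-y^2} \Big( \int_0^\infty y e^{-y^2 x^2}\,dx \Big)\,dy \\
&= \Big( \int_0^\infty e^{-y^2}\,dy \Big) \cdot \Big( \int_0^\infty e^{-u^2}\,du \Big) \\
&= \Big( \int_0^\infty e^{-y^2}\,dy \Big)^2,
\end{align*}
where we applied the change of variable $u = yx$. On the other hand,
\[ \int_0^\infty \int_0^\infty y e^{-y^2 x^2} e^{-y^2}\,dy\,dx = \int_0^\infty \frac{1}{2(x^2 + 1)}\,dx = \frac{1}{2} \arctan(x) \Big|_0^\infty = \frac{\pi}{4}. \]
Hence $\int_0^\infty e^{-y^2}\,dy = \frac{\sqrt{\pi}}{2}$.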
Remarks. An alternative proof is based on polar coordinates: $\big( \int_0^\infty e^{-y^2}\,dy \big)^2 = \int_0^\infty \int_0^\infty e^{-x^2} e^{-y^2}\,dx\,dy = \int_0^{\pi/2} \int_0^\infty e^{-\rho^2} \rho\,d\rho\,d\theta = \frac{\pi}{4}$.
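The value $\frac{\sqrt{\pi}}{2}$ can be sanity-checked numerically. The sketch below is only an illustration: it truncates the integral to $[0, 10]$ (the neglected tail is below $e^{-100}$) and uses a plain midpoint rule; the helper name `midpoint` and the number of nodes are arbitrary choices.

```python
import math

def midpoint(func, a, b, n):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Integral of exp(-x^2) over [0, 10]; the tail beyond 10 is negligible.
gauss = midpoint(lambda x: math.exp(-x * x), 0.0, 10.0, 100000)

error_vs_closed_form = abs(gauss - math.sqrt(math.pi) / 2.0)
error_of_square = abs(gauss ** 2 - math.pi / 4.0)
```

Both errors are tiny, in agreement with the identity $\big(\int_0^\infty e^{-y^2}\,dy\big)^2 = \frac{\pi}{4}$ derived from Tonelli's theorem.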

4.6 Convolution
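Definition 4.55 (Convolution). Suppose $f, g$ are measurable on $E \subset \mathbb{R}^n$. We call $\int_{\mathbb{R}^n} f(x - y) g(y)\,dy$, a function of $x$, the convolution of $f$ and $g$, denoted by $f * g$ or $(f * g)(x)$, if the integral exists for every $x \in E$. Note that $f * g = g * f$.

Theorem 4.56. If $f, g \in L(\mathbb{R}^n)$, then $f * g$ is finite a.e., and
\[ \int_{\mathbb{R}^n} |f * g| \le \Big( \int_{\mathbb{R}^n} |f| \Big) \cdot \Big( \int_{\mathbb{R}^n} |g| \Big). \]
Proof. We first consider the case where $f, g \ge 0$. By Theorem 4.51 (Tonelli) and Theorem 4.27, there is
\begin{align*}
\int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^n} f(x - y) g(y)\,dy \Big)\,dx &= \int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^n} f(x - y)\,dx \Big) g(y)\,dy \\
&= \Big( \int_{\mathbb{R}^n} f(x)\,dx \Big) \cdot \Big( \int_{\mathbb{R}^n} g(y)\,dy \Big) < \infty.
\end{align*}
For general functions $f, g$, note that $|f * g| \le |f| * |g|$.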
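A discrete analogue of Theorem 4.56 can be checked directly. The sketch below replaces Lebesgue measure on $\mathbb{R}^n$ by counting measure on the integers, so integrals become finite sums and the convolution becomes $(f * g)_m = \sum_i f_i g_{m-i}$; this swap, the helper names `convolve` and `l1_norm`, and the sample sequences are assumptions made purely for illustration.

```python
def convolve(f, g):
    """Discrete convolution of two finite sequences indexed from 0."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b   # term f_i * g_j contributes to index i + j
    return out

def l1_norm(seq):
    """Sum of absolute values (the integral of |seq| w.r.t. counting measure)."""
    return sum(abs(v) for v in seq)

f = [1.0, -2.0, 0.5]
g = [3.0, 1.0, -1.0, 2.0]
fg = convolve(f, g)
gf = convolve(g, f)
```

Here `l1_norm(fg)` is 19.5 while `l1_norm(f) * l1_norm(g)` is 24.5; the inequality is strict because sign cancellations occur inside the sums, exactly as in the step $|f * g| \le |f| * |g|$.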


Example 4.57 (Convolution is continuous). Suppose f ∈ L(Rn ), and g is
measurable and uniformly bounded a.e., then F (x) = (f ∗ g)(x) is uniformly
continuous on Rn .

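Proof. Suppose $M > 0$ is such that $|g| \le M$ a.e. Since $f \in L(\mathbb{R}^n)$, by Theorem 4.44, we know for any $\epsilon > 0$, there exists $\delta > 0$ such that $\int_{\mathbb{R}^n} |f(x + h) - f(x)|\,dx < \epsilon/M$ for all $|h| \le \delta$. Hence,
\begin{align*}
|F(x + h) - F(x)| &\le \int_{\mathbb{R}^n} |f(x + h - y) - f(x - y)|\,|g(y)|\,dy \\
&\le M \int_{\mathbb{R}^n} |f(z + h) - f(z)|\,dz < M \cdot \frac{\epsilon}{M} = \epsilon,
\end{align*}
for all $|h| \le \delta$. Hence $F$ is uniformly continuous on $\mathbb{R}^n$.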

5 Signed Measures and Differentiations
5.1 Signed measure and decomposition
Definition 5.1 (Signed measure). Suppose $(X, \mathcal{M})$ is a measurable space. A function $\nu : \mathcal{M} \to \bar{\mathbb{R}}$ is called a signed measure if (i) $\nu(\emptyset) = 0$; (ii) $\nu$ can take $\infty$ or $-\infty$ but not both; (iii) countable additivity: $\nu(\cup_{k=1}^\infty E_k) = \sum_{k=1}^\infty \nu(E_k)$ for any countable family of mutually disjoint sets $\{E_k\} \subset \mathcal{M}$.
Note that if in addition $\nu(E) \ge 0$ for any $E \in \mathcal{M}$, then $\nu$ is called a positive measure, or simply measure.

Theorem 5.2 (Continuity of signed measure). If $E_k$ is increasing and $E = \cup_{k=1}^\infty E_k$, then $\nu(E_k) \to \nu(E)$. If $E_k$ is decreasing, $\nu(E_1) < \infty$, and $E = \cap_{k=1}^\infty E_k$, then $\nu(E_k) \to \nu(E)$.

Proof. If $E_k$ is increasing, then denote $D_k = E_k \setminus \cup_{i=1}^{k-1} E_i$. By countable additivity, there is
\[ \nu(E) = \nu\Big( \bigcup_{k=1}^\infty E_k \Big) = \sum_{k=1}^\infty \nu(D_k) = \lim_{k\to\infty} \sum_{i=1}^k \nu(D_i) = \lim_{k\to\infty} \nu\Big( \bigcup_{i=1}^k D_i \Big) = \lim_{k\to\infty} \nu(E_k). \]
If $E_k$ is decreasing, then consider $F_k = E_1 \setminus E_k$, which is increasing; the rest follows similarly.

Definition 5.3 (ν-positive/negative/null set). Suppose ν is a signed measure on (X, M). Then a set E ∈ M is called ν-positive (resp. ν-negative, ν-null) if ν(F) ≥ 0 (resp. ≤ 0, = 0) for any measurable F ⊂ E.
Lemma 5.4. If {Pk} are ν-positive, then ∪_{k=1}^∞ Pk is ν-positive.

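Proof. Denote Qk = Pk \ ∪_{i=1}^{k−1} Pi. Then the Qk are mutually disjoint with ∪_{k=1}^∞ Qk = ∪_{k=1}^∞ Pk, and for any measurable E ⊂ ∪_{k=1}^∞ Pk, there is
\[ \nu(E) = \nu\Big( E \cap \Big( \bigcup_{k=1}^{\infty} Q_k \Big) \Big) = \sum_{k=1}^{\infty} \nu(E \cap Q_k) \ge 0, \]
since E ∩ Qk ⊂ Pk.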
Theorem 5.5 (Hahn Decomposition Theorem). If ν is a signed measure on
(X, M), then there exist a ν-positive set P and a ν-negative set N such that X = P ∪ N,
P ∩ N = ∅. If (P′, N′) is another such pair, then P△P′ and N△N′ are ν-null.
The pair (P, N ) is called a Hahn decomposition of ν.
Proof. (i) WLOG, assume that ν : M → R ∪ {−∞}. Consider the family of ν-
positive sets: P = {P ∈ M : P is ν-positive}. Define m = sup{ν(P ) : P ∈ P},
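then there exists a sequence {Pk} ⊂ P such that lim_k ν(Pk) = m. Define P = ∪_{k=1}^∞ Pk and N = X \ P. Then, by Lemma 5.4, P is ν-positive, and ν(P) = m: indeed ν(P) ≥ ν(Pk) for every k because P \ Pk is ν-positive, while ν(P) ≤ m by the definition of m. In particular m < ∞, since ν does not take the value ∞.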

(ii) Now we only need to show that N is ν-negative. First of all, if E ⊂ N
and ν(E) > 0, then E cannot be ν-positive: otherwise E ∩ P ⊂ N ∩ P = ∅ and
E ∪ P is ν-positive, and hence ν(E ∪ P ) = ν(E) + ν(P ) > m, contradiction.
Next, for any E ⊂ N with ν(E) > 0, there exists B ⊂ E such that ν(B) < 0
(since E is not ν-positive), then let A = E \ B, we have A ⊂ E and ν(A) =
ν(E) − ν(B) > ν(E).
Now we are ready to show that N is ν-negative. If not, then there exists A_0 ⊂ N such that ν(A_0) > 0. Then choose the smallest n_1 ∈ N such that there exists A_1 ⊂ A_0 that satisfies ν(A_1) ≥ ν(A_0) + 1/n_1 > ν(A_0); then choose the smallest n_2 ∈ N such that there exists A_2 ⊂ A_1 with ν(A_2) ≥ ν(A_1) + 1/n_2 > ν(A_1); and so on. Note that A_k is decreasing, ν(A_k) is increasing, and ν(A_k) ≥ ν(A_{k−1}) + 1/n_k for all k. Let A = ∩_{k=1}^∞ A_k = lim_k A_k. Then ∞ > ν(A) = lim_k ν(A_k) ≥ ν(A_0) + Σ_k 1/n_k > 0. Therefore n_k → ∞ as k → ∞. Since A ⊂ N and ν(A) > 0, there again exist B ⊂ A and n ∈ N such that ν(B) ≥ ν(A) + 1/n > ν(A). Therefore, there exists k large enough such that n_k > n, but B ⊂ A ⊂ A_{k−1} and ν(B) ≥ ν(A) + 1/n ≥ ν(A_{k−1}) + 1/n, which contradicts the construction of n_k (smallest integer) and A_k. Hence N must be ν-negative.
(iii) If (P′, N′) is another such pair, then P \ P′ ⊂ P and P \ P′ ⊂ N′. Hence P \ P′ is both ν-positive and ν-negative, and therefore is ν-null. The same argument applies to P′ \ P, so P△P′ is ν-null. Note that N△N′ = P△P′ is therefore also ν-null.
Definition 5.6 (Mutually singular measures). Two signed measures µ and ν
are called mutually singular, denoted by µ ⊥ ν, if there exists E ∈ M such that
E is ν-null and E c is µ-null.

Theorem 5.7 (Jordan Decomposition Theorem). If ν is a signed measure on M, then there exist unique positive measures ν^+ and ν^− such that ν^+ ⊥ ν^− and ν = ν^+ − ν^−.
Proof. Let X = P ∪ N be a Hahn decomposition of ν. Define ν ± such that
ν + (E) = ν(E ∩ P ) and ν − (E) = −ν(E ∩ N ) for any E ∈ M. Then it is easy
to verify that both ν + and ν − are positive measures on M. Moreover, for any
E ⊂ N, ν^+(E) = ν(E ∩ P) = ν(∅) = 0; and for any E ⊂ P, ν^−(E) = −ν(E ∩ N) = −ν(∅) = 0. Hence N is ν^+-null and P = N^c is ν^−-null, i.e., ν^+ ⊥ ν^−.
If there exists another Hahn decomposition X = P ′ ∪ N ′ with µ± defined
similarly, then P △P ′ is ν-null. Hence, for any E ∈ M, there is µ+ (E) =
ν(E ∩ P ′ ) = ν(E ∩ P ) = ν + (E). Hence µ+ = ν + . Similarly µ− = ν − .
Definition 5.8 (Total variation of signed measure). |ν| = ν + + ν − is called the
total variation of ν. That is, |ν|(E) = ν + (E) + ν − (E) = ν(E ∩ P ) − ν(E ∩ N )
for any E ∈ M.
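To make the Hahn and Jordan decompositions concrete, here is a minimal Python sketch on a finite set, assuming the signed measure is given by point weights (a hypothetical dictionary `weights`). In that setting P can be taken as the points of nonnegative weight, and ν^+, ν^−, |ν| follow directly from the definitions above.

```python
def nu(weights, E):
    """Signed measure of E: the sum of the point weights in E."""
    return sum(weights[x] for x in E)


def hahn_decomposition(weights):
    """Return (P, N): points of nonnegative weight and points of negative weight."""
    P = {x for x, w in weights.items() if w >= 0}
    N = set(weights) - P
    return P, N


def jordan_parts(weights, E):
    """Return (nu_plus(E), nu_minus(E)) computed from a Hahn decomposition."""
    P, N = hahn_decomposition(weights)
    nu_plus = nu(weights, E & P)     # nu^+(E) = nu(E ∩ P)
    nu_minus = -nu(weights, E & N)   # nu^-(E) = -nu(E ∩ N)
    return nu_plus, nu_minus


# Hypothetical point weights on X = {a, b, c, d, e}.
weights = {"a": 3, "b": -2, "c": 0, "d": -5, "e": 1}
E = {"a", "b", "d"}

nu_plus, nu_minus = jordan_parts(weights, E)
total_variation = nu_plus + nu_minus   # |nu|(E)
print(nu(weights, E), nu_plus, nu_minus, total_variation)
```

The point "c" has weight zero, so it could be placed in either P or N; this is the finite-set version of the statement that Hahn decompositions are unique only up to ν-null sets.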
Definition 5.9 (Integrable wrt. signed measure). We call f integrable with respect to ν if f is integrable with respect to both ν^+ and ν^−, in which case the integral is defined by ∫ f dν = ∫ f dν^+ − ∫ f dν^−.

5.2 Radon-Nikodym theorem
Definition 5.10 (Absolute continuity). We say a signed measure ν is absolutely
continuous with respect to a positive measure µ, denoted by ν ≪ µ, if ν(E) = 0
for any E ∈ M with µ(E) = 0. Note that ν ≪ µ iff ν ± ≪ µ iff |ν| ≪ µ.
Theorem 5.11. If ν ⊥ µ and ν ≪ µ, then ν = 0.
Proof. Let E be such that E is µ-null and E c is ν-null. Then E is also ν-null
since ν ≪ µ. Hence X is ν-null, i.e., ν = 0.
Theorem 5.12. Suppose ν is a finite signed measure and µ is a measure, then ν ≪ µ iff for any ϵ > 0, there exists δ > 0 such that |ν(E)| < ϵ for all E satisfying µ(E) < δ.
Proof. Since ν ≪ µ iff |ν| ≪ µ, we only need to show this for a finite positive measure ν. Sufficiency is trivial: if µ(E) = 0, then ν(E) < ϵ for every ϵ > 0, so ν(E) = 0. To prove necessity, suppose ν ≪ µ but the ϵ-δ condition fails. Then there exists ϵ0 > 0 such that for any k ∈ N there is Ek with ν(Ek) ≥ ϵ0 and µ(Ek) < 2^{−k}. Let Fk = ∪_{i=k}^∞ Ei and F = ∩_{k=1}^∞ Fk. Then µ(Fk) ≤ 2^{1−k} and µ(F) = lim_k µ(Fk) = 0. However ν(Fk) ≥ ν(Ek) ≥ ϵ0 for all k, which implies ν(F) = lim_k ν(Fk) ≥ ϵ0 (here we use that ν is finite) and contradicts ν ≪ µ.
Lemma 5.13. Suppose ν and µ are finite measures on M. Then either ν ⊥ µ
or there exist ϵ > 0 and E ∈ M with µ(E) > 0, such that E is (ν − ϵµ)-positive.
Proof. Consider the signed measures ν − k^{−1}µ with a Hahn decomposition X = Pk ∪ Nk, for any k ∈ N. Let P = ∪_{k=1}^∞ Pk and N = ∩_{k=1}^∞ Nk = X \ P. Note that N is (ν − k^{−1}µ)-negative for all k, and hence 0 ≤ ν(N) ≤ k^{−1}µ(N) → 0 as k → ∞. Hence N is ν-null.
If P is µ-null, then ν ⊥ µ and we are done. Otherwise, µ(P) > 0, and hence there exists k such that µ(Pk) > 0, and Pk is (ν − k^{−1}µ)-positive. Taking E = Pk and ϵ = k^{−1} completes the proof.
Theorem 5.14 (Radon-Nikodym). Suppose ν is a σ-finite signed measure and
µ is a σ-finite measure. Then there exist unique σ-finite signed measures λ and
ρ, such that λ ⊥ µ, ρ ≪ µ, and ν = λ + ρ.
Proof. (i) We first consider the case where both µ and ν are finite positive measures. Define
\[ \mathcal{F} = \Big\{ f : X \to [0, \infty] \text{ measurable} : \int_E f \, d\mu \le \nu(E), \ \forall E \in \mathcal{M} \Big\}. \]

Note that 0 ∈ F and hence F is nonempty. For any f, g ∈ F, h = max{f, g} ∈ F: let A = {x : f(x) ≥ g(x)}, then for any E ∈ M,
\[ \int_E h \, d\mu = \int_{E \cap A} f \, d\mu + \int_{E \cap A^c} g \, d\mu \le \nu(E \cap A) + \nu(E \cap A^c) = \nu(E). \]
Let m = sup{∫_X f dµ : f ∈ F} ≤ ν(X) < ∞, then there exists a sequence {fk} ⊂ F such that ∫_X fk dµ → m. Define gk(x) = max_{1≤i≤k} fi(x) for all x ∈ X and f(x) = sup_k fk(x). Note that gk ∈ F and gk ↑ f. By Theorem 4.9 (Beppo Levi), ∫_E f dµ = lim_k ∫_E gk dµ ≤ ν(E) for all E ∈ M, and hence f ∈ F. Moreover, m = lim_k ∫_X fk dµ ≤ lim_k ∫_X gk dµ = ∫_X f dµ ≤ m, and hence ∫_X f dµ = m.
Now we claim that λ, defined by λ(E) = ν(E) − ∫_E f dµ for any E ∈ M (we write this as dλ = dν − f dµ for short), satisfies λ ⊥ µ. If not, then by Lemma 5.13, there exist ϵ > 0 and A such that µ(A) > 0 and A is (λ − ϵµ)-positive. Then for any E ∈ M, there is
\[ \int_E (f + \epsilon \chi_A) \, d\mu = \int_{E \cap A^c} f \, d\mu + \int_{E \cap A} (f + \epsilon \chi_A) \, d\mu \le \nu(E \cap A^c) + \nu(E \cap A) = \nu(E), \]
where we used the fact
\[
\begin{aligned}
0 \le (\lambda - \epsilon\mu)(E \cap A) &= \lambda(E \cap A) - \epsilon \mu(E \cap A) \\
&= \nu(E \cap A) - \int_{E \cap A} f \, d\mu - \int_{E} \epsilon \chi_A \, d\mu \\
&= \nu(E \cap A) - \int_{E \cap A} (f + \epsilon \chi_A) \, d\mu
\end{aligned}
\]
to obtain the inequality above. Hence f + ϵχ_A ∈ F. However ∫_X (f + ϵχ_A) dµ = m + ϵµ(A) > m, a contradiction.
If there exists f ′ , λ′ such that dν = dλ′ +f ′ dµ, then dλ−dλ′ = f dµ−f ′ dµ =
(f − f ′ ) dµ. Hence (λ − λ′ ) ≪ µ. Moreover, since λ, λ′ ⊥ µ, there exist E and
E ′ such that E is λ-null, E ′ is λ′ -null, and E c , (E ′ )c are µ-null. Hence E ∩ E ′
is (λ − λ′ )-null and E c ∪ (E ′ )c is µ-null, which means (λ − λ′ ) ⊥ µ. Therefore,
by Theorem 5.11, λ − λ′ = 0, and hence f − f ′ = 0 µ-a.e.
(ii) Next we consider the case where both µ and ν are σ-finite measures. Then there exist {Ak} and {Bk} such that X = ∪_k Ak = ∪_k Bk, µ is finite on each Ak, and ν is finite on each Bk. The collection {Ai ∩ Bj : i, j ∈ N} is countable. Denote this collection by {Ck} (WLOG assume they are disjoint, otherwise take Ck \ (∪_{j=1}^{k−1} Cj) for all k), then µ and ν are both finite on Ck for any k. Define µk(E) = µ(E ∩ Ck) and νk(E) = ν(E ∩ Ck) for any E ∈ M and k. Then applying (i) we know there exist unique λk, fk such that dλk = dνk − fk dµk on Ck. Let λ = Σ_k λk and f = Σ_k fk. Then it is easy to verify that λ ⊥ µ and dν = dλ + f dµ on X.
(iii) Finally consider the general case where ν is a σ-finite signed measure. Let ν = ν^+ − ν^− be the Jordan decomposition of ν, then applying (ii) to each of ν^± yields unique λ^±, f^± such that dλ^± = dν^± − f^± dµ and λ^± ⊥ µ. Let λ = λ^+ − λ^− and f = f^+ − f^−, then λ ⊥ µ and dν = dλ + f dµ, which completes the proof.
Definition 5.15 (Lebesgue decomposition and Radon-Nikodym derivative). We call dν = dλ + f dµ from Theorem 5.14 the Lebesgue decomposition of ν with respect to µ. If ν ≪ µ, then dν = f dµ and f is called the Radon-Nikodym derivative of ν with respect to µ, denoted by dν/dµ.
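On a finite set the Lebesgue decomposition can be computed point by point, which gives a quick way to see what the pieces mean. The sketch below assumes both measures are given by nonnegative integer point masses (hypothetical dictionaries `mu` and `nu`): the density is the ratio of point masses wherever µ charges the point, and the singular part collects the mass of ν sitting where µ vanishes.

```python
from fractions import Fraction


def lebesgue_decomposition(nu, mu):
    """Split nu into a part singular to mu and a density f with respect to mu.

    Both arguments map points to nonnegative integer point masses.
    Returns (singular, density), each a dict over the same points.
    """
    singular = {}
    density = {}
    for x in mu:
        if mu[x] > 0:
            density[x] = Fraction(nu[x], mu[x])   # f(x) = nu({x}) / mu({x})
            singular[x] = 0
        else:
            density[x] = Fraction(0)              # f is arbitrary on a mu-null point
            singular[x] = nu[x]                   # mass of nu that mu cannot see
    return singular, density


# Hypothetical point masses: mu does not charge "b", nu does not charge "c".
mu = {"a": 2, "b": 0, "c": 4}
nu = {"a": 3, "b": 5, "c": 0}

lam, f = lebesgue_decomposition(nu, mu)
reconstructed = {x: lam[x] + f[x] * mu[x] for x in mu}
print(lam, f, reconstructed)
```

Here λ lives on {b}, which is µ-null, while its complement {a, c} is λ-null, so λ ⊥ µ; and ν ≪ µ would hold exactly when λ = 0.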
Theorem 5.16. Suppose ν is a σ-finite signed measure, µ, λ are σ-finite measures on (X, M), and ν ≪ µ ≪ λ. Then the following statements hold:
1. (Change of variable) If g is ν-integrable, then g · (dν/dµ) is µ-integrable, and ∫ g dν = ∫ g · (dν/dµ) dµ.
2. (Chain rule) dν/dλ = (dν/dµ) · (dµ/dλ) λ-a.e.

Proof. 1. By considering ν^± separately, it suffices to prove the result for a positive measure ν. We first verify the claim for g = χ_E where E ∈ M:
\[ \int g \, d\nu = \nu(E) = \int_E \frac{d\nu}{d\mu} \, d\mu = \int \chi_E \frac{d\nu}{d\mu} \, d\mu. \]
(Note that we identify dν/dµ with f where dν = f dµ.) Then it is easy to verify this for g being nonnegative simple functions by linearity, then for general nonnegative functions, and finally for general functions g.
2. By Item 1 applied to µ ≪ λ, we have ∫ g dµ = ∫ g (dµ/dλ) dλ for all µ-integrable functions g. Then for any E ∈ M, we substitute g by χ_E (dν/dµ), then
\[ \nu(E) = \int \chi_E \, d\nu = \int \chi_E \frac{d\nu}{d\mu} \, d\mu = \int_E \frac{d\nu}{d\mu} \frac{d\mu}{d\lambda} \, d\lambda. \]
This implies that dν/dλ = (dν/dµ)(dµ/dλ) λ-a.e.

5.3 Differentiation
We focus on the case where f : R → R in the remainder of this chapter.
Definition 5.17 (Vitali cover). The collection F of closed intervals is called a
Vitali cover of E if for any ϵ > 0 and any x ∈ E, there exists I ∈ F such that
µ(I) < ϵ and x ∈ I.
Example 5.18. Suppose E = [a, b]. Let {rk} = [a, b] ∩ Q and I_{k,m} = [rk − 1/m, rk + 1/m] for k, m ∈ N. Then F = {I_{k,m} : k, m ∈ N} is a Vitali cover of E.

Lemma 5.19 (Vitali covering lemma). Suppose E ⊂ R and µ∗(E) < ∞. If F is a Vitali cover of E, then for any ϵ > 0 there exist a finite number of disjoint intervals {Ij : 1 ≤ j ≤ k} ⊂ F such that µ∗(E \ ∪_{j=1}^k Ij) < ϵ.
Proof. WLOG, we assume F only contains bounded closed intervals. Since
µ∗ (E) < ∞, there exists an open set G such that E ⊂ G and µ(G) < ∞. Since
F is a Vitali cover of E, WLOG we assume I ⊂ G for any I ∈ F.
Now we perform the following interval selection procedure: we first choose I_1 ∈ F arbitrarily. Inductively, suppose we have already chosen I_1, . . . , I_k ∈ F. If E ⊂ ∪_{j=1}^k Ij, then we can terminate because the claim is proved. Otherwise, we denote Fk := {I ∈ F : I ∩ (∪_{j=1}^k Ij) = ∅} and δk := sup{|I| : I ∈ Fk} (here |I| denotes the length of the interval I for short), and then choose I_{k+1} ∈ Fk such that |I_{k+1}| > δk/2 (this is possible since δk is taken as the supremum over Fk).

If this selection procedure continues for infinitely many steps, then we obtain a sequence of disjoint intervals {Ij}_{j=1}^∞. Since Σ_{k=1}^∞ |Ik| = µ(∪_{k=1}^∞ Ik) ≤ µ(G) < ∞, we know Σ_{j=k+1}^∞ |Ij| → 0 as k → ∞.
Now let ϵ > 0 be arbitrary and fixed, and k large enough so that Σ_{j=k+1}^∞ |Ij| < ϵ/5. Denote S := E \ (∪_{j=1}^k Ij). Then we want to show µ∗(S) < ϵ. To this end, let x ∈ S be arbitrary, then x ∉ ∪_{j=1}^k Ij. Since ∪_{j=1}^k Ij is a closed set, we know there exists I ∈ F such that x ∈ I and I ∩ (∪_{j=1}^k Ij) = ∅. Moreover, |I| ≤ δk < 2|I_{k+1}| due to the criterion to select I_{k+1}.
Furthermore, notice that |Ij| → 0 as j → ∞. In addition, I ∩ (∪_{j=k+1}^∞ Ij) ≠ ∅ because otherwise we would have selected I over some Ij during the procedure (the former has a fixed width while the width of the latter tends to 0 as j → ∞). Let k0 ≥ k + 1 be the smallest index such that I ∩ I_{k0} ≠ ∅, then |I| ≤ δ_{k0−1} < 2|I_{k0}|. Now for each j ≥ k + 1 we define I′_j to be the closed interval with the same center as Ij but 5 times larger radius, then x ∈ I ⊂ I′_{k0}. Since x ∈ S is arbitrary, we know S ⊂ ∪_{j=k+1}^∞ I′_j, and µ∗(S) ≤ µ(∪_{j=k+1}^∞ I′_j) ≤ Σ_{j=k+1}^∞ |I′_j| = 5 Σ_{j=k+1}^∞ |Ij| < ϵ. This completes the proof.
Remarks. Vitali covering lemma can be extended to R^n. It is easy to show that there exists a countable collection of disjoint intervals {Ik} ⊂ F such that µ∗(E \ (∪_{k=1}^∞ Ik)) = 0.

Definition 5.20 (Dini derivatives). Suppose f : R → R, define D^± f(x) and D_± f(x) at x ∈ R by
\[ D^{\pm} f(x) = \limsup_{h \to 0^{\pm}} \frac{f(x+h) - f(x)}{h}, \qquad D_{\pm} f(x) = \liminf_{h \to 0^{\pm}} \frac{f(x+h) - f(x)}{h}. \]
Then D^± f(x) is called the upper right/left Dini derivative of f at x. Similarly, D_± f(x) is called the lower right/left Dini derivative of f at x.
Remarks. Here are several remarks regarding the four Dini derivatives:
• For any f and x, there are D_+ f(x) ≤ D^+ f(x) and D_− f(x) ≤ D^− f(x).
• D^+(−f) = −D_+(f) and D^−(−f) = −D_−(f).
• If D^+ f(x) = D_+ f(x), then we say f has right derivative at x. Similarly,
if D^− f(x) = D_− f(x), then we say f has left derivative at x.
• If all four derivatives are equal, then f is called differentiable at x.
Example 5.21. Suppose a < b and a′ < b′, and define
\[
f(x) = \begin{cases}
a x \sin^2(1/x) + b x \cos^2(1/x), & x > 0, \\
0, & x = 0, \\
a' x \sin^2(1/x) + b' x \cos^2(1/x), & x < 0.
\end{cases}
\]
Then we can show that
\[ D^{+} f(0) = \limsup_{h \to 0^{+}} \frac{f(h) - f(0)}{h} = \limsup_{h \to 0^{+}} \Big\{ a \sin^2\Big(\frac{1}{h}\Big) + b \cos^2\Big(\frac{1}{h}\Big) \Big\} = b. \]
Similarly, D_+ f(0) = a, D^− f(0) = b′, D_− f(0) = a′.
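The right Dini derivatives in Example 5.21 can be seen numerically by sampling the difference quotient along two sequences h → 0+. The following Python sketch is only illustrative: the values a = 1, b = 3 and the helper name `quotient` are assumptions, and sampling of course does not prove the lim sup and lim inf.

```python
import math


def quotient(h, a, b):
    """Difference quotient (f(h) - f(0)) / h for h > 0,
    where f(x) = a*x*sin(1/x)**2 + b*x*cos(1/x)**2 and f(0) = 0."""
    fh = a * h * math.sin(1.0 / h) ** 2 + b * h * math.cos(1.0 / h) ** 2
    return fh / h


a, b = 1.0, 3.0  # hypothetical constants with a < b

# Along h = 1/(n*pi) the sine term vanishes, so the quotient sits at b.
peaks = [quotient(1.0 / (n * math.pi), a, b) for n in range(1, 200)]

# Along h = 1/((n + 1/2)*pi) the cosine term vanishes, so the quotient sits at a.
troughs = [quotient(1.0 / ((n + 0.5) * math.pi), a, b) for n in range(1, 200)]

print(max(peaks), min(troughs))
```

Since the quotient is a convex combination of a and b for every h > 0, it never leaves [a, b], which is why these two sequences already realize the upper and lower right Dini derivatives at 0.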

Example 5.22. If f ∈ C([a, b]), then there exist x0 ∈ (a, b) and k ∈ R such that D_− f(x0) ≥ k ≥ D^+ f(x0) or D^− f(x0) ≤ k ≤ D_+ f(x0).
Proof. Let k = (f(b) − f(a))/(b − a). Consider g(x) = f(x) − kx. Then g ∈ C([a, b]). Note that g(a) = f(a) − ka = (bf(a) − af(b))/(b − a) = g(b). Hence there exists x0 ∈ (a, b) such that g attains its max or min over [a, b] at x0. If x0 is a maximizer, then D^+ g(x0) = D^+ f(x0) − k ≤ 0 and D_− g(x0) = D_− f(x0) − k ≥ 0, which implies that D_− f(x0) ≥ k ≥ D^+ f(x0). Similarly, if x0 is a minimizer, then D^− f(x0) ≤ k ≤ D_+ f(x0).
Theorem 5.23 (Lebesgue). Suppose f : [a, b] → R is non-decreasing, then f is differentiable a.e. [a, b] and ∫_a^b f′(x) dx ≤ f(b) − f(a).
Proof. (i) Note that if D^+ f(x) ≤ D_− f(x) and D^− f(x) ≤ D_+ f(x), then all four Dini derivatives are equal (since D_− f(x) ≤ D^− f(x) and D_+ f(x) ≤ D^+ f(x) always hold) and f is differentiable at x. Hence, if f is not differentiable at x, then either D^+ f(x) > D_− f(x) or D^− f(x) > D_+ f(x). Let E1 = {x : D^+ f(x) > D_− f(x)} and E2 = {x : D^− f(x) > D_+ f(x)}. We then need to show µ(E1 ∪ E2) = 0. To this end, it suffices to show that µ(E1) = 0, as µ(E2) = 0 can be proved similarly. Let r, s ∈ Q and E_{r,s} = {x : D^+ f(x) > r > s > D_− f(x)}, then E1 = ∪_{r,s∈Q} E_{r,s}. Hence it suffices to show that µ(E_{r,s}) = 0 for all r, s ∈ Q.
Now we denote E = E_{r,s} for short. For any ϵ > 0, consider an open set G such that E ⊂ G and µ(G) < µ∗(E) + ϵ (such G exists due to the definition of outer measure), and define the collection of closed intervals:
\[ \mathcal{G} = \{ [x-h, x] \subset G : x \in [a,b], \ f(x) - f(x-h) < sh \text{ for some } h > 0 \}. \]
Thus 𝒢 is a Vitali cover of E (since x ∈ E implies that D_− f(x) < s). Hence there exist a finite number of disjoint intervals [x1 − h1, x1], . . . , [xp − hp, xp] in 𝒢 such that µ∗(E) − ϵ < µ∗(E ∩ (∪_{i=1}^p [xi − hi, xi])) and Σ_{i=1}^p hi ≤ µ(G) < µ∗(E) + ϵ. Since f(xi) − f(xi − hi) < s hi, we have
\[ \sum_{i=1}^{p} \big( f(x_i) - f(x_i - h_i) \big) < s \sum_{i=1}^{p} h_i < s(\mu^*(E) + \epsilon). \]
Now define F = E ∩ (∪_{i=1}^p (xi − hi, xi)). Consider the collection of closed intervals
\[ \mathcal{F} = \{ [y, y+l] \subset (x_i - h_i, x_i) \text{ for some } i : f(y+l) - f(y) > rl \text{ for some } l > 0 \}. \]
Hence ℱ is a Vitali cover of F (since y ∈ F implies that D^+ f(y) > r), and there exist a finite number of disjoint intervals [y1, y1 + l1], . . . , [yq, yq + lq] in ℱ such that Σ_{j=1}^q lj > µ∗(F) − ϵ > µ∗(E) − 2ϵ. Hence Σ_{j=1}^q (f(yj + lj) − f(yj)) > r Σ_{j=1}^q lj > r(µ∗(E) − 2ϵ).
Since f is non-decreasing and each [yj, yj + lj] ⊂ [xi − hi, xi] for some i, we know Σ_{i=1}^p (f(xi) − f(xi − hi)) ≥ Σ_{j=1}^q (f(yj + lj) − f(yj)). Hence r(µ∗(E) − 2ϵ) < s(µ∗(E) + ϵ). Since ϵ > 0 is arbitrary, we have rµ∗(E) ≤ sµ∗(E), which implies that µ∗(E) = 0 since r > s.

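(ii) Extend f by f(x) = f(b) for x > b, and consider fk(x) = k(f(x + 1/k) − f(x)). Then fk ≥ 0, fk → f′ a.e., and
\[
\begin{aligned}
\int_a^b f' = \int_a^b \lim_{k\to\infty} f_k &\le \liminf_{k\to\infty} \int_a^b f_k = \liminf_{k\to\infty} k \int_a^b \Big( f\Big(x + \frac{1}{k}\Big) - f(x) \Big) \, dx \\
&= \liminf_{k\to\infty} k \Big( \int_b^{b + \frac{1}{k}} f - \int_a^{a + \frac{1}{k}} f \Big) \le f(b) - f(a),
\end{aligned}
\]
where we used Lemma 4.15 (Fatou) to obtain the first inequality, and the fact that f is non-decreasing (and constant over [b, b + 1/k]) to obtain the second inequality.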
Remarks. In general we only have the inequality above. For example, let
\[
f(x) = \begin{cases}
0, & 0 \le x < \tfrac{1}{2}, \\
1, & \tfrac{1}{2} \le x \le 1.
\end{cases}
\]
Then f′ = 0 a.e., but ∫_0^1 f′ = 0 < 1 = f(1) − f(0).
Theorem 5.24. Suppose fk : [a, b] → R is non-decreasing in x for all k, and Σ_k fk(x) converges for any x ∈ [a, b], then (Σ_k fk(x))′ = Σ_k fk′(x) a.e. [a, b].

Proof. Since fk is non-decreasing, fk′ exists and fk′ ≥ 0 a.e. [a, b] for all k. Denote sk(x) = Σ_{j=1}^k fj(x) and rk(x) = Σ_{j=k+1}^∞ fj(x). Then both sk and rk are non-decreasing and hence have derivatives a.e. [a, b]. Note that
\[ \Big( \sum_{j=1}^{\infty} f_j \Big)' = (s_k + r_k)' = s_k' + r_k' = \sum_{j=1}^{k} f_j' + r_k'. \]
Hence it suffices to show that rk′ → 0 a.e. as k → ∞. Note that rk′ = f′_{k+1} + r′_{k+1} ≥ r′_{k+1} ≥ 0 a.e. Hence rk′ ↓ ϕ for some ϕ ≥ 0 a.e. [a, b]. Then
\[ 0 \le \int_a^b \phi = \int_a^b \lim_{k\to\infty} r_k' \le \liminf_{k\to\infty} \int_a^b r_k' \le \liminf_{k\to\infty} \big( r_k(b) - r_k(a) \big) = 0, \]
where we used Theorem 5.23 (Lebesgue) to obtain the last inequality and the fact that rk(x) = Σ_{j=k+1}^∞ fj(x) → 0 as k → ∞ for every x to obtain the last equality. Hence ϕ = 0 a.e. [a, b].
Example 5.25. Consider {rk} = [0, 1] ∩ Q. Define
\[
f_k(x) = \begin{cases}
0, & 0 \le x < r_k, \\
\dfrac{1}{2^k}, & r_k \le x \le 1,
\end{cases}
\]
and s(x) = Σ_{k=1}^∞ fk(x). It is then easy to verify that s(x) < s(y) if x < y, and s′(x) = Σ_k fk′(x) = 0 a.e. [0, 1]. Namely, s is strictly increasing but s′ = 0 a.e.

5.4 Functions of bounded variation
Definition 5.26 (Functions of bounded variation). Suppose f : [a, b] → R and there is a partition ∆ : a = x0 < x1 < · · · < xn = b of [a, b]. Then the variation of f by partition ∆ is defined by
\[ V(f, [a,b], \Delta) = \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|. \]

The total variation of f on [a, b] is defined by

TV(f, [a, b]) = sup{V(f, [a, b], ∆) : ∆ is a partition of [a, b]}

and f is called a function of bounded variation if TV(f, [a, b]) < ∞. The set
of functions of bounded variation is denoted by BV([a, b]). (We simply denote
TV(f ) if the interval [a, b] is clear from the context.)
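The dependence of V(f, [a, b], ∆) on the partition, and hence the need for a supremum in the definition of TV, can be seen in a small computation. The Python sketch below is illustrative only; the function names and the test function (x − 1/2)^2 on [0, 1] are assumptions chosen so that the total variation, 1/2, is known in closed form.

```python
def variation(f, partition):
    """V(f, [a, b], partition): sum of |f(x_i) - f(x_{i-1})| over consecutive points."""
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(partition, partition[1:]))


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def parabola(x):
    """Decreasing on [0, 1/2] and increasing on [1/2, 1]; total variation 1/2."""
    return (x - 0.5) ** 2


coarse = variation(parabola, uniform_partition(0.0, 1.0, 1))    # only the endpoints
fine = variation(parabola, uniform_partition(0.0, 1.0, 100))    # contains the turning point
print(coarse, fine)
```

The coarse partition sees no variation at all because the endpoint values agree, while any partition containing the turning point 1/2 already attains the supremum, since the function is monotone on each side of it.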
Example 5.27. If f : [a, b] → R is monotone, then f ∈ BV([a, b]).
Proof. WLOG, assume f is non-decreasing. Then for any partition ∆, there is
\[ V(f, [a,b], \Delta) = \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| = f(b) - f(a) < \infty. \]
Hence TV(f) = f(b) − f(a) < ∞, and f ∈ BV([a, b]).


Example 5.28. If f : [a, b] → R is differentiable and |f′(x)| ≤ M for all x, then f ∈ BV([a, b]).
Proof. For any partition ∆, the mean value theorem gives |f(xi) − f(xi−1)| ≤ M(xi − xi−1), so
\[ V(f, [a,b], \Delta) = \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| \le M \sum_{i=1}^{n} (x_i - x_{i-1}) = M(b-a). \]
Hence TV(f) ≤ M(b − a) < ∞.


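Example 5.29. Suppose f : [0, 1] → R is defined by f(x) = x sin(π/x) if 0 < x ≤ 1 and 0 if x = 0. Then f ∉ BV([0, 1]).
Proof. Consider the partition ∆k : 0 < 2/(2k−1) < 2/(2k−3) < · · · < 2/3 < 1. Since f(2/(2j−1)) = ±2/(2j−1) with alternating signs and f(0) = f(1) = 0, we have
\[ V(f, [0,1], \Delta_k) = \frac{2}{2k-1} + \Big( \frac{2}{2k-1} + \frac{2}{2k-3} \Big) + \cdots + \Big( \frac{2}{5} + \frac{2}{3} \Big) + \frac{2}{3} = 2 \sum_{j=2}^{k} \frac{2}{2j-1} \to \infty \]
as k → ∞. Hence TV(f) = ∞.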
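The divergence in Example 5.29 is slow (harmonic), which a short computation makes visible. The Python sketch below evaluates the variation of f(x) = x sin(π/x) along the partitions ∆k used in the proof; the helper names are illustrative assumptions.

```python
import math


def f(x):
    """f(x) = x*sin(pi/x) for x > 0 and f(0) = 0."""
    return x * math.sin(math.pi / x) if x > 0 else 0.0


def partition(k):
    """The partition 0 < 2/(2k-1) < 2/(2k-3) < ... < 2/3 < 1 of [0, 1]."""
    return [0.0] + [2.0 / (2 * j - 1) for j in range(k, 1, -1)] + [1.0]


def variation(func, points):
    """Sum of |func(x_i) - func(x_{i-1})| over consecutive partition points."""
    return sum(abs(func(x1) - func(x0)) for x0, x1 in zip(points, points[1:]))


values = [variation(f, partition(k)) for k in (10, 100, 1000, 10000)]
for k, v in zip((10, 100, 1000, 10000), values):
    print(k, v)
```

Each tenfold increase in k adds roughly 2 ln 10 to the variation, consistent with the harmonic-type sum 2 Σ 2/(2j − 1) taken over the interior partition points.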
Theorem 5.30. The following statements hold:
1. If f ∈ BV([a, b]) then f is bounded.
2. BV([a, b]) is a linear space.

3. TV(f, [a, b]) = TV(f, [a, c]) + TV(f, [c, b]) for any c ∈ [a, b].
4. If f ∈ BV([a, b]), then |f | ∈ BV([a, b]).
5. If f, g ∈ BV([a, b]), then max{f, g} ∈ BV([a, b]).
Proof. Items 1, 2, and 4 are trivial to prove. For item 5, note that max{f, g} = (f + g)/2 + |f − g|/2 and hence it follows from items 2 and 4.
For item 3, consider any partition ∆ of [a, b], then ∆′ = ∆ ∪ {c} is also a
partition of [a, b]. Moreover,

V(f, [a, b], ∆) ≤ V(f, [a, b], ∆′ )


= V(f, [a, c], ∆′ ∩ [a, c]) + V(f, [c, b], ∆′ ∩ [c, b])
≤ TV(f, [a, c]) + TV(f, [c, b]).

Hence TV(f, [a, b]) ≤ TV(f, [a, c]) + TV(f, [c, b]).
On the other hand, for any ϵ > 0, there exist partitions ∆1 of [a, c] and ∆2 of [c, b], such that
\[ TV(f, [a,c]) - \frac{\epsilon}{2} < V(f, [a,c], \Delta_1), \qquad TV(f, [c,b]) - \frac{\epsilon}{2} < V(f, [c,b], \Delta_2). \]
Note that ∆ = ∆1 ∪ ∆2 is a partition of [a, b]. Hence

TV(f, [a, c]) + TV(f, [c, b]) − ϵ < V(f, [a, c], ∆1 ) + V(f, [c, b], ∆2 )
= V(f, [a, b], ∆)
≤ TV(f, [a, b])

As ϵ is arbitrary, we know TV(f, [a, c]) + TV(f, [c, b]) ≤ TV(f, [a, b]).
For a partition ∆ : a = x0 < x1 < · · · < xn = b of the interval [a, b], we can obtain a set of n + 1 points: (x0, f(x0)), . . . , (xn, f(xn)) in R^2. We connect these n + 1 points using straight line segments, and sum the lengths of these line segments to obtain the total length:
\[ l_\Delta(f) = \sum_{i=1}^{n} \Big( (x_i - x_{i-1})^2 + (f(x_i) - f(x_{i-1}))^2 \Big)^{1/2}. \]
i=1

Then we can take the supremum of l∆ (f ) over all partitions:

l(f ) = sup{l∆ (f ) : ∆ is a partition of [a, b]}.

The following theorem reveals the relation between TV(f ) and l(f ):
Theorem 5.31. Suppose f : [a, b] → R. Then TV(f) < ∞ iff l(f) < ∞.
Proof. For any partition ∆, there is l∆(f) ≤ Σ_{i=1}^n (|xi − xi−1| + |f(xi) − f(xi−1)|) (because (u^2 + v^2)^{1/2} ≤ u + v for any u, v ≥ 0). Hence l∆(f) ≤ (b − a) + Σ_{i=1}^n |f(xi) − f(xi−1)|. Therefore V(f, [a, b], ∆) ≤ l∆(f) ≤ (b − a) + V(f, [a, b], ∆). As ∆ is arbitrary, we know TV(f, [a, b]) ≤ l(f) ≤ (b − a) + TV(f, [a, b]), which implies that TV(f) < ∞ iff l(f) < ∞.

There is a very elegant characterization of functions of bounded variation:
they can always be written as the differences of two non-decreasing functions,
as shown in the following theorem.
Theorem 5.32 (Jordan). Suppose f : [a, b] → R. Then f ∈ BV([a, b]) iff there
exist two non-decreasing functions g, h : [a, b] → R such that f = g − h.

Proof. First we show the necessity. Suppose f ∈ BV([a, b]). Denote Tf(x) = TV(f, [a, x]) for any x ∈ [a, b], which is well defined since TV(f) < ∞. Then define g(x) = (1/2)(Tf(x) + f(x)) and h(x) = (1/2)(Tf(x) − f(x)). We can show that both g and h are non-decreasing: for x < y, there is
\[
\begin{aligned}
g(y) - g(x) &= \frac{1}{2} \big( T_f(y) + f(y) \big) - \frac{1}{2} \big( T_f(x) + f(x) \big) \\
&= \frac{1}{2} TV(f, [x,y]) + \frac{1}{2} \big( f(y) - f(x) \big) \\
&\ge \frac{1}{2} V(f, [x,y], \Delta) - \frac{1}{2} |f(y) - f(x)| \ge 0,
\end{aligned}
\]
where ∆ : x = x0 < x1 < · · · < xn = y is a partition of [x, y], the second equality uses item 3 of Theorem 5.30, and the last inequality uses V(f, [x, y], ∆) ≥ |f(y) − f(x)| (triangle inequality). Similarly h is non-decreasing, and obviously f = g − h.
Now we show the sufficiency. If g, h are non-decreasing, then g, h ∈ BV([a, b]).
Hence f = g − h ∈ BV([a, b]) as BV([a, b]) is a linear space.
From Theorem 5.23, we know both g and h are differentiable a.e. since they are monotone. Hence f = g − h is differentiable a.e. Therefore f is differentiable a.e. if f ∈ BV([a, b]).
Lemma 5.33. Suppose f ∈ L([a, b]). Define Fh(x) = (1/h) ∫_x^{x+h} f(t) dt (assume f(x) = f(a) if x < a and f(x) = f(b) if x > b). Then lim_{h→0} ∫_a^b |Fh(x) − f(x)| dx = 0.

Proof. Since f ∈ L([a, b]), we know for any ϵ > 0, there exists δ > 0 such that ∫_a^b |f(x + h) − f(x)| dx < ϵ for any h with |h| < δ. Note that Fh(x) − f(x) = (1/h) ∫_x^{x+h} (f(t) − f(x)) dt. Therefore, for any 0 < h < δ (the case h < 0 is similar), there is
\[
\begin{aligned}
\int_a^b |F_h(x) - f(x)| \, dx &\le \int_a^b \frac{1}{h} \int_x^{x+h} |f(t) - f(x)| \, dt \, dx \\
&= \int_a^b \frac{1}{h} \int_0^h |f(x+t) - f(x)| \, dt \, dx \\
&= \int_0^h \frac{1}{h} \int_a^b |f(x+t) - f(x)| \, dx \, dt \\
&\le \int_0^h \frac{1}{h} \cdot \epsilon \, dt = \epsilon,
\end{aligned}
\]
where we applied Theorem 4.51 (Tonelli) to obtain the second equality.

Now we have the main theorem of this subsection.
Theorem 5.34. Let f ∈ L([a, b]). Define F(x) = ∫_a^x f(t) dt. Then F′(x) = f(x) a.e. [a, b].
Proof. Note that F is the difference of the two non-decreasing functions ∫_a^x f^+ and ∫_a^x f^−, so by Theorem 5.23, F′(x) = lim_{h→0} Fh(x) exists a.e. [a, b]. Hence
\[ \int_a^b |f(x) - F'(x)| \, dx = \int_a^b \lim_{h \to 0} |f(x) - F_h(x)| \, dx \le \liminf_{h \to 0} \int_a^b |f(x) - F_h(x)| \, dx = 0, \]
where we used Lemma 4.15 (Fatou) and Lemma 5.33. This implies that F′(x) = f(x) a.e. [a, b].
Corollary 5.35. Suppose f ∈ L([a, b]). Then lim_{h→0} (1/h) ∫_0^h |f(x + t) − f(x)| dt = 0 a.e. [a, b].
Proof. For any r ∈ Q, we know |f(x) − r| ∈ L([a, b]). Hence, for almost every x ∈ [a, b], there is
\[ \lim_{h \to 0} \frac{1}{h} \int_0^h |f(x+t) - r| \, dt = |f(x) - r| \]
by Theorem 5.34 applied to |f − r|. Denote Zr = {x : lim_{h→0} (1/h) ∫_0^h |f(x + t) − r| dt ≠ |f(x) − r|}. Then µ(Zr) = 0. Let Z = (∪_{r∈Q} Zr) ∪ {x : f(x) = ±∞}, there is also µ(Z) = 0.
For any x ∉ Z (i.e., lim_{h→0} (1/h) ∫_0^h |f(x + t) − r| dt = |f(x) − r| for all r ∈ Q and |f(x)| < ∞) and ϵ > 0, there exist r ∈ Q and δ > 0 such that |f(x) − r| < ϵ/3 and |(1/h) ∫_0^h |f(x + t) − r| dt − |f(x) − r|| < ϵ/3 for all h with |h| < δ. Hence
\[
\begin{aligned}
\frac{1}{h} \int_0^h |f(x+t) - f(x)| \, dt &\le \frac{1}{h} \int_0^h |f(x+t) - r| \, dt - |f(x) - r| + 2 |f(x) - r| \\
&\le \Big| \frac{1}{h} \int_0^h |f(x+t) - r| \, dt - |f(x) - r| \Big| + 2 |f(x) - r| \\
&< \frac{\epsilon}{3} + 2 \cdot \frac{\epsilon}{3} = \epsilon.
\end{aligned}
\]
Therefore lim_{h→0} (1/h) ∫_0^h |f(x + t) − f(x)| dt = 0 on Z^c.
Remarks. We call x a Lebesgue point of f if x satisfies lim_{h→0} (1/h) ∫_0^h |f(x + t) − f(x)| dt = 0. The corollary above says that almost every point of [a, b] is a Lebesgue point of f if f ∈ L([a, b]). Note that the corollary can also be proved by invoking Lemma 4.15 (Fatou) on G(h, x) := (1/h) ∫_0^h |f(x + t) − f(x)| dt.
Example 5.36. Suppose f ∈ L(R). For [a, b], if lim_{h→0} (1/h) ∫_a^b |f(x + h) − f(x)| dx = 0, then there exists a constant c such that f(x) = c a.e. [a, b].
Proof. Consider any two Lebesgue points x1, x2 of f in (a, b) where x1 < x2. Then
\[ \Big| \frac{1}{h} \int_{x_1}^{x_2} \big( f(x+h) - f(x) \big) \, dx \Big| \le \frac{1}{h} \int_{x_1}^{x_2} |f(x+h) - f(x)| \, dx \le \frac{1}{h} \int_a^b |f(x+h) - f(x)| \, dx \to 0 \]
as h → 0. On the other hand,
\[
\begin{aligned}
\Big| \frac{1}{h} \int_{x_1}^{x_2} \big( f(x+h) - f(x) \big) \, dx \Big| &= \Big| \frac{1}{h} \int_{x_1+h}^{x_2+h} f(x) \, dx - \frac{1}{h} \int_{x_1}^{x_2} f(x) \, dx \Big| \\
&= \Big| \frac{1}{h} \Big( \int_{x_2}^{x_2+h} f(x) \, dx - \int_{x_1}^{x_1+h} f(x) \, dx \Big) \Big| \\
&\to |f(x_2) - f(x_1)|
\end{aligned}
\]
as h → 0. Hence f(x2) = f(x1) = c. Since almost every point of [a, b] is a Lebesgue point of f, we know f(x) = c a.e. [a, b].
Example 5.37. Let f ∈ L([a, b]) and F(x) = ∫_a^x f(t) dt. Then F ∈ BV([a, b]) and TV(F) ≤ ∫_a^b |f(x)| dx.

Proof. For any partition ∆ : a = x0 < x1 < · · · < xn = b, there is
\[ \sum_{i=1}^{n} |F(x_i) - F(x_{i-1})| = \sum_{i=1}^{n} \Big| \int_{x_{i-1}}^{x_i} f(t) \, dt \Big| \le \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} |f(t)| \, dt = \int_a^b |f(t)| \, dt. \]
Therefore V(F, [a, b], ∆) ≤ ∫_a^b |f(x)| dx. Hence, TV(F) ≤ ∫_a^b |f| dx.

5.5 Absolute continuity


We would like to ask the following question: suppose f : [a, b] → R, then in what case does there exist a function g such that f(x) − f(a) = ∫_a^x g(t) dt for a.e. x in [a, b]? We have shown before that if such g exists, then f is bounded, has bounded variation, and is continuous. But is the converse true?

Example 5.38. The Cantor function ϕ is continuous and satisfies ϕ′(x) = 0 a.e., but ϕ(0) = 0 and ϕ(1) = 1, so ∫_0^1 ϕ′ = 0 ≠ ϕ(1) − ϕ(0).
So we need stronger condition than continuity. This is called the absolute
continuity.
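The Cantor function of Example 5.38 can be evaluated from the ternary expansion of x, which makes its flat pieces easy to inspect. The following Python sketch is one possible implementation under stated assumptions: the name `cantor`, the truncation parameter `depth`, and the use of exact rational arithmetic are choices made here, not taken from the notes.

```python
from fractions import Fraction


def cantor(x, depth=40):
    """Approximate the Cantor function at x in [0, 1] from `depth` ternary digits.

    Read the ternary digits of x; stop at the first digit 1 (x then lies in a
    removed middle third, where the function is constant); otherwise map each
    digit 2 to a binary digit 1.
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value = Fraction(0)
    scale = Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = x.numerator // x.denominator   # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value


samples = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(7, 9), Fraction(1)]
for x in samples:
    print(x, cantor(x))
```

The function takes the same value on all of [1/3, 2/3], and likewise on every removed middle third; those intervals have total length 1, which is why the derivative vanishes a.e. even though the function climbs from 0 to 1.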

Lemma 5.39. Suppose f : [a, b] → R, and f′ = 0 a.e. If f is not constant, then there exists ϵ0 > 0 such that for any δ > 0 there exist a finite number of mutually disjoint intervals (v0, u0), . . . , (vp, up) satisfying Σ_{i=0}^p |ui − vi| < δ and Σ_{i=0}^p |f(ui) − f(vi)| ≥ ϵ0.

Proof. Suppose c ∈ (a, b] is such that f(c) ≠ f(a). Then choose ϵ0 ∈ (0, |f(c) − f(a)|/2) and r ∈ (0, ϵ0/(b − a)). Define the set Ec = {x ∈ (a, c) : f′(x) = 0} and the collection of closed intervals
\[ \mathcal{F} = \{ [x, x+h] \subset (a, c) : |f(x+h) - f(x)| < rh \text{ for some } h > 0 \}. \]
Hence F is a Vitali cover of Ec. Then for any δ > 0 there exist mutually disjoint intervals [x1, x1 + h1], . . . , [xp, xp + hp] in F such that µ(Ec \ ∪_{i=1}^p [xi, xi + hi]) < δ. WLOG, assume a = x0 < x1 < x1 + h1 < · · · < xp < xp + hp < x_{p+1} = c. Note that by letting ui = x_{i+1} and vi = xi + hi (h0 = 0, u0 − v0 = x1 − x0), we have up = x_{p+1} and vp = xp + hp, and Σ_{i=0}^p |ui − vi| < δ (since µ((a, c) \ Ec) = 0).
On the other hand, there is
\[
\begin{aligned}
2\epsilon_0 < |f(c) - f(a)| &\le \sum_{i=0}^{p} |f(u_i) - f(v_i)| + \sum_{i=1}^{p} |f(x_i + h_i) - f(x_i)| \\
&\le \sum_{i=0}^{p} |f(u_i) - f(v_i)| + r \sum_{i=1}^{p} h_i \le \sum_{i=0}^{p} |f(u_i) - f(v_i)| + r(b-a).
\end{aligned}
\]
Since r(b − a) < ϵ0, we know Σ_{i=0}^p |f(ui) − f(vi)| ≥ ϵ0.
Definition 5.40. f : [a, b] → R is absolutely continuous if for any ϵ > 0, there exists δ > 0, such that for any mutually disjoint intervals (xi, yi), i = 1, . . . , p, satisfying Σ_{i=1}^p |yi − xi| < δ, there is Σ_{i=1}^p |f(yi) − f(xi)| < ϵ. The set of absolutely continuous functions is denoted by AC([a, b]).
Theorem 5.41. The following statements hold:
1. If f ∈ AC([a, b]) then f is continuous.
2. AC([a, b]) is a linear space.
Example 5.42. If f is Lipschitz continuous then f ∈ AC([a, b]).
Proof. Let M > 0 be a Lipschitz constant of f. For any ϵ > 0, take δ = ϵ/M. Then Σ_{i=1}^p |f(yi) − f(xi)| ≤ M Σ_{i=1}^p |yi − xi| < Mδ = ϵ.
Theorem 5.43. Suppose f ∈ L([a, b]), then F(x) = ∫_a^x f(t) dt ∈ AC([a, b]).
Proof. Since f ∈ L([a, b]), we know for any ϵ > 0, there exists δ > 0 such that ∫_E |f| < ϵ for any E ⊂ [a, b] satisfying µ(E) < δ. For any disjoint intervals {(xi, yi) : i = 1, . . . , p}, if Σ_{i=1}^p |yi − xi| < δ, then µ(E) < δ where E = ∪_{i=1}^p [xi, yi]. This implies
\[ \sum_{i=1}^{p} |F(y_i) - F(x_i)| \le \sum_{i=1}^{p} \int_{x_i}^{y_i} |f(x)| \, dx = \int_E |f(x)| \, dx < \epsilon, \]
which completes the proof.


Theorem 5.44. If f ∈ AC([a, b]) then f ∈ BV([a, b]).
Proof. Let ϵ = 1, then there exists δ > 0 such that Σ_{i=1}^p |f(yi) − f(xi)| < 1 for any mutually disjoint intervals {(xi, yi) : 1 ≤ i ≤ p} satisfying Σ_{i=1}^p |yi − xi| < δ. Consider a partition ∆ : a = x0 < x1 < · · · < xn = b where |xi − xi−1| < δ for all i. Any partition of [xi−1, xi] consists of disjoint intervals of total length less than δ, so TV(f, [xi−1, xi]) ≤ 1. Hence TV(f, [a, b]) = Σ_{i=1}^n TV(f, [xi−1, xi]) ≤ n < ∞.
Corollary 5.45. If f ∈ AC([a, b]), then f is differentiable a.e. [a, b] and f ′ ∈
L([a, b]).
Theorem 5.46 (Fundamental theorem of calculus). If f ∈ AC([a, b]), then f(x) − f(a) = ∫_a^x f′(t) dt for any x ∈ [a, b].
Proof. If f ∈ AC([a, b]), then f′ exists a.e. [a, b] and f′ ∈ L([a, b]) by Corollary 5.45. Define g(x) = ∫_a^x f′(t) dt, then g ∈ AC([a, b]) by Theorem 5.43 and g′ = f′ a.e. by Theorem 5.34. Since f − g ∈ AC and f′ − g′ = 0 a.e., we know f − g ≡ c for some constant c (otherwise f − g is not absolutely continuous due to Lemma 5.39, a contradiction). Hence c = f(a) − g(a) = f(a) as g(a) = 0, which implies that f(x) = f(a) + g(x) = f(a) + ∫_a^x f′(t) dt.
Remarks. The results above can be summarized as follows: f ∈ AC([a, b]) iff there exists g ∈ L([a, b]) such that f(x) = f(a) + ∫_a^x g(t) dt for all x ∈ [a, b]. In this case, f′ = g a.e. [a, b].
Example 5.47. Suppose gk ∈ AC([a, b]) for all k. If there exists c ∈ [a, b] such that Σ_k gk(c) converges and Σ_k ∫_a^b |gk′(x)| dx < ∞, then Σ_k gk(x) exists for all x. Let g(x) = Σ_k gk(x), then g ∈ AC([a, b]) and g′(x) = Σ_k gk′(x) a.e. [a, b].
Proof. Since Σ_{k=1}^∞ ∫_a^b |gk′(x)| dx < ∞, we know by Corollary 4.37 that h(x) = Σ_{k=1}^∞ gk′(x) exists a.e., h ∈ L([a, b]), and Σ_{k=1}^∞ ∫_c^x gk′(t) dt = ∫_c^x h(t) dt. Since gk ∈ AC([a, b]), we know gk(x) = gk(c) + ∫_c^x gk′(t) dt for all x. This implies that
\[ \sum_{k=1}^{n} g_k(x) = \sum_{k=1}^{n} g_k(c) + \sum_{k=1}^{n} \int_c^x g_k'(t) \, dt \to \sum_{k=1}^{\infty} g_k(c) + \int_c^x h(t) \, dt \]
as n → ∞ for all x. Therefore g(x) = Σ_{k=1}^∞ gk(x) = Σ_{k=1}^∞ gk(c) + ∫_c^x h(t) dt exists and g ∈ AC([a, b]). Moreover g′(x) = h(x) = Σ_{k=1}^∞ gk′(x) a.e. [a, b].
Example 5.48. Composition of absolutely continuous functions is not necessarily absolutely continuous. For example, let f(y) = y^{1/3} for y ∈ [−1, 1], and
\[
g(x) = \begin{cases}
x^3 \cos^3(\pi/x), & \text{if } x \in (0, 1], \\
0, & \text{if } x = 0.
\end{cases}
\]
Then both f and g are absolutely continuous (g is Lipschitz continuous as |g′| is bounded, and f is the indefinite integral of the integrable function (1/3) y^{−2/3}, so Theorem 5.43 applies), but (f ◦ g)(x) = x cos(π/x) is not.

Example 5.49. Absolute continuity is not closed under uniform convergence. Consider the functions
\[
f_k(x) = \begin{cases}
0, & \text{if } 0 \le x \le \tfrac{1}{k}, \\
x \sin(\pi/x), & \text{if } \tfrac{1}{k} < x \le 1,
\end{cases}
\]
which are absolutely continuous. Then fk ⇒ f := x sin(π/x). Hence f is uniformly continuous, but not of bounded variation (Example 5.29), hence not absolutely continuous.

6 Lp Spaces
6.1 Important inequalities
Definition 6.1. Let E ∈ M. If p ∈ (0, ∞), then the Lp norm of f on E is defined by
\[ \|f\|_p = \Big( \int_E |f|^p \Big)^{1/p}. \]
Lp(E) = {f : ∥f∥p < ∞} is called the Lp space. If p = ∞, the L∞ norm (also called the essential supremum) of f on E is defined by
\[ \|f\|_\infty = \inf\{ M \in \mathbb{R} : |f| \le M \text{ a.e. } E \} \]
and L∞(E) = {f : ∥f∥∞ < ∞} is the L∞ space.


Theorem 6.2. If µ(E) < ∞, then limp→∞ ∥f ∥p = ∥f ∥∞ .
Proof. Denote M = ∥f∥∞ and assume 0 < M < ∞ (the case M = 0 is trivial). First we have that
\[ \|f\|_p = \Big( \int_E |f|^p \Big)^{1/p} \le \Big( \int_E M^p \Big)^{1/p} = M (\mu(E))^{1/p} \to M \]
as p → ∞. Hence lim sup_{p→∞} ∥f∥p ≤ M.
On the other hand, for any ϵ ∈ (0, M), let A = {x ∈ E : |f(x)| > M − ϵ}. Then µ(A) > 0 (by definition of M). Hence
\[ \|f\|_p = \Big( \int_E |f|^p \Big)^{1/p} \ge \Big( \int_A |f|^p \Big)^{1/p} \ge (M - \epsilon)(\mu(A))^{1/p} \to M - \epsilon \]
as p → ∞. Hence lim inf_{p→∞} ∥f∥p ≥ M − ϵ. As ϵ is arbitrary, we have lim_{p→∞} ∥f∥p = ∥f∥∞.
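For a concrete instance of Theorem 6.2, take f(x) = x on E = [0, 1], where ∥f∥p = (1/(p + 1))^{1/p} in closed form and ∥f∥∞ = 1. The Python sketch below evaluates that closed form for growing p; the choice of f and the function name are assumptions made for illustration.

```python
def lp_norm_of_identity(p):
    """L^p norm of f(x) = x on [0, 1]: (integral of x**p over [0, 1]) ** (1/p)."""
    return (1.0 / (p + 1.0)) ** (1.0 / p)


exponents = (1, 2, 10, 100, 1000)
norms = [lp_norm_of_identity(p) for p in exponents]
for p, value in zip(exponents, norms):
    print(p, value)
```

The values increase toward 1 but never reach it, matching the two-sided squeeze in the proof: the upper bound M µ(E)^{1/p} equals 1 here, and the lower bound (M − ϵ) µ(A)^{1/p} creeps up as p grows.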
Theorem 6.3 (Lp space is linear). Let p ∈ (0, ∞], and f, g ∈ Lp (E), then
αf + βg ∈ Lp (E) for any α, β ∈ R.
Proof. Note that for any $u, v \ge 0$, there is
\[ (u + v)^p \le (2 \max(u, v))^p = 2^p \max(u^p, v^p) \le 2^p (u^p + v^p). \]
If $p \in (0, \infty)$, then there is
\[ |\alpha f + \beta g|^p \le 2^p (|\alpha|^p |f|^p + |\beta|^p |g|^p), \]
and integrating both sides shows $\alpha f + \beta g \in L^p$. If $p = \infty$, then
$|\alpha f + \beta g| \le |\alpha| \|f\|_\infty + |\beta| \|g\|_\infty$ a.e.
We only consider the case p ∈ [1, ∞] hereafter unless otherwise noted.
Definition 6.4 (Conjugate). Two numbers $p, q \in [1, \infty]$ are called conjugate if
$\frac{1}{p} + \frac{1}{q} = 1$, with the convention $\frac{1}{\infty} = 0$.
Note that $q = \frac{p}{p-1}$. If $p = 2$, then $q = 2$. If $p = 1$, then $q = \infty$.

Theorem 6.5 (Young's inequality). Let $p, q \in (1, \infty)$ be conjugate. For any $u, v \ge 0$,
there is $uv \le \frac{u^p}{p} + \frac{v^q}{q}$.
Proof. If either u or v is zero, then the claim is trivial. Now suppose both are nonzero. Note
that $e^x$ is convex, and $\frac{1}{p} + \frac{1}{q} = 1$, therefore
\[ uv = e^{\frac{1}{p} \log(u^p) + \frac{1}{q} \log(v^q)} \le \frac{1}{p} e^{\log(u^p)} + \frac{1}{q} e^{\log(v^q)} = \frac{u^p}{p} + \frac{v^q}{q}, \]
which completes the proof.
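As a quick sanity check of Young's inequality, the following sketch evaluates the gap $u^p/p + v^q/q - uv$ on a small grid of sample values. The grid and the helper name `young_gap` are arbitrary choices made for illustration, not part of the notes.

```python
import itertools


def young_gap(u: float, v: float, p: float) -> float:
    """Return u^p/p + v^q/q - u*v, where q = p/(p-1) is the conjugate of p."""
    q = p / (p - 1.0)
    return u ** p / p + v ** q / q - u * v


# Sample values of u, v >= 0 and exponents p in (1, infinity), chosen arbitrarily.
u_values = (0.0, 0.5, 1.0, 3.0)
v_values = (0.0, 0.2, 2.0)
p_values = (1.5, 2.0, 4.0)

gaps = [young_gap(u, v, p)
        for u, v, p in itertools.product(u_values, v_values, p_values)]

# Every gap should be nonnegative up to floating-point rounding.
print(min(gaps))
```

The gap vanishes exactly when $u^p = v^q$, for instance at $u = v = 2$ with $p = q = 2$, which matches the equality case of the convexity step in the proof.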
Theorem 6.6 (Hölder's inequality). Let $p \in [1, \infty]$ and let q be its conjugate.
If $f \in L^p(E)$ and $g \in L^q(E)$, then $\|fg\|_1 \le \|f\|_p \|g\|_q$.
Proof. It is trivial if p or q is $\infty$. Now suppose $p, q \in (1, \infty)$ and $\|f\|_p, \|g\|_q \ne 0$.
Then
\[ \int_E \frac{|f|}{\|f\|_p} \cdot \frac{|g|}{\|g\|_q} \le \int_E \Big( \frac{1}{p} \frac{|f|^p}{\|f\|_p^p} + \frac{1}{q} \frac{|g|^q}{\|g\|_q^q} \Big) = \frac{1}{p} + \frac{1}{q} = 1, \]
where we used Young's inequality (Theorem 6.5) pointwise. Multiplying both sides by the
constant $\|f\|_p \|g\|_q$ yields the inequality.
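Hölder's inequality holds for any measure, in particular for the discrete measure that puts mass $1/n$ on each of n grid points in $[0, 1]$. The sketch below checks it for that measure; the two test functions and all names are assumptions made for illustration.

```python
import math


def lp_norm(values, p, weight):
    """Discrete L^p norm for a measure giving each sample point mass `weight`."""
    return (sum(abs(v) ** p for v in values) * weight) ** (1.0 / p)


n = 1000
xs = [(i + 0.5) / n for i in range(n)]        # midpoints of a uniform grid on [0, 1]
f = [math.sin(7.0 * x) + 0.3 for x in xs]     # hypothetical test function f
g = [x * x - 0.5 for x in xs]                 # hypothetical test function g


def holder_sides(p):
    """Return (||f g||_1, ||f||_p ||g||_q) for the discrete measure above."""
    q = p / (p - 1.0)
    lhs = sum(abs(a * b) for a, b in zip(f, g)) / n
    rhs = lp_norm(f, p, 1.0 / n) * lp_norm(g, q, 1.0 / n)
    return lhs, rhs


for p in (1.5, 2.0, 3.0, 10.0):
    lhs, rhs = holder_sides(p)
    print(f"p = {p:4.1f}   ||fg||_1 = {lhs:.6f} <= {rhs:.6f}")
```

Because the discrete sums are themselves integrals against a genuine measure, the inequality holds exactly here (up to rounding), not just approximately.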
Corollary 6.7 (Schwarz inequality). If f, g ∈ L2 (E), then ∥f g∥1 ≤ ∥f ∥2 ∥g∥2 .
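Theorem 6.8. If $\mu(E) < \infty$ and $0 < p_1 < p_2 \le \infty$, then $L^{p_2}(E) \subset L^{p_1}(E)$ and
\[ \|f\|_{p_1} \le (\mu(E))^{\frac{1}{p_1} - \frac{1}{p_2}} \|f\|_{p_2}. \]
Proof. The proof is trivial if $p_2 = \infty$. Now suppose $0 < p_1 < p_2 < \infty$. Then
\[ \|f\|_{p_1} = \Big( \int_E |f|^{p_1} \Big)^{1/p_1} \le \Big[ \Big( \int_E |f|^{p_1 r} \Big)^{1/r} \Big( \int_E 1^s \Big)^{1/s} \Big]^{1/p_1}, \]
where $r, s > 1$ are conjugate. By choosing $r = \frac{p_2}{p_1} > 1$ and its conjugate
$s = \frac{r}{r-1} = \frac{p_2}{p_2 - p_1}$, we obtain the claimed inequality.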

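Example 6.9. Suppose $f \in L^r \cap L^s$ where $0 < r < p < s < \infty$. Let $\lambda \in (0, 1)$
be such that $\frac{1}{p} = \frac{\lambda}{r} + \frac{1-\lambda}{s}$. Then $\|f\|_p \le \|f\|_r^{\lambda} \|f\|_s^{1-\lambda}$.
Proof. Note that $\frac{r}{\lambda p}$ and $\frac{s}{(1-\lambda)p}$ are conjugate. Hence
\[ \begin{aligned} \|f\|_p^p = \int |f|^p &= \int |f|^{\lambda p} |f|^{(1-\lambda)p} \\ &\le \Big( \int |f|^{\lambda p \cdot \frac{r}{\lambda p}} \Big)^{\frac{\lambda p}{r}} \Big( \int |f|^{(1-\lambda)p \cdot \frac{s}{(1-\lambda)p}} \Big)^{\frac{(1-\lambda)p}{s}} \\ &= \|f\|_r^{\lambda p} \|f\|_s^{(1-\lambda)p}. \end{aligned} \]
Taking the p-th root on both sides completes the proof.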

Example 6.10. Let $0 < r < p < s < \infty$ and $f \in L^p(E)$. Then for any $t > 0$,
there exist g, h such that $f = g + h$, $\|g\|_r^r \le t^{r-p} \|f\|_p^p$ and $\|h\|_s^s \le t^{s-p} \|f\|_p^p$.
Proof. For any x, define $g(x) = f(x)$ if $|f(x)| > t$ and $g(x) = 0$ otherwise. Let
$h = f - g$. Then, by $r - p < 0$, there is
\[ \|g\|_r^r = \int_E |g|^r = \int_{\{|f| > t\}} |g|^{r-p} |g|^p \le t^{r-p} \int_E |f|^p = t^{r-p} \|f\|_p^p. \]
Similarly, using $s - p > 0$ and $|h| \le t$, we can show the inequality for h.


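Example 6.11. Suppose $f_k \in L^2([0, 1])$ for all k, $f_k \xrightarrow{\mu} 0$ in $[0, 1]$, and $\|f_k\|_2 \le 1$.
Show that $\lim_k \int_0^1 |f_k| = 0$.
Proof. For any $\epsilon > 0$, let $E_k(\epsilon) = \{x \in [0, 1] : |f_k(x)| \ge \epsilon\}$. Then $f_k \xrightarrow{\mu} 0$ implies
that $\lim_k \mu(E_k(\epsilon)) = 0$. Hence
\[ \begin{aligned} 0 \le \int_0^1 |f_k| &= \int_{[0,1] \setminus E_k(\epsilon)} |f_k| + \int_{E_k(\epsilon)} |f_k| \\ &\le \epsilon + \int_0^1 \chi_{E_k(\epsilon)} |f_k| \\ &\le \epsilon + (\mu(E_k(\epsilon)))^{1/2} \|f_k\|_2 \to \epsilon \end{aligned} \]
as $k \to \infty$, where we used Hölder's inequality to obtain the last inequality
above and $\|f_k\|_2 \le 1$ to obtain the limit. Hence $0 \le \limsup_k \int_0^1 |f_k| \le \epsilon$. As $\epsilon$
is arbitrary, we know $\lim_k \int_0^1 |f_k| = 0$.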
Theorem 6.12 (Minkowski's inequality). Let $p \in [1, \infty]$. If $f, g \in L^p$, then
$\|f + g\|_p \le \|f\|_p + \|g\|_p$.
Proof. The proof is trivial if $p = 1$ or $p = \infty$. Suppose $p \in (1, \infty)$ and $q = \frac{p}{p-1}$
is its conjugate. WLOG we assume $\|f + g\|_p > 0$. Then
\[ \int |f + g|^p = \int |f + g|^{p-1} |f + g| \le \int |f + g|^{p-1} |f| + \int |f + g|^{p-1} |g|. \]
Now for the first term on the RHS, we have
\[ \int |f + g|^{p-1} |f| \le \Big( \int |f + g|^{(p-1)q} \Big)^{1/q} \Big( \int |f|^p \Big)^{1/p} = \|f + g\|_p^{p-1} \|f\|_p, \]
where we used Hölder's inequality and $(p-1)q = p$. Similarly, there is
$\int |f + g|^{p-1} |g| \le \|f + g\|_p^{p-1} \|g\|_p$. Therefore
\[ \|f + g\|_p^p = \int |f + g|^p \le \|f + g\|_p^{p-1} (\|f\|_p + \|g\|_p). \]
Dividing both sides by $\|f + g\|_p^{p-1}$ yields Minkowski's inequality.

6.2 Lp space
We identify two functions f, g ∈ Lp (E) if f = g a.e. E. Suppose we define
d : Lp (E) × Lp (E) → R by d(f, g) = ∥f − g∥p for any f, g ∈ Lp (E). Then it is
easy to verify that d is a metric: (i) d(f, g) ≥ 0, and d(f, g) = 0 iff f = g a.e. E;
(ii) d(f, g) = d(g, f ); (iii) d(f, g) ≤ d(f, h) + d(h, g) for all f, g, h ∈ Lp (E) by
using Theorem 6.12 (Minkowski).
Definition 6.13. Let p ∈ [1, ∞] and d(f, g) = ∥f − g∥p for any f, g ∈ Lp (E).
Then (Lp (E), d) is a metric space.
Theorem 6.14. If ∥fk − f ∥p → 0, then ∥fk ∥p → ∥f ∥p .

Proof. Note that |∥fk ∥p − ∥f ∥p | ≤ ∥fk − f ∥p by Minkowski’s inequality.


Theorem 6.15 (Lp space is complete). If {fk } is Cauchy in Lp , then there
exists f ∈ Lp such that ∥fk − f ∥p → 0.
Proof. First consider $p \in [1, \infty)$. For any $\epsilon > 0$, denote
$E_{k,j}(\epsilon) = \{x \in E : |f_k(x) - f_j(x)| \ge \epsilon\}$. Since $\|f_k - f_j\|_p \to 0$ as $k, j \to \infty$, there is
\[ \epsilon (\mu(E_{k,j}(\epsilon)))^{1/p} \le \Big( \int_{E_{k,j}(\epsilon)} |f_k - f_j|^p \Big)^{1/p} \le \Big( \int_E |f_k - f_j|^p \Big)^{1/p} \to 0 \]
as $k, j \to \infty$. Hence $\mu(E_{k,j}(\epsilon)) \to 0$ as $k, j \to \infty$. Therefore $\{f_k\}$ is Cauchy in
measure, which implies that there exist a subsequence $\{f_{k_j}\}$ and f such that
$f_{k_j} \to f$ a.e. E as $j \to \infty$. Therefore, by Fatou's lemma,
\[ \int_E |f_k - f|^p = \int_E \lim_{j\to\infty} |f_k - f_{k_j}|^p \le \liminf_{j\to\infty} \int_E |f_k - f_{k_j}|^p. \]
Taking the limit $k \to \infty$ on both sides yields $\|f_k - f\|_p \to 0$. Moreover,
$\|f\|_p \le \|f_k - f\|_p + \|f_k\|_p < \infty$, and hence $f \in L^p(E)$.
Next consider p = ∞. Since ∥fk − fj ∥∞ → 0, there exists Z ⊂ E, such that
µ(Z) = 0 and fk (x) − fj (x) → 0 as k, j → ∞ on E \ Z. Let f (x) = limk fk (x)
for x ∈ E \ Z and arbitrary on Z. For any ϵ > 0, there exists K sufficiently
large, such that

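\[ |f_k(x) - f(x)| = \lim_{j\to\infty} |f_k(x) - f_j(x)| \le \limsup_{j\to\infty} \|f_k - f_j\|_\infty \le \epsilon \]
for all $k \ge K$ and $x \in E \setminus Z$. Hence $\|f_k - f\|_\infty \le \epsilon$. In addition,
$\|f\|_\infty \le \|f_k - f\|_\infty + \|f_k\|_\infty < \infty$, hence $f \in L^\infty(E)$.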

Definition 6.16. A metric space (X, d) is called separable if X contains a
countable dense subset. Namely, X has a countable subset Y, such that for any
$x \in X$ and $\epsilon > 0$, there exists $y \in Y$ that satisfies $d(x, y) < \epsilon$.
Lemma 6.17. Let $p \in [1, \infty)$ and $f \in L^p(E)$, then for any $\epsilon > 0$, the following
statements hold:
1. There exists $g : E \to \mathbb{R}$ which is continuous and has compact support,
such that $\int_E |f - g|^p < \epsilon$.
2. There exists a simple function $\phi : E \to \mathbb{R}$ of the form $\phi(x) = \sum_{i=1}^{k} c_i \chi_{A_i}(x)$,
where every $A_i$ is a finite union of open boxes on regular grids
and has compact support, such that $\int_E |f - \phi|^p < \epsilon$.
Proof. Proof of Item 1 is similar to that of Theorem 4.40. For Item 2, note that
the tolerance ϵ allows approximating f by such type of simple function ϕ.
Theorem 6.18. Suppose p ∈ [1, ∞). Then Lp space is separable.
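Proof. (i) Suppose $E = \mathbb{R}^n$. Then for any $f \in L^p(E)$ and $\epsilon > 0$, there exists a
simple function $\phi = \sum_{i=1}^{k} c_i \chi_{A_i}$ of the type in Lemma 6.17 such that $\|f - \phi\|_p < \epsilon/2$.
Hence there exists $M > 0$, such that $|c_i| \le M$ and $\mu(A_i) < M^p$ for all $i \le k$. Note that there exists
$r_i \in \mathbb{Q}$ such that $|c_i - r_i| < \epsilon/(2kM)$ for every $i \le k$. Let $\psi = \sum_{i=1}^{k} r_i \chi_{A_i}$, then
\[ \begin{aligned} \|\phi - \psi\|_p &= \Big\| \sum_{i=1}^{k} c_i \chi_{A_i} - \sum_{i=1}^{k} r_i \chi_{A_i} \Big\|_p \le \sum_{i=1}^{k} |c_i - r_i| \|\chi_{A_i}\|_p \\ &= \sum_{i=1}^{k} |c_i - r_i| \mu(A_i)^{1/p} \le k \cdot \frac{\epsilon}{2kM} \cdot M = \frac{\epsilon}{2}. \end{aligned} \]
Hence $\|f - \psi\|_p \le \|f - \phi\|_p + \|\phi - \psi\|_p < \epsilon$. Note that the set
$\Gamma = \{\psi = \sum_{i=1}^{k} r_i \chi_{A_i} : r_i \in \mathbb{Q}\}$ is countable, hence $\Gamma$ is a countable dense set of $L^p(\mathbb{R}^n)$.
(ii) For general $E \subset \mathbb{R}^n$, consider $g(x) = f(x)$ if $x \in E$ and $g(x) = 0$
otherwise. Then $g : \mathbb{R}^n \to \mathbb{R}$. By (i), there exists a simple function $\psi \in \Gamma$ such
that $\int_{\mathbb{R}^n} |g - \psi|^p < \epsilon$. Hence
\[ \int_E |f - \psi|^p = \int_E |g - \psi|^p \le \int_{\mathbb{R}^n} |g - \psi|^p < \epsilon, \]
which also implies that $\Gamma$ is dense in $L^p(E)$.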


Example 6.19. Let $p \in [1, \infty)$ and $f \in L^p(\mathbb{R}^n)$. Show that
$\lim_{|t|\to\infty} \int_{\mathbb{R}^n} |f(x) + f(x - t)|^p\,dx = 2 \int_{\mathbb{R}^n} |f(x)|^p\,dx$.
Proof. For any ϵ > 0, consider the decomposition f = g + h where g is a
continuous function with compact support and h = f − g such that ∥h∥p < ϵ/4.
For notation simplicity, we denote ft (x) = f (x − t), gt (x) = g(x − t), and
ht (x) = h(x − t) for any fixed t. Since g has compact support, we know that
the supports of g(x) and gt (x) do not overlap if |t| is sufficiently large, which
implies that
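\[ \|g + g_t\|_p^p = \int_{\mathbb{R}^n} |g + g_t|^p = \int_{\mathbb{R}^n} (|g|^p + |g_t|^p) = 2 \int_{\mathbb{R}^n} |g|^p = 2 \|g\|_p^p. \]
Hence, for $|t|$ sufficiently large, we have
\[ \begin{aligned} \big| \|f + f_t\|_p - 2^{1/p} \|f\|_p \big| &\le \big| \|f + f_t\|_p - 2^{1/p} \|g\|_p \big| + 2^{1/p} \big| \|g\|_p - \|f\|_p \big| \\ &= \big| \|f + f_t\|_p - \|g + g_t\|_p \big| + 2^{1/p} \big| \|g\|_p - \|f\|_p \big| \\ &\le \|h + h_t\|_p + 2^{1/p} \|h\|_p < \frac{\epsilon}{4} + \frac{\epsilon}{4} + 2^{1/p} \cdot \frac{\epsilon}{4} < \epsilon, \end{aligned} \]
which completes the proof.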

6.3 L2 space and inner product


Definition 6.20 (Inner product). Let $f, g \in L^2(E)$, then the inner product of
f and g is defined by
\[ \langle f, g \rangle = \int_E f g. \]
Note that $|\langle f, g \rangle| \le \int_E |fg| \le \|f\|_2 \|g\|_2 < \infty$.
It is easy to verify that the following identities hold:
• ⟨f, g⟩ = ⟨g, f ⟩.
• ⟨f1 + f2 , g⟩ = ⟨f1 , g⟩ + ⟨f2 , g⟩.
• ⟨αf, g⟩ = α⟨f, g⟩ = ⟨f, αg⟩ for all α ∈ R.
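Example 6.21. Suppose $f, g \in L^2$, then $2 \|fg\|_1 \le t \|f\|_2^2 + \frac{1}{t} \|g\|_2^2$ for all $t > 0$.
Proof. Note that $|fg| = \sqrt{t} |f| \cdot \frac{1}{\sqrt{t}} |g| \le \frac{t |f|^2}{2} + \frac{|g|^2}{2t}$ by Young's inequality;
integrating both sides yields the claim.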

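Example 6.22. Suppose $f : [0, \infty) \to \mathbb{R}_+$ is integrable, then
\[ \Big( \int_0^\infty f\,dx \Big)^4 \le \pi^2 \Big( \int_0^\infty f^2\,dx \Big) \Big( \int_0^\infty x^2 f^2\,dx \Big). \]
Proof. Recall that for any $\alpha, \beta > 0$, there is
\[ \int_0^\infty \frac{1}{\alpha + \beta x^2}\,dx = \frac{1}{\sqrt{\alpha\beta}} \int_0^\infty \frac{1}{1 + y^2}\,dy = \frac{1}{\sqrt{\alpha\beta}} \cdot \frac{\pi}{2} \]
using the change of variable $y = \sqrt{\beta/\alpha}\,x$. Therefore, by the Schwarz inequality,
\[ \begin{aligned} \Big( \int_0^\infty f\,dx \Big)^2 &= \Big( \int_0^\infty \frac{1}{\sqrt{\alpha + \beta x^2}} \cdot \sqrt{\alpha + \beta x^2}\, f\,dx \Big)^2 \\ &\le \int_0^\infty \frac{1}{\alpha + \beta x^2}\,dx \cdot \int_0^\infty (\alpha + \beta x^2) f^2\,dx \\ &= \frac{\pi}{2\sqrt{\alpha\beta}} \Big( \alpha \int_0^\infty f^2\,dx + \beta \int_0^\infty x^2 f^2\,dx \Big). \end{aligned} \]
Letting $\alpha = \int_0^\infty x^2 f^2\,dx$ and $\beta = \int_0^\infty f^2\,dx$ makes the right-hand side equal to
$\pi\sqrt{\alpha\beta}$, and squaring both sides yields the claimed inequality.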
Theorem 6.23. If ∥fk − f ∥2 → 0, then ⟨fk , g⟩ → ⟨f, g⟩ for all g ∈ L2 .
Proof. Note |⟨fk , g⟩ − ⟨f, g⟩| = |⟨fk − f, g⟩| ≤ ∥fk − f ∥2 ∥g∥2 → 0.

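Definition 6.24. $f, g \in L^2$ are called orthogonal if $\langle f, g \rangle = 0$. $\{\phi_\alpha : \alpha \in A\}$ is
called an orthogonal set if $\langle \phi_\alpha, \phi_\beta \rangle = 0$ for all distinct $\alpha, \beta \in A$. If in addition
$\|\phi_\alpha\|_2 = 1$ for all $\alpha \in A$, then $\{\phi_\alpha\}$ is called an orthonormal set.
Example 6.25. $\{ \frac{1}{\sqrt{2\pi}}, \frac{1}{\sqrt{\pi}} \cos(kx), \frac{1}{\sqrt{\pi}} \sin(kx) : k \in \mathbb{N} \}$ is an orthonormal set
of $L^2([-\pi, \pi])$.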
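The orthonormality in Example 6.25 can be verified numerically for the first few functions. The sketch below approximates the inner product on $[-\pi, \pi]$ by a midpoint rule (the helper names `inner` and `trig_system`, and the number of nodes, are choices made here for illustration).

```python
import math


def inner(f, g, n=2000):
    """Midpoint-rule approximation of the L^2([-pi, pi]) inner product of f and g."""
    h = 2.0 * math.pi / n
    nodes = (-math.pi + (i + 0.5) * h for i in range(n))
    return h * sum(f(x) * g(x) for x in nodes)


def trig_system(kmax):
    """Return 1/sqrt(2 pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi) for k = 1..kmax."""
    functions = [lambda x: 1.0 / math.sqrt(2.0 * math.pi)]
    for k in range(1, kmax + 1):
        functions.append(lambda x, k=k: math.cos(k * x) / math.sqrt(math.pi))
        functions.append(lambda x, k=k: math.sin(k * x) / math.sqrt(math.pi))
    return functions


basis = trig_system(3)
# Gram matrix of pairwise inner products; it should be close to the identity.
gram = [[inner(f, g) for g in basis] for f in basis]

for row in gram:
    print(" ".join(f"{entry: .3f}" for entry in row))
```

The `k=k` default argument binds the current frequency to each lambda; without it every function in the list would use the last value of k.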
Theorem 6.26. An orthonormal set of L2 (E) is at most countable.
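Proof. Suppose $\{\phi_\alpha \in L^2(E) : \alpha \in A\}$ is an orthonormal set. Then for any
distinct $\alpha, \beta \in A$, there is
\[ \|\phi_\alpha - \phi_\beta\|_2^2 = \|\phi_\alpha\|_2^2 + \|\phi_\beta\|_2^2 = 2. \]
Since $L^2(E)$ is separable, there exists a countable dense set $\Gamma \subset L^2(E)$ such
that for any $\alpha \in A$, there exists $x_\alpha \in \Gamma$ satisfying $\|x_\alpha - \phi_\alpha\|_2 < \sqrt{2}/2$. If $\alpha \ne \beta$,
then $x_\alpha \ne x_\beta$, since otherwise $\|\phi_\alpha - \phi_\beta\|_2 \le \|\phi_\alpha - x_\alpha\|_2 + \|x_\beta - \phi_\beta\|_2 < \sqrt{2}$.
Hence $\alpha \mapsto x_\alpha$ is injective and $|A| \le |\Gamma| = \aleph_0$.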

Example 6.27 (Parallelogram law). Suppose $f, g \in L^2$, then
$\|f + g\|_2^2 + \|f - g\|_2^2 = 2(\|f\|_2^2 + \|g\|_2^2)$.
Definition 6.28 (Generalized Fourier series). Suppose $\{\phi_k\}$ is an orthonormal
set of $L^2$. For any $f \in L^2$, let $c_k = \langle f, \phi_k \rangle$ for any $k \in \mathbb{N}$. Then $\{c_k\}$ are
called the generalized Fourier coefficients of f under $\{\phi_k\}$ and $\sum_{k=1}^{\infty} c_k \phi_k$ is
the generalized Fourier series of f.
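Theorem 6.29. For any fixed k, let $F_k = \{ \sum_{i=1}^{k} a_i \phi_i : a_i \in \mathbb{R} \}$. Then
$f_k = \sum_{i=1}^{k} c_i \phi_i$, where $c_i = \langle f, \phi_i \rangle$ for every i, uniquely minimizes $\|f - g\|_2$ among
all $g \in F_k$.
Proof. For any $g = \sum_{i=1}^{k} a_i \phi_i \in F_k$, there is
\[ \begin{aligned} \|f - g\|_2^2 &= \|f\|_2^2 - 2 \langle f, g \rangle + \|g\|_2^2 \\ &= \|f\|_2^2 - 2 \sum_{i=1}^{k} a_i \langle f, \phi_i \rangle + \sum_{i=1}^{k} a_i^2 \\ &= \|f\|_2^2 - 2 \sum_{i=1}^{k} a_i c_i + \sum_{i=1}^{k} a_i^2 \\ &= \|f\|_2^2 + \sum_{i=1}^{k} |a_i - c_i|^2 - \sum_{i=1}^{k} c_i^2, \end{aligned} \]
which is minimized if and only if $a_i = c_i$ for all i.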


Theorem 6.30 (Bessel inequality). Let $\{\phi_k\}$ be an orthonormal set in $L^2$. Suppose
$f \in L^2$ and $\{c_k\}$ are the generalized Fourier coefficients. Then $\sum_{k=1}^{\infty} c_k^2 \le \|f\|_2^2$.
Proof. For any $f_k = \sum_{i=1}^{k} c_i \phi_i$, there is $0 \le \|f - f_k\|_2^2 = \|f\|_2^2 - \sum_{i=1}^{k} c_i^2$. Hence
$\sum_{i=1}^{k} c_i^2 \le \|f\|_2^2$ for all k, which implies that $\sum_{k=1}^{\infty} c_k^2 \le \|f\|_2^2$.
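Bessel's inequality can be watched in action on a concrete case. The sketch below assumes the hypothetical choice $f(x) = x$ in $L^2([-\pi, \pi])$ with the trigonometric system of Example 6.25. Since f is odd, only the sine coefficients are nonzero, and integration by parts gives $c_k = 2\sqrt{\pi}\,(-1)^{k+1}/k$, while $\|f\|_2^2 = 2\pi^3/3$.

```python
import math

# Squared L^2([-pi, pi]) norm of the assumed test function f(x) = x.
norm_sq = 2.0 * math.pi ** 3 / 3.0


def bessel_partial_sum(num_terms: int) -> float:
    """Sum of c_k^2 for k = 1..num_terms, where c_k = 2 sqrt(pi) (-1)^(k+1) / k."""
    return sum((2.0 * math.sqrt(math.pi) / k) ** 2 for k in range(1, num_terms + 1))


partial = [bessel_partial_sum(K) for K in (1, 10, 100, 1000)]

for K, s in zip((1, 10, 100, 1000), partial):
    print(f"K = {K:4d}   sum of c_k^2 = {s:.6f}   ||f||^2 = {norm_sq:.6f}")
```

The partial sums increase and stay below $\|f\|_2^2$. In this example they also converge to it, because $\sum_k 1/k^2 = \pi^2/6$, which is the equality case that occurs for a complete orthonormal system.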

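Lemma 6.31. Suppose $\{\phi_k\}$ is an orthonormal set of $L^2$ and $f \in L^2$. If
$f_k = \sum_{i=1}^{k} c_i \phi_i$ where $c_i = \langle f, \phi_i \rangle$, then $\langle f - f_k, f_k \rangle = 0$.
Proof. Note that $\langle f, f_k \rangle = \sum_{i=1}^{k} c_i \langle f, \phi_i \rangle = \sum_{i=1}^{k} c_i^2 = \langle f_k, f_k \rangle$.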
Theorem 6.32 (Riesz-Fischer). Suppose $\{\phi_k\}$ is an orthonormal set of $L^2$. If
$\sum_{k=1}^{\infty} c_k^2 < \infty$, then there exists $g \in L^2$ such that $\langle g, \phi_k \rangle = c_k$ for all k.
Proof. Define $s_k = \sum_{i=1}^{k} c_i \phi_i$, then $s_k \in L^2$. Note that, for any $l \in \mathbb{N}$,
\[ \|s_{k+l} - s_k\|_2^2 = \Big\| \sum_{i=k+1}^{k+l} c_i \phi_i \Big\|_2^2 = \sum_{i=k+1}^{k+l} c_i^2 \to 0 \]
as $k \to \infty$. Hence $\{s_k\}$ is Cauchy in $L^2$, and there exists $g \in L^2$ such that
$\|s_k - g\|_2 \to 0$ as $k \to \infty$. Let $a_i = \langle g, \phi_i \rangle$ for all $i \in \mathbb{N}$, $g_k = \sum_{i=1}^{k} a_i \phi_i$, and
$h_k = g - g_k$, then
\[ \sum_{i=1}^{k} |c_i - a_i|^2 = \|s_k - g_k\|_2^2 \le \|s_k - g_k\|_2^2 + \|h_k\|_2^2 = \|s_k - g_k - h_k\|_2^2 = \|s_k - g\|_2^2 \to 0, \]
where we used the fact that $\langle g_k, h_k \rangle = 0$ from Lemma 6.31 to show that
$\langle s_k - g_k, h_k \rangle = 0$ and obtained the second equality. Hence $a_i = c_i$ for all i.
Definition 6.33 (Complete orthonormal basis). We call {ϕk } a complete or-
thonormal basis if {ϕk } is an orthonormal set, and ⟨f, ϕk ⟩ = 0 for all k implies
f = 0 a.e.
Theorem 6.34. Suppose $\{\phi_k\}$ is a complete orthonormal basis in $L^2$. Let
$f \in L^2$ and $c_k = \langle f, \phi_k \rangle$ for all k, then $\lim_k \| \sum_{i=1}^{k} c_i \phi_i - f \|_2 = 0$.
Proof. By Theorem 6.30 (Bessel's inequality), we know $\sum_{k=1}^{\infty} c_k^2 \le \|f\|_2^2 < \infty$.
By Theorem 6.32 (Riesz-Fischer) and its proof, there exists $g \in L^2$ such that $\langle g, \phi_k \rangle = c_k$ for all k
and $\| \sum_{i=1}^{k} c_i \phi_i - g \|_2 \to 0$. Since $\langle f - g, \phi_k \rangle = \langle f, \phi_k \rangle - \langle g, \phi_k \rangle = 0$ for all
k, we know $f = g$ a.e. Hence $\| \sum_{i=1}^{k} c_i \phi_i - f \|_2 \to 0$.
Definition 6.35 (Linear independence). $\{\phi_i : 1 \le i \le k\}$ is called linearly
independent if $\sum_{i=1}^{k} c_i \phi_i = 0$ implies $c_i = 0$ for all $i \le k$. $\{\phi_k : k \in \mathbb{N}\}$ is called
linearly independent if any finite subset is linearly independent. It is obvious
that 0 cannot be in a linearly independent set.
Example 6.36. If $\{\phi_k\}$ is an orthonormal set in $L^2$, then it is linearly
independent.
Proof. Suppose $\sum_{i=1}^{k} c_i \phi_i = 0$, then taking the inner product of both sides with $\phi_j$
yields $c_j \|\phi_j\|_2^2 = 0$, which implies that $c_j = 0$, for every $j \le k$.

Example 6.37 (Gram-Schmidt). If $\{\psi_k : k \in \mathbb{N}\}$ is a linearly independent set,
then we can construct an orthonormal set $\{\phi_k : k \in \mathbb{N}\}$: define $\phi_1 = \psi_1 / \|\psi_1\|_2$;
suppose we have already constructed $\phi_1, \dots, \phi_{k-1}$, then define
\[ \phi_k = \frac{\psi_k - \sum_{i=1}^{k-1} \langle \psi_k, \phi_i \rangle \phi_i}{\big\| \psi_k - \sum_{i=1}^{k-1} \langle \psi_k, \phi_i \rangle \phi_i \big\|_2}. \]
It is easy to verify that $\{\phi_k\}$ is an
orthonormal set.
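The recursion above can be sketched concretely. The example below assumes the hypothetical setting $L^2([-1, 1])$ with $\psi_k$ the monomials $1, x, x^2$, represented as coefficient lists so that the inner product is exact (the integral of $x^m$ over $[-1, 1]$ is $2/(m+1)$ for even m and 0 for odd m). All names are illustrative.

```python
import math


def inner(p, q):
    """Exact L^2([-1, 1]) inner product of polynomials given as coefficient lists."""
    total = 0.0
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:              # odd powers integrate to zero
                total += a * b * 2.0 / (i + j + 1)
    return total


def gram_schmidt(polys):
    """Orthonormalize linearly independent polynomials (equal-length coefficient lists)."""
    ortho = []
    for psi in polys:
        residual = list(psi)
        for phi in ortho:
            coeff = inner(psi, phi)           # <psi_k, phi_i>
            residual = [r - coeff * c for r, c in zip(residual, phi)]
        norm = math.sqrt(inner(residual, residual))
        ortho.append([r / norm for r in residual])
    return ortho


monomials = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # 1, x, x^2
phis = gram_schmidt(monomials)
gram = [[inner(p, q) for q in phis] for p in phis]

for phi in phis:
    print([round(c, 6) for c in phi])
```

Up to normalization, the output reproduces the first three Legendre polynomials, for example $\phi_2(x) = \sqrt{3/2}\,x$.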
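Theorem 6.38. Suppose $\{\phi_i : i \in \mathbb{N}\}$ is an orthonormal set in $L^2$. If for any
$f \in L^2$ and $\epsilon > 0$, there exist a finite subset $\{\phi_{i_j} : 1 \le j \le k\}$ and coefficients $c_j$ such that
$\| f - \sum_{j=1}^{k} c_j \phi_{i_j} \|_2 < \epsilon$, then $\{\phi_i\}$ is complete.
Proof. Assume $\{\phi_i\}$ is not complete. Then there exists nonzero $f \in L^2$ such that
$\langle f, \phi_i \rangle = 0$ for all i. On the one hand, there exist a finite subset $\{\phi_{i_j} : 1 \le j \le k\}$
and coefficients $c_j$ such that $\| f - \sum_{j=1}^{k} c_j \phi_{i_j} \|_2 < \|f\|_2 / 2$. Moreover,
\[ \Big| \Big\langle f, f - \sum_{j=1}^{k} c_j \phi_{i_j} \Big\rangle \Big| \le \|f\|_2 \cdot \Big\| f - \sum_{j=1}^{k} c_j \phi_{i_j} \Big\|_2 < \frac{\|f\|_2^2}{2}. \]
On the other hand, there is
\[ \Big\langle f, f - \sum_{j=1}^{k} c_j \phi_{i_j} \Big\rangle = \|f\|_2^2 - \Big\langle f, \sum_{j=1}^{k} c_j \phi_{i_j} \Big\rangle = \|f\|_2^2, \]
which is a contradiction since $\|f\|_2 > 0$.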

6.4 Dual space of Lp


Theorem 6.39. Let $p \in [1, \infty)$ and $f \in L^p(E)$. Then there exists $g \in L^q(E)$ with
$\|g\|_q = 1$ and $\|f\|_p = \int_E f g$.
Proof. We may assume $\|f\|_p > 0$. (i) First consider $p = 1$. Then letting $g = \operatorname{sign}(f)$ proves the claim.
(ii) Next consider $p \in (1, \infty)$. Let $q = \frac{p}{p-1}$ be the conjugate, and define
$g = \operatorname{sign}(f) \cdot \frac{|f|^{p-1}}{\|f\|_p^{p-1}}$. Then $(p - 1)q = p$ and
\[ \int_E |g|^q = \frac{1}{\|f\|_p^p} \int_E |f|^p = 1, \qquad \int_E f g = \int_E \frac{|f|^p}{\|f\|_p^{p-1}} = \frac{\|f\|_p^p}{\|f\|_p^{p-1}} = \|f\|_p, \]
which prove the claim.
Remarks. Note that Hölder's inequality implies that $\|fg\|_1 \le \|f\|_p \|g\|_q$, and
hence $\|f\|_p \ge \sup_{g \in L^q} \frac{\|fg\|_1}{\|g\|_q} = \sup_{\|g\|_q = 1} \|fg\|_1$. The theorem above implies
that the supremum can be replaced by maximum, and shows the maximizer for
$p \in [1, \infty)$.
Theorem 6.40. Suppose $f \in L^\infty(E)$, then $\|f\|_\infty = \sup_{\|g\|_1 = 1} | \int_E f g |$.
Proof. Note that, if $\|g\|_1 = 1$, then $| \int_E f g | \le \int_E |fg| \le \|f\|_\infty \|g\|_1 = \|f\|_\infty$.
Hence $\|f\|_\infty \ge \sup_{\|g\|_1 = 1} | \int_E f g |$.
On the other hand, let $M = \|f\|_\infty$. Then for any $\epsilon > 0$, there exists $A \subset E$,
such that $\mu(A) = a > 0$ and $|f| > M - \epsilon$ on A. Define $g = \operatorname{sign}(f) \chi_A / a$, then
\[ \int_E f g = \frac{1}{a} \int_A |f| \ge (M - \epsilon) \cdot a \cdot \frac{1}{a} = M - \epsilon, \]
where $\|g\|_1 = 1$. Hence $M - \epsilon \le \sup_{\|g\|_1 = 1} | \int_E f g | \le M$. As $\epsilon$ is arbitrary, we
know $\|f\|_\infty = \sup_{\|g\|_1 = 1} | \int_E f g |$.
Example 6.41. We cannot replace the supremum by maximum in the theorem
above: consider $E = [0, 1]$ and $f(x) = x$, then $\|f\|_\infty = 1$ but for any $g \in L^1$ with
$\|g\|_1 = 1$, there is $| \int_0^1 f g\,dx | \le \int_0^1 x |g(x)|\,dx < 1$ (otherwise $\int_0^1 (1 - x) |g(x)|\,dx = 0$
implies $g = 0$ a.e., a contradiction).
Definition 6.42 (Dual space of Lp ). We call Lq the dual space of Lp if q is the
conjugate of p.
Theorem 6.43. Suppose $g : E \to \mathbb{R}$ is a measurable function. Let $p \in [1, \infty]$
and q be its conjugate. If there exists $M > 0$ such that for any simple function
$\phi$ there is $| \int_E g \phi | \le M \|\phi\|_p$, then $g \in L^q$ and $\|g\|_q \le M$.
Proof. (i) First consider $p \in (1, \infty)$. Let $\{\psi_k\}$ be a sequence of simple functions
such that $\psi_k \uparrow |g|^{\frac{1}{p-1}}$ and $\phi_k = \operatorname{sign}(g) \psi_k$. Then
\[ \int_E g \phi_k \to \int_E |g|^{\frac{p}{p-1}} = \int_E |g|^q \quad \text{and} \quad \int_E g \phi_k \le M \|\phi_k\|_p \le M \|g\|_q^{q/p}, \]
from which we can obtain $\|g\|_q \le M$.
(ii) Now consider $p = \infty$. Let $\phi = \operatorname{sign}(g)$, then $\int_E g \phi = \int_E |g| \le M \|\phi\|_\infty = M$,
i.e., $\|g\|_1 \le M$.
(iii) Next consider $p = 1$. WLOG assume $g \ge 0$ a.e. If $g \notin L^\infty(E)$, then
for any $k \in \mathbb{N}$ let $A_k = \{x \in E : g(x) \ge k\}$. Then $A_k$ is non-increasing, and
$\mu(A_k) > 0$ for all k. Let $\phi_k = \chi_{A_k}$, then
\[ k \mu(A_k) \le \int_{A_k} g = \int_E g \phi_k \le M \|\phi_k\|_1 = M \mu(A_k), \]
which implies that $M \ge k$ for all $k \in \mathbb{N}$, a contradiction. Therefore $g \in L^\infty(E)$.
Now we need to show $\|g\|_\infty \le M$. If not, then $\|g\|_\infty = M' > M$. Let
$\epsilon = (M' - M)/2$, then there exists $A \subset E$ such that $\mu(A) = a > 0$ and
$|g(x)| \ge M + \epsilon$ for all $x \in A$ since $\|g\|_\infty = M' > M + \epsilon$. Let $\phi(x) = \operatorname{sign}(g) \chi_A / a$,
then $\|\phi\|_1 = 1$ and
\[ \int_E g \phi = \frac{1}{a} \int_A |g| \ge M + \epsilon = (M + \epsilon) \|\phi\|_1, \]
which is a contradiction. Hence $\|g\|_\infty \le M$.
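Theorem 6.44 (Generalized Minkowski's inequality). Suppose $p \in [1, \infty)$ and
$f : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. If for almost every $y \in \mathbb{R}^n$, $f(\cdot, y) \in L^p(\mathbb{R}^n)$, and
$M = \int_{\mathbb{R}^n} ( \int_{\mathbb{R}^n} |f(x, y)|^p\,dx )^{1/p}\,dy < \infty$, then
\[ \Big( \int_{\mathbb{R}^n} \Big| \int_{\mathbb{R}^n} f(x, y)\,dy \Big|^p dx \Big)^{1/p} \le M = \int_{\mathbb{R}^n} \Big( \int_{\mathbb{R}^n} |f(x, y)|^p\,dx \Big)^{1/p} dy. \]
Proof. The proof is trivial for $p = 1$. For $p \in (1, \infty)$, let q be the conjugate of p and
$F(x) = \int_{\mathbb{R}^n} f(x, y)\,dy$. Then for any simple function $\phi$, there is
\[ \begin{aligned} \Big| \int F(x) \phi(x)\,dx \Big| \le \int |F(x)| |\phi(x)|\,dx &\le \int \Big( \int |f(x, y)|\,dy \Big) |\phi(x)|\,dx \\ &= \int \Big( \int |f(x, y)| |\phi(x)|\,dx \Big) dy \\ &\le \int \Big( \int |f(x, y)|^p\,dx \Big)^{1/p} dy \, \|\phi\|_q = M \|\phi\|_q, \end{aligned} \]
where we applied Theorem 4.51 (Tonelli) to obtain the equality and Hölder's
inequality to obtain the last inequality. Applying Theorem 6.43 (with the roles of p and q
exchanged) yields $F \in L^p$ and $\|F\|_p \le M$, which is the claimed inequality.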
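Example 6.45 (Reduction to Minkowski's inequality). Suppose $f, g \in L^p(\mathbb{R})$.
Define the function $h : \mathbb{R} \times [0, 2] \to \mathbb{R}$ by
\[ h(x, y) = \begin{cases} f(x), & \text{if } 0 \le y \le 1, \\ g(x), & \text{if } 1 < y \le 2. \end{cases} \]
Then $\int_0^2 h(x, y)\,dy = f(x) + g(x)$ and
\[ \Big( \int |h(x, y)|^p\,dx \Big)^{1/p} = \begin{cases} \|f\|_p, & \text{if } 0 \le y \le 1, \\ \|g\|_p, & \text{if } 1 < y \le 2. \end{cases} \]
Hence the generalized Minkowski's inequality implies $\|f + g\|_p \le \|f\|_p + \|g\|_p$.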


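Example 6.46. Define the function $f : [1, \infty) \times [0, 2] \to \mathbb{R}$ by
\[ f(x, y) = \begin{cases} a_k, & \text{if } k \le x < k + 1,\ 0 \le y \le 1, \\ b_k, & \text{if } k \le x < k + 1,\ 1 < y \le 2, \end{cases} \]
where $a_k, b_k \ge 0$ for all $k \in \mathbb{N}$. Then $\int_0^2 f(x, y)\,dy = a_k + b_k$ for $x \in [k, k + 1)$, and
\[ \Big( \int_1^\infty |f(x, y)|^p\,dx \Big)^{1/p} = \begin{cases} ( \sum_{k=1}^{\infty} |a_k|^p )^{1/p}, & \text{if } 0 \le y \le 1, \\ ( \sum_{k=1}^{\infty} |b_k|^p )^{1/p}, & \text{if } 1 < y \le 2. \end{cases} \]
Hence the generalized Minkowski's inequality implies
$( \sum_{k=1}^{\infty} |a_k + b_k|^p )^{1/p} \le ( \sum_{k=1}^{\infty} |a_k|^p )^{1/p} + ( \sum_{k=1}^{\infty} |b_k|^p )^{1/p}$.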

7 Probability Theory
7.1 Basic concepts
Example 7.1 (Terms in measure theory vs probability theory). See Table 1.

Table 1: Terminology correspondences in measure theory and probability theory

Measure theory                          Probability theory
Measure space (X, M, µ)                 Probability space (Ω, F, P) (P(Ω) = 1)
σ-algebra M                             σ-field F
Measurable set E ∈ M                    Event E ∈ F
Measurable real-valued function f       Random variable X
Measure on R induced by f               Probability distribution P_X
Integral ∫_X f(x) dµ(x)                 Expectation E(X) = ∫_Ω X(ω) dP(ω)
f ∈ L^p                                 X has finite pth moment
Convergence in measure                  Convergence in probability
Almost everywhere (a.e.)                Almost surely (a.s.)
Borel probability measure               Distribution
Fourier transform of a measure          Characteristic function of P_X
Laplace transform of a measure          Moment generating function of P_X

Definition 7.2 (Expectation and variance). Suppose X is a random variable
on $(\Omega, \mathcal{F}, P)$, then the expectation of X is defined by $E(X) = \int_\Omega X(\omega)\,dP(\omega)$,
and the variance of X is defined by $V(X) = E[(X - E(X))^2]$. Note that it is
easy to verify that $V(X) = E(X^2) - (E(X))^2$.
Definition 7.3 (Image measure). Suppose $(\Omega, \mathcal{F}, P)$ is a probability space, and
$(\Omega', \mathcal{F}')$ is a measurable space. Let $\phi : \Omega \to \Omega'$ be a measurable function, i.e.,
$\phi^{-1}(E) \in \mathcal{F}$ if $E \in \mathcal{F}'$. Then $\phi$ induces a probability measure on $(\Omega', \mathcal{F}')$,
called the image measure, defined by $P_\phi(E) = P(\phi^{-1}(E))$ for all $E \in \mathcal{F}'$.
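Theorem 7.4. Suppose $f : \Omega' \to \mathbb{R}$ is a measurable function, then
$\int_{\Omega'} f\,dP_\phi = \int_\Omega f \circ \phi\,dP$.
Proof. Let $E \in \mathcal{F}'$ and $f = \chi_E : \Omega' \to \mathbb{R}$. Note that for any $\omega \in \Omega$ there is
$\chi_E(\phi(\omega)) = 1$ iff $\phi(\omega) \in E$ iff $\omega \in \phi^{-1}(E)$, i.e., $\chi_E \circ \phi = \chi_{\phi^{-1}(E)} : \Omega \to \mathbb{R}$.
Hence
\[ \begin{aligned} \int_{\Omega'} f\,dP_\phi = \int_{\Omega'} \chi_E\,dP_\phi = \int_E dP_\phi &= P_\phi(E) = P(\phi^{-1}(E)) \\ &= \int_{\phi^{-1}(E)} dP = \int_\Omega \chi_{\phi^{-1}(E)}\,dP = \int_\Omega \chi_E \circ \phi\,dP. \end{aligned} \]
Therefore the identity holds for $f = \chi_E$. It is straightforward to show that it
holds for simple functions by linearity. Taking the limit of a sequence of simple
functions and applying Theorem 4.28 (DCT) proves the claim for general measurable
functions.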

Definition 7.5 (Distribution). Suppose X is a random variable on (Ω, F, P).
Let PX be the image measure of X on R, called the distribution of X. The
function F (t) = PX ((−∞, t)) = P(X < t) is called the distribution function of
X. A family of random variables {Xα : α ∈ A} is called identically distributed
if their image measures {PXα : α ∈ A} are identical.
Definition 7.6 (Joint distribution). Suppose {Xk : 1 ≤ k ≤ n} are random
variables on (Ω, F, P). Then (X1 , . . . , Xn ) : Ω → Rn , and the image measure
PX1 ,...,Xn is called the joint distribution of X1 , . . . , Xn .
Remarks. The behaviors of random variables are completely determined by
their (joint) distributions. Therefore, we often use
\[ E(X) = \int_\Omega X(\omega)\,dP(\omega) = \int_{\mathbb{R}} t\,dP_X(t), \qquad V(X) = \int_{\mathbb{R}} (t - E(X))^2\,dP_X(t). \]
We also use
\[ E(X + Y) = \int_{\mathbb{R}^2} (t + s)\,dP_{X,Y}(t, s). \]

Definition 7.7 (Independence). Suppose $(\Omega, \mathcal{F}, P)$ is a probability space. A
set of events $\{E_\alpha : \alpha \in A\}$ are called independent if for any finite subset of
distinct events $\{E_{\alpha_k} : 1 \le k \le n, \alpha_k \in A\}$ there is
\[ P(E_{\alpha_1} \cap \dots \cap E_{\alpha_n}) = \prod_{k=1}^{n} P(E_{\alpha_k}). \]
A set of random variables $\{X_\alpha : \alpha \in A\}$ are called independent if the events
$\{X_\alpha^{-1}(B_\alpha) : \alpha \in A\}$ are independent for every choice of Borel sets $B_\alpha \subset \mathbb{R}$. Note that this is
different from and stronger than pairwise independence. An alternative definition of independent
random variables is that for any finite subset of these random variables, say
$X_1, \dots, X_n$, which are distinct, and any Borel sets $B_1, \dots, B_n$, there is
\[ P_{X_1, \dots, X_n}(B_1 \times \dots \times B_n) = \prod_{k=1}^{n} P_{X_k}(B_k). \]
We can see this because on the one hand we have
\[ P(X_1^{-1}(B_1) \cap \dots \cap X_n^{-1}(B_n)) = P((X_1, \dots, X_n)^{-1}(B_1 \times \dots \times B_n)) = P_{X_1, \dots, X_n}(B_1 \times \dots \times B_n) \]
and on the other hand we have
\[ \prod_{k=1}^{n} P(X_k^{-1}(B_k)) = \prod_{k=1}^{n} P_{X_k}(B_k) = \Big( \prod_{k=1}^{n} P_{X_k} \Big)(B_1 \times \dots \times B_n). \]
Hence $X_1, \dots, X_n$ are independent iff the two quantities above are identical,
i.e., $P_{X_1, \dots, X_n} = \prod_{k=1}^{n} P_{X_k}$.

Theorem 7.8. Suppose $X_1, \dots, X_n$ are independent random variables, and
$f_k : \mathbb{R} \to \mathbb{R}$ are measurable, then $f_1(X_1), \dots, f_n(X_n)$ are also independent.
Proof. Let $Y_k = f_k(X_k)$. For any Borel sets $B_k \subset \mathbb{R}$, there is
\[ \begin{aligned} P_{Y_1, \dots, Y_n}(B_1 \times \dots \times B_n) &= P_{X_1, \dots, X_n}(f_1^{-1}(B_1) \times \dots \times f_n^{-1}(B_n)) \\ &= \prod_{k=1}^{n} P_{X_k}(f_k^{-1}(B_k)) = \prod_{k=1}^{n} P_{Y_k}(B_k), \end{aligned} \]
which completes the proof.

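Theorem 7.9. Suppose $\{X_k : 1 \le k \le n\}$ are independent and $X_k \in L^1$, then
$\prod_{k=1}^{n} X_k \in L^1$ and $E( \prod_{k=1}^{n} X_k ) = \prod_{k=1}^{n} E(X_k)$.
Proof. Note that
\[ \begin{aligned} E\Big( \prod_{k=1}^{n} |X_k| \Big) &= \int \prod_{k=1}^{n} |t_k|\,dP_{X_1, \dots, X_n} = \int \cdots \int \prod_{k=1}^{n} |t_k|\,dP_{X_1} \cdots dP_{X_n} \\ &= \prod_{k=1}^{n} \int |t_k|\,dP_{X_k} = \prod_{k=1}^{n} E(|X_k|) < \infty, \end{aligned} \]
which implies that $\prod_{k=1}^{n} X_k \in L^1$. Remove the absolute values and repeat the computation to
show $E( \prod_{k=1}^{n} X_k ) = \prod_{k=1}^{n} E(X_k)$.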
Theorem 7.10. Suppose $\{X_k : 1 \le k \le n\}$ are independent and $X_k \in L^2$, then
$V( \sum_{k=1}^{n} X_k ) = \sum_{k=1}^{n} V(X_k)$.
Proof. Let $Y_k = X_k - E(X_k)$. Then $Y_1, \dots, Y_n$ are independent, $E(Y_k) = 0$, and
$E(Y_k^2) = V(X_k)$. Moreover $E(Y_k Y_j) = E(Y_k) E(Y_j) = 0$ whenever $k \ne j$ since
they are independent. Therefore
\[ \begin{aligned} V(X_1 + \dots + X_n) &= E\Big[ \Big( \sum_{k=1}^{n} X_k - \sum_{k=1}^{n} E(X_k) \Big)^2 \Big] = E\Big[ \Big( \sum_{k=1}^{n} Y_k \Big)^2 \Big] \\ &= \sum_{k,j=1}^{n} E(Y_k Y_j) = \sum_{k=1}^{n} E(Y_k^2) = \sum_{k=1}^{n} V(X_k), \end{aligned} \]
which completes the proof.
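The additivity of variance can be checked exactly on a small finite example. The sketch below assumes two hypothetical independent random variables, a fair die and a fair 0/1 coin, builds the distribution of their sum from the product measure, and compares variances using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    mean = sum(value * prob for value, prob in dist.items())
    return sum((value - mean) ** 2 * prob for value, prob in dist.items())


die = {k: Fraction(1, 6) for k in range(1, 7)}     # assumed example: fair die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}      # assumed example: fair coin

# For independent variables the joint distribution is the product measure,
# so P(X = x, Y = y) = P(X = x) P(Y = y).
sum_dist = {}
for (x, px), (y, py) in product(die.items(), coin.items()):
    sum_dist[x + y] = sum_dist.get(x + y, Fraction(0)) + px * py

print(variance(sum_dist), variance(die) + variance(coin))
```

Both printed values agree. Replacing the product measure by a dependent joint distribution (for example taking Y to be a function of X) would in general break the equality.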

7.2 The law of large numbers


Theorem 7.11 (Chebyshev's inequality). Suppose X is a random variable with
mean E(X) and variance V(X). Then for any $\epsilon > 0$, there is
\[ P(|X - E(X)| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}. \]
Proof. Note that
\[ P(|X - E(X)| \ge \epsilon) = \int_{|X - E(X)| \ge \epsilon} dP \le \int_{\mathbb{R}} \Big( \frac{t - E(X)}{\epsilon} \Big)^2 dP_X(t) = \frac{V(X)}{\epsilon^2}, \]
which proves the claim.
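Chebyshev's inequality can be compared with exact tail probabilities on a finite example. The sketch below assumes a hypothetical fair six-sided die, whose mean is $7/2$ and whose variance is $35/12$, and uses rational arithmetic so that no rounding is involved.

```python
from fractions import Fraction

outcomes = range(1, 7)
prob = Fraction(1, 6)                                # assumed example: fair die

mean = sum(x * prob for x in outcomes)
var = sum((x - mean) ** 2 * prob for x in outcomes)


def tail(eps):
    """Exact value of P(|X - E(X)| >= eps) for the fair die."""
    return sum(prob for x in outcomes if abs(x - mean) >= eps)


for eps in (Fraction(1, 2), Fraction(2), Fraction(5, 2), Fraction(3)):
    print(f"eps = {eps}:  tail = {tail(eps)}  bound = {var / eps ** 2}")
```

The bound is often far from tight: for small $\epsilon$ it exceeds 1 and carries no information, while the exact tail never does.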


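Theorem 7.12 (Weak law of large numbers). Suppose $\{X_k : k \in \mathbb{N}\}$ are independent
random variables with means $\mu_k$ and variances $\sigma_k^2$. If $\frac{1}{n^2} \sum_{k=1}^{n} \sigma_k^2 \to 0$
as $n \to \infty$, then for any $\epsilon > 0$,
\[ P\Big( \Big| \frac{1}{n} \sum_{k=1}^{n} (X_k - \mu_k) \Big| \ge \epsilon \Big) \le \frac{1}{\epsilon^2} \cdot \frac{1}{n^2} \sum_{k=1}^{n} \sigma_k^2 \to 0. \]
Proof. Apply Theorem 7.11 (Chebyshev) to $\frac{1}{n} \sum_{k=1}^{n} (X_k - \mu_k)$, which has
mean 0 and, by Theorem 7.10, variance $\frac{1}{n^2} \sum_{k=1}^{n} \sigma_k^2$, to obtain the claimed inequality.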
P

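Theorem 7.13 (Borel-Cantelli). Suppose $(\Omega, \mathcal{F}, P)$ is a probability space, and
$\{E_k : k \in \mathbb{N}\}$ are events. Then
1. If $\sum_{k=1}^{\infty} P(E_k) < \infty$, then $P(\limsup_k E_k) = 0$.
2. If $\{E_k\}$ are independent and $\sum_{k=1}^{\infty} P(E_k) = \infty$, then $P(\limsup_k E_k) = 1$.
Proof. Item 1 can be easily verified: as $k \to \infty$, there is
\[ P(\limsup_{k\to\infty} E_k) = P\Big( \bigcap_{k=1}^{\infty} \bigcup_{j=k}^{\infty} E_j \Big) \le P\Big( \bigcup_{j=k}^{\infty} E_j \Big) \le \sum_{j=k}^{\infty} P(E_j) \to 0. \]
For Item 2, we know $\{E_k^c\}$ are independent since $\{E_k\}$ are so. Hence
\[ P\Big( \bigcap_{j=k}^{n} E_j^c \Big) = \prod_{j=k}^{n} P(E_j^c) = \prod_{j=k}^{n} (1 - P(E_j)) \le \prod_{j=k}^{n} e^{-P(E_j)} = e^{-\sum_{j=k}^{n} P(E_j)} \to 0 \]
as $n \to \infty$. Hence $P(\liminf_k E_k^c) = P( \bigcup_{k=1}^{\infty} \bigcap_{j=k}^{\infty} E_j^c ) \le \sum_{k=1}^{\infty} P( \bigcap_{j=k}^{\infty} E_j^c ) = 0$,
which implies that $P(\limsup_k E_k) = P((\liminf_k E_k^c)^c) = 1$.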
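Theorem 7.14 (Kolmogorov's inequality). Suppose $\{X_k : 1 \le k \le n\}$ are
independent random variables with mean 0 and variances $\sigma_k^2$ for all k. Let
$S_k = \sum_{j=1}^{k} X_j$ for $k = 1, \dots, n$. Then for any $\epsilon > 0$, there is
\[ P\Big( \max_{1 \le k \le n} |S_k| \ge \epsilon \Big) \le \epsilon^{-2} \sum_{k=1}^{n} \sigma_k^2. \]
Proof. Note that $E(X_k) = 0$ and $V(X_k) = \sigma_k^2$, hence $E(S_k) = 0$ and
$V(S_k) = E(S_k^2) = \sum_{j=1}^{k} \sigma_j^2$. Moreover, $S_k$ and $S_n - S_k$ are independent. Now let
$A_k = \{|S_k| \ge \epsilon\} \cap \{|S_j| < \epsilon : 1 \le j < k\}$, then $A_k \cap A_j = \emptyset$ whenever $k \ne j$. Thus
$\{\max_{1 \le k \le n} |S_k| \ge \epsilon\} = \bigcup_{k=1}^{n} A_k$ is a disjoint union. Therefore
\[ P\Big( \max_{1 \le k \le n} |S_k| \ge \epsilon \Big) = P\Big( \bigcup_{k=1}^{n} A_k \Big) = \sum_{k=1}^{n} P(A_k) \le \frac{1}{\epsilon^2} \sum_{k=1}^{n} E(\chi_{A_k} S_k^2), \]
where we used $P(A_k) = \int_{A_k} dP \le \int_{A_k} \frac{|S_k|^2}{\epsilon^2}\,dP = \frac{1}{\epsilon^2} E(\chi_{A_k} S_k^2)$ to obtain the
inequality. On the other hand, we have
\[ \begin{aligned} E(S_n^2) \ge E\Big[ \sum_{k=1}^{n} \chi_{A_k} S_n^2 \Big] &= E\Big[ \sum_{k=1}^{n} \chi_{A_k} \big( S_k^2 + 2 S_k (S_n - S_k) + (S_n - S_k)^2 \big) \Big] \\ &\ge \sum_{k=1}^{n} E(\chi_{A_k} S_k^2) + 2 \sum_{k=1}^{n} E(\chi_{A_k} S_k (S_n - S_k)) = \sum_{k=1}^{n} E(\chi_{A_k} S_k^2), \end{aligned} \]
where we used the fact $E(\chi_{A_k} S_k (S_n - S_k)) = E(\chi_{A_k} S_k) E(S_n - S_k) = 0$ due to
the independence between $\chi_{A_k} S_k$ and $S_n - S_k$. Combining the two inequalities
above and recalling $E(S_n^2) = \sum_{k=1}^{n} \sigma_k^2$ completes the proof.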
P

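Theorem 7.15 (Strong law of large numbers). If $\{X_n : n \in \mathbb{N}\}$ is a sequence
of independent $L^2$ random variables with means $\mu_n$ and variances $\sigma_n^2$ such that
$\sum_{n=1}^{\infty} \frac{\sigma_n^2}{n^2} < \infty$, then $\frac{1}{n} \sum_{k=1}^{n} (X_k - \mu_k) \to 0$ a.s. as $n \to \infty$.
Proof. Denote $S_n = \sum_{k=1}^{n} (X_k - \mu_k)$. It suffices to show that, for any $\epsilon > 0$,
$P(\limsup_n \{ \frac{|S_n|}{n} \ge \epsilon \}) = 0$. Now we define
\[ A_k = \Big\{ \max_{2^{k-1} \le n < 2^k} \frac{|S_n|}{n} \ge \epsilon \Big\} \subset \Big\{ \max_{1 \le n < 2^k} |S_n| \ge 2^{k-1} \epsilon \Big\}. \]
Then it is clear that
\[ \limsup_{n\to\infty} \Big\{ \frac{|S_n|}{n} \ge \epsilon \Big\} = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \Big\{ \frac{|S_n|}{n} \ge \epsilon \Big\} = \bigcap_{m=1}^{\infty} \bigcup_{k=m}^{\infty} A_k = \limsup_{k\to\infty} A_k. \]
On the other hand, by Theorem 7.14 (Kolmogorov's inequality), we know
$P(A_k) \le (2^{k-1} \epsilon)^{-2} \sum_{n=1}^{2^k} \sigma_n^2$, summing of which over k yields
\[ \sum_{k=1}^{\infty} P(A_k) \le \sum_{k=1}^{\infty} \sum_{n=1}^{2^k} \frac{\sigma_n^2}{2^{2(k-1)} \epsilon^2} = \frac{4}{\epsilon^2} \sum_{n=1}^{\infty} \Big( \sum_{k \ge \log_2 n} \frac{1}{2^{2k}} \Big) \sigma_n^2 \le \frac{16}{\epsilon^2} \sum_{n=1}^{\infty} \frac{\sigma_n^2}{n^2} < \infty. \]
Hence, by Theorem 7.13 (Borel-Cantelli), we know $P(\limsup_k A_k) = 0$.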

7.3 Central limit theorem


Definition 7.16 (Moment generating function). The moment generating function
of a random variable X with distribution function F is defined by
$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx}\,dF(x)$ for every $t \in \mathbb{R}$.
Remarks. The name of moment generating function is due to the fact that
$E[X^k] = M_X^{(k)}(0)$, the kth derivative of $M_X$ at $t = 0$, for all $k = 0, 1, \dots$.
This also implies that $M_X(t) = \sum_{k=0}^{\infty} \frac{E[X^k]}{k!} t^k$. The moment generating function
$M_X$ is essentially the Laplace transform of the distribution function F. Thus,
two random variables whose moment generating functions are finite near 0 are
identically distributed iff their moment generating functions are identical.
Remarks. It is straightforward to verify that $M_{aX+b}(t) = e^{bt} M_X(at)$ for any
$a, b \in \mathbb{R}$ and $M_{X+Y}(t) = M_X(t) M_Y(t)$ for any independent random variables
X and Y.

Theorem 7.17 (Central limit theorem). Let $\{X_k\}$ be a sequence of independent
and identically distributed $L^2$ random variables with mean $\mu$ and variance $\sigma^2$,
then $Y_n := (\sigma \sqrt{n})^{-1} \sum_{k=1}^{n} (X_k - \mu)$ has mean 0 and variance 1. Moreover, for
any $a \in \mathbb{R}$, there is
\[ \lim_{n\to\infty} P(Y_n \le a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-t^2/2}\,dt. \]
That is, $\lim_n P(Y_n \le a) = P(Z \le a)$ where $Z \sim N(0, 1)$ is the standard normal
random variable.
Proof. We assume $\mu = 0$ and $\sigma^2 = 1$ since it is straightforward to extend
to the general case by replacing $X_k$ with $(X_k - \mu)/\sigma$. Let F be the
distribution function of Z and $F_n$ the distribution function of $Y_n$, then we need
to show that $F_n \to F$ pointwise. To this end, we consider their moment
generating functions $M_Z$ and $M_{Y_n}$, assuming that the common moment generating
function $M_X$ of the $X_k$ is finite near 0. We know $M_{Y_n}(t) = M_X(t/\sqrt{n})^n$. Noting
that $M_X(t) = 1 + t^2/2 + o(t^2)$ as $t \to 0$, we have $M_{Y_n}(t) = (1 + t^2/(2n) + o(t^2/n))^n \to e^{t^2/2}$
as $n \to \infty$. On the other hand, $M_Z(t) = e^{t^2/2}$. Hence $M_{Y_n}(t) \to M_Z(t)$ as $n \to \infty$
for all $t \in \mathbb{R}$ sufficiently close to 0, and applying the inverse Laplace transform to
the moment generating functions yields the claim.
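The convergence of moment generating functions used in the proof can be observed in a case with a closed form. The sketch below assumes the hypothetical choice of $X_k$ taking the values $\pm 1$ with probability $1/2$ each (mean 0, variance 1), for which $M_X(t) = \cosh(t)$, so that $M_{Y_n}(t) = \cosh(t/\sqrt{n})^n$, while $M_Z(t) = e^{t^2/2}$ for the standard normal.

```python
import math


def mgf_normalized_sum(t: float, n: int) -> float:
    """M_{Y_n}(t) = cosh(t / sqrt(n))^n for the assumed +/-1 coin-flip variables."""
    return math.cosh(t / math.sqrt(n)) ** n


def mgf_standard_normal(t: float) -> float:
    """M_Z(t) = exp(t^2 / 2) for Z ~ N(0, 1)."""
    return math.exp(t * t / 2.0)


t = 1.0
sample_sizes = (1, 10, 100, 10 ** 6)
errors = [abs(mgf_normalized_sum(t, n) - mgf_standard_normal(t)) for n in sample_sizes]

for n, err in zip(sample_sizes, errors):
    print(f"n = {n:7d}   |M_Yn(t) - M_Z(t)| = {err:.2e}")
```

The error shrinks as n grows, in line with the expansion $(1 + t^2/(2n) + o(t^2/n))^n \to e^{t^2/2}$. This only illustrates the convergence of the transforms; passing from there to convergence of the distribution functions is the step the proof delegates to the inversion argument.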
