Ye - Lecture Notes On Real Analysis
Xiaojing Ye
Contents
1 Preliminaries
1.1 Basics of sets
1.2 Functions
1.3 Cardinality of sets
1.4 Topology of metric spaces
3 Measurable Functions
3.1 Extended real numbers
3.2 Simple functions
3.3 Convergence almost everywhere
3.4 Convergence in measure
3.5 Measurable functions and continuous functions
3.6 Measurability of composite functions
4 Lebesgue Integrals
4.1 Integral of simple nonnegative functions
4.2 Integral of general nonnegative functions
4.3 Integral of general functions
4.4 Relation between Riemann and Lebesgue integrals
4.5 Iterated integrals
4.6 Convolution
6 $L^p$ Spaces
6.1 Important inequalities
6.2 $L^p$ space
6.3 $L^2$ space and inner product
6.4 Dual space of $L^p$
7 Probability Theory
7.1 Basic concepts
7.2 The law of large numbers
7.3 Central limit theorem
These notes outline the materials covered in class. Detailed derivations and
explanations are given in lectures and/or the referenced books. The notes will
be continuously updated with additional content and corrections. Questions
and comments can be addressed to xye@gsu.edu.
1 Preliminaries
1.1 Basics of sets
A set $A$ is a collection of elements with a certain property $P$, commonly written as $A = \{x : x \text{ satisfies } P\}$ (e.g., $A = \{x \in \mathbb{R} : x^2 > 1\}$). Recall the following definitions: subset $A \subset B$, union $A \cup B$, intersection $A \cap B$, complement $A^c$, empty set $\emptyset$, equal sets $A = B$, set minus $A \setminus B = A \cap B^c$.
Example 1.1. The following statements hold:
• A ∩ B ⊂ A ⊂ A ∪ B for any A, B.
• $A \subset B$ iff $B^c \subset A^c$.
• $A \cap B = \emptyset$ iff $A \subset B^c$.
• Let A, B ⊂ X. If E ∩ A = E ∪ B for any E ⊂ X, then A = X and B = ∅.
We frequently consider a set of sets, and call it a family (or collection) of sets:
F = {Aα : α ∈ I}, where I is the index set. Here I can be finite {1, . . . , n},
countably infinite N, or uncountably infinite. We also work with union and
intersection of multiple (often infinitely many) sets:
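$$\bigcup_{\alpha \in I} A_\alpha = \{x : \exists\, \alpha \in I \text{ s.t. } x \in A_\alpha\} \quad \text{and} \quad \bigcap_{\alpha \in I} A_\alpha = \{x : x \in A_\alpha,\ \forall\, \alpha \in I\}.$$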
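Theorem 1.5 (De Morgan’s law). $\big(\bigcap_{\alpha \in I} A_\alpha\big)^c = \bigcup_{\alpha \in I} A_\alpha^c$ and $\big(\bigcup_{\alpha \in I} A_\alpha\big)^c = \bigcap_{\alpha \in I} A_\alpha^c$.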
Example 1.6. Some basic tricks in proofs.
• Use a Venn diagram. For example, define the symmetric difference of $A$ and $B$ by $A \triangle B = (A \setminus B) \cup (B \setminus A)$, and show that $A \triangle B = (A \cup B) \setminus (A \cap B)$.
• A ⊂ B iff x ∈ A ⇒ x ∈ B.
• A = B iff A ⊂ B and B ⊂ A.
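The identities in Theorem 1.5 and Example 1.6 can be sanity-checked (not proved) on small finite sets. A minimal Python sketch, assuming a finite universe $X = \{0, \ldots, 9\}$ so that complements are taken relative to $X$; the helper names are illustrative only:

```python
import random

# Finite universe X; complements are taken relative to X (an assumption of this sketch).
X = frozenset(range(10))


def complement(a):
    return X - a


def check_de_morgan(family):
    """Test (∩ A_α)^c = ∪ A_α^c and (∪ A_α)^c = ∩ A_α^c for a finite family."""
    inter = frozenset.intersection(*family)
    union = frozenset.union(*family)
    comps = [complement(a) for a in family]
    return (complement(inter) == frozenset.union(*comps)
            and complement(union) == frozenset.intersection(*comps))


def check_symmetric_difference(a, b):
    """Test (A \\ B) ∪ (B \\ A) == (A ∪ B) \\ (A ∩ B); Python's a ^ b is the same set."""
    return (a - b) | (b - a) == (a | b) - (a & b) == a ^ b


random.seed(0)
trials = []
for _ in range(200):
    family = [frozenset(x for x in X if random.random() < 0.5) for _ in range(4)]
    trials.append(check_de_morgan(family)
                  and check_symmetric_difference(family[0], family[1]))
print(all(trials))  # True
```

Random testing of this kind only catches mistakes in a conjectured identity; the proofs still go through element chasing as in Example 1.6.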
Definition 1.7 (Limit of a sequence of monotone sets). Suppose $A_1 \supset A_2 \supset \cdots \supset A_k \supset \cdots$; then we say $\{A_k\}$ is non-increasing, or simply decreasing (to be distinguished from strictly decreasing, where $A_{k+1} \subsetneq A_k$ for all $k$), and $\bigcap_{k=1}^\infty A_k$ is called the limit of $\{A_k\}$, denoted by $\lim_{k\to\infty} A_k$ or simply $\lim_k A_k$. Similarly, if $A_1 \subset A_2 \subset \cdots \subset A_k \subset \cdots$, then $\{A_k\}$ is non-decreasing, or simply increasing, and $\bigcup_{k=1}^\infty A_k$ is the limit of $\{A_k\}$, also denoted by $\lim_k A_k$.
Example 1.8. Let $A_k = [k, \infty) \subset \mathbb{R}$ for $k = 1, 2, \ldots$; then $\lim_k A_k = \emptyset$.
Example 1.9. Suppose {fk } is a sequence of real-valued functions defined on
R, and f1 (x) ≤ f2 (x) ≤ · · · ≤ fk (x) ≤ · · · and fk (x) → f (x) as k → ∞ for
every x ∈ R. For any t ∈ R, define Ak = {x ∈ R : fk (x) > t}. Show that {Ak }
is increasing, and limk Ak = {x ∈ R : f (x) > t}.
Proof. It is clear that $\{A_k\}$ is increasing and, since $f_k(x) \le f(x)$ for all $k$, that $\lim_k A_k \subset A := \{x \in \mathbb{R} : f(x) > t\}$. For every $x \in A$, we have $f(x) > t$ and $f_k(x) \uparrow f(x)$ as $k \to \infty$. Let $\epsilon = (f(x) - t)/2 > 0$; then there exists $k'$ such that $f_{k'}(x) > f(x) - \epsilon = (f(x) + t)/2 > t$, and therefore $x \in A_{k'} \subset \bigcup_{k=1}^\infty A_k = \lim_k A_k$.
Definition 1.10 (Upper and lower limit of a sequence of sets). Suppose $\{A_k\}$ is a sequence of sets. Denote $B_j = \bigcup_{k \ge j} A_k$; then $\{B_j\}$ is non-increasing. The upper limit of $\{A_k\}$ is defined by
$$\limsup_{k\to\infty} A_k = \lim_{j\to\infty} B_j = \bigcap_{j=1}^\infty B_j = \bigcap_{j=1}^\infty \bigcup_{k=j}^\infty A_k.$$
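For sequences of finite sets, the upper limit, and the lower limit $\liminf_k A_k = \bigcup_{j=1}^\infty \bigcap_{k \ge j} A_k$ used in Corollary 2.21, can be computed directly. A small Python sketch, assuming the sequence is periodic so that every tail union or intersection equals the one over a single period (an assumption made only so the infinite tails become finite computations):

```python
def limsup_liminf(seq, period, horizon=50):
    """Upper and lower limits of A_k = seq(k), k = 1, 2, ...

    Assumes (for this sketch) that seq is periodic with the given period,
    so the tail union ∪_{k>=j} A_k and tail intersection ∩_{k>=j} A_k
    equal the union/intersection over A_j, ..., A_{j+period-1}.
    """
    tail_unions, tail_inters = [], []
    for j in range(1, horizon + 1):
        window = [frozenset(seq(k)) for k in range(j, j + period)]
        tail_unions.append(frozenset().union(*window))            # B_j
        tail_inters.append(window[0].intersection(*window[1:]))   # ∩_{k>=j} A_k
    limsup = tail_unions[0].intersection(*tail_unions[1:])  # ∩_j B_j
    liminf = frozenset().union(*tail_inters)                 # ∪_j ∩_{k>=j} A_k
    return limsup, liminf


# A_k = {0, 1} for odd k and {1, 2} for even k.
upper, lower = limsup_liminf(lambda k: {0, 1} if k % 2 else {1, 2}, period=2)
print(sorted(upper), sorted(lower))  # [0, 1, 2] [1]
```

Points that occur infinitely often land in the upper limit, while the lower limit keeps only points that are eventually always present; with disjoint alternating sets such as $\{0\}, \{1\}, \{0\}, \ldots$ the lower limit is empty, as in the remark after Corollary 2.21.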
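Example 1.13. Suppose $f_n(x) \to f(x)$ for every $x \in \mathbb{R}$. Show that, for any $t \in \mathbb{R}$,
$$\{x \in \mathbb{R} : f(x) \le t\} = \bigcap_{k=1}^\infty \bigcup_{N=1}^\infty \bigcap_{n=N}^\infty \Big\{x \in \mathbb{R} : f_n(x) \le t + \frac{1}{k}\Big\}.$$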
Definition 1.14 (Cartesian product). The Cartesian product of A and B is
A × B = {(a, b) : a ∈ A, b ∈ B}.
Example 1.15. A few examples of Cartesian products:
• A = {1, 2, 3}, B = {4, 5}, then A×B = {(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)}.
• [0, 1] × [0, 1] = {(x, y) : 0 ≤ x, y ≤ 1}.
1.2 Functions
Definition 1.16. We have a series of definitions regarding functions:
• Let $X$ and $Y$ be two sets. $f : X \to Y$ is called a function (or mapping, or transformation) if $f$ assigns to every $x \in X$ exactly one element $y \in Y$.
• Let $A \subset X$; then $f(A) = \{y \in Y : y = f(x) \text{ for some } x \in A\}$ is called the image of $A$ under $f$. Let $B \subset Y$; then $f^{-1}(B) = \{x \in X : f(x) \in B\}$ is called the inverse image (or pre-image) of $B$ under $f$.
• $X$ is called the domain of $f$, and $f(X)$ is the range of $f$.
• If $f(X) = Y$, then $f$ is called a mapping from $X$ onto $Y$ (or $f$ is surjective). If $x_1 \ne x_2$ implies $f(x_1) \ne f(x_2)$, then $f$ is called one-to-one (or $f$ is injective).
• If $f$ is both injective and surjective, then $f$ is called bijective, or a one-to-one correspondence between $X$ and $Y$. In this case $f^{-1}$ exists and is also a one-to-one correspondence.
• Suppose f : X → Y and g : Y → Z, then g ◦ f : X → Z is called the
composition of f and g, defined by (g ◦ f )(x) = g(f (x)).
Definition 1.18. $X$ and $Y$ are said to have the same cardinal number if there exists a one-to-one correspondence $f : X \to Y$. In this case, we write $X \sim Y$. It is then clear that $\sim$ is an equivalence relation: (i) $A \sim A$; (ii) $A \sim B$ iff $B \sim A$; (iii) if $A \sim B$ and $B \sim C$, then $A \sim C$.
• Q ∼ N.
• $(-1, 1) \sim \mathbb{R}$ by setting $f(x) = \frac{x}{1 - x^2}$ for $x \in (-1, 1)$. [Or $f(x) = \tan(\frac{\pi}{2} x)$.]
Lemma 1.20 (Decomposition of sets by functions). Suppose f : X → Y and
g : Y → X. Then there exist A1 , A2 ⊂ X and B1 , B2 ⊂ Y , such that f (A1 ) =
B1 , g(B2 ) = A2 , A1 ∩ A2 = ∅, B1 ∩ B2 = ∅, A1 ∪ A2 = X, and B1 ∪ B2 = Y .
Example 1.26. A few examples of sets of cardinality ℵ0 .
• If A ∼ N and B ∼ N then A ∪ B ∼ N.
• If $A_n \sim \mathbb{N}$ for every $n \ge 1$, then $\bigcup_{n=1}^\infty A_n \sim \mathbb{N}$.
• Q ∼ N. (Note that this only means that it is possible to list the elements
of Q in some order, but not necessarily by their values.)
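Example 1.27. Any collection of mutually disjoint nonempty open intervals in $\mathbb{R}$ is at most countable.
Proof. Define the function $f$ that maps each interval to a rational number $r$ in that interval (such $r$ exists since $\mathbb{Q}$ is dense in $\mathbb{R}$). Since the intervals are mutually disjoint, $f$ is an injection into $\mathbb{Q}$.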
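Example 1.28. If $f : \mathbb{R} \to \mathbb{R}$ is monotone, then $\{x \in \mathbb{R} : \lim_{y\to x^-} f(y) \ne \lim_{y\to x^+} f(y)\}$ is at most countable.
Proof. WLOG, assume $f$ is non-decreasing. Then for each point $x$ in the set above, there exists $r_x \in \mathbb{Q}$ such that $\lim_{y\to x^-} f(y) < r_x < \lim_{y\to x^+} f(y)$. Define $g : x \mapsto r_x$; then $g$ is injective, because the intervals $(\lim_{y\to x^-} f(y), \lim_{y\to x^+} f(y))$ are mutually disjoint for distinct $x$ when $f$ is non-decreasing.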
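Example 1.29. If $E$ is a countable subset of $\mathbb{R}$, then $\exists\, x_0 \in \mathbb{R}$ such that $E \cap (E + \{x_0\}) = \emptyset$. [Hint: write $E = \{r_1, r_2, \ldots\}$ and consider $A = \{r_n - r_m : n, m \in \mathbb{N}\}$, which is countable; hence $\exists\, x_0 \in \mathbb{R} \setminus A$.]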
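For a finite set $E$ of rationals the hint can be carried out literally: form the difference set and pick a shift outside it. A Python sketch, assuming $E$ is finite (for an infinite countable $E$ the search below need not terminate, even though a suitable $x_0$ exists); `find_shift` is a hypothetical helper:

```python
from fractions import Fraction


def find_shift(E):
    """Return x0 with E ∩ (E + x0) = ∅ for a finite set E of rationals.

    The difference set A = {r - s : r, s in E} is finite here, so some
    candidate 1, 1/2, 1/3, ... must lie outside it.
    """
    diffs = {r - s for r in E for s in E}  # contains 0 (take r = s)
    n = 1
    while Fraction(1, n) in diffs:
        n += 1
    return Fraction(1, n)


E = {Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)}
x0 = find_shift(E)
shifted = {e + x0 for e in E}
print(x0, E & shifted)  # 1/4 set()
```

Exact rational arithmetic is used so that membership in the difference set is not affected by floating-point rounding.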
Theorem 1.30. If A is an infinite set and B is at most countable, then A ∼
A ∪ B.
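Proof. WLOG, assume $A \cap B = \emptyset$ (otherwise replace $B$ by $B \setminus A$) and that $B = \{b_1, b_2, \ldots\}$ is countably infinite (the finite case is similar). Extract a countable set $A_1 = \{a_1, a_2, \ldots\}$ from $A$, and denote $A_2 = A \setminus A_1$. Then define $f : A \cup B \to A$ by
$$f(x) = \begin{cases} a_{2i-1} & \text{if } x = b_i \in B, \\ a_{2i} & \text{if } x = a_i \in A_1, \\ a & \text{if } x = a \in A_2. \end{cases}$$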
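The map in the proof only rearranges indices: $B$ fills the odd slots of $A_1$ and $A_1$ is pushed into the even slots. A Python sketch that checks this on a finite truncation, assuming elements are represented by tagged pairs (`('b', i)` for $b_i$, `('a', i)` for $a_i$); points of $A_2$, which are left fixed, are not modelled:

```python
def f(x):
    """Index-level version of the map in the proof of Theorem 1.30."""
    kind, i = x
    if kind == 'b':
        return ('a', 2 * i - 1)  # b_i -> a_{2i-1}
    return ('a', 2 * i)          # a_i -> a_{2i}


N = 1000
domain = [('b', i) for i in range(1, N + 1)] + [('a', i) for i in range(1, N + 1)]
image = {f(x) for x in domain}
# Injective on the truncation, and the images are exactly a_1, ..., a_{2N}.
print(len(image) == len(domain), image == {('a', j) for j in range(1, 2 * N + 1)})  # True True
```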
Example 1.33. The following statements hold:
• If $|A_n| = \aleph_1$ for all $n \ge 1$, then $|\bigcup_{n=1}^\infty A_n| = \aleph_1$. [Hint: $A_k \sim (k, k+1]$.]
• $|\mathbb{R}^n| = |\mathbb{R}| = \aleph_1$. [Hint: $A_k = \mathbb{R}$ for $k = 1, \ldots, n$.]
Theorem 1.34 (There is no “cap” on cardinal number). Suppose $A \ne \emptyset$; then $A \not\sim 2^A := \{E : E \subset A\}$.
Proof. If not, then there exists a one-to-one correspondence $f : A \to 2^A$. Let $B = \{x \in A : x \notin f(x)\}$. Since $B \in 2^A$, there exists $y \in A$ such that $f(y) = B$. If $y \in B$, then $y \notin f(y) = B$; if $y \notin B = f(y)$, then $y \in B$. Both yield contradictions.
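For a small finite $A$ the diagonal argument can be checked exhaustively: for every map $f : A \to 2^A$, the set $B = \{x \in A : x \notin f(x)\}$ is missed by $f$. A brute-force Python sketch with $A = \{0, 1, 2\}$ (an illustrative choice):

```python
from itertools import combinations, product


def power_set(A):
    """All subsets of A, as frozensets."""
    A = list(A)
    return [frozenset(c) for r in range(len(A) + 1) for c in combinations(A, r)]


def diagonal_set(A, f):
    """B = {x in A : x not in f(x)} from the proof of Theorem 1.34."""
    return frozenset(x for x in A if x not in f[x])


A = [0, 1, 2]
P = power_set(A)  # 2^A has 2**3 = 8 elements
always_missed = True
for images in product(P, repeat=len(A)):  # each tuple encodes one map f: A -> 2^A
    f = dict(zip(A, images))
    if diagonal_set(A, f) in images:      # would mean B = f(y) for some y
        always_missed = False
print(len(P), always_missed)  # 8 True
```

All $8^3 = 512$ maps are enumerated, so this is a complete check for this $A$, though of course not a proof for general sets.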
• (Only in $\mathbb{R}^n$) Suppose $a_i < b_i$ for $i = 1, \ldots, n$; then $I = (a_1, b_1) \times \cdots \times (a_n, b_n)$ is called an open box in $\mathbb{R}^n$. The volume of $I$ is denoted by $|I| = \prod_{i=1}^n (b_i - a_i)$.
• A sequence {xk } in X is said to converge to x if limk→∞ d(xk , x) = 0 (or
simply denoted by xk → x).
• A sequence {xk } in X is said to be Cauchy if for any ϵ > 0, there exists
N , such that d(xn , xm ) < ϵ for all n, m ≥ N .
• Let $E$ be an infinite subset of $X$. If there exists a sequence of distinct points $\{x_k\} \subset E$ such that $x_k \to x$, then $x$ is called a limit point (or accumulation point) of $E$. [Note that a limit point of $E$ need not be in $E$.]
• The set of limit points of $E$ is denoted by $E'$. The union $\bar{E} := E \cup E'$ is called the closure of $E$. [$\bar{E}$ is a closed set; see below.]
• If A ⊂ B and Ā = B, then A is called dense in B, or A is a dense subset
of B.
• If x ∈ E and x is not a limit point of E, then x is called an isolated point
of E (i.e., ∃ δ > 0, such that B(x, δ) ∩ E = {x}).
• If Gα is open for every α ∈ I and E ⊂ ∪α∈I Gα , then {Gα : α ∈ I} is
called an open cover of E.
• E is called compact if every open cover of E contains a finite subcover.
[In Rn , E is compact iff E is closed and bounded; see below.]
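Theorem 1.39. $x \in E'$ iff for any $\delta > 0$ we have $(B(x, \delta) \setminus \{x\}) \cap E \ne \emptyset$.
Proof. Necessity is clear. For sufficiency, let $\delta_1 = 1$ and select $x_1 \in (B(x, \delta_1) \setminus \{x\}) \cap E$. Then for any $k \ge 1$, let $\delta_{k+1} = \frac{1}{2} d(x_k, x)$ and select $x_{k+1} \in (B(x, \delta_{k+1}) \setminus \{x\}) \cap E$. This yields a sequence $\{x_k\}$ of distinct points (their distances to $x$ strictly decrease) with $x_k \to x$, i.e., $x \in E'$.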
Theorem 1.40. E is closed iff E ′ ⊂ E.
Proof. Suppose E is closed, then E c is open. If x ∈ E ′ \ E, then x ∈ E c and
there exists {xk } ⊂ E and xk → x. But this is a contradiction since x is an
interior point of E c .
Suppose $E' \subset E$. For any $x \in E^c$, we know $x \notin E'$, so by Theorem 1.39 there exists $\delta > 0$ such that $B(x, \delta) \cap E = \emptyset$. Hence $B(x, \delta) \subset E^c$, i.e., $x$ is an interior point of $E^c$. As $x$ is arbitrary, $E^c$ is open, and hence $E$ is closed.
Theorem 1.41. Ē is closed.
Theorem 1.43. Let E1 , E2 ⊂ Rn . Then (E1 ∪ E2 )′ = E1′ ∪ E2′ .
Proof. It is clear that Ej′ ⊂ (E1 ∪ E2 )′ for j = 1, 2. If x ∈ (E1 ∪ E2 )′ , then there
exists a sequence of distinct points {xk } ⊂ E1 ∪ E2 , such that xk → x. Then at
least one of E1 and E2 contains a subsequence of {xk } which also converges to
x. Hence (E1 ∪ E2 )′ ⊂ E1′ ∪ E2′ .
Proof. The necessity is clear. To show the sufficiency, suppose that for every
t both E1c and E2c are closed. If f is not continuous at x0 , then there exists
ϵ0 > 0 and a sequence xk → x0 such that |f (xk ) − f (x0 )| ≥ ϵ0 . WLOG,
suppose $f(x_k) \le f(x_0) - \epsilon_0$ for all $k$. Setting $t = f(x_0) - \epsilon_0$, we know $E_1^c = \{x \in \mathbb{R}^n : f(x) \le t\}$ is closed, which is a contradiction since $x_k \to x_0$ and $\{x_k\} \subset E_1^c$ but $x_0 \notin E_1^c$.
Theorem 1.46 (Operations on open and closed sets). The union of (finitely or infinitely many) open sets is open; the intersection of finitely many open sets is open. The opposite holds for closed sets. Namely,
• If $F_\alpha$ is closed and $G_\alpha$ is open for every $\alpha \in I$, then $\bigcap_{\alpha \in I} F_\alpha$ is closed and $\bigcup_{\alpha \in I} G_\alpha$ is open.
• If $F_k$ is closed and $G_k$ is open for $k = 1, \ldots, n$, then $\bigcup_{k=1}^n F_k$ is closed and $\bigcap_{k=1}^n G_k$ is open.
Proof. We only show this for open sets. Then applying De Morgan’s law implies
those for closed sets.
For every x ∈ ∪α∈I Gα , there exists α′ ∈ I such that x ∈ Gα′ , and hence
∃ B(x, δ) ⊂ Gα′ ⊂ ∪α∈I Gα .
For every x ∈ ∩nk=1 Gk , there exist δk > 0 such that B(x, δk ) ⊂ Gk for every
k = 1, . . . , n. Let δ = min{δk : k = 1, . . . , n} > 0 (require finiteness!) then
B(x, δ) ⊂ Gk for all k.
Definition 1.47 ($G_\delta$-set and $F_\sigma$-set). We call $H = \bigcap_{k=1}^\infty G_k$ a $G_\delta$-set if $G_k$ is open for all $k$, and $K = \bigcup_{k=1}^\infty F_k$ an $F_\sigma$-set if $F_k$ is closed for all $k$.
Proof. It suffices to show that $K^c$ is open, i.e., every point of $K^c$ is an interior point. For every $y \in K^c$ and $x \in K$, denote $\delta_x = \frac{1}{2} d(x, y) > 0$. Then $\{B(x, \delta_x) : x \in K\}$ is an open cover of $K$, and hence has a finite subcover $\{B(x_i, \delta_{x_i}) : 1 \le i \le k\}$ of $K$. Let $\delta := \min_{1 \le i \le k} \delta_{x_i} > 0$; then $B(y, \delta) \cap B(x_i, \delta_{x_i}) = \emptyset$ for all $i$. Since $K \subset \bigcup_{i=1}^k B(x_i, \delta_{x_i})$, we get $B(y, \delta) \subset K^c$, i.e., $y$ is an interior point of $K^c$.
Proof. Suppose not; then $I$ has an infinite open cover $\mathcal{A} = \{G_\alpha : \alpha\}$ which does not have a finite subcover. Bisecting each side of $I$, we obtain $2^n$ closed boxes, at least one of which cannot be covered by finitely many open sets in $\mathcal{A}$. Bisecting this box again, we obtain a smaller closed box that does not have a finite subcover either, and so on indefinitely. The sizes of these nested boxes shrink to 0, so they converge to a point $x \in I$, which must be an interior point of some $G_{\alpha'}$, i.e., $\exists\, \delta > 0$ such that $B(x, \delta) \subset G_{\alpha'}$. But after finitely many bisections the box lies in $B(x, \delta)$ and is therefore covered by the single set $G_{\alpha'}$, a contradiction.
Theorem 1.52 (Heine-Borel). Bounded closed sets in $\mathbb{R}^n$ are compact.
Proof. Let F be closed and bounded. Then there exists a bounded closed box I
such that F ⊂ I. Since I is compact and F is closed, we know F is compact.
Example 1.53. Suppose F ⊂ Rn is closed and bounded, and G ⊂ Rn is
open, and F ⊂ G. Then ∃ δ > 0 such that for every x ∈ B(0, δ), there is
F + {x} := {y + x : y ∈ F } ⊂ G.
Proof. Since every $y \in F$ is an interior point of $G$, there exists $\delta_y > 0$ such that $B(y, \delta_y) \subset G$. Also $\{B(y, \frac{\delta_y}{2}) : y \in F\}$ is an open cover of $F$, and hence has a finite subcover $\{B(y_i, \frac{\delta_{y_i}}{2}) : 1 \le i \le k\}$. Namely, for every $y \in F$, we know $y \in B(y_i, \frac{\delta_{y_i}}{2})$ for some $i \in \{1, \ldots, k\}$. Set $\delta = \min_{1 \le i \le k} \frac{\delta_{y_i}}{2}$. Then for any $x \in B(0, \delta)$ and $y \in F$, $x + y \in B(y_i, \delta_{y_i})$ for some $i \in \{1, \ldots, k\}$, and hence $x + y \in G$.
Before closing this section, we consider several examples of distances between
sets.
2 Measure and Measurable Sets
2.1 σ-algebra
Definition 2.1 (σ-algebra). Let $X$ be a nonempty set. Then $\Gamma \subset 2^X$ (i.e., $\Gamma$ is a collection of subsets of $X$) is called a σ-algebra of $X$ if:
1. ∅ ∈ Γ;
2. If A ∈ Γ then Ac ∈ Γ;
3. If $A_n \in \Gamma$ for $n = 1, 2, \ldots$, then $\bigcup_{n=1}^\infty A_n \in \Gamma$.
Given the definition above, it is also easy to verify that the following state-
ments hold if Γ is a σ-algebra of X:
1. X ∈ Γ;
2. If A, B ∈ Γ, then A \ B ∈ Γ;
3. If Ak ∈ Γ for k = 1, . . . , n, then ∪nk=1 Ak ∈ Γ;
4. If $A_k \in \Gamma$ for $k = 1, 2, \ldots$, then $\bigcap_{k=1}^\infty A_k$, $\limsup_k A_k$, $\liminf_k A_k \in \Gamma$.
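On a finite set $X$ countable unions reduce to finite ones, so the σ-algebra generated by a few subsets can be found by closing under complements and pairwise unions until nothing new appears. A Python sketch under that finiteness assumption (`generate_sigma_algebra` is a hypothetical helper); the usage below then checks properties 1 and 2 and the intersection part of 4:

```python
from itertools import combinations


def generate_sigma_algebra(X, generators):
    """Smallest family of subsets of the finite set X that contains the
    generators and is closed under complement and union."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        new_sets = {X - a for a in current}                       # complements
        new_sets |= {a | b for a, b in combinations(current, 2)}  # unions
        if not new_sets <= family:
            family |= new_sets
            changed = True
    return family


X = range(1, 7)
sigma = generate_sigma_algebra(X, [{1, 2}, {2, 3}])
print(len(sigma))  # 16: all unions of the atoms {1}, {2}, {3}, {4, 5, 6}
```

On an infinite $X$ no such finite closure procedure exists in general, which is why the definition insists on countable unions.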
Theorem 2.7 (Properties of outer measure). The outer measure µ∗ in Rn has
the following properties:
1. $\mu^*(E) \ge 0$; $\mu^*(\emptyset) = 0$. [Note that $\mu^*(E) = 0$ does not imply $E = \emptyset$.]
2. If $E_1 \subset E_2$, then $\mu^*(E_1) \le \mu^*(E_2)$.
3. (Sub-additivity) $\mu^*(\bigcup_{k=1}^\infty E_k) \le \sum_{k=1}^\infty \mu^*(E_k)$.
Proof. The first two are trivial. We only show the sub-additivity. For any $\epsilon > 0$ and any $E_k$, there exists an open-box-cover $\{I_{k,l} : l \in \mathbb{N}\}$ of $E_k$ such that
$$\sum_{l=1}^\infty |I_{k,l}| < \mu^*(E_k) + \frac{\epsilon}{2^k}.$$
Remarks. In order to show E ∈ M, it suffices to show µ∗ (T ) ≥ µ∗ (T ∩ E) +
µ∗ (T ∩ E c ) for any test set T , since µ∗ (T ) ≤ µ∗ (T ∩ E) + µ∗ (T ∩ E c ) is already
implied by the sub-additivity of µ∗ .
Example 2.14. If $\mu^*(E) = 0$, then $E \in \mathcal{M}$. [Hint: $\mu^*(T) \ge \mu^*(T \cap E^c) = \mu^*(T \cap E) + \mu^*(T \cap E^c)$ as $0 \le \mu^*(T \cap E) \le \mu^*(E) = 0$.]
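Proof. Items 1 and 2 are trivial. For Item 3, we only need to show $E_1 \cup E_2 \in \mathcal{M}$, since $E_1 \cap E_2 = (E_1^c \cup E_2^c)^c$ and $E_1 \setminus E_2 = E_1 \cap E_2^c$. For any test set $T$, consider $T \cap (E_1 \cup E_2)$, which can be partitioned into three mutually disjoint sets: $T \cap E_1 \cap E_2$, $T \cap E_1 \cap E_2^c$, and $T \cap E_1^c \cap E_2$ (use a Venn diagram). Also note that $T \cap (E_1 \cup E_2)^c = T \cap E_1^c \cap E_2^c$. Therefore
$$\begin{aligned} \mu^*(T) &= \mu^*(T \cap E_1) + \mu^*(T \cap E_1^c) \\ &= [\mu^*(T \cap E_1 \cap E_2) + \mu^*(T \cap E_1 \cap E_2^c)] + [\mu^*(T \cap E_1^c \cap E_2) + \mu^*(T \cap E_1^c \cap E_2^c)] \\ &\ge \mu^*(T \cap (E_1 \cup E_2)) + \mu^*(T \cap E_1^c \cap E_2^c), \end{aligned}$$
where the first equality is due to $E_1 \in \mathcal{M}$ with $T$ as test set, the second equality is due to $E_2 \in \mathcal{M}$ with $T \cap E_1$ and $T \cap E_1^c$ as test sets, and the inequality is due to the sub-additivity of $\mu^*$ applied to the first three terms. Hence $E_1 \cup E_2 \in \mathcal{M}$. Note that it is easy to show that $\bigcup_{i=1}^k E_i \in \mathcal{M}$ and, if $E_1, \ldots, E_k \in \mathcal{M}$ are mutually disjoint, $\mu^*(\bigcup_{i=1}^k E_i) = \sum_{i=1}^k \mu^*(E_i)$.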
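For Item 4, WLOG, we assume the $E_k$ are mutually disjoint; otherwise replace $E_k$ by $F_k := E_k \setminus (\bigcup_{i=1}^{k-1} E_i)$ for $k \ge 2$, which are also in $\mathcal{M}$. Denote $S_k = \bigcup_{i=1}^k E_i$ and $S = \bigcup_{i=1}^\infty E_i$. Note that $S_k \in \mathcal{M}$ due to Item 3, and $S^c \subset S_k^c$. Therefore, for any $T$,
$$\mu^*(T) = \mu^*(T \cap S_k) + \mu^*(T \cap S_k^c) \ge \sum_{i=1}^k \mu^*(T \cap E_i) + \mu^*(T \cap S^c).$$
Letting $k \to \infty$, we obtain $\mu^*(T) \ge \sum_{i=1}^\infty \mu^*(T \cap E_i) + \mu^*(T \cap S^c) \ge \mu^*(T \cap S) + \mu^*(T \cap S^c)$. Hence $S \in \mathcal{M}$. Letting $T = S$ yields $\mu^*(S) \ge \sum_{k=1}^\infty \mu^*(S \cap E_k) = \sum_{k=1}^\infty \mu^*(E_k)$, and the reverse inequality follows from sub-additivity.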
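The theorem above implies that $\mathcal{M}$, the collection of measurable sets in $\mathbb{R}^n$, is a σ-algebra of $\mathbb{R}^n$, and that $\mu^*$ is countably additive on $\mathcal{M}$. The restriction of $\mu^*$ to $\mathcal{M}$ is called the Lebesgue measure on $\mathbb{R}^n$.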
Definition 2.17 (Lebesgue measure). For any $E \in \mathcal{M}$, the Lebesgue measure $\mu$ of $E$ is defined by $\mu(E) = \mu^*(E)$. The space $\mathbb{R}^n$, the σ-algebra $\mathcal{M}$, and $\mu$ constitute the Lebesgue measure space $(\mathbb{R}^n, \mathcal{M}, \mu)$.
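Remarks. In general, for any set $X$, a σ-algebra $\mathcal{A}$ of $X$, and an extended real-valued $\mu : \mathcal{A} \to \mathbb{R} \cup \{\infty\}$, we call $(X, \mathcal{A}, \mu)$ a measure space (the pair $(X, \mathcal{A})$ alone is called a measurable space) if (i) $0 \le \mu(E) \le \infty$ for any $E \in \mathcal{A}$; (ii) $\mu(\emptyset) = 0$; and (iii) $\mu$ is countably additive, i.e., $\mu(\bigcup_{k=1}^\infty E_k) = \sum_{k=1}^\infty \mu(E_k)$ for any countable collection of mutually disjoint sets $\{E_k \in \mathcal{A} : k \in \mathbb{N}\}$. If $\mu(X) = \infty$ but $X = \bigcup_{k=1}^\infty E_k$ where $E_k \in \mathcal{A}$ and $\mu(E_k) < \infty$ for all $k$, then $\mu$ is called σ-finite.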
Theorem 2.18 (Continuity of µ from below). Suppose Ek ∈ M for all k =
1, 2, . . . , and E1 ⊂ E2 ⊂ . . . , then there is µ(limk Ek ) = limk µ(Ek ).
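Proof. If $\lim_k \mu(E_k) = \infty$, the claim is trivial. Denote $E_0 = \emptyset$ and $D_k = E_k \setminus E_{k-1} \in \mathcal{M}$ for all $k = 1, 2, \ldots$. Then the $D_k$ are mutually disjoint and $\lim_k E_k = \bigcup_{k=1}^\infty E_k = \bigcup_{k=1}^\infty D_k$. Hence,
$$\mu\Big(\lim_{k\to\infty} E_k\Big) = \mu\Big(\bigcup_{k=1}^\infty D_k\Big) = \sum_{k=1}^\infty \mu(D_k) = \lim_k \sum_{j=1}^k \mu(D_j) = \lim_k \mu(E_k).$$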
Corollary 2.21 (Fatou’s lemma for measures). Suppose Ek ∈ M for all k.
Then there is µ(lim inf k Ek ) ≤ lim inf k µ(Ek ).
Proof. Let Bk = ∩j≥k Ej for every k ∈ N. Then Bk is non-decreasing, Bk ⊂ Ek
for all k, and hence µ(lim inf k Ek ) = µ(limk Bk ) = limk µ(Bk ) ≤ lim inf k µ(Ek ).
Remarks. We have two remarks regarding Fatou’s lemma for measures.
• In general we do not have $\mu(\liminf_k E_k) = \liminf_k \mu(E_k)$. For example, let $E_k = [0, \frac{1}{2}]$ if $k$ is odd and $E_k = (\frac{1}{2}, 1]$ if $k$ is even; then $\liminf_k E_k = \emptyset$, and $\mu(\liminf_k E_k) = 0 < \frac{1}{2} = \liminf_k \mu(E_k)$.
• If $E_k \subset E$ for all $k$ and $\mu(E) < \infty$, then we also have $\mu(\limsup_k E_k) \ge \limsup_k \mu(E_k)$, by substituting $E_k$ with $E \setminus E_k$ for every $k$ and observing that $E \setminus \limsup_k E_k = \liminf_k (E \setminus E_k)$.
Lemma 2.22 (Carathéodory). Suppose $G$ is an open proper subset of $\mathbb{R}^n$ and $E \subset G$ ($E$ need not be in $\mathcal{M}$). Let $E_k = \{x \in E : d(x, G^c) \ge \frac{1}{k}\}$ for every $k \in \mathbb{N}$. Then $\lim_k \mu^*(E_k) = \mu^*(E)$.
Proof. It is clear that $E_1 \subset E_2 \subset \cdots \subset E_k \subset \cdots \subset E$. Hence $\mu^*(E_k) \le \mu^*(E)$ for all $k$. For any $x \in E \subset G$, $x$ is an interior point of $G$, and hence there exists $\delta > 0$ such that $B(x, \delta) \subset G$, i.e., $d(x, G^c) \ge \delta$. Hence $E = \bigcup_{k=1}^\infty E_k$. WLOG, assume $\mu^*(E) < \infty$.
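Let $D_k = E_{k+1} \setminus E_k$ for $k = 1, 2, \ldots$; then we have
$$\infty > \mu^*(E) \ge \mu^*(E_{2k+1}) \ge \mu^*\Big(\bigcup_{j=1}^k D_{2j}\Big) = \sum_{j=1}^k \mu^*(D_{2j}), \quad \forall\, k \in \mathbb{N},$$
where we used the fact that $d(D_{2i}, D_{2j}) > 0$ for all $i < j \le k$ to obtain the equality ($\mu^*$ is additive on sets separated by a positive distance). Hence $\sum_{j=k}^\infty \mu^*(D_{2j}) \to 0$ as $k \to \infty$. Similarly, $\sum_{j=k}^\infty \mu^*(D_{2j+1}) \to 0$ as $k \to \infty$. Therefore, for any $\epsilon > 0$, there exists $k$ large enough such that $\sum_{j=k}^\infty \mu^*(D_{2j}) < \frac{\epsilon}{2}$ and $\sum_{j=k}^\infty \mu^*(D_{2j+1}) < \frac{\epsilon}{2}$. On the other hand, noting that $E = E_{2k} \cup (\bigcup_{j=k}^\infty D_{2j}) \cup (\bigcup_{j=k}^\infty D_{2j+1})$, we know that
$$\mu^*(E) \le \mu^*(E_{2k}) + \sum_{j=k}^\infty \mu^*(D_{2j}) + \sum_{j=k}^\infty \mu^*(D_{2j+1}) < \mu^*(E_{2k}) + \epsilon \le \lim_{m\to\infty} \mu^*(E_m) + \epsilon.$$
Since $\epsilon > 0$ is arbitrary, $\mu^*(E) \le \lim_m \mu^*(E_m)$, which completes the proof.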
$$= \mu^*(T \cap F) + \mu^*(F_k) \to \mu^*(T \cap F) + \mu^*(T \cap F^c)$$
as $k \to \infty$, where we used the fact that $d(T \cap F, F_k) \ge \frac{1}{k} > 0$ to obtain the second equality above. Hence $\mu^*(T) \ge \mu^*(T \cap F) + \mu^*(T \cap F^c)$. So $F \in \mathcal{M}$.
Corollary 2.24 (Open sets are measurable). If G ⊂ Rn is open, then G ∈ M.
Corollary 2.25 (Borel sets are measurable). If A ∈ B(Rn ), then A ∈ M.
Proof. Since B(Rn ) is the smallest σ-algebra generated by the family of open
sets, and M is a σ-algebra also containing all open sets due to Corollary 2.24,
we know B(Rn ) ⊂ M.
Theorem 2.26. Suppose E ∈ M. For any ϵ > 0, there exist an open set G
such that E ⊂ G and µ(G \ E) < ϵ, and a closed set F such that F ⊂ E and
µ(E \ F ) < ϵ.
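Proof. First assume $\mu(E) < \infty$. Then there exists an open-box-cover $\{I_k\}$ of $E$ with $\sum_k |I_k| < \mu(E) + \epsilon$, so the open set $G = \bigcup_{k=1}^\infty I_k$ satisfies $E \subset G$ and $\mu(G) < \mu(E) + \epsilon$. Hence $0 \le \mu(G \setminus E) = \mu(G) - \mu(E) < \epsilon$.
If $\mu(E) = \infty$, then denote $E_k = E \cap B(0, k)$ for every $k \in \mathbb{N}$. Then $\mu(E_k) < \infty$ for all $k$, and $E = \bigcup_{k=1}^\infty E_k$. For every $k$, there exists an open set $G_k$ such that $E_k \subset G_k$ and $\mu(G_k \setminus E_k) < \frac{\epsilon}{2^k}$. Now let $G = \bigcup_{k=1}^\infty G_k$; then $G$ is open, $E \subset G$, and $G \setminus E = \bigcup_{k=1}^\infty (G_k \setminus E)$. Hence
$$\mu(G \setminus E) \le \sum_{k=1}^\infty \mu(G_k \setminus E) \le \sum_{k=1}^\infty \mu(G_k \setminus E_k) < \sum_{k=1}^\infty \frac{\epsilon}{2^k} = \epsilon.$$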
Proof. By Theorem 2.28, for every $E_k$ there exists a $G_\delta$-set $H_k$ (hence $H_k \in \mathcal{M}$) such that $E_k \subset H_k$ and $\mu(H_k) = \mu^*(E_k)$. Hence $\mu^*(\liminf_k E_k) \le \mu(\liminf_k H_k) \le \liminf_k \mu(H_k) = \liminf_k \mu^*(E_k)$.
Corollary 2.30. Suppose Ek ⊂ Rn but need not be in M, and E1 ⊂ E2 ⊂
· · · ⊂ Ek ⊂ · · · , then µ∗ (limk Ek ) = limk µ∗ (Ek ).
Proof. Note that µ∗ (limk Ek ) ≤ limk µ∗ (Ek ) by Theorem 2.29. The converse is
obvious because Ek ⊂ limk Ek for all k.
Example 2.31 (Cantor set). Consider the closed interval $[0, 1] \subset \mathbb{R}$. Let $C_1 = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$, obtained by removing the open middle third $(\frac{1}{3}, \frac{2}{3})$ from $[0, 1]$. Then let $C_2 = [0, \frac{1}{9}] \cup [\frac{2}{9}, \frac{1}{3}] \cup [\frac{2}{3}, \frac{7}{9}] \cup [\frac{8}{9}, 1]$, obtained by removing the middle thirds from both $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$, and so on. Then $C_k$ is compact for every $k$, and $C_1 \supset C_2 \supset \cdots$. The set $C = \bigcap_{k=1}^\infty C_k$ is called the Cantor set. In addition, $C$ has the following properties: (i) $C$ is compact, nowhere dense, totally disconnected, and has no isolated point; (ii) $\mu(C) = 0$; and (iii) $|C| = c$.
Proof. (i) Note that $C$ is closed and bounded, hence compact. The remaining three are easy to check; (ii) The total length of the intervals removed is $\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = \sum_{k=1}^\infty \frac{2^{k-1}}{3^k} = 1$; (iii) It can be shown that $C = \{\sum_{k=1}^\infty \frac{a_k}{3^k} : a_k \in \{0, 2\}\}$, hence $|C| = c$ (see comment below Definition 1.32).
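The stages $C_k$ and the length computation in (ii) can be reproduced exactly with rational arithmetic. A Python sketch (the helper names are illustrative) that builds $C_k$ as a list of closed intervals and compares the remaining length with $(2/3)^k$:

```python
from fractions import Fraction


def cantor_stage(k):
    """Closed intervals (left, right) whose union is C_k."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))  # keep the left third
            next_intervals.append((b - third, b))  # keep the right third
        intervals = next_intervals
    return intervals


def total_length(intervals):
    return sum(b - a for a, b in intervals)


for k in range(1, 6):
    remaining = total_length(cantor_stage(k))
    removed = 1 - remaining  # equals 1/3 + 2/9 + ... + 2^(k-1)/3^k
    print(k, remaining, removed)
```

The remaining length of $C_k$ is $(2/3)^k \to 0$, consistent with the removed lengths summing to 1 in (ii).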
2. For any x ∈ E, there exists c ∈ CE and q ∈ Q such that x = c + q, since
CE contains a point from the same equivalence class which x belongs to.
Note that the first property also implies that {λ + CE : λ ∈ Λ} is a collection
of disjoint sets provided that Λ ⊂ Q. Now the following theorem shows that
non-measurable sets exist.
Theorem 2.33 (Vitali: non-measurable sets exist). Any set $E$ with $\mu^*(E) > 0$ contains a non-measurable subset.
Proof. WLOG, we assume 0 < µ∗ (E) < ∞ and E ⊂ [−b, b] ⊂ R (otherwise
we take B(0, b) ∩ E for some b ∈ N). Let Λ = [−2b, 2b] ∩ Q. Let CE be
the subset of E containing exactly one point from each equivalence class, then
{λ + CE : λ ∈ Λ} is a collection of countably many disjoint sets.
If CE ∈ M, then by Lemma 2.32, we know µ(CE ) = 0. Now for any x ∈ E,
there exists c ∈ CE and q ∈ Q, such that q = x − c. Since CE ⊂ E ⊂ [−b, b],
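we know $q \in [-2b, 2b]$ and hence $q \in \Lambda$. Therefore $x \in \bigcup_{\lambda \in \Lambda} (\lambda + C_E)$. As $x$ is arbitrary, we know $E \subset \bigcup_{\lambda \in \Lambda} (\lambda + C_E)$. However, $0 < \mu^*(E) \le \mu(\bigcup_{\lambda \in \Lambda} (\lambda + C_E)) = \sum_{\lambda \in \Lambda} \mu(\lambda + C_E) = \sum_{\lambda \in \Lambda} \mu(C_E) = 0$, which is a contradiction. Hence $C_E \notin \mathcal{M}$.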
Theorem 2.34. There exist disjoint sets A, B ⊂ R such that µ∗ (A ∪ B) <
µ∗ (A) + µ∗ (B).
Proof. If not, then for any $E \notin \mathcal{M}$ and $T \subset \mathbb{R}$, we have $\mu^*(T) \ge \mu^*(T \cap E) + \mu^*(T \cap E^c)$ since $T \cap E$ and $T \cap E^c$ are disjoint. This implies $E \in \mathcal{M}$ by definition, a contradiction.
Example 2.35. Let E = [0, 1) ⊂ R and CE be the subset of E containing
exactly one point from each equivalence class in E as above. Let Dλ = {x +
λ (mod 1) : x ∈ CE } for every λ ∈ Λ := [0, 1) ∩ Q. Then E = ∪λ∈Λ Dλ is a
countable disjoint union, µ∗ (Dλ ) = µ∗ (CE ) > 0 for all λ ∈ Λ, and
$$1 = \mu^*([0, 1)) = \mu^*(E) = \mu^*\Big(\bigcup_{\lambda \in \Lambda} D_\lambda\Big) < \sum_{\lambda \in \Lambda} \mu^*(D_\lambda) = \infty.$$
3 Measurable Functions
3.1 Extended real numbers
Recall the following properties of the extended real numbers R̄ = R ∪ {±∞}:
1. For any x ∈ R, there is −∞ < x < ∞.
2. For any x ∈ R, there are
4. $(\pm\infty) - (\pm\infty)$ and $(\pm\infty) + (\mp\infty)$ are not defined. We define $0 \cdot (\pm\infty) = 0$ in this course.
Definition 3.1 (Measurable function). Suppose E ⊂ Rn . We call the function
f : E → R̄ measurable, or f is a measurable function on E, if {x ∈ E : f (x) >
t} ∈ M for any t ∈ R.
It turns out that we only need to show {x ∈ E : f (x) > r} ∈ M for all r in
a dense set D of R, for example D = Q.
Theorem 3.2. Suppose f : E → R̄ and D is a dense subset of R. If {x ∈ E :
f (x) > r} ∈ M for any r ∈ D, then f is measurable.
Proof. For any $t \in \mathbb{R}$, there exist $\{r_k\} \subset D$ with $r_k > t$ such that $r_k \downarrow t$. Hence $\{x \in E : f(x) > t\} = \bigcup_{k=1}^\infty \{x \in E : f(x) > r_k\} \in \mathcal{M}$.
3. {x ∈ E : f (x) < t} = E \ {x ∈ E : f (x) ≥ t}.
4. {x ∈ E : f (x) = t} = {x ∈ E : f (x) ≥ t} ∩ {x ∈ E : f (x) ≤ t}.
5. $\{x \in E : f(x) < \infty\} = \bigcup_{k=1}^\infty \{x \in E : f(x) < k\}$.
6. {x ∈ E : f (x) = +∞} = E \ {x ∈ E : f (x) < +∞}.
7. $\{x \in E : f(x) > -\infty\} = \bigcup_{k=1}^\infty \{x \in E : f(x) > -k\}$.
8. {x ∈ E : f (x) = −∞} = E \ {x ∈ E : f (x) > −∞}.
Remarks. We can use any of the first three as the definition of measurable functions.
Theorem 3.5. If f : E1 ∪ E2 → R̄, and f is measurable on E1 and E2 , then f
is measurable on E1 ∪ E2 .
Proof. {x ∈ E1 ∪ E2 : f (x) > t} = {x ∈ E1 : f (x) > t} ∪ {x ∈ E2 : f (x) > t} ∈
M.
Theorem 3.6. If f : E → R̄, and A ⊂ E is measurable, then f is measurable
on A.
Proof. {x ∈ A : f (x) > t} = A ∩ {x ∈ E : f (x) > t} ∈ M.
Example 3.7. Suppose E ∈ M. Then χE : Rn → R̄ is measurable.
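Proof. Note that all three possible sets below are measurable:
$$\{x \in \mathbb{R}^n : \chi_E(x) > t\} = \begin{cases} \mathbb{R}^n & \text{if } t < 0, \\ E & \text{if } 0 \le t < 1, \\ \emptyset & \text{if } t \ge 1. \end{cases}$$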
Corollary 3.10. If $f_k, f : E \to \bar{\mathbb{R}}$, each $f_k$ is measurable, and $f_k(x) \to f(x)$ for every $x \in E$, then $f$ is measurable.
Example 3.11. Suppose that f : E → R̄ is measurable. Define f + (x) =
max(f (x), 0) and f − (x) = max(−f (x), 0) for all x ∈ E. Then f + , f − are
measurable.
which implies that $f_k$ is measurable. Then $\lim_k f_k(x, y) = f(x, y)$ for all $y$ implies that $f(x, y)$ is measurable.
Remarks. Note that this is different from “bounded by M a.e. E”, which is
µ({x ∈ E : |f (x)| > M }) = 0.
Theorem 3.18. If f, g : E → R̄, f = g a.e. E, and f is measurable, then g is
measurable.
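Proof. Let $A = \{x \in E : f(x) = g(x)\}$; then $\mu(E \setminus A) = 0$ and $A \in \mathcal{M}$. Therefore
$$\{x \in E : g(x) > t\} = \{x \in A : f(x) > t\} \cup \{x \in E \setminus A : g(x) > t\} \in \mathcal{M},$$
since the first set is $A \cap \{x \in E : f(x) > t\}$, and the second is a subset of the null set $E \setminus A$ and hence measurable by Example 2.14.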
Example 3.19. Suppose $E \in \mathcal{M}$ and $0 < \mu(E) < \infty$. If $0 < f < \infty$ a.e. $E$, then for any $\delta > 0$, there exist $E_\delta \subset E$ and $K > 0$ such that $\mu(E_\delta) < \delta$ and $\frac{1}{K} \le f(x) \le K$ for all $x \in E \setminus E_\delta$.
Proof. Define $A_k = \{x \in E : \frac{1}{k} \le f(x) \le k\}$. Then $\{A_k\}$ is clearly non-decreasing. Let $A = \bigcup_{k=1}^\infty A_k$. Then $E = A \cup Z_0 \cup Z_1$, where $Z_0 = \{x \in E : f(x) \le 0\}$ and $Z_1 = \{x \in E : f(x) = \infty\}$. Note that $\mu(Z_0) = \mu(Z_1) = 0$. Hence $\mu(E) = \mu(E \setminus (Z_0 \cup Z_1)) = \mu(A) = \mu(\bigcup_{k=1}^\infty A_k) = \lim_k \mu(A_k)$. Since $\mu(E) < \infty$, there exists $K$ such that for $E_\delta = Z_0 \cup Z_1 \cup (E \setminus A_K)$, we have $\mu(E_\delta) < \delta$ and $\frac{1}{K} \le f(x) \le K$ for all $x \in A_K = E \setminus E_\delta$.
Example 3.24. Let gk (x) = fk (x)χB(0,k) (x) where fk is the simple function
in the previous theorem, then gk is also simple and has compact support, and
gk (x) → f (x) for all x ∈ E.
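$$E \setminus E_\delta = \bigcap_{i=1}^\infty \bigcap_{k=j_i}^\infty E_k\Big(\frac{1}{i}\Big)^c = \bigcap_{i=1}^\infty \bigcap_{k=j_i}^\infty \Big\{x \in E : |f_k(x) - f(x)| < \frac{1}{i}\Big\}.$$
Therefore, for any $i$, there exists $j_i$ such that $|f_k(x) - f(x)| < \frac{1}{i}$ for all $x \in E \setminus E_\delta$ and $k \ge j_i$. This means $f_k \Rightarrow f$ on $E \setminus E_\delta$.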
Remarks. A few remarks are in order:
1. The boundedness of $E$ (more precisely, $\mu(E) < \infty$) is necessary. [Hint: consider $f_k(x) = \chi_{[0,k]}(x)$ and $f(x) = 1$ for all $x \in \mathbb{R}$. Or consider $f_k(x) = \frac{x}{k}$ and $f(x) = 0$ for $x \in \mathbb{R}$.]
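2. If $\mu(E) = \infty$, we can still show that for any $M > 0$, there exists $E_M \subset E$ such that $\mu(E_M) > M$ and $f_k \Rightarrow f$ on $E_M$. [Hint: choose any measurable set $F_M \subset E$ with $\mu(F_M) = M + \delta$, and let $E_M = F_M \setminus E_\delta$, where $E_\delta$ is obtained by applying Theorem 3.27 (Egorov) to $F_M$ in place of $E$.]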
3. There exists a sequence of sets $\{E_j\}$ whose measures are non-decreasing to $\mu(E)$, such that $f_k \Rightarrow f$ on $E_j$ for every $j$, and $\mu(E \setminus \bigcup_{j=1}^\infty E_j) = 0$.
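4. We can choose $E_\delta$ such that $E \setminus E_\delta$ is also closed. To this end, first choose $E_{\delta/2}$ such that $\mu(E_{\delta/2}) < \delta/2$ and $f_k \Rightarrow f$ on $E \setminus E_{\delta/2}$, and then choose a closed set $F \subset E \setminus E_{\delta/2}$ such that $\mu((E \setminus E_{\delta/2}) \setminus F) < \delta/2$ (by Theorem 2.26); then $\mu(E \setminus F) < \delta$ and $f_k \Rightarrow f$ on $F$.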
Example 3.28. Let $f_k(x) = x^k$ on $[0, 1]$, which converges pointwise to $f = \chi_{\{1\}}$. Then for any $\delta > 0$ we can show $f_k \Rightarrow f$ on $[0, 1 - \delta]$.
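Since $x^k$ is increasing on $[0, 1 - \delta]$ and $f = 0$ there, the uniform error is $\sup_{[0, 1-\delta]} |x^k - f(x)| = (1 - \delta)^k \to 0$; on $[0, 1)$ the corresponding supremum stays equal to 1. A small Python check on a grid that includes the right endpoint (the grid size is an arbitrary choice):

```python
def sup_error(k, delta, n=10_001):
    """max |x**k - f(x)| over an evenly spaced grid on [0, 1 - delta], where
    f = 0; the last grid point is 1 - delta (up to rounding), where the max occurs."""
    right = 1.0 - delta
    return max(abs((right * i / (n - 1)) ** k) for i in range(n))


delta = 0.1
for k in (1, 10, 50, 100):
    print(k, sup_error(k, delta), (1 - delta) ** k)
```

The printed errors shrink geometrically in $k$, which is the uniform convergence claimed on $[0, 1 - \delta]$.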
• The boundedness of $E$ (more precisely, $\mu(E) < \infty$) in Theorem 3.31 is again necessary: consider $f_k(x) = \chi_{[0,k]}(x)$ and $f(x) = 1$ for all $x \in \mathbb{R}$. Or consider $f_k(x) = \frac{x}{k}$ and $f(x) = 0$ for $x \in \mathbb{R}$.
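Theorem 3.32 (Almost uniform convergence ⇒ convergence in µ). Suppose $f_k, f : E \to \bar{\mathbb{R}}$ and $f_k, f$ are finite a.e. $E$. If for any $\delta > 0$ there exists $E_\delta \subset E$ such that $\mu(E_\delta) < \delta$ and $f_k \Rightarrow f$ on $E \setminus E_\delta$, then $f_k \xrightarrow{\mu} f$.
Proof. For any $\delta > 0$, there is $E_\delta \subset E$ with $\mu(E_\delta) < \delta$ and $f_k \Rightarrow f$ on $E \setminus E_\delta$. Hence for any $\epsilon > 0$, there exists an integer $K$ depending on $\epsilon$ and $\delta$ such that $|f_k(x) - f(x)| < \epsilon$ for all $x \in E \setminus E_\delta$ and $k \ge K$. Therefore $E_k(\epsilon) = \{x \in E : |f_k(x) - f(x)| \ge \epsilon\} \subset E_\delta$, i.e., $\mu(E_k(\epsilon)) \le \mu(E_\delta) < \delta$ for all $k \ge K$. Therefore $\lim_k \mu(E_k(\epsilon)) = 0$, i.e., $f_k \xrightarrow{\mu} f$ on $E$.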
Finally, we show $f_k \xrightarrow{\mu} f$. To this end, for any $\epsilon > 0$, consider
$$\{x : |f_k(x) - f(x)| \ge \epsilon\} \subset \Big\{x : |f_k(x) - f_{k_i}(x)| \ge \frac{\epsilon}{2}\Big\} \cup \Big\{x : |f_{k_i}(x) - f(x)| \ge \frac{\epsilon}{2}\Big\}.$$
Note that the two sets on the right-hand side have measure approaching 0 as $k, i \to \infty$. Hence $\mu(\{x : |f_k(x) - f(x)| \ge \epsilon\}) \to 0$ as $k \to \infty$.
Remarks. It is easy to show the converse of Theorem 3.34. Hence $\{f_k\}$ is Cauchy in measure iff $f_k \xrightarrow{\mu} f$ for some measurable $f$.
Theorem 3.35 (Riesz: Convergence in µ ⇒ ∃ subsequence converges a.e.). If $f_k \xrightarrow{\mu} f$ on $E$, then there exists a subsequence $f_{k_i} \to f$ a.e. $E$.
Proof. From the proof of Theorem 3.34, there exists a subsequence $\{f_{k_i}\}$ of $\{f_k\}$ such that $f_{k_i} \to g$ a.e. $E$ for some $g$ and $f_{k_i} \xrightarrow{\mu} g$. On the other hand, $f_{k_i} \xrightarrow{\mu} f$. Hence $f = g$ a.e. $E$. Therefore $f_{k_i} \to f$ a.e. $E$.
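A classical illustration of Theorem 3.35 (an example added here, not taken from these notes) is the "typewriter" sequence on $[0, 1]$: writing $n = 2^m + j$ with $0 \le j < 2^m$, let $f_n = \chi_{[j/2^m, (j+1)/2^m]}$. Then $\mu(\{|f_n| \ge \epsilon\}) = 2^{-m} \to 0$, so $f_n \xrightarrow{\mu} 0$. However, $f_n(x) = 1$ for infinitely many $n$ at every $x$, while the subsequence $n = 2^m$ converges to 0 at every $x > 0$. A Python sketch of these three facts:

```python
def level_and_offset(n):
    """Write n >= 1 as n = 2**m + j with 0 <= j < 2**m."""
    m = n.bit_length() - 1
    return m, n - 2 ** m


def f(n, x):
    """Typewriter sequence: indicator of [j / 2^m, (j + 1) / 2^m]."""
    m, j = level_and_offset(n)
    return 1 if j / 2 ** m <= x <= (j + 1) / 2 ** m else 0


def support_measure(n):
    """µ({x : |f_n(x)| >= eps}) for any 0 < eps <= 1."""
    m, _ = level_and_offset(n)
    return 2.0 ** (-m)


x = 0.3
# f_n(x) = 1 at least once in every block 2^m <= n < 2^(m+1): no pointwise limit.
hits_per_level = [sum(f(n, x) for n in range(2 ** m, 2 ** (m + 1))) for m in range(12)]
# Along n_m = 2^m, f_{n_m} is the indicator of [0, 2^-m], which is 0 at x once 2^-m < x.
subsequence_values = [f(2 ** m, x) for m in range(12)]
print(hits_per_level)
print(subsequence_values)
print([support_measure(2 ** m) for m in range(5)])
```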
3.6 Measurability of composite functions
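Lemma 3.39. Suppose $f : \mathbb{R}^n \to \mathbb{R}$. Then $f$ is measurable iff $f^{-1}(G) \in \mathcal{M}$ for any open set $G \subset \mathbb{R}$.
Proof. The "if" part is obvious by taking $G = (t, \infty)$. Now suppose $f$ is measurable. Then for any $(a, b) \subset \mathbb{R}$, we know $f^{-1}((a, b)) = f^{-1}((a, \infty)) \setminus f^{-1}([b, \infty)) \in \mathcal{M}$. Recall that any open set $G \subset \mathbb{R}$ can be written as the union of at most countably many disjoint open intervals, $G = \bigcup_k (a_k, b_k)$ (endpoints possibly infinite). Hence $f^{-1}(G) = \bigcup_k f^{-1}((a_k, b_k)) \in \mathcal{M}$.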
Theorem 3.40. Suppose f : R → R is continuous and g : Rn → R is measur-
able, then f ◦ g : Rn → R is measurable.
Proof. Since f is continuous, we know that for any open G, f −1 (G) is open.
Hence (f ◦ g)−1 (G) = g −1 (f −1 (G)) ∈ M.
Remarks. Note that if f is measurable and g is continuous, f ◦ g is not neces-
sarily measurable.
Proof. Define h(x, y) = f (x). Then for any t ∈ R, {(x, y) : h(x, y) > t} = {x :
f (x) > t} × R ∈ M(R2 ). Hence h is measurable. Let T (x, y) = (x − y, x + y)
then T is a linear nondegenerate transformation. Therefore g(x, y) = h(T (x, y))
is measurable.
4 Lebesgue Integrals
4.1 Integral of simple nonnegative functions
Definition 4.1 (Integral of simple nonnegative function). Suppose $f : \mathbb{R}^n \to \bar{\mathbb{R}}_+$ is a simple measurable function such that $f(x) = \sum_{i=1}^p c_i \chi_{A_i}(x)$, where $\{A_i\}$ are mutually disjoint and $\mathbb{R}^n = \bigcup_{i=1}^p A_i$. Then for any $E \in \mathcal{M}$, the integral of $f$ on $E$ is defined by
$$\int_E f(x)\, d\mu(x) = \sum_{i=1}^p c_i\, \mu(E \cap A_i).$$
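In one dimension, with the pieces $A_i$ taken to be intervals (an assumption of this sketch), Definition 4.1 reduces to summing coefficients times interval lengths. A Python sketch that uses only the endpoints (whether an endpoint is included does not affect the measure) and applies the convention $0 \cdot \infty = 0$ from Section 3.1:

```python
import math


def length(a, b):
    """Lebesgue measure of an interval with endpoints a, b (empty if b <= a)."""
    return max(0.0, b - a)


def integral_simple(coeffs, pieces, E):
    """∫_E f dµ = Σ c_i µ(E ∩ A_i) for f = Σ c_i χ_{A_i}, with A_i and E intervals."""
    total = 0.0
    for c, (a, b) in zip(coeffs, pieces):
        if c == 0:
            continue  # convention 0 · ∞ = 0: skip pieces with zero coefficient
        lo, hi = max(a, E[0]), min(b, E[1])
        total += c * length(lo, hi)
    return total


# f = 2 on [0, 1), 5 on [1, 2), 0 elsewhere; the pieces partition R.
coeffs = [0, 2, 5, 0]
pieces = [(-math.inf, 0), (0, 1), (1, 2), (2, math.inf)]
print(integral_simple(coeffs, pieces, (0.5, 1.5)))             # 0.5*2 + 0.5*5 = 3.5
print(integral_simple(coeffs, pieces, (-math.inf, math.inf)))  # 2 + 5 = 7.0
```

Skipping zero coefficients explicitly matters here: in floating point, `0 * math.inf` is `nan`, not 0.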
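Proof. Suppose $f(x) = \sum_{i=1}^p a_i \chi_{A_i}(x)$. Then
$$\lim_{k\to\infty} \int_{E_k} f = \lim_{k\to\infty} \sum_{i=1}^p a_i\, \mu(E_k \cap A_i) = \sum_{i=1}^p a_i\, \mu\Big(\lim_{k\to\infty} (E_k \cap A_i)\Big) = \sum_{i=1}^p a_i\, \mu\Big(\big(\lim_{k\to\infty} E_k\big) \cap A_i\Big) = \sum_{i=1}^p a_i\, \mu(E \cap A_i) = \int_E f,$$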
Theorem 4.9 (Beppo Levi: $f_k \uparrow f \Rightarrow \int f_k \uparrow \int f$). Suppose $f_k : E \to \bar{\mathbb{R}}_+$, $f_k(x) \le f_{k+1}(x)$ for every $x \in E$ and $k$, and $\lim_k f_k(x) = f(x)$ for all $x \in E$. Then $\lim_k \int_E f_k = \int_E f$.
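Proof. It is clear that $\int_E f$ is well defined. Since $f_k \uparrow f$, we know $\int_E f_k \uparrow$ and $\lim_k \int_E f_k \le \int_E f$.
For any nonnegative simple function $h \le f$ and any $c \in (0, 1)$, consider $E_k = \{x \in E: f_k(x) \ge c\,h(x)\}$. Then $E_k \uparrow E$. Hence
$$\lim_{k\to\infty} \int_E f_k \ge \lim_{k\to\infty} \int_{E_k} f_k \ge \lim_{k\to\infty} \int_{E_k} ch = \int_E ch = c \int_E h.$$
Letting $c \to 1$, we have $\lim_k \int_E f_k \ge \int_E h$. Taking the supremum over all nonnegative simple $h \le f$ gives $\lim_k \int_E f_k \ge \int_E f$.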
Remarks. If $\int_E f_1 < \infty$ and $f_k \downarrow f \ge 0$, then $\int_E f_k \downarrow \int_E f$.
Theorem 4.10 (Linearity of integral). Suppose $f, g: E \to \bar{\mathbb{R}}_+$ and $\alpha, \beta \ge 0$. Then $\int_E (\alpha f + \beta g) = \alpha \int_E f + \beta \int_E g$.
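Proof. It suffices to show this for $\alpha = \beta = 1$. Let $f_k \uparrow f$ and $g_k \uparrow g$ be nonnegative simple functions. Then $f_k + g_k \uparrow f + g$. Hence, by Theorem 4.9 (Beppo Levi),
$$\int_E (f + g) = \lim_{k\to\infty} \int_E (f_k + g_k) = \lim_{k\to\infty} \Big\{ \int_E f_k + \int_E g_k \Big\} = \int_E f + \int_E g.$$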
Example 4.11. Suppose $f, g: E \to \bar{\mathbb{R}}_+$ and $f = g$ a.e. $E$. Then $\int_E f = \int_E g$.
[Hint: $f = f\chi_{E_1} + f\chi_{E_2}$, where $E_1 = \{x \in E: f(x) = g(x)\}$ and $E_2 = E \setminus E_1$.]
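Theorem 4.12. Suppose $f_k: E \to \bar{\mathbb{R}}_+$. Then $\int_E \sum_{k=1}^\infty f_k = \sum_{k=1}^\infty \int_E f_k$.
Proof. Let $s_k(x) = \sum_{i=1}^k f_i(x)$ and $s(x) = \sum_{k=1}^\infty f_k(x)$ for every $x$ and $k$. Then $s_k \uparrow s$. Hence, by Theorem 4.10 and Theorem 4.9 (Beppo Levi), $\sum_{i=1}^\infty \int_E f_i = \lim_k \int_E s_k = \int_E s$.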
Corollary 4.13. Suppose $\{E_k\}_k \subset \mathcal{M}$ are mutually disjoint. If $f$ is integrable on $E = \cup_{k=1}^\infty E_k$, then $\int_E f = \sum_{k=1}^\infty \int_{E_k} f$.
Lemma 4.15 (Fatou’s lemma for integrals). Suppose $f_k: E \to \bar{\mathbb{R}}_+$. Then $\int_E \liminf_k f_k \le \liminf_k \int_E f_k$.
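Proof. Define $g_k(x) = \inf_{j \ge k} f_j(x)$ for every $x \in E$ and $k$. Then $g_k$ is nondecreasing in $k$, nonnegative, and $g_k \le f_k$. Hence, by Theorem 4.9 (Beppo Levi),
$$\int_E \liminf_{k\to\infty} f_k = \int_E \lim_{k\to\infty} g_k = \lim_{k\to\infty} \int_E g_k \le \liminf_{k\to\infty} \int_E f_k.$$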
for every $k$. Then $f_k \to 0$ on $[0, 1]$. But $\int_0^1 \lim_k f_k = 0 < 1 = \lim_k \int_0^1 f_k$.
Theorem 4.17. Suppose $f: E \to \bar{\mathbb{R}}_+$ is finite a.e. $E$ and $\mu(E) < \infty$. Partition $[0, \infty)$ by $0 = y_0 < y_1 < \cdots$ with $y_{k+1} - y_k < \delta$ for all $k$, and define $E_k = \{x \in E: y_k \le f(x) < y_{k+1}\}$. Then $f$ is integrable iff $\sum_{k=1}^\infty y_k\, \mu(E_k) < \infty$. Moreover, $\lim_{\delta\to0} \sum_{k=1}^\infty y_k\, \mu(E_k) = \int_E f$.
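Proof. For every $k$, there is $y_k\,\mu(E_k) \le \int_{E_k} f \le y_{k+1}\,\mu(E_k) \le \delta\,\mu(E_k) + y_k\,\mu(E_k)$. Since $f$ is finite a.e., $E \setminus \cup_k E_k$ is null. Summing over $k$ (Theorem 4.12 applied to $f\chi_{E_k}$) therefore gives
$$\sum_{k=1}^\infty y_k\,\mu(E_k) \le \int_E f \le \delta\,\mu(E) + \sum_{k=1}^\infty y_k\,\mu(E_k).$$
By the squeeze theorem, $f$ is integrable iff $\sum_{k=1}^\infty y_k\,\mu(E_k) < \infty$. Taking $\delta \to 0$ completes the proof.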
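Proof. Item 1 follows from Theorem 4.8. Item 2 follows from $f^\pm = 0$ a.e. $E$. Item 3 follows from $0 \le f^\pm \le g$, which holds because $|f| \le g$. Item 4 follows from $f^\pm \le |f|$, $\int |f| < \infty$, and $f_k^\pm \downarrow 0$, where $f_k^\pm := f^\pm \chi_{\mathbb{R}^n \setminus B(0,k)}$.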
Theorem 4.21 (Linearity of integral). Suppose $f, g \in L(E)$. Then for any $c \in \mathbb{R}$, $\int_E cf = c \int_E f$ and $\int_E (f + g) = \int_E f + \int_E g$.
Example 4.22. Suppose $f: [0,1] \to \mathbb{R}$ is measurable. If $\int_0^1 |f(x)| \log(1 + |f(x)|)\, dx < \infty$, then $f \in L([0,1])$.
Proof. Let $E_1 = \{x: |f(x)| \ge e - 1\}$. Then $\log(1 + |f(x)|) \ge 1$, so $|f(x)| \le |f(x)| \log(1 + |f(x)|)$ for all $x \in E_1$. Let $E_2 = [0,1] \setminus E_1$. Then $|f(x)| < e - 1$ on $E_2$. Hence
$$\int_0^1 |f| \le \int_{E_1} |f| \log(1 + |f|) + \int_{E_2} (e - 1) < \infty.$$
Example 4.23 (Jensen’s inequality). Suppose $w: E \to \mathbb{R}_+$ and $\int_E w = 1$. If $\varphi: [a,b] \to \mathbb{R}$ is convex, and $f: E \to [a,b]$ is measurable with $f \in L(E)$, then $\varphi\big(\int_E f w\big) \le \int_E \varphi(f) w$.
Proof. Denote $y_0 = \int_E f w$. Then $a \le y_0 \le b$. Since $\varphi$ is convex, there exists $z \in \mathbb{R}$ such that $\varphi(y) \ge \varphi(y_0) + z(y - y_0)$ for all $y \in [a,b]$. Hence, setting $y = f(x)$, multiplying both sides by $w(x)$, and integrating over $E$, we obtain
$$\int_E \varphi(f(x)) w(x) \ge \int_E w(x) \big[\varphi(y_0) + z (f(x) - y_0)\big] = \varphi(y_0) + z\Big(\int_E f w - y_0\Big) = \varphi(y_0).$$
Proof. Define $g(t) = \int_{E \cap (-\infty, t]} f(x)$. From the theorem above, we know that for any $\epsilon > 0$, there exists $\delta > 0$ such that for any $|\Delta t| < \delta$, there is
$$|g(t + \Delta t) - g(t)| \le \int_{E \cap [t, t + \Delta t)} |f(x)| < \epsilon.$$
Hence $\mu(E_k(\epsilon)) \to 0$ as $k \to \infty$, i.e., $f_k \xrightarrow{\mu} f$.
Remarks. The converse is not true in general. See Example 4.16.
Theorem 4.32 (Bounded convergence theorem). If $|f_k| \le M$, $f_k \to f$ a.e. $E$, and $\mu(E) < \infty$, then $\int_E f_k \to \int_E f$.
Proof. Set $g(x) = M$ and note that $g \in L(E)$ since $\mu(E) < \infty$. Then by the dominated convergence theorem, $\int_E f_k \to \int_E f$.
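Theorem 4.33 (Dominated convergence theorem for $f_k \xrightarrow{\mu} f$). Suppose $f_k \in L(\mathbb{R}^n)$, $f_k \xrightarrow{\mu} f$, and there exists $g \in L(\mathbb{R}^n)$ such that $|f_k| \le g$. Then $f \in L(\mathbb{R}^n)$ and $\int_{\mathbb{R}^n} f_k \to \int_{\mathbb{R}^n} f$.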
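Proof. Since $f_k \xrightarrow{\mu} f$, by Theorem 3.35 there exists a subsequence $f_{k_j} \to f$ a.e. Hence $|f| \le g$ a.e. and $\int_{\mathbb{R}^n} f_{k_j} \to \int_{\mathbb{R}^n} f$. In particular, $f \in L(\mathbb{R}^n)$. It remains to show that $\int_{\mathbb{R}^n} f_k \to \int_{\mathbb{R}^n} f$ for the whole sequence.
For any $\epsilon > 0$, we first choose $R > 0$ large enough such that $2\int_{\mathbb{R}^n \setminus B} g < \epsilon/3$, where $B = B(0, R)$ for short (by Theorem 4.20 Item 4). Then
$$\int_{\mathbb{R}^n \setminus B} |f_k - f| \le 2 \int_{\mathbb{R}^n \setminus B} g < \frac{\epsilon}{3}.$$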
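Example 4.35. For any $\alpha > 1$, show that $\int_0^1 \frac{kx \sin x}{1 + (kx)^\alpha}\, dx \to 0$ as $k \to \infty$.
Proof. Denote $f_k(x) = \frac{kx \sin x}{1 + (kx)^\alpha}$. For $x \in (0, 1]$, we have $|f_k(x)| \le \frac{kx}{1 + (kx)^\alpha} \le \frac{1}{(kx)^{\alpha - 1}} \to 0$, and $f_k(0) = 0$. Moreover, since $\alpha > 1$, $C := \sup_{t \ge 0} \frac{t}{1 + t^\alpha} < \infty$, so $|f_k| \le C \in L([0,1])$. By Theorem 4.28 (DCT), $\lim_k \int_0^1 f_k = \int_0^1 \lim_k f_k = 0$.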
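The decay in Example 4.35 is easy to observe numerically. The minimal sketch below fixes $\alpha = 2$ (an assumption; the example allows any $\alpha > 1$) and approximates $\int_0^1 f_k$ with a midpoint rule for a few values of $k$. The helper names `f_k` and `integral` are hypothetical.

```python
import math


def f_k(x: float, k: float, alpha: float) -> float:
    """The integrand k x sin(x) / (1 + (k x)^alpha) of Example 4.35."""
    return k * x * math.sin(x) / (1.0 + (k * x) ** alpha)


def integral(k: float, alpha: float = 2.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f_k over [0, 1]."""
    dx = 1.0 / n
    return sum(f_k((i + 0.5) * dx, k, alpha) for i in range(n)) * dx


values = [integral(k) for k in (10, 100, 1000)]
print(values)  # decreases toward 0
```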
Example 4.36. For any $a > 0$, show that $\int_a^\infty \frac{k^2 x e^{-k^2 x^2}}{1 + x^2}\, dx \to 0$ as $k \to \infty$.
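Proof. Consider the change of variable $u = kx$. Then
$$\int_a^\infty \frac{k^2 x e^{-k^2x^2}}{1+x^2}\, dx = \int_{ka}^\infty \frac{u e^{-u^2}}{1 + (u/k)^2}\, du = \int_0^\infty \chi_{[ka,\infty)}(u) \frac{u e^{-u^2}}{1 + (u/k)^2}\, du.$$
Denote $f_k(u) = \chi_{[ka,\infty)}(u) \frac{u e^{-u^2}}{1+(u/k)^2}$. Then $f_k(u) \to 0$ for every $u \ge 0$ (since $ka \to \infty$), and $|f_k(u)| \le u e^{-u^2} \in L([0,\infty))$. Hence by DCT, $\lim_k \int_0^\infty f_k = 0$.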
Corollary 4.37. Let $f_k \in L(E)$. If $\sum_{k=1}^\infty \int_E |f_k| < \infty$, then $\sum_{k=1}^\infty f_k(x)$ converges a.e. $E$. Define $f(x) = \sum_{k=1}^\infty f_k(x)$ wherever the series converges. Then $f \in L(E)$ and $\int_E f = \sum_{k=1}^\infty \int_E f_k$.
Proof. Let $s_k(x) = \sum_{i=1}^k |f_i(x)|$ and $s(x) = \sum_{k=1}^\infty |f_k(x)|$ for every $x \in E$. Then $\int_E s = \lim_k \int_E s_k = \sum_{k=1}^\infty \int_E |f_k| < \infty$ by Theorem 4.12. Hence $s \in L(E)$ and $s$ is finite a.e. $E$, so the series converges absolutely a.e. $E$. Define $g_k(x) = \sum_{i=1}^k f_i(x)$. Then $|g_k(x)| \le s(x)$. By Theorem 4.28 (DCT), $\int_E f = \int_E \lim_k g_k = \lim_k \int_E g_k = \sum_{k=1}^\infty \int_E f_k$.
Theorem 4.38 (Interchange derivative and integral). Suppose $f: E \times (a,b) \to \mathbb{R}$, $f(\cdot, y) \in L(E)$ for every $y \in (a,b)$, and $f(x, \cdot)$ is differentiable on $(a,b)$ for every $x \in E$. If there exists $g \in L(E)$ such that $|\partial_y f(x,y)| \le g(x)$ for any $(x,y) \in E \times (a,b)$, then for any $y \in (a,b)$, there is
$$\frac{d}{dy} \int_E f(x,y)\, d\mu(x) = \int_E \partial_y f(x,y)\, d\mu(x).$$
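Proof. For any $y \in (a,b)$, consider any nonzero sequence $e_k \to 0$ with $y + e_k \in (a,b)$, and define $f_k(x) = \frac{1}{e_k}\big(f(x, y + e_k) - f(x, y)\big)$ for every $x \in E$. By the mean value theorem, there exists $\xi_k$ between $y$ and $y + e_k$ (depending on $x$) such that $|f_k(x)| = |\partial_y f(x, \xi_k)| \le g(x)$, where $g \in L(E)$. Note that $\lim_k f_k(x) = \partial_y f(x,y)$. Hence by Theorem 4.28 (DCT),
$$\lim_{k\to\infty} \frac{1}{e_k}\Big(\int_E f(x, y+e_k)\, d\mu(x) - \int_E f(x,y)\, d\mu(x)\Big) = \lim_{k\to\infty} \int_E f_k = \int_E \lim_{k\to\infty} f_k = \int_E \partial_y f(x,y)\, d\mu(x).$$
Since the sequence $\{e_k\}$ is arbitrary, the derivative exists and the claimed identity holds.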
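Here is a small numerical illustration of Theorem 4.38 on an example of our own. Take $E = [0,1]$ and $f(x,y) = \sin(xy)$, so $|\partial_y f(x,y)| = |x\cos(xy)| \le 1 =: g(x) \in L(E)$. The sketch compares three quantities at $y = 1$: a central difference quotient of $y \mapsto \int_E f(x,y)\,dx$, the integral $\int_E \partial_y f(x,1)\,dx$, and its closed form $\sin 1 + \cos 1 - 1$. The quadrature size and step $h$ are assumptions.

```python
import math


def midpoint(g, a: float, b: float, n: int = 10_000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx


def F(y: float) -> float:
    """F(y) = integral over [0, 1] of sin(x y) dx."""
    return midpoint(lambda x: math.sin(x * y), 0.0, 1.0)


def dF_under_integral(y: float) -> float:
    """Integral over [0, 1] of the y-derivative of sin(x y), i.e. of x cos(x y)."""
    return midpoint(lambda x: x * math.cos(x * y), 0.0, 1.0)


y, h = 1.0, 1e-4
difference_quotient = (F(y + h) - F(y - h)) / (2 * h)
exact = math.sin(1.0) + math.cos(1.0) - 1.0  # closed form of the integral of x cos(x) over [0, 1]
print(difference_quotient, dF_under_integral(y), exact)
```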
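Proof. For any $\epsilon > 0$, there exist a compact set $K$ such that $\int_{E \setminus K} |f| < \epsilon/4$ and a simple function $\tilde h$ with $\mathrm{supp}(\tilde h) \subset K$ such that $\int_K |f - \tilde h| < \epsilon/4$. Define $h = \tilde h \chi_K$. Then
$$\int_E |f - h| = \int_E |f - \tilde h \chi_K| = \int_K |f - \tilde h| + \int_{E \setminus K} |f| < \frac{\epsilon}{2}.$$
Let $M > 0$ be such that $|h| \le M$. By Theorem 3.36 (Lusin), for $\delta = \epsilon/(4M)$, there exist a closed set $F \subset K$ and a continuous function $g$ with $|g| \le M$ and $\mathrm{supp}(g) \subset K$, such that $\mu(K \setminus F) < \delta = \epsilon/(4M)$ and $h|_F = g|_F$. Then
$$\int_E |h - g| = \int_K |h - g| = \int_{K \setminus F} |h - g| \le 2M\mu(K \setminus F) < 2M \cdot \frac{\epsilon}{4M} = \frac{\epsilon}{2}.$$
Hence $\int_E |f - g| \le \int_E |f - h| + \int_E |h - g| < \epsilon$.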
Corollary 4.41. Suppose $f \in L(E)$. Then there exists a sequence of continuous functions $\{g_k\}$, each with bounded support, such that $\int_E |g_k - f| \to 0$ and $g_k \to f$ a.e. $E$.
Proof. The first claim follows from the theorem above (take $\epsilon = 1/k$). Since $\int_E |g_k - f| \to 0$, we know $g_k \xrightarrow{\mu} f$. Hence, by Theorem 3.35, there exists a subsequence of $\{g_k\}$, still denoted by $\{g_k\}$, such that $g_k \to f$ a.e. $E$.
have
$$0 < c\,\mu(E) \le \int f \chi_E = \int \lim_{k\to\infty} f \phi_k = \lim_{k\to\infty} \int f \phi_k = 0,$$
where we applied Theorem 4.28 (DCT) to obtain the second equality. Contradiction.
Example 4.43. Suppose $f \in L([a,b])$. If $\int_a^b f \phi' = 0$ for any differentiable function $\phi$ with $\mathrm{supp}(\phi) \subset (a,b)$, then $f \equiv c$ a.e. for some constant $c$.
Proof. For any continuous function $g$ with $\mathrm{supp}(g) \subset (a,b)$ and continuous function $h$ with $\mathrm{supp}(h) \subset (a,b)$ and $\int_a^b h = 1$, define
$$\phi(x) = \int_a^x g(t)\, dt - \int_a^b g(t)\, dt \cdot \int_a^x h(t)\, dt.$$
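Note that $\mathrm{supp}(\phi) \subset (a,b)$. Then $\phi'(x) = g(x) - \int_a^b g(t)\, dt \cdot h(x)$ and
$$0 = \int_a^b f \phi' = \int_a^b f g - \int_a^b f h \int_a^b g = \int_a^b \Big(f(x) - \int_a^b f h\Big) g(x)\, dx.$$
Since $g$ is arbitrary, we have $f(x) = \int_a^b f h$ a.e., where $h$ is any fixed continuous function with $\mathrm{supp}(h) \subset (a,b)$ and $\int_a^b h = 1$. Hence $f \equiv c$ a.e. with $c = \int_a^b f h$.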
Theorem 4.44. If $f \in L(\mathbb{R}^n)$, then $\lim_{h\to0} \int_{\mathbb{R}^n} |f(x+h) - f(x)|\, dx = 0$.
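Then the Darboux upper and lower integrals are defined by the two limits below as $|\Delta^{(n)}| \to 0$ and $n \to \infty$:
$$\overline{\int_a^b} f = \lim_{|\Delta^{(n)}|\to0} \sum_{i=1}^{k_n} M_i^{(n)} \big(x_i^{(n)} - x_{i-1}^{(n)}\big), \qquad \underline{\int_a^b} f = \lim_{|\Delta^{(n)}|\to0} \sum_{i=1}^{k_n} m_i^{(n)} \big(x_i^{(n)} - x_{i-1}^{(n)}\big).$$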
Consider a sequence of partitions $\{\Delta^{(n)}: n \in \mathbb{N}\}$, where $\Delta^{(n+1)}$ is a refinement of $\Delta^{(n)}$ for every $n$ (i.e., $\Delta^{(n+1)}$ retains all the partition points in $\Delta^{(n)}$ and may add new points). Define
$$w_n(x) = \sum_{i=1}^{k_n} \big(M_i^{(n)} - m_i^{(n)}\big) \chi_{[x_{i-1}^{(n)}, x_i^{(n)})}(x) \ge 0.$$
Then $w_{n+1}(x) \le w_n(x)$ for all $n$ and $x$. Suppose $|f| \le M$ for some $M > 0$. Then $|w_n(x)| \le 2M$, and hence $\int_a^b w_n \le 2M(b - a)$.
We also define the oscillation of the function f at every point x ∈ I by
4.5 Iterated integrals
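We denote by F the set of all nonnegative measurable functions $f: \mathbb{R}^n = \mathbb{R}^p \times \mathbb{R}^q \to \mathbb{R}_+$ that satisfy the following three properties:
1. For a.e. $x \in \mathbb{R}^p$, $f(x, \cdot) \ge 0$ is measurable on $\mathbb{R}^q$.
2. Let $F_f(x) := \int_{\mathbb{R}^q} f(x,y)\, dy$. Then $F_f$ is measurable and $F_f \ge 0$ a.e. $\mathbb{R}^p$.
3. There is $\int_{\mathbb{R}^p} F_f\, dx = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} f(x,y)\, dy\, dx = \int_{\mathbb{R}^n} f\, dx\, dy$.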
Lemma 4.50. The following statements hold for the set F:
(i) If $f \in F$ and $a \ge 0$, then $af \in F$.
(ii) If $f_1, f_2 \in F$, then $f_1 + f_2 \in F$.
(iii) If $f, g \in F$, $f - g \ge 0$, and $g \in L(\mathbb{R}^n)$, then $f - g \in F$.
(iv) If $f_k \in F$, $f_k \le f_{k+1}$ for all $k$, and $\lim_k f_k = f$, then $f \in F$.
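Proof. It is trivial to verify (i) and (ii). For (iii), since $g \in F$ and $g \in L(\mathbb{R}^n)$, we know $F_g$ is finite a.e. $\mathbb{R}^p$. For every $x \in \mathbb{R}^p$ where $F_g(x) < \infty$, we know $g(x, \cdot) \in L(\mathbb{R}^q)$, and hence $g(x, \cdot)$ is finite a.e. $\mathbb{R}^q$. Hence $g(x,y)$ is finite a.e. $\mathbb{R}^n$. Then it is easy to verify the three properties for $f - g$, which implies $f - g \in F$.
For (iv), it is easy to verify Property 1 of F for $f$. By Theorem 4.9 (Beppo Levi), for a.e. $x \in \mathbb{R}^p$,
$$F_f(x) = \int_{\mathbb{R}^q} f(x,y)\, dy = \lim_{k\to\infty} \int_{\mathbb{R}^q} f_k(x,y)\, dy = \lim_{k\to\infty} F_{f_k}(x).$$
So $F_f$ is measurable as an a.e. limit of measurable functions, and $F_{f_k} \uparrow F_f$ a.e. Hence
$$\int_{\mathbb{R}^p} F_f\, dx = \lim_{k\to\infty} \int_{\mathbb{R}^p} F_{f_k}\, dx = \lim_{k\to\infty} \int_{\mathbb{R}^n} f_k = \int_{\mathbb{R}^n} f,$$
where we used Theorem 4.9 (Beppo Levi) to obtain the first equality, Property 3 of $f_k \in F$ for the second equality, and Beppo Levi again for the last equality. This verifies Properties 2 and 3 of $f$. Hence $f \in F$.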
Theorem 4.51 (Tonelli). If $f: \mathbb{R}^n = \mathbb{R}^p \times \mathbb{R}^q \to \mathbb{R}_+$ is measurable, then $f \in F$.
Proof. Every nonnegative measurable function is the limit of a nondecreasing sequence of nonnegative simple functions. By this fact and Lemma 4.50(iv), it suffices to prove the case where $f$ is simple. Due to Lemma 4.50(i) and (ii), it then suffices to prove $\chi_E \in F$ where $E$ is measurable. As $E$ can be written as a disjoint union of an $F_\sigma$-set $K$ and a measure zero set $Z$, we prove the claim in the following steps.
Firstly, it is easy to verify that $\chi_E \in F$ if $E$ is a (possibly) half-open half-closed box (an open box plus some of its $2n$ facets) in $\mathbb{R}^n$.
Secondly, for any open $G \subset \mathbb{R}^n$, we can rewrite $G$ as a disjoint union $G = \cup_{k=1}^\infty I_k$, where each $I_k$ is a half-open half-closed box. Let $E_k = \cup_{j=1}^k I_j$. Then $\chi_{E_k} \in F$ by Lemma 4.50(ii), and $\chi_G \in F$ follows from $\chi_{E_k} \uparrow \chi_G$ and Lemma 4.50(iv).
Thirdly, we show that $\chi_F \in F$ if $F \subset \mathbb{R}^n$ is bounded and closed. To this end, we first note that $F \subset G_1 := B(0, k)$ for some $k \in \mathbb{N}$. Hence $G_2 = G_1 \setminus F = G_1 \cap F^c$ is open. Since $\chi_{G_1} - \chi_{G_2} \ge 0$ and $\chi_{G_2} \in L(\mathbb{R}^n)$, we know by Lemma 4.50(iii) that $\chi_F = \chi_{G_1} - \chi_{G_2} \in F$.
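Fourthly, we show that if $E_k \downarrow E$, $\mu(E_1) < \infty$, and $\chi_{E_k} \in F$ for all $k$, then $\chi_E \in F$. To this end, let $D_k = E_1 \setminus E_k$ and $D = E_1 \setminus E$. Then $0 \le \chi_{D_k} \uparrow \chi_D$ and $\chi_D \in L(\mathbb{R}^n)$ (since $\mu(E_1 \setminus E) \le \mu(E_1) < \infty$). Since $\chi_{E_k} \in L(\mathbb{R}^n)$, Lemma 4.50(iii) gives $\chi_{D_k} = \chi_{E_1} - \chi_{E_k} \in F$, and hence $\chi_D \in F$ by Lemma 4.50(iv). Therefore $\chi_E = \chi_{E_1} - \chi_D \in F$ by Lemma 4.50(iii).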
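Fifthly, we show that if $\mu(E) = 0$ then $\chi_E \in F$. To this end, consider a non-increasing sequence of open sets $G_k$ such that $E \subset G_k$, $\mu(G_1) < \infty$, and $\mu(G_k) \to 0$. Let $H = \cap_{k=1}^\infty G_k$. Then $E \subset H$ and $\mu(H) = 0$. From the second and fourth steps above, we know $\chi_H \in F$. Hence $\int_{\mathbb{R}^p} \int_{\mathbb{R}^q} \chi_H(x,y)\, dy\, dx = \int_{\mathbb{R}^n} \chi_H = 0$, so $\chi_H(x, \cdot) = 0$ a.e. $\mathbb{R}^q$ for a.e. $x \in \mathbb{R}^p$. As $0 \le \chi_E \le \chi_H$, we know $\chi_E$ satisfies all three properties of F, and hence $\chi_E \in F$.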
Sixthly, we show that if $K$ is an $F_\sigma$-set and $\mu(K) < \infty$, then $\chi_K \in F$. Suppose $K = \cup_{k=1}^\infty F_k$, where $F_k$ is closed and bounded for all $k$. Let $D_k = S_k \setminus S_{k-1}$, where $S_k := \cup_{j=0}^k F_j$ (assume $F_0 = \emptyset$). Both $S_k$ and $S_{k-1}$ are closed bounded sets. Hence, by Lemma 4.50(iii) and the third step above, $\chi_{D_k} = \chi_{S_k} - \chi_{S_{k-1}} \in F$. Therefore, by Lemma 4.50(ii), $\chi_{\cup_{j=1}^k D_j} = \chi_{S_k} \in F$. Since $S_k \uparrow K$, we know $\chi_K \in F$ by Lemma 4.50(iv).
Finally, let $E = K \cup Z$, where $K$ is an $F_\sigma$-set, $Z$ is a measure zero set, and $K \cap Z = \emptyset$. Then $\chi_E = \chi_K + \chi_Z \in F$ by Lemma 4.50(ii).
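Theorem 4.52 (Fubini). If $f \in L(\mathbb{R}^n)$, then the following statements hold:
1. For a.e. $x \in \mathbb{R}^p$, $f(x, \cdot) \in L(\mathbb{R}^q)$.
2. Let $F_f(x) = \int_{\mathbb{R}^q} f(x,y)\, dy$. Then $F_f \in L(\mathbb{R}^p)$.
3. There is $\int_{\mathbb{R}^n} f = \int_{\mathbb{R}^p} \int_{\mathbb{R}^q} f(x,y)\, dy\, dx = \int_{\mathbb{R}^q} \int_{\mathbb{R}^p} f(x,y)\, dx\, dy$.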
Proof. Let $f = f^+ - f^-$. Since $f \in L(\mathbb{R}^n)$, we know $f^\pm \in L(\mathbb{R}^n)$. From Theorem 4.51 (Tonelli) we know $f^\pm \in F$. This implies the claims, since all integrals involved are finite.
Example 4.53. Suppose $f \in L([0, \infty))$ and $a > 0$. Then
$$\int_0^\infty \Big(\int_0^\infty f(y) e^{-xy}\, dy\Big) \sin(ax)\, dx = a \int_0^\infty \frac{f(y)}{a^2 + y^2}\, dy.$$
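Proof. Note that for any fixed $y > 0$, there is $\int_0^\infty e^{-xy} \sin(ax)\, dx = \frac{a}{a^2 + y^2}$ (apply integration by parts twice). Hence we just need to justify interchanging the order of integration via Theorem 4.52 (Fubini). Since $\sin(ax) f(y) e^{-xy}$ need not be integrable on $[0,\infty)^2$, we first work on truncated domains.
Note that $|\sin(ax) f(y) e^{-xy}| \le |f(y)|$, which is integrable on $[\delta, X] \times (0, \infty)$ for any $0 < \delta < X < \infty$ (its integral there is $(X - \delta)\int_0^\infty |f(y)|\, dy$). Hence, by Theorem 4.52 (Fubini) on $[\delta, X] \times (0,\infty)$, there is
$$\int_\delta^X \Big(\int_0^\infty f(y) e^{-xy}\, dy\Big) \sin(ax)\, dx = \int_\delta^X \int_0^\infty \sin(ax) f(y) e^{-xy}\, dy\, dx = \int_0^\infty f(y) \Big(\int_\delta^X e^{-xy} \sin(ax)\, dx\Big)\, dy.$$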
For any fixed $y > 0$, $e^{-xy}$ is decreasing in $x$. Hence the second mean value theorem for integrals implies that there exists $c \in (\delta, X)$ (depending on $y$) such that
$$\int_\delta^X e^{-xy} \sin(ax)\, dx = e^{-\delta y} \int_\delta^c \sin(ax)\, dx + e^{-Xy} \int_c^X \sin(ax)\, dx.$$
where the second equality is due to Theorem 4.52 (Fubini) on $[\delta_k, X_k] \times (0, \infty)$, the third equality is due to the definition of $G_k$, and the last equality is due to Theorem 4.28 (DCT) applied to the sequence $G_k(y)$.
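Example 4.54. Show that $\int_0^\infty e^{-x^2}\, dx = \frac{\sqrt{\pi}}{2}$.
Proof. Consider $\int_0^\infty \int_0^\infty y e^{-y^2 x^2} e^{-y^2}\, dx\, dy$. By Theorem 4.51 (Tonelli) and the substitution $u = yx$ in the inner integral, we have
$$\int_0^\infty \int_0^\infty y e^{-y^2x^2} e^{-y^2}\, dx\, dy = \int_0^\infty e^{-y^2} \Big(\int_0^\infty y e^{-y^2x^2}\, dx\Big) dy = \int_0^\infty e^{-y^2}\, dy \cdot \int_0^\infty e^{-u^2}\, du = \Big(\int_0^\infty e^{-y^2}\, dy\Big)^2.$$
On the other hand, integrate first in $y$ (again by Theorem 4.51). Since $\int_0^\infty y e^{-y^2(1+x^2)}\, dy = \frac{1}{2(1+x^2)}$, the same double integral equals $\int_0^\infty \frac{dx}{2(1+x^2)} = \frac{\pi}{4}$. Hence $\int_0^\infty e^{-x^2}\, dx = \frac{\sqrt{\pi}}{2}$.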
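The value in Example 4.54 can be confirmed numerically. This is a minimal sketch: truncating $[0,\infty)$ at $10$ and the number of midpoint nodes are our assumptions. The computed integral should match $\sqrt{\pi}/2$. Its square should match $\pi/4$, which is what the double integral gives when the $y$-integral is taken first, since $\int_0^\infty y e^{-y^2(1+x^2)}\,dy = \frac{1}{2(1+x^2)}$.

```python
import math


def midpoint(g, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx


# Truncate [0, infinity) at 10: the neglected tail is smaller than e^{-100}.
gauss = midpoint(lambda x: math.exp(-x * x), 0.0, 10.0)
print(gauss, math.sqrt(math.pi) / 2)
print(gauss ** 2, math.pi / 4)
```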
4.6 Convolution
Definition 4.55 (Convolution). Suppose $f, g$ are measurable on $E \subset \mathbb{R}^n$. We call $\int_{\mathbb{R}^n} f(x - y) g(y)\, dy$, a function of $x$, the convolution of $f$ and $g$, denoted by $f * g$ or $(f * g)(x)$, if the integral exists for every $x \in E$. Note that $f * g = g * f$ (substitute $y \mapsto x - y$).
Proof. Suppose $M > 0$ is such that $|g| \le M$ a.e. Since $f \in L(\mathbb{R}^n)$, by Theorem 4.44, for any $\epsilon > 0$ there exists $\delta > 0$ such that $\int_{\mathbb{R}^n} |f(x+h) - f(x)|\, dx < \epsilon/M$ for all $|h| \le \delta$. Hence,
$$|F(x+h) - F(x)| \le \int_{\mathbb{R}^n} |f(x+h-y) - f(x-y)|\, |g(y)|\, dy \le M \int_{\mathbb{R}^n} |f(z+h) - f(z)|\, dz < M \cdot \frac{\epsilon}{M} = \epsilon$$
for every $x \in \mathbb{R}^n$.
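The estimate just proved shows that $F = f * g$ is uniformly continuous whenever $f \in L(\mathbb{R}^n)$ and $g$ is bounded, even if neither factor is continuous. Here is a minimal sketch with our own choice $f = g = \chi_{[0,1]}$ on $\mathbb{R}$. The convolution is then the continuous tent function $\max(0, 1 - |x - 1|)$, which we compare with a midpoint-rule evaluation of $\int f(x-y)g(y)\,dy$. The helper names are hypothetical.

```python
def indicator(x: float) -> float:
    """chi_[0,1](x): integrable and bounded, but discontinuous."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve_at(f, g, x: float, lo: float = -1.0, hi: float = 2.0, n: int = 30_000) -> float:
    """Midpoint-rule approximation of (f * g)(x), integrating over y in [lo, hi], which contains supp(g)."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        total += f(x - y) * g(y)
    return total * dy


def tent(x: float) -> float:
    """Closed form of (chi_[0,1] * chi_[0,1])(x)."""
    return max(0.0, 1.0 - abs(x - 1.0))


for x in (-0.5, 0.25, 1.0, 1.5, 2.5):
    print(x, convolve_at(indicator, indicator, x), tent(x))
```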
5 Signed Measures and Differentiations
5.1 Signed measure and decomposition
Definition 5.1 (Signed measure). Suppose $(X, \mathcal{M})$ is a measurable space. A function $\nu: \mathcal{M} \to \bar{\mathbb{R}}$ is called a signed measure if (i) $\nu(\emptyset) = 0$; (ii) $\nu$ can take $\infty$ or $-\infty$ but not both; (iii) countable additivity: $\nu(\cup_{k=1}^\infty E_k) = \sum_{k=1}^\infty \nu(E_k)$ for any countable family of mutually disjoint sets $\{E_k\} \subset \mathcal{M}$.
Note that if in addition ν(E) ≥ 0 for any E ∈ M, then ν is called a positive
measure, or simply measure.
$$\nu(E) = \nu\Big(E \cap \big(\cup_{k=1}^\infty Q_k\big)\Big) = \sum_{k=1}^\infty \nu(E \cap Q_k) \ge 0,$$
since $E \cap Q_k \subset P_k$.
Theorem 5.5 (Hahn Decomposition Theorem). If $\nu$ is a signed measure on $(X, \mathcal{M})$, then there exist a $\nu$-positive set $P$ and a $\nu$-negative set $N$ such that $X = P \cup N$ and $P \cap N = \emptyset$. If $P'$ and $N'$ is another such pair, then $P \triangle P'$ and $N \triangle N'$ are $\nu$-null. The pair $(P, N)$ is called a Hahn decomposition of $\nu$.
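Proof. (i) WLOG, assume that $\nu: \mathcal{M} \to \mathbb{R} \cup \{-\infty\}$. Consider the family of $\nu$-positive sets $\mathcal{P} = \{P \in \mathcal{M}: P \text{ is } \nu\text{-positive}\}$. Define $m = \sup\{\nu(P): P \in \mathcal{P}\}$. Then there exists a sequence $\{P_k\} \subset \mathcal{P}$ such that $\lim_k \nu(P_k) = m < \infty$. Define $P = \cup_{k=1}^\infty P_k$ and $N = X \setminus P$. Then, by Lemma 5.4, $P$ is $\nu$-positive, and $\nu(P) = m$.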
(ii) Now we only need to show that $N$ is $\nu$-negative. First of all, if $E \subset N$ and $\nu(E) > 0$, then $E$ cannot be $\nu$-positive. Otherwise $E \cap P \subset N \cap P = \emptyset$ and $E \cup P$ is $\nu$-positive, and hence $\nu(E \cup P) = \nu(E) + \nu(P) > m$, a contradiction.
Next, for any $E \subset N$ with $\nu(E) > 0$, there exists $B \subset E$ such that $\nu(B) < 0$ (since $E$ is not $\nu$-positive). Letting $A = E \setminus B$, we have $A \subset E$ and $\nu(A) = \nu(E) - \nu(B) > \nu(E)$.
Now we are ready to show that $N$ is $\nu$-negative. If not, then there exists $A_0 \subset N$ such that $\nu(A_0) > 0$. Choose the smallest $n_1 \in \mathbb{N}$ such that there exists $A_1 \subset A_0$ that satisfies $\nu(A_1) \ge \nu(A_0) + \frac{1}{n_1} > \nu(A_0)$. Then choose the smallest $n_2 \in \mathbb{N}$ such that there exists $A_2 \subset A_1$ with $\nu(A_2) \ge \nu(A_1) + \frac{1}{n_2} > \nu(A_1)$, and so on. Note that $A_k$ is decreasing, $\nu(A_k)$ is increasing, and $\nu(A_k) \ge \nu(A_{k-1}) + \frac{1}{n_k}$ for all $k$. Let $A = \cap_{k=1}^\infty A_k = \lim_k A_k$. Then $\infty > \nu(A) = \lim_k \nu(A_k) \ge \nu(A_0) + \sum_k \frac{1}{n_k} > 0$. Therefore $n_k \to \infty$ as $k \to \infty$. Since $A \subset N$ and $\nu(A) > 0$, there again exist $B \subset A$ and $n \in \mathbb{N}$ such that $\nu(B) \ge \nu(A) + \frac{1}{n} > \nu(A)$. Therefore, there exists $k$ large enough such that $n_k > n$. But $B \subset A \subset A_{k-1}$ and $\nu(B) \ge \nu(A) + \frac{1}{n} \ge \nu(A_{k-1}) + \frac{1}{n}$, which contradicts the construction of $n_k$ (as the smallest such integer) and $A_k$. Hence $N$ must be $\nu$-negative.
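(iii) If $P'$ and $N'$ is another such pair, then $P \setminus P' \subset P$ and $P \setminus P' \subset N'$. Hence $P \setminus P'$ is both $\nu$-positive and $\nu$-negative, and therefore is $\nu$-null. By symmetry, $P' \setminus P$ is also $\nu$-null, and so is $P \triangle P'$. Note that $N \triangle N' = P \triangle P'$ is therefore also $\nu$-null.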
Definition 5.6 (Mutually singular measures). Two signed measures $\mu$ and $\nu$ are called mutually singular, denoted by $\mu \perp \nu$, if there exists $E \in \mathcal{M}$ such that $E$ is $\nu$-null and $E^c$ is $\mu$-null.
5.2 Radon-Nikodym theorem
Definition 5.10 (Absolute continuity). We say a signed measure ν is absolutely
continuous with respect to a positive measure µ, denoted by ν ≪ µ, if ν(E) = 0
for any $E \in \mathcal{M}$ with $\mu(E) = 0$. Note that $\nu \ll \mu$ iff $\nu^\pm \ll \mu$ iff $|\nu| \ll \mu$.
Theorem 5.11. If ν ⊥ µ and ν ≪ µ, then ν = 0.
Proof. Let $E$ be such that $E$ is $\mu$-null and $E^c$ is $\nu$-null. Then $E$ is also $\nu$-null since $\nu \ll \mu$. Hence $X = E \cup E^c$ is $\nu$-null, i.e., $\nu = 0$.
Theorem 5.12. Suppose $\nu$ is a finite signed measure and $\mu$ is a measure. Then $\nu \ll \mu$ iff for any $\epsilon > 0$ there exists $\delta > 0$ such that $|\nu(E)| < \epsilon$ for all $E$ satisfying $\mu(E) < \delta$.
Proof. Since $\nu \ll \mu$ iff $|\nu| \ll \mu$, and $|\nu(E)| \le |\nu|(E)$, we only need to show this for a finite positive measure $\nu$. Sufficiency is trivial. To prove necessity, assume to the contrary that there exists $\epsilon_0 > 0$ such that for any $k \in \mathbb{N}$ there is $E_k$ with $\nu(E_k) \ge \epsilon_0$ and $\mu(E_k) < 2^{-k}$. Let $F_k = \cup_{i=k}^\infty E_i$ and $F = \cap_{k=1}^\infty F_k$. Then $\mu(F_k) \le 2^{1-k}$ and $\mu(F) = \lim_k \mu(F_k) = 0$. However, $\nu(F_k) \ge \nu(E_k) \ge \epsilon_0$ for all $k$. Since $\nu$ is finite, this implies $\nu(F) = \lim_k \nu(F_k) \ge \epsilon_0$, which contradicts $\nu \ll \mu$.
Lemma 5.13. Suppose $\nu$ and $\mu$ are finite measures on $\mathcal{M}$. Then either $\nu \perp \mu$, or there exist $\epsilon > 0$ and $E \in \mathcal{M}$ with $\mu(E) > 0$ such that $E$ is $(\nu - \epsilon\mu)$-positive.
Proof. For every $k \in \mathbb{N}$, consider the signed measure $\nu - k^{-1}\mu$ with a Hahn decomposition $X = P_k \cup N_k$. Let $P = \cup_{k=1}^\infty P_k$ and $N = \cap_{k=1}^\infty N_k = X \setminus P$. Note that $N$ is $(\nu - k^{-1}\mu)$-negative for all $k$, and hence $0 \le \nu(N) \le k^{-1}\mu(N) \to 0$. Hence $N$ is $\nu$-null.
If $P$ is $\mu$-null, then $\nu \perp \mu$ and we are done. Otherwise, $\mu(P) > 0$, and hence there exists $k$ such that $\mu(P_k) > 0$, and $P_k$ is $(\nu - k^{-1}\mu)$-positive. Taking $E = P_k$ and $\epsilon = k^{-1}$ completes the proof.
Theorem 5.14 (Radon-Nikodym). Suppose ν is a σ-finite signed measure and
µ is a σ-finite measure. Then there exist unique σ-finite signed measures λ and
ρ, such that λ ⊥ µ, ρ ≪ µ, and ν = λ + ρ.
Proof. (i) We first consider the case where both $\mu$ and $\nu$ are finite positive measures. Define
$$\mathcal{F} = \Big\{f: X \to [0, \infty] \text{ measurable} : \int_E f\, d\mu \le \nu(E),\ \forall E \in \mathcal{M}\Big\}$$
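and $f(x) = \sup_k f_k(x)$. Note that $g_k \in \mathcal{F}$ and $g_k \uparrow f$. By Theorem 4.9 (Beppo Levi), $\int_E f = \lim_k \int_E g_k \le \nu(E)$ for all $E \in \mathcal{M}$, and hence $f \in \mathcal{F}$. In particular, $\int_X f \le m$. On the other hand, $m = \lim_k \int_X f_k \le \lim_k \int_X g_k = \int_X f$. Hence $\int_X f = m$.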
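Now we claim that $\lambda$, defined by $\lambda(E) = \nu(E) - \int_E f\, d\mu$ for any $E \in \mathcal{M}$ (we write this as $d\lambda = d\nu - f\, d\mu$ for short), satisfies $\lambda \perp \mu$. If not, then by Lemma 5.13, there exist $\epsilon > 0$ and $A$ such that $\mu(A) > 0$ and $A$ is $(\lambda - \epsilon\mu)$-positive. Then for any $E \in \mathcal{M}$, there is
$$\int_E (f + \epsilon\chi_A)\, d\mu = \int_{E \cap A^c} f\, d\mu + \int_{E \cap A} (f + \epsilon\chi_A)\, d\mu \le \nu(E \cap A^c) + \nu(E \cap A) = \nu(E).$$
So $f + \epsilon\chi_A \in \mathcal{F}$. But $\int_X (f + \epsilon\chi_A)\, d\mu = m + \epsilon\mu(A) > m$, a contradiction.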
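On a finite set $X$ with $\mathcal{M} = 2^X$, the objects of this section can be computed by hand, which may help build intuition. The sketch below is our own discrete illustration, with hypothetical point masses and helper names. It returns a Hahn decomposition of a signed measure given by point masses. It also splits $\nu = \lambda + \rho$ with $\lambda \perp \mu$ (carried by the $\mu$-null points) and $\rho \ll \mu$, whose density is $d\rho/d\mu(x) = \nu(\{x\})/\mu(\{x\})$ where $\mu(\{x\}) > 0$.

```python
from fractions import Fraction


def hahn_decomposition(nu: dict) -> tuple[set, set]:
    """Hahn decomposition of a signed measure on a finite set, given by point masses nu[x].

    P collects the points of nonnegative mass and N the rest, so every subset of P has
    nonnegative measure and every subset of N has negative or zero measure. Where the
    zero-mass points go is exactly the non-uniqueness allowed by Theorem 5.5.
    """
    P = {x for x, m in nu.items() if m >= 0}
    N = set(nu) - P
    return P, N


def lebesgue_decomposition(nu: dict, mu: dict):
    """Split nu = lam + rho with lam singular to mu and rho << mu; also return d(rho)/d(mu)."""
    lam = {x: (nu[x] if mu[x] == 0 else 0) for x in nu}  # lives on the mu-null points
    rho = {x: (0 if mu[x] == 0 else nu[x]) for x in nu}  # vanishes wherever mu does
    density = {x: (Fraction(nu[x]) / mu[x] if mu[x] != 0 else Fraction(0)) for x in nu}
    return lam, rho, density


def measure(m: dict, E: set):
    """Evaluate the measure with point masses m on the set E."""
    return sum(m[x] for x in E)


# Hypothetical example on X = {a, b, c, d}.
nu = {"a": 3, "b": -2, "c": 5, "d": 1}
mu = {"a": 1, "b": 2, "c": 0, "d": 4}

P, N = hahn_decomposition(nu)
lam, rho, density = lebesgue_decomposition(nu, mu)
print(P, N, lam, rho, density)
```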
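1. (Change of variable) If $g$ is $\nu$-integrable, then $g \cdot \frac{d\nu}{d\mu}$ is $\mu$-integrable, and $\int g\, d\nu = \int g \cdot \frac{d\nu}{d\mu}\, d\mu$.
2. (Chain rule) $\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda}$ $\lambda$-a.e.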
5.3 Differentiation
We focus on the case where f : R → R in the remainder of this chapter.
Definition 5.17 (Vitali cover). The collection F of closed intervals is called a
Vitali cover of E if for any ϵ > 0 and any x ∈ E, there exists I ∈ F such that
µ(I) < ϵ and x ∈ I.
Example 5.18. Suppose $E = [a, b]$. Let $\{r_k\} = [a,b] \cap \mathbb{Q}$ and $I_{k,m} = [r_k - \frac{1}{m}, r_k + \frac{1}{m}]$ for $k, m \in \mathbb{N}$. Then $F = \{I_{k,m}: k, m \in \mathbb{N}\}$ is a Vitali cover of $E$.
If this selection procedure continues for infinitely many steps, then we obtain a sequence of intervals $\{I_j\}_{j=1}^\infty$. Since $\sum_{j=1}^\infty |I_j| = \mu(\cup_{j=1}^\infty I_j) \le \mu(G) < \infty$, we know $\sum_{j=k+1}^\infty |I_j| \to 0$ as $k \to \infty$.
Now let $\epsilon > 0$ be arbitrary and fixed, and let $k$ be large enough so that $\sum_{j=k+1}^\infty |I_j| < \frac{\epsilon}{5}$. Denote $S := E \setminus (\cup_{j=1}^k I_j)$. We want to show $\mu^*(S) < \epsilon$. To this end, let $x \in S$ be arbitrary. Then $x \notin \cup_{j=1}^k I_j$. Since $\cup_{j=1}^k I_j$ is a closed set, there exists $I \in F$ such that $x \in I$ and $I \cap (\cup_{j=1}^k I_j) = \emptyset$. Moreover, $|I| \le \delta_k < 2|I_{k+1}|$ due to the criterion used to select $I_{k+1}$.
Furthermore, notice that $|I_j| \to 0$ as $j \to \infty$. In addition, $I \cap (\cup_{j=k+1}^\infty I_j) \ne \emptyset$. Otherwise we would have selected $I$ over some $I_j$ during the procedure, because the former has a fixed width while the width of the latter tends to 0 as $j \to \infty$. Let $k_0 \ge k + 1$ be the smallest index such that $I \cap I_{k_0} \ne \emptyset$. Then $|I| \le \delta_{k_0 - 1} < 2|I_{k_0}|$. Now for each $j \ge k + 1$, define $I_j'$ to be the closed interval with the same center as $I_j$ but 5 times larger radius. Then $x \in I \subset I_{k_0}'$. Since $x \in S$ is arbitrary, we know $S \subset \cup_{j=k+1}^\infty I_j'$, and
$$\mu^*(S) \le \mu\big(\cup_{j=k+1}^\infty I_j'\big) \le \sum_{j=k+1}^\infty |I_j'| = 5\sum_{j=k+1}^\infty |I_j| < \epsilon.$$
This completes the proof.
Remarks. The Vitali covering lemma can be extended to $\mathbb{R}^n$. It is also easy to show that there exists a countable collection of mutually disjoint intervals $\{I_k\} \subset F$ such that $\mu^*(E \setminus (\cup_{k=1}^\infty I_k)) = 0$.
Example 5.22. If $f \in C([a,b])$, then there exist $x_0 \in (a,b)$ and $k \in \mathbb{R}$ such that $D_- f(x_0) \ge k \ge D^+ f(x_0)$ or $D^- f(x_0) \le k \le D_+ f(x_0)$.
Proof. Let $k = (f(b) - f(a))/(b - a)$ and consider $g(x) = f(x) - kx$. Then $g \in C([a,b])$. Note that $g(a) = f(a) - ka = (bf(a) - af(b))/(b - a) = g(b)$. Hence $g$ attains its maximum or minimum at some $x_0 \in (a,b)$. If $x_0$ is a maximizer, then $D^+ g(x_0) = D^+ f(x_0) - k \le 0$ and $D_- g(x_0) = D_- f(x_0) - k \ge 0$, which implies that $D_- f(x_0) \ge k \ge D^+ f(x_0)$. Similarly, if $x_0$ is a minimizer, then $D^- f(x_0) \le k \le D_+ f(x_0)$.
Theorem 5.23 (Lebesgue). Suppose $f: [a,b] \to \mathbb{R}$ is non-decreasing. Then $f$ is differentiable a.e. $[a,b]$ and $\int_a^b f'(x)\, dx \le f(b) - f(a)$.
Proof. (i) Note that if $D^+ f(x) \le D_- f(x)$ and $D^- f(x) \le D_+ f(x)$, then all four Dini derivatives are equal and $f$ is differentiable at $x$. Hence, if $f$ is not differentiable at $x$, then either $D^+ f(x) > D_- f(x)$ or $D^- f(x) > D_+ f(x)$. Let $E_1 = \{x: D^+ f(x) > D_- f(x)\}$ and $E_2 = \{x: D^- f(x) > D_+ f(x)\}$. We then need to show $\mu(E_1 \cup E_2) = 0$. To this end, it suffices to show that $\mu(E_1) = 0$, as $\mu(E_2) = 0$ can be proved similarly. For $r, s \in \mathbb{Q}$, let $E_{r,s} = \{x: D^+ f(x) > r > s > D_- f(x)\}$. Then $E_1 = \cup_{r,s \in \mathbb{Q}} E_{r,s}$. Hence it suffices to show that $\mu(E_{r,s}) = 0$ for all $r, s \in \mathbb{Q}$.
Now we denote E = Er,s for short. For any ϵ > 0, consider an open set G
such that E ⊂ G and µ(G) < µ∗ (E) + ϵ (such G exists due to the definition of
outer measure), and define the collection of closed intervals:
Thus this collection is a Vitali cover of $E$ (since $x \in E$ implies that $D_- f(x) < s$). Hence there exist finitely many disjoint intervals $[x_1 - h_1, x_1], \ldots, [x_p - h_p, x_p]$ in the collection such that $\mu^*(E) - \epsilon < \mu^*\big(E \cap (\cup_{i=1}^p [x_i - h_i, x_i])\big)$ and $\sum_{i=1}^p h_i \le \mu(G) < \mu^*(E) + \epsilon$. Since $f(x_i) - f(x_i - h_i) < s h_i$, we have
$$\sum_{i=1}^p \big(f(x_i) - f(x_i - h_i)\big) < s \sum_{i=1}^p h_i < s\big(\mu^*(E) + \epsilon\big).$$
Now define $F = E \cap \big(\cup_{i=1}^p (x_i - h_i, x_i)\big)$. Consider the collection of closed intervals
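(ii) Consider $f_k(x) = k\big(f(x + \frac{1}{k}) - f(x)\big)$, where $f(x) := f(b)$ for $x > b$. Then $f_k \ge 0$ and $f_k \to f'$ a.e. By Lemma 4.15 (Fatou),
$$\int_a^b f' = \int_a^b \lim_{k\to\infty} f_k \le \liminf_{k\to\infty} \int_a^b f_k = \liminf_{k\to\infty} k \int_a^b \Big(f\big(x + \tfrac{1}{k}\big) - f(x)\Big)\, dx = \liminf_{k\to\infty} k \Big(\int_b^{b + \frac{1}{k}} f - \int_a^{a + \frac{1}{k}} f\Big) \le f(b) - f(a).$$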
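Proof. Since $f_k$ is non-decreasing, $f_k'$ exists and $f_k' \ge 0$ a.e. $[a,b]$ for all $k$. Denote $s_k(x) = \sum_{j=1}^k f_j(x)$ and $r_k(x) = \sum_{j=k+1}^\infty f_j(x)$. Then both $s_k$ and $r_k$ are non-decreasing and hence have derivatives a.e. $[a,b]$. Note that
$$\Big(\sum_{j=1}^\infty f_j\Big)' = (s_k + r_k)' = s_k' + r_k' = \sum_{j=1}^k f_j' + r_k'.$$
Hence it suffices to show that $r_k' \to 0$ a.e. as $k \to \infty$. Note that $r_k' = f_{k+1}' + r_{k+1}' \ge r_{k+1}' \ge 0$ a.e. Hence $r_k' \downarrow \phi$ for some $\phi \ge 0$ a.e. $[a,b]$. Then, by Lemma 4.15 (Fatou) and Theorem 5.23,
$$0 \le \int_a^b \phi = \int_a^b \lim_{k\to\infty} r_k' \le \liminf_{k\to\infty} \int_a^b r_k' \le \liminf_{k\to\infty} \big(r_k(b) - r_k(a)\big) = 0.$$
Hence $\phi = 0$ a.e., i.e., $r_k' \to 0$ a.e. $[a,b]$.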
5.4 Functions of bounded variation
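Definition 5.26 (Functions of bounded variation). Suppose $f: [a,b] \to \mathbb{R}$ and $\Delta: a = x_0 < x_1 < \cdots < x_n = b$ is a partition of $[a,b]$. Then the variation of $f$ with respect to the partition $\Delta$ is defined by
$$\mathrm{V}(f, [a,b], \Delta) = \sum_{i=1}^n |f(x_i) - f(x_{i-1})|.$$
The total variation of $f$ on $[a,b]$ is $\mathrm{TV}(f, [a,b]) = \sup_\Delta \mathrm{V}(f, [a,b], \Delta)$, where the supremum is taken over all partitions $\Delta$ of $[a,b]$. We call $f$ a function of bounded variation if $\mathrm{TV}(f, [a,b]) < \infty$. The set of functions of bounded variation is denoted by $\mathrm{BV}([a,b])$. (We simply write $\mathrm{TV}(f)$ if the interval $[a,b]$ is clear from the context.)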
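Example 5.27. If $f: [a,b] \to \mathbb{R}$ is monotone, then $f \in \mathrm{BV}([a,b])$.
Proof. WLOG, assume $f$ is non-decreasing. Then for any partition $\Delta$, there is
$$\mathrm{V}(f, [a,b], \Delta) = \sum_{i=1}^n |f(x_i) - f(x_{i-1})| = \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big) = f(b) - f(a) < \infty.$$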
$$\mathrm{V}(f, [0,1], \Delta_k) = \frac{2}{2k-1} + \frac{2}{2k-1} + \frac{2}{2k-3} + \cdots + \frac{2}{3} = 2\sum_{j=2}^{k} \frac{2}{2j-1} \to \infty$$
as $k \to \infty$. Hence $\mathrm{TV}(f) = \infty$.
Theorem 5.30. The following statements hold:
1. If $f \in \mathrm{BV}([a,b])$, then $f$ is bounded.
2. BV([a, b]) is a linear space.
3. TV(f, [a, b]) = TV(f, [a, c]) + TV(f, [c, b]) for any c ∈ [a, b].
4. If f ∈ BV([a, b]), then |f | ∈ BV([a, b]).
5. If f, g ∈ BV([a, b]), then max{f, g} ∈ BV([a, b]).
Proof. Items 1, 2, and 4 are trivial to prove. For item 5, note that $\max\{f, g\} = \frac{f+g}{2} + \frac{|f-g|}{2}$, so it follows from items 2 and 4.
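For item 3, consider any partition $\Delta$ of $[a,b]$. Then $\Delta' = \Delta \cup \{c\}$ is also a partition of $[a,b]$. Moreover, by the triangle inequality,
$$\mathrm{V}(f, [a,b], \Delta) \le \mathrm{V}(f, [a,b], \Delta') = \mathrm{V}(f, [a,c], \Delta' \cap [a,c]) + \mathrm{V}(f, [c,b], \Delta' \cap [c,b]) \le \mathrm{TV}(f, [a,c]) + \mathrm{TV}(f, [c,b]).$$
Hence $\mathrm{TV}(f, [a,b]) \le \mathrm{TV}(f, [a,c]) + \mathrm{TV}(f, [c,b])$.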
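On the other hand, for any $\epsilon > 0$, there exist a partition $\Delta_1$ of $[a,c]$ and a partition $\Delta_2$ of $[c,b]$ such that
$$\mathrm{TV}(f, [a,c]) - \frac{\epsilon}{2} < \mathrm{V}(f, [a,c], \Delta_1), \qquad \mathrm{TV}(f, [c,b]) - \frac{\epsilon}{2} < \mathrm{V}(f, [c,b], \Delta_2).$$
Note that $\Delta = \Delta_1 \cup \Delta_2$ is a partition of $[a,b]$. Hence
$$\mathrm{TV}(f, [a,c]) + \mathrm{TV}(f, [c,b]) - \epsilon < \mathrm{V}(f, [a,c], \Delta_1) + \mathrm{V}(f, [c,b], \Delta_2) = \mathrm{V}(f, [a,b], \Delta) \le \mathrm{TV}(f, [a,b]).$$
As $\epsilon$ is arbitrary, we know $\mathrm{TV}(f, [a,c]) + \mathrm{TV}(f, [c,b]) \le \mathrm{TV}(f, [a,b])$.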
A partition $\Delta: a = x_0 < x_1 < \cdots < x_n = b$ of the interval $[a,b]$ gives $n + 1$ points $(x_0, f(x_0)), \ldots, (x_n, f(x_n))$ in $\mathbb{R}^2$. We connect these points with straight line segments and sum the segment lengths to obtain the total length:
$$l_\Delta(f) = \sum_{i=1}^n \Big((x_i - x_{i-1})^2 + \big(f(x_i) - f(x_{i-1})\big)^2\Big)^{1/2}.$$
The arc length of $f$ is $l(f) = \sup_\Delta l_\Delta(f)$. The following theorem reveals the relation between $\mathrm{TV}(f)$ and $l(f)$:
Theorem 5.31. Suppose f : [a, b] → R. Then TV(f ) < ∞ iff l(f ) < ∞.
Proof. For any partition $\Delta$, there is $l_\Delta(f) \le \sum_{i=1}^n \big(|x_i - x_{i-1}| + |f(x_i) - f(x_{i-1})|\big)$, because $(u^2 + v^2)^{1/2} \le u + v$ for any $u, v \ge 0$. Hence $l_\Delta(f) \le (b - a) + \sum_{i=1}^n |f(x_i) - f(x_{i-1})|$. Since also $(u^2 + v^2)^{1/2} \ge v$, we obtain $\mathrm{V}(f, [a,b], \Delta) \le l_\Delta(f) \le (b - a) + \mathrm{V}(f, [a,b], \Delta)$. As $\Delta$ is arbitrary, we know $\mathrm{TV}(f, [a,b]) \le l(f) \le (b - a) + \mathrm{TV}(f, [a,b])$, which implies that $\mathrm{TV}(f) < \infty$ iff $l(f) < \infty$.
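The inequalities $\mathrm{V}(f,[a,b],\Delta) \le l_\Delta(f) \le (b-a) + \mathrm{V}(f,[a,b],\Delta)$ from the proof above are easy to test numerically. This is a minimal sketch with our own example, $f = \sin$ on $[0, 2\pi]$, whose total variation is $4$. The uniform partitions used contain the extrema $k\pi/2$, so $\mathrm{V}$ already equals $\mathrm{TV}$ there. The function names are hypothetical.

```python
import math


def variation(f, xs) -> float:
    """V(f, [a, b], Delta) for the partition Delta given by the increasing points xs."""
    return sum(abs(f(xs[i]) - f(xs[i - 1])) for i in range(1, len(xs)))


def polygon_length(f, xs) -> float:
    """l_Delta(f): total length of the polygonal line through the points (x_i, f(x_i))."""
    return sum(math.hypot(xs[i] - xs[i - 1], f(xs[i]) - f(xs[i - 1])) for i in range(1, len(xs)))


def uniform_partition(a: float, b: float, n: int) -> list:
    """Points a = x_0 < x_1 < ... < x_n = b with equal spacing."""
    return [a + (b - a) * i / n for i in range(n + 1)]


a, b = 0.0, 2.0 * math.pi
results = {}
for n in (4, 16, 256):
    xs = uniform_partition(a, b, n)
    v, length = variation(math.sin, xs), polygon_length(math.sin, xs)
    results[n] = (v, length)
    print(n, v, length, v <= length <= (b - a) + v)
```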
There is a very elegant characterization of functions of bounded variation: they can always be written as the difference of two non-decreasing functions, as shown in the following theorem.
Theorem 5.32 (Jordan). Suppose f : [a, b] → R. Then f ∈ BV([a, b]) iff there
exist two non-decreasing functions g, h : [a, b] → R such that f = g − h.
Proof. First we show the necessity. Suppose $f \in \mathrm{BV}([a,b])$. Denote $T_f(x) = \mathrm{TV}(f, [a,x])$ for any $x \in [a,b]$, which is well defined since $\mathrm{TV}(f) < \infty$. Then define $g(x) = \frac{1}{2}(T_f(x) + f(x))$ and $h(x) = \frac{1}{2}(T_f(x) - f(x))$. We can show that both $g$ and $h$ are non-decreasing. For $x < y$, by Theorem 5.30 item 3, there is
$$g(y) - g(x) = \tfrac{1}{2}\big(T_f(y) + f(y)\big) - \tfrac{1}{2}\big(T_f(x) + f(x)\big) = \tfrac{1}{2}\mathrm{TV}(f, [x,y]) + \tfrac{1}{2}\big(f(y) - f(x)\big) \ge \tfrac{1}{2}\mathrm{V}(f, [x,y], \Delta) - \tfrac{1}{2}|f(y) - f(x)| \ge 0,$$
where $\Delta: x = x_0 < x_1 < \cdots < x_n = y$ is any partition of $[x,y]$, and the last inequality follows from the triangle inequality. Similarly, $h$ is non-decreasing, and obviously $f = g - h$.
Now we show the sufficiency. If g, h are non-decreasing, then g, h ∈ BV([a, b]).
Hence f = g − h ∈ BV([a, b]) as BV([a, b]) is a linear space.
By Theorem 5.23, both $g$ and $h$ are differentiable a.e. since they are monotone. Hence $f = g - h$ is differentiable a.e. Therefore every $f \in \mathrm{BV}([a,b])$ is differentiable a.e. $[a,b]$.
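Here is a grid version of the construction in the proof of Theorem 5.32, as a sketch under our own assumptions. On sample points $x_0 < \cdots < x_n$, replace $T_f(x_i)$ by the variation of the samples up to $x_i$. Then $g = (T + f)/2$ and $h = (T - f)/2$ have increments $(|\Delta f| \pm \Delta f)/2 \ge 0$, so both are non-decreasing and $g - h = f$. The sample function and the name `jordan_on_grid` are hypothetical.

```python
import math


def jordan_on_grid(values):
    """Split samples f(x_0), ..., f(x_n) into non-decreasing g, h with f = g - h.

    t_vals[i] is the variation of the samples up to index i, the grid analogue of TV(f, [a, x_i]).
    """
    t_vals = [0.0]
    for prev, cur in zip(values, values[1:]):
        t_vals.append(t_vals[-1] + abs(cur - prev))
    g = [(t + v) / 2 for t, v in zip(t_vals, values)]
    h = [(t - v) / 2 for t, v in zip(t_vals, values)]
    return g, h


xs = [3.0 * i / 300 for i in range(301)]
samples = [math.sin(4 * x) + 0.5 * x for x in xs]  # hypothetical non-monotone example on [0, 3]
g, h = jordan_on_grid(samples)
print(g[-1], h[-1])
```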
Lemma 5.33. Suppose $f \in L([a,b])$. Define $F_h(x) = \frac{1}{h} \int_x^{x+h} f(t)\, dt$ (assume $f(x) = f(a)$ if $x < a$ and $f(x) = f(b)$ if $x > b$). Then $\lim_{h\to0} \int_a^b |F_h(x) - f(x)|\, dx = 0$.
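Proof. Since $f \in L([a,b])$, we know (cf. Theorem 4.44) that for any $\epsilon > 0$ there exists $\delta > 0$ such that $\int_a^b |f(x+h) - f(x)|\, dx < \epsilon$ for any $h$ with $|h| < \delta$. Note that $F_h(x) - f(x) = \frac{1}{h}\int_x^{x+h} (f(t) - f(x))\, dt$. Therefore, for any $0 < h < \delta$, there is
$$\begin{aligned}
\int_a^b |F_h(x) - f(x)|\, dx &\le \int_a^b \frac{1}{h}\int_x^{x+h} |f(t) - f(x)|\, dt\, dx = \int_a^b \frac{1}{h}\int_0^h |f(x+t) - f(x)|\, dt\, dx \\
&= \frac{1}{h}\int_0^h \int_a^b |f(x+t) - f(x)|\, dx\, dt \le \frac{1}{h}\int_0^h \epsilon\, dt = \epsilon,
\end{aligned}$$
where the order of integration is exchanged by Theorem 4.51 (Tonelli). The case $-\delta < h < 0$ is similar.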
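For a concrete feel for Lemma 5.33, take our own example $f = \chi_{[0,1/2]}$ on $[a,b] = [0,1]$, extended by $f(b) = 0$ to the right. Then $F_h - f$ vanishes outside $[1/2 - h, 1/2]$, and a direct computation gives $\int_0^1 |F_h - f| = h/2$ for $0 < h < 1/2$. The midpoint-rule sketch below should reproduce this value. The helper names are hypothetical.

```python
def f(x: float) -> float:
    """Assumed example: f = chi_[0, 1/2] on [a, b] = [0, 1], extended by f(b) = 0 for x > 1."""
    return 1.0 if 0.0 <= x <= 0.5 else 0.0


def F(x: float) -> float:
    """F(x) = integral of f over [0, x], valid for x >= 0."""
    return min(x, 0.5)


def l1_error(h: float, n: int = 100_000) -> float:
    """Midpoint rule for the integral over [0, 1] of |F_h - f|, with F_h(x) = (F(x + h) - F(x)) / h."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += abs((F(x + h) - F(x)) / h - f(x))
    return total * dx


for h in (0.1, 0.01, 0.001):
    print(h, l1_error(h), h / 2)  # numerical value vs. the exact value h/2
```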
Now we have the main theorem of this subsection.
Theorem 5.34. Let $f \in L([a,b])$. Define $F(x) = \int_a^x f(t)\, dt$. Then $F'(x) = f(x)$ a.e. $[a,b]$.
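Proof. Note that $F(x) = \int_a^x f^+ - \int_a^x f^-$ is the difference of two non-decreasing functions. So, by Theorem 5.23, $F'(x) = \lim_{h\to0} F_h(x)$ exists a.e. $[a,b]$, where $F_h$ is as in Lemma 5.33. Hence, by Lemma 4.15 (Fatou) and Lemma 5.33,
$$\int_a^b |f(x) - F'(x)|\, dx = \int_a^b \lim_{h\to0} |f(x) - F_h(x)|\, dx \le \liminf_{h\to0} \int_a^b |f(x) - F_h(x)|\, dx = 0,$$
which implies that $F'(x) = f(x)$ a.e. $[a,b]$.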
Corollary 5.35. Suppose $f \in L([a,b])$. Then $\lim_{h\to0} \frac{1}{h}\int_0^h |f(x+t) - f(x)|\, dt = 0$ a.e. $[a,b]$.
Proof. For any $r \in \mathbb{Q}$, we know $|f(x) - r| \in L([a,b])$. Hence, by Theorem 5.34 applied to $|f - r|$, for almost every $x \in [a,b]$ there is
$$\lim_{h\to0} \frac{1}{h}\int_0^h |f(x+t) - r|\, dt = |f(x) - r|.$$
Denote $Z_r = \{x: \lim_{h\to0} \frac{1}{h}\int_0^h |f(x+t) - r|\, dt \ne |f(x) - r|\}$. Then $\mu(Z_r) = 0$. Let $Z = (\cup_{r\in\mathbb{Q}} Z_r) \cup \{x: f(x) = \pm\infty\}$. Then also $\mu(Z) = 0$.
Take any $x \notin Z$, i.e., $|f(x)| < \infty$ and $\lim_{h\to0} \frac{1}{h}\int_0^h |f(x+t) - r|\, dt = |f(x) - r|$ for all $r \in \mathbb{Q}$. For any $\epsilon > 0$, there exist $r \in \mathbb{Q}$ and $\delta > 0$ such that $|f(x) - r| < \frac{\epsilon}{3}$ and $\big|\frac{1}{h}\int_0^h |f(x+t) - r|\, dt - |f(x) - r|\big| < \frac{\epsilon}{3}$ for all $h$ with $0 < |h| < \delta$. Hence
$$\begin{aligned}
\frac{1}{h}\int_0^h |f(x+t) - f(x)|\, dt &\le \frac{1}{h}\int_0^h |f(x+t) - r|\, dt - |f(x) - r| + 2|f(x) - r| \\
&\le \Big|\frac{1}{h}\int_0^h |f(x+t) - r|\, dt - |f(x) - r|\Big| + 2|f(x) - r| \\
&< \frac{\epsilon}{3} + 2 \cdot \frac{\epsilon}{3} = \epsilon.
\end{aligned}$$
Therefore $\lim_{h\to0} \frac{1}{h}\int_0^h |f(x+t) - f(x)|\, dt = 0$ on $Z^c$.
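Remarks. We call $x$ a Lebesgue point of $f$ if $\lim_{h\to0} \frac{1}{h}\int_0^h |f(x+t) - f(x)|\, dt = 0$. The corollary above says that if $f \in L([a,b])$, then almost every point of $[a,b]$ is a Lebesgue point of $f$. Consider $G(h, x) := \frac{1}{h}\int_0^h |f(x+t) - f(x)|\, dt$. Invoking Lemma 4.15 (Fatou) on $G$, together with the estimate in the proof of Lemma 5.33, yields $\liminf_{k\to\infty} G(h_k, x) = 0$ a.e. for any fixed sequence $h_k \to 0$. The corollary strengthens this to the full limit.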
Example 5.36. Suppose $f \in L(\mathbb{R})$ and consider an interval $[a,b]$. If $\lim_{h\to0} \frac{1}{h}\int_a^b |f(x+h) - f(x)|\, dx = 0$, then there exists a constant $c \in \mathbb{R}$ such that $f(x) = c$ a.e. $[a,b]$.
Proof. Consider any two Lebesgue points $x_1 < x_2$ of $f$ in $[a,b]$. Then
$$\Big|\frac{1}{h}\int_{x_1}^{x_2} \big(f(x+h) - f(x)\big)\, dx\Big| \le \frac{1}{h}\int_{x_1}^{x_2} |f(x+h) - f(x)|\, dx \le \frac{1}{h}\int_a^b |f(x+h) - f(x)|\, dx \to 0$$
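as $h \to 0$. On the other hand,
$$\Big|\frac{1}{h}\int_{x_1}^{x_2} \big(f(x+h) - f(x)\big)\, dx\Big| = \Big|\frac{1}{h}\int_{x_1+h}^{x_2+h} f(x)\, dx - \frac{1}{h}\int_{x_1}^{x_2} f(x)\, dx\Big| = \Big|\frac{1}{h}\int_{x_2}^{x_2+h} f(x)\, dx - \frac{1}{h}\int_{x_1}^{x_1+h} f(x)\, dx\Big| \to |f(x_2) - f(x_1)|,$$
since $x_1$ and $x_2$ are Lebesgue points. Hence $f(x_1) = f(x_2)$. As almost every point of $[a,b]$ is a Lebesgue point (Corollary 5.35), $f$ equals a constant $c$ a.e. $[a,b]$.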
Therefore $\mathrm{V}(F, [a,b], \Delta) \le \int_a^b |f(x)|\, dx$. Hence, $\mathrm{TV}(F) \le \int_a^b |f|\, dx$.
Proof. Suppose $c \in (a,b)$ is such that $f(c) \ne f(a)$. Choose $\epsilon_0 \in \big(0, \frac{|f(c) - f(a)|}{2}\big)$ and $r \in \big(0, \frac{\epsilon_0}{b - a}\big)$. Define the set $E_c = \{x \in (a,c): f'(x) = 0\}$ and the collection of closed intervals
Hence $F$ is a Vitali cover of $E_c$. Then for any $\delta > 0$ there exist mutually disjoint intervals $[x_1, x_1 + h_1], \ldots, [x_p, x_p + h_p]$ in $F$ such that $\mu(E_c \setminus \cup_{i=1}^p [x_i, x_i + h_i]) < \delta$. WLOG, assume $a = x_0 < x_1 < x_1 + h_1 < \cdots < x_p < x_p + h_p < x_{p+1} = c$. Let $u_i = x_{i+1}$ and $v_i = x_i + h_i$ for $i = 0, \ldots, p$ (with $h_0 = 0$, so $u_0 - v_0 = x_1 - x_0$). Then $u_p = x_{p+1}$, $v_p = x_p + h_p$, and $\sum_{i=0}^p |u_i - v_i| < \delta$.
On the other hand, there is
$$\begin{aligned}
2\epsilon_0 < |f(c) - f(a)| &\le \sum_{i=0}^p |f(u_i) - f(v_i)| + \sum_{i=1}^p |f(x_i + h_i) - f(x_i)| \\
&\le \sum_{i=0}^p |f(u_i) - f(v_i)| + r\sum_{i=1}^p h_i \le \sum_{i=0}^p |f(u_i) - f(v_i)| + r(b - a).
\end{aligned}$$
Since $r(b - a) < \epsilon_0$, we know $\sum_{i=0}^p |f(u_i) - f(v_i)| \ge \epsilon_0$.
Definition 5.40. A function $f: [a,b] \to \mathbb{R}$ is absolutely continuous if for any $\epsilon > 0$ there exists $\delta > 0$ with the following property: for any mutually disjoint intervals $(x_i, y_i) \subset [a,b]$, $i = 1, \ldots, p$, satisfying $\sum_{i=1}^p |y_i - x_i| < \delta$, there is $\sum_{i=1}^p |f(y_i) - f(x_i)| < \epsilon$. The set of absolutely continuous functions is denoted by $AC([a,b])$.
Theorem 5.41. The following statements hold:
1. If f ∈ AC([a, b]) then f is continuous.
2. AC([a, b]) is a linear space.
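Example 5.42. If $f$ is Lipschitz continuous on $[a,b]$, then $f\in AC([a,b])$.
Proof. Let $M > 0$ be a Lipschitz constant of $f$. For any $\epsilon > 0$, take $\delta = \epsilon/M$. Then $\sum_{i=1}^p |f(y_i)-f(x_i)| \le M\sum_{i=1}^p |y_i - x_i| < M\delta = \epsilon$.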
Theorem 5.43. Suppose $f\in L([a,b])$. Then $F(x) = \int_a^x f(t)\,dt \in AC([a,b])$.
Proof. Since $f\in L([a,b])$, we know that for any $\epsilon > 0$ there exists $\delta > 0$ such that $\int_E |f| < \epsilon$ for any $E\subset[a,b]$ satisfying $\mu(E) < \delta$. For any disjoint intervals $\{(x_i, y_i): i = 1,\ldots,p\}$, if $\sum_{i=1}^p |y_i - x_i| < \delta$, then $\mu(E) < \delta$ where $E = \bigcup_{i=1}^p [x_i, y_i]$. This implies
$$\sum_{i=1}^p |F(y_i)-F(x_i)| \le \sum_{i=1}^p \int_{x_i}^{y_i} |f(x)|\,dx = \int_E |f(x)|\,dx < \epsilon,$$
so $F\in AC([a,b])$.
Proof. If $f\in AC([a,b])$, then $f'$ exists a.e. on $[a,b]$ and $f'\in L([a,b])$ by Corollary 5.45. Define $g(x) = \int_a^x f'(t)\,dt$; then $g\in AC([a,b])$ by Theorem 5.43. Since $f - g\in AC([a,b])$ and $f' - g' = 0$ a.e., we know $f - g\equiv c$ for some constant $c$ (otherwise $f - g$ is not absolutely continuous due to Lemma 5.39, a contradiction). Hence $c = f(a) - g(a) = f(a)$ as $g(a) = 0$, which implies that $f(x) = f(a) + g(x) = f(a) + \int_a^x f'(t)\,dt$.
Remarks. The results above can be summarized as follows: $f\in AC([a,b])$ iff there exists $g\in L([a,b])$ such that $f(x) = f(a) + \int_a^x g(t)\,dt$ for all $x\in[a,b]$. In this case, $f' = g$ a.e. on $[a,b]$.
Example 5.47. Suppose $g_k\in AC([a,b])$ for all $k$. If there exists $c\in[a,b]$ such that $\sum_k g_k(c)$ converges and $\sum_k \int_a^b |g_k'(x)|\,dx < \infty$, then $\sum_k g_k(x)$ exists for all $x$. Let $g(x) = \sum_k g_k(x)$; then $g\in AC([a,b])$ and $g'(x) = \sum_k g_k'(x)$ a.e. on $[a,b]$.
Proof. Since $\sum_{k=1}^\infty \int_a^b |g_k'(x)|\,dx < \infty$, we know by Corollary 4.37 that $h(x) = \sum_{k=1}^\infty g_k'(x)$ exists a.e., $h\in L([a,b])$, and $\sum_{k=1}^\infty \int_c^x g_k'(t)\,dt = \int_c^x h(t)\,dt$. Since $g_k\in AC([a,b])$, we know $g_k(x) = g_k(c) + \int_c^x g_k'(t)\,dt$ for all $x$. This implies that
$$\sum_{k=1}^n g_k(x) = \sum_{k=1}^n g_k(c) + \sum_{k=1}^n \int_c^x g_k'(t)\,dt \to \sum_{k=1}^\infty g_k(c) + \int_c^x h(t)\,dt$$
as $n\to\infty$ for all $x$. Therefore $g(x) = \sum_{k=1}^\infty g_k(x) = \sum_{k=1}^\infty g_k(c) + \int_c^x h(t)\,dt$ exists and $g\in AC([a,b])$. Moreover, $g'(x) = h(x) = \sum_{k=1}^\infty g_k'(x)$ a.e.
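Example 5.48. A composition of absolutely continuous functions is not necessarily absolutely continuous. For example, let $f(y) = y^{1/3}$ for $y\in[-1,1]$, and
$$g(x) = \begin{cases} x^3\cos^3\left(\frac{\pi}{x}\right), & \text{if } x\in(0,1],\\ 0, & \text{if } x = 0.\end{cases}$$
Then both $f$ and $g$ are absolutely continuous: $g$ is Lipschitz continuous since $|g'|$ is bounded on $(0,1]$, and $f(y) = f(-1) + \int_{-1}^y f'(t)\,dt$ with $f'(t) = \frac{1}{3}|t|^{-2/3}\in L([-1,1])$ (note that $f$ itself is not Lipschitz near $0$). However, $(f\circ g)(x) = x\cos\left(\frac{\pi}{x}\right)$ is not absolutely continuous.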
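To see concretely why $f\circ g$ fails, note that $(f\circ g)(1/k) = (-1)^k/k$, so each interval $[1/(k+1), 1/k]$ contributes $1/k + 1/(k+1)$ to the variation sums, while the intervals with $k\ge N$ have total length only $1/N$. The Python sketch below (helper names are ours, purely illustrative) evaluates the variation sum on the grid $\{1/n,\ldots,1/2,1\}$; it equals $H_{n-1} + H_n - 1$ (harmonic numbers) and so grows like $2\log n$.

```python
import math

def h(x: float) -> float:
    """(f o g)(x) = x cos(pi / x) on (0, 1], extended by h(0) = 0."""
    return 0.0 if x == 0 else x * math.cos(math.pi / x)

def variation_on_reciprocal_grid(n: int) -> float:
    """Sum of |h(b) - h(a)| over consecutive grid points 1/n < 1/(n-1) < ... < 1."""
    points = [1.0 / k for k in range(n, 0, -1)]  # increasing order
    return sum(abs(h(b) - h(a)) for a, b in zip(points, points[1:]))

for n in (10, 100, 1000, 10000):
    print(n, round(variation_on_reciprocal_grid(n), 3))
```

Since these sums are unbounded while the tail intervals can be made arbitrarily short, no $\delta$ can work for $\epsilon = 1$ in Definition 5.40.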
6 Lp Spaces
6.1 Important inequalities
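Definition 6.1. Let $E\in\mathcal{M}$. If $p\in(0,\infty)$, then the $L^p$ norm of $f$ on $E$ is defined by
$$\|f\|_p = \left(\int_E |f|^p\right)^{1/p},$$
and for $p = \infty$ it is defined by
$$\|f\|_\infty = \inf\{M\in\mathbb{R}: |f|\le M \text{ a.e. on } E\}.$$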
Note that $q = \frac{p}{p-1}$. If $p = 2$, then $q = 2$. If $p = 1$, then $q = \infty$.
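$$uv = e^{\frac{1}{p}\log(u^p) + \frac{1}{q}\log(v^q)} \le \frac{1}{p}e^{\log(u^p)} + \frac{1}{q}e^{\log(v^q)} = \frac{u^p}{p} + \frac{v^q}{q},$$
where the inequality follows from the convexity of the exponential function, which completes the proof.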
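The inequality $uv\le\frac{u^p}{p}+\frac{v^q}{q}$ just proved (Young's inequality, for $u, v\ge 0$ and conjugate $p, q\in(1,\infty)$) can be spot-checked numerically, and equality occurs when $u^p = v^q$. A small Python sketch; the random sampling ranges and the helper name `young_gap` are our own choices:

```python
import random

def young_gap(u: float, v: float, p: float) -> float:
    """Return u**p/p + v**q/q - u*v, where q = p/(p-1) is the conjugate of p > 1."""
    q = p / (p - 1)
    return u ** p / p + v ** q / q - u * v

rng = random.Random(0)  # fixed seed so the experiment is reproducible
gaps = [young_gap(rng.uniform(0, 3), rng.uniform(0, 3), rng.uniform(1.2, 5))
        for _ in range(10_000)]
print("smallest gap over random samples:", min(gaps))

# Equality case u**p == v**q: with p = 3 and u = 2 we need v**1.5 = 8, i.e. v = 4.
print("gap at u=2, v=4, p=3:", young_gap(2.0, 4.0, 3.0))
```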
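Theorem 6.6 (Hölder's inequality). Let $p\in[1,\infty]$ and let $q$ be its conjugate. If $f\in L^p(E)$ and $g\in L^q(E)$, then $\|fg\|_1\le\|f\|_p\|g\|_q$.
Proof. It is trivial if $p$ or $q$ is $\infty$. Now suppose $p, q\in(1,\infty)$ and $\|f\|_p, \|g\|_q\ne 0$ (otherwise $f = 0$ or $g = 0$ a.e. and the claim is trivial). Then
$$\int_E \frac{|f|}{\|f\|_p}\cdot\frac{|g|}{\|g\|_q} \le \int_E\left(\frac{1}{p}\frac{|f|^p}{\|f\|_p^p} + \frac{1}{q}\frac{|g|^q}{\|g\|_q^q}\right) = \frac{1}{p} + \frac{1}{q} = 1,$$
where we used the inequality $uv\le\frac{u^p}{p}+\frac{v^q}{q}$ proved above with $u = |f|/\|f\|_p$ and $v = |g|/\|g\|_q$. Multiplying both sides by $\|f\|_p\|g\|_q$ yields the inequality.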
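A quick numerical sanity check of Theorem 6.6, assuming $E$ is a finite set with counting measure so that integrals become finite sums (our simplification, not part of the notes). The helper `holder_ratio` (a hypothetical name) returns $\|fg\|_1/(\|f\|_p\|g\|_q)$, which should never exceed $1$; for $p = 2$ and $g = f$ it equals $1$.

```python
import math
import random

def lp_norm(xs, p):
    """L^p norm w.r.t. counting measure on {0, ..., len(xs) - 1}; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in xs)
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

def conjugate(p):
    """Conjugate exponent q with 1/p + 1/q = 1 (q = inf for p = 1, q = 1 for p = inf)."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1)

def holder_ratio(f, g, p):
    """||fg||_1 / (||f||_p ||g||_q); Hölder's inequality says this is at most 1."""
    fg = [a * b for a, b in zip(f, g)]
    return lp_norm(fg, 1) / (lp_norm(f, p) * lp_norm(g, conjugate(p)))

rng = random.Random(1)
f = [rng.gauss(0, 1) for _ in range(50)]
g = [rng.gauss(0, 1) for _ in range(50)]
for p in (1, 1.5, 2, 3, math.inf):
    print(p, round(holder_ratio(f, g, p), 4))
```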
Corollary 6.7 (Schwarz inequality). If f, g ∈ L2 (E), then ∥f g∥1 ≤ ∥f ∥2 ∥g∥2 .
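Theorem 6.8. If $\mu(E) < \infty$ and $0 < p_1 < p_2\le\infty$, then $L^{p_2}(E)\subset L^{p_1}(E)$ and
$$\|f\|_{p_1}\le\mu(E)^{\frac{1}{p_1}-\frac{1}{p_2}}\|f\|_{p_2}.$$
Proof. The proof is trivial if $p_2 = \infty$. Now suppose $0 < p_1 < p_2 < \infty$. Then, by Hölder's inequality,
$$\|f\|_{p_1} = \left(\int_E |f|^{p_1}\right)^{1/p_1} \le \left[\left(\int_E |f|^{p_1 r}\right)^{1/r}\left(\int_E 1^s\right)^{1/s}\right]^{1/p_1},$$
where $r, s > 1$ are conjugate. By choosing $r = \frac{p_2}{p_1} > 1$ and its conjugate $s = \frac{r}{r-1} = \frac{p_2}{p_2-p_1}$, we obtain the claimed inequality.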
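Theorem 6.8 can be illustrated on a finite measure space made of point masses $w_i$ (our toy model, so that $\mu(E) = \sum_i w_i$). The sketch compares both sides for a few exponent pairs, including $p_1 < 1$, which the theorem allows; for constant $f$ the two sides coincide. Function names are hypothetical.

```python
import random

def weighted_lp_norm(values, weights, p):
    """(sum_i w_i |f_i|^p)^(1/p): the L^p norm on a finite measure space with point masses w_i."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def theorem_6_8_sides(values, weights, p1, p2):
    """Return (||f||_{p1}, mu(E)^(1/p1 - 1/p2) * ||f||_{p2}) for 0 < p1 < p2 < inf."""
    mu_E = sum(weights)
    lhs = weighted_lp_norm(values, weights, p1)
    rhs = mu_E ** (1.0 / p1 - 1.0 / p2) * weighted_lp_norm(values, weights, p2)
    return lhs, rhs

rng = random.Random(2)
weights = [rng.uniform(0.0, 0.1) for _ in range(40)]   # mu(E) = sum(weights) < inf
values = [rng.expovariate(1.0) for _ in range(40)]
for p1, p2 in [(0.5, 1.0), (1.0, 2.0), (2.0, 7.0)]:
    print(p1, p2, theorem_6_8_sides(values, weights, p1, p2))
```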
Example 6.9. Suppose $f\in L^r\cap L^s$ where $0 < r < p < s < \infty$. Let $\lambda\in(0,1)$ be such that $\frac{1}{p} = \frac{\lambda}{r} + \frac{1-\lambda}{s}$. Then $\|f\|_p\le\|f\|_r^\lambda\|f\|_s^{1-\lambda}$.
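Proof. Note that $\frac{r}{\lambda p}$ and $\frac{s}{(1-\lambda)p}$ are conjugate. Hence
$$\|f\|_p^p = \int |f|^p = \int |f|^{\lambda p}|f|^{(1-\lambda)p} \le \left(\int |f|^{\lambda p\cdot\frac{r}{\lambda p}}\right)^{\frac{\lambda p}{r}}\left(\int |f|^{(1-\lambda)p\cdot\frac{s}{(1-\lambda)p}}\right)^{\frac{(1-\lambda)p}{s}} = \|f\|_r^{\lambda p}\|f\|_s^{(1-\lambda)p}.$$
Taking the $p$-th root of both sides yields the claim.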
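Example 6.10. Let $0 < r < p < s < \infty$ and $f\in L^p(E)$. Then for any $t > 0$, there exist $g, h$ such that $f = g + h$, $\|g\|_r^r\le t^{r-p}\|f\|_p^p$, and $\|h\|_s^s\le t^{s-p}\|f\|_p^p$.
Proof. For any $x$, define $g(x) = f(x)$ if $|f(x)| > t$ and $g(x) = 0$ otherwise. Let $h = f - g$. Then, since $r - p < 0$, we have
$$\|g\|_r^r = \int_E |g|^r = \int_{\{|f|>t\}} |g|^{r-p}|g|^p \le t^{r-p}\int_E |f|^p = t^{r-p}\|f\|_p^p.$$
Similarly, since $|h|\le t$ and $s - p > 0$, we have $\|h\|_s^s = \int_{\{|f|\le t\}} |f|^{s-p}|f|^p\le t^{s-p}\|f\|_p^p$.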
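The splitting in Example 6.10 is easy to test on a finite measure space with point masses $w_i$ (our toy model). The sketch cuts $f$ at level $t$ using $|f| > t$ and checks both bounds for $r = 1$, $p = 2$, $s = 4$ (exponents chosen by us for illustration); helper names are hypothetical.

```python
import random

def split_at_level(values, t):
    """Split f = g + h with g = f on {|f| > t} and h = f on {|f| <= t}, so |h| <= t."""
    g = [v if abs(v) > t else 0.0 for v in values]
    h = [v - gv for v, gv in zip(values, g)]
    return g, h

def integral_of_power(values, weights, exponent):
    """sum_i w_i |f_i|^exponent, i.e. the integral of |f|^exponent on a finite measure space."""
    return sum(w * abs(v) ** exponent for v, w in zip(values, weights))

rng = random.Random(3)
weights = [rng.uniform(0.0, 1.0) for _ in range(200)]
values = [rng.gauss(0, 2) for _ in range(200)]
r, p, s = 1.0, 2.0, 4.0
fp = integral_of_power(values, weights, p)          # ||f||_p^p
for t in (0.5, 1.0, 3.0):
    g, h = split_at_level(values, t)
    print(t,
          integral_of_power(g, weights, r) <= t ** (r - p) * fp,   # ||g||_r^r bound
          integral_of_power(h, weights, s) <= t ** (s - p) * fp)   # ||h||_s^s bound
```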
$\ldots\|f+g\|_p^{p-1}\|g\|_p$. Therefore
$$\|f+g\|_p^p = \int |f+g|^p \le \|f+g\|_p^{p-1}\big(\|f\|_p + \|g\|_p\big).$$
6.2 Lp space
We identify two functions f, g ∈ Lp (E) if f = g a.e. E. Suppose we define
d : Lp (E) × Lp (E) → R by d(f, g) = ∥f − g∥p for any f, g ∈ Lp (E). Then it is
easy to verify that d is a metric: (i) d(f, g) ≥ 0, and d(f, g) = 0 iff f = g a.e. E;
(ii) d(f, g) = d(g, f ); (iii) d(f, g) ≤ d(f, h) + d(h, g) for all f, g, h ∈ Lp (E) by
using Theorem 6.12 (Minkowski).
Definition 6.13. Let p ∈ [1, ∞] and d(f, g) = ∥f − g∥p for any f, g ∈ Lp (E).
Then (Lp (E), d) is a metric space.
Theorem 6.14. If ∥fk − f ∥p → 0, then ∥fk ∥p → ∥f ∥p .
1. There exists $g: E\to\mathbb{R}$ which is continuous and has compact support, such that $\int_E |f-g|^p < \epsilon$.
2. There exists a simple function $\phi: E\to\mathbb{R}$ of the form $\phi(x) = \sum_{i=1}^k c_i\chi_{A_i}(x)$, where every $A_i$ is a finite union of open boxes on regular grids and has compact support, such that $\int_E |f-\phi|^p < \epsilon$.
Proof. The proof of Item 1 is similar to that of Theorem 4.40. For Item 2, note that the tolerance $\epsilon$ allows us to approximate $f$ by a simple function $\phi$ of this type.
Theorem 6.18. Suppose $p\in[1,\infty)$. Then $L^p(E)$ is separable.
Proof. (i) Suppose $E = \mathbb{R}^n$. Then for any $f\in L^p(E)$ and $\epsilon > 0$, there exists a simple function $\phi = \sum_{i=1}^k c_i\chi_{A_i}$ such that $\|f-\phi\|_p < \epsilon/2$. Hence there exists $M > 0$ such that $|c_i|\le M$ and $\mu(A_i) < M^p$ for all $i\le k$. Note that there exists $r_i\in\mathbb{Q}$ such that $|c_i - r_i| < \epsilon/(2kM)$ for every $i\le k$. Let $\psi = \sum_{i=1}^k r_i\chi_{A_i}$; then
$$\|\phi-\psi\|_p = \left\|\sum_{i=1}^k c_i\chi_{A_i} - \sum_{i=1}^k r_i\chi_{A_i}\right\|_p \le \sum_{i=1}^k |c_i - r_i|\,\|\chi_{A_i}\|_p = \sum_{i=1}^k |c_i - r_i|\,\mu(A_i)^{1/p} \le k\cdot\frac{\epsilon}{2kM}\cdot M = \frac{\epsilon}{2}.$$
$$\int_E |f-\psi|^p = \int_E |g-\psi|^p \le \int_{\mathbb{R}^n} |g-\psi|^p < \epsilon$$
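Hence we have
$$\begin{aligned}\big|\|f+f_t\|_p - 2^{1/p}\|f\|_p\big| &\le \big|\|f+f_t\|_p - 2^{1/p}\|g\|_p\big| + 2^{1/p}\big|\|g\|_p - \|f\|_p\big| \\ &= \big|\|f+f_t\|_p - \|g+g_t\|_p\big| + 2^{1/p}\big|\|g\|_p - \|f\|_p\big| \\ &\le \|h+h_t\|_p + 2^{1/p}\|h\|_p < \frac{\epsilon}{4} + \frac{\epsilon}{4} + 2^{1/p}\cdot\frac{\epsilon}{4} \le \epsilon,\end{aligned}$$
which completes the proof.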
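Proof. Recall that for any $\alpha,\beta > 0$, there is
$$\int_0^\infty\frac{1}{\alpha+\beta x^2}\,dx = \frac{1}{\sqrt{\alpha\beta}}\int_0^\infty\frac{1}{1+y^2}\,dy = \frac{1}{\sqrt{\alpha\beta}}\cdot\frac{\pi}{2}$$
using the change of variable $y = \sqrt{\beta/\alpha}\,x$. Therefore, by the Schwarz inequality,
$$\begin{aligned}\left(\int_0^\infty f\,dx\right)^2 &= \left(\int_0^\infty\frac{1}{\sqrt{\alpha+\beta x^2}}\cdot\sqrt{\alpha+\beta x^2}\,f\,dx\right)^2 \\ &\le \int_0^\infty\frac{1}{\alpha+\beta x^2}\,dx\cdot\int_0^\infty(\alpha+\beta x^2)f^2\,dx \\ &= \frac{\pi}{2\sqrt{\alpha\beta}}\left(\alpha\int_0^\infty f^2\,dx + \beta\int_0^\infty x^2f^2\,dx\right).\end{aligned}$$
Letting $\alpha = \int_0^\infty x^2f^2\,dx$ and $\beta = \int_0^\infty f^2\,dx$ yields the claimed inequality.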
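With these choices of $\alpha$ and $\beta$, the last line equals $\frac{\pi}{2\sqrt{\alpha\beta}}\cdot 2\alpha\beta = \pi\sqrt{\alpha\beta}$, i.e. $\left(\int_0^\infty f\right)^2\le\pi\left(\int_0^\infty f^2\right)^{1/2}\left(\int_0^\infty x^2f^2\right)^{1/2}$ for nonnegative $f$. The Python sketch below checks this with a midpoint rule on $[0,40]$; truncation is harmless because the same argument works on $[0,L]$, where $\int_0^L\frac{dx}{\alpha+\beta x^2}\le\frac{\pi}{2\sqrt{\alpha\beta}}$. The test functions are our choice; $1/(1+x^2)$ is an equality case on the whole half-line.

```python
import math

def integrate(func, a, b, n=50_000):
    """Composite midpoint rule for int_a^b func(x) dx."""
    step = (b - a) / n
    return step * sum(func(a + (i + 0.5) * step) for i in range(n))

def carlson_sides(f, upper=40.0):
    """Return ((int f)^2, pi * sqrt(int f^2 * int x^2 f^2)) over [0, upper]."""
    i1 = integrate(f, 0.0, upper)
    i2 = integrate(lambda x: f(x) ** 2, 0.0, upper)
    i3 = integrate(lambda x: x * x * f(x) ** 2, 0.0, upper)
    return i1 ** 2, math.pi * math.sqrt(i2 * i3)

for name, f in [("exp(-x)", lambda x: math.exp(-x)),
                ("1/(1+x^2)", lambda x: 1.0 / (1.0 + x * x))]:
    lhs, rhs = carlson_sides(f)
    print(f"{name}: (int f)^2 = {lhs:.6f} <= {rhs:.6f}")
```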
Theorem 6.23. If ∥fk − f ∥2 → 0, then ⟨fk , g⟩ → ⟨f, g⟩ for all g ∈ L2 .
Proof. Note |⟨fk , g⟩ − ⟨f, g⟩| = |⟨fk − f, g⟩| ≤ ∥fk − f ∥2 ∥g∥2 → 0.
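Definition 6.24. $f, g\in L^2$ are called orthogonal if $\langle f, g\rangle = 0$. $\{\phi_\alpha:\alpha\in A\}$ is called an orthogonal set if $\langle\phi_\alpha,\phi_\beta\rangle = 0$ for all distinct $\alpha,\beta\in A$. If in addition $\|\phi_\alpha\|_2 = 1$ for all $\alpha\in A$, then $\{\phi_\alpha\}$ is called an orthonormal set.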
Example 6.25. $\left\{\frac{1}{\sqrt{2\pi}}, \frac{1}{\sqrt{\pi}}\cos(kx), \frac{1}{\sqrt{\pi}}\sin(kx): k\in\mathbb{N}\right\}$ is an orthonormal set of $L^2([-\pi,\pi])$.
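Orthonormality in Example 6.25 can be confirmed numerically by computing the Gram matrix of the first few functions. The sketch below uses the composite midpoint rule with $n$ equally spaced nodes, which integrates trigonometric polynomials of degree below $n$ exactly over a full period, so the deviation from the identity matrix is at rounding level. The helper names and the truncation $k\le 3$ are our choices.

```python
import math

def inner(f, g, n=2000):
    """<f, g> in L^2([-pi, pi]) via the composite midpoint rule with n nodes."""
    h = 2 * math.pi / n
    nodes = (-math.pi + (i + 0.5) * h for i in range(n))
    return h * sum(f(x) * g(x) for x in nodes)

def trig_system(K):
    """The functions 1/sqrt(2 pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi) for k = 1..K."""
    phis = [lambda x: 1 / math.sqrt(2 * math.pi)]
    for k in range(1, K + 1):
        phis.append(lambda x, k=k: math.cos(k * x) / math.sqrt(math.pi))
        phis.append(lambda x, k=k: math.sin(k * x) / math.sqrt(math.pi))
    return phis

phis = trig_system(3)
gram = [[inner(a, b) for b in phis] for a in phis]
max_err = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(len(phis)) for j in range(len(phis)))
print("largest deviation of the Gram matrix from the identity:", max_err)
```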
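Theorem 6.26. An orthonormal set of $L^2(E)$ is at most countable.
Proof. Suppose $\{\phi_\alpha\in L^2(E):\alpha\in A\}$ is an orthonormal set. Then for any distinct $\alpha,\beta\in A$, there is
$$\|\phi_\alpha-\phi_\beta\|_2^2 = \|\phi_\alpha\|_2^2 + \|\phi_\beta\|_2^2 - 2\langle\phi_\alpha,\phi_\beta\rangle = 2.$$
Since $L^2(E)$ is separable, there exists a countable dense set $\Gamma\subset L^2(E)$ such that for any $\alpha\in A$, there exists $x_\alpha\in\Gamma$ satisfying $\|x_\alpha-\phi_\alpha\|_2 < \sqrt{2}/2$. By the triangle inequality, $x_\alpha\ne x_\beta$ whenever $\alpha\ne\beta$, so the map $\alpha\mapsto x_\alpha$ is injective. Hence $|A|\le|\Gamma| = \aleph_0$.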
$\|f\|_2^2$.
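Proof. For any $f_k = \sum_{i=1}^k c_i\phi_i$ with $c_i = \langle f,\phi_i\rangle$, we have $0\le\|f-f_k\|_2^2 = \|f\|_2^2 - \sum_{i=1}^k c_i^2$. Hence $\sum_{i=1}^k c_i^2\le\|f\|_2^2$ for all $k$, which implies that $\sum_{k=1}^\infty c_k^2\le\|f\|_2^2$.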
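As a worked illustration of Bessel's inequality (our example), take $f(x) = x$ on $[-\pi,\pi]$ with the orthonormal set of Example 6.25. Since $f$ is odd, its constant and cosine coefficients vanish, and $c_k = \langle f, \sin(kx)/\sqrt{\pi}\rangle = 2\sqrt{\pi}(-1)^{k+1}/k$, so $c_k^2 = 4\pi/k^2$ while $\|f\|_2^2 = 2\pi^3/3$. The Python sketch computes the $c_k$ by a midpoint rule (our choice of quadrature) and checks that the partial sums stay below $\|f\|_2^2$.

```python
import math

def coefficient(k, n=20_000):
    """c_k = <f, sin(kx)/sqrt(pi)> for f(x) = x on [-pi, pi], by the composite midpoint rule."""
    h = 2 * math.pi / n
    total = sum(x * math.sin(k * x)
                for x in (-math.pi + (i + 0.5) * h for i in range(n)))
    return h * total / math.sqrt(math.pi)

norm_sq = 2 * math.pi ** 3 / 3          # ||f||_2^2 = int_{-pi}^{pi} x^2 dx
partials, running = [], 0.0
for k in range(1, 21):
    running += coefficient(k) ** 2
    partials.append(running)
print("sum of c_k^2 for k <= 20:", partials[-1], " ||f||_2^2:", norm_sq)
```

The remaining gap $4\pi\sum_{k>20}k^{-2}\approx 0.61$ shrinks to $0$ as more terms are added, as expected because the trigonometric system is complete.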
where we used the fact that ⟨gk , hk ⟩ = 0 from Lemma 6.31 to show that ⟨sk −
gk , hk ⟩ = 0 and obtained the second equality. Hence ai = ci for all i.
Definition 6.33 (Complete orthonormal basis). We call {ϕk } a complete or-
thonormal basis if {ϕk } is an orthonormal set, and ⟨f, ϕk ⟩ = 0 for all k implies
f = 0 a.e.
Theorem 6.34. Suppose $\{\phi_k\}$ is a complete orthonormal basis in $L^2$. Let $f\in L^2$ and $c_k = \langle f,\phi_k\rangle$ for all $k$. Then $\lim_k\left\|\sum_{i=1}^k c_i\phi_i - f\right\|_2 = 0$.
Proof. By Theorem 6.30 (Bessel's inequality), we know $\sum_{k=1}^\infty c_k^2\le\|f\|_2^2 < \infty$. By Theorem 6.32 (Riesz-Fischer), there exists $g\in L^2$ such that $g = \sum_{k=1}^\infty c_k\phi_k$ and $\left\|\sum_{i=1}^k c_i\phi_i - g\right\|_2\to 0$. Since $\langle f-g,\phi_k\rangle = \langle f,\phi_k\rangle - \langle g,\phi_k\rangle = 0$ for all $k$, we know $f = g$ a.e. by completeness. Hence $\left\|\sum_{i=1}^k c_i\phi_i - f\right\|_2\to 0$.
Definition 6.35 (Linear independence). $\{\phi_i: 1\le i\le k\}$ is called linearly independent if $\sum_{i=1}^k c_i\phi_i = 0$ implies $c_i = 0$ for all $i\le k$. $\{\phi_k: k\in\mathbb{N}\}$ is called linearly independent if every finite subset is linearly independent. It is obvious that $0$ cannot be in a linearly independent set.
Example 6.36. If $\{\phi_k\}$ is an orthonormal set in $L^2$, then it is linearly independent.
Proof. Suppose $\sum_{i=1}^k c_i\phi_i = 0$. Taking the inner product of both sides with $\phi_j$ yields $c_j\|\phi_j\|_2^2 = 0$, which implies that $c_j = 0$ for every $j\le k$.
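Example 6.37 (Gram-Schmidt). If $\{\psi_k: k\in\mathbb{N}\}$ is a linearly independent set, then we can construct an orthonormal set $\{\phi_k: k\in\mathbb{N}\}$: define $\phi_1 = \psi_1/\|\psi_1\|_2$; having constructed $\phi_1,\ldots,\phi_{k-1}$, define
$$\phi_k = \frac{\psi_k - \sum_{i=1}^{k-1}\langle\psi_k,\phi_i\rangle\phi_i}{\left\|\psi_k - \sum_{i=1}^{k-1}\langle\psi_k,\phi_i\rangle\phi_i\right\|_2}.$$
The denominator is nonzero by linear independence. It is easy to verify that $\{\phi_k\}$ is an orthonormal set.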
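The construction in Example 6.37 can be run on $1, x, x^2$ in $L^2([-1,1])$. In the sketch below we assume polynomials are stored as coefficient lists (constant term first) and use the exact moments $\int_{-1}^1 x^m\,dx = \frac{2}{m+1}$ for even $m$ and $0$ for odd $m$; all helper names are ours. The output should be the normalized Legendre polynomials $\frac{1}{\sqrt 2}$, $\sqrt{3/2}\,x$, $\sqrt{5/8}\,(3x^2-1)$.

```python
import math

def inner(p, q):
    """<p, q> in L^2([-1, 1]) for polynomials given as coefficient lists (constant first)."""
    total = 0.0
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:              # odd powers integrate to 0 on [-1, 1]
                total += a * b * 2.0 / (i + j + 1)
    return total

def axpy(alpha, x, y):
    """Return y + alpha * x for coefficient lists of possibly different lengths."""
    n = max(len(x), len(y))
    x = x + [0.0] * (n - len(x))
    y = y + [0.0] * (n - len(y))
    return [yi + alpha * xi for xi, yi in zip(x, y)]

def gram_schmidt(psis):
    """Orthonormalize linearly independent psi_1, psi_2, ... following Example 6.37."""
    phis = []
    for psi in psis:
        v = list(psi)
        for phi in phis:
            v = axpy(-inner(psi, phi), phi, v)   # subtract <psi_k, phi_i> phi_i
        norm = math.sqrt(inner(v, v))          # nonzero by linear independence
        phis.append([c / norm for c in v])
    return phis

monomials = [[1.0], [0.0, 1.0], [0.0, 0.0, 1.0]]   # 1, x, x^2
for phi in gram_schmidt(monomials):
    print([round(c, 6) for c in phi])
```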
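Theorem 6.38. Suppose $\{\phi_i: i\in\mathbb{N}\}$ is an orthonormal set in $L^2$. If for any $f\in L^2$ and $\epsilon > 0$ there exist a finite subset $\{\phi_{i_j}: 1\le j\le k\}$ and constants $c_1,\ldots,c_k$ such that $\left\|f - \sum_{j=1}^k c_j\phi_{i_j}\right\|_2 < \epsilon$, then $\{\phi_i\}$ is complete.
Proof. Assume $\{\phi_i\}$ is not complete. Then there exists $f\in L^2$ with $\|f\|_2\ne 0$ such that $\langle f,\phi_i\rangle = 0$ for all $i$. On the one hand, there exist a finite subset $\{\phi_{i_j}: 1\le j\le k\}$ and constants $c_j$ such that $\left\|f - \sum_{j=1}^k c_j\phi_{i_j}\right\|_2 < \|f\|_2/2$. Moreover,
$$\left|\left\langle f, f - \sum_{j=1}^k c_j\phi_{i_j}\right\rangle\right| \le \|f\|_2\cdot\left\|f - \sum_{j=1}^k c_j\phi_{i_j}\right\|_2 \le \frac{\|f\|_2^2}{2}.$$
On the other hand, since $\langle f,\phi_i\rangle = 0$ for all $i$, we have $\left\langle f, f - \sum_{j=1}^k c_j\phi_{i_j}\right\rangle = \|f\|_2^2 > \frac{\|f\|_2^2}{2}$, which is a contradiction.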
$$\int_E |g|^q = \frac{1}{\|f\|_p^p}\int_E |f|^p = 1,\qquad \int_E fg = \int_E\frac{|f|^p}{\|f\|_p^{p-1}} = \frac{\|f\|_p^p}{\|f\|_p^{p-1}} = \|f\|_p,$$
which proves the claim.
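The two identities above are consistent with the choice $g = \mathrm{sign}(f)|f|^{p-1}/\|f\|_p^{p-1}$, and they are easy to confirm in a finite setting. The sketch assumes counting measure on a finite set and $1 < p < \infty$ (for $p = 1$ the formula reduces to $g = \mathrm{sign}(f)$, where the convention $\mathrm{sign}(0) = 0$ matters); `dual_maximizer` is a hypothetical helper name.

```python
import math
import random

def dual_maximizer(f, p):
    """Return (g, ||f||_p) with g = sign(f)|f|^(p-1) / ||f||_p^(p-1); needs 1 < p < inf, f != 0."""
    norm = sum(abs(x) ** p for x in f) ** (1.0 / p)
    g = [math.copysign(abs(x) ** (p - 1), x) / norm ** (p - 1) for x in f]
    return g, norm

rng = random.Random(4)
f = [rng.uniform(-2, 2) for _ in range(30)]
p = 3.0
q = p / (p - 1)
g, norm_f = dual_maximizer(f, p)
print("||g||_q =", sum(abs(x) ** q for x in g) ** (1 / q))
print("int fg =", sum(a * b for a, b in zip(f, g)), " ||f||_p =", norm_f)
```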
Remarks. Note that Hölder's inequality implies that $\|fg\|_1\le\|f\|_p\|g\|_q$, and hence $\|f\|_p\ge\sup_{g\in L^q,\,g\ne 0}\frac{\|fg\|_1}{\|g\|_q} = \sup_{\|g\|_q=1}\|fg\|_1$. The theorem above implies that equality holds and that the supremum can be replaced by a maximum, and it gives the maximizer explicitly for $p\in[1,\infty)$.
Theorem 6.40. Suppose $f\in L^\infty(E)$. Then $\|f\|_\infty = \sup_{\|g\|_1=1}\left|\int_E fg\right|$.
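Proof. Note that, if $\|g\|_1 = 1$, then $\left|\int_E fg\right|\le\int_E|fg|\le\|f\|_\infty\|g\|_1 = \|f\|_\infty$. Hence $\|f\|_\infty\ge\sup_{\|g\|_1=1}\left|\int_E fg\right|$.
On the other hand, let $M = \|f\|_\infty$. Then for any $\epsilon > 0$, there exists $A\subset E$ such that $0 < \mu(A) = a < \infty$ and $|f| > M-\epsilon$ on $A$ (intersect $\{|f| > M-\epsilon\}$ with a large ball if necessary). Define $g = \mathrm{sign}(f)\chi_A/a$; then $\|g\|_1 = 1$ and
$$\int_E fg = \frac{1}{a}\int_A |f| \ge (M-\epsilon)\cdot a\cdot\frac{1}{a} = M-\epsilon.$$
Hence $M-\epsilon\le\sup_{\|g\|_1=1}\left|\int_E fg\right|\le M$. As $\epsilon$ is arbitrary, we know $\|f\|_\infty = \sup_{\|g\|_1=1}\left|\int_E fg\right|$.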
Example 6.41. We cannot replace the supremum by a maximum in the theorem above: consider $E = [0,1]$ and $f(x) = x$. Then $\|f\|_\infty = 1$, but for any $g\in L^1$ with $\|g\|_1 = 1$, we have $\left|\int_0^1 fg\,dx\right|\le\int_0^1 x|g(x)|\,dx < 1$ (otherwise $\int_0^1(1-x)|g(x)|\,dx = 0$ implies $g = 0$ a.e., a contradiction).
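To see that the supremum in Example 6.41 is still $1$, one can use the functions $g_n = n\chi_{[1-1/n,1]}$ (our choice), which satisfy $\|g_n\|_1 = 1$ and $\int_0^1 x\,g_n(x)\,dx = 1 - \frac{1}{2n}$. The Python sketch evaluates this exactly with rational arithmetic: the values approach $1$ but never reach it.

```python
from fractions import Fraction

def pairing(n: int) -> Fraction:
    """Exact int_0^1 x g_n(x) dx for g_n = n * chi_[1 - 1/n, 1], which has ||g_n||_1 = 1."""
    a = 1 - Fraction(1, n)
    return n * (1 - a * a) / 2          # n * int_a^1 x dx

for n in (1, 2, 10, 1000):
    print(n, pairing(n))                 # equals 1 - 1/(2n), strictly below 1
```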
Definition 6.42 (Dual space of Lp ). We call Lq the dual space of Lp if q is the
conjugate of p.
Theorem 6.43. Suppose $g: E\to\mathbb{R}$ is a measurable function. Let $p\in[1,\infty]$ and let $q$ be its conjugate. If there exists $M > 0$ such that $\left|\int_E g\phi\right|\le M\|\phi\|_p$ for every simple function $\phi$, then $g\in L^q$ and $\|g\|_q\le M$.
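Proof. (i) First consider $p\in(1,\infty)$. Let $\{\psi_k\}$ be a sequence of nonnegative simple functions such that $\psi_k\uparrow|g|^{\frac{1}{p-1}}$, and let $\phi_k = \mathrm{sign}(g)\psi_k$. Then
$$\int_E g\phi_k\to\int_E |g|^{\frac{p}{p-1}} = \int_E |g|^q\qquad\text{and}\qquad\int_E g\phi_k\le M\|\phi_k\|_p\le M\|g\|_q^{q/p}.$$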
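Theorem 6.44 (Generalized Minkowski's inequality). Suppose $p\in[1,\infty)$ and $f:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$. If $f(\cdot,y)\in L^p(\mathbb{R}^n)$ for almost every $y\in\mathbb{R}^n$, and $M = \int_{\mathbb{R}^n}\left(\int_{\mathbb{R}^n}|f(x,y)|^p\,dx\right)^{1/p}dy < \infty$, then
$$\left(\int_{\mathbb{R}^n}\left|\int_{\mathbb{R}^n} f(x,y)\,dy\right|^p dx\right)^{1/p}\le M = \int_{\mathbb{R}^n}\left(\int_{\mathbb{R}^n}|f(x,y)|^p\,dx\right)^{1/p}dy.$$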
Proof. The proof is trivial for $p = 1$. For $p\in(1,\infty)$, let $F(x) = \int_{\mathbb{R}^n} f(x,y)\,dy$. Then for any simple function $\phi$, there is
$$\begin{aligned}\left|\int F(x)\phi(x)\,dx\right| &\le \int |F(x)||\phi(x)|\,dx \le \int\left(\int |f(x,y)|\,dy\right)|\phi(x)|\,dx \\ &= \int\left(\int |f(x,y)||\phi(x)|\,dx\right)dy \\ &\le \int\left(\int |f(x,y)|^p\,dx\right)^{1/p}dy\;\|\phi\|_q = M\|\phi\|_q,\end{aligned}$$
where we applied Theorem 4.51 (Tonelli) to obtain the equality and Hölder's inequality to obtain the last inequality. Applying Theorem 6.43 (with the roles of $p$ and $q$ interchanged) yields $F\in L^p$ and $\|F\|_p\le M$, which is the claimed inequality.
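A discrete analogue of Theorem 6.44 replaces both Lebesgue measures by counting measure (our simplification): storing $f(x,y)$ as columns indexed by $y$, it reads $\left\|\sum_y f(\cdot,y)\right\|_p\le\sum_y\|f(\cdot,y)\|_p$. The sketch checks this on random data and on identical nonnegative columns, where equality holds. Helper names are hypothetical.

```python
import random

def lp_norm(xs, p):
    """L^p norm w.r.t. counting measure."""
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

def minkowski_sides(columns, p):
    """For f(x, y) stored as columns[y][x], return (|| sum_y f(., y) ||_p, sum_y || f(., y) ||_p)."""
    F = [sum(col[x] for col in columns) for x in range(len(columns[0]))]
    return lp_norm(F, p), sum(lp_norm(col, p) for col in columns)

rng = random.Random(5)
columns = [[rng.gauss(0, 1) for _ in range(25)] for _ in range(8)]
for p in (1.0, 1.5, 2.0, 4.0):
    lhs, rhs = minkowski_sides(columns, p)
    print(p, round(lhs, 4), "<=", round(rhs, 4))
```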
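Example 6.45 (Reduction to Minkowski's inequality). Suppose $f, g\in L^p(\mathbb{R})$. Define the function $h:\mathbb{R}\times[0,2]\to\mathbb{R}$ by
$$h(x,y) = \begin{cases} f(x) & \text{if } 0\le y\le 1,\\ g(x) & \text{if } 1 < y\le 2.\end{cases}$$
Then $\int_0^2 h(x,y)\,dy = f(x) + g(x)$ and
$$\left(\int |h(x,y)|^p\,dx\right)^{1/p} = \begin{cases}\|f\|_p & \text{if } 0\le y\le 1,\\ \|g\|_p & \text{if } 1 < y\le 2.\end{cases}$$
Hence Theorem 6.44 (with $h$ extended by $0$ for $y\notin[0,2]$) yields $\|f+g\|_p\le\|f\|_p + \|g\|_p$.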
7 Probability Theory
7.1 Basic concepts
Example 7.1 (Terms in measure theory vs probability theory). See Table 1.
Definition 7.5 (Distribution). Suppose X is a random variable on (Ω, F, P).
Let PX be the image measure of X on R, called the distribution of X. The
function F (t) = PX ((−∞, t)) = P(X < t) is called the distribution function of
X. A family of random variables {Xα : α ∈ A} is called identically distributed
if their image measures {PXα : α ∈ A} are identical.
Definition 7.6 (Joint distribution). Suppose {Xk : 1 ≤ k ≤ n} are random
variables on (Ω, F, P). Then (X1 , . . . , Xn ) : Ω → Rn , and the image measure
PX1 ,...,Xn is called the joint distribution of X1 , . . . , Xn .
Remarks. The behaviors of random variables are completely determined by their (joint) distributions. Therefore, we often use
$$E(X) = \int_\Omega X(\omega)\,dP(\omega) = \int_{\mathbb{R}} t\,dP_X(t),\qquad V(X) = \int_{\mathbb{R}}(t-E(X))^2\,dP_X(t).$$
We also use
$$E(X+Y) = \int_{\mathbb{R}^2}(t+s)\,dP_{X,Y}(t,s).$$
Hence $X_1,\ldots,X_n$ are independent iff the two quantities above are identical, i.e., $P_{X_1,\ldots,X_n} = \prod_{k=1}^n P_{X_k}$.
Theorem 7.8. Suppose X1 , . . . , Xn are independent random variables, and fk :
R → R are measurable, then f1 (X1 ), . . . , fn (Xn ) are also independent.
Proof. Let Yk = fk (Xk ). For any Bk ∈ F, there is
Theorem 7.9. Suppose $\{X_k: 1\le k\le n\}$ are independent and $X_k\in L^1$. Then $\prod_{k=1}^n X_k\in L^1$ and $E\left(\prod_{k=1}^n X_k\right) = \prod_{k=1}^n E(X_k)$.
$$P(|X-E(X)|\ge\epsilon)\le\frac{V(X)}{\epsilon^2}.$$
Proof. Note that
$$P(|X-E(X)|\ge\epsilon) = \int_{\{|X-E(X)|\ge\epsilon\}} dP\le\int_{\mathbb{R}}\left(\frac{t-E(X)}{\epsilon}\right)^2 dP_X(t) = \frac{V(X)}{\epsilon^2}.$$
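Chebyshev's inequality can be checked exactly for a fair six-sided die (our example), using rational arithmetic: here $E(X) = 7/2$ and $V(X) = 35/12$, and the exact tail probabilities stay below $V(X)/\epsilon^2$.

```python
from fractions import Fraction

faces = range(1, 7)            # a fair six-sided die, P(X = k) = 1/6
prob = Fraction(1, 6)
mean = sum(prob * k for k in faces)
var = sum(prob * (k - mean) ** 2 for k in faces)

def tail(eps):
    """Exact P(|X - E(X)| >= eps)."""
    return sum((prob for k in faces if abs(k - mean) >= eps), Fraction(0))

for eps in (1, 2, Fraction(5, 2)):
    print(eps, tail(eps), "<=", var / eps ** 2)
```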
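For Item 2, we know $\{E_k^c\}$ are independent since $\{E_k\}$ are so. Hence, using $1-x\le e^{-x}$,
$$P\Big(\bigcap_{j=k}^n E_j^c\Big) = \prod_{j=k}^n P(E_j^c) = \prod_{j=k}^n\big(1-P(E_j)\big)\le\prod_{j=k}^n e^{-P(E_j)} = e^{-\sum_{j=k}^n P(E_j)}\to 0$$
as $n\to\infty$, since $\sum_j P(E_j) = \infty$. Hence $P(\liminf_k E_k^c) = P\big(\bigcup_{k=1}^\infty\bigcap_{j=k}^\infty E_j^c\big)\le\sum_{k=1}^\infty P\big(\bigcap_{j=k}^\infty E_j^c\big) = 0$, which implies that $P(\limsup_k E_k) = 1 - P\big((\limsup_k E_k)^c\big) = 1 - P(\liminf_k E_k^c) = 1$.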
Theorem 7.14 (Kolmogorov's inequality). Suppose $\{X_k: 1\le k\le n\}$ are independent random variables with mean $0$ and variances $\sigma_k^2$ for all $k$. Let $S_k = \sum_{j=1}^k X_j$ for $k = 1,\ldots,n$. Then for any $\epsilon > 0$, there is
$$P\Big(\max_{1\le k\le n}|S_k|\ge\epsilon\Big)\le\epsilon^{-2}\sum_{k=1}^n\sigma_k^2.$$
Proof. Note that $E(X_k) = 0$ and $V(X_k) = \sigma_k^2$; hence $E(S_k) = 0$ and $V(S_k) = E(S_k^2) = \sum_{j=1}^k\sigma_j^2$. Moreover, $S_k$ and $S_n - S_k$ are independent. Now let $A_k = \{|S_k|\ge\epsilon\}\cap\bigcap_{j=1}^{k-1}\{|S_j| < \epsilon\}$; then $A_k\cap A_j = \emptyset$ whenever $k\ne j$. Thus $\{\max_{1\le k\le n}|S_k|\ge\epsilon\} = \bigcup_{k=1}^n A_k$ is a disjoint union. Therefore
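$$P\Big(\max_{1\le k\le n}|S_k|\ge\epsilon\Big) = P\Big(\bigcup_{k=1}^n A_k\Big) = \sum_{k=1}^n P(A_k)\le\frac{1}{\epsilon^2}\sum_{k=1}^n E(\chi_{A_k}S_k^2),$$
where we used $P(A_k) = \int_{A_k} dP\le\int_{A_k}\frac{|S_k|^2}{\epsilon^2}\,dP = \frac{1}{\epsilon^2}E(\chi_{A_k}S_k^2)$ to obtain the inequality. On the other hand, we have
$$\begin{aligned}E(S_n^2)\ge E\Big[\sum_{k=1}^n\chi_{A_k}S_n^2\Big] &= E\Big[\sum_{k=1}^n\chi_{A_k}\big(S_k^2 + 2S_k(S_n-S_k) + (S_n-S_k)^2\big)\Big] \\ &\ge \sum_{k=1}^n E(\chi_{A_k}S_k^2) + 2\sum_{k=1}^n E\big(\chi_{A_k}S_k(S_n-S_k)\big) = \sum_{k=1}^n E(\chi_{A_k}S_k^2),\end{aligned}$$
where the last equality holds because $\chi_{A_k}S_k$ depends only on $X_1,\ldots,X_k$ and is therefore independent of $S_n - S_k$, which has mean $0$. Combining the two displays with $E(S_n^2) = \sum_{k=1}^n\sigma_k^2$ yields the claim.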
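For a small case, Kolmogorov's inequality can be verified exactly by enumeration. The sketch assumes fair $\pm 1$ steps (our choice), so $\sigma_k^2 = 1$ and the bound is $n/\epsilon^2$; with $n = 10$ all $2^{10}$ sign patterns are enumerated. The function name is ours.

```python
from itertools import product

def max_partial_sum_tail(n, eps):
    """Exact P(max_k |S_k| >= eps) for S_k = X_1 + ... + X_k with i.i.d. fair +-1 steps."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        s, reached = 0, False
        for x in steps:
            s += x
            if abs(s) >= eps:
                reached = True
                break
        hits += reached
    return hits / 2 ** n

n = 10
for eps in (3, 4, 5, 6):
    print(eps, max_partial_sum_tail(n, eps), "<=", n / eps ** 2)
```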
Remarks. The name moment generating function is due to the fact that $E[X^k] = M_X^{(k)}(0)$, the $k$th derivative of $M_X$ at $t = 0$, for all $k = 0,1,\ldots$ (assuming $M_X$ is finite on a neighborhood of $0$). This also implies that $M_X(t) = \sum_{k=0}^\infty\frac{E[X^k]}{k!}t^k$ for $t$ near $0$. The moment generating function $M_X$ is essentially the Laplace transform of the distribution of $X$. Thus, two random variables have the same distribution iff their moment generating functions (finite on a neighborhood of $0$) are identical.
Remarks. It is straightforward to verify that $M_{aX+b}(t) = e^{bt}M_X(at)$ for any $a, b\in\mathbb{R}$, and $M_{X+Y}(t) = M_X(t)M_Y(t)$ for any independent random variables $X$ and $Y$.
That is, $\lim_n P(Y_n\le a) = P(Z\le a)$, where $Z\sim N(0,1)$ is the standard normal random variable.
Proof. We assume $\mu = 0$ and $\sigma^2 = 1$, since it is straightforward to extend to the general case by replacing $X_k$ with $(X_k-\mu)/\sigma$. Let $F$ be the distribution function of $Z$ and $F_n$ the distribution function of $Y_n$; then we need to show that $F_n\to F$ pointwise. To this end, we consider their moment generating functions $M_Z$ and $M_{Y_n}$. We know $M_{Y_n}(t) = M_X(t/\sqrt{n})^n$. Noting that $M_X(t) = 1 + t^2/2 + o(t^2)$, we have $M_{Y_n}(t) = \big(1 + t^2/(2n) + o(t^2/n)\big)^n\to e^{t^2/2}$ as $n\to\infty$. On the other hand, $M_Z(t) = e^{t^2/2}$. Hence $M_{Y_n}(t)\to M_Z(t)$ as $n\to\infty$ for all $t\in\mathbb{R}$ sufficiently close to $0$, and applying the inverse Laplace transform to the moment generating functions yields the claim.
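The key limit $M_X(t/\sqrt{n})^n\to e^{t^2/2}$ can be observed numerically. The sketch assumes fair $\pm 1$ steps (our choice: mean $0$, variance $1$), for which $M_X(t) = \cosh t$, and $Y_n = (X_1+\cdots+X_n)/\sqrt{n}$ as in the proof; the function name is hypothetical.

```python
import math

def mgf_normalized_sum(t: float, n: int) -> float:
    """M_{Y_n}(t) = M_X(t / sqrt(n))**n for fair +-1 steps, where M_X(t) = cosh(t)."""
    return math.cosh(t / math.sqrt(n)) ** n

target = math.exp(0.5)   # M_Z(1) = e^{1/2} for Z ~ N(0, 1)
for n in (1, 10, 100, 10_000):
    print(n, mgf_normalized_sum(1.0, n), "vs", target)
```

Since $n\log\cosh(t/\sqrt{n}) = t^2/2 - t^4/(12n) + O(n^{-2})$, the error decays roughly like $1/n$.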