math3301-lecturenotes
1 Preliminaries
1.1 Sets and functions
1.2 Topology
1.3 Some fundamental properties of the real line and the extended real line
1.3.1 The real line
1.3.2 The extended real line
1.3.3 Superior limit and inferior limit
Chapter 1
Preliminaries
∁_E A, ∁A, E\A, E − A, A^c.
Let f : X → Y be a map between two sets. For A ⊂ X, the (direct) image of A under f is f (A) = {f (x) | x ∈ A}.
f −1 : P(Y ) → P(X)
B ↦ f −1 (B).
1. f −1 (∅) = ∅ and f −1 (Y ) = X.
2. f −1 (Ac ) = (f −1 (A))c
We use capital letters to denote sets and calligraphic letters to denote sets of sets. If B is a
collection of subsets of Y , then we write f −1 (B) = {f −1 (B)|B ∈ B}. This is the direct image
under f −1 of the set B.
Note that if (Ai )i∈I is a family of subsets of X, then
\[ f\Big(\bigcup_{i\in I} A_i\Big) = \bigcup_{i\in I} f(A_i). \]
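The identities above for direct and inverse images can be checked on small finite sets. The following Python sketch is only an illustration: the helper names `image` and `preimage`, the set X and the map f(x) = x² are our own choices.

```python
def image(f, A):
    """Direct image f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}


def preimage(f, X, B):
    """Inverse image f^{-1}(B) = {x in X : f(x) in B}."""
    return {x for x in X if f(x) in B}


# Illustrative data (assumptions, not taken from the notes).
X = set(range(-3, 4))


def f(x):
    return x * x


Y = image(f, X)
A1, A2 = {-2, -1, 0}, {0, 1, 2}
B = {1, 4}

# f(A1 ∪ A2) = f(A1) ∪ f(A2)
union_ok = image(f, A1 | A2) == image(f, A1) | image(f, A2)

# f^{-1}(Y \ B) = X \ f^{-1}(B)
complement_ok = preimage(f, X, Y - B) == X - preimage(f, X, B)

print(union_ok, complement_ok)
```

The check uses a single family of two sets; it illustrates the identities but of course does not prove them.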
Countable sets
A set E is called countable if there is a one to one function f : E → IN∗ where IN∗ is the set of
positive integers. This means that we can put the elements of E in a finite or infinite sequence
(x1 , x2 , . . .). So for example the set of all integers Z and the set of all rational numbers Q are
countable. A countable union of countable sets is countable, that is, if I is countable and Ai is
countable for all i ∈ I, then ∪i∈I Ai is countable. Also a finite Cartesian product of countable
sets is countable, so for example IN×IN is countable. However P(IN) and the set of real numbers
IR are uncountable.
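To see concretely why IN∗ × IN∗ is countable, one can list its elements diagonal by diagonal. The generator below is a minimal sketch of this classical enumeration; the name `enumerate_pairs` is ours.

```python
from itertools import islice


def enumerate_pairs():
    """Yield every pair (i, j) of positive integers exactly once.

    Pairs are listed by increasing value of i + j, so each pair appears
    after finitely many steps.
    """
    s = 2
    while True:
        for i in range(1, s):
            yield (i, s - i)
        s += 1


first = list(islice(enumerate_pairs(), 10))
print(first)
```

Numbering the pairs in the order in which they are produced gives a one to one function from IN∗ × IN∗ to IN∗.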
and
\[ \limsup A_n = \overline{\lim}\, A_n = \bigcap_{n\in\mathbb{N}^*} \bigcup_{k\ge n} A_k. \]
Therefore, x ∈ lim inf An if starting from a certain rank, x belongs to all the An . In probability
theory, the event lim inf An is called the event that the An happen almost always (An a.a.).
On the other hand, x ∈ lim sup An if x belongs to An for infinitely many indices n. In
probability theory, the event lim sup An is called the event that the An happen infinitely often
(An i.o.).
Example. Consider the experiment of flipping a fair coin infinitely many times. Let An be
the event: Head appears on the nth flip. Then
1. ⋂_{n=1}^∞ An is the event: a head appears on all tosses, that is, {(HH . . . H . . .)}.
2. lim inf An is the event: a head appears starting from a certain flip. It is the union of all
events of the form
3. lim sup An is the event: a head appears infinitely many times. This event is the comple-
ment of the event: a tail appears starting from a certain flip.
4. ⋃_{n=1}^∞ An is the event: a head appears at least once. □
and that
\[ x \in \limsup A_n \iff x \in \bigcap_{n\in\mathbb{N}^*} \bigcup_{k\ge n} A_k \iff \forall n\ \exists k \ge n,\ x \in A_k \iff \{n \in \mathbb{N}^* \mid x \in A_n\} \text{ is infinite.} \]
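For a sequence of sets that is eventually periodic, lim sup An and lim inf An can be computed from finitely many terms. The sketch below is an illustration under that assumption: the sequence `A`, the truncation parameter `N` and the function names are hypothetical, and the tail unions and intersections are truncated at index 2N, which is exact here only because the sequence is periodic after rank 5.

```python
def A(n):
    """Hypothetical example: 1 is always in A_n, 0 or 2 alternate,
    and 99 belongs only to A_1, ..., A_5."""
    s = {1, 0 if n % 2 == 0 else 2}
    if n <= 5:
        s.add(99)
    return s


def tail_union(n, horizon):
    """Union of A_k for n <= k <= horizon."""
    return set().union(*(A(k) for k in range(n, horizon + 1)))


def tail_intersection(n, horizon):
    """Intersection of A_k for n <= k <= horizon."""
    return set.intersection(*(A(k) for k in range(n, horizon + 1)))


def lim_sup(N):
    """Intersection over n <= N of the truncated tail unions."""
    return set.intersection(*(tail_union(n, 2 * N) for n in range(1, N + 1)))


def lim_inf(N):
    """Union over n <= N of the truncated tail intersections."""
    return set().union(*(tail_intersection(n, 2 * N) for n in range(1, N + 1)))


print(lim_sup(10), lim_inf(10))
```

The element 99 belongs to finitely many An only, so it is in neither set; 0 and 2 occur infinitely often but not from a certain rank on, so they are in lim sup An but not in lim inf An.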
1.2 Topology
Let X be a set. A family T of subsets of X (that is, T ⊂ P(X)) is called a topology on X
provided the following conditions hold.
1. ∅, X ∈ T .
2. If Oi ∈ T for all i ∈ I, then ∪i∈I Oi ∈ T .
3. If O1 , O2 ∈ T , then O1 ∩ O2 ∈ T .
The elements of T are called open sets of X. A set is called closed if its complement is open.
If d is a distance on X, then d generates a topology on X.
A function f : X → Y between two topological spaces is called continuous at a point x0 if
for every neighborhood V of f (x0 ), there exists a neighborhood U of x0 such that f (U ) ⊂ V . A
function f is called continuous on X if it is continuous at every point of X. Then the following
conditions are equivalent.
(i) f : X → Y is continuous.
(ii) The inverse image under f of every open set of Y is an open set of X.
(iii) The inverse image under f of every closed set of Y is a closed set of X.
1.3 Some fundamental properties of the real line and the extended real line
1.3.1 The real line
We start by formulating two fundamental properties of the set of real numbers that we shall
use in the sequel. This set is denoted by IR and it is naturally identified with a line. An upper
bound of a subset E ⊂ IR is a real number M such that x ≤ M for all x ∈ E. The set E is
called bounded from above if it has an upper bound. A lower bound of a subset E ⊂ IR is a
real number m such that m ≤ x for all x ∈ E. The set E is called bounded from below if it has
a lower bound.
Definition 1.1 Let E ⊂ IR be bounded from above. The number α is called the least upper
bound of E if the following hold.
i) α is an upper bound of E.
ii) If β < α, then β is not an upper bound of E.
In this case, we write α = sup E. If E is not bounded from above, we write sup E = +∞.
Proposition 1.1 Let E ⊂ IR be bounded from above. Then α = sup E if and only if the
following hold.
i) x ≤ α for all x ∈ E.
ii) ∀ε > 0 ∃y ∈ E such that α − ε < y.
Definition 1.2 Let E ⊂ IR be bounded from below. The number α is called the greatest lower
bound of E if the following hold.
i) α is a lower bound of E.
ii) If β > α, then β is not a lower bound of E.
In this case, we write α = inf E. If E is not bounded from below, we write inf E = −∞.
Proposition 1.2 Let E ⊂ IR be bounded from below. Then α = inf E if and only if the
following hold.
i) x ≥ α for all x ∈ E.
ii) ∀ε > 0 ∃y ∈ E such that α + ε > y.
Theorem 1.1 Any nonempty subset of IR which is bounded from above has a least upper
bound. Any nonempty subset of IR which is bounded from below has a greatest lower bound.
Any proof of this theorem involves going back to the construction of the real numbers from
the rational numbers. We do not address this issue here.
Recall now that a subset E ⊂ IR is an interval if [x, y] ⊂ E whenever x and y belong to E.
An interval has thus one of the following eleven forms ∅, {a}, ]a, b[, ]a, b], [a, b[, [a, b], ]−∞, a],
] − ∞, a[, ]a, +∞[, [a, +∞[ and IR. We shall say that an interval is trivial if it is empty or a
singleton.
Theorem 1.2 Any nontrivial interval of the real line contains rational as well as irrational
numbers.
This theorem is not difficult to prove using the following evident facts: 1- for each real
number a, there is an integer n bigger than a (this is known as the Archimedean property); 2-
every nonempty subset of IN has a smallest element (IN is well ordered).
Proposition 1.3 Let a and b be two real numbers. If a ≤ b + ε for all ε > 0 then a ≤ b.
Theorem 1.3 Every open set O of the real line is the union of a countable family of pairwise
disjoint open intervals.
Proof. Let (Iλ )λ∈L be the collection of connected components of O. We know that each Iλ
is an open interval (recall that in a locally connected space, a connected component of an open
set is open). Since the rational numbers are dense in IR, each Iλ contains a rational number.
By the axiom of choice, we can choose exactly one rational number in each Iλ . This defines a
function between L and Q which is one to one because two distinct components are disjoint.
Therefore L is countable. Finally we have indeed O = ∪λ∈L Iλ . □
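\[ h^{-1}(x) = \begin{cases} \tan x & \text{if } x \in \left]-\frac{\pi}{2}, \frac{\pi}{2}\right[ \\ +\infty & \text{if } x = \frac{\pi}{2} \\ -\infty & \text{if } x = -\frac{\pi}{2}. \end{cases} \]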
3. IR is dense in the extended real line [−∞, +∞]. This follows from the fact that IR is homeomorphic to ]−π/2, π/2[, which
is dense in [−π/2, π/2].
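The homeomorphism between the extended real line and [−π/2, π/2] can be written down explicitly. The sketch below assumes that h is the arctangent extended by h(±∞) = ±π/2, which is what the formula for h−1 suggests; the function names `h` and `h_inv` are ours. The endpoints are treated separately because the floating point value of tan at π/2 is a large finite number rather than +∞.

```python
import math


def h(x):
    """Map the extended real line [-inf, +inf] onto [-pi/2, pi/2]."""
    if x == math.inf:
        return math.pi / 2
    if x == -math.inf:
        return -math.pi / 2
    return math.atan(x)


def h_inv(y):
    """Inverse map from [-pi/2, pi/2] back to [-inf, +inf]."""
    if y == math.pi / 2:
        return math.inf
    if y == -math.pi / 2:
        return -math.inf
    return math.tan(y)


# h preserves order, so a monotonic sequence is sent to a bounded monotonic one.
ys = [h(x) for x in (1.0, 2.0, 10.0, math.inf)]
print(ys)
```

Since h is increasing, a nondecreasing sequence such as xn = n is sent to a nondecreasing sequence in [−π/2, π/2].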
Theorem 1.4 Every open set O of [−∞, +∞] is the union of a countable family of pairwise disjoint
open intervals of [−∞, +∞].
Proof. Observe first that the connected subsets of [−∞, +∞] are exactly the intervals of [−∞, +∞]. Next,
a nonempty open interval of [−∞, +∞] contains an open interval of IR and therefore contains rational points.
Therefore the proof of Theorem 1.3 can be repeated. □
We will have to deal later with sequences and series in IR. So it is important to formulate
some rules for using them.
Proof. Let (xn ) be a monotonic sequence in [−∞, +∞]. Let yn = h(xn ) where h : [−∞, +∞] → [−π/2, π/2] is the
homeomorphism defined above. Then (yn ) is also monotonic and therefore convergent (since it
is bounded). Continuity of h−1 implies that (xn ) is convergent.
i) Let (xn ) be nondecreasing. We distinguish between two cases
a) xn < +∞ for all n ∈ IN∗ . The fact that sup xi = +∞ means that the set {xi |i ∈ IN∗ }
is not bounded from above. This in turn means that for all M > 0, there exists k such
that xk ≥ M . Then xn ≥ M for all n ≥ k. This means that lim xn = +∞ and so
lim xn = sup xi .
b) xn0 = ∞ for some n0 . Then xn = ∞ for all n ≥ n0 . In this case, we have lim xn = +∞
and so lim xn = sup xi .
3. (∀n ∈ IN∗ , xn ≤ yn ) ⇒ ∑_{n=1}^∞ xn ≤ ∑_{n=1}^∞ yn .
Lemma 1.1 The sum of a series of nonnegative terms does not depend on the order of sum-
mation.
Proof. Let (xn ) be a sequence in [0, +∞] and let σ : IN∗ → IN∗ be a bijection. We have to
prove that ∑_{n=1}^∞ xn = ∑_{n=1}^∞ xσ(n) . Let n be given and let N = max{σ(1), · · · , σ(n)}. Then
{σ(1), · · · , σ(n)} ⊂ {1, · · · , N }. It follows that ∑_{k=1}^n xσ(k) ≤ ∑_{k=1}^N xk ≤ ∑_{k=1}^∞ xk because
xk ≥ 0. Letting n → ∞, we get ∑_{k=1}^∞ xσ(k) ≤ ∑_{k=1}^∞ xk . By symmetry ∑_{k=1}^∞ xk ≤ ∑_{k=1}^∞ xσ(k) .
Hence the equality. □
Remark 1.2 If the sequence (xn ) is of variable sign, then the sum of the series ∑ xn may
depend on the order of summation. Here is an example. We shall prove later that
\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{1}{2k-1} - \frac{1}{2k} + \cdots = \ln 2. \]
However,
\[ 1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \cdots + \frac{1}{2k-1} - \frac{1}{4k-2} - \frac{1}{4k} + \cdots = \frac{1}{2}\ln 2. \]
Indeed, let Sn be the nth partial sum of the first series and let S′n be the nth partial sum of the
second series. Then
\[ S'_{3n} = \sum_{k=1}^{n} \Big( \frac{1}{2k-1} - \frac{1}{4k-2} - \frac{1}{4k} \Big) = \sum_{k=1}^{n} \Big( \frac{1}{4k-2} - \frac{1}{4k} \Big) = \frac{1}{2} S_{2n}. \]
Therefore S′_{3n} → (1/2) ln 2. Since S′_{3n−1} = S′_{3n} + 1/(4n) and S′_{3n−2} = S′_{3n−1} + 1/(4n − 2), it follows that
S′_{3n−1} and S′_{3n−2} also converge to (1/2) ln 2. Consequently, S′n converges to (1/2) ln 2. □
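The two sums in Remark 1.2 can be compared numerically. The sketch below is an illustration only: the function names `S` and `S_rearranged_3n` and the value of `n` are our own choices, and floating point partial sums merely approximate the limits.

```python
import math


def S(n):
    """n-th partial sum of 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))


def S_rearranged_3n(n):
    """Partial sum S'_{3n}: n blocks of the form 1/(2k-1) - 1/(4k-2) - 1/(4k)."""
    return sum(
        1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k) for k in range(1, n + 1)
    )


n = 100_000
print(S(2 * n), math.log(2))
print(S_rearranged_3n(n), 0.5 * math.log(2))
```

The second line of output shows the rearranged series approaching half of the value approached by the original series, in agreement with the identity S′_{3n} = (1/2) S_{2n}.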
Let now A be an infinite countable set and let (xa )a∈A be a family of [0, +∞]. Then there exists
a bijection φ : IN∗ → A. We set
\[ \sum_{a\in A} x_a = \sum_{n=1}^{\infty} x_{\varphi(n)}. \]
The previous lemma ensures that this definition makes sense. Using arguments similar to those
used in the proof of Lemma 1.1, one can prove the following
\[ \sum_{(n,m)\in\mathbb{N}^*\times\mathbb{N}^*} x_{n,m} = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} x_{n,m} = \sum_{m=1}^{\infty}\sum_{n=1}^{\infty} x_{n,m}. \]
Examples.
1. Let xn = (−1)^n . Then yn := inf{xk | k ≥ n} = −1. Hence lim inf xn = lim_{n→∞} yn = −1. On
the other hand, zn := sup{xk | k ≥ n} = 1. Hence lim sup xn = lim_{n→∞} zn = 1.
2. Consider the sequence (1, 2, 1 + 1/2, 2 + 1/2, 1 + 1/3, 2 + 1/3, 1 + 1/4, 2 + 1/4, . . .). Then lim sup xn = 2
and lim inf xn = 1.
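The quantities yn = inf{xk | k ≥ n} and zn = sup{xk | k ≥ n} of these two examples can be approximated by taking the infimum and supremum over a long but finite window. This is only a numerical sketch: the window bound `horizon`, the function names and the indexing chosen for the second sequence are assumptions of ours.

```python
def tail_inf(x, n, horizon):
    """Approximation of inf{x_k : k >= n} using n <= k <= horizon."""
    return min(x(k) for k in range(n, horizon + 1))


def tail_sup(x, n, horizon):
    """Approximation of sup{x_k : k >= n} using n <= k <= horizon."""
    return max(x(k) for k in range(n, horizon + 1))


def alternating(n):
    """x_n = (-1)^n."""
    return (-1) ** n


def two_level(n):
    """The sequence 1, 2, 1 + 1/2, 2 + 1/2, 1 + 1/3, 2 + 1/3, ..."""
    if n <= 2:
        return float(n)
    j = (n + 1) // 2
    base = 1 if n % 2 == 1 else 2
    return base + 1 / j


print(tail_inf(alternating, 50, 1000), tail_sup(alternating, 50, 1000))
print(tail_inf(two_level, 5000, 10000), tail_sup(two_level, 5000, 10000))
```

For the second sequence the window values are only close to 1 and 2, since the infimum 1 of each tail is not attained.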
Theorem 1.5 The lim sup and lim inf satisfy the following properties.
(i) xn ≤ yn ⇒ lim inf xn ≤ lim inf yn and lim sup xn ≤ lim sup yn .
(v) lim sup xn is the biggest limit point of (xn ) and lim inf xn is the smallest limit point of
(xn ).
enough. In particular xn ≥ inf k≥n xk > ℓ − ε. Thus, ℓ − ε < xn < ℓ + ε for all n large enough.
This proves that lim xn = ℓ.
Conversely, assume that lim xn = ℓ. If ℓ = +∞, then supk≥n xk → ∞ since supk≥n xk ≥ xn .
Therefore lim sup xn = +∞. Now since xn → ∞, we have for all A > 0, there is n0 such that
xn ≥ A for all n ≥ n0 and so inf k≥n xk ≥ inf k≥n0 xk ≥ A. Letting n → ∞ we get lim inf xn ≥ A.
Since A is arbitrary, we conclude that lim inf xn = +∞. The case ℓ = −∞ is similar. Now we
assume that ℓ ∈ IR. Let ε > 0 be given. Then there is n0 such that ℓ − ε < xk < ℓ + ε for all
k ≥ n0 . Thus
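\[ \ell - \varepsilon \le \inf_{k\ge n} x_k \le \sup_{k\ge n} x_k \le \ell + \varepsilon \]
\[ \forall \varepsilon > 0\ \exists n_0 \text{ such that } L \le \sup_{k\ge n} x_k < L + \varepsilon \quad \forall n \ge n_0 \quad \text{(by definition of the limsup)} \]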
and
In particular, for ε = 1 and n = 1, ∃k1 ≥ 1 such that L + 1 > x_{k1} > L − 1. Next take ε = 1/2
and n = k1 + 1. Then there exists k2 ≥ k1 + 1 such that L + 1/2 > x_{k2} > L − 1/2. At the pth step,
we get an index k_p ≥ k_{p−1} + 1 such that L + 1/p > x_{kp} > L − 1/p. This defines a subsequence (x_{kp})
that converges to L.
Let now (xφ(n) ) be a subsequence that converges to ℓ. We have
Therefore, supk≥n xφ(k) ≤ supk≥n xk . Letting n → ∞, we get lim sup xφ(n) ≤ lim sup xn , that
is, lim xφ(n) ≤ lim sup xn . □
Chapter 2
Axiomatic Measure Theory
Our target in this chapter is to develop a general theory of measure that includes the notions
of length, area, volume and probability as special cases. So let us start by formulating some
natural requirements that a measure has to possess. Consider a random experiment and let
X denote its sample space, that is, the set of all possible outcomes (in probability theory,
X is usually denoted by Ω). The subsets of X are usually called events. However, we may
be interested in assigning a probability only to some special subsets of X; we will call them
events. Let A denote the collection of events (to which a probability will be assigned). First,
it is natural to assign a probability 1 to the "certain" event X. So X should belong to A
and P (X) = 1. We would like also to assign a probability 0 to the impossible event ∅. So ∅
should belong to A and P (∅) = 0. Second, given an event A ∈ A, we would like to consider
its complement X\A as an event and set P (X\A) = 1 − P (A). Third, given two events A and
B, we would like to consider their union A ∪ B as an event. If moreover A and B are disjoint
we would like to have P (A ∪ B) = P (A) + P (B). It will follow by induction that if A1 , . . . , An
are events then their union is an event. Moreover, if these events are disjoint, we would have
P (A1 ∪ · · · ∪ An ) = P (A1 ) + · · · + P (An ). Now sometimes we have to deal with an infinite but
countable collection of events (An ) and we would like to consider their union as an event. If
moreover these events are disjoint we would like to have
(i) X ∈ A.
(ii) ∅ ∈ A.
1. P (X) = 1
2. P (∅) = 0.
3. P (X\A) = 1 − P (A).
4. P (⋃_{n=1}^∞ An ) = ∑_{n=1}^∞ P (An ) if the An are pairwise disjoint.
What we said about events and probabilities can be formulated for subsets of the plane and
their area. However, the area of the plane should be considered as infinite, so the requirement
1. above is not essential to our general theory. Also note that requirement (ii) is redundant
because it can be deduced from (i) and (iii).
Now we are sufficiently motivated to start developing our general theory. A collection of
subsets having properties (i)-(iv) above is called a σ−algebra. If in requirement (iv) we only
consider finite sequences, then the collection is called an algebra.
(i) X ∈ A.
countable.
e) The collection of all subsets of IN which are either finite or have a finite complement is not a
σ−algebra on IN. Indeed, take An = {2n}; then ⋃_{n=1}^∞ An is the set of all even positive integers:
it is infinite and its complement is infinite.
Definition 2.2 A measurable space is a couple (X, A) where X is a set and A is a σ−algebra
on X. The elements of A are called the measurable subsets of X.
1. ∅ ∈ A.
2. An ∈ A for n = 1, 2, . . . ⇒ ⋂_{n=1}^∞ An ∈ A. We say that A is closed or stable under countable
intersections.
3. An ∈ A for n = 1, 2, . . . , N ⇒ ⋃_{n=1}^N An ∈ A. We say that A is stable under finite unions.
4. An ∈ A for n = 1, 2, . . . , N ⇒ ⋂_{n=1}^N An ∈ A. We say that A is stable under finite intersections.
(ii) If A ∈ A, then A ∈ Ai for all i ∈ I and so X\A ∈ Ai for all i since Ai is a σ−algebra.
Therefore, X\A ∈ A.
(iii) If An ∈ A for all n = 1, 2, . . ., then An ∈ Ai for all n and all i. Therefore ⋃_{n=1}^∞ An ∈ Ai
for all i ∈ I since Ai is a σ−algebra. Thus, ⋃_{n=1}^∞ An ∈ A. □
Thanks to this lemma we now can define the notion of a σ−algebra generated by a family
of sets.
Definition 2.3 Let X be a set and S ⊂ P(X) be a collection of subsets of X. The intersection
of all σ−algebras containing S is called the σ−algebra generated by S. It is the smallest (with
respect to inclusion) σ−algebra that contains S. It is denoted by σ(S).
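When X is finite, σ(S) can be computed explicitly. The sketch below does not form the intersection of all σ−algebras containing S; instead it closes S under complements and pairwise unions until nothing new appears, which yields the same family on a finite set because countable unions reduce to finite ones. The function name `generated_sigma_algebra` and the example data are ours.

```python
from itertools import combinations


def generated_sigma_algebra(X, S):
    """Smallest family containing S, the empty set and X that is closed
    under complements and unions (X is assumed finite)."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(s) for s in S}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for A in current:
            if X - A not in family:
                family.add(X - A)
                changed = True
        for A, B in combinations(current, 2):
            if A | B not in family:
                family.add(A | B)
                changed = True
    return family


sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(A) for A in sigma))
```

In this example the generated σ−algebra consists of all unions of the three blocks {1}, {2} and {3, 4}, so it has 8 elements and does not contain {3}.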
Examples. a) If (X, T ) is a topological space, then the σ−algebra σ(T ) generated by the
open sets of X is called the Borel σ−algebra of (X, T ). It is also denoted by B(X, T ) or just
B(X) if no confusion arises. An element B ∈ B(X) is called a Borel set or a Borel measurable
set.
b) Let A be a σ−algebra on a set X and B be a σ−algebra on a set Y . We denote by A ⊗ B
the σ−algebra generated by the following family
Remark 2.3 (Methodology) 1. To prove that A = σ(C), we show that A ⊂ σ(C) and
that C ⊂ A (which implies that σ(C) ⊂ σ(A) = A).
2. To prove that σ(C1 ) = σ(C2 ), we show that C1 ⊂ σ(C2 ) and that C2 ⊂ σ(C1 ).
The following important results will be proved in the exercises.
Proposition 2.2 The Borel σ−algebra on IR, that we denote by B(IR) or B, is generated by
any one of the following collections: (a, b ∈ IR or a, b ∈ Q)
i. the open intervals.
Proposition 2.3 The Borel σ−algebra of the extended real line [−∞, +∞], that we denote by B([−∞, +∞]), is generated by any one
of the following collections (where a, b ∈ Q or a, b ∈ IR).
i. the intervals of the form ]a, +∞].
(i) µ(∅) = 0.
Definition 2.5 A measure space is a triple (X, A, µ) such that A is a σ−algebra on X and µ
is a measure on (X, A).
Examples. In Examples (1) to (4), X is any set and one can take A = P(X).
(1) The trivial measure. µ(A) = 0 for all A ∈ A.
(2) The infinite measure.
\[ \mu(A) = \begin{cases} 0 & \text{if } A = \emptyset \\ +\infty & \text{if } A \ne \emptyset. \end{cases} \]
Proof. (i) It is clear that µ(∅) = 0. (ii) Let now (An ) be a sequence of pairwise disjoint
elements of A. If An = ∅ for all n, then ⋃_{n=1}^∞ An = ∅ and so µ(⋃_{n=1}^∞ An ) = 0. On the
other hand, ∑_{n=1}^∞ µ(An ) = ∑_{n=1}^∞ 0 = 0. If Am ≠ ∅ for some m, then ⋃_{n=1}^∞ An ≠ ∅ and therefore
µ(⋃_{n=1}^∞ An ) = ∞. On the other hand the series is equal to ∞ since ∑_{n=1}^∞ µ(An ) ≥ µ(Am ) = ∞.
□
(3) The Dirac measure or Dirac mass. Let x ∈ X. We set
\[ \delta_x(A) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases} \]
Proof. (i) Since x ∉ ∅, it follows that δx (∅) = 0. (ii) Let (An ) be a sequence of pairwise
disjoint elements of A. We distinguish between two cases. If x belongs to the union, then it
belongs to exactly one Am since the An are pairwise disjoint. Then δx (∪_{n=1}^∞ An ) = 1, δx (Am ) = 1
and δx (Ak ) = 0 for k ≠ m. Therefore ∑_{n=1}^∞ δx (An ) = 1. Next, if x does not belong to the
union, it does not belong to any of the An . So both terms are zero. □
(4) The counting measure. Let
\[ \mu(A) = \begin{cases} \operatorname{card}(A) & \text{if } A \text{ is finite} \\ +\infty & \text{if not.} \end{cases} \]
Proof. (i) µ(∅) = card (∅) = 0. (ii) Let (An ) be a sequence of pairwise disjoint elements of
A. We distinguish between two cases.
Case 1. ∪_{n=1}^∞ An is finite. Then there is some N ∈ IN∗ such that An = ∅ for all n ≥ N .
Otherwise, card An ≥ 1 for infinitely many n and, the An being pairwise disjoint, ∪_{n=1}^∞ An would be infinite. Then ∪_{n=1}^∞ An =
∪_{n=1}^N An and so
\[ \mu\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \mu\Big(\bigcup_{n=1}^{N} A_n\Big) = \operatorname{card}\Big(\bigcup_{n=1}^{N} A_n\Big) = \sum_{n=1}^{N} \operatorname{card} A_n = \sum_{n=1}^{N} \mu(A_n) = \sum_{n=1}^{\infty} \mu(A_n). \]
Case 2. ∪_{n=1}^∞ An is infinite. Then µ(∪_{n=1}^∞ An ) = +∞. Here we also distinguish between
two cases. 2a) Some Ak is infinite. Then µ(Ak ) = +∞ and so ∑_{n=1}^∞ µ(An ) = +∞. 2b) All
An are finite. But then An ≠ ∅ for infinitely many n because ∪_{n=1}^∞ An is infinite. But then
µ(An ) = card (An ) ≥ 1 for infinitely many n and so ∑_{n=1}^∞ µ(An ) = +∞. □
(5) Restriction of a measure. Let (X, A, µ) be a measure space and let D ∈ A. We define
a new measure ν on (X, A) by ν(A) = µ(A ∩ D). This is indeed a measure which is called the
restriction of µ to D. In fact, ν is also a measure on (D, AD ) where AD = {D ∩ A|A ∈ A}.
(6) Positive linear combination of measures. Let (µn ) be a sequence of measures on a
measurable space (X, A) and let (αn ) be a sequence in [0, +∞]. Then µ := ∑_{n=1}^∞ αn µn is a
measure on (X, A).
Proof. (i) µ(∅) = ∑_n αn µn (∅) = 0. (ii) Let (Ak ) be a sequence of pairwise disjoint elements
of A. Then
\[ \mu(\cup_k A_k) = \sum_n \alpha_n \mu_n(\cup_k A_k) = \sum_n \alpha_n \sum_k \mu_n(A_k) = \sum_n \sum_k \alpha_n \mu_n(A_k) = \sum_k \sum_n \alpha_n \mu_n(A_k) = \sum_k \mu(A_k), \]
where we can interchange the order of summation by Proposition 1.5. □
In particular, if (xn ) is a sequence of X, then µ := ∑_n αn δ_{xn} is a measure. A measure of
this form is called a discrete measure. In particular, if X = {x1 , . . . , xN } is finite then
µ := (1/N) ∑_{n=1}^N δ_{xn} is the familiar probability measure
\[ \mu(A) = \frac{\operatorname{card} A}{\operatorname{card} X}. \]
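On a finite set, the Dirac measure, the counting measure and their positive linear combinations can be evaluated directly. The sketch below is only an illustration: the names `dirac`, `counting` and `combination` are ours, measures are represented as plain Python functions on finite sets, and the value +∞ for infinite sets is not modelled.

```python
from fractions import Fraction


def dirac(x):
    """Dirac mass at x: A -> 1 if x in A, else 0."""
    return lambda A: 1 if x in A else 0


def counting(A):
    """Counting measure of a finite set."""
    return len(A)


def combination(weighted_measures):
    """Finite positive linear combination: A -> sum of alpha * mu(A)."""
    return lambda A: sum(alpha * mu(A) for alpha, mu in weighted_measures)


X = ["a", "b", "c", "d"]
uniform = combination([(Fraction(1, len(X)), dirac(x)) for x in X])

A, B = {"a", "c"}, {"d"}
print(uniform(A), Fraction(counting(A), counting(X)))
print(uniform(A | B) == uniform(A) + uniform(B))
```

Exact fractions are used so that the equality µ(A) = card A / card X can be compared without rounding.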
Definition 2.6 Let (X, A, µ) be a measure space. We say that
3. µ is σ−finite if X = ⋃_{n=1}^∞ An with An ∈ A and µ(An ) < ∞ for all n ∈ IN.
(b) If B ⊂ A then µ(A\B) + µ(B) = µ(A). In particular, if µ(B) < +∞ then µ(A\B) =
µ(A) − µ(B).
(d) µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B). In particular, if µ(A ∩ B) < +∞, then µ(A ∪ B) =
µ(A) + µ(B) − µ(A ∩ B).
This is called the σ−subadditivity (note that the An are not necessarily pairwise disjoint).
Proof. (a). Set An = ∅ for n > k. Then the sequence {An }n∈IN∗ is a family of pairwise
disjoint sets. Now, on the one hand, ∪_{n=1}^k An = ∪_{n=1}^∞ An . On the other hand, µ(An ) = 0 for
n > k since µ(∅) = 0. Therefore
\[ \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{k} \mu(A_n) + \sum_{n=k+1}^{\infty} \mu(A_n) = \sum_{n=1}^{k} \mu(A_n). \]
Therefore,
\[ \mu\Big(\bigcup_{n=1}^{k} A_n\Big) = \sum_{n=1}^{k} \mu(A_n). \]
(b) It follows from (a) that µ(C ∪ D) = µ(C) + µ(D) if C and D are disjoint. Now note that
A = (A\B) ∪ B and A\B and B are disjoint. Therefore µ(A) = µ(A\B) + µ(B).
(c) follows from (b) and the fact that a measure is non-negative.
(d) Observe first that A ∪ B = A ∪ (B\(A ∩ B)) and the two sets are disjoint. It follows from
part (a) that µ(A ∪ B) = µ(A) + µ(B\(A ∩ B)). But µ(B\(A ∩ B)) + µ(A ∩ B) = µ(B) by part
(b). Hence the conclusion follows.
(e) Define a sequence {Bn } in the following way: B1 = A1 , B2 = A2 \A1 , B3 = A3 \(A1 ∪ A2 )
and more generally, Bn = An \(A1 ∪ · · · ∪ An−1 ). Then the sequence {Bn } has the following
properties (see the exercises).
2. ⋃_{n=1}^∞ An = ⋃_{n=1}^∞ Bn .
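The construction Bn = An \(A1 ∪ · · · ∪ An−1 ) is easy to carry out on a finite list of finite sets. The sketch below is an illustration with made-up data; the function name `disjointify` is ours, and the counting measure (here `len`) is used to observe the subadditivity inequality.

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1})."""
    seen = set()
    result = []
    for A in sets:
        result.append(set(A) - seen)
        seen |= set(A)
    return result


A_list = [{1, 2}, {2, 3}, {1, 4}, {4}]
B_list = disjointify(A_list)
print(B_list)

union_A = set().union(*A_list)
union_B = set().union(*B_list)

# With the counting measure: mu(union) = sum of mu(B_n) <= sum of mu(A_n).
print(len(union_A), sum(len(B) for B in B_list), sum(len(A) for A in A_list))
```

The Bn are pairwise disjoint, each Bn is contained in An , and the two unions coincide, which is what the proof of (e) uses.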
Theorem 2.1 (Continuity of measures) Let (X, A, µ) be a measure space and let {An }n∈IN∗ ⊂
A. Then the following hold.
Proof. (i). Define a sequence {Bn } in the following way: B1 = A1 , B2 = A2 \A1 , B3 = A3 \A2
and more generally, Bn = An \An−1 . Then the sequence {Bn } has the following properties
3. ⋃_{n=1}^∞ An = ⋃_{n=1}^∞ Bn .
(ii). Let A = ∩_{n=1}^∞ An and Cn = A1 \An . Then ∪_{n=1}^∞ Cn = A1 \A and therefore
\[ \mu\Big(\bigcup_{n=1}^{\infty} C_n\Big) = \mu(A_1 \setminus A) = \mu(A_1) - \mu(A). \]
Note that {Cn } is increasing and therefore by (i) we know that µ(∪_{n=1}^∞ Cn ) = lim_{n→∞} µ(Cn ).
On the other hand, µ(Cn ) = µ(A1 \An ) = µ(A1 ) − µ(An ) and so
Remark 2.5 The assumption µ(A1 ) < ∞ is essential. For example, if X = IN and µ is the
counting measure, then the sequence
Ak = {n ∈ IN∗ | n ≥ k}
is decreasing but µ(Ak ) = ∞ and therefore lim_{k→∞} µ(Ak ) = ∞, whereas µ(⋂_{k=1}^∞ Ak ) = µ(∅) = 0.
Definition 2.7 Let (X, A, µ) be a measure space. A subset A ∈ A is said to be of full measure
if µ(Ac ) = 0.
Remark 2.6 If µ is finite, then A is of full measure if and only if µ(A) = µ(X). However, if
µ is not a finite measure, then a subset A satisfying µ(A) = µ(X) need not be of full measure.
Can you give an example?
Definition 2.9 Let P be a predicate on a measure space (X, A, µ), that is, for each x ∈ X,
there is a proposition P (x) which is either true or false depending on x. We say that P holds
µ−almost everywhere (µ−a.e) or just almost everywhere if the set {x ∈ X | P (x) is false} is µ−
negligible.
For example two functions f, g : X → IR are equal almost everywhere if the set {x ∈ X | f (x) ̸=
g(x)} is negligible. We shall give more examples in the next chapter. A function from
(X, A, µ) to IR is called µ−negligible if it is equal to 0 almost everywhere.
Remark 2.7 A predicate P holds µ−a.e. if and only if there is A ∈ A such that µ(Ac ) = 0
and P (x) is true ∀ x ∈ A. Otherwise stated, P holds µ−a.e. if and only if it holds on a set of
full measure. Indeed, suppose first that P holds µ−a.e. This means that there exists B ∈ A
such that {x ∈ X | P (x) is false} ⊂ B and µ(B) = 0. Let A = B c . Then A ∈ A and A =
B c ⊂ {x ∈ X | P (x) is true}. This means that P (x) is true ∀x ∈ A with µ(Ac ) = 0. Conversely,
suppose that there is A ∈ A with µ(Ac ) = 0 such that P (x) is true ∀x ∈ A. This means that A ⊂ {x ∈ X | P (x) is true}. Therefore
{x ∈ X | P (x) is false} ⊂ Ac with µ(Ac ) = 0. This means precisely that {x ∈ X | P (x) is false}
is negligible, that is, P holds µ−a.e.
Exercise. Let P1 and P2 be two predicates on a measure space. If both P1 and P2 hold
almost everywhere then the predicate P1 ∧P2 (conjunction) also holds almost everywhere. More
generally, if (Pn ) is a sequence of predicates that hold almost everywhere then the predicate
∧_{n=1}^∞ Pn also holds almost everywhere. If P ⇒ Q and P holds a.e., then Q holds a.e.
Remark 2.8 Many mathematicians say that a predicate P holds almost everywhere if {x ∈
X | P (x) is false} has measure 0. Our definition is therefore more general. However, due to the
previous remark, this makes little difference in practice.
The following lemma gives a useful criterion for the measurability of a function.
Lemma 2.2 Let (X, A) and (Y, B) be two measurable spaces. Suppose that B is generated by
a family C ⊂ P(Y ). Then f : X → Y is (A, B)−measurable if and only if f −1 (C) ⊂ A.
Proof. Suppose first that f is measurable. Then f −1 (C) ⊂ f −1 (B) ⊂ A. Conversely, suppose
that f −1 (C) ⊂ A and consider the family of subsets of Y
F := {E ⊂ Y |f −1 (E) ∈ A}.
Then one can check that F is a σ−algebra on Y . But this σ−algebra contains C and so it contains
B since B is the smallest σ−algebra containing C. Now B ⊂ F means precisely that f −1 (B) ∈ A
for all B ∈ B, that is, f is (A, B)−measurable. □
Remark 2.9 Let (X, A) and (Y, B) be two measurable spaces and A ∈ A. We already
observed that (A, P(A) ∩ A) is a measurable space. Therefore it makes sense to say that a
function f : A → Y is measurable. This means that f −1 (B) ∈ A, whenever B ∈ B, because
f −1 (B) ⊂ A and so f −1 (B) ∈ P(A) automatically.
Lemma 2.3 (The pasting lemma) Let (X, A) and (Y, B) be two measurable spaces, and
let A, B ∈ A. Let f : A → Y and g : B → Y be measurable functions that coincide on A ∩ B
(this condition is satisfied if A ∩ B = ∅). Then the function h : A ∪ B → Y defined by
\[ h(x) = \begin{cases} f(x) & \text{if } x \in A \\ g(x) & \text{if } x \in B \end{cases} \]
is measurable.
Proof. This follows from the fact that h−1 (E) = f −1 (E) ∪ g −1 (E). Thus if E ∈ B, then
f −1 (E) and g −1 (E) belong to A (and they are contained in A ∪ B). □
Then f˜ is measurable.
Corollary 2.2 Let f : X → IR be measurable and let E ⊂ X be a measurable set. Then the
function h : X → IR defined by
\[ h(x) = \begin{cases} f(x) & \text{if } x \in X \setminus E \\ 0 & \text{if } x \in E \end{cases} \]
is measurable.
Proof. Note that the restriction of a measurable function to a measurable subset is measurable.
□
Proposition 2.5 Let f : X → Y be (A, B)−measurable and let µ be a measure on (X, A).
Then the formula
ν(B) := µ(f −1 (B))
for all B ∈ B defines a measure on (Y, B). This measure is denoted by f∗ (µ) and is called the
image measure of µ by f . It is also called the pushforward measure of µ by f .
Example 2.1 Let X and Y be two arbitrary sets and let A = P(X) and B ⊂ P(Y ) be an
arbitrary σ−algebra on Y . Let µ be the counting measure on X. Then every function f : X → Y
is (A, B)−measurable and ν = f∗ µ is the measure that counts the number of preimages
\[ \nu(B) = \mu(f^{-1}(B)) = \begin{cases} \operatorname{card}\{x \in X \mid f(x) \in B\} & \text{if this set is finite} \\ +\infty & \text{if not.} \end{cases} \]
Example 2.2 Let X be a set, a ∈ X and consider the measure space (X, P(X), δa ). Let (Y, B)
be an arbitrary measurable space and let f : X → Y be a function (it is necessarily measurable).
Then for any A ∈ B, we have
\[ f_*(\delta_a)(A) = \delta_a(f^{-1}(A)) = \begin{cases} 1 & \text{if } a \in f^{-1}(A) \\ 0 & \text{if } a \notin f^{-1}(A) \end{cases} = \begin{cases} 1 & \text{if } f(a) \in A \\ 0 & \text{if } f(a) \notin A \end{cases} = \delta_{f(a)}(A). \]
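Both examples of image measures can be reproduced on a finite set. The sketch below is an illustration under our own conventions: a measure is a Python function on finite sets, `pushforward` is a hypothetical helper computing B ↦ µ(f −1 (B)), and the set X and the map f(x) = x² are made-up data.

```python
def pushforward(mu, f, X):
    """Image measure of mu by f: B -> mu({x in X : f(x) in B})."""
    return lambda B: mu({x for x in X if f(x) in B})


def counting(A):
    """Counting measure of a finite set."""
    return len(A)


def dirac(a):
    """Dirac mass at a."""
    return lambda A: 1 if a in A else 0


X = set(range(-3, 4))


def f(x):
    return x * x


# Example 2.1: the image of the counting measure counts preimages.
nu = pushforward(counting, f, X)
print(nu({4}), nu({0}), nu({5}))

# Example 2.2: the image of the Dirac mass at a is the Dirac mass at f(a).
a = -2
image_of_dirac = pushforward(dirac(a), f, X)
print(image_of_dirac({4}), dirac(f(a))({4}))
```

Here ν({4}) counts the two preimages −2 and 2 of the value 4, while the image of δa gives mass 1 exactly to the sets containing f(a) = 4.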
Remark 2.10 The concept of image measure is very important in probability. Let (X, A, µ)
be a probability space and let f : X → IR be a measurable function (IR is equipped with its
Borel σ−algebra). Then f is called a random variable and f∗ (µ) is called the probability law
of f . In order to compute probabilities related to f , we have to know f∗ (µ).
Remark 2.11 Let f : X → IR. To simplify the notation, we write {f < a} instead of
f −1 ([−∞, a[) = {x | f (x) < a}. This notation is used in probability theory. Most often in
probability theory, a probability space is denoted by (Ω, F, P ) and symbols like X, Y, Z, W are
used to denote random variables, i.e, measurable functions on Ω. Then P (X < a) denotes the
measure (probability) of the event {X < a}.
Lemma 2.4 Let (X, A) be a measurable space and f : X → IR. Then the following conditions
are equivalent.
1. f is measurable.
Proof. This follows from Lemma 2.2 and the fact that B(IR) is generated by subsets of the
form [−∞, a[ etc. □
Theorem 2.2 Let (X, A) be a measurable space. Let f, g : X → IR be measurable and let
α be a real constant. Then the following functions are defined on measurable subsets and are
measurable.
f + g, αf, f g, |f |, f /g.
Proof. f + g is not defined when f (x) = +∞ and g(x) = −∞ or vice versa. Otherwise stated
f + g is not defined on the set E = (f −1 (+∞) ∩ g −1 (−∞)) ∪ (f −1 (−∞) ∩ g −1 (+∞)). Now observe
that {+∞} is a closed set in [−∞, +∞]. It is therefore a Borel subset of [−∞, +∞]. It follows that f −1 ({+∞})
is measurable because f is measurable. Similarly, the sets g −1 (−∞), f −1 (−∞) and g −1 (+∞) are
measurable. Consequently, E is measurable and so the set X\E on which f + g is defined is
measurable.
For similar reasons, f /g is defined on a measurable set. The product f g is defined everywhere because of
our convention 0 × ±∞ = 0.
a) Let h = f + g. Let a ∈ Q. We claim that
\[ \{h < a\} = \bigcup_{p+q=a} \big(\{f < p\} \cap \{g < q\}\big) \]
where the union is taken over all couples (p, q) ∈ Q² such that p + q = a. Indeed, if f (x) < p
and g(x) < q with p + q = a, then h(x) = f (x) + g(x) < p + q = a. Conversely suppose that
h(x) < a. Note that this implies that f (x) < ∞ and g(x) < ∞. Let p be a rational number
such that f (x) < p < a − g(x). Let q = a − p ∈ Q. Then g(x) < q. This proves the claim. But
the claim means that {h < a} is a countable union of measurable sets.
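b) αf is measurable because
\[ \{\alpha f < a\} = \begin{cases} \{f < a/\alpha\} & \text{if } \alpha > 0 \\ \emptyset & \text{if } \alpha = 0 \text{ and } a \le 0 \\ X & \text{if } \alpha = 0 \text{ and } a > 0 \\ \{f > a/\alpha\} & \text{if } \alpha < 0. \end{cases} \]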
c) f² is measurable because
\[ \{f^2 \le a\} = \begin{cases} \emptyset & \text{if } a < 0 \\ \{-\sqrt{a} \le f \le \sqrt{a}\} & \text{if } a \ge 0. \end{cases} \]
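d) We can write
\[ f(x)g(x) = \begin{cases} +\infty & \text{if } f(x) = +\infty,\ g(x) > 0 \text{ or } f(x) = -\infty,\ g(x) < 0 \text{ or vice versa} \\ -\infty & \text{if } f(x) = +\infty,\ g(x) < 0 \text{ or } f(x) = -\infty,\ g(x) > 0 \text{ or vice versa} \\ \frac{1}{2}\big((f(x) + g(x))^2 - f(x)^2 - g(x)^2\big) & \text{if } f(x), g(x) \in \mathbb{R}. \end{cases} \]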
Since constant functions are measurable, it follows from a), b), c) and the pasting lemma that
f g is measurable.
e) |f | is measurable because
\[ \{|f| \le a\} = \begin{cases} \emptyset & \text{if } a < 0 \\ \{-a \le f \le a\} & \text{if } a \ge 0. \end{cases} \]
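f) 1/g is measurable because
\[ \Big\{\frac{1}{g} < a\Big\} = \begin{cases} \{g > \frac{1}{a}\} \cup \{g < 0\} & \text{if } a > 0 \\ \{g < 0\} & \text{if } a = 0 \\ \{\frac{1}{a} < g < 0\} & \text{if } a < 0. \end{cases} \]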
Theorem 2.3 Let (X, A) be a measurable space and let (fn ) be a sequence of measurable
functions from X to IR, then the following functions
inf fn , sup fn , lim inf fn , lim sup fn
are measurable. In particular, if (fn ) converges pointwise, its limit is measurable.
Proof. a) Let h = sup(fn ). Then h is measurable because
\[ \{h \le a\} = \bigcap_{n=1}^{\infty} \{f_n \le a\}. \]
c) Consequently,
\[ \liminf f_n = \sup_{n\ge 1} \inf_{k\ge n} f_k \]
is measurable.
d) Similarly,
\[ \limsup f_n = \inf_{n\ge 1} \sup_{k\ge n} f_k \]
is measurable. □
Corollary 2.3 If f and g are two measurable functions from X to IR, then max(f, g) and
min(f, g) are measurable. In particular, the functions f + := max(f, 0) and f − := − min(f, 0) =
max(−f, 0) are measurable.
Lemma 2.5 Let f : X → IR be a simple function. Then f can be written in the form
\[ f = \sum_{i=1}^{n} a_i 1_{A_i} \tag{2.1} \]
b) 0 ≤ h1 ≤ h2 ≤ · · · ≤ f .
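Proof. For each n ∈ IN∗ divide the interval [0, n] into n2^n intervals each of length 2^{−n} and
set
\[ E_{n,k} = \begin{cases} \{x \mid k2^{-n} \le f(x) < (k+1)2^{-n}\} & \text{if } 0 \le k \le n2^n - 1 \\ \{x \mid f(x) \ge n\} & \text{if } k = n2^n. \end{cases} \]
Then for each n, the family {E_{n,k}}, k = 0, . . . , n2^n , is a partition of X. Set
\[ h_n(x) = \begin{cases} k2^{-n} & \text{if } x \in E_{n,k} \text{ for some } 0 \le k \le n2^n - 1 \\ n & \text{if } x \in E_{n,n2^n}. \end{cases} \]
Otherwise stated,
\[ h_n = \sum_{k=0}^{n2^n} \frac{k}{2^n} 1_{E_{n,k}} \]
is a simple function.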
(1) k2^{−n} ≤ f (x) < (k + 1)2^{−n} for some 0 ≤ k ≤ n2^n − 1. Then hn (x) = k2^{−n} . Also the
inequality of this case is equivalent to 2k2^{−n−1} ≤ f (x) < (2k + 2)2^{−n−1} . So we distinguish
between two cases.
(1a) 2k2^{−n−1} ≤ f (x) < (2k + 1)2^{−n−1} , that is, x ∈ E_{n+1,2k} . In this case, h_{n+1} (x) =
2k2^{−n−1} = k2^{−n} = hn (x).
(1b) (2k + 1)2^{−n−1} ≤ f (x) < (2k + 2)2^{−n−1} , that is, x ∈ E_{n+1,2k+1} . In this case, h_{n+1} (x) =
(2k + 1)2^{−n−1} > k2^{−n} = hn (x).
(2a) n ≤ f (x) < n + 1. Then k2^{−n−1} ≤ f (x) < (k + 1)2^{−n−1} for some k ≤ (n + 1)2^{n+1} − 1
which means that x ∈ E_{n+1,k} . But then (k + 1)2^{−n−1} > n (because f (x) ≥ n) and so
k ≥ n2^{n+1} . Therefore, h_{n+1} (x) = k2^{−n−1} ≥ n = hn (x).
(2b) f (x) ≥ n + 1. Then h_{n+1} (x) = n + 1 > n = hn (x).
1. $f(x) < \infty$. Let ε > 0 be given. Choose N such that $f(x) < N$ and $2^{-N} < \varepsilon$. For $n \ge N$ we have $k2^{-n} \le f(x) < (k+1)2^{-n}$ for some k, and so $h_n(x) = k2^{-n}$, therefore $|f(x) - h_n(x)| < 2^{-n} \le 2^{-N} < \varepsilon$. Since ε was arbitrary, it follows that $h_n(x) \to f(x)$.
2. $f(x) = +\infty$. Then $f(x) > n$ for all n and so $h_n(x) = n$ by construction. Therefore, $h_n(x) \to +\infty$.
d) Let $f(x) \le M$ for all $x \in X$. Let ε > 0 be given. Choose N such that $N > M$ and $2^{-N} < \varepsilon$. Then $|f(x) - h_n(x)| < 2^{-n} \le 2^{-N} < \varepsilon$ for all $x \in X$ and all $n \ge N$. □
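The construction of the functions $h_n$ in the proof above can be tried out numerically. The following is only a small sketch: the helper name `dyadic_approx` and the sample value f(x) = π are our own choices for illustration and are not part of the notes. It evaluates $h_n(x)$ from the value f(x) alone, which is all the definition requires.

```python
import math

def dyadic_approx(value, n):
    """Return h_n(x), given value = f(x) in [0, +inf] (hypothetical helper)."""
    if value >= n:
        # x lies in E_{n, n 2^n}; this also covers f(x) = +inf
        return float(n)
    # k is the unique integer with k 2^-n <= f(x) < (k + 1) 2^-n
    k = math.floor(value * 2**n)
    return k / 2**n

f_x = math.pi  # sample value of f at some point x (an assumption)
approximations = [dyadic_approx(f_x, n) for n in range(1, 12)]
```

On this sample the list `approximations` is nondecreasing, and as soon as n exceeds f(x) the error is below $2^{-n}$, in line with the pointwise convergence argument.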
µ∗ : P(X) → [0, ∞]
such that
(i) µ∗ (∅) = 0;
Then µ∗ is an outer measure which is not a measure (unless X is a one point set). Indeed,
1. µ∗ (∅) = 0.
We now show that µ∗ is not a measure if X has at least two points. Let a and b be two distinct
points of X. Then µ∗ ({a} ∪ {b}) = 1 whereas µ∗ ({a}) + µ∗ ({b}) = 2.
3) Define $\mu^* : \mathcal{P}(\mathbb{R}) \to \mathbb{R}_+$ by
\[
\mu^*(A) =
\begin{cases}
0 & \text{if } A \text{ is countable},\\
1 & \text{if not},
\end{cases}
\]
The theorem of Caratheodory below shows that an outer measure is not very far from
a measure. More precisely, an outer measure is a measure when restricted to some suitable
σ−algebra.
Proof. Complete the finite sequence Ai in an infinite one by setting Aj = ∅ for j > k. □
µ∗ (A) = µ∗ (A ∩ E) + µ∗ (A\E).
Remark 2.12 Since A = (A ∩ E) ∪ (A\E), it follows from the preceding lemma that µ∗(A) ≤ µ∗(A ∩ E) + µ∗(A\E). Thus, a set E ⊂ X is µ∗−measurable if and only if for every A ⊂ X,
µ∗(A) ≥ µ∗(A ∩ E) + µ∗(A\E).
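On a finite set the measurability criterion of Remark 2.12 can be checked by brute force. The sketch below assumes the outer measure that takes the value 0 on the empty set and 1 on every nonempty set (our reading of the example above; the three-point set X and all function names are hypothetical). It tests the inequality against every subset A.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer_measure(subset):
    # Assumed example: 0 on the empty set, 1 on every nonempty set.
    return 0 if not subset else 1

def is_caratheodory_measurable(candidate, universe, mu_star):
    # E is measurable iff mu*(A) >= mu*(A & E) + mu*(A - E) for every A.
    return all(
        mu_star(test) >= mu_star(test & candidate) + mu_star(test - candidate)
        for test in powerset(universe)
    )

X = frozenset({"a", "b", "c"})
measurable_sets = [E for E in powerset(X)
                   if is_caratheodory_measurable(E, X, outer_measure)]
```

For this outer measure only ∅ and X pass the test, so the σ−algebra of measurable sets can be very small even though µ∗ is defined on all of P(X).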
Theorem 2.5 (Caratheodory) Let µ∗ be an outer measure on a set X. Let Mµ∗ denote the
set of µ∗ −measurable subsets of X. Then
1. Mµ∗ is a σ−algebra.
2. µ := µ∗ |Mµ∗ is a measure on (X, Mµ∗ ).
3. (X, Mµ∗ , µ) is complete.
This theorem associates to each outer measure on a set X a measure space
Thus
\[
\mu^*(A) = \mu^*\big(A \cap (E_1 \cup E_2)\big) + \mu^*\big(A \cap (E_1 \cup E_2)^c\big),
\]
which means that E1 ∪ E2 ∈ Mµ∗ . Now it is easily seen by induction that, if E1 , . . . , En ∈ Mµ∗ ,
then E1 ∪ · · · ∪ En ∈ Mµ∗ .
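Step 2. Let $E_1, \dots, E_n$ be pairwise disjoint elements of $\mathcal{M}_{\mu^*}$. We prove that for any subset $A \subset X$,
\[
\mu^*\Big(A \cap \bigcup_{i=1}^{n} E_i\Big) = \sum_{i=1}^{n} \mu^*(A \cap E_i).
\]
The claim is indeed true for n = 1. Assume that it is true for some n. Then, since $E_{n+1} \in \mathcal{M}_{\mu^*}$,
\[
\mu^*\Big(A \cap \bigcup_{i=1}^{n+1} E_i\Big)
= \mu^*\Big(A \cap \Big(\bigcup_{i=1}^{n+1} E_i\Big) \cap E_{n+1}\Big)
+ \mu^*\Big(A \cap \Big(\bigcup_{i=1}^{n+1} E_i\Big) \cap E_{n+1}^c\Big).
\]
Since the $E_i$ are pairwise disjoint, this reads
\begin{align*}
\mu^*\Big(A \cap \bigcup_{i=1}^{n+1} E_i\Big)
&= \mu^*(A \cap E_{n+1}) + \mu^*\Big(A \cap \bigcup_{i=1}^{n} E_i\Big)\\
&= \mu^*(A \cap E_{n+1}) + \sum_{i=1}^{n} \mu^*(A \cap E_i) \quad \text{by the induction assumption}\\
&= \sum_{i=1}^{n+1} \mu^*(A \cap E_i).
\end{align*}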
\[
\mu^*\Big(A \cap \bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} \mu^*(A \cap E_i).
\]
µ∗ (A) ≥ µ∗ (A ∩ E) + µ∗ (A ∩ E c ).
Theorem 2.6 Let (X, A, µ) be a measure space. Let à denote the set of subsets of X of the
form A ∪ N where A ∈ A and N is µ−negligible. Then
Proof. (1) We prove first that à is a σ−algebra. Let N denote the set of µ−negligible
subsets of X.
(i) X = X ∪ ∅ with X ∈ A and ∅ ∈ N . Therefore X ∈ Ã.
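(ii) Let $(A_n \cup N_n)$ be a sequence of pairwise disjoint elements of Ã. Observe that the sets $(A_n)$ are also pairwise disjoint. Then
\[
\tilde{\mu}\Big(\bigcup_n (A_n \cup N_n)\Big)
= \tilde{\mu}\Big(\bigcup_n A_n \cup \bigcup_n N_n\Big)
= \mu\Big(\bigcup_n A_n\Big)
= \sum_n \mu(A_n)
= \sum_n \tilde{\mu}(A_n \cup N_n).
\]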
Definition 2.15 The space (X, Ã, µ̃) constructed above is called the completion of the space
(X, A, µ). We shall also say that à is the completion of A.
Remark 2.13 In this context, a property holds µ−almost everywhere if and only if it holds
µ̃−almost everywhere.
Exercise. a) Show that for each B ∈ Ã, there exist A ∈ A and N ∈ N such that B = A ∪ N and moreover A ∩ N = ∅. Hint. Let M ∈ A be such that N ⊂ M and µ(M) = 0. Write
$A \cup N = \big((A \cup N) \cap M^c\big) \cup \big((A \cup N) \cap M\big)$.
b) Show that B ∈ Ã if and only if there exist A1 , A2 ∈ A such that A1 ⊂ B ⊂ A2 and
µ(A2 \A1 ) = 0.
Proposition 2.6 Let (X, A, µ) be a measure space and let (X, Ã, µ̃) be its completion. Let
f : X → IR be Ã−measurable. Then there exists a function g : X → IR which is A−measurable
and coincides with f almost everywhere.
Proof. We prove this in two steps.
Step 1. Let $f = \sum a_i 1_{A_i}$ be an Ã−measurable simple function. Then for each i, there exists $B_i \in \mathcal{A}$ such that $B_i \subset A_i$ and $A_i \setminus B_i$ is negligible. Let $g = \sum a_i 1_{B_i}$. Then g is A−measurable.
We claim that $x \notin \bigcup (A_i \setminus B_i) \Rightarrow f(x) = g(x)$. Indeed, let $x \notin \bigcup (A_i \setminus B_i)$. Since the $\{A_i\}$ form
Our target in this chapter is to give a mathematical meaning to the intuitive notions of length.
We shall assign a measure (a length) to many subsets of IR. This measure is called the Lebesgue
measure and the class of subsets having a measure is called the Lebesgue σ−algebra.
3. Use the second part of Caratheodory’s theorem to construct a measure λ on (IR, L) called
the Lebesgue measure. Establish some important properties of λ.
Then $X_A \subset Y_A$. Conversely, let $s \in Y_A$; then $s = \sum_{n=1}^{\infty} \ell(I_n)$ where $A \subset \bigcup_{n=1}^{\infty} I_n$. If all intervals $I_n$ are bounded, then $s \in X_A$. If some interval $I_n$ is unbounded then $s = +\infty \in X_A$.
A sequence of intervals $I_n$ such that $A \subset \bigcup I_n$ is called a covering of A and $\sum_{n=1}^{\infty} \ell(I_n)$ is called the total length of the covering.
Third, the word open can be removed from the definition of λ∗(A). Indeed, let
\[
Z_A = \Big\{ \sum_{n=1}^{\infty} \ell(I_n) \;\Big|\; A \subset \bigcup_{n=1}^{\infty} I_n, \text{ and } I_n \text{ is an interval} \Big\}.
\]
Then $Y_A \subset Z_A$ and so $\inf Z_A \le \inf Y_A$. On the other hand, let ε > 0 be given and let $(I_n)$ be a covering of A by intervals. Observe that for each n, there exists an open interval $J_n$ containing $I_n$ such that $\ell(J_n) = \ell(I_n) + \frac{\varepsilon}{2^n}$; for example if $I_n = [a_n, b_n]$, take $J_n = \,]a_n - \frac{\varepsilon}{2^{n+1}}, b_n + \frac{\varepsilon}{2^{n+1}}[$.
It follows that $(J_n)$ is a covering of A by open intervals and $\sum_{n=1}^{\infty} \ell(J_n) = \sum_{n=1}^{\infty} \ell(I_n) + \varepsilon$.
Consequently, $\sum_{n=1}^{\infty} \ell(I_n) \ge \inf Y_A - \varepsilon$ and so $\inf Z_A \ge \inf Y_A - \varepsilon$. Since ε was arbitrary, we conclude that $\inf Z_A \ge \inf Y_A$ and hence equality.
If λ∗ (An ) = +∞ for some n then the inequality is of course satisfied. So we assume that
λ∗ (An ) < +∞ for all n ∈ IN∗ . Let ε > 0 be given. By a fundamental property of the
infimum of a subset of IR, for each n ∈ IN∗ , there exists a sequence {Inm }m∈IN∗ of open
intervals such that
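\[
A_n \subset \bigcup_{m=1}^{\infty} I_{nm} \quad \text{and} \quad \sum_{m=1}^{\infty} \ell(I_{nm}) < \lambda^*(A_n) + \frac{\varepsilon}{2^n}.
\]
We have therefore,
\[
\lambda^*\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} \ell(I_{nm}) \le \sum_{n=1}^{\infty} \Big(\lambda^*(A_n) + \frac{\varepsilon}{2^n}\Big) = \sum_{n=1}^{\infty} \lambda^*(A_n) + \varepsilon.
\]
Since ε is arbitrary, we conclude that
\[
\lambda^*\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} \lambda^*(A_n).
\]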
□
Definition 3.1 Let E ⊂ IR and a ∈ IR. We set E + a = {x + a | x ∈ E}. We say that E + a is
a translate of E. For example [0, 1] + 3 = [3, 4]. We set aE = {ax | x ∈ E}. The set aE is the image of E
under the homothety of center 0 and ratio a. For example if E = [1, 2] then 3E = [3, 6].
Proposition 3.2 The Lebesgue outer measure λ∗ satisfies the following properties.
4. λ∗ is translation invariant, that is, λ∗ (E + a) = λ∗ (E) for every E ⊂ IR and every a ∈ IR.
Proof. 1. Let ε > 0 be given. Let I1 =]p − ε, p + ε[ and In = ∅ for n > 1. Then {In } is
a countable covering of p by open intervals whose total length is 2ε. It follows that 2ε ∈ X{p}
and so λ∗ ({p}) ≤ 2ε. Since ε was arbitrary, we have λ∗ ({p}) = 0.
2. Let E be countable. We can write $E = \bigcup_{p \in E} \{p\}$. Since λ∗ is an outer measure, $\lambda^*(E) \le \sum_{p \in E} \lambda^*(\{p\}) = 0$.
3. Let I be an interval (possibly unbounded). Set I1 = I and In = ∅ for n >1. Then {In } is
a countable covering of I by intervals whose total length is ℓ(I). Therefore ℓ(I) ∈ XI and so
λ∗ (I) ≤ ℓ(I).
Conversely, let $\{I_n\}$ be a countable covering of I by open intervals. We already proved in the exercises that $\ell(I) \le \sum_{n=1}^{\infty} \ell(I_n)$. Therefore ℓ(I) is a lower bound for $X_I$ and consequently, ℓ(I) ≤ λ∗(I).
Remark. The proof that $\ell(I) \le \sum_{n=1}^{\infty} \ell(I_n)$ used the compactness of the interval [a, b]. It seems that the compactness of [a, b] (in terms of open coverings) was discovered by Borel and Lebesgue in their proof that λ∗([a, b]) = b − a.
4. Let E ⊂ IR and let a ∈ IR. We need to show that λ∗ (E + a) = λ∗ (E). The claim is indeed
true if E is an interval (use point 3. above). Let now E be an arbitrary subset of IR. If {In } is
a countable covering of E by open intervals, then {In + a} is a countable covering of E + a by
open intervals. Hence
\[
\lambda^*(E + a) \le \lambda^*\Big(\bigcup_{n=1}^{\infty} (I_n + a)\Big) \le \sum_{n=1}^{\infty} \lambda^*(I_n + a) = \sum_{n=1}^{\infty} \ell(I_n + a) = \sum_{n=1}^{\infty} \ell(I_n).
\]
On the other hand, E = (E+a)−a and according to what we said λ∗ ((E+a)−a) ≤ λ∗ (E+a).
Therefore λ∗ (E) ≤ λ∗ (E + a). Hence the equality.
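5. If a = 0, the result is trivial. If not,
\begin{align*}
\lambda^*(aE) &= \inf\Big\{ \sum \ell(I_n) \;\Big|\; aE \subset \bigcup I_n \Big\} = \inf\Big\{ \sum \ell(I_n) \;\Big|\; E \subset \bigcup \tfrac{1}{a} I_n \Big\}\\
&= \inf\Big\{ \sum |a|\, \ell\big(\tfrac{1}{a} I_n\big) \;\Big|\; E \subset \bigcup \tfrac{1}{a} I_n \Big\}\\
&= |a| \inf\Big\{ \sum \ell\big(\tfrac{1}{a} I_n\big) \;\Big|\; E \subset \bigcup \tfrac{1}{a} I_n \Big\} = |a|\, \lambda^*(E).
\end{align*}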
□
λ∗ (A) = λ∗ (A ∩ E) + λ∗ (A\E)
for every subset A ⊂ IR. An element in L is called a Lebesgue-measurable set. It turns out
that L is a big set. It contains the Borel σ−algebra of IR, but it is much bigger as we shall see.
Recall that the Borel σ−algebra B(IR) is the smallest σ−algebra containing the open subsets of
IR. It is also the smallest σ−algebra containing the closed subsets of IR. An element of B(IR)
is called Borel-measurable. We also denote the Borel σ− algebra on IR by B.
Proposition 3.3 B ⊂ L, that is, every Borel subset of the real line is Lebesgue-measurable.
Proof. Recall that B is generated by the family of intervals of the form ]a, ∞[. Therefore it is
enough to prove that such intervals belong to L. Let A be an arbitrary subset of IR and let a ∈ IR.
Set A1 = A∩]a, ∞[ and A2 = A∩] − ∞, a]. We need to show that λ∗ (A1 ) + λ∗ (A2 ) ≤ λ∗ (A).
The inequality is satisfied if λ∗ (A) = +∞, therefore we assume that λ∗ (A) < +∞.
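Let ε > 0 be given. Then, by a fundamental property of the infimum, there exists a sequence $\{I_n\}$ of open intervals such that $A \subset \bigcup_{n=1}^{\infty} I_n$ and $\sum_{n=1}^{\infty} \ell(I_n) < \lambda^*(A) + \varepsilon$. Set $I_n' = I_n \cap \,]a, \infty[$ and $I_n'' = I_n \cap \,]-\infty, a]$. Then $I_n'$ and $I_n''$ are disjoint intervals (possibly empty) such that $I_n = I_n' \cup I_n''$. Therefore,
\[
\ell(I_n) = \ell(I_n') + \ell(I_n'').
\]
Now, since $A_1 \subset \bigcup_{n=1}^{\infty} I_n'$, we have
\[
\lambda^*(A_1) \le \sum_{n=1}^{\infty} \ell(I_n').
\]
Similarly,
\[
\lambda^*(A_2) \le \sum_{n=1}^{\infty} \ell(I_n'').
\]
Therefore,
\[
\lambda^*(A_1) + \lambda^*(A_2) \le \sum_{n=1}^{\infty} \big(\ell(I_n') + \ell(I_n'')\big) = \sum_{n=1}^{\infty} \ell(I_n) < \lambda^*(A) + \varepsilon.
\]
Since ε was arbitrary, we conclude that
\[
\lambda^*(A_1) + \lambda^*(A_2) \le \lambda^*(A).
\]
□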
Corollary 3.1 Countable sets, intervals, open sets and closed sets are Lebesgue measurable.
In practice, all the real sets that we deal with are Lebesgue measurable. Constructing a non-
measurable set is not a trivial matter. See below.
Remark 3.1 You could ask if the inclusion B ⊂ L is strict. It is. There are two ways to see
this. First, we can construct explicitly a Lebesgue measurable set which is not Borel measurable.
But this is not trivial. The known examples use the middle third Cantor set that we introduce
in the next section. The second way is to show that there is no bijection from B to L. This
result uses the theory of cardinals that you probably do not know. In fact it can be shown
that B is in bijection with IR whereas L is in bijection with P(IR). Proving this is also a
nontrivial task.
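\begin{align*}
\delta &= \lambda^*\big(((A - a) \cap E) + a\big) + \lambda^*\big(A \cap (E^c + a)\big)\\
&= \lambda^*\big(((A - a) \cap E) + a\big) + \lambda^*\big(((A - a) \cap E^c) + a\big)\\
&= \lambda^*(A - a) \quad \text{since } E \in \mathcal{L}\\
&= \lambda^*(A) \quad \text{by the translation invariance of } \lambda^*.
\end{align*}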
5. λ is σ−finite.
Proof. The first three points follow from the same properties of λ∗ after recalling that
countable sets and intervals are measurable, that L is invariant under translations and
symmetries. Point 4. follows from Caratheodory’s theorem. Point 5. follows from the fact that
$\mathbb{R} = \bigcup_{n=1}^{\infty} [-n, n]$ and λ([−n, n]) = 2n < ∞. □
Here is another important relation between the Lebesgue σ−algebra and the Borel σ−algebra
on IR.
Proposition 3.6 The Lebesgue σ−algebra is the completion of the Borel σ−algebra. This
means that a set A is Lebesgue measurable if and only if there exist a Borel set B and a negligible
set N such that A = B ∪ N.
Remark 3.3 It follows from Theorem 2.6 that a set N ⊂ IR is negligible if and only if it is
contained in a Borel set of measure 0.
Corollary 3.2 Let f : IR → IR be Lebesgue measurable. Then there exists a Borel measurable
function g : IR → IR that coincides with f almost everywhere.
Proof. This follows from the previous proposition and Proposition 2.6.
We end this section with a characterization of the sets of measure zero. The proof is
straightforward if you recall the definition of the measure in terms of coverings by open intervals.
We can think of a set of measure zero as a "thin" set.
Proposition 3.7 A set E ⊂ IR has measure zero if and only if for every ε > 0 there exists a
countable covering of E by open intervals whose total length is less than ε.
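To make Proposition 3.7 concrete, here is a minimal sketch that covers finitely many rational points of [0, 1] by open intervals whose total length stays below a prescribed ε. The enumeration order, the choice of lengths $\varepsilon/2^{n+1}$ and all names are our own assumptions; the same recipe applied to a full enumeration of Q ∩ [0, 1] gives a covering of total length at most ε/2.

```python
from fractions import Fraction

def enumerate_rationals(count):
    """First `count` distinct rationals of [0, 1], ordered by denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def covering(points, eps):
    """Cover the n-th point (n = 1, 2, ...) by an open interval of length eps / 2^(n+1)."""
    intervals = []
    for n, p in enumerate(points, start=1):
        half = eps / 2 ** (n + 2)  # half of the interval length
        intervals.append((p - half, p + half))
    return intervals

eps = Fraction(1, 100)
points = enumerate_rationals(50)
intervals = covering(points, eps)
total_length = sum(b - a for a, b in intervals)
```

Exact rational arithmetic is used so that the comparison `total_length < eps` is not affected by rounding.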
3.2 Counterexamples
3.2.1 A Lebesgue non measurable set
This example is due to Vitali.
Construction.
1. Define on ]0,1[ the equivalence relation: x ∼ y if x − y is rational. This defines a partition
of ]0,1[ into equivalence classes. The class containing x is [x] = {x + q | q ∈ Q∩] − 1, 1[}.
2. By the axiom of choice, there exists a set P that contains exactly one element from each
of the equivalence classes. It is clear that P ⊂]0, 1[.
2. The $P_n$ are pairwise disjoint. Otherwise there are two integers $n \ne m$ such that $P_n \cap P_m \ne \emptyset$. Let $t \in P_n \cap P_m$. Then there are two numbers x and z in P such that $t = x + r_n = z + r_m$.
It follows that $x - z = r_m - r_n$ is rational and so x ∼ z. But this contradicts the
construction of P (P contains only one element from each equivalence class).
3. Suppose that P is measurable. Then each $P_n$ is measurable and $\lambda(P_n) = \lambda(P)$ by the
translation invariance of the Lebesgue measure. It follows from the σ−additivity of λ and
the previous step that
\[
\lambda\Big(\bigcup_{n=1}^{\infty} P_n\Big) = \sum_{n=1}^{\infty} \lambda(P_n) = \sum_{n=1}^{\infty} \lambda(P).
\]
Step 2: Remove the open middle third of each of the intervals of $K_1$ to get
\[
K_2 = \Big[0, \frac{1}{9}\Big] \cup \Big[\frac{2}{9}, \frac{1}{3}\Big] \cup \Big[\frac{2}{3}, \frac{7}{9}\Big] \cup \Big[\frac{8}{9}, 1\Big].
\]
\[
\vdots
\]
Step n: Remove the open middle third of each of the intervals of $K_{n-1}$, to get
\[
K_n = \bigcup_{i \in J_n} \big[a_i^{(n)}, b_i^{(n)}\big].
\]
Note that $K_n$ is a union of $2^n$ pairwise disjoint intervals each of length $\frac{1}{3^n}$.
This defines a decreasing sequence $(K_n)$ of closed subsets of [0,1]. We set
\[
K = \bigcap_{n=1}^{\infty} K_n.
\]
This set is called the middle third Cantor set. It is clear that K is a compact set. It is
nonempty because it is the intersection of a decreasing sequence of nonempty closed sets in the
compact space [0,1]. In fact, it should be clear from the construction that the endpoints of the
intervals removed at each step belong to K, so for example $0, 1, \frac{1}{3}, \frac{2}{3}, \frac{1}{9}, \dots$ belong to K, that is,
the sequences $a_i^{(n)}$ and $b_i^{(n)}$ belong to K.
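The successive stages $K_n$ are easy to generate exactly. The sketch below is only an illustration (the function names are ours): each interval is stored as a pair of rational endpoints and the removal of open middle thirds is iterated.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third of every closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))   # left closed third
        out.append((b - third, b))   # right closed third
    return out

def cantor_stage(n):
    """Return the intervals of K_n, starting from K_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals

K2 = cantor_stage(2)
K5 = cantor_stage(5)
```

With these definitions `K2` consists of the four intervals listed in Step 2, and `K5` has $2^5$ intervals of length $3^{-5}$, so its total length is $(2/3)^5$. Since the total length of $K_n$ is $(2/3)^n$ and tends to 0, this is consistent with the Cantor set having measure zero.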
Proposition 3.9 The middle third Cantor set satisfies the following properties.
4. It is uncountable.
2. Any set E of Lebesgue measure zero has an empty interior. For otherwise, E would contain
a nonempty open interval I. But then λ(E) ≥ λ(I) > 0.
3. Let x ∈ K. We need to show that any neighborhood of x meets K at a point y ≠ x. Let
therefore ε > 0 be given. Let n be an integer satisfying $\left(\frac{1}{3}\right)^n < \varepsilon$. Since $x \in K_n$, there is some
$i \in J_n$ such that $x \in \big[a_i^{(n)}, b_i^{(n)}\big]$, the length of this interval being $\left(\frac{1}{3}\right)^n$. We already observed
that $a_i^{(n)}$ and $b_i^{(n)}$ belong to K. Now, if $x = a_i^{(n)}$, let $y = b_i^{(n)}$. If $x = b_i^{(n)}$, let $y = a_i^{(n)}$. Finally,
if $a_i^{(n)} < x < b_i^{(n)}$, let $y = a_i^{(n)}$. In all cases, y ∈ K, y ≠ x and $|y - x| \le \left(\frac{1}{3}\right)^n < \varepsilon$. Since ε was
arbitrary, this means that any neighborhood of x meets K at a point different from x.
4. A theorem of topology states that a compact Hausdorff space with no isolated points is
uncountable. □
Remark 3.4 Let I denote the collection of open intervals of IR and let J denote the collection
of all intervals of IR. Then we have
I ⊂ J ⊂ B ⊂ L ⊂ P(IR).
The first inclusion is clear. The second inclusion follows from Proposition 2.2. The third
inclusion follows from Proposition 3.3. The last inclusion is clear. In fact all the inclusions
are strict. This is clear for the first two. That the third inclusion is strict was pointed out in
Remark 3.1. The last inclusion is strict because of the existence of a Lebesgue non measurable
set.
Chapter 4
The Lebesgue Integral
Remark 4.1 The Riemann integral is defined by approximation from the integrals of step
functions, whereas the Lebesgue integral is constructed from the integral of simple functions.
In the first case, we divide the domain of the function (that is the interval on which it is defined)
into small parts. In the second case, we divide the range of the function. This is a fundamental
difference.
\[
\sum_{i=1}^{n} b_i\, \mu(B_i) = \sum_{j=1}^{m} c_j\, \mu(C_j).
\]
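\[
\sum_{i=1}^{n} b_i 1_{B_i} = \sum_{i,j} b_i 1_{B_i \cap C_j}.
\]
Similarly,
\[
\sum_{j=1}^{m} c_j 1_{C_j} = \sum_{j,i} c_j 1_{C_j \cap B_i}.
\]
Therefore,
\[
\sum_{i,j} b_i 1_{B_i \cap C_j} = \sum_{j,i} c_j 1_{C_j \cap B_i}.
\]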
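Thus, $b_i = c_j$ for all i, j such that $B_i \cap C_j \ne \emptyset$. Let Λ be the set of indices (i, j) for which
$B_i \cap C_j \ne \emptyset$. Now
\[
\sum_{i=1}^{n} b_i\, \mu(B_i) = \sum_{i=1}^{n} \sum_{j=1}^{m} b_i\, \mu(B_i \cap C_j) = \sum_{(i,j) \in \Lambda} b_i\, \mu(B_i \cap C_j).
\]
Similarly,
\[
\sum_{j=1}^{m} c_j\, \mu(C_j) = \sum_{j=1}^{m} \sum_{i=1}^{n} c_j\, \mu(B_i \cap C_j) = \sum_{(i,j) \in \Lambda} c_j\, \mu(B_i \cap C_j).
\]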
Example 4.3 Consider the measure space (X, P(X), δa) where δa is the Dirac measure at
a ∈ X. Let f : X → [0, ∞] be a simple function. Then
\[
\int_X f \, d\delta_a = f(a).
\]
Indeed, let $f = \sum \alpha_i 1_{A_i}$ be a simple nonnegative function (this implies that $\{A_i\}_i$ is a
partition of X). Then $\int_X f \, d\delta_a = \sum \alpha_i \delta_a(A_i) = \alpha_k$ where k is the index of the unique set $A_k$
to which a belongs. But if $a \in A_k$ then $f(a) = \sum \alpha_i \delta_a(A_i) = \alpha_k$. Hence the equality.
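Example 4.4 Consider the measure space (IN, P(IN), µ) where µ is the counting measure. Let
f : IN → [0, ∞] be a simple function. Then
\[
\int_{\mathbb{N}} f \, d\mu = \sum_{n \in \mathbb{N}} f(n).
\]
Indeed, note first that for E ⊂ IN, $\mu(E) = \sum_{n \in E} 1 = \sum_{n \in \mathbb{N}} 1_E(n)$. Next, let $f = \sum_{i=1}^{m} \alpha_i 1_{A_i}$ be a simple nonnegative function. Then,
\[
\int_{\mathbb{N}} f \, d\mu = \sum_{i=1}^{m} \alpha_i\, \mu(A_i) = \sum_{i=1}^{m} \alpha_i \sum_{n \in \mathbb{N}} 1_{A_i}(n) = \sum_{i=1}^{m} \sum_{n \in \mathbb{N}} \alpha_i 1_{A_i}(n) = \sum_{n \in \mathbb{N}} \sum_{i=1}^{m} \alpha_i 1_{A_i}(n) = \sum_{n \in \mathbb{N}} f(n).
\]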
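Examples 4.3 and 4.4 can be reproduced on a small finite set. The sketch below is an illustration under our own assumptions: X = {0, ..., 5} stands in for IN, and the names `integrate_simple`, `dirac` and `counting` are hypothetical. A simple function is stored as a list of pairs (coefficient, set) over a partition of X, and its integral is the sum of coefficient times measure.

```python
def integrate_simple(terms, measure):
    """Integral of sum(a_i * 1_{A_i}) with respect to `measure`.

    terms   : list of (coefficient, subset) pairs, the subsets forming a partition
    measure : function assigning a value in [0, +inf] to a subset
    """
    total = 0
    for coefficient, subset in terms:
        m = measure(subset)
        if coefficient == 0 or m == 0:
            continue  # convention 0 * inf = 0
        total += coefficient * m
    return total

def dirac(a):
    """Dirac measure at the point a."""
    return lambda subset: 1 if a in subset else 0

def counting(subset):
    """Counting measure on a finite set."""
    return len(subset)

X = set(range(6))
terms = [(2, {0, 1}), (5, {2, 3, 4}), (0, {5})]

def f(x):
    """Pointwise value of the simple function described by `terms`."""
    return sum(c for c, s in terms if x in s)

integral_dirac = integrate_simple(terms, dirac(3))
integral_counting = integrate_simple(terms, counting)
```

Here `integral_dirac` equals f(3) and `integral_counting` equals the sum of f(n) over X, as the two examples predict.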
Lemma 4.2 Let (X, A, µ) be a measure space, E, F ∈ A and let f : X → [0, ∞] be simple.
Then the following hold.
(i) $\int_E 0 \, d\mu = 0$.
(ii) $E \subset F \Rightarrow \int_E f \, d\mu \le \int_F f \, d\mu$.
(iii) $\mu(E) = 0 \Rightarrow \int_E f \, d\mu = 0$.
Proof. (i) is trivial. (ii) Let $f = \sum_{i=1}^{n} a_i 1_{A_i}$. Then $\int_E f \, d\mu = \sum_{i=1}^{n} a_i\, \mu(A_i \cap E)$ and
$\int_F f \, d\mu = \sum_{i=1}^{n} a_i\, \mu(A_i \cap F)$. Since $\mu(A_i \cap E) \le \mu(A_i \cap F)$ and $a_i \ge 0$, the result follows. (iii)
follows from the definition and the fact that $\mu(A_i \cap E) = 0$. □
Lemma 4.3 Let f be a simple nonnegative function defined on a measure space (X, A, µ).
Then $A \mapsto \int_A f \, d\mu$ is a measure on (X, A).
Proof. Set $\delta(A) = \int_A f \, d\mu$ with $f = \sum_{i=1}^{n} a_i 1_{A_i}$. (i) It should be clear from the definition
and the convention 0 × ∞ = 0 that δ(∅) = 0. (ii) Let $(E_k)$ be a sequence of pairwise disjoint
measurable sets. Then
\begin{align*}
\int_{\bigcup_{k=1}^{\infty} E_k} f \, d\mu
&= \sum_{i=1}^{n} a_i\, \mu\Big(\bigcup_{k=1}^{\infty} (A_i \cap E_k)\Big)
= \sum_{i=1}^{n} a_i \sum_{k=1}^{\infty} \mu(A_i \cap E_k)
= \sum_{k=1}^{\infty} \sum_{i=1}^{n} a_i\, \mu(A_i \cap E_k)\\
&= \sum_{k=1}^{\infty} \int_{E_k} f \, d\mu.
\end{align*}
Proposition 4.1 The Lebesgue integral satisfies the following properties (f and g are two
nonnegative simple functions).
(i) Additivity: $\int_E (f + g) \, d\mu = \int_E f \, d\mu + \int_E g \, d\mu$.
(ii) Positive homogeneity: $\int_E (\alpha f) \, d\mu = \alpha \int_E f \, d\mu$ for any constant α ≥ 0.
(iii) Monotonicity: $f \le g$ on E $\Rightarrow \int_E f \, d\mu \le \int_E g \, d\mu$.
Proof. Let $f = \sum_{i \in I} a_i 1_{A_i}$ and $g = \sum_{j \in J} b_j 1_{B_j}$ be admissible representations of f and g.
(i) Note that $(A_i \cap B_j)_{(i,j) \in I \times J}$ form a finite partition of E. It is also easy to see that
\[
f + g = \sum_{(i,j) \in I \times J} (a_i + b_j)\, 1_{A_i \cap B_j}
\]
(iii) Write g = (g − f) + f where g − f is nonnegative and simple. It follows from (i) that
\[
\int_E g \, d\mu = \int_E (g - f) \, d\mu + \int_E f \, d\mu \ge \int_E f \, d\mu.
\]
Definition 4.2 Let E ⊂ X be a measurable set and f ∈ M+(X). The Lebesgue integral of f
on E (with respect to the measure µ) is
\[
\int_E f \, d\mu = \sup\Big\{ \int_E h \, d\mu \;\Big|\; h : X \to [0, \infty] \text{ is simple and } 0 \le h \le f \Big\}.
\]
We need first to justify that this definition of the integral is an extension of the previous
one, i.e., that both definitions coincide when f is simple. Indeed, let f be simple and let us
denote its integral according to the first definition by SE (f ). Also let
□
Proposition 4.3 Let f ∈ M+(X, A, µ) and E ∈ A. Then $\int_E f \, d\mu = 0$ if and only if f = 0
µ−almost everywhere on E.
Proof. Suppose first that f = 0 on E\A where A has measure zero. Then every simple
function h such that 0 ≤ h ≤ f is zero on E\A. Therefore, by Lemma 4.3 and Lemma 4.2 (iii)
\[
\int_E h \, d\mu = \int_{E \setminus A} h \, d\mu + \int_A h \, d\mu = 0,
\]
and so $\int_E f = \sup_{0 \le h \le f} \int_E h = 0$. Conversely, suppose that $\int_E f \, d\mu = 0$. Let $A = \{x \in E \mid f(x) > 0\}$ and $A_n = \{x \in E \mid f(x) > \frac{1}{n}\}$ so that $A = \bigcup_{n=1}^{\infty} A_n$. The functions $h_n := \frac{1}{n} \chi_{A_n}$
are simple functions such that $0 \le h_n \le f$. Consequently,
\[
\frac{1}{n}\, \mu(A_n) = \int_E h_n \, d\mu \le \int_E f \, d\mu = 0
\]
Lemma 4.4 (Markov’s inequality) For every f ∈ M+(X) and every constant a ≥ 0, we
have
\[
\int_E f \, d\mu \ge a\, \mu(\{f \ge a\}).
\]
Proof. Let A = {x | f(x) ≥ a}. Then $h := a 1_A$ is a simple function such that 0 ≤ h ≤ f.
Consequently,
\[
\int_E f \, d\mu \ge \int_E h \, d\mu = a\, \mu(A) = a\, \mu(\{f \ge a\}).
\]
□
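Markov's inequality is easy to test with the counting measure on a finite set, where the integral is a finite sum. The sketch below is only an illustration; the set of points, the function f(n) = n² and the helper names are our own assumptions.

```python
def integral_counting(f, points):
    """Integral of f with respect to the counting measure on a finite set."""
    return sum(f(x) for x in points)

def markov_gap(f, points, a):
    """Left side minus right side of Markov's inequality; should be >= 0."""
    lhs = integral_counting(f, points)
    rhs = a * sum(1 for x in points if f(x) >= a)
    return lhs - rhs

points = range(10)

def f(n):
    return n * n

gaps = {a: markov_gap(f, points, a) for a in (0, 0.5, 1, 7, 50, 81, 100)}
```

Every entry of `gaps` is nonnegative. The inequality is usually far from an equality: it only retains the information that f is at least a on the set {f ≥ a}.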
Corollary 4.1 Let f ∈ M+(X) satisfy $\int_X f \, d\mu < \infty$. Then µ({f = ∞}) = 0.
It follows that µ(An ) →0. But (An ) is decreasing and µ(A1 ) < ∞, therefore by the continuity
property of a measure, µ({f = ∞}) = µ(∩An ) = lim µ(An ) = 0. □
Theorem 4.1 (Monotone convergence theorem or Beppo Levi theorem) Let (X, A, µ)
be a measure space and let (fn ) be a sequence of M+ (X) satisfying
1. fn (x) ≤ fn+1 (x) for all n and all x ∈ X (that is, fn is nondecreasing).
Then,
\[
\lim_{n \to \infty} \int_X f_n \, d\mu = \int_X f \, d\mu.
\]
Otherwise stated, $\lim_{n \to \infty} \int_X f_n \, d\mu = \int_X \lim_{n \to \infty} f_n \, d\mu$, that is, we can interchange lim and $\int$.
Proof. Observe first that $f_n(x) \le f(x)$ for all x ∈ X and $n \in \mathbb{N}^*$. It follows from Proposition
4.2 (i) that $\int_X f_n \, d\mu \le \int_X f \, d\mu$ for all n and therefore
\[
\lim_{n \to \infty} \int_X f_n \, d\mu \le \int_X f \, d\mu
\]
(the limit exists in [0, ∞] since the sequence $\{\int_X f_n \, d\mu\}$ is nondecreasing). So it remains to
prove the reverse inequality.
Let $\alpha \in \,]0, 1[$ and let h be a simple function such that 0 ≤ h ≤ f. Set
\[
A_n = \{x \in X \mid f_n(x) \ge \alpha h(x)\}.
\]
Then $(A_n)$ is an increasing sequence of sets and we claim that $X = \bigcup_{n=1}^{\infty} A_n$. Indeed, let x ∈ X.
If h(x) = 0, then $f_n(x) \ge \alpha h(x) = 0$ and so $x \in A_n$ for all n. If not, choose $0 < \varepsilon < (1 - \alpha) h(x)$.
Since $f_n(x) \to f(x)$, there exists m such that $f(x) - f_n(x) < \varepsilon$ for all n ≥ m. In particular,
$f(x) - f_m(x) < (1 - \alpha) h(x)$ and so $f(x) - f_m(x) < f(x) - \alpha h(x)$ since h(x) ≤ f(x). Therefore
$f_m(x) \ge \alpha h(x)$ and so $x \in A_m$.
Now we already know that $A \mapsto \int_A h \, d\mu$ is a measure ν, and as any measure it satisfies
$\nu(\bigcup_{n=1}^{\infty} A_n) = \lim_{n \to \infty} \nu(A_n)$, that is
\[
\int_X h \, d\mu = \lim_{n \to \infty} \int_{A_n} h \, d\mu.
\]
Note also that $\int_{A_n} \alpha h \, d\mu \le \int_{A_n} f_n \, d\mu \le \int_X f_n \, d\mu$. Hence
\[
\alpha \lim_{n \to \infty} \int_{A_n} h \, d\mu \le \lim_{n \to \infty} \int_X f_n \, d\mu
\]
and thus
\[
\alpha \int_X h \, d\mu \le \lim_{n \to \infty} \int_X f_n \, d\mu.
\]
Letting α → 1, we get
\[
\int_X h \, d\mu \le \lim_{n \to \infty} \int_X f_n \, d\mu.
\]
Since this inequality holds for any simple function h such that 0 ≤ h ≤ f, by taking the
supremum over such h, we finally get
\[
\int_X f \, d\mu \le \lim_{n \to \infty} \int_X f_n \, d\mu.
\]
Example 4.5 Consider the measure space (X, P(X), δa) where δa is the Dirac measure at
a ∈ X. Let f : X → [0, ∞] be a nonnegative measurable function. Then
\[
\int_X f \, d\delta_a = f(a).
\]
The result is true for simple functions. Let now f ∈ M+(X). Then there exists an increasing
sequence $(h_n)$ of simple nonnegative functions that converges to f. By Beppo Levi’s theorem,
\[
\int_X f \, d\delta_a = \lim \int_X h_n \, d\delta_a = \lim h_n(a) = f(a).
\]
Example 4.6 Consider the measure space (IN, P(IN), µ) where µ is the counting measure. Let
f : IN → [0, ∞] be a nonnegative measurable function. Then
\[
\int_{\mathbb{N}} f \, d\mu = \sum_{n \in \mathbb{N}} f(n).
\]
Then, $(s_k)_k$ is an increasing sequence of simple functions that converges to f. By Beppo Levi’s
theorem
\[
\int_{\mathbb{N}} f \, d\mu = \lim_{k \to \infty} \int_{\mathbb{N}} s_k \, d\mu = \lim_{k \to \infty} \sum_{n \in \mathbb{N}} s_k(n) = \lim_{k \to \infty} \sum_{n=0}^{k} f(n) = \sum_{n=0}^{\infty} f(n).
\]
Proof. By Theorem 2.4, there exist two nondecreasing sequences $(f_n)$ and $(g_n)$ of simple
nonnegative functions that converge respectively to f and g. Therefore $(f_n + g_n)$ is a
nondecreasing sequence of simple nonnegative functions that converges to f + g. By the previous
theorem
\[
\lim_{n \to \infty} \int_E f_n = \int_E f, \quad \lim_{n \to \infty} \int_E g_n = \int_E g, \quad \text{and} \quad \lim_{n \to \infty} \int_E (f_n + g_n) = \int_E (f + g)
\]
and thus,
\[
\int_E (f + g) = \lim_{n \to \infty} \Big( \int_E f_n + \int_E g_n \Big) = \lim_{n \to \infty} \int_E f_n + \lim_{n \to \infty} \int_E g_n = \int_E f + \int_E g.
\]
□
Proposition 4.5 Let f, g ∈ M+(X). If f = g µ−a.e., then $\int_X f \, d\mu = \int_X g \, d\mu$.
Proof. Let
\[
h =
\begin{cases}
\max(f, g) - \min(f, g) & \text{if } \min(f, g) < \infty,\\
0 & \text{otherwise}.
\end{cases}
\]
Then h ∈ M+(X) and max(f, g) = min(f, g) + h. Furthermore, h = 0 on the set {f = g}.
Therefore, h = 0 µ−a.e. and so $\int_X h \, d\mu = 0$. By the previous proposition,
\[
\int_X \max(f, g) \, d\mu = \int_X \min(f, g) \, d\mu.
\]
But both $\int_X f \, d\mu$ and $\int_X g \, d\mu$ lie between these equal values by monotonicity of the integral,
and therefore they are equal to each other. □
Lemma 4.5 (Additivity of domains) Let f ∈ M+(X). If $E_1$ and $E_2$ are disjoint measurable
sets then
\[
\int_{E_1 \cup E_2} f = \int_{E_1} f + \int_{E_2} f.
\]
Proof. The result follows from the additivity of domains for simple functions and the
approximation of functions in M+(X) by simple functions (reason as above). □
Corollary 4.2 Let f ∈ M+(X). If E and A are measurable sets then $\int_E f 1_A = \int_{A \cap E} f$.
Proof. By the previous lemma,
\[
\int_E f 1_A = \int_{E \cap A} f 1_A + \int_{E \setminus A} f 1_A = \int_{E \cap A} f + \int_{E \setminus A} 0 = \int_{E \cap A} f.
\]
Proposition 4.6 (Integration term by term) Let $(f_n)$ be a sequence in M+(X). Then
\[
\int_X \Big( \sum_{i=1}^{\infty} f_i \Big) = \sum_{i=1}^{\infty} \int_X f_i.
\]
Proof. Set $g_n = \sum_{i=1}^{n} f_i$. Then $(g_n)$ is a nondecreasing sequence in M+(X) that converges to
$\sum_{i=1}^{\infty} f_i$. By the monotone convergence theorem
\[
\lim_{n \to \infty} \int_X g_n = \int_X \sum_{i=1}^{\infty} f_i.
\]
On the other hand, it follows by induction from the previous proposition that
\[
\int_X g_n = \int_X \Big( \sum_{i=1}^{n} f_i \Big) = \sum_{i=1}^{n} \int_X f_i.
\]
We denote by L1 (E) the set of summable functions on E. Other notations: L1IR (E), L1 (µ),
L1 (E, µ). The set of summable functions f : E → IR is denoted by LIR (E). So we have
LIR (E) ⊂ LIR (E). But as we shall see, there is little difference between these sets.
\[
f = f^+ - f^- \quad \text{and} \quad |f| = f^+ + f^-.
\]
Therefore, f is summable if and only if both $f^+$ and $f^-$ are summable. It is natural to define
the integral of a summable function by
\[
\int_E f = \int_E f^+ - \int_E f^-.
\]
Example 4.7 Consider the measure space (X, P(X), δa) where δa is the Dirac measure at
a ∈ X. Let f : X → IR be an arbitrary function (it is necessarily measurable). Then f is
summable if and only if |f(a)| < +∞, that is, if and only if f(a) ∈ IR. In this case
\[
\int_X f \, d\delta_a = \int_X f^+ \, d\delta_a - \int_X f^- \, d\delta_a = f^+(a) - f^-(a) = f(a).
\]
Example 4.8 Consider the measure space (IN, P(IN), µ) where µ is the counting measure. Let
f : IN → IR be a function (it is necessarily measurable). Then f is summable if and only if
the series $\sum_{n \in \mathbb{N}} |f(n)|$ is convergent, that is, if and only if the series $\sum_{n \in \mathbb{N}} f(n)$ is absolutely
convergent. In this case we have
\[
\int_{\mathbb{N}} f \, d\mu = \int_{\mathbb{N}} f^+ \, d\mu - \int_{\mathbb{N}} f^- \, d\mu = \sum_{n \in \mathbb{N}} f^+(n) - \sum_{n \in \mathbb{N}} f^-(n) = \sum_{n \in \mathbb{N}} \big(f^+(n) - f^-(n)\big) = \sum_{n \in \mathbb{N}} f(n).
\]
Proposition 4.7 Let (X, A, µ) be a measure space and E ∈ A. Then the following hold.
Proof. 1. Let $f, g \in L^1(E)$ and α, β ∈ IR. Then |αf + βg| ≤ |α||f| + |β||g|. Consequently,
\[
\int_E |\alpha f + \beta g| \le \int_E \big(|\alpha||f| + |\beta||g|\big) = |\alpha| \int_E |f| + |\beta| \int_E |g| < +\infty,
\]
and so $\alpha f + \beta g \in L^1(E)$.
2. We check only that $\int_E (f + g) = \int_E f + \int_E g$. The homogeneity $\int_E \alpha f = \alpha \int_E f$ is left as an
exercise (distinguish between the cases α ≥ 0 and α < 0).
From the identity
\[
(f + g)^+ - (f + g)^- = f + g = (f^+ - f^-) + (g^+ - g^-),
\]
we obtain $(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+$.
Now all the functions belong to M+(E) and so by the linearity of the integral for functions in
M+(E) we deduce that
\[
\int_E (f + g)^+ + \int_E f^- + \int_E g^- = \int_E (f + g)^- + \int_E f^+ + \int_E g^+
\]
and so
\[
\int_E (f + g)^+ - \int_E (f + g)^- = \int_E f^+ - \int_E f^- + \int_E g^+ - \int_E g^-
\]
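Proposition 4.8 Let $f \in L^1(E)$, and let $(E_n)$ be a sequence of pairwise disjoint measurable
subsets of E. Then
\[
\int_{\bigcup E_n} f = \sum_n \int_{E_n} f.
\]
Proof. Using the additivity of domains for nonnegative measurable functions, we have
\[
\int_{\bigcup E_n} f = \int_{\bigcup E_n} f^+ - \int_{\bigcup E_n} f^- = \sum_n \int_{E_n} f^+ - \sum_n \int_{E_n} f^- = \sum_n \Big( \int_{E_n} f^+ - \int_{E_n} f^- \Big) = \sum_n \int_{E_n} f.
\]
Proof. We have
\[
\Big| \int_E f \Big| = \Big| \int_E f^+ - \int_E f^- \Big| \le \int_E f^+ + \int_E f^- = \int_E (f^+ + f^-) = \int_E |f|.
\]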
The space of summable complex valued functions is denoted by $L^1_{\mathbb{C}}(X)$. Other notations: $L^1_{\mathbb{C}}(\mu)$,
$L^1(X, \mathbb{C})$, $L^1(X, \mu, \mathbb{C})$.
It is not difficult to see that $L^1_{\mathbb{C}}(X)$ is a vector space over C and that the map $f \mapsto \int_X f \, d\mu$
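Proof. If $\int f = 0$, the inequality is satisfied. Assume therefore that $\int f \ne 0$ and let $\alpha = \frac{|\int f|}{\int f}$.
Then
\[
\Big| \int f \Big| = \alpha \int f = \int \alpha f.
\]
Now, $\int \alpha f$ is real. Therefore $\int \alpha f = \operatorname{Re} \int \alpha f = \int \operatorname{Re}(\alpha f)$ (the last equality holds by definition).
It follows that
\[
\Big| \int f \Big| = \int \operatorname{Re}(\alpha f) \le \int |\alpha f| = \int |f|
\]
because |α| = 1. □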
Counter-example. Consider the sequence defined by $f_n(x) = nx(1 - x^2)^n$ for x ∈ [0, 1]. This
sequence converges pointwise to the zero function. Note however that $\int_0^1 f_n(x) \, dx = \frac{n}{2n+2} \to \frac{1}{2}$ whereas
$\int_0^1 \lim f_n(x) \, dx = 0$.
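The value n/(2n+2) in the counter-example can be double-checked with exact arithmetic. The sketch below is our own verification and does not come from the notes: it expands $(1 - x^2)^n$ by the binomial formula and integrates term by term, using $\int_0^1 x^{2k+1} \, dx = \frac{1}{2k+2}$.

```python
from fractions import Fraction
from math import comb

def integral_fn(n):
    """Exact value of the integral of n x (1 - x^2)^n over [0, 1]."""
    return n * sum(
        Fraction((-1) ** k * comb(n, k), 2 * k + 2) for k in range(n + 1)
    )

def fn(n, x):
    """Pointwise value f_n(x) = n x (1 - x^2)^n."""
    return n * x * (1 - x * x) ** n

integrals = [integral_fn(n) for n in range(1, 30)]
```

The entries of `integrals` agree with n/(2n+2) and increase towards 1/2, while a value such as `fn(200, 0.5)` is already negligible. The mass of $f_n$ concentrates near 0, which is why pointwise convergence alone does not allow the interchange of limit and integral.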
There are of course many cases in which the interchange of $\int$ and lim is possible. In the
Riemann theory, a sufficient condition is uniform convergence. However this is a restrictive
requirement. A better criterion for the interchange of $\int$ and lim is the monotone convergence
theorem. Now we give another criterion known as the Lebesgue dominated convergence theorem.
But first, a lemma.
Proof. Set $g_n = \inf_{k \ge n} f_k$. Then $g_n \le f_n$, $g_n \le g_{n+1}$ and $\liminf f_n = \lim g_n$. Therefore
$\int g_n \le \int f_n$ and so $\lim \int g_n \le \liminf \int f_n$. It follows from this and the monotone convergence
theorem that
\[
\int \liminf f_n = \int \lim g_n = \lim \int g_n \le \liminf \int f_n.
\]
Theorem 4.2 (The Lebesgue dominated convergence theorem) Let (X, A, µ) be a mea-
sure space and E ∈ A. Let (fn ) be a sequence of measurable functions from E to IR or C.
Suppose that
(b) $\lim_{n \to \infty} \int_E f_n \, d\mu = \int_E f \, d\mu$.
Proof. Let A be a set on which the assumptions hold and such that $\mu(A^c) = 0$. Modify f, $f_n$
and g by setting $f(x) = f_n(x) = g(x) = 0$ for $x \notin A$. This does not modify the measurability
and summability properties but (i) and (ii) now hold everywhere.
Now the $f_n$ are summable since $|f_n| \le g$ and g is summable. Also it follows by letting
n → ∞ in (ii) that |f| ≤ g and therefore f is also summable. Next, $|f - f_n| \le |f| + |f_n| \le 2g$
and so setting $\varphi_n := 2g - |f - f_n|$ we have that $\varphi_n$ is summable and nonnegative. Since $f_n \to f$,
it follows that $\liminf \varphi_n = \lim \varphi_n = 2g$ and by Fatou’s lemma
\[
\int_E 2g = \int_E \liminf \varphi_n \le \liminf \int_E \varphi_n = \int_E 2g + \liminf \int_E (-|f - f_n|).
\]
It follows that $\liminf \int_E (-|f - f_n|) \ge 0$ and so $\limsup \int_E |f - f_n| \le 0$. Therefore
\[
\lim_{n \to \infty} \int_E |f - f_n| = 0.
\]
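Corollary 4.4 (Integration term by term) Let $(f_n)$ be a sequence of measurable functions
from X to IR or C. Suppose that
\[
\sum_{n \ge 1} \int_X |f_n| \, d\mu < +\infty.
\]
Then $\sum_{n \ge 1} f_n$ is µ−integrable and
\[
\int_X \sum_{n \ge 1} f_n \, d\mu = \sum_{n \ge 1} \int_X f_n \, d\mu.
\]
Proof. Let $g = \sum_{n \ge 1} |f_n|$. Using Proposition 4.6, we get
\[
\int_X g \, d\mu = \int_X \sum_{n \ge 1} |f_n| \, d\mu = \sum_{n \ge 1} \int_X |f_n| \, d\mu < \infty.
\]
Therefore $g \in L^1(X)$. It follows that g is finite µ−a.e. and so the series $\sum_{n \ge 1} f_n$ is absolutely
convergent (and hence convergent) a.e.
Let $g_n = \sum_{k=1}^{n} f_k$. Then $(g_n)$ converges to $\sum_{k \ge 1} f_k$ a.e. and $|g_n| \le g$ a.e. By the dominated
convergence theorem, $\sum_{k \ge 1} f_k$ is µ−integrable and
\[
\lim_{n \to \infty} \int_X g_n \, d\mu = \int_X \sum_{k \ge 1} f_k \, d\mu,
\]
that is
\[
\lim_{n \to \infty} \int_X \sum_{k=1}^{n} f_k \, d\mu = \int_X \sum_{k \ge 1} f_k \, d\mu.
\]
But $\lim_{n \to \infty} \int_X \big( \sum_{k=1}^{n} f_k \big) \, d\mu = \lim_{n \to \infty} \sum_{k=1}^{n} \int_X f_k \, d\mu = \sum_{k \ge 1} \int_X f_k \, d\mu$. Therefore,
\[
\sum_{k \ge 1} \int_X f_k \, d\mu = \int_X \sum_{k \ge 1} f_k \, d\mu.
\]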
Proof. Let $(P_n)$ be an increasing¹ sequence of partitions of [a, b] such that $\|P_n\| \to 0$. For
example one can take
\[
P_n = \Big\{ a + \frac{k}{2^n}(b - a) \;\Big|\; k = 0, \dots, 2^n \Big\},
\]
which divides [a, b] into $2^n$ equal subintervals. Then form the Darboux upper and lower sums
corresponding to each $P_n$:
\[
U(f, P_n) = \sum_{i \in I_n} M_{n,i}\, \Delta x_{n,i}, \qquad L(f, P_n) = \sum_{i \in I_n} m_{n,i}\, \Delta x_{n,i}.
\]
We claim that
\[
\lim_{n \to \infty} U(f, P_n) = \lim_{n \to \infty} L(f, P_n) = \int_a^b f(x) \, dx.
\]
Indeed, let ε > 0 be given, then by Theorem A.3, there exists $n_0$ such that $U(f, P_n) - L(f, P_n) < \varepsilon$ for all $n \ge n_0$. Therefore $\int_a^b f(x) \, dx \le U(f, P_n) \le \int_a^b f(x) \, dx + \varepsilon$. Similarly, $\int_a^b f(x) \, dx - \varepsilon \le L(f, P_n) \le \int_a^b f(x) \, dx$.
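Now define two sequences of step functions $G_n$ and $g_n$ by
\[
G_n(x) = M_{n,i} \text{ if } x_{i-1} \le x < x_i; \quad g_n(x) = m_{n,i} \text{ if } x_{i-1} \le x < x_i; \quad G_n(b) = g_n(b) = f(b).
\]
Otherwise stated,
\[
G_n = \sum_{i\in I_n} M_{n,i} \, 1_{[x_{i-1}, x_i[} + f(b) \, 1_{\{b\}}, \qquad g_n = \sum_{i\in I_n} m_{n,i} \, 1_{[x_{i-1}, x_i[} + f(b) \, 1_{\{b\}}.
\]
Then clearly,
\[
\int_{[a,b]} G_n \, d\lambda = U(f, P_n) \quad\text{and}\quad \int_{[a,b]} g_n \, d\lambda = L(f, P_n).
\]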
Moreover,
\[
G_1(x) \ge G_2(x) \ge \cdots \ge f(x) \quad\text{and}\quad g_1(x) \le g_2(x) \le \cdots \le f(x).
\]
Hence
\[
G(x) := \lim_{n\to\infty} G_n(x) \ge f(x) \quad\text{and}\quad g(x) := \lim_{n\to\infty} g_n(x) \le f(x).
\]
Note that $G$ and $g$ are measurable as limits of measurable functions. Also $G$ and $g$ are summable
on $[a, b]$ since they are bounded. It follows from the monotone convergence theorem$^{2}$ that
\[
\int_{[a,b]} G \, d\lambda = \lim_{n\to\infty} \int_{[a,b]} G_n \, d\lambda = \lim_{n\to\infty} U(f, P_n) = \int_a^b f(x)\,dx,
\]
and
\[
\int_{[a,b]} g \, d\lambda = \lim_{n\to\infty} \int_{[a,b]} g_n \, d\lambda = \lim_{n\to\infty} L(f, P_n) = \int_a^b f(x)\,dx.
\]
$^{1}$ By increasing we mean that $P_{n+1}$ is a refinement of $P_n$.
$^{2}$ $(G_n - G)$ is a decreasing sequence in M+([a, b]) that converges to 0 and such that $\int (G_1 - G) < \infty$. Therefore,
$\int (G_n - G) \to 0$, i.e. $\int G_n \to \int G$. Also, $(g_n - g_1)$ is an increasing sequence in M+([a, b]) that converges to $g - g_1$.
Therefore $\int (g_n - g_1) \to \int (g - g_1)$ and so $\int g_n \to \int g$.
Therefore,
\[
\int_{[a,b]} G \, d\lambda = \int_{[a,b]} g \, d\lambda
\]
and so, since $G - g \ge 0$, we get $G = g$ almost everywhere. Hence $f = G = g$ a.e. This implies first that $f$ is measurable
(because $G$ is measurable as a limit of step functions), and second that
\[
\int_{[a,b]} f \, d\lambda = \int_{[a,b]} G \, d\lambda = \int_a^b f(x)\,dx.
\]
□
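The construction used in the proof can be observed numerically. The sketch below computes the upper and lower Darboux sums on the dyadic partitions $P_n$ for the hypothetical example $f(x) = x^2$ on $[0, 1]$ (our choice, not from the notes). It assumes that $f$ is nondecreasing, so that the supremum and infimum on each subinterval are attained at the endpoints; for a general bounded function this shortcut is not valid.

```python
def dyadic_darboux_sums(f, a, b, n):
    """Upper and lower Darboux sums of a nondecreasing f on the dyadic partition P_n."""
    pieces = 2 ** n
    width = (b - a) / pieces
    upper = 0.0
    lower = 0.0
    for i in range(pieces):
        left = a + i * width
        right = left + width
        # For a nondecreasing f, the infimum on [left, right] is f(left)
        # and the supremum is f(right).
        lower += f(left) * width
        upper += f(right) * width
    return upper, lower

def square(x):
    return x * x

sums = [dyadic_darboux_sums(square, 0.0, 1.0, n) for n in range(1, 11)]
for n, (upper, lower) in enumerate(sums, start=1):
    print(n, lower, upper)
```

The lower sums increase, the upper sums decrease, and both squeeze the value $1/3$ of the integral, which mirrors the monotone sequences $g_n$ and $G_n$ of the proof.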
Next we show the relation between absolutely convergent Riemann integrals and Lebesgue
integrals. Recall that the integral $\int_a^\infty f(x)\,dx$ is called absolutely convergent if $\lim_{b\to\infty} \int_a^b |f(x)|\,dx$
exists.
Theorem 4.4 Let f : [a, ∞[→ IR be Riemann integrable on any compact interval [a, b]. Then
the following conditions are equivalent.
(i) $\int_a^\infty f(x)\,dx$ is absolutely convergent.
Proof. (i)⇒(ii). Let $f_n = f \, 1_{[a,n]}$. Then $(f_n)$ is a sequence of measurable functions that
converges to $f$ and so $f$ is measurable. Now the sequence $(|f_n|)$ is nondecreasing and converges
to $|f|$. By the monotone convergence theorem,
\[
\int_{[a,\infty[} |f| \, d\lambda = \lim_{n\to\infty} \int_{[a,\infty[} |f_n| \, d\lambda = \lim_{n\to\infty} \int_{[a,n]} |f| \, d\lambda.
\]
Theorem 4.5 Let f : [a, b[→ IR be Riemann integrable on any compact interval [a, c] ⊂ [a, b[.
Then the following conditions are equivalent.
(i) $\int_a^b f(x)\,dx$ is absolutely convergent.
Proof. Suppose first that f is Riemann integrable. We proceed as in the proof of Theorem
4.3: that is, we consider an increasing sequence (Pn ) of partitions of [a, b] whose norms tend to
zero, and then define two monotone sequences of step functions Gn and gn converging to G and
g respectively so that g(x) ≤ f (x) ≤ G(x) and G = g a.e. Now let
\[
E := \{ x \in [a, b] \mid g(x) \ne G(x) \} \cup \bigcup_n P_n.
\]
Then $E$ has measure zero since $G = g$ a.e. and $\bigcup_n P_n$ is countable. We claim that $f$ is continuous on $[a, b] \setminus E$.
Indeed, let $x_0 \in [a, b] \setminus E$ and let $\varepsilon > 0$ be given. Then $g(x_0) = G(x_0)$ and since $g_n(x_0) \to g(x_0)$
and $G_n(x_0) \to G(x_0)$ we deduce that $G_k(x_0) - g_k(x_0) < \varepsilon$ for all $k$ large enough. Choose and
fix such $k$. Now $x_0 \notin P_k$ and so it must be an interior point of some subinterval of the partition
$P_k$ where $g_k$ and $G_k$ are constant. Hence there exists $\delta > 0$ such that $G_k(x) = G_k(x_0)$ and
$g_k(x) = g_k(x_0)$ for all $|x - x_0| < \delta$. From the above and the inequalities $g(x) \le f(x) \le G(x)$,
we conclude that
inside [a, b]). Then each interval ]tj−1 , tj [ is contained either in some Ink or in some Jxi . Let
J = {j | ]tj−1 , tj [⊂ Ink for some k} and ∆j = tj − tj−1 . Then
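\[
\begin{aligned}
U(f, P) - L(f, P) &= \sum_{j=1}^{N} \Delta_j \sup\{ |f(y) - f(z)| \mid y, z \in [t_{j-1}, t_j] \} \\
&\le \sum_{j\in J} \Delta_j \, 2M + \sum_{j\notin J} \Delta_j \, \frac{\varepsilon}{2(b-a)} \\
&\le 2M \, \frac{\varepsilon}{4M} + (b - a) \, \frac{\varepsilon}{2(b-a)} = \varepsilon.
\end{aligned}
\]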
Hence f is Riemann integrable. □
Remark 4.3 The characteristic function f of Q ∩ [0, 1] is not Riemann integrable because it is
discontinuous everywhere (this is because every interval, no matter how small, contains rational
and irrational numbers). However it is Lebesgue integrable because it is Lebesgue measurable
(as Q ∩ [0, 1] is Lebesgue measurable) and its integral, by definition, is λ(Q ∩ [0, 1]) = 0.
On the other hand the characteristic function of the middle third Cantor set (restricted to
[0,1]) is Riemann integrable.
Remark. Note that the sequence $(1 + x/n)^n$ is nondecreasing for $x \ge 0$ and therefore we can
also use the monotone convergence theorem.
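The monotonicity claimed in the remark can be checked numerically. The following sketch evaluates $(1 + x/n)^n$ for a few sample values of $x \ge 0$ (the sample values and the cutoff $n \le 200$ are arbitrary choices of ours) and compares the last term with $e^x$. A finite check of this kind is only a sanity test, not a proof.

```python
import math

def compound(x, n):
    # n-th term of the sequence (1 + x/n)**n
    return (1.0 + x / n) ** n

def is_nondecreasing(values):
    # True when every term is at most the next one
    return all(earlier <= later for earlier, later in zip(values, values[1:]))

for x in (0.0, 0.5, 1.0, 3.0):
    terms = [compound(x, n) for n in range(1, 201)]
    print(x, is_nondecreasing(terms), terms[-1], math.exp(x))
```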
Proof. We need to show that for every sequence {tn } ⊂ J converging to s0 , we have
I(tn ) → I(s0 ). Let fn (x) = f (x, tn ) and h(x) = f (x, s0 ). Then fn → h a.e. The dominated
convergence theorem gives the result. □
Theorem 4.8 (Derivation under the integral) Let (X, A, µ) be a measure space and J be
an open interval of IR and f : X × J → IR (or C) be a function satisfying
\[
|\varphi_n(t)| \le \sup_s \left| \frac{\partial f(x, s)}{\partial s} \right| \le g(x).
\]
By the dominated convergence theorem
The integral of a function f over an interval [a, b] is thought of as the area of the planar region
bounded by the graph of f , and the lines y = 0, x = a and x = b. But how to define this area?
The Riemann approach is to divide the interval [a, b] into many small subintervals [xi−1 , xi ],
next, to choose a point ci in each subinterval and then consider the rectangles with base [xi−1 , xi ]
and height f (ci ). The sum of the areas of all these small rectangles is then an approximation
of the "area" under the graph of f . It is natural to expect that the more rectangles we use,
or the finer the decomposition of the interval [a, b], the better the approximation of
the area under the graph of f will be. In what follows, we shall elaborate on these intuitive ideas and
construct a theory of the integral known as the Riemann integral.
A.1 Definitions
Definition A.1 Let [a, b] be a given interval. A partition P of [a, b] is a finite set of points
{x0 , x1 , . . . , xn } such that a = x0 < x1 < · · · < xn−1 < xn = b. We write ∆xi = xi − xi−1 for
i = 1, . . . , n. The biggest of the numbers ∆xi is called the norm of the partition and is denoted
by ∥P ∥.
Definition A.2 Let P1 and P2 be two partitions of [a, b]. We say that P2 is a refinement of P1
if P1 ⊂ P2 .
Let $f : [a, b] \to \mathbb{R}$ be a bounded function. In order to define the integral of $f$ over $[a, b]$, we
proceed as follows. Let $P = \{x_0, \dots, x_n\}$ be a partition of $[a, b]$ and let $c_1, \dots, c_n$ be a sequence
of points in $[a, b]$ such that $c_i \in [x_{i-1}, x_i]$ for all $i = 1, \dots, n$. Then we form the finite sum
\[
\sigma(f, P, c_1, \dots, c_n) = \sum_{i=1}^{n} f(c_i) \, \Delta x_i
\]
called a Riemann sum of $f$ corresponding to the partition $P$. This Riemann sum is said to
have a limit $I$ as the partition becomes finer and finer, or as $\|P\| \to 0$, if for every $\varepsilon > 0$ there
exists a number $\delta > 0$ such that
\[
|\sigma(f, P) - I| < \varepsilon
\]
for every partition $P$ of $[a, b]$ such that $\|P\| < \delta$ and any choice of the points $c_1, \dots, c_n$. In this
case we write
\[
\lim_{\|P\|\to 0} \sigma(f, P) = I.
\]
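The definition of a Riemann sum translates directly into a short computation. In the sketch below the helper names `riemann_sum`, `uniform_partition` and `tags_for`, as well as the test function $x \mapsto x^3$ on $[0, 2]$, are illustrative choices of ours and not part of the notes. Three different choices of the points $c_i$ (left endpoints, right endpoints, midpoints) are compared on uniform partitions of decreasing norm.

```python
def riemann_sum(f, partition, tags):
    """Riemann sum: sum of f(c_i) * (x_i - x_{i-1}) over the partition."""
    total = 0.0
    for i in range(1, len(partition)):
        left, right = partition[i - 1], partition[i]
        tag = tags[i - 1]
        if not left <= tag <= right:
            raise ValueError("each tag must lie in its subinterval")
        total += f(tag) * (right - left)
    return total

def uniform_partition(a, b, n):
    # Points x_k = a + k (b - a) / n for k = 0, ..., n
    return [a + (b - a) * k / n for k in range(n + 1)]

def tags_for(partition, rule):
    # rule is "left", "right" or "mid": three possible choices of the points c_i
    pairs = zip(partition, partition[1:])
    if rule == "left":
        return [left for left, _ in pairs]
    if rule == "right":
        return [right for _, right in pairs]
    return [(left + right) / 2 for left, right in pairs]

def cube(x):
    return x ** 3

for n in (10, 100, 1000):
    partition = uniform_partition(0.0, 2.0, n)
    print(n, [riemann_sum(cube, partition, tags_for(partition, rule))
              for rule in ("left", "right", "mid")])
```

Whatever the choice of tags, the sums approach the same number (here 4) as the norm of the partition shrinks, which is exactly what the existence of the limit $I$ requires.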
64 APPENDIX A. THE RIEMANN INTEGRAL
Proof. Take a uniform partition of $[a, b]$, that is, divide $[a, b]$ into equal intervals each of
length $\frac{b-a}{n}$, and choose $c_k = a + k\,\frac{b-a}{n}$. We obtain the Riemann sum
\[
\frac{b-a}{n} \sum_{k=1}^{n} f\Bigl( a + k \, \frac{b-a}{n} \Bigr).
\]
According to the above, this sum tends to $\int_a^b f(x)\,dx$ as $n \to \infty$. □
Definition A.4 If $a > b$ and $f$ is integrable on $[b, a]$ we set $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$. Also
we set $\int_a^a f(x)\,dx = 0$.
\[
M_i = \sup_{x_{i-1} \le x \le x_i} f(x), \qquad m_i = \inf_{x_{i-1} \le x \le x_i} f(x).
\]
We need the following quantities called respectively the Darboux upper sum and the Darboux
lower sum of $f$ corresponding to the partition $P$
\[
U(f, P) = \sum_{i=1}^{n} M_i \, \Delta x_i, \qquad L(f, P) = \sum_{i=1}^{n} m_i \, \Delta x_i.
\]
Lemma A.1 Let P be a partition of [a, b] and let f be a bounded function. Then,
Proof. Straightforward. □
(ii) U (P ∗ , f ) ≤ U (P, f ).
Otherwise stated, inserting an extra point into a partition increases the lower sum and decreases
the upper sum.
Proof. We prove (i). Let $P = \{x_0, \dots, x_n\}$. It is enough to prove the claim when $P^*$
contains just one point more than $P$. Let this point be $x^*$. Then there is an index $i \in \{1, \dots, n\}$ such that
$x_{i-1} < x^* < x_i$. Let
\[
m_i = \inf_{x_{i-1} \le x \le x_i} f(x).
\]
Now
Corollary A.1 For any partition P and any partition Q of [a, b],
Therefore, sup L(f, P ) ≤ inf U (f, Q) where the sup and inf are taken over all possible partitions
of [a, b].
Proof. Let P and Q be two partitions of [a, b] and let P ∗ = P ∪ Q so that P ∗ is a refinement
of both P and Q. It follows from the theorem above that L(P, f ) ≤ L(P ∗ , f ) ≤ U (P ∗ , f ) ≤
U (Q, f ). □
Proof. It is enough to prove the lemma when $P^*$ has just one more point than $P$. Let this
point be $x^*$. Then there is an index $i \in \{1, \dots, n\}$ such that $x_{i-1} < x^* < x_i$. Then,
\[
U(f, P) - U(f, P^*) \le \|f\|(x^* - x_{i-1}) + \|f\|(x_i - x^*) + \|f\|(x_i - x_{i-1}) = 2\|f\|(x_i - x_{i-1}) \le 2\|f\| \, \|P\|.
\]
Lemma A.3 By a choice of the intermediate points ci , the Riemann sum σ(f, P, c1 , . . . , cn ) can
be made arbitrarily close to the upper Darboux sum U (f, P ) as well as to the lower Darboux
sum L(f, P ).
Proof. By the property of the supremum, for any $\varepsilon > 0$, there exists a point $c_i$ in $[x_{i-1}, x_i]$
such that
\[
M_i - \frac{\varepsilon}{b-a} < f(c_i) \le M_i.
\]
Multiplying both inequalities by $\Delta x_i$ and summing from $i = 1$ to $i = n$, we get
This proves the first assertion of the lemma. The second assertion is proved similarly by using
the property of the infimum. □
Theorem A.4 Let f : [a, b] → IR be a bounded function. Then the following conditions are
equivalent.
(ii) For every $\varepsilon > 0$ there exists a number $\delta > 0$ such that $U(f, P) - L(f, P) < \varepsilon$, for any
partition $P$ of $[a, b]$ such that $\|P\| < \delta$.
(iv) For every $\varepsilon > 0$ there exists a partition $P$ of $[a, b]$ such that $U(f, P) - L(f, P) < \varepsilon$.
In addition, if one of the above conditions holds then $\int_a^b f(x)\,dx = \sup L(f, P) = \inf U(f, P)$.
Proof. (i)⇒(ii). Let $f$ be integrable and set $I = \int_a^b f(x)\,dx$. Let $\varepsilon > 0$ be given. Then by
definition, there exists a number $\delta > 0$ such that
\[
|\sigma(f, P, c_1, \dots, c_n) - I| < \frac{\varepsilon}{2}, \quad\text{or}\quad I - \frac{\varepsilon}{2} < \sigma(f, P, c_1, \dots, c_n) < I + \frac{\varepsilon}{2},
\]
for every partition $P$ of $[a, b]$ such that $\|P\| < \delta$ and any choice of the intermediate points $c_i$.
By the previous lemma, the lower Darboux sum L(f, P ) belongs to this interval for some choice
of ci . The same is true for the upper Darboux sum U (f, P ). This means that both sums belong
to the same interval of length ε.
A.3. CLASSES OF INTEGRABLE FUNCTIONS 67
where the inf and sup are taken over all possible partitions of [a, b]. Condition (ii) then implies
that
0 ≤ inf U (f, P ) − sup L(f, P ) < ε.
for all ε > 0. This means that inf U (P, f ) = sup L(P, f ).
(iii)⇒(i). Let I be this common number. It follows from Darboux’s theorem that
Lemma A.4 Let f : [α, β] → IR be a bounded function. Then the oscillation of f is also equal
to sup{|f (x) − f (y)| | x, y ∈ [α, β]}.
Theorem A.7 Let $f : [a, b] \to \mathbb{R}$ be a function and let $c \in ]a, b[$. If $f$ is integrable on $[a, c]$ and
on $[c, b]$, then $f$ is integrable on $[a, b]$ and
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]
Proof. Consider the sum $\sum \omega_i \Delta x_i$ for some partition. If the point $c$ belongs to the partition,
then this sum consists of two similar sums for the intervals [a, c] and [c, b], each of which tends
to zero as the norm of the partition tends to zero. The conclusion remains true also in the case
where c does not belong to the partition: adding c to the partition we would change only one
term in the sum which itself tends to zero. □
Theorem A.9 A bounded function having finitely many points of discontinuity is Riemann
integrable.
Example. Consider the function $x \mapsto \sin(\frac{1}{x})$. This function is discontinuous at $x = 0$ but it is
not piecewise continuous. According to the previous theorem it is integrable on any compact
interval.
The above theorem and the next one can be proved inside the Riemann theory but they can be
easily deduced from Theorem 4.6.
c) f g is integrable.
Proof. a) and b) are proved by taking Riemann sums and going to the limit.
c) is proved in the exercises. □
Proposition A.3 If an integrable function $f : [a, b] \to \mathbb{R}$ satisfies $f(x) \ge 0$ for every $x \in [a, b]$,
then $\int_a^b f(x)\,dx \ge 0$.
a
Proof. If f (x) ≥ 0 then any Riemann sum is nonnegative. Going to the limit, we obtain the
result. □
Corollary A.2 If $f$ and $g$ are two integrable functions satisfying $f(x) \le g(x)$ for all $x \in [a, b]$,
then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
We rarely compute integrals by going to the definition. The most practical way to compute
the integral of elementary functions is given by the following theorem which connects integration
and the search for antiderivatives.
Proof. The result follows from the derivative of a product and the second fundamental
theorem of Calculus.
Proof. By the chain rule and the second fundamental theorem of Calculus, both sides are
equal to F (φ(b)) − F (φ(a)) where F is an antiderivative of f .
A.6. LIMITS AND INTEGRATION 71
The above example shows that one should be careful before interchanging limits and integrals.
However, there is a stronger form of convergence that permits this interchange and also preserves
the properties of continuity and integrability.
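Example A.2 The sequence of functions defined by $f_n(x) = nx(1 - x^2)^n$ for $x \in [0, 1]$ does
not converge uniformly because
\[
\sup_{x\in[0,1]} f_n(x) = f_n\Bigl( \frac{1}{\sqrt{2n+1}} \Bigr) = \frac{n}{\sqrt{2n+1}} \Bigl( 1 - \frac{1}{2n+1} \Bigr)^{n} \to \infty \quad\text{as } n \to \infty.
\]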
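The behaviour described in this example can be seen numerically. The sketch below evaluates $f_n$ at the critical point $x = 1/\sqrt{2n+1}$, compares it with the largest value found on a uniform grid of $[0, 1]$ (the grid size is an arbitrary choice of ours), and also prints $f_n(1/2)$ to show that the values at a fixed point tend to 0 while the peak grows.

```python
import math

def f(n, x):
    # f_n(x) = n x (1 - x^2)^n
    return n * x * (1.0 - x * x) ** n

def peak(n):
    # Value of f_n at the critical point x = 1 / sqrt(2n + 1)
    return f(n, 1.0 / math.sqrt(2 * n + 1))

def grid_max(n, points=20001):
    # Largest value of f_n on a uniform grid of [0, 1]
    return max(f(n, k / (points - 1)) for k in range(points))

for n in (1, 10, 100, 1000):
    print(n, peak(n), grid_max(n), f(n, 0.5))
```

The second and third columns should nearly coincide and grow with $n$, whereas the last column shrinks rapidly, so the supremum of $|f_n - 0|$ cannot tend to 0.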
Otherwise stated,
\[
\lim_{n\to\infty} \int_a^b f_n(t)\,dt = \int_a^b \lim_{n\to\infty} f_n(t)\,dt.
\]
Proof. The first step is to show that the sequence $\{\int_a^b f_n\}$ converges. Let $\varepsilon > 0$. Since $\{f_n\}$
converges uniformly, there exists an integer $N$ such that $|f_n(x) - f_m(x)| < \varepsilon$ for all $x \in [a, b]$
and all $n, m \ge N$. It follows that
\[
\left| \int_a^b f_n(x)\,dx - \int_a^b f_m(x)\,dx \right| = \left| \int_a^b [f_n(x) - f_m(x)]\,dx \right| \le \int_a^b |f_n(x) - f_m(x)|\,dx \le \varepsilon(b - a)
\]
for all n, m ≥ N . This means that the sequence of integrals is a Cauchy sequence and therefore
is convergent to some limit L.
Choose $n \ge N$. Since $f_n$ is integrable, there exists $\delta > 0$ such that
$|\sigma(f_n, P, c_1, \dots, c_k) - \int_a^b f_n(x)\,dx| \le \varepsilon$ for any partition $P$ with $\|P\| \le \delta$. On the other hand, it is easily seen that
\[
|\sigma(f, P, c_1, \dots, c_k) - \sigma(f_n, P, c_1, \dots, c_k)| \le \varepsilon(b - a).
\]
The result now follows from
\[
|\sigma(f, P) - L| \le |\sigma(f, P) - \sigma(f_n, P)| + \left| \sigma(f_n, P) - \int_a^b f_n(x)\,dx \right| + \left| \int_a^b f_n(x)\,dx - L \right|.
\]
Then $f'(x) = g(x)$. We shall prove that $f_n$ converges uniformly to $f$. Taking the difference
between the last two identities, it follows from the triangle inequality that
\[
|f_n(x) - f(x)| \le |f_n(x_0) - \ell| + \left| \int_{x_0}^{x} |f_n'(t) - g(t)|\,dt \right|.
\]
Now given $\varepsilon > 0$, we can make $|f_n(x_0) - \ell| < \varepsilon$ for all $n$ large enough. Also we can make
$|f_n'(t) - g(t)| < \varepsilon$ for all $n$ large enough and all $t \in [a, b]$. Thus integrating we get
$\left| \int_{x_0}^{x} |f_n'(t) - g(t)|\,dt \right| \le (b - a)\varepsilon$ for all $x \in [a, b]$. This means that
for all n large enough and all x ∈ [a, b]. Since ε is arbitrary, the theorem is proved. □
Now we give the analog of the above theorems for series. The following theorem can be
easily proved by taking partial sums.
Theorem A.17 Let $f_n : I \to \mathbb{R}$ be a sequence of functions such that the series $\sum f_n$ converges
uniformly to a function $f$.