Notes GSzabo
Contents
1.1 σ-Algebras and Measurable Spaces
3 Probability
3.1 Probability spaces and random variables
3.2 Independence
E ∩ F = (E^c ∪ F^c)^c ,
E \ F = (E^c ∪ F)^c .
Example 1.3. The entire power set 2^Ω of Ω is the largest possible σ-algebra on Ω, whereas {∅, Ω} is the smallest one. We may also consider

M = { E ⊆ Ω | E or E^c is countable } ,
Proof. It is a trivial exercise to prove the first part of the statement. For the second part, if S is given, consider the family { M ⊆ 2^Ω | M is a σ-algebra with S ⊆ M }. Since this family contains 2^Ω, it is non-empty, and hence the intersection of this family is the smallest possible σ-algebra that contains S.
T1. ∅, Ω ∈ T .
T2. For any family of sets O ⊆ T , one has ⋃ O ∈ T .
T3. For O1, O2 ∈ T , one has O1 ∩ O2 ∈ T .
Then Ω has a
Proof. In light of the previous remark, the second part of the statement
follows automatically if we can show that the metric topology on Ω has a
countable base consisting of open balls. As we assumed Ω to be separable,
choose a countable dense set D ⊆ Ω, and consider
B = {B(x, r) | x ∈ D, 0 < r ∈ Q} .
This is a countable family of open balls, and we claim that it is a base for the
metric topology. Indeed, let O ⊆ Ω be an open set. For x ∈ D ∩ O, set Rx =
{0 < r ∈ Q | B(x, r) ⊆ O}. Then evidently
⋃ {B(x, r) | x ∈ D ∩ O, r ∈ Rx} ⊆ O.
Singleton sets are also Borel since they are closed. To summarize, there are
in general many more Borel sets than open sets. Note that in light of the
previous remark, the Borel-σ-algebra on R is generated by all open intervals.
Proposition 1.11. Let (Ω1,M1) and (Ω2,M2) be two measurable spaces and f : Ω1 → Ω2 a map. Suppose that M2 is the σ-algebra generated by a set S ⊆ 2^{Ω2}. Then f is measurable if and only if for all O ∈ S, one has

f^{-1}(O) ∈ M1.
Proof. The “only if” part is trivial, so let us consider the “if” part. Consider the set

M = { O ⊆ Ω2 | f^{-1}(O) ∈ M1 } .
Theorem 1.13. Let (Ω,M) be a measurable space and (Y, d) a metric space.
We equip Y with the Borel-σ-algebra associated to the metric topology. Suppose that a sequence of measurable functions fn : Ω → Y converges to a map
f : Ω → Y pointwise. Then f is measurable.
Proof. It suffices to show that f^{-1}(C) ∈ M for every closed set C ⊆ Y. For such C, write

C = ⋂_{k∈N} { y ∈ Y | d(y, C) < 1/k } =: ⋂_{k∈N} Ck.
Then each of the sets Ck is open. For every x ∈ Ω, we use fn(x) → f(x) and
observe
x ∈ f^{-1}(C) ⇔ f(x) ∈ C ⇔ ∀ k ∈ N : f(x) ∈ Ck ⇔ ∀ k ∈ N ∃ N ∈ N ∀ n ≥ N : fn(x) ∈ Ck,

where the last equivalence uses fn(x) → f(x) and the fact that each Ck is open. Hence f^{-1}(C) = ⋂_{k∈N} ⋃_{N∈N} ⋂_{n≥N} fn^{-1}(Ck) ∈ M.
Notation 1.14. We will equip [0, ∞] := [0, ∞) ∪ {∞} with the topology of the one-point compactification, that is, we define a subset O ⊆ [0, ∞] to be open when the following holds: either ∞ ∉ O and O is open in [0, ∞), or ∞ ∈ O and [0, ∞] \ O is a compact subset of [0, ∞). We extend addition and multiplication via

x + ∞ := ∞,   x · ∞ := ∞ (x > 0),   0 · ∞ := 0.
Then the addition map + : [0, ∞] × [0, ∞] → [0, ∞] is continuous, but this is not true for the multiplication map. We also extend the usual order relation “≤” of numbers to [0, ∞] in the obvious way.
1.3 Measures on σ-algebras, Measure Spaces
Definition 1.15. Let (Ω,M) be a measurable space. A (positive) measure µ
on (Ω,M) is a map M → [0, ∞] satisfying:

i. µ(∅) = 0.
ii. If En ∈ M is a sequence of pairwise disjoint sets, then µ(⋃_{n∈N} En) = ∑_{n∈N} µ(En). (σ-additivity)

The triple (Ω,M, µ) is called a measure space. If µ(Ω) < ∞, we call µ a finite
measure, and if more specifically µ(Ω) = 1, we call it a probability measure
and the triple (Ω,M, µ) a probability space. If there exists a sequence En ∈ M
with µ(En) < ∞ and Ω = ⋃ n∈N En, then we say that µ is σ-finite.
the series ∑_{n∈N} µ(En) is a series in [0, ∞], which we may define as the supremum sup_{N≥1} ∑_{n=1}^{N} µ(En).
σ-additivity implies finite additivity: If E1, E2 ∈ M are two disjoint
sets, then
µ(E1 ∪ E2) = µ(E1 ∪ E2 ∪ ∅ ∪ ∅ ∪ . . . ) = µ(E1) + µ(E2) + µ(∅) + µ(∅) + · · · = µ(E1) + µ(E2),

since µ(∅) = 0.
1Keep in mind that we implicitly force everything to be commutative, so the
order of addition and multiplication does not matter here by default.
µ(E) = µ(E ∪ ∅ ∪ ∅ ∪ . . . )
= µ(E) + ∞ · µ(∅).
If µ(E) < ∞, then the above can only happen when µ(∅) = 0.
Example 1.17 (Counting measure). For any non-empty set Ω, we can consider the σ-algebra 2^Ω and define a measure µ via
µ(E) = #E if E is finite, and µ(E) = ∞ if E is infinite.
δx(E) = 1 if x ∈ E, and δx(E) = 0 if x ∉ E.
Proposition 1.20. Let (Ω,M, µ) be a measure space. Then µ has the following extra properties:
µ ( ⋃_{n∈N} En ) = µ ( ⋃_{n∈N} Fn ) = ∑_{n∈N} µ(Fn) = sup_{n∈N} ∑_{k≤n} µ(Fk) = sup_{n∈N} µ(En),

and in the decreasing case,

µ(E1) − µ ( ⋂_{n∈N} En ) = µ ( ⋃_{n∈N} (E1 \ En) ) = sup_{n∈N} µ(E1 \ En) = µ(E1) − inf_{n∈N} µ(En).

This implies the claim.

Remark 1.21. In condition (v) above, it is really
necessary to assume that at least one of the sets En has finite measure. For
example, we may consider Ω = N with the counting measure µ, and the sets
En = {k ∈ N | k ≥ n}. Then µ(En) = ∞ for all n, the sequence is decreasing,
and ⋂ n∈N En = ∅.
Proposition 1.22. Let (Ω,M) be a measurable space with two measures µ1,
µ2. For any numbers α1, α2 ≥ 0, the map

E ↦ α1 µ1(E) + α2 µ2(E)

is a measure.
Notation 1.24. For brevity, we may also say “P holds a.e.” (a.e.=almost
everywhere) or “P (x) holds for (µ-)almost all x”.
χE : Ω → {0, 1} ,  χE(x) = 1 if x ∈ E, and χE(x) = 0 if x ∉ E.
s = ∑_{λ∈s(Ω)} λ · χ_{s^{-1}(λ)}.
Proof. The “only if” part is clear. Since we assumed the σ-algebra on Y to
contain all singleton sets, it follows in particular that s−1(y) ∈ M for all y ∈
Y.
For the “if” part, let A be a measurable subset of Y . Then s−1(A) = ⋃
λ∈s(Ω) s−1(A ∩ {λ}). By assumption this is a finite union of sets of the form
s−1(λ) for λ ∈ s(Ω), and hence if these are measurable, then so is s−1(A).
∫_Ω s dµ := ∑_{λ∈s(Ω)} λ · µ(s^{-1}(λ)).
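To make the definition concrete, here is a small sketch that evaluates this λ-sum for a simple function on a finite set with the counting measure. The helper names (`step_integral`, `counting`) are illustrative and not from the notes.

```python
from collections import defaultdict

def step_integral(s, omega, mu):
    """Integral of a simple function s over a finite set omega:
    the sum over the values lambda of s of lambda * mu(s^{-1}(lambda))."""
    preimages = defaultdict(list)
    for x in omega:
        preimages[s(x)].append(x)
    return sum(lam * mu(pts) for lam, pts in preimages.items())

# Counting measure on a finite sample space: mu(E) = #E.
counting = len

omega = range(10)
s = lambda x: 2.0 if x < 3 else 0.5   # simple function with values {2.0, 0.5}

# 3 points with value 2.0 and 7 points with value 0.5:
# integral = 2.0*3 + 0.5*7 = 9.5
print(step_integral(s, omega, counting))
```

The grouping into preimages mirrors the canonical form of s from the text.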
Proof. (i): If the union ⋃_{j=1}^{k} Aj is not Ω, then we may set A_{k+1} = Ω \ ⋃_{j=1}^{k} Aj and α_{k+1} = 0. Then we still have s = ∑_{j=1}^{k+1} αj · χ_{Aj} and the sets A1, . . . , A_{k+1} are pairwise disjoint. Clearly our claim boils down to the same claim for this expression of s, so we may simply assume without loss of generality that we had Ω = ⊔_{j=1}^{k} Aj to begin with.
Due to this assumption, we can notice that every value λ ∈ s(Ω) has to equal one of the coefficients αj, and conversely every such coefficient with Aj ≠ ∅ is in s(Ω). Upon grouping all such indices together, we can observe

s^{-1}(λ) = ⊔_{j : αj = λ} Aj,  λ ∈ s(Ω).

Hence

∫_Ω s dµ = ∑_{λ∈s(Ω)} λ · µ(s^{-1}(λ)) = ∑_{λ∈s(Ω)} λ · ∑_{j : αj = λ} µ(Aj) = ∑_{j=1}^{k} αj µ(Aj).
(ii): We keep the earlier assumption that Ω = ⊔_{j=1}^{k} Aj. We can write t = ∑_{i=1}^{ℓ} βi χ_{Bi} for finitely many coefficients β1, . . . , βℓ ≥ 0 and sets B1, . . . , Bℓ ∈ M with Ω = ⊔_{i=1}^{ℓ} Bi, for instance via the canonical form of t. Then we can write
s = ∑_{j=1}^{k} αj · ∑_{i=1}^{ℓ} χ_{Aj ∩ Bi} ,   t = ∑_{i=1}^{ℓ} βi · ∑_{j=1}^{k} χ_{Aj ∩ Bi} ,

s + t = ∑_{j=1}^{k} ∑_{i=1}^{ℓ} (αj + βi) χ_{Aj ∩ Bi} ,

where Ω = ⊔_{i=1}^{ℓ} ⊔_{j=1}^{k} (Aj ∩ Bi). Using part (i) and the additivity of µ,
we obtain
∫_Ω (s + t) dµ = ∑_{j=1}^{k} ∑_{i=1}^{ℓ} (αj + βi) µ(Aj ∩ Bi) = ∑_{j=1}^{k} αj µ(Aj) + ∑_{i=1}^{ℓ} βi µ(Bi) = ∫_Ω s dµ + ∫_Ω t dµ.
(iii): We keep in mind that every summand αjχAj can itself be understood as
a positive measurable step function whose integral is equal to αjµ(Aj) by
part (i). Since we have already proved in part (ii) that the integral
construction on such functions is additive, this yields the desired sum
expression.
s = ∑_{j=1}^{k} αj χ_{Aj} ,   t = ∑_{i=1}^{ℓ} βi χ_{Bi}

with Ω = ⊔_{j=1}^{k} Aj = ⊔_{i=1}^{ℓ} Bi, for example via their canonical forms. Then we can also write

s = ∑_{j=1}^{k} αj · ∑_{i=1}^{ℓ} χ_{Aj ∩ Bi} ,   t = ∑_{i=1}^{ℓ} βi · ∑_{j=1}^{k} χ_{Aj ∩ Bi} .
ν ( ⋃_{n∈N} En ) = µ ( A ∩ ⋃_{n∈N} En ) = µ ( ⋃_{n∈N} (A ∩ En) ) = ∑_{n∈N} µ(A ∩ En) = ∑_{n∈N} ν(En).
Definition 1.32. Let (Ω,M, µ) be a measure space. For a measurable function f : Ω → [0, ∞], we define its integral as

∫_Ω f dµ = sup { ∫_Ω s dµ | s is a positive measurable step function with s ≤ f } .

We call f integrable when ∫_Ω f dµ < ∞.
Remark 1.33. We should first convince ourselves that the new definition of the integral does not contradict the old one in the case where f is assumed to be a step function. But indeed, s = f is the largest step function with s ≤ f , so the supremum above is attained at s = f and agrees with the integral from Definition 1.28.
Secondly, we remark that the above definition a priori makes sense even
when f is not assumed to be measurable. However, we will see in the
exercises that the resulting notion of integral will have some undesirable
properties when evaluated on non-measurable functions, for example not
being additive.
Proof. We already know that the “if” part is true. For the “only if” part,
assume that f is measurable. We claim that it suffices to consider the special case f = id as a function [0, ∞] → [0, ∞]. Indeed, if we can realize id = sup_{n∈N} tn for an increasing sequence of positive measurable step functions tn, then

f = id ◦ f = lim_{n→∞} tn ◦ f = sup_{n∈N} (tn ◦ f),

where sn = tn ◦ f is an increasing sequence of positive measurable step functions. But we can come up with the following sequence tn, which is easily seen to do the trick:

tn = ∑_{k=1}^{n·2^n} (k − 1) 2^{−n} · χ_{[(k−1)2^{−n}, k·2^{−n})} + n · χ_{[n,∞]}.
sup_{n∈N} ∫_Ω sn dµ.
4In exactly one instance later, we will use an extra property that we arrange
in our proof. Namely, we can find (sn)n with |f(x) − sn(x)| ≤ 2−n for all x ∈ Ω
with f(x) ≤ n, and sn(x) = n whenever f(x) ≥ n.
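The dyadic sequence tn from the proof can be sketched numerically. Assuming the reconstruction tn(x) = min(n, ⌊2^n x⌋/2^n), one can check the monotonicity and the 2^{−n} error bound mentioned in footnote 4:

```python
import math

def t_n(n, x):
    """Dyadic approximation from the proof: truncate at n and round
    x down to a multiple of 2^{-n}.  t_n increases pointwise to id."""
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n

x = math.pi
approx = [t_n(n, x) for n in range(1, 8)]
# t_n(x) increases towards x, with error at most 2^{-n} once n >= x
assert all(a <= b for a, b in zip(approx, approx[1:]))
assert all(abs(x - t_n(n, x)) <= 2**-n for n in range(4, 8))
```

Each tn takes only finitely many values, so it is a positive measurable step function on [0, ∞].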
Proposition 1.20(iv) that
c ∫ Ω s dµ cν(Ω)
Proof. Both f and g can be realized as f = sup_{n∈N} sn and g = sup_{n∈N} tn for increasing sequences of positive measurable step functions, so that f + g = sup_{n∈N} (sn + tn). Applying the Monotone Convergence Theorem and the additivity of the integral on step functions, we get

∫_Ω (f + g) dµ = sup_{n∈N} ∫_Ω (sn + tn) dµ = sup_{n∈N} ( ∫_Ω sn dµ + ∫_Ω tn dµ ) = ∫_Ω f dµ + ∫_Ω g dµ.
For the converse, assume that f(x) = 0 for µ-almost all x ∈ Ω, which means that the set E above is a null set. If s : Ω → [0, ∞) is a simple measurable function with s ≤ f , then it means in particular that s^{-1}(λ) ⊆ E for all λ ≠ 0. This immediately implies ∫_Ω s dµ = 0, but hence also ∫_Ω f dµ = 0 by definition.
u = u+ − u− , v = v+ − v− , u+u− = 0, v+v− = 0,
Definition 1.40. Let (Ω,M, µ) be a measure space. We say that a measurable function f : Ω → C is integrable, if |f| is integrable in the sense of Definition 1.32. In this case we define the integral as

∫_Ω f dµ = ∫_Ω u+ dµ − ∫_Ω u− dµ + i ( ∫_Ω v+ dµ − ∫_Ω v− dµ ) ,
where we use the decomposition from Remark 1.39. The set of all such
functions is denoted L1(Ω,M, µ) or L1(µ).
Theorem 1.41. L1(Ω,M, µ) is a complex vector space with the usual operations. Moreover the integral

L1(Ω,M, µ) → C,  f ↦ ∫_Ω f dµ

is a linear map.
Proof. The fact that L1 is a vector space is left as an exercise. Let us proceed
to prove linearity of the integral.
∫_Ω h+ dµ + ∫_Ω f− dµ + ∫_Ω g− dµ = ∫_Ω h− dµ + ∫_Ω f+ dµ + ∫_Ω g+ dµ.

All of these summands are finite, and hence we can rearrange this equation to

∫_Ω h+ dµ − ∫_Ω h− dµ = ∫_Ω f+ dµ − ∫_Ω f− dµ + ∫_Ω g+ dµ − ∫_Ω g− dµ,

and hence ∫_Ω h dµ = ∫_Ω f dµ + ∫_Ω g dµ. It is also clear from the definition
that ∫ Ω f + ig dµ = ∫ Ω f dµ + i ∫ Ω g dµ. These two equations imply together
that the integral is indeed additive.
= α1 ∫_Ω f dµ − α2 ∫_Ω g dµ + i ( α1 ∫_Ω g dµ + α2 ∫_Ω f dµ )
= α ( ∫_Ω f dµ + i ∫_Ω g dµ )
= α ∫_Ω (f + ig) dµ.
0 ≤ ∫_Ω f dµ = ∫_Ω u dµ ≤ ∫_Ω u+ dµ ≤ ∫_Ω |f| dµ, using u+ ≤ |f| in the last step.
Proposition 1.43. Let (Ω,M, µ) be a measure space and f : Ω → C a
measurable function.
(1/µ(E)) ∫_Ω f χE dµ ∈ F.
Proof. From Proposition 1.43 we get that indeed N ⊆ L1(Ω,M, µ), and the
second equivalence of the statement is clear.
Definition 1.46. Let (Ω,M, µ) be a measure space. Then in light of the above, we define the quotient vector space L1(Ω,M, µ) as the space of integrable functions modulo the subspace N of null functions. The map

L1(Ω,M, µ) → C,  [f] ↦ ∫_Ω f dµ

is a well-defined linear map, which we will call the integral. Since f ∈ N happens precisely when |f| ∈ N , it makes sense to define |[f]| = [|f|]. Consequently, the semi-norm ∥ · ∥1 given by

∥f∥1 = ∫_Ω |f| dµ

becomes a norm (cf. Proposition 1.38) on the L1-space.
The “≥” relation is on the other hand clear from the fact that the integral is
monotone. This finishes the proof.
5But this is justified by the fact that we will mostly form integrals, which do
not depend on the representative for such a coset.
∫_Ω (lim inf_{n→∞} fn) dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.

Proof. Denote gk = inf_{n≥k} fn, and recall that lim inf_{n→∞} fn = sup_{k≥1} gk. We thus see that lim inf_{n→∞} fn is a measurable function. If n ≥ k, then evidently gk ≤ fn, so ∫_Ω gk dµ ≤ ∫_Ω fn dµ. In particular, this is true for arbitrarily large n, so ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ. Using the Monotone Convergence Theorem, we thus see

∫_Ω lim inf_{n→∞} fn dµ = sup_{k≥1} ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.
Remark 1.51. The Dominated Convergence Theorem really only holds for
sequences of functions, and its analogous generalizations for more general
families of functions (such as nets) are false. A counterexample is discussed
in the exercise sessions.
Theorem 1.52. Let (Ω,M, µ) be a measure space. Suppose that a sequence fn
∈ L1(Ω,M, µ) satisfies the Cauchy criterion in the semi-norm ∥·∥1. Then
there exists a subsequence (fnk)k and a function f ∈ L1(Ω,M, µ) such that
Passing to a further subsequence, we may assume ∥f_{n_{k+1}} − f_{n_k}∥1 ≤ 2^{−k} for all k ≥ 1, and we set g = ∑_{k=1}^{∞} |f_{n_{k+1}} − f_{n_k}|. By the Monotone Convergence Theorem, ∫_Ω g dµ < ∞, so the set E = { x ∈ Ω | g(x) < ∞ } has a null complement, and

f(x) = lim_{k→∞} f_{n_k}(x) = f_{n_1}(x) + ∑_{k=1}^{∞} ( f_{n_{k+1}}(x) − f_{n_k}(x) ),  x ∈ E,

is well defined and measurable on E. We extend f to a measurable function
on Ω by defining it to be zero on the complement of E. We get by the triangle
inequality that for all k ≥ 1, the function fnk is dominated (on E) by the
integrable function |fn1|+g. Therefore it follows from the Dominated
Convergence Theorem that ∥f − f_{n_k}∥1 → 0 as k → ∞. Since the sequence (fn)n was assumed to satisfy the Cauchy criterion in ∥ · ∥1 and we just showed that a subsequence converges to f in this semi-norm, it follows that also ∥f − fn∥1 → 0. This finishes the proof.
φ∗M1 := { E ⊆ Ω2 | φ^{-1}(E) ∈ M1 }

is a σ-algebra, and

φ∗µ1(E) := µ1(φ^{-1}(E))

defines a measure on it. These are called the push-forward σ-algebra and the push-forward measure with respect to φ. With respect to this measure space structure on Ω2, φ becomes a measurable map, and for every measurable function f : Ω2 → [0, ∞], the following equation makes sense and holds:

∫_{Ω2} f dµ2 = ∫_{Ω1} f ◦ φ dµ1.
In other words, the claim holds for functions of the form f = χE. By linearity
of the integral, the desired equation holds for all positive measurable step
functions in place of f . Now let f be as general as in the statement, and write
f = supn∈N sn for an increasing sequence of positive measurable step
functions sn : Ω2 → [0, ∞), using Proposition 1.35. Clearly we also have f ◦
φ = supn∈N sn ◦ φ. Then by Proposition 1.36, we see that
Remark 1.55. In the above theorem, we pushed forward the measure space
structure from Ω1 to get one on Ω2 which makes the statement of the theorem
true. It may of course happen that we have an a priori given measure space
(Ω2,M2, µ2) and a measurable map φ : Ω1 → Ω2 with the property that
µ1(φ−1(E)) = µ2(E) for all E ∈ M2. Convince yourself that the statement of
the theorem will still be true!
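As a toy illustration of the push-forward and the transformation formula (the weights, the map φ, and the function f below are invented for the example), one can check ∫ f d(φ∗µ1) = ∫ f∘φ dµ1 for a finite measure given by point masses:

```python
from collections import Counter

# A finite measure on Omega1 given by point masses (here: uniform weights).
omega1 = [0, 1, 2, 3, 4, 5]
mu1 = {x: 1/6 for x in omega1}          # illustrative weights, not from the notes

phi = lambda x: x % 2                    # measurable map Omega1 -> Omega2 = {0, 1}

# Push-forward measure: (phi_* mu1)(E) = mu1(phi^{-1}(E)), accumulated per point.
pushed = Counter()
for x, w in mu1.items():
    pushed[phi(x)] += w

f = lambda y: 10.0 if y == 0 else 3.0

lhs = sum(f(y) * w for y, w in pushed.items())       # integral of f w.r.t. phi_* mu1
rhs = sum(f(phi(x)) * w for x, w in mu1.items())     # integral of f∘phi w.r.t. mu1
assert abs(lhs - rhs) < 1e-12
```

For discrete measures the formula reduces to regrouping the sum along the fibers of φ, which is exactly the step-function case of the proof.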
2 Carathéodory’s Construction of
Measures
2.1 Measures on Semi-Rings and Rings
Definition 2.1. Let Ω be a non-empty set. A semi-ring on Ω is a set of subsets
A ⊆ 2Ω such that
1. ∅ ∈ A.
2. If E, F ∈ A, then E ∩ F ∈ A.
3. If E, F ∈ A, then E \ F is a finite disjoint union of sets in A.
Example 2.2. The set A of all subsets of Ω with at most one element forms a semi-ring. More interestingly, if n ≥ 2, then the set of half-open cubes in R^n forms a semi-ring as well.
1. ∅ ∈ R.
2. If E, F ∈ R, then E ∪ F ∈ R.
3. If E, F ∈ R, then E \ F ∈ R.
Lemma 2.5. Let A be a semi-ring over Ω. Then the smallest ring R over Ω containing A is given as the collection of all finite disjoint unions of sets in A.
E \ F ∈ R.
Now from this we also get E∪F ∈ R because one can write it as a disjoint
union E ∪ F = E ∪ (F \ E), and it is clear that R is closed under disjoint
unions of its elements.
i. µ(∅) = 0.
ii. If En ∈ A is a sequence of pairwise disjoint sets with ⋃ n≥1 En ∈ A,
then µ(⋃ n≥1 En) = ∑ n≥1 µ(En).
(σ-additivity)
Ej = Ej ∩ E = ⋃_{ℓ=1}^{m} (Ej ∩ Fℓ),

µ(E) = ∑_{j=1}^{m} µ0(Aj)
ν ( ⋃_{n≥1} En ) ≤ ∑_{n=1}^{∞} ν(En).  (σ-subadditivity)
7Keep in mind that by convention, inf ∅ := ∞. For certain sets E there may
not exist any sequence En with these properties.
Proof. One gets µ∗(∅) = 0 by choosing En = ∅. If E ⊆ F ⊆ Ω are two sub
sets, then evidently there are at least as many ways to cover E by sequences
in R as for F , which leads directly to µ∗(E) ≤ µ∗(F ).
Since ε > 0 was arbitrary, this shows the claim.

Definition 2.11. Let ν be an outer measure on a set Ω. We say that a set E ⊆ Ω is ν-measurable, if for every subset A ⊆ Ω, we have

ν(A) = ν(A ∩ E) + ν(A \ E).
ν(A) = ν ( (A ∩ (E ∪ F )) ∪ (A \ (E ∪ F )) )
ν(A) = ν ( A ∩ ⋃_{n≤m} En ) + ν ( A \ ⋃_{n≤m} En )
= ∑_{n=1}^{m} ν(A ∩ En) + ν ( A \ ⋃_{n≤m} En )
≥ ∑_{n=1}^{m} ν(A ∩ En) + ν ( A \ ⋃_{n≥1} En ).

Letting m → ∞ and using the σ-subadditivity of ν, this yields, with E = ⋃_{n≥1} En,

ν(A) ≥ ∑_{n=1}^{∞} ν(A ∩ En) + ν(A \ E) ≥ ν(A ∩ E) + ν(A \ E) ≥ ν(A).
In particular we get the equality ν(A) = ν(A ∩ E) + ν(A \ E), which yields E
∈ M as A was arbitrary. Furthermore, if we insert A = E, then we also have
ν(E) = ∑∞ n=1 ν(En), which shows that ν is σ-additive on M. In particular it
is indeed a measure when restricted to M.
Definition 2.13. A measure space (Ω,M, µ) is called complete, if for all sets
E ⊆ F ⊆ Ω, one has that F ∈ M and µ(F ) = 0 implies E ∈ M.
Proof. Suppose E ⊆ F ∈ M are given with µ(F ) = 0. Then we observe for all
A ⊆ Ω that ν(A) ≤ ν(A ∩ E) + ν(A \ E) ≤ ν(A ∩ F ) + ν(A \ E) ≤ 0 + ν(A).
So we see that these are all equalities. Since A was arbitrary, this implies
that E ∈ M.
Proof. We see right away for all E ∈ R that µ∗(E) ≤ µ(E) since we can write
E = E ∪ ∅ ∪ ∅ ∪ . . . . On the other hand, if En ∈ R is any sequence of sets
with E ⊆ ⋃ n≥1 En, then it follows from σ-subadditivity and monotonicity of
µ that
By taking the infimum over all possible choices of such sequences, we arrive
at µ(E) = µ∗(E). Since E ∈ R was arbitrary, we have just shown µ∗|R = µ.
Now we need to show that every set E ∈ R is µ∗ -measurable. Let A ⊆ Ω be
any set. Since we always have µ∗(A) ≤ µ∗(A ∩ E) + µ∗(A \ E), we may
assume without loss of generality that µ∗(A) < ∞. Let An ∈ R be a sequence of sets with A ⊆ ⋃_{n≥1} An. Then

µ∗(A ∩ E) + µ∗(A \ E) ≤ ∑_{n≥1} ( µ(An ∩ E) + µ(An \ E) ) = ∑_{n≥1} µ(An).
If we take the infimum over all possible such sequences An, then the right
side approaches the value µ∗(A), and hence we get the equality µ∗(A) =
µ∗(A ∩ E) + µ∗(A \ E). Since A ⊆ Ω and E ∈ R were arbitrary, this finishes
the proof.
But since this holds for any choice of (En)n, we obtain µ1(E) ≤ µ∗(E).
Proof. This is immediate from the definition of both R and µ, and is left as an
exercise.
µ∗(A) = µ∗(φ−1(A))
The reverse
Since M is arbitrary, this leads to ∑_{n=1}^{∞} µ0(En) ≤ µ0(E).

Let ε > 0 with ε < b − a. Then in particular

[a + ε, b] ⊂ ⋃_{n≥1} (an, bn + 2^{−n}ε).
The right-hand side is an open covering of the compact set on the left side,
and hence there is some N ≥ 1 such that [a+ε, b] ⊂ ⋃N n=1(an, bn +2−nε).
We change the ordering of the intervals appearing in this union by the
following inductive procedure: Choose k1 ∈ {1, . . . , N} to be the index so
that
a_{k1} = max { aj | a + ε ∈ (aj, bj + 2^{−j}ε) } .

If b < b_{k1} + 2^{−k1}ε, then the procedure stops here. Otherwise, choose k2 ∈ {1, . . . , N} to be the index so that

a_{k2} = max { aj | b_{k1} + 2^{−k1}ε ∈ (aj, bj + 2^{−j}ε) } .

If b < b_{k2} + 2^{−k2}ε, then
the procedure stops here. Otherwise one continues inductively until the procedure stops after L ≤ N steps. This yields an injective map k : {1, . . . , L} → {1, . . . , N} such that [a + ε, b] ⊂ ⋃_{n=1}^{L} (a_{kn}, b_{kn} + 2^{−kn}ε) and such that for all n < L, we have a_{k_{n+1}} < b_{kn} + 2^{−kn}ε. From this we can estimate

b − a − ε ≤ ∑_{n=1}^{L} ( b_{kn} + 2^{−kn}ε − a_{kn} ) ≤ ε + ∑_{n=1}^{N} (bn − an),

and therefore

b − a ≤ 2ε + ∑_{n=1}^{N} (bn − an) ≤ 2ε + ∑_{n=1}^{∞} µ0(En).
Since ε > 0 was arbitrary, this finally implies µ0(E) = ∑∞ n=1 µ0(En) and
finishes the proof.
(c, b] = ⋂_{n≥1} (c, bn].
Remark 2.23. WARNING! The σ-algebra of Lebesgue sets is indeed bigger than the Borel σ-algebra.9
Proof. From these properties it follows that if such a function F exists at all, then it has to be given by the formula

F(t) = µ((0, t]) if t > 0,  F(0) = 0,  F(t) = −µ((t, 0]) if t < 0.
We claim that this function has indeed the right properties. Let a, b ∈ R with a
< b. We aim to show µ((a, b]) = F (b) − F (a). If a ≥ 0, then
9This is not so easy to see, but an example is discussed here, for whoever is
interested:
https://www.math3ma.com/blog/lebesgue-but-not-borel.
µ((a, b]) = µ((0, b] \ (0, a]) = µ((0, b]) − µ((0, a]) = F(b) − F(a). If b < 0, we can prove this analogously. If a < 0 ≤ b, then we have µ((a, b]) = µ((a, 0] ∪ (0, b]) = µ((a, 0]) + µ((0, b]) = F(b) − F(a). The fact that F is increasing follows immediately from the fact that µ has nonnegative values. The right-continuity follows from

lim_{n→∞} F(a + εn) = lim_{n→∞} µ((0, a + εn]) = µ((0, a]) = F(a)

for any sequence εn > 0 with εn → 0. Here we used the continuity property of µ as a measure with respect to countable decreasing intersections.
∑_{n=1}^{M} µ(En) = ∑_{n=1}^{M} ( F(bn) − F(an) ) = F(bM) − F(a1) ≤ F(b) − F(a) = µ(E).
for every n ≥ 1 a small δn > 0 such that F (bn + δn) − F (bn) ≤ 2−nε. Then
[a + ε, b] ⊂ E = ⋃_{n=1}^{∞} En ⊂ ⋃_{n=1}^{∞} (an, bn + δn)
The rest of the claim follows from the Carathéodory construction, exactly as
in the proof of Theorem 2.22, see also Corollary 2.18.
Example 2.27. For the choice F = idR, we recover the Lebesgue measure. On
the other hand, if a > 0 is some chosen number and we set
F(t) = 0 if t < a, and F(t) = 1 if t ≥ a,
then one can show that we recover the Dirac measure µ = δa. For a ≤ 0 one
also recovers this measure with the function
F(t) = −1 if t < a, and F(t) = 0 if t ≥ a.
Proposition 2.29. Let (Ωi,Mi, µi) be two measure spaces for i = 1, 2. Then the set of measurable rectangles { E1 × E2 | E1 ∈ M1, E2 ∈ M2 } forms a semi-ring, on which µ0(E1 × E2) := µ1(E1) · µ2(E2) defines a measure.
Theorem 2.30. Let (Ωi,Mi, µi) be two measure spaces for i = 1, 2. Then the
measure µ0 on the measurable rectangles extends to a measure on the product
σ-algebra µ1 ⊗ µ2 : M1 ⊗ M2 → [0, ∞].
λ(d) = λ ⊗ λ ⊗ · · · ⊗ λ  (d times)
with respect to the measure space (R,L, λ). The Lebesgue σ-algebra L(d) on
Rd is the one consisting of all λ(d)-measurable sets in the sense of Definition 2.11, which contains the Borel σ-algebra. If the dimension d is clear
from context, we may sometimes slightly abuse notation and just write λ for
the Lebesgue measure on Rd .
Let t = (t1, . . . , td) = (t′ , td) ∈ Rd = Rd−1 ×R. Denote by µ0 the product
measure defined on the measurable rectangles. If E ∈ L(d−1) and F ∈ L are
measurable, then (E × F ) + t = (E + t′) × (F + td), and so µ0((E × F ) + t) =
λ(d−1)(E + t′)λ(F + td) = λ(d−1)(E)λ(F ) = µ0(E ×F ). It now follows
directly from Proposition 2.19 and Proposition 2.20 that λ(d)(A + t) = λ(d)
(A) for all A ∈ L(d) , concluding the proof.
Theorem 2.33 (Tonelli; see exercises). Let (Ωi,Mi, µi) be two σ-finite measure spaces for i = 1, 2. Let f : Ω1 × Ω2 → [0, ∞] be a M1 ⊗ M2-
measurable function. Denote fx = f(x, _) : Ω2 → [0, ∞] and f y = f(_, y) : Ω1
→ [0, ∞].
Then:
c. The function Ω1 → [0, ∞] given by x → ∫ Ω2 fx dµ2 is M1-measurable.
d. The function Ω2 → [0, ∞] given by y → ∫ Ω1 f y dµ1 is M2-
measurable.
e. One has the equalities

∫_{Ω1×Ω2} f d(µ1 ⊗ µ2) = ∫_{Ω1} ( ∫_{Ω2} fx dµ2 ) dµ1(x) = ∫_{Ω2} ( ∫_{Ω1} f^y dµ1 ) dµ2(y).
Theorem 2.34 (Fubini). Let (Ωi,Mi, µi) be two σ-finite measure spaces for i
= 1, 2. Let f : Ω1 × Ω2 → C be a M1 ⊗ M2-measurable function. Denote fx
= f(x, _) : Ω2 → C and f y = f(_, y) : Ω1 → C. Then the following are
equivalent:
If any (or every) one of these statements holds, then we have that
Proof. It follows from Tonelli’s theorem that the integrals appearing in (A),
(B), and (C) are always the same, so their finiteness are indeed equivalent.
Now let us assume that these statements are true. By splitting f up into its real
and imaginary parts, let us assume without loss of generality that f is a real
function.
By part (c) in Tonelli’s theorem and Proposition 1.38, it follows that the set
E of all x ∈ Ω1, for which fx is integrable, is in M1 and its complement is a
null set. Let f = f+ − f− be the canonical decomposition into positive
functions as in Remark 1.39. Then it is very easy to see that (f+)x = (fx)+ and (f−)x = (fx)−. So (f+)x and (f−)x are both integrable whenever x ∈ E.
Moreover the map
The remaining equality follows exactly in the same way by exchanging the
roles of Ω1 and Ω2.
Proof. Clearly any measure must satisfy this property, so the “only if” part is
clear. For the “if” part, we may first take A1 = Ω and An = ∅ for all n ≥ 2,
which immediately implies µ(∅) = 0. We need to show that µ is σ-additive.
Let Bn ∈ A be a sequence of pairwise disjoint sets such that B = ⋃ n≥1 Bn ∈
A. As A is a semi-ring, we have Ω \ B = A1 ∪ · · · ∪ Aℓ for pairwise
disjoint sets A1, . . . , Aℓ ∈ A. Then the assumption implies on the one hand
that 1 = µ(B) + ∑ℓ n=1 µ(An). On the other hand, if we set Aℓ+k = Bk for k
≥ 1, then the sequence (An)n≥1 defines a pairwise disjoint covering of Ω, so
1 = ∑∞ n=1 µ(An). Comparing these two equations, we see that µ(B) = ∑
n>ℓ µ(An) = ∑∞ n=1 µ(Bn), which shows our claim.
Definition 2.36. Let I be a non-empty index set and (Ωi,Mi, µi) a probability space for every i ∈ I. We denote Ω = ∏_{i∈I} Ωi, and set
Theorem 2.37. Adopt the notation from the above definition. Then A is a
semi-ring and µ is a measure on A. In particular, it extends uniquely to a
probability measure µ on ⊗ i∈I Mi. One also denotes µ = ⊗ i∈I µi.
Proof. The “in particular” part is due to Corollary 2.18. By definition, every
set in A is a product of subsets of the spaces Ωi which are proper subsets
only over finitely many indices. In particular, given any sequence An, there
are countably many indices in I so that over every other index i ∈ I, the
projection of every set An to the i-th coordinate yields Ωi. Considering the
semi-ring axioms for A and the axioms of being a measure for µ, we can see
that it is enough to consider the case where I is countable. As we already
know that the claim is true if I is finite, we may from now on assume I = N.
The fact that A is a semi-ring now follows directly from the exercises. We
hence need to show that µ is a measure on A, where our goal is to appeal to
the condition in the above lemma. Suppose that An ∈ A is a sequence of
pairwise disjoint sets with Ω = ⋃ n≥1 An.
For each n ≥ 1 let us write An = ∏_{i=1}^{∞} A_{n,i} and pick i_n ≥ 1 such that A_{n,i} = Ωi whenever i > i_n. For all m, n ∈ N and ω = (ωi)i ∈ Am we claim to have the equation

∏_{i≤i_m} χ_{A_{n,i}}(ωi) · ∏_{i>i_m} µi(A_{n,i}) = 1 if n = m, and = 0 if n ≠ m.
Indeed, if n = m, then all the involved factors are equal to one, so the equation holds. Assume n ≠ m. We observe χ_{A_k}((ωi)i) = ∏_{i≤i_k} χ_{A_{k,i}}(ωi). As 1 = ∑_{k=1}^{∞} χ_{A_k}, we conclude for (ωi)i ∈ Am that 0 = ∏_{i≤i_n} χ_{A_{n,i}}(ωi). So either i_n ≤ i_m, in which case the desired equation above follows immediately. Or, if i_n > i_m, then every tuple of the form (ω1, . . . , ω_{i_m}, α_{i_m+1}, . . . ) is also in Am, so analogously

0 = ∏_{i≤i_m} χ_{A_{n,i}}(ωi) · ∏_{i=i_m+1}^{i_n} χ_{A_{n,i}}(αi).
From the claim we deduce that for every ω = (ωi)i ∈ Ω and every k ∈ N, we have

∑_{n=1}^{∞} [ ∏_{i≤k} χ_{A_{n,i}}(ωi) · ∏_{i>k} µi(A_{n,i}) ] = 1.

Integrating this identity in the variables ω1, . . . , ωk with respect to µ1 ⊗ · · · ⊗ µk and applying the Monotone Convergence Theorem, we obtain ∑_{n=1}^{∞} µ(An) = 1, which is the condition in the above lemma. This finishes the proof.
3 Probability
3.1 Probability spaces and random variables
Definition 3.1. A probability space is a measure space (Ω,M, µ) with µ(Ω) = 1. In this context, Ω is called the sample space, the sets in M are called events, and µ(E) is the probability of the event E ∈ M.
For more involved probabilistic questions, the sample space is not necessarily countable, which makes it somewhat more difficult to come up with the
right choices of events and probability measures. The following can be seen
as a model for tossing a coin infinitely often.
Example 3.3 (infinite coin tossing). For each toss, a coin can only come up as
heads or tails, which we conveniently denote as the outcomes 0 and 1. Since
we want to model tossing the coin infinitely often, the outcomes are
sequences having value 0 or 1. This gives rise to the sample space Ω = {0,
1}N .
A rather obvious example of an event is when the first k tosses are equal to
some k-tuple ω ∈ {0, 1}k . The associated subset of Ω is given as Aω = {ω}
× {0, 1}N>k . For example, A1 is the event where the first coin toss comes
up as tails, or A0,1,1 is the event where the first three tosses come up as
heads → tails → tails. Let M be the event σ-algebra generated by all such
sets. We note that a lot of other natural choices for events are automatically in
M. For example, the event that the n-th coin toss comes up as heads is given
by
⋃_{ω∈{0,1}^{n−1}} A_{(ω,0)}.
As for determining the probability, we want all of our coin tosses to be fair,
meaning that there is always an equal chance that either heads or tails comes
up. In particular, the coin tosses should all be independent from each other. If
µ : M → [0, 1] is supposed to be a probability measure modelling this
behavior, then we can agree on µ(A0) = µ(A1) = 1/2. Inductively, we may conclude that for all ω ∈ {0, 1}^n , one should have µ(Aω) = 2^{−n} . In this case, if we view {0, 1} as a discrete space, we may give Ω the product topology, in which case M is the Borel σ-algebra. If µ2 : 2^{{0,1}} → [0, 1] is the measure that assigns the value 1/2 to both singletons, then µ is in fact the infinite product measure µ = ⊗_{n=1}^{∞} µ2.
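A quick Monte Carlo sketch of this model (illustrative, not part of the notes): estimating µ(Aω) for the cylinder ω = (0, 1, 1) should give roughly 2^{−3}.

```python
import random

random.seed(0)

def toss_prefix(k):
    """Simulate the first k fair coin tosses (0 = heads, 1 = tails)."""
    return tuple(random.randint(0, 1) for _ in range(k))

# Estimate mu(A_omega) for the cylinder fixing the first 3 tosses to
# (0, 1, 1); the model predicts 2^{-3} = 0.125.
target = (0, 1, 1)
trials = 100_000
hits = sum(toss_prefix(3) == target for _ in range(trials))
estimate = hits / trials
assert abs(estimate - 0.125) < 0.01
```

Only finitely many coordinates are ever simulated, mirroring the fact that the cylinder sets Aω depend on finitely many tosses.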
Remark 3.5. One sometimes also says “stochastic variable”. In the particular case W = R^n , one calls it a vector random variable, and for W = R, a real
random variable. The case W = RN is referred to as a random sequence. In
the case of a real random variable X, we will also freely play around with
the above notation, for example (|X| ≤ 1) is written instead of (X ∈ [−1, 1]),
etc.
Remark 3.6. From our previous study on measurable maps we can observe
that real random variables are closed under addition and multiplication, and
pointwise limits. General random variables are closed under the same type
of operations under which measurable maps are closed.
E(X) = ∫ Ω X dP.
Notation 3.12. Certain expected values get a special name. Let X be a real
random variable.
3.2 Independence
Definition 3.13. Let (Ω,M,P) be a probability space. Two events A, B ∈ M
are called independent, if P(A ∩ B) = P(A)P(B).
In our model we assume that the individual coin tosses are not supposed to
influence each other, and hence we should certainly view these events as
independent. Indeed, we have here
so by the properties of the product measure µ we can see here that µ(A) = 1/4, µ(B) = 1/4, and µ(A ∩ B) = 1/16. Similarly, all pairs of events defined by the outcomes of coin tosses happening at distinct times will be independent in this model.
lim sup_{n→∞} An = ⋂_{n≥1} ⋃_{k≥n} Ak ,   lim inf_{n→∞} An = ⋃_{n≥1} ⋂_{k≥n} Ak .
Example 3.17. Let us motivate this again from the point of view of our
infinite coin tossing model. Let Hn be the event where the n-th toss
comes up heads. Then the event lim supn→∞ Hn describes the situation
where heads comes up infinitely many times, and the event lim infn→∞ Hn
describes the situation where heads comes up in all but finitely many tosses.
For these reasons, it is not uncommon in a probabilistic context to use the notation

(An infinitely often) := lim sup_{n→∞} An   and   (An eventually) := lim inf_{n→∞} An.

Theorem (Borel–Cantelli). Let (Ω,M,P) be a probability space and An ∈ M a sequence.

i. If ∑_{n=1}^{∞} P(An) < ∞, then P(lim sup_{n→∞} An) = 0.
ii. If the family {An}_{n∈N} is independent and ∑_{n=1}^{∞} P(An) = ∞, then P(lim sup_{n→∞} An) = 1.
Proof. We prove (i) in the exercise sessions, so we only need to prove (ii).
We will use the fact from the exercise sessions that the family of complements {A^c_n}_{n∈N} is also independent. Moreover, we are about to use the
well-known inequality 1 − x ≤ e−x for all x ∈ R. We observe for all n ≥ 1
that
P ( ⋂_{k=n}^{∞} A^c_k ) = ∏_{k=n}^{∞} P(A^c_k) = ∏_{k=n}^{∞} (1 − P(Ak)) ≤ ∏_{k=n}^{∞} e^{−P(Ak)} = exp ( − ∑_{k=n}^{∞} P(Ak) ) = 0.

Therefore

P ( lim sup_{n→∞} An ) = 1 − P ( ⋃_{n≥1} ⋂_{k=n}^{∞} A^c_k ) = 1.
Example 3.20. Let us have yet another look at our infinite coin tossing model
and what we observed in Example 3.14. Let Hn denote the event where the n-
th coin toss comes up as heads. Then the family {Hn}n∈N is independent, each event has probability 1/2, and hence it follows from the above that the event (Hn infinitely often) has probability 1.
MY = {(Y ∈ S) | S ⊆ R Borel} ,
which are both sub-σ-algebras of M. Given any A ∈ MX and B ∈ MY , it
follows by independence that
∫ Ω st dP = ∫ Ω s dP · ∫ Ω t dP.
By the Monotone Convergence Theorem, this even holds when s and t are not
assumed to be simple. Applying this to s = |X| and t = |Y | yields E(|XY |) =
E(|X|)E(|Y |) < ∞, so indeed XY has a mean. Furthermore if we decompose X
= X+ − X− and Y = Y + − Y − , then
E(XY )
∫_Ω f ◦ X dP ≥ f(a) · P(|X| ≥ a).
(Ω,M,P).
P(|X − mX| ≥ a) ≤ (1/a²) Var(X).
Proof. The first part follows from the general Chebyshev inequality for f(x) = |x|^p . The second part follows when applying it to X − mX as the real random variable and the function f(x) = x² .
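An empirical sanity check of the Chebyshev bound (the sampling setup below is invented for illustration):

```python
import random

random.seed(1)

# Sample a real random variable: sum of 10 fair +-1 tosses (mean 0, variance 10).
def sample():
    return sum(random.choice((-1, 1)) for _ in range(10))

n = 50_000
xs = [sample() for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

a = 5.0
tail = sum(abs(x - mean) >= a for x in xs) / n
# Chebyshev: P(|X - m| >= a) <= Var(X) / a^2
assert tail <= var / a**2 + 0.01   # small slack for sampling error
```

Here the bound Var(X)/a² = 10/25 = 0.4 is quite loose; the actual tail frequency is closer to 0.1, which is typical of Chebyshev-type estimates.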
Proof. Since they are identically distributed, it follows that both the mean and
the variance of Xn are the same for every n ≥ 1. We may assume without loss
of generality m = 0. Let us first assume that the variance of Xn is finite and
equal to σ2 for all n ≥ 1. For every n ≥ 1, denote ¯Xn = n 1 ∑n k=1 Xk. We
have that ¯Xn also has mean 0, and hence
Var( ¯Xn) = E( ¯X2 n) = n2E 1 ( j,k=1 n∑ XjXk ) = nσ2 n2 = σ2 n . Here we
used Proposition 3.25 for j = k. By applying Corollary 3.27(ii), we may
hence see P(| ¯Xn| ≥ ε) ≤ E( ε2 ¯X2 n) = nε2 σ2 n→∞ −→ 0 for all ε > 0.
Now for the general case, we allow the possibility that Xn has infinite
variance. Let us still assume m = 0. For all n, N ≥ 1, write Xn = X≤N n +
X>N n , where X≤N n = Xn · χ(|Xn|≤N). Then by the Monotone Convergence
Theorem aN := E(|X>N n |) = E(|Xn|) − E(|X≤N n |) N→∞ −→ 0, where we
are using that these values do not depend on n as the Xn were identically
distributed. Let us also denote
So, given any δ > 0 with δ < 1 and any N ≥ 1 large enough such that aN ≤ δε/2 for all n ≥ 1, it follows that P(|X̄n^{>N}| ≥ ε/2) ≤ δ. Hence
P(| ¯Xn| ≥ ε)
≤ δ + P (| ¯X≤N n | ≥ ε/2 )
n→∞ −→ δ.
Here we have used that Xn^{≤N} has finite variance, so the above falls into
our previous subcase. As δ > 0 was arbitrary, this verifies X̄n →p 0 in
general.
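For fair ±1 coin flips the tail probability in the weak law can be computed exactly, since the partial sum is a shifted binomial variable. A sketch of this illustrative case, compared against the Chebyshev bound σ²/(nε²) from the proof:

```python
from math import comb

# With Xk uniform on {-1, +1} and B the number of +1's among n flips,
# the sample mean is (2B - n)/n with B ~ Binomial(n, 1/2).
def tail(n, eps):
    # P(|sample mean| >= eps), computed exactly
    hits = sum(comb(n, b) for b in range(n + 1) if abs(2 * b - n) >= eps * n)
    return hits / 2 ** n

eps = 0.1
for n in (100, 400, 1600):
    # Chebyshev bound from the proof: sigma^2/(n eps^2), here sigma^2 = 1
    print(n, tail(n, eps), 1 / (n * eps ** 2))
```

The exact tails decay to zero, and much faster than the Chebyshev bound, consistent with the statement of the weak law.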
There is also a stronger law, i.e., a theorem with a stronger conclusion than
the weak law of large numbers. We will prove this strong law under an
additional assumption.
Proof. We first note that applying Hölder’s inequality twice yields E(|Xn|)⁴ ≤
E(Xn²)² ≤ E(Xn⁴) =: M⁴, hence the assumptions on Xn ensure that m
exists.
Considering the summand over the tuple (i, j, k, l), it follows from the
independence of the sequence Xn and Proposition 3.25 that it is zero if there
is one entry in this tuple that is different from the other three. In other
words, the summand can only be non-zero if all four entries agree, or if the
tuple has two different indices occurring exactly twice. Two different entries
can possibly occur in the patterns (k, k, l, l), (k, l, k, l) or (k, l, l, k),
leading to 3n(n − 1) possible summands. Using Proposition 3.25 once again,
the expression hence simplifies to

∑_{k=1}^n E(Xk⁴) + 3 ∑_{k≠l} E(Xk²)E(Xl²).
By the very beginning of the proof, each summand in the bracket can be
estimated above by M⁴, and there are a total of n + 3n(n − 1) ≤ 3n² summands,
hence

E((∑_{k=1}^n Xk)⁴) ≤ 3n²M⁴.
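The count of non-vanishing summands used above (n all-equal tuples plus 3n(n − 1) two-pair tuples) can be confirmed by brute force for small n:

```python
from itertools import product

# Count tuples (i, j, k, l) in {0,...,n-1}^4 in which no entry differs
# from all three others, i.e. every entry occurs at least twice.
def count_surviving(n):
    return sum(
        1
        for t in product(range(n), repeat=4)
        if all(t.count(x) >= 2 for x in t)
    )

for n in (2, 3, 5):
    assert count_surviving(n) == n + 3 * n * (n - 1)
print("count matches n + 3n(n-1) for n = 2, 3, 5")
```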
Definition 3.31. Given a real random variable X, one defines its
characteristic function φX : R → C via φX(t) = E(e^{itX}).
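As a concrete example (not from the notes): for X uniform on {−1, +1} the definition gives φX(t) = (e^{it} + e^{−it})/2 = cos t, which a short computation confirms:

```python
import cmath
import math

# Characteristic function of the fair coin variable X uniform on {-1, +1}:
# phi_X(t) = E(e^{itX}) = (e^{it} + e^{-it}) / 2 = cos(t).
def phi_coin(t):
    return (cmath.exp(1j * t) + cmath.exp(-1j * t)) / 2

for t in (0.0, 0.7, 2.0):
    assert abs(phi_coin(t) - math.cos(t)) < 1e-12
print("phi_X(t) = cos(t) verified at sample points")
```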
For the purpose of the proof, we define the function κ : R → R via

κ(x) = ∫_0^x (sin t)/t dt.

∫_{−T}^{T} (e^{it(x−a)} − e^{it(x−b)})/(it) dt
We will use the following fact from real analysis without proof:11
Theorem 3.33. A monotone function f : R → R can have at most countably
many points of discontinuity.
Corollary 3.34. Two real random variables are identically distributed if and
only if their characteristic functions are equal.
Proof. The “only if” part is clear, so we show the “if” part. Let X and Y be
two real random variables. The distribution function FX(t) = PX((−∞, t]) is
increasing, and therefore by the above theorem, it is discontinuous in at most
countably many points. In analogy to Example 2.27, this means that we may
have PX({t}) ≠ 0 for at most countably many t ∈ R. The same observation is
of course true for Y in place of X. Let W ⊆ R be the subset of all points t
with PX({t}) = 0 = PY ({t}). Then W is co-countable and in particular
dense. It is then an easy exercise to show that the semi-ring
AW = {(a, b] | a, b ∈ W, a < b} ⊆ 2^R
generates the Borel σ-algebra of R.
11 For those interested, see for example “Theorem 4.30” in the book
“Principles of Mathematical Analysis” (third edition) by Walter Rudin.
Pn → w P.
Proof. Let us first assume Pn →w P holds. We shall first show the following
intermediate claim: if A ⊆ R is a closed interval, then lim sup_{n→∞} Pn(A) ≤
P(A). Indeed, we may define a pointwise decreasing sequence of piecewise
linear functions fk : R → R with fk|A = 1 and limk→∞ fk(t) = 0 for all t /∈ A.
Then

Pn(A) ≤ ∫_R fk dPn → ∫_R fk dP as n → ∞, hence lim sup_{n→∞} Pn(A) ≤ ∫_R fk dP for all k ≥ 1.

Letting k → ∞, the Dominated Convergence Theorem gives ∫_R fk dP → P(A), which
proves the claim.
Now let x ∈ C(F ). Then it follows from the intermediate claim that

lim sup_{n→∞} Fn(x) = lim sup_{n→∞} Pn((−∞, x]) ≤ P((−∞, x]) = F(x).

Similarly,

lim inf_{n→∞} Fn(x) = lim inf_{n→∞} (1 − Pn([x, ∞))) ≥ 1 − P([x, ∞)) = F(x),

where the last equality uses P({x}) = 0, as x is a continuity point of F.
Together this yields Fn(x) → F(x).
Conversely, let us assume that Fn(x) → F (x) holds for all x ∈ C(F ). By
definition, we may easily observe Pn((a, b]) → P((a, b]) for all a, b ∈ C(F )
with a < b. Therefore, it follows that
∫ R f dPn → ∫ R f dP
holds whenever f belongs to the linear subspace spanned by all indicator
functions χ(a,b] for a, b ∈ C(F ). Since C(F ) is dense in R, it is easy to see
that the closure of this linear subspace with respect to the sup-norm ∥ · ∥∞
contains the space of continuous compactly supported functions. So (with a
standard ε/2-argument) we may conclude that the above limit behavior even
holds for f being any compactly supported continuous function.
lim_{R→∞} lim inf_{n→∞} (Fn(R) − Fn(−R)) = 1.

Then F is the distribution function for a Borel probability measure on R.

Theorem 3.39 (Helly’s Selection Theorem). For any tight sequence (Pn)n of
Borel probability measures on R, there exists a subsequence (Pnk)k and a
Borel probability measure P on R such that Pnk →w P.
lim_{R→∞} lim inf_{n→∞} (Fn(R) − Fn(−R)) = 1.

Of course this tightness criterion holds for every subsequence as well. By
Theorem 3.37 and the
above remark, it suffices to show that there is some increasing sequence of
natural numbers nk such that Fnk converges pointwise on a dense set of real
numbers, for example on the rational numbers Q. Let N ∋ ℓ → qℓ be an
enumeration of Q. By Bolzano–Weierstrass, we know that there is some
increasing sequence of numbers n(1,k) such that Fn(1,k)(q1) converges as k
→ ∞. Applying Bolzano–Weierstrass again, we know that (n(1, k))k admits a
subsequence (n(2, k))k such that Fn(2,k)(q2) converges as k → ∞. Proceed
like this inductively, and find finer and finer subsequences (n(ℓ, k))k such
that Fn(ℓ,k)(qℓ) converges as k → ∞. Finally define nk = n(k, k). Then (nk)k
is a subsequence of (n(ℓ, k))k (up to the finitely many indices k ≤ ℓ) for every
ℓ ≥ 1, so indeed Fnk(qℓ) converges as k → ∞, for every ℓ ≥ 1. This finishes
the proof.
Lemma 3.40. Let X be a real random variable. Then we have for all R > 0
the estimate
Note that sin(y)/y ≤ 1 holds for all y ∈ R, so that 1 − sin(X/R)/(X/R) ≥ 0
everywhere. If |x| > 2R, then |x/R| > 2, in which case |sin(x/R)/(x/R)| ≤
1/|x/R| < 1/2, hence 1 − sin(x/R)/(x/R) ≥ 1/2. This leads to the inequality
χ(|X|>2R) ≤ 2(1 − sin(X/R)/(X/R)), and forming the expected value on both
sides yields the claim.
Here we also used the Dominated Convergence Theorem. Using the fact that
φX is a continuous function with φX(0) = 1, we see that the integral
(R/4) ∫_{−2/R}^{2/R} (1 − φX(t)) dt goes to zero as R → ∞. Hence the above
inequality yields

lim_{R→∞} lim inf_{n→∞} PXn([−R, R]) = 1,
We can also proceed by induction to show the upper bound |x|^{n+1}/(n+1)!.
One can first consider n = −1, in which case both the left and the right side
equal 1. One can then perform the induction step (n − 1) → n with a completely
analogous calculation as above, namely

|e^{ix} − ∑_{k=0}^n (ix)^k/k!| = |∫_0^x i(e^{it} − ∑_{k=0}^{n−1} (it)^k/k!) dt| ≤ ∫_0^{|x|} t^n/n! dt = |x|^{n+1}/(n+1)!.
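The bound |e^{ix} − ∑_{k=0}^{n} (ix)^k/k!| ≤ |x|^{n+1}/(n+1)! is easy to probe numerically; a small sketch with illustrative values of x and n:

```python
import cmath
import math

# Remainder of the degree-n Taylor polynomial of e^{ix} at 0.
def remainder(x, n):
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)

for x in (0.5, 2.0, 5.0):
    for n in (0, 1, 2, 5, 8):
        bound = abs(x) ** (n + 1) / math.factorial(n + 1)
        assert remainder(x, n) <= bound + 1e-9
print("Taylor remainder bound holds at all tested (x, n)")
```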
Lemma 3.43. Let X be a real random variable such that E(X²) = 1 and E(X)
= 0. Then one has the limit formula

lim_{n→∞} φX(t/√n)ⁿ = e^{−t²/2},  t ∈ R.
In particular, since X has a second absolute moment, we see that the random
variable on the left has a mean. Moreover, the expected value of the random
variable Xt := min(X², t|X|³/6) tends to zero as t → 0 as a consequence
of the Dominated Convergence Theorem. So by integrating and applying the
triangle inequality, we see
xn := n(φX(t/√n) − (1 − t²/(2n))) → 0 as n → ∞.

Using the elementary limit e^x = lim_{n→∞} (1 + x/n)ⁿ, which also holds in
the form (1 + yn/n)ⁿ → e^y whenever yn → y, we compute

φX(t/√n)ⁿ = (1 + (φX(t/√n) − 1))ⁿ = (1 + (−t²/2 + xn)/n)ⁿ → e^{−t²/2} as n → ∞.
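For the coin variable X uniform on {−1, +1} (so E(X) = 0 and E(X²) = 1) we have φX(t) = cos t, and the lemma predicts cos(t/√n)ⁿ → e^{−t²/2}. A quick numerical check of this illustrative case:

```python
import math

# Check cos(t/sqrt(n))^n -> e^{-t^2/2} for the fair-coin characteristic
# function phi_X(t) = cos(t); t = 1.3 is an arbitrary sample point.
t = 1.3
target = math.exp(-t * t / 2)
approx = None
for n in (10, 1_000, 100_000):
    approx = math.cos(t / math.sqrt(n)) ** n
    print(n, approx, target)  # the gap shrinks as n grows
```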
Theorem 3.44 (Central Limit Theorem). Let Xn be an independent sequence
of identically distributed real random variables with mean E(Xn) = 0 and
variance E(Xn²) = 1. Consider the standard normal variable N given by the
distribution

P((a, b]) = (1/√(2π)) ∫_a^b e^{−t²/2} dt.

Then

(1/√n) ∑_{k=1}^n Xk →d N.
Proof. Since the Xn are identically distributed, they all have the same
characteristic function, which we shall denote by φX. Denote Yn = n^{−1/2}
∑_{k=1}^n Xk, so that we wish to show Yn →d N. Using that the Xn are
independent, we observe for all t ∈ R that

φYn(t) = E(e^{itYn}) = ∏_{k=1}^n E(e^{itXk/√n}) = φX(t/√n)ⁿ.
Given that the standard normal variable has the characteristic function
φN(t) = e^{−t²/2}, the proof is complete by combining Lemma 3.43 and
Theorem 3.41.
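The theorem can be checked exactly for coin flips (an illustrative case): the distribution function of (1/√n) ∑ Xk is a rescaled binomial and should be close to the normal distribution function Φ.

```python
from math import comb, erf, sqrt

# For Xk uniform on {-1, +1}, Sn = 2B - n with B ~ Binomial(n, 1/2), so the
# distribution function of Sn/sqrt(n) can be computed exactly and compared
# with Phi(x) = (1 + erf(x/sqrt(2)))/2.
def F_n(n, x):
    cutoff = (n + x * sqrt(n)) / 2  # Sn/sqrt(n) <= x  <=>  B <= cutoff
    hits = sum(comb(n, b) for b in range(n + 1) if b <= cutoff)
    return hits / 2 ** n

def Phi(x):
    return (1 + erf(x / sqrt(2))) / 2

n = 2500
for x in (-1.0, 0.0, 1.0):
    print(x, F_n(n, x), Phi(x))  # the columns agree to about 1/sqrt(n)
```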
(1/√n) ∑_{k=1}^n Xk →d N(0, σ),

where N(0, σ) is the real random variable given by the distribution

P((a, b]) = (1/(σ√(2π))) ∫_a^b e^{−t²/(2σ²)} dt.