
Lecture notes Probability and Measure

Gábor Szabó KU Leuven – G0P63a

Academic Year 2023–2024, Semester 1

Contents

1 Lebesgue's Integration Theory
  1.1 σ-Algebras and Measurable Spaces
  1.2 Measurable functions
  1.3 Measures on σ-algebras, Measure Spaces
  1.4 Integration theory
  1.5 Convergence Theorems

2 Carathéodory's Construction of Measures
  2.1 Measures on Semi-Rings and Rings
  2.2 Outer measures
  2.3 Application: The Lebesgue measure on R
  2.4 Application: Lebesgue–Stieltjes Measures
  2.5 Product measures
  2.6 Infinite products of probability measures

3 Probability
  3.1 Probability spaces and random variables
  3.2 Independence
  3.3 Law of Large Numbers
  3.4 Central Limit Theorem

1 Lebesgue’s Integration Theory


1.1 σ-Algebras and Measurable Spaces
Definition 1.1. Let Ω be a non-empty set. A σ-algebra M on Ω is a set of
subsets of Ω satisfying
A1. ∅ ∈ M.
A2. If E ∈ M, then E^c ∈ M.
A3. For any sequence En ∈ M, one has ⋃_{n∈N} En ∈ M.

We will refer to the pair (Ω, M) as a measurable space.
Remark 1.2. A σ-algebra M is closed with respect to all countable operations
on sets one can perform using complement, union, intersection, and
difference. In fact, any intersection and difference can be rewritten in terms
of unions and complements, namely

E ∩ F = (Ec ∪ F c)c ,

E \ F = (Ec ∪ F )c .

Example 1.3. The entire power set 2Ω of Ω is the largest possible σ-algebra
on Ω, whereas {∅, Ω} is the smallest one. We may also consider

M = {E ⊆ Ω | E or Ec is countable} ,

which sits strictly in between if Ω is an uncountably infinite set.

Proposition 1.4. If {Mi}i∈I is an arbitrary family of σ-algebras on a set Ω,


then the intersection ⋂ i∈I Mi is a σ-algebra. Consequently, if S ⊆ 2Ω is

a family of subsets of Ω, then there exists a smallest σ-algebra M containing


S. We will refer to it as the σ-algebra generated by S.

Proof. It is a trivial exercise to prove the first part of the statement. For the
second part, if S is given, consider the family { M ⊆ 2Ω | M is a σ-algebra
with S ⊆ M } . Since this family contains 2Ω , it is non-empty, and hence the
intersection of this family is the smallest possible σ-algebra that contains S.

Example 1.5. Recall that a topology T on a set Ω is a family of subsets of Ω


satisfying

T1. ∅, Ω ∈ T .

T2. For any family of sets O ⊆ T , one has ⋃ O ∈ T .
T3. For O1, O2 ∈ T , one has O1 ∩ O2 ∈ T .

If we are given a topological space (Ω, T ), then the σ-algebra generated by


T is called the Borel-σ-algebra of (Ω, T ). Its elements are called the Borel
subsets of Ω.

Remark 1.6. Given a topology T as above, a base B ⊆ T is a subset with the


property that every set O ∈ T can be expressed as a union of a family of sets
in B. If T has a countable base B, then every σ-algebra M containing B will
necessarily also contain T . It follows that the Borel-σ-algebra generated by
T coincides with the σ-algebra generated by B.

Proposition 1.7. Let (Ω, d) be a separable metric space. Then Ω has a
countable base consisting of open balls, i.e., sets of the form

B(x, r) = {y ∈ Ω | d(x, y) < r}

for x ∈ Ω and r > 0. In particular, the Borel-σ-algebra on Ω coincides with
the σ-algebra generated by all open balls.

Proof. In light of the previous remark, the second part of the statement
follows automatically if we can show that the metric topology on Ω has a
countable base consisting of open balls. As we assumed Ω to be separable,
choose a countable dense set D ⊆ Ω, and consider

B = {B(x, r) | x ∈ D, 0 < r ∈ Q} .

This is a countable family of open balls, and we claim that it is a base for the
metric topology. Indeed, let O ⊆ Ω be an open set. For x ∈ D ∩ O, set Rx =
{0 < r ∈ Q | B(x, r) ⊆ O}. Then evidently

⋃ {B(x, r) | x ∈ D ∩ O, r ∈ Rx} ⊆ O.

We claim that this is an equality: Given y ∈ O, it follows from the definition


of openness that there exists δ > 0 with B(y, δ) ⊆ O. By making δ smaller, if
necessary, we may assume δ ∈ Q. Since D is a dense subset of Ω, there exists
x ∈ D with d(x, y) < δ/2. By the triangle inequality, we have y ∈ B(x, δ/2) ⊆
B(y, δ) ⊆ O. In summary, we have shown that O is the countable union of sets
in B, which finishes the proof.
Example 1.8. Let us consider Ω = R with the standard topology. Then half-
open intervals are Borel because, for example, for a, b ∈ R with a < b, one
has

[a, b) = ⋂_{n∈N} (a − 1/n, b).

Singleton sets are also Borel since they are closed. To summarize, there are
in general many more Borel sets than open sets. Note that in light of the
previous remark, the Borel-σ-algebra on R is generated by all open intervals.
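As a quick illustration outside the notes proper (the function name below is ours), one can observe numerically why the identity above holds: the finite intersection ⋂_{n=1}^N (a − 1/n, b) is exactly the interval (a − 1/N, b), which shrinks to [a, b) as N grows.

```python
# Membership in the finite intersection of (a - 1/n, b) for n = 1..N.
# Since the intervals are nested, this equals membership in (a - 1/N, b).
def in_finite_intersection(x, a, b, N):
    return all(a - 1.0 / n < x < b for n in range(1, N + 1))

a, b = 0.0, 1.0
# The left endpoint a lies in every finite intersection, hence in the limit [a, b):
assert all(in_finite_intersection(a, a, b, N) for N in (1, 10, 1000))
# A point slightly below a survives small N but is eventually excluded:
x = a - 0.01
assert in_finite_intersection(x, a, b, 10)
assert not in_finite_intersection(x, a, b, 1000)
```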

1.2 Measurable functions


Definition 1.9. Let (Ω1,M1) and (Ω2,M2) be two measurable spaces. A map
f : Ω1 → Ω2 is called measurable, if for every subset E ⊆ Ω2, E ∈ M2
implies f−1(E) ∈ M1.

Remark 1.10. The composition of two measurable maps is always measurable.

Proposition 1.11. Let (Ω1,M1) and (Ω2,M2) be two measurable spaces and f
: Ω1 → Ω2 a map. Suppose that M2 is the σ-algebra generated by a set S ⊆
2Ω2 .

Then f is measurable if and only if for all O ∈ S, one has

f−1(O) ∈ M1.

Proof. The “only if” part is trivial, so let us consider the “if part”. Consider
the set

M = { O ⊆ Ω2 | f−1(O) ∈ M1 } .

By assumption, we have S ⊆ M. The claim amounts to showing that M2 ⊆ M,
and by assumption on S, it therefore suffices to show that M is a σ-algebra.
But this will be part of the exercise sessions.
Proposition 1.12. Let (Ω,M) be a measurable space and (Y, T ) a topological
space. For a map f : Ω → Y , the following are equivalent:

i. f is measurable with respect to the Borel-σ-algebra on Y .


ii. For all O ∈ T , one has f−1(O) ∈ M.

If furthermore there exists a countable base B ⊆ T , then this is equivalent to

(iii) For all O ∈ B, one has f−1(O) ∈ M.

Proof. By definition of the Borel-σ-algebra as the one generated by T , the


equivalence (i)⇔(ii) becomes a special case of Proposition 1.11. If we
furthermore assume that B is a countable base of T , then the equivalence
(i)⇔(iii) is also a special case of Proposition 1.11 in light of Remark 1.6.

Theorem 1.13. Let (Ω,M) be a measurable space and (Y, d) a metric space.
We equip Y with the Borel-σ-algebra associated to the metric topology. Suppose
that a sequence of measurable functions fn : Ω → Y converges to a map
f : Ω → Y pointwise. Then f is measurable.

Proof. As a consequence of Proposition 1.12, it suffices to show that the
preimages of open sets under f belong to M. Since preimages respect
complements of sets, it suffices to show that the preimage of every closed
set C ⊆ Y belongs to M. Since we have a metric space, we have

C = C̄ = ⋂_{k∈N} C_k,  where C_k := { y ∈ Y | inf_{x∈C} d(x, y) < 1/k }.

Then each of the sets Ck is open. For every x ∈ Ω, we use fn(x) → f(x) and
observe

x ∈ f^{-1}(C)
⇔ f(x) ∈ C
⇔ ∀ k ∈ N : f(x) ∈ C_k
⇔ ∀ k ∈ N : ∃ n0 ∈ N : ∀ n ≥ n0 : fn(x) ∈ C_k   (using fn(x) → f(x))
⇔ x ∈ ⋂_{k∈N} ⋃_{n0∈N} ⋂_{n≥n0} f_n^{-1}(C_k).

In summary, the preimage f−1(C) can be realized as a countable intersection


of countable unions of countable intersections of sets in M, and hence also
belongs to M.

Notation 1.14. We will equip [0, ∞] := [0, ∞) ∪ {∞} with the topology of the
one point compactification, that is, we define a subset O ⊆ [0, ∞] to be open
when the following holds:

if 0 ∈ O, then there exists ε > 0 with [0, ε) ⊆ O.


for every x ∈ O ∩ (0, ∞), there exists ε > 0 with (x − ε, x + ε) ⊆ O.
if ∞ ∈ O, then there is b ≥ 0 with (b, ∞] ⊆ O.

This topology is induced by a metric, for example by

d(x, y) = | 1/(1 + x) − 1/(1 + y) |,  where we follow the convention 1/∞ := 0.


We extend the usual addition and

multiplication from [0, ∞) to [0, ∞] by defining1

x + ∞ := ∞,

x · ∞ := ∞ (x > 0),

0 · ∞ := 0.

Then the addition map + : [0, ∞] × [0, ∞] → [0, ∞] is continuous, but this is
not true for the multiplication map. 2 We also extend the usual order relation
“≤” of numbers to [0, ∞] in the obvious way.
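As an illustration outside the notes proper (our own sketch, with hypothetical helper names), the conventions above can be mirrored in code. In particular, 0 · ∞ := 0 must be imposed by hand, since IEEE float arithmetic returns NaN there, and the last lines hint at why the multiplication map fails to be continuous.

```python
import math

INF = math.inf

def ext_mul(x, y):
    # Extended multiplication on [0, ∞] with the convention 0 · ∞ := 0.
    if x == 0 or y == 0:
        return 0.0
    return x * y

assert 3.0 + INF == INF          # x + ∞ = ∞
assert ext_mul(2.0, INF) == INF  # x · ∞ = ∞ for x > 0
assert ext_mul(0.0, INF) == 0.0  # the convention 0 · ∞ := 0
assert math.isnan(0.0 * INF)     # plain float arithmetic disagrees

# Multiplication is not continuous: (1/n, n) → (0, ∞), yet the products
# stay at 1 instead of tending to ext_mul(0, ∞) = 0.
assert all(ext_mul(1.0 / n, float(n)) == 1.0 for n in (10, 100, 1000))
```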
1.3 Measures on σ-algebras, Measure Spaces
Definition 1.15. Let (Ω,M) be a measurable space. A (positive) measure µ
on (Ω,M) is a map M → [0, ∞] satisfying:

M1. µ(∅) = 0.
M2. µ is σ-additive, i.e., for every sequence En ∈ M consisting of pairwise
disjoint sets, one has

µ(⋃_{n∈N} En) = ∑_{n∈N} µ(En).

The triple (Ω,M, µ) is called a measure space. If µ(Ω) < ∞, we call µ a finite
measure, and if more specifically µ(Ω) = 1, we call it a probability measure
and the triple (Ω,M, µ) a probability space. If there exists a sequence En ∈ M
with µ(En) < ∞ and Ω = ⋃ n∈N En, then we say that µ is σ-finite.

Remark 1.16. Note that

• The series ∑_{n∈N} µ(En) is a series in [0, ∞], which we may define as
the supremum sup_{N≥1} ∑_{n=1}^N µ(En).

• σ-additivity implies finite additivity: If E1, E2 ∈ M are two disjoint
sets, then

µ(E1 ∪ E2) = µ(E1 ∪ E2 ∪ ∅ ∪ ∅ ∪ . . . ) = µ(E1) + µ(E2) + µ(∅) + µ(∅) + . . . = µ(E1) + µ(E2),

using µ(∅) = 0.
1Keep in mind that we implicitly force everything to be commutative, so the
order of addition and multiplication does not matter here by default.

2Convince yourself why this is not the case!

• If there is at least some E ∈ M with µ(E) < ∞, then σ-additivity already


implies µ(∅) = 0:

µ(E) = µ(E ∪ ∅ ∪ ∅ ∪ . . . ) = µ(E) + ∞ · µ(∅).

If µ(E) < ∞, then the above can only happen when µ(∅) = 0.

Example 1.17 (Counting measure). For any non-empty set Ω, we can consider
the σ-algebra 2Ω and define a measure µ via

µ(E) = #E if E is finite, and µ(E) = ∞ if E is infinite.

Example 1.18 (Dirac measure). If (Ω,M) is any measurable space with a
distinguished point x ∈ Ω, one can define the measure δx via

δx(E) = 1 if x ∈ E, and δx(E) = 0 if x ∉ E.
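Both examples are easy to model on a finite Ω. The sketch below is our own toy illustration, not part of the notes; sets are represented as Python sets.

```python
# Toy realizations of the counting measure and the Dirac measure on the
# power set of a finite Ω.
def counting_measure(E):
    # On a finite Ω every subset is finite; for an infinite E one would
    # return math.inf instead.
    return len(E)

def dirac(x):
    def delta_x(E):
        return 1 if x in E else 0
    return delta_x

delta_2 = dirac(2)
assert counting_measure({1, 3}) == 2
assert delta_2({1, 2}) == 1 and delta_2({3, 4}) == 0
# additivity on the disjoint pieces {1, 2} and {3, 4}:
E, F = {1, 2}, {3, 4}
assert counting_measure(E | F) == counting_measure(E) + counting_measure(F)
assert delta_2(E | F) == delta_2(E) + delta_2(F)
```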

Example 1.19. Consider R with its Borel-σ-algebra B. We will later construct
the Lebesgue measure on B, which is the unique measure µ : B → [0, ∞]
with the property that µ([a, b]) = b − a for all a, b ∈ R with a ≤ b. An
analogous unique measure exists on Rn as well.

Proposition 1.20. Let (Ω,M, µ) be a measure space. Then µ has the fol
lowing extra properties:

i. µ is monotone, i.e., for E, F ∈ M with E ⊆ F one has µ(E) ≤ µ(F ). If


µ(F ) < ∞, then µ(E) = µ(F ) − µ(F \ E).
ii. µ is subadditive, i.e., for E, F ∈ M one has µ(E ∪ F ) ≤ µ(E) + µ(F ).
iii. µ is σ-subadditive, i.e., for a sequence En ∈ M one has µ(⋃ n∈N En) ≤
∑ n∈N µ(En).
iv. If En ∈ M is an increasing sequence (w.r.t. “⊆”), then µ(⋃ n∈N En) =
supn∈N µ(En).
v. If En ∈ M is a decreasing sequence and µ(E1) < ∞, then µ(⋂ n∈N En) =
infn∈N µ(En).

Proof. (i): If E ⊆ F , then we may write F = E ∪ (F \ E) as a disjoint union,


so it follows from additivity that µ(F ) = µ(E) + µ(F \ E) ≥ µ(E). If µ(F ) < ∞,
we also obtain the last part of the claim by subtracting µ(F \ E).

(ii): One has E ∪ F = (E \ F ) ∪ F , which is a disjoint union. Thus µ(E ∪ F )


= µ(E \ F ) + µ(F ) ≤ µ(E) + µ(F ).

(iii)+(iv): We construct a sequence Fn ∈ M as follows. We set F1 = E1 and


Fn = En\(⋃ k<n Ek) for n ≥ 2. Then the sequence Fn consists of pairwise
disjoint sets, but at the same time one has that ⋃ k≤n Ek = ⋃ k≤n Fk for all n
≥ 1. Hence µ(⋃ n∈N En) = µ(⋃ n∈N Fn) = ∑ n∈N µ(Fn) ≤ ∑ n∈N µ(En),
which proves (iii). For (iv), additionally assume that En was increasing.
Then

µ(⋃_{n∈N} En) = ∑_{n∈N} µ(Fn) = sup_{n∈N} ∑_{k≤n} µ(Fk) = sup_{n∈N} µ(⋃_{k≤n} Fk) = sup_{n∈N} µ(En).

(v): Consider Fn = E1 \ En for all n ≥ 1. As En was assumed to be


decreasing, the sets Fn will be increasing, and moreover ⋃ n∈N Fn = E1 \
( ⋂ n∈N En ) .
By using (i) and (iv), we see

µ(E1) − inf_{n∈N} µ(En) = sup_{n∈N} µ(Fn) = µ(⋃_{n∈N} Fn) = µ(E1) − µ(⋂_{n∈N} En).

This implies the claim.

Remark 1.21. In condition (v) above, it is really
necessary to assume that at least one of the sets En has finite measure. For
example, we may consider Ω = N with the counting measure µ, and the sets
En = {k ∈ N | k ≥ n}. Then µ(En) = ∞ for all n, the sequence is decreasing,
and ⋂ n∈N En = ∅.
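The counterexample can be traced through concretely. In the sketch below (our own illustration, not part of the notes) the infinite sets E_n are represented by membership predicates rather than stored explicitly.

```python
import math

# E_n = {k in N : k >= n} under the counting measure: each E_n is infinite,
# the sequence is decreasing, but no k belongs to every E_n.
def in_E(n, k):
    return k >= n

# inf_n mu(E_n) is the infimum of infinitely many copies of infinity:
assert min(math.inf for n in range(1, 10)) == math.inf
# the sequence is decreasing: E_{n+1} is contained in E_n
assert all(not (in_E(n + 1, k) and not in_E(n, k))
           for n in range(1, 50) for k in range(100))
# k fails to lie in E_{k+1}, so the intersection of all E_n is empty:
assert all(not in_E(k + 1, k) for k in range(100))
# hence mu(intersection) = mu(empty set) = 0, while inf_n mu(E_n) = infinity
```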

We omit the proof of the following statement, which is a very simple


exercise:

Proposition 1.22. Let (Ω,M) be a measurable space with two measures µ1,
µ2. For any numbers α1, α2 ≥ 0, the map

α1µ1 + α2µ2 : M → [0, ∞],  E ↦ α1µ1(E) + α2µ2(E)

is a measure.

Definition 1.23. Let (Ω,M, µ) be a measure space. A subset N ⊆ Ω is called


a null set if there is some E ∈ M with N ⊆ E and µ(E) = 0.
If P (x) is a statement about elements x ∈ Ω that can either be true or false,
we say that P holds almost everywhere (w.r.t. µ) if the set of elements x ∈ Ω
for which P (x) is false is a null set.

Notation 1.24. For brevity, we may also say “P holds a.e.” (a.e.=almost
everywhere) or “P (x) holds for (µ-)almost all x”.

1.4 Integration theory


Notation 1.25. For a set Ω with a subset E ⊆ Ω, its indicator function (or
characteristic function) is defined via

χE : Ω → {0, 1},  χE(x) = 1 if x ∈ E, and χE(x) = 0 if x ∉ E.

Definition 1.26. Let Ω be a set. A simple function (or step function) on Ω is a


function with finite range. In particular, a simple function s : Ω → Y with Y
∈ {[0, ∞), [0, ∞],R,C} can always be written as

s = ∑_{λ∈s(Ω)} λ · χ_{s^{-1}(λ)}.

We will refer to this as the canonical form for s.

Proposition 1.27. Let (Ω,M) be a measurable space. Let s : Ω → Y be a


simple function, and assume that Y is equipped with a σ-algebra that contains
all singleton sets. (In particular this is the case for Y ∈ {[0, ∞), [0, ∞],R,C}.)
Then s is measurable if and only if s−1(λ) ∈ M for all λ ∈ s(Ω).

Proof. The “only if” part is clear. Since we assumed the σ-algebra on Y to
contain all singleton sets, it follows in particular that s−1(y) ∈ M for all y ∈
Y.
For the “if” part, let A be a measurable subset of Y . Then s−1(A) = ⋃
λ∈s(Ω) s−1(A ∩ {λ}). By assumption this is a finite union of sets of the form
s−1(λ) for λ ∈ s(Ω), and hence if these are measurable, then so is s−1(A).

Definition 1.28. Let (Ω,M, µ) be a measure space. Let s : Ω → [0, ∞) be a


measurable simple function with canonical form s = ∑ λ∈s(Ω) λ · χs−1(λ).
We define the integral of s over Ω with respect to µ as

∫_Ω s dµ := ∑_{λ∈s(Ω)} λ · µ(s^{-1}(λ)).

(Keep in mind the convention 0 · ∞ = 0.)
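For a finite Ω this formula is directly computable. The sketch below is our own code (the measure is a hypothetical dict of point masses, not notation from the notes); it groups points by their value λ, which recovers the canonical form.

```python
from collections import defaultdict

def integral_simple(s, mu):
    # mu: dict mapping each point x of a finite Omega to its mass mu({x}).
    # Group the masses by the value lam = s(x); this computes mu(s^{-1}(lam)).
    mass_of_level = defaultdict(float)
    for x, weight in mu.items():
        mass_of_level[s(x)] += weight
    # Definition 1.28: sum of lam * mu(s^{-1}(lam)) over lam in s(Omega).
    return sum(lam * m for lam, m in mass_of_level.items())

mu = {"a": 0.5, "b": 0.25, "c": 0.25}    # a probability measure
s = lambda x: 2.0 if x == "a" else 1.0   # simple function with values {1, 2}
assert integral_simple(s, mu) == 2.0 * 0.5 + 1.0 * 0.5  # = 1.5
```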

Lemma 1.29. Let (Ω,M, µ) be a measure space. Given k ≥ 0, numbers
α1, . . . , αk ≥ 0, and sets A1, . . . , Ak ∈ M, we consider the positive measurable
step function s = ∑_{j=1}^k αj · χ_{Aj}.

i. Suppose that A1, . . . , Ak are pairwise disjoint. Then ∫_Ω s dµ = ∑_{j=1}^k αj µ(Aj).
ii. If t : Ω → [0, ∞) is another positive measurable step function, then
∫_Ω (s + t) dµ = ∫_Ω s dµ + ∫_Ω t dµ.
iii. One has ∫_Ω s dµ = ∑_{j=1}^k αj µ(Aj) (even without assuming disjointness).

Proof. (i): If the union ⋃k j=1 Aj is not Ω, then we may set Ak+1 = Ω\(⋃k
j=1 Aj) and αk+1 = 0. Then we still have s = ∑k+1 j=1 αj ·χAj and the sets
A1, . . . , Ak+1 are pairwise disjoint. Clearly our claim boils down to the
same claim for this expression of s, so we may simply assume without loss of
generality that we had Ω = ⊔k j=1 Aj to begin with.

Due to this assumption, we can notice that every value λ ∈ s(Ω) has to equal
one of the coefficients αj, and conversely every such coefficient with Aj ≠ ∅
is in s(Ω). Upon grouping all such indices together, we can observe

s^{-1}(λ) = ⊔{Aj | 1 ≤ j ≤ k, αj = λ},  λ ∈ s(Ω).

Using the additivity of the measure µ, we thus obtain

∫_Ω s dµ = ∑_{λ∈s(Ω)} λ µ(s^{-1}(λ)) = ∑_{λ∈s(Ω)} λ · ∑_{j : αj = λ} µ(Aj) = ∑_{j=1}^k αj µ(Aj).

(ii): We keep the earlier assumption that Ω = ⊔_{j=1}^k Aj. We can write
t = ∑_{i=1}^ℓ βi χ_{Bi} for finitely many coefficients β1, . . . , βℓ ≥ 0 and sets
B1, . . . , Bℓ ∈ M with Ω = ⊔_{i=1}^ℓ Bi, for instance via the canonical form of t.
Then we can write

s = ∑_{j=1}^k αj · ∑_{i=1}^ℓ χ_{Aj ∩ Bi},   t = ∑_{i=1}^ℓ βi · ∑_{j=1}^k χ_{Aj ∩ Bi}.

This gives rise to the expression

s + t = ∑_{j=1}^k ∑_{i=1}^ℓ (αj + βi) χ_{Aj ∩ Bi},

where Ω = ⊔_{i=1}^ℓ ⊔_{j=1}^k (Aj ∩ Bi). Using part (i) and the additivity of µ,
we obtain

∫_Ω (s + t) dµ = ∑_{j=1}^k ∑_{i=1}^ℓ (αj + βi) µ(Aj ∩ Bi)
= ∑_{j=1}^k αj ∑_{i=1}^ℓ µ(Aj ∩ Bi) + ∑_{i=1}^ℓ βi ∑_{j=1}^k µ(Aj ∩ Bi)
= ∑_{j=1}^k αj µ(Aj) + ∑_{i=1}^ℓ βi µ(Bi)
= ∫_Ω s dµ + ∫_Ω t dµ.

(iii): We keep in mind that every summand αjχAj can itself be understood as
a positive measurable step function whose integral is equal to αjµ(Aj) by
part (i). Since we have already proved in part (ii) that the integral
construction on such functions is additive, this yields the desired sum
expression.

Proposition 1.30. Let (Ω,M, µ) be a measure space. Then the assignment
s ↦ ∫_Ω s dµ, which assigns to every measurable step function Ω → [0, ∞) its
integral, satisfies the following properties:

i. ∫ Ω (c · s) dµ = c · ∫ Ω s dµ for all constants c ≥ 0.


ii. If s ≤ t, then ∫ Ω s dµ ≤ ∫ Ω t dµ.

Proof. We note that (i) is an immediate consequence of Lemma 1.29.

For (ii), suppose that s and t are given as

s = ∑_{j=1}^k αj χ_{Aj},   t = ∑_{i=1}^ℓ βi χ_{Bi}

with Ω = ⊔_{j=1}^k Aj = ⊔_{i=1}^ℓ Bi, for example via their canonical forms.
Then we can also write

s = ∑_{j=1}^k αj · ∑_{i=1}^ℓ χ_{Aj ∩ Bi},   t = ∑_{i=1}^ℓ βi · ∑_{j=1}^k χ_{Aj ∩ Bi}.

Now clearly s ≤ t implies that αj ≤ βi whenever Aj ∩ Bi ≠ ∅. Hence it
follows from Lemma 1.29 applied to the above equalities that ∫_Ω s dµ ≤ ∫_Ω t dµ.

Proposition 1.31. Let (Ω,M, µ) be a measure space and s : Ω → [0, ∞) a
measurable simple function. Then the map ν : M → [0, ∞], E ↦ ∫_Ω s χ_E dµ
defines a measure.

Proof. Since positive linear combinations of measures are again measures, it
suffices by Lemma 1.29 and Proposition 1.30 to consider the case where s is
of the form s = χA for A ∈ M. In this case ν(E) = ∫_Ω χA χE dµ =
∫_Ω χ_{A∩E} dµ = µ(A ∩ E). Indeed it is clear that ν(∅) = 0. For σ-additivity,
let En ∈ M be a sequence of pairwise disjoint sets, and observe

ν(⋃_{n∈N} En) = µ(A ∩ ⋃_{n∈N} En) = µ(⋃_{n∈N} (A ∩ En)) = ∑_{n∈N} µ(A ∩ En) = ∑_{n∈N} ν(En).
Definition 1.32. Let (Ω,M, µ) be a measure space. For a measurable function
f : Ω → [0, ∞], we define its integral as

∫_Ω f dµ = sup { ∫_Ω s dµ | s is a positive measurable step function with s ≤ f }.

We call f integrable when ∫_Ω f dµ < ∞.

Remark 1.33. We should first convince ourselves that the new definition of
the integral does not contradict the old one in the case where f is assumed to
be a step function. But indeed, s = f is the largest step function with the
property that s ≤ f, so by Proposition 1.30(ii) the above supremum is indeed
attained at the value ∫_Ω s dµ in the sense of Definition 1.28.

Secondly, we remark that the above definition a priori makes sense even
when f is not assumed to be measurable. However, we will see in the
exercises that the resulting notion of integral will have some undesirable
properties when evaluated on non-measurable functions, for example not
being additive.

Example 1.34. If δx is the Dirac measure associated to a point x ∈ Ω in a


measurable space, then ∫ Ω f dδx = f(x).

If µ is the counting measure on an infinite set Ω, then ∫ Ω f dµ = ∑ x∈Ω f(x).3


If µ is the (yet to be constructed) Lebesgue measure and f : [a, b] → [0, ∞) is
a continuous function, then f is measurable and integrable, and its integral in
the sense of Definition 1.32 will coincide with its Riemann integral.

3Note that this sum might be over an uncountable index set!

Proposition 1.35. Let (Ω,M) be a measurable space and f : Ω → [0, ∞] a
positive function. Then f is measurable if and only if there exists a (pointwise)
increasing sequence of positive measurable step functions sn : Ω →
[0, ∞) such that f = sup_{n∈N} sn = lim_{n→∞} sn.⁴

Proof. We already know that the “if” part is true. For the “only if” part,
assume that f is measurable. We claim that it suffices to consider the special
case f = id as a function [0, ∞] → [0, ∞]. Indeed, if we can realize id =
supn∈N tn for an increasing sequence of positive measurable step functions
tn, then f = id ◦f = n→∞ lim tn ◦ f = sup n∈N (tn ◦ f), where sn = tn ◦ f is an
increasing sequence of positive measurable step functions. But we can come
up with the following sequence tn, which is easily seen to do the trick:

tn = n · χ_{[n,∞]} + ∑_{k=1}^{n·2^n} (k−1)/2^n · χ_{[(k−1)/2^n, k/2^n)}.
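The dyadic sequence tn can be written down and tested numerically. The sketch below is our own illustration (the function name t is ours); t(n, x) evaluates tn at x.

```python
import math

def t(n, x):
    # The dyadic step function t_n from the proof: value n on [n, infinity],
    # and (k-1)/2^n on [(k-1)/2^n, k/2^n) for k = 1, ..., n * 2^n.
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n

x = math.pi
vals = [t(n, x) for n in range(1, 12)]
assert all(a <= b for a, b in zip(vals, vals[1:]))  # pointwise increasing in n
assert abs(t(11, x) - x) <= 2**-11                  # within 2^-n once x <= n
assert t(2, 10.0) == 2.0                            # truncated at level n
```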

Proposition 1.36. Let (Ω,M, µ) be a measure space and f : Ω → [0, ∞] a


positive measurable function. Suppose that sn is an increasing sequence of
positive measurable step functions such that f = supn∈N sn. Then ∫ Ω f dµ =

supn∈N ∫ Ω sn dµ.

Proof. Since we have by assumption sn ≤ f for all n, it follows from the


definition of the integral that supn∈N ∫ Ω sn dµ ≤ ∫ Ω f dµ.

For the reverse inequality, it suffices to prove that c · ∫ Ω s dµ ≤ supn∈N ∫ Ω


sn dµ for every positive measurable step function s ≤ f and all constants 0 < c
< 1. For a fixed such function s and constant c, we consider the sequence of
sets

En = {x ∈ Ω | sn(x) ≥ cs(x)} . Since sn and s are measurable, it follows that


En ∈ M. Since sn converges to f pointwise and s ≤ f , we can see that Ω = ⋃
n∈N En. As in Proposition 1.31, we define a new measure ν via ν(E) = ∫ Ω
sχE dµ. Then it follows from

4In exactly one instance later, we will use an extra property that we arrange
in our proof. Namely, we can find (sn)n with |f(x) − sn(x)| ≤ 2−n for all x ∈ Ω
with f(x) ≤ n, and sn(x) = n whenever f(x) ≥ n.
Proposition 1.20(iv) that

c · ∫_Ω s dµ = c · ν(Ω) = c · sup_{n∈N} ν(En) = sup_{n∈N} ∫_Ω c s χ_{En} dµ ≤ sup_{n∈N} ∫_Ω sn χ_{En} dµ ≤ sup_{n∈N} ∫_Ω sn dµ.

Proposition 1.37. Let (Ω,M, µ) be a measure space. Let f, g : Ω → [0, ∞]
be measurable functions. Then

i. ∫_Ω (c · f) dµ = c · ∫_Ω f dµ for all constants c ≥ 0.
ii. ∫_Ω (f + g) dµ = ∫_Ω f dµ + ∫_Ω g dµ.
iii. If f ≤ g, then ∫_Ω f dµ ≤ ∫_Ω g dµ.

Proof. Both (i) and (iii) are trivial consequences of the definition of the
integral. For (ii), take sequences sn and tn of positive measurable step
functions that increase pointwise to f and g, respectively. Then sn + tn
increases pointwise to f + g. Hence it follows from Proposition 1.36 that

∫_Ω (f + g) dµ = sup_{n∈N} ∫_Ω (sn + tn) dµ = (sup_{n∈N} ∫_Ω sn dµ) + (sup_{n∈N} ∫_Ω tn dµ) = ∫_Ω f dµ + ∫_Ω g dµ.

Here we have used that these suprema are in fact limits.


Proposition 1.38. Let (Ω,M, µ) be a measure space and f : Ω → [0, ∞] a
measurable function. Then one has ∫_Ω f dµ = 0 if and only if f(x) = 0 for
µ-almost all x ∈ Ω. Furthermore, if f is integrable, then f(x) < ∞ for
µ-almost all x ∈ Ω.

Proof. Suppose ∫_Ω f dµ = 0. Consider the measurable set En = f^{-1}((1/n, ∞]),
and observe that by definition (1/n) χ_{En} ≤ f. Hence (1/n) µ(En) ≤ ∫_Ω f dµ = 0,
so µ(En) = 0 for all n. The union of the En is equal to E = f^{-1}((0, ∞]).

By continuity of measures, we have µ(E) = supn∈N µ(En) = 0, so indeed f(x)


= 0 for µ-almost all x ∈ Ω.

For the converse, assume that f(x) = 0 for µ-almost all x ∈ Ω, which means
that the set E above is a null set. If s : Ω → [0, ∞) is a simple measurable
function with s ≤ f, then it means in particular that s^{-1}(λ) ⊆ E for all λ ≠ 0.
This immediately implies ∫_Ω s dµ = 0, but hence also ∫_Ω f dµ = 0 by
definition.

Now assume that f is integrable. Consider E = f−1(∞) ∈ M and sn = nχE.


Then sn ≤ f for all n ∈ N, and hence nµ(E) = ∫ Ω sn dµ ≤ ∫ Ω f dµ. Since this
holds for all n ≥ 1 and we assume the right side to be finite, this is only
possible for µ(E) = 0. But this is what it means that f(x) < ∞ for µ-almost all
x ∈ Ω. This finishes the proof.

Remark 1.39. Let Ω be a set and f : Ω → C a function. Then it is always


possible to decompose f into f = u + i · v for real-valued functions u, v : Ω
→ R, the real and imaginary parts of f . By defining u± = max(0, ±u) and v±
= max(0, ±v), we can always uniquely write

u = u+ − u− , v = v+ − v− , u+u− = 0, v+v− = 0,

where u+ , u− , v+ , v− take values in [0, ∞). We then have u± , v± ≤ |f |, but


also |f | ≤ u+ + u− + v+ + v− by the triangle inequality.
Furthermore, if Ω carries a σ-algebra for which f becomes measurable, then
the functions u+ , u− , v+ , v− are also measurable.

In what follows we wish to exploit such decompositions to extend the


integral construction to the case of complex valued functions.

Definition 1.40. Let (Ω,M, µ) be a measure space. We say that a measurable
function f : Ω → C is integrable, if |f| is integrable in the sense of Definition
1.32. In this case we define the integral as

∫ Ω f dµ = ∫ Ω u+ dµ − ∫ Ω u− dµ + i ( ∫ Ω v+ dµ − ∫ Ω v− dµ ) ,

where we use the decomposition from Remark 1.39. The set of all such
functions is denoted L1(Ω,M, µ) or L1(µ).

Theorem 1.41. L1(Ω,M, µ) is a complex vector space with the usual operations.
Moreover the integral

L1(Ω,M, µ) → C,  f ↦ ∫_Ω f dµ

is a linear map.

Proof. The fact that L1 is a vector space is left as an exercise. Let us proceed
to prove linearity of the integral.

Let us first assume that f, g ∈ L1 are real-valued. Set h = f + g and observe


that hence h+ −h− = f+ −f−+g+ −g− , or equivalently h++f−+g− = h− + f+ +
g+ . All of these summands are positive integrable functions, so it follows by
Proposition 1.37 that

∫ Ω h+ dµ + ∫ Ω f− dµ + ∫ Ω g− dµ = ∫ Ω h− dµ + ∫ Ω f+ dµ + ∫ Ω g+ dµ.

All of these summands are finite, and hence we can rearrange this equation to

∫ Ω h+ dµ − ∫ Ω h− dµ = ∫ Ω f+ dµ − ∫ Ω f− dµ + ∫ Ω g+ dµ − ∫ Ω g− dµ,

and hence ∫ Ω h dµ = ∫ Ω f dµ + ∫ Ω g dµ. It is also clear from the definition
that ∫ Ω f + ig dµ = ∫ Ω f dµ + i ∫ Ω g dµ. These two equations imply together
that the integral is indeed additive.

Consider a scalar α > 0. Then (αf)± = α · f±, so from Proposition 1.37 it
follows that

∫_Ω αf dµ = ∫_Ω αf+ dµ − ∫_Ω αf− dµ = α ∫_Ω f dµ.

If α < 0, then (αf)± = (−α)f∓, so a similar calculation shows

∫_Ω αf dµ = ∫_Ω (−αf−) dµ − ∫_Ω (−αf+) dµ = α ∫_Ω f dµ.

If more generally α = α1 + iα2 ∈ C for α1, α2 ∈ R, then we use the above to
calculate

∫_Ω α(f + ig) dµ = ∫_Ω (α1 f − α2 g) dµ + i ∫_Ω (α1 g + α2 f) dµ
= α1 ∫_Ω f dµ − α2 ∫_Ω g dµ + i ( α1 ∫_Ω g dµ + α2 ∫_Ω f dµ )
= α ( ∫_Ω f dµ + i ∫_Ω g dµ )
= α ∫_Ω (f + ig) dµ.

This finishes the proof.

Proposition 1.42. For all f ∈ L1(Ω,M, µ), one has

∣∣∣∣ ∫ Ω f dµ ∣∣∣∣ ≤ ∫ Ω |f | dµ.


Proof. By considering the polar decomposition of ∫ Ω f dµ and multiplying f
with a scalar of modulus one, we may assume without loss of generality that ∫
Ω f dµ ≥ 0. Write f = u + iv for real-valued functions u, v. Then

0 ≤ ∫_Ω f dµ = ∫_Ω u dµ ≤ ∫_Ω u+ dµ ≤ ∫_Ω |f| dµ,  using u+ ≤ |f|.
Proposition 1.43. Let (Ω,M, µ) be a measure space and f : Ω → C a
measurable function.

i. If f = 0 a.e., then f is integrable and ∫ Ω f dµ = 0.


ii. If f is integrable and g : Ω → C is another measurable function such that
f = g a.e., then g is integrable and ∫ Ω f dµ = ∫ Ω g dµ.
iii. If f is integrable and ∫ Ω f · χE dµ = 0 for all E ∈ M, then f = 0 a.e.

Proof. (i): By the previous proposition, it suffices to show ∫ Ω |f | dµ = 0, but


this is a consequence of Proposition 1.38.

(ii): The assumption means that g −f = 0 a.e., so g −f is integrable and has


vanishing integral. Hence g = f + (g − f) is integrable, and according to
additivity of the integral it has the same integral as f .

(iii): Consider E = {x ∈ Ω | Ref(x) ≥ 0}. Decompose f = u+ − u− + i(v+ −


v−) as before, and observe 0 = Re ( ∫_Ω f χ_E dµ ) = ∫_Ω Re(f χ_E) dµ = ∫_Ω u+ dµ.
By Proposition 1.38 it follows that u+ = 0 a.e. Repeating this argument
with different choices for E, one can also see that u−, v+, v− vanish almost
everywhere. But this clearly implies that f = 0 a.e.

Proposition 1.44. Let (Ω,M, µ) be a σ-finite measure space and f : Ω → C an


integrable function. Let F ⊆ C be a closed set with the property that for all E
∈ M with 0 < µ(E) < ∞, one has

(1/µ(E)) ∫_Ω f χ_E dµ ∈ F.

Then f(x) ∈ F for µ-almost all x ∈ Ω.

Proof. As µ is σ-finite, we can write Ω = ⋃ En as an increasing union of sets


En ∈ M with µ(En) < ∞. If the set of all x ∈ En with f(x) /∈ F is a null set for
all n ≥ 1, then by σ-subadditivity the same is true for the set of all x ∈ Ω with
f(x) /∈ F . So by restricting the problem to each En in place of Ω, we may
assume without loss of generality that µ is a finite measure.

If F = C, there is nothing to prove, so let us assume that F has non-empty
complement. Let B ⊂ C \ F be a closed ball around some point z with radius
r > 0. Set g = f χ_{f^{-1}(B)} and write g = z χ_{f^{-1}(B)} + g0, where
|g0| ≤ r χ_{f^{-1}(B)}. Then either µ(f^{-1}(B)) = 0, or

| z − (1/µ(f^{-1}(B))) ∫_Ω g dµ | = | (1/µ(f^{-1}(B))) ∫_Ω g0 dµ | ≤ r.

The latter would imply that the number (1/µ(f^{-1}(B))) ∫_Ω g dµ is in B, which
is disjoint from F, a contradiction. Hence we conclude µ(f^{-1}(B)) = 0.

Since we can write C \ F as a countable union of closed balls, this implies


µ(f−1(C \ F )) = 0, which confirms the claim.

Proposition 1.45. Let (Ω,M, µ) be a measure space.


Consider
N = {f : Ω → C | f is measurable and f(x) = 0 for µ-almost all x} .

Then N is a linear subspace of L1(Ω,M, µ), and f ∈ N holds if and only if f is


measurable and |f | ∈ N .

Proof. From Proposition 1.43 we get that indeed N ⊆ L1(Ω,M, µ), and the
second equivalence of the statement is clear.

Let f, g ∈ N be given. First of all, it is clear that α · f ∈ N for all α ∈ C. So


we shall show f + g ∈ N . We do have that this function is measurable, so it
remains to show that it vanishes almost everywhere. Indeed, we have (f +g)
−1(C\{0}) ⊆ f−1(C\{0})∪g−1(C\{0}), which by assumption implies µ((f +
g)−1(C \ {0})) = 0, or that indeed f(x) + g(x) = 0 holds for µ-almost all x.

Definition 1.46. Let (Ω,M, µ) be a measure space. Then in light of the above,
we define the quotient vector space

L1(µ) = L1(Ω,M, µ) = L1(Ω,M, µ)/N.

For a function f ∈ L1, we may for the moment denote its coset by [f] = f + N,
but in time we will transition to denoting it simply by f, even though this is
some abuse of notation.⁵ From the previous results above it follows that the
assignment

L1(Ω,M, µ) → C,  [f] ↦ ∫_Ω f dµ

is a well-defined linear map, which we will call the integral. Since f ∈ N
happens precisely when |f| ∈ N, it makes sense to define |[f]| = [|f|].
Consequently, the semi-norm ∥ · ∥1 on L1 given by

∥f∥1 = ∫_Ω |f| dµ,  ∥[f]∥1 := ∥f∥1,

becomes a norm on the L1-space (cf. Proposition 1.38).

1.5 Convergence Theorems

Theorem 1.47 (Monotone Convergence Theorem). Let (Ω,M, µ) be a measure
space and fn : Ω → [0, ∞] a (pointwise) increasing sequence of measurable
functions. Then f = sup_{n≥1} fn is also measurable, and we have

∫_Ω f dµ = sup_{n≥1} ∫_Ω fn dµ.

Proof. By Proposition 1.35, for every n ≥ 1, we find an increasing sequence
of positive measurable step functions sn,k : Ω → [0, ∞) such that
fn = sup_{k≥1} sn,k. Considering the remark in the footnote, we may assume that
|fn(x) − sn,k(x)| ≤ 2^{−k} whenever fn(x) ≤ k, and sn,k(x) = k whenever fn(x) ≥ k.

Since (sn,k)k is increasing, the sequence of positive measurable step functions
tn = max_{j≤n} sj,n is increasing. Since fn is increasing, it follows that tn ≤
fn for all n. We claim f = supn≥1 tn. Indeed, let x ∈ Ω be given. If f(x) = ∞,
then fn(x) → ∞, so tn(x) ≥ sn,n(x) ≥ min(n, fn(x) − 2−n) → ∞. If f(x) < ∞,
then in particular f(x) ≤ n for all large enough n, and hence
|f(x) − tn(x)|

≤ |f(x) − sn,n(x)| ≤ |fn(x) − f(x)| + |fn(x) − sn,n(x)| ≤ |fn(x) − f(x)| + 2−n → 0.

We appeal to Proposition 1.36 and see that

∫ Ω f dµ = sup n≥1 ∫ Ω tn dµ tn≤fn ≤ sup n≥1 ∫ Ω fn dµ.

The “≥” relation is on the other hand clear from the fact that the integral is
monotone. This finishes the proof.

^5 But this is justified by the fact that we will mostly form integrals, which do not depend on the representative of such a coset.
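The Monotone Convergence Theorem can be sanity-checked numerically. The following sketch (illustrative only and not part of the original notes; the function, the truncation levels, and the midpoint discretization are ad hoc choices) takes f(x) = x^{−1/2} on (0, 1], whose truncations f_n = min(f, n) increase pointwise to f; one computes ∫_0^1 min(n, x^{−1/2}) dx = 2 − 1/n exactly, so the integrals increase to ∫ f dλ = 2.

```python
def riemann_midpoint(h, n_intervals=100_000):
    """Approximate the Lebesgue integral of h over (0, 1] by a midpoint sum."""
    dx = 1.0 / n_intervals
    return sum(h((i + 0.5) * dx) for i in range(n_intervals)) * dx

# f(x) = x^(-1/2) has integral 2 over (0, 1]; the truncations f_n = min(f, n)
# increase pointwise to f, and their integrals are exactly 2 - 1/n.
f = lambda x: x ** -0.5
integrals = [riemann_midpoint(lambda x, n=n: min(f(x), n)) for n in range(1, 9)]
# The sequence of integrals increases to sup_n ∫ f_n dλ = 2 = ∫ f dλ.
```

The increasing, unbounded integrand is exactly the situation where the theorem (rather than uniform convergence) is needed.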

Lemma 1.48 (Fatou). Let (Ω, M, µ) be a measure space and f_n : Ω → [0, ∞] a sequence of measurable functions. Then lim inf_{n→∞} f_n is also measurable, and we have

∫ Ω (lim inf_{n→∞} f_n) dµ ≤ lim inf_{n→∞} ∫ Ω f_n dµ.

Proof. Denote g_k = inf_{n≥k} f_n, and recall that lim inf_{n→∞} f_n = sup_{k≥1} g_k. We thus see that lim inf_{n→∞} f_n is a measurable function. If n ≥ k, then evidently g_k ≤ f_n, so ∫ Ω g_k dµ ≤ ∫ Ω f_n dµ. In particular, this is true for arbitrarily large n, so ∫ Ω g_k dµ ≤ lim inf_{n→∞} ∫ Ω f_n dµ. Using the Monotone Convergence Theorem, we thus see

∫ Ω (lim inf_{n→∞} f_n) dµ = sup_{k≥1} ∫ Ω g_k dµ ≤ lim inf_{n→∞} ∫ Ω f_n dµ.

Theorem 1.49 (Dominated Convergence Theorem). Let (Ω, M, µ) be a measure space and f_n ∈ L^1(Ω, M, µ) a sequence converging pointwise to a function f. Suppose that there exists a positive integrable function g : Ω → [0, ∞) such that |f_n| ≤ g for all n. Then f ∈ L^1(Ω, M, µ), and

lim_{n→∞} ∫ Ω |f − f_n| dµ = 0,   ∫ Ω f dµ = lim_{n→∞} ∫ Ω f_n dµ.

Proof. We have that f is measurable (Theorem 1.13) and |f| ≤ g, hence f ∈ L^1(Ω, M, µ). Furthermore, we have |f − f_n| ≤ 2g, and so we may apply Fatou's Lemma to the sequence 2g − |f − f_n| to get

∫ Ω 2g dµ ≤ lim inf_{n→∞} ∫ Ω (2g − |f − f_n|) dµ = ∫ Ω 2g dµ − lim sup_{n→∞} ∫ Ω |f − f_n| dµ,

where the lim sup term is ≥ 0. Since g was assumed to be integrable, this is equivalent to

0 = lim sup_{n→∞} ∫ Ω |f − f_n| dµ = lim_{n→∞} ∫ Ω |f − f_n| dµ.

As a consequence we obtain

| ∫ Ω f dµ − ∫ Ω f_n dµ | = | ∫ Ω (f − f_n) dµ | ≤ ∫ Ω |f − f_n| dµ → 0 as n → ∞.

This finishes the proof.
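A minimal worked instance of the theorem (again an ad hoc illustration, not part of the notes): on ([0, 1], λ), take f_n(x) = x^n, dominated by the integrable constant g = 1 and converging to 0 on the conull set [0, 1). The integrals 1/(n + 1) are known in closed form and indeed tend to ∫ 0 dλ = 0.

```python
from fractions import Fraction

# On ([0, 1], λ) take f_n(x) = x^n.  Then f_n → 0 pointwise on [0, 1),
# |f_n| ≤ 1, and the constant 1 is integrable, so the theorem applies.
def integral_xn(n):
    return Fraction(1, n + 1)       # ∫_0^1 x^n dx = 1/(n+1), exactly

vals = [integral_xn(n) for n in range(1, 200)]
# ∥f_n − 0∥₁ = 1/(n+1) → 0, matching both conclusions of the theorem.
```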

Remark 1.50. For practical applications of the Dominated Convergence Theorem, it is useful to observe that it is only necessary to assume that the pointwise convergence f_n(x) → f(x) and the inequality |f_n(x)| ≤ g(x) hold for µ-almost all x ∈ Ω. This is a consequence of Proposition 1.43, as we may just re-define all functions f_n on a common measurable null set to have value zero and thus enforce these statements to hold at all points, yet the integrals in the conclusion of the statement remain the same.

Remark 1.51. The Dominated Convergence Theorem really only holds for
sequences of functions, and its analogous generalizations for more general
families of functions (such as nets) are false. A counterexample is discussed
in the exercise sessions.
Theorem 1.52. Let (Ω, M, µ) be a measure space. Suppose that a sequence f_n ∈ L^1(Ω, M, µ) satisfies the Cauchy criterion in the semi-norm ∥·∥_1. Then there exists a subsequence (f_{n_k})_k and a function f ∈ L^1(Ω, M, µ) such that

f_{n_k}(x) → f(x) as k → ∞, for µ-almost all x ∈ Ω,

and moreover ∥f − f_n∥_1 → 0 as n → ∞. In particular, it follows that L^1(Ω, M, µ) is a Banach space with respect to the norm ∥ · ∥_1.

Proof. By applying the Cauchy criterion inductively, we can find a subsequence (f_{n_k})_k such that ∥f_{n_k} − f_{n_{k+1}}∥_1 ≤ 2^{−k}. We define the measurable function g : Ω → [0, ∞] as g(x) = Σ_{k=1}^∞ |f_{n_k}(x) − f_{n_{k+1}}(x)|. Then it follows from the Monotone Convergence Theorem that

∫ Ω g dµ = Σ_{k=1}^∞ ∫ Ω |f_{n_k} − f_{n_{k+1}}| dµ = Σ_{k=1}^∞ ∥f_{n_k} − f_{n_{k+1}}∥_1 ≤ Σ_{k=1}^∞ 2^{−k} = 1.

In particular g is integrable, and by Proposition 1.38, the set E = g^{−1}([0, ∞)) has a complement of zero measure. For all x ∈ E, the series Σ_{k=1}^∞ [f_{n_{k+1}}(x) − f_{n_k}(x)] converges absolutely by definition, and hence the function

f(x) := f_{n_1}(x) + Σ_{k=1}^∞ [f_{n_{k+1}}(x) − f_{n_k}(x)] = lim_{k→∞} f_{n_k}(x),   x ∈ E,

is well defined and measurable on E. We extend f to a measurable function on Ω by defining it to be zero on the complement of E. We get by the triangle inequality that for all k ≥ 1, the function f_{n_k} is dominated (on E) by the integrable function |f_{n_1}| + g. Therefore it follows from the Dominated Convergence Theorem that ∥f − f_{n_k}∥_1 → 0 as k → ∞. Since the sequence (f_n)_n was assumed to satisfy the Cauchy criterion in ∥ · ∥_1 and we just showed that a subsequence converges to f in this norm, it follows that also ∥f − f_n∥_1 → 0.^6 This finishes the proof.
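The passage to a subsequence in the theorem above is genuinely necessary. The classical "typewriter sequence" illustrates this; the sketch below (an illustration with an ad hoc enumeration, not part of the notes) enumerates the dyadic intervals [j·2^{−k}, (j+1)·2^{−k}) and takes f_n to be the indicator of the n-th interval. Then ∥f_n∥_1 = 2^{−k} → 0, so (f_n) is ∥·∥_1-Cauchy with limit 0, yet f_n(x) fails to converge at every x ∈ [0, 1), since for each k the point x lies in exactly one interval of level k. The subsequence f_{2^k} = χ_{[0, 2^{−k})} does converge to 0 at every x > 0.

```python
def interval(n):
    """n-th dyadic interval [j·2^-k, (j+1)·2^-k) in the typewriter enumeration, n >= 1."""
    k = n.bit_length() - 1          # n = 2^k + j with 0 <= j < 2^k
    j = n - (1 << k)
    return j / 2**k, (j + 1) / 2**k

def f(n, x):
    lo, hi = interval(n)
    return 1 if lo <= x < hi else 0

norms = [interval(n)[1] - interval(n)[0] for n in range(1, 64)]   # ∥f_n∥₁ = 2^-k
ones = [n for n in range(1, 64) if f(n, 0.3) == 1]   # f_n(0.3) = 1 once per level k
subseq = [f(1 << k, 0.3) for k in range(1, 7)]       # f_{2^k} = χ_{[0, 2^-k)} → 0
```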

Proposition 1.53. Let (Ω_1, M_1, µ_1) be a measure space, Ω_2 a non-empty set, and φ : Ω_1 → Ω_2 a map. Then

φ_∗M_1 := { E ⊆ Ω_2 | φ^{−1}(E) ∈ M_1 }

is a σ-algebra, and

φ_∗µ_1 : φ_∗M_1 → [0, ∞],   φ_∗µ_1(E) = µ_1(φ^{−1}(E))

is a measure. These are called the push-forward σ-algebra and the push-forward measure with respect to φ. With respect to this measure space structure on Ω_2, φ becomes a measurable map.

Proof. We have already seen in the exercise sessions that φ_∗M_1 is a σ-algebra. For the measure condition, we observe that φ_∗µ_1(∅) = 0 is trivial. Moreover, if E_n ∈ φ_∗M_1 is a sequence of disjoint sets, then φ^{−1}(E_n) ∈ M_1 is a sequence of disjoint sets, and hence

φ_∗µ_1(⋃ E_n) = µ_1(⋃ φ^{−1}(E_n)) = Σ µ_1(φ^{−1}(E_n)) = Σ φ_∗µ_1(E_n).

The last part of the statement is trivial.

Theorem 1.54. Let (Ω_1, M_1, µ_1) be a measure space, Ω_2 a non-empty set, and φ : Ω_1 → Ω_2 a map. We define M_2 = φ_∗M_1 and µ_2 = φ_∗µ_1 in the above sense.

(i) If f : Ω_2 → [0, ∞] is a measurable function, then

∫ Ω_2 f dµ_2 = ∫ Ω_1 f ◦ φ dµ_1.

(ii) If f : Ω_2 → C is a µ_2-integrable function, then f ◦ φ is µ_1-integrable, and we have

∫ Ω_2 f dµ_2 = ∫ Ω_1 f ◦ φ dµ_1.

^6 As an exercise, fill in this detail yourself! This is a standard ε/2-argument, which is very similar to how one proves the completeness of R from the axiom that bounded sets have suprema.

Proof. (i): By definition, a set E ⊆ Ω_2 belongs to M_2 precisely when φ^{−1}(E) ∈ M_1. Since χ_E ◦ φ = χ_{φ^{−1}(E)}, we have

∫ Ω_2 χ_E dµ_2 = µ_2(E) = µ_1(φ^{−1}(E)) = ∫ Ω_1 χ_E ◦ φ dµ_1.

In other words, the claim holds for functions of the form f = χ_E. By linearity of the integral, the desired equation holds for all positive measurable step functions in place of f. Now let f be as general as in the statement, and write f = sup_{n∈N} s_n for an increasing sequence of positive measurable step functions s_n : Ω_2 → [0, ∞), using Proposition 1.35. Clearly we also have f ◦ φ = sup_{n∈N} s_n ◦ φ. Then by Proposition 1.36, we see that

∫ Ω_2 f dµ_2 = sup_{n∈N} ∫ Ω_2 s_n dµ_2 = sup_{n∈N} ∫ Ω_1 s_n ◦ φ dµ_1 = ∫ Ω_1 f ◦ φ dµ_1.

(ii): By definition, f being µ_2-integrable means that ∫ Ω_2 |f| dµ_2 < ∞, which by part (i) implies ∫ Ω_1 |f ◦ φ| dµ_1 < ∞. In other words, f ◦ φ is µ_1-integrable. The desired equality thus follows directly from part (i), the linearity of the integral, and the fact that we may write f as a linear combination of integrable positive functions.

Remark 1.55. In the above theorem, we pushed forward the measure space
structure from Ω1 to get one on Ω2 which makes the statement of the theorem
true. It may of course happen that we have an a priori given measure space
(Ω2,M2, µ2) and a measurable map φ : Ω1 → Ω2 with the property that
µ1(φ−1(E)) = µ2(E) for all E ∈ M2. Convince yourself that the statement of
the theorem will still be true!
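On a finite measure space, both the push-forward construction and the change-of-variables formula of Theorem 1.54 reduce to bookkeeping with weights, which can be checked directly. The sketch below (all weights, the map φ, and the function f are ad hoc choices for illustration; not part of the notes) verifies φ_∗µ_1(E) = µ_1(φ^{−1}(E)) pointwise and the identity ∫ f dφ_∗µ_1 = ∫ f ◦ φ dµ_1.

```python
from collections import Counter

# A finite measure space Ω1 = {0,...,5} given by point masses, and a map φ to
# Ω2 = {'a','b','c'}.  The push-forward satisfies µ2({y}) = µ1(φ^{-1}({y})).
mu1 = {0: 0.5, 1: 1.0, 2: 0.25, 3: 0.25, 4: 2.0, 5: 1.0}
phi = {0: 'a', 1: 'a', 2: 'b', 3: 'b', 4: 'c', 5: 'a'}

pushforward = Counter()
for x, w in mu1.items():
    pushforward[phi[x]] += w        # accumulate µ1 over each fibre φ^{-1}({y})

# Change of variables (Theorem 1.54): ∫ f dµ2 = ∫ f∘φ dµ1.
f = {'a': 3.0, 'b': -1.0, 'c': 10.0}
lhs = sum(f[y] * m for y, m in pushforward.items())
rhs = sum(f[phi[x]] * w for x, w in mu1.items())
```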
2 Carathéodory’s Construction of Measures
2.1 Measures on Semi-Rings and Rings
Definition 2.1. Let Ω be a non-empty set. A semi-ring on Ω is a set of subsets
A ⊆ 2Ω such that

1. ∅ ∈ A.
2. If E, F ∈ A, then E ∩ F ∈ A.
3. If E, F ∈ A, then E \ F is a finite disjoint union of sets in A.

Example 2.2. The set A of all subsets of Ω with at most one element forms a semi-ring. More interestingly, if n ≥ 2, then the set of half-open cubes

A = {(a_1, b_1] × · · · × (a_n, b_n] | a_j, b_j ∈ R, a_j ≤ b_j}

is also a semi-ring on R^n. (This will be justified later.)

Definition 2.3. Let Ω be a non-empty set. A ring over Ω is a set of subsets R ⊆ 2^Ω such that

1. ∅ ∈ R.
2. If E, F ∈ R, then E ∪ F ∈ R.
3. If E, F ∈ R, then E \ F ∈ R.

Remark 2.4. One always has E ∩ F = E \ (E \ F), so every ring is closed under forming intersections. In particular, it follows that every ring is also a semi-ring. The converse is of course not true; e.g., the set of (left) half-open intervals in R is a semi-ring, but not a ring.

Lemma 2.5. Let A be a semi-ring over Ω. Then the smallest ring R over Ω containing A is given as

R = {E_1 ∪ · · · ∪ E_n | E_j ∈ A are pairwise disjoint}.

We will refer to it as the ring generated by A.

Proof. If R is given as above, then evidently every ring containing A will also contain R. Hence the claim will follow once we show that R is indeed a ring. It is first of all clear that ∅ ∈ R. Let E, F ∈ R. We shall first show E \ F ∈ R. By definition, we find pairwise disjoint sets E_1, . . . , E_n ∈ A and pairwise disjoint sets F_1, . . . , F_m ∈ A such that E = ⋃_{j=1}^n E_j and F = ⋃_{ℓ=1}^m F_ℓ. Then

E \ F = ⋃_{j=1}^n ( E_j \ ⋃_{ℓ=1}^m F_ℓ ) = ⋃_{j=1}^n [ (((E_j \ F_1) \ F_2) \ · · · ) \ F_m ].

Since A was assumed to be a semi-ring, the set E_j \ F_1 is a finite union of pairwise disjoint sets in A. Once we know this, it follows again by the semi-ring property that (E_j \ F_1) \ F_2 is a finite union of pairwise disjoint sets in A. By induction, it follows that each expression of the form (((E_j \ F_1) \ F_2) \ · · · ) \ F_m is a finite union of disjoint sets in A. These unions are in turn pairwise disjoint in j, and hence it follows that E \ F ∈ R.

From this we also get E ∪ F ∈ R, because one can write it as a disjoint union E ∪ F = E ∪ (F \ E), and it is clear that R is closed under disjoint unions of its elements.

Definition 2.6. Let A be a semi-ring on Ω. A measure on A is a map µ : A → [0, ∞] such that

i. µ(∅) = 0.

ii. If E_n ∈ A is a sequence of pairwise disjoint sets with ⋃_{n≥1} E_n ∈ A, then µ(⋃_{n≥1} E_n) = Σ_{n≥1} µ(E_n). (σ-additivity)

In particular, if R is a ring on Ω, then a measure on the ring R is defined as a measure on R viewed as a semi-ring.

Proposition 2.7. Let A be a semi-ring over Ω, and let R be the ring generated by A. Then every measure µ_0 on A extends uniquely to a measure µ on R.

Proof. By the definition of R, it is clear that if µ exists, then the σ-additivity of µ implies that µ is uniquely determined by µ_0. So let us argue why µ exists in the first place. Let E ∈ R, and write it as E = E_1 ∪ · · · ∪ E_n with pairwise disjoint sets E_j ∈ A. Define

µ(E) := µ_0(E_1) + · · · + µ_0(E_n).

We have to show that µ is well-defined, and that it is σ-additive.
Suppose that F_1, . . . , F_m ∈ A is another collection of pairwise disjoint sets with E = F_1 ∪ · · · ∪ F_m. Then one has for all j ∈ {1, . . . , n} that

E_j = E_j ∩ E = ⋃_{ℓ=1}^m (E_j ∩ F_ℓ),

and this union is a disjoint union. Analogously we have F_ℓ = ⋃_{j=1}^n (E_j ∩ F_ℓ) for all ℓ ∈ {1, . . . , m}. By using the additivity of µ_0, it follows that

Σ_{j=1}^n µ_0(E_j) = Σ_{j=1}^n Σ_{ℓ=1}^m µ_0(E_j ∩ F_ℓ) = Σ_{ℓ=1}^m Σ_{j=1}^n µ_0(E_j ∩ F_ℓ) = Σ_{ℓ=1}^m µ_0(F_ℓ).

So we see that µ is a well-defined map.

Now let us see why µ is σ-additive. Suppose that E_n ∈ R is a sequence of pairwise disjoint sets with E = ⋃_{n≥1} E_n ∈ R. Write E = A_1 ∪ · · · ∪ A_m for pairwise disjoint sets A_1, . . . , A_m ∈ A, and moreover write E_n = A_{n,1} ∪ · · · ∪ A_{n,m_n} with A_{n,k} ∈ A, for all n ≥ 1 and some m_n ≥ 1. Then applying σ-additivity of µ_0 several times, we can see

µ(E) = Σ_{j=1}^m µ_0(A_j)
 = Σ_{j=1}^m µ_0( ⋃_{n≥1} (A_j ∩ E_n) )
 = Σ_{j=1}^m µ_0( ⋃_{n≥1} ⋃_{k≤m_n} (A_j ∩ A_{n,k}) )
 = Σ_{j=1}^m Σ_{n=1}^∞ Σ_{k=1}^{m_n} µ_0(A_j ∩ A_{n,k})
 = Σ_{n=1}^∞ Σ_{j=1}^m Σ_{k=1}^{m_n} µ_0(A_j ∩ A_{n,k}) = Σ_{n=1}^∞ µ(E_n).

2.2 Outer measures

Definition 2.8. Let Ω be a non-empty set. An outer measure on Ω is a map ν : 2^Ω → [0, ∞] satisfying:

OM1. ν(∅) = 0.

OM2. If E ⊆ F ⊆ Ω, then ν(E) ≤ ν(F).

OM3. If E_n ⊆ Ω is any sequence of pairwise disjoint sets, then ν( ⋃_{n≥1} E_n ) ≤ Σ_{n=1}^∞ ν(E_n). (σ-subadditivity)

Remark 2.9. WARNING! The terminology is a bit confusing. Contrary to the


usual rules of the English language, an “outer measure” in the above sense
does not describe a measure that has an additional “outerness” property.
Instead, it describes a weaker concept than that of a measure.

Proposition 2.10. Let Ω be a non-empty set, R a ring on Ω, and µ a measure on R. Define for every subset E ⊆ Ω the value

µ^∗(E) = inf { Σ_{n=1}^∞ µ(E_n) | E_n ∈ R is a sequence with E ⊆ ⋃_{n≥1} E_n }.^7

Then µ^∗ defines an outer measure on Ω.

^7 Keep in mind that by convention, inf ∅ := ∞. For certain sets E there may not exist any sequence E_n with these properties.
Proof. One gets µ^∗(∅) = 0 by choosing E_n = ∅. If E ⊆ F ⊆ Ω are two subsets, then evidently there are at least as many ways to cover E by sequences in R as there are for F, which leads directly to µ^∗(E) ≤ µ^∗(F).

Now let A_n ⊆ Ω be a sequence of (pairwise disjoint) sets. We shall show that µ^∗(⋃_{n≥1} A_n) ≤ Σ_{n=1}^∞ µ^∗(A_n). If the right side is infinite, there is nothing to show, so let us assume that it is finite. Let ε > 0. For every n ≥ 1, we may by definition find a sequence E_{n,k} ∈ R with A_n ⊆ ⋃_{k≥1} E_{n,k} and Σ_{k=1}^∞ µ(E_{n,k}) ≤ µ^∗(A_n) + 2^{−n}ε. Then the set ⋃_{n≥1} A_n is of course covered by the countably many sets E_{n,k} ∈ R for n, k ≥ 1, and hence

µ^∗( ⋃_{n≥1} A_n ) ≤ Σ_{n=1}^∞ Σ_{k=1}^∞ µ(E_{n,k}) ≤ Σ_{n=1}^∞ (2^{−n}ε + µ^∗(A_n)) = ε + Σ_{n=1}^∞ µ^∗(A_n).

Since ε > 0 was arbitrary, this shows the claim.

Definition 2.11. Let ν be an outer measure on a set Ω. We say that a set E ⊆ Ω is ν-measurable if for every subset A ⊆ Ω, we have

ν(A) = ν(A ∩ E) + ν(A \ E).

Theorem 2.12 (Carathéodory). Let ν be an outer measure on a set Ω. Then the


ν-measurable subsets of Ω form a σ-algebra M, and the restriction of ν to M
defines a measure.

Proof. Evidently ∅ ∈ M, and if E ∈ M, then also E^c ∈ M. Let us consider finite unions. Let E, F ∈ M, and A ⊆ Ω an arbitrary subset. Then

ν(A) = ν( (A ∩ (E ∪ F)) ∪ (A \ (E ∪ F)) )
 ≤ ν(A ∩ (E ∪ F)) + ν(A \ (E ∪ F))
 = ν((A ∩ E) ∪ ((A ∩ F) \ E)) + ν(A \ (E ∪ F))
 ≤ ν(A ∩ E) + ν((A \ E) ∩ F) + ν((A \ E) \ F)
 = ν(A ∩ E) + ν(A \ E) = ν(A).

But from this we get equality everywhere, which implies E ∪ F ∈ M because A was arbitrary. If additionally E ∩ F = ∅, then we have

ν(A ∩ (E ∪ F)) = ν(A ∩ (E ∪ F) ∩ E) + ν((A ∩ (E ∪ F)) \ E) = ν(A ∩ E) + ν(A ∩ F).

So inserting A = E ∪ F yields that ν is finitely additive on M.

From E ∪ F ∈ M it immediately follows that M is closed under intersections and differences. Hence M becomes a σ-algebra if we can show that for all sequences of pairwise disjoint sets E_n ∈ M, we have E = ⋃_{n≥1} E_n ∈ M.

For m ≥ 1 and all sets A ⊆ Ω, we have

ν(A) = ν( A ∩ ⋃_{n≤m} E_n ) + ν( A \ ⋃_{n≤m} E_n )
 = Σ_{n=1}^m ν(A ∩ E_n) + ν( A \ ⋃_{n≤m} E_n )
 ≥ Σ_{n=1}^m ν(A ∩ E_n) + ν(A \ E).

Since m was arbitrary, we get ν(A) ≥ Σ_{n=1}^∞ ν(A ∩ E_n) + ν(A \ E). On the other hand, the equality A = (A \ E) ∪ (A ∩ E) = (A \ E) ∪ ⋃_{n≥1} (A ∩ E_n) together with σ-subadditivity yields

ν(A) ≤ ν(A ∩ E) + ν(A \ E) ≤ Σ_{n=1}^∞ ν(A ∩ E_n) + ν(A \ E) ≤ ν(A).

In particular we get the equality ν(A) = ν(A ∩ E) + ν(A \ E), which yields E ∈ M as A was arbitrary. Furthermore, if we insert A = E, then we also have ν(E) = Σ_{n=1}^∞ ν(E_n), which shows that ν is σ-additive on M. In particular it is indeed a measure when restricted to M.

Definition 2.13. A measure space (Ω,M, µ) is called complete, if for all sets
E ⊆ F ⊆ Ω, one has that F ∈ M and µ(F ) = 0 implies E ∈ M.

Proposition 2.14. Let Ω be a non-empty set and ν an outer measure on Ω. Let


M be the σ-algebra of ν-measurable sets, and define the measure µ = ν|M.
Then (Ω,M, µ) is a complete measure space.

Proof. Suppose E ⊆ F ∈ M are given with µ(F) = 0. Then we observe for all A ⊆ Ω that

ν(A) ≤ ν(A ∩ E) + ν(A \ E) ≤ ν(A ∩ F) + ν(A \ E) ≤ 0 + ν(A).

So we see that these are all equalities. Since A was arbitrary, this implies that E ∈ M.

Theorem 2.15. Let R be a ring on a set Ω, and µ a measure on R. Then the σ-


algebra M of µ∗ -measurable sets contains R, and we have µ∗|R = µ.

Proof. We see right away for all E ∈ R that µ^∗(E) ≤ µ(E), since we can write E = E ∪ ∅ ∪ ∅ ∪ · · · . On the other hand, if E_n ∈ R is any sequence of sets with E ⊆ ⋃_{n≥1} E_n, then it follows from σ-subadditivity and monotonicity of µ that

µ(E) = µ( ⋃_{n≥1} (E ∩ E_n) ) ≤ Σ_{n=1}^∞ µ(E ∩ E_n) ≤ Σ_{n=1}^∞ µ(E_n).

By taking the infimum over all possible choices of such sequences, we arrive at µ(E) = µ^∗(E). Since E ∈ R was arbitrary, we have just shown µ^∗|_R = µ.

Now we need to show that every set E ∈ R is µ^∗-measurable. Let A ⊆ Ω be any set. Since we always have µ^∗(A) ≤ µ^∗(A ∩ E) + µ^∗(A \ E), we may assume without loss of generality that µ^∗(A) < ∞. Let A_n ∈ R be a sequence of sets with A ⊆ ⋃_{n≥1} A_n. Then

µ^∗(A) ≤ µ^∗(A ∩ E) + µ^∗(A \ E) ≤ Σ_{n=1}^∞ [ µ(A_n ∩ E) + µ(A_n \ E) ] = Σ_{n=1}^∞ µ(A_n).

If we take the infimum over all possible such sequences A_n, then the right side approaches the value µ^∗(A), and hence we get the equality µ^∗(A) = µ^∗(A ∩ E) + µ^∗(A \ E). Since A ⊆ Ω and E ∈ R were arbitrary, this finishes the proof.

Definition 2.16. Let A be a semi-ring over Ω, and µ a measure on A. We say


that µ is σ-finite, if there is a sequence En ∈ A with µ(En) < ∞ and Ω = ⋃
n≥1 En.

Theorem 2.17. Let R be a ring on a set Ω, and µ a σ-finite measure on R. Let


M0 be the σ-algebra generated by R. Then the measure extension µ∗|M0 from
R to M0 is the unique measure on M0 extending µ on R.

Proof. Suppose that µ_1 is any measure on M_0 with µ_1|_R = µ. We claim µ_1 ≤ µ^∗|_{M_0}. Let E ∈ M_0, and assume without loss of generality µ^∗(E) < ∞. So in particular there exist sequences E_n ∈ R with E ⊆ ⋃_{n≥1} E_n. By σ-subadditivity of µ_1, it follows that µ_1(E) ≤ Σ_{n=1}^∞ µ(E_n). But since this holds for any choice of (E_n)_n, we obtain µ_1(E) ≤ µ^∗(E).

Since we assumed µ to be σ-finite, we can choose some sequence F_n ∈ R with µ(F_n) < ∞ and Ω = ⋃_{n≥1} F_n. Without loss of generality we may assume that F_n ⊆ F_{n+1} for all n ≥ 1. For every A ∈ M_0 we have

µ_1(F_n ∩ A) + µ_1(F_n \ A) = µ_1(F_n) = µ^∗(F_n) = µ^∗(F_n ∩ A) + µ^∗(F_n \ A).

Note that all these summands are finite, and we know from above that µ_1(F_n ∩ A) ≤ µ^∗(F_n ∩ A) and µ_1(F_n \ A) ≤ µ^∗(F_n \ A). It follows that necessarily µ_1(F_n ∩ A) = µ^∗(F_n ∩ A). By taking the supremum over n, we conclude µ_1(A) = µ^∗(A).

Corollary 2.18. Let A be a semi-ring over a set Ω, and µ0 a measure on A.


Let M be the σ-algebra generated by A. Then there exists a measure µ : M →
[0, ∞] extending µ0. If µ0 is σ-finite, then µ is unique.

Proof. We note that if R is the ring generated by A, then M is also the σ-


algebra generated by R. So this is a direct consequence of Proposition 2.7,
Proposition 2.10, Theorem 2.12, and Theorem 2.15. In the case of µ0 being
σ-finite, uniqueness of µ is exactly Theorem 2.17.

2.3 Application: The Lebesgue measure on R


Notation. In what follows, we will fix a bijection φ on a set Ω. We will also
denote by φ the induced bijection on 2Ω , which associates to every subset E
⊆ Ω its image φ(E) under φ.

Proposition 2.19. Let A be a semi-ring on a set Ω, and let µ_0 : A → [0, ∞] be a σ-finite measure. Let R be the ring generated by A, and µ : R → [0, ∞] the unique extension to a measure. Suppose that φ is a bijection on Ω that restricts to a bijection on A, and suppose that µ_0 = µ_0 ◦ φ. Then φ restricts to a bijection on R, and µ = µ ◦ φ.

Proof. This is immediate from the definition of both R and µ, and is left as an
exercise.

Proposition 2.20. Let R be a ring on a set Ω, and let µ : R → [0, ∞] be a


measure. Suppose that φ is a bijection on Ω that restricts to a bijection on R,
and suppose that µ = µ ◦ φ. Then the outer measure µ∗ satisfies µ∗ ◦ φ = µ∗ .
Moreover, φ restricts to a bijection on the σ-algebra of µ∗ -measurable
subsets of Ω.

Proof. Let A ⊆ Ω be an arbitrary subset, and let E_n ⊆ Ω be any sequence of sets. Then clearly A ⊆ ⋃_{n≥1} E_n if and only if φ(A) ⊆ ⋃_{n≥1} φ(E_n). If φ defines a bijection on R, then also E_n ∈ R if and only if φ(E_n) ∈ R. Furthermore we have by assumption that µ(φ(E_n)) = µ(E_n). By the definition of µ^∗, we immediately get µ^∗(A) = µ^∗(φ(A)).

Now assume that E ⊆ Ω is µ^∗-measurable. Then we have

µ^∗(A) = µ^∗(φ^{−1}(A)) = µ^∗(φ^{−1}(A) ∩ E) + µ^∗(φ^{−1}(A) \ E) = µ^∗(A ∩ φ(E)) + µ^∗(A \ φ(E)).

Since A is arbitrary, it follows that φ(E) is µ^∗-measurable. The reverse argument shows that if φ(E) is µ^∗-measurable, then E was µ^∗-measurable to begin with. This finishes the proof.

Proposition 2.21. Consider the set A ⊆ 2^R of half-open intervals

A = {(a, b] | a, b ∈ R, a ≤ b}.^8

Then A is a semi-ring, and the map µ_0 : A → [0, ∞] given by µ_0((a, b]) = b − a is a measure on A.

Proof. Evidently ∅ ∈ A. We have (a, b] ∩ (c, d] = (max(a, c), min(b, d)], so A is closed under intersections. The set-difference is given as the disjoint union (a, b] \ (c, d] = (a, min(b, c)] ∪ (max(a, d), b], so we see that A is a semi-ring. Evidently µ_0(∅) = 0, so we only need to show the σ-additivity.

Suppose that E = (a, b] ∈ A, and let E_n = (a_n, b_n] ∈ A be a sequence of pairwise disjoint sets with E = ⋃_{n≥1} E_n. Given any M ≥ 1, we have in particular ⋃_{n=1}^M E_n ⊆ E. By reordering E_n for 1 ≤ n ≤ M and discarding any empty sets if necessary, we may assume b_n ≤ a_{n+1} for all n < M. This leads to

Σ_{n=1}^M µ_0(E_n) = Σ_{n=1}^M (b_n − a_n) ≤ b_M − a_M + Σ_{n=1}^{M−1} (a_{n+1} − a_n) = b_M − a_1 ≤ b − a = µ_0(E).

Since M is arbitrary, this leads to Σ_{n=1}^∞ µ_0(E_n) ≤ µ_0(E).

Let ε > 0 with ε < b − a. Then in particular

[a + ε, b] ⊆ (a, b] = ⋃_{n≥1} (a_n, b_n] ⊆ ⋃_{n≥1} (a_n, b_n + 2^{−n}ε).

The right-hand side is an open covering of the compact set on the left side, and hence there is some N ≥ 1 such that [a + ε, b] ⊆ ⋃_{n=1}^N (a_n, b_n + 2^{−n}ε). We change the ordering of the intervals appearing in this union by the following inductive procedure: Choose k_1 ∈ {1, . . . , N} to be the index so that

a_{k_1} = max { a_j | a + ε ∈ (a_j, b_j + 2^{−j}ε) }.

If b < b_{k_1} + 2^{−k_1}ε, then the procedure stops here. Otherwise, choose k_2 ∈ {1, . . . , N} to be the index so that

a_{k_2} = max { a_j | b_{k_1} + 2^{−k_1}ε ∈ (a_j, b_j + 2^{−j}ε) }.

If b < b_{k_2} + 2^{−k_2}ε, then the procedure stops here. Otherwise one continues inductively until the procedure stops after L ≤ N steps. This yields an injective map k : {1, . . . , L} → {1, . . . , N} such that [a + ε, b] ⊆ ⋃_{n=1}^L (a_{k_n}, b_{k_n} + 2^{−k_n}ε) and such that for all n < L, we have a_{k_{n+1}} < b_{k_n} + 2^{−k_n}ε. From this we can

^8 By convention, we set (a, b] := ∅ if a ≥ b.

deduce (with the help of the first part of the proof)

b − a − ε < b_{k_L} + 2^{−k_L}ε − a_{k_1}
 = b_{k_L} + 2^{−k_L}ε − a_{k_L} + Σ_{n=1}^{L−1} (a_{k_{n+1}} − a_{k_n})
 < b_{k_L} + 2^{−k_L}ε − a_{k_L} + Σ_{n=1}^{L−1} (b_{k_n} + 2^{−k_n}ε − a_{k_n})
 = Σ_{n=1}^L (b_{k_n} + 2^{−k_n}ε − a_{k_n})
 ≤ ε + Σ_{n=1}^N (b_n − a_n)
 ≤ ε + b − a.

From this we may conclude

b − a ≤ 2ε + Σ_{n=1}^N (b_n − a_n) ≤ 2ε + Σ_{n=1}^∞ µ_0(E_n).

Since ε > 0 was arbitrary, this finally implies µ_0(E) = Σ_{n=1}^∞ µ_0(E_n) and finishes the proof.

Theorem 2.22 (Lebesgue). The measure µ_0 : A → [0, ∞] defined on the semi-ring of half-open intervals A ⊂ 2^R extends to a translation-invariant measure λ : L → [0, ∞] on the σ-algebra L ⊂ 2^R of Lebesgue-measurable sets, which contains the Borel σ-algebra of R. On the Borel σ-algebra, λ is the unique measure extending µ_0, and in fact the unique translation-invariant measure with λ((0, 1]) = 1.

Proof. We use Proposition 2.7 and extend µ_0 to a measure µ : R → [0, ∞] on the ring R ⊂ 2^R generated by A. We consider the outer measure µ^∗ induced by µ as in Proposition 2.10. We appeal to Theorem 2.12 and define L as the σ-algebra of µ^∗-measurable sets (which we call Lebesgue-measurable), which contains A. Since A generates the Borel σ-algebra, it follows that L contains the Borel σ-algebra. By Theorem 2.15, the measure λ = µ^∗|_L indeed extends µ, and in particular µ_0 on A.

Let us argue why λ is translation-invariant. Let t ∈ R. Consider the bijection φ on R given by φ(a) = t + a. Then evidently φ restricts to a bijection on A, and we have µ_0 = µ_0 ◦ φ. By combining Proposition 2.19 and Proposition 2.20, it follows that φ restricts to a bijection on L, and λ = λ ◦ φ. In other words, we have λ(E + t) = λ(E) for every Lebesgue-measurable set E ∈ L. Since t ∈ R is arbitrary, this shows the claim.

Lastly, the measure µ_0 on A is σ-finite. Since A generates the Borel σ-algebra, the uniqueness of the measure extension follows from Theorem 2.17. On the other hand, if we have a translation-invariant measure µ with µ((0, 1]) = 1, then it is easy to see that µ((0, 1/n]) = 1/n for all n ≥ 1, which one can use to show that µ agrees with λ on all sets in A with rational endpoints. If a, b ∈ R with a < b, pick some number c ∈ Q strictly between them. Choose decreasing sequences of rational numbers a_n, b_n ∈ Q such that a_n → a and b_n → b. Then

(a, c] = ⋃_{n≥1} (a_n, c],   (c, b] = ⋂_{n≥1} (c, b_n].

By the continuity of measures, we can conclude that µ((a, b]) = µ((a, c]) + µ((c, b]) = b − a, so µ agrees with λ on all of A, hence µ = λ on the Borel σ-algebra.

Remark 2.23. WARNING! The σ-algebra of Lebesgue sets is indeed bigger than the Borel σ-algebra.^9 Nevertheless, one sometimes refers to the Lebesgue measure to mean its restriction to the Borel σ-algebra, and denotes that also by λ. In the exercise sessions, we have already discussed an example of a set A ⊂ [0, 1] that is not even Lebesgue-measurable.

2.4 Application: Lebesgue–Stieltjes Measures


Definition 2.24. Let X be a locally compact, σ-compact Hausdorff space. A
Borel measure on X is a measure on the Borel σ-algebra that assigns a finite
value to every compact subset of X. In the special case X = R, these are
called Lebesgue–Stieltjes measures.

Proposition 2.25. Let µ be a Lebesgue–Stieltjes measure. Then there exists a


unique increasing right-continuous function F : R → R such that F (0) = 0 and
µ((a, b]) = F (b) − F (a) for all a, b ∈ R, a < b.

Proof. From these properties it follows that if such a function F exists at all, then it has to be given by the formula

F(t) = µ((0, t]) for t > 0,   F(0) = 0,   F(t) = −µ((t, 0]) for t < 0.

We claim that this function indeed has the right properties. Let a, b ∈ R with a < b. We aim to show µ((a, b]) = F(b) − F(a). If a ≥ 0, then

9This is not so easy to see, but an example is discussed here, for whoever is
interested:
https://www.math3ma.com/blog/lebesgue-but-not-borel.

µ((a, b]) = µ((0, b] \ (0, a]) = µ((0, b]) − µ((0, a]) = F(b) − F(a). If b < 0, we can prove this analogously. If a < 0 ≤ b, then we have µ((a, b]) = µ((a, 0] ∪ (0, b]) = µ((a, 0]) + µ((0, b]) = F(b) − F(a). The fact that F is increasing follows immediately from the fact that µ has non-negative values. The right-continuity follows from

lim_{n→∞} F(a + ε_n) = lim_{n→∞} µ((0, a + ε_n]) = µ((0, a]) = F(a)

for any sequence ε_n > 0 with ε_n → 0. Here we used the continuity property of µ as a measure with respect to countable decreasing intersections.

Theorem 2.26. The assignment µ ↦ F, which assigns to every Lebesgue–Stieltjes measure its increasing right-continuous function as in Proposition 2.25, is a bijection. In particular, whenever F : R → R is an increasing right-continuous function with F(0) = 0, there exists a unique Lebesgue–Stieltjes measure µ such that µ((a, b]) = F(b) − F(a) for all a, b ∈ R with a < b.

Proof. We have already seen that every Lebesgue–Stieltjes measure gives an increasing right-continuous function with these properties. Furthermore, the assignment µ ↦ F is injective. This is because F uniquely determines how µ is defined on the semi-ring of half-open intervals. Since µ restricted to this semi-ring is σ-finite, it follows from Theorem 2.17 that µ is hence uniquely determined by F.

It remains to be shown that for every choice of F, there exists a corresponding Lebesgue–Stieltjes measure. Indeed, let us define A as the semi-ring of bounded half-open intervals. As in Proposition 2.21, we define µ : A → [0, ∞] via µ((a, b]) = F(b) − F(a). For σ-additivity, assume E = (a, b] can be expressed as the disjoint union of E_n = (a_n, b_n]. If M ≥ 1 is any number and we reorder the E_n for n ≤ M to ensure b_n ≤ a_{n+1}, then it follows from the fact that F is increasing that

Σ_{n=1}^M µ(E_n) = Σ_{n=1}^M (F(b_n) − F(a_n))
 ≤ F(b_M) − F(a_M) + Σ_{n=1}^{M−1} (F(a_{n+1}) − F(a_n))
 = F(b_M) − F(a_1)
 ≤ F(b) − F(a)
 = µ(E).

Since M was arbitrary, we have Σ_{n=1}^∞ µ(E_n) ≤ µ(E).

For the reverse inequality, choose ε > 0. We use right-continuity to pick for every n ≥ 1 a small δ_n > 0 such that F(b_n + δ_n) − F(b_n) ≤ 2^{−n}ε. Then

[a + ε, b] ⊆ E = ⋃_{n=1}^∞ E_n ⊆ ⋃_{n=1}^∞ (a_n, b_n + δ_n).

Since the left side is compact, we obtain some N ≥ 1 such that [a + ε, b] ⊆ ⋃_{n=1}^N (a_n, b_n + δ_n). Now repeating exactly the same argument as in the proof of Proposition 2.21, we may deduce F(b) − F(a + ε) ≤ ε + Σ_{n=1}^∞ (F(b_n) − F(a_n)). Letting ε → 0, we obtain the σ-additivity of µ.

The rest of the claim follows from the Carathéodory construction, exactly as in the proof of Theorem 2.22; see also Corollary 2.18.

Example 2.27. For the choice F = id_R, we recover the Lebesgue measure. On the other hand, if a > 0 is some chosen number and we set

F(t) = 0 for t < a,   F(t) = 1 for t ≥ a,

then one can show that we recover the Dirac measure µ = δ_a. For a ≤ 0 one also recovers this measure with the function

F(t) = −1 for t < a,   F(t) = 0 for t ≥ a.

More generally, if µ is the measure corresponding to an increasing right-continuous function F, one can show that µ({a}) = 0 if and only if F is continuous at a.

2.5 Product measures


Definition 2.28. Let (Ωi,Mi) be two measurable spaces for i = 1, 2. Then the
product σ-algebra M1 ⊗ M2 on Ω1 × Ω2 is defined as the σ-algebra
generated by all sets of the form E1 × E2 for E1 ∈ M1 and E2 ∈ M2, the so-
called measurable rectangles in Ω1 × Ω2.

Proposition 2.29. Let (Ωi,Mi, µi) be two measure spaces for i = 1, 2. Then
the set of measurable rectangles

A = {E1 × E2 | E1 ∈ M1, E2 ∈ M2} ⊆ 2Ω1×Ω2

is a semi-ring. Furthermore, the map µ0 : A → [0, ∞] given by µ0(E1×E2) =


µ1(E1)µ2(E2) is a measure on A.

Proof. This is proved in the exercise sessions.

Theorem 2.30. Let (Ω_i, M_i, µ_i) be two measure spaces for i = 1, 2. Then the measure µ_0 on the measurable rectangles extends to a measure on the product σ-algebra, µ_1 ⊗ µ_2 : M_1 ⊗ M_2 → [0, ∞].

Proof. This is a special case of Corollary 2.18.

Definition 2.31. Let d ≥ 1. Then the Lebesgue measure on R^d is defined as the d-fold product measure

λ^{(d)} = λ ⊗ λ ⊗ · · · ⊗ λ   (d times)

with respect to the measure space (R, L, λ). The Lebesgue σ-algebra L^{(d)} on R^d is the one consisting of all λ^{(d)}-measurable sets in the sense of Definition 2.11, which contains the Borel σ-algebra. If the dimension d is clear from context, we may sometimes slightly abuse notation and just write λ for the Lebesgue measure on R^d.

Corollary 2.32. The Lebesgue measure λ^{(d)} : L^{(d)} → [0, ∞] is translation-invariant for all d ≥ 1.

Proof. We already know that the Lebesgue measure on R is translation-invariant, so there is nothing to prove when d = 1. We proceed by induction and assume that d ≥ 2 is a number such that λ^{(d−1)} is translation-invariant.

Let t = (t_1, . . . , t_d) = (t′, t_d) ∈ R^d = R^{d−1} × R. Denote by µ_0 the product measure defined on the measurable rectangles. If E ∈ L^{(d−1)} and F ∈ L are measurable, then (E × F) + t = (E + t′) × (F + t_d), and so

µ_0((E × F) + t) = λ^{(d−1)}(E + t′) λ(F + t_d) = λ^{(d−1)}(E) λ(F) = µ_0(E × F).

It now follows directly from Proposition 2.19 and Proposition 2.20 that λ^{(d)}(A + t) = λ^{(d)}(A) for all A ∈ L^{(d)}, concluding the proof.

We conclude this section with Fubini’s theorem, which is a fundamental


result that tells us how to compute integrals over product measures.

Theorem 2.33 (Tonelli; see exercises). Let (Ω_i, M_i, µ_i) be two σ-finite measure spaces for i = 1, 2. Let f : Ω_1 × Ω_2 → [0, ∞] be an M_1 ⊗ M_2-measurable function. Denote f_x = f(x, _) : Ω_2 → [0, ∞] and f^y = f(_, y) : Ω_1 → [0, ∞]. Then:

a. For every x ∈ Ω_1, the function f_x is M_2-measurable.

b. For every y ∈ Ω_2, the function f^y is M_1-measurable.

c. The function Ω_1 → [0, ∞] given by x ↦ ∫ Ω_2 f_x dµ_2 is M_1-measurable.

d. The function Ω_2 → [0, ∞] given by y ↦ ∫ Ω_1 f^y dµ_1 is M_2-measurable.

e. One has the equalities

∫ Ω_1×Ω_2 f d(µ_1 ⊗ µ_2) = ∫ Ω_1 ( ∫ Ω_2 f_x dµ_2 ) dµ_1(x) = ∫ Ω_2 ( ∫ Ω_1 f^y dµ_1 ) dµ_2(y).

Theorem 2.34 (Fubini). Let (Ω_i, M_i, µ_i) be two σ-finite measure spaces for i = 1, 2. Let f : Ω_1 × Ω_2 → C be an M_1 ⊗ M_2-measurable function. Denote f_x = f(x, _) : Ω_2 → C and f^y = f(_, y) : Ω_1 → C. Then the following are equivalent:

A. ∫ Ω_1×Ω_2 |f| d(µ_1 ⊗ µ_2) < ∞;

B. ∫ Ω_1 ( ∫ Ω_2 |f_x| dµ_2 ) dµ_1(x) < ∞;

C. ∫ Ω_2 ( ∫ Ω_1 |f^y| dµ_1 ) dµ_2(y) < ∞.

If any (equivalently, every) one of these statements holds, then:

a. The function Ω_1 → C given by x ↦ ∫ Ω_2 f_x dµ_2 is well-defined on a conull set and µ_1-integrable.

b. The function Ω_2 → C given by y ↦ ∫ Ω_1 f^y dµ_1 is well-defined on a conull set and µ_2-integrable.

c. One has the equalities

∫ Ω_1×Ω_2 f d(µ_1 ⊗ µ_2) = ∫ Ω_1 ( ∫ Ω_2 f_x dµ_2 ) dµ_1(x) = ∫ Ω_2 ( ∫ Ω_1 f^y dµ_1 ) dµ_2(y).

Proof. It follows from Tonelli's theorem that the integrals appearing in (A), (B), and (C) are always the same, so their finiteness is indeed equivalent. Now let us assume that these statements are true. By splitting f up into its real and imaginary parts, we may assume without loss of generality that f is a real function.

By part (c) in Tonelli's theorem and Proposition 1.38, it follows that the set E of all x ∈ Ω1 for which fx is integrable is in M1 and its complement is a null set. Let f = f+ − f− be the canonical decomposition into positive functions as in Remark 1.39. Then it is very easy to see that (f+)x = (fx)+ and (f−)x = (fx)−. So (f+)x and (f−)x are both integrable whenever x ∈ E. Moreover the map

E ∋ x ↦ ∫_{Ω2} fx dµ2 = ∫_{Ω2} (f+)x dµ2 − ∫_{Ω2} (f−)x dµ2

is M1-measurable by part (c) in Tonelli's theorem, and this function is in fact µ1-integrable because

∫_{Ω1} |∫_{Ω2} fx dµ2| dµ1(x) ≤ ∫_{Ω1} (∫_{Ω2} |fx| dµ2) dµ1(x) < ∞.

Hence it follows that

∫_{Ω1} (∫_{Ω2} fx dµ2) dµ1(x)
= ∫_{Ω1} χE (∫_{Ω2} fx dµ2) dµ1(x)
= ∫_{Ω1} χE (∫_{Ω2} (f+)x dµ2 − ∫_{Ω2} (f−)x dµ2) dµ1(x)
= ∫_{Ω1} χE (∫_{Ω2} (f+)x dµ2) dµ1(x) − ∫_{Ω1} χE (∫_{Ω2} (f−)x dµ2) dµ1(x)
= ∫_{Ω1} (∫_{Ω2} (f+)x dµ2) dµ1(x) − ∫_{Ω1} (∫_{Ω2} (f−)x dµ2) dµ1(x)
= ∫_{Ω1×Ω2} f+ d(µ1 ⊗ µ2) − ∫_{Ω1×Ω2} f− d(µ1 ⊗ µ2)
= ∫_{Ω1×Ω2} f d(µ1 ⊗ µ2).

The remaining equality follows in exactly the same way by exchanging the roles of Ω1 and Ω2.
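As a quick sanity check, the order-of-integration statement has a discrete analogue for counting measures: the two iterated sums of a summable array agree with the sum over the whole index grid. A minimal sketch (an illustrative analogue only, not the theorem itself; the array dimensions are arbitrary choices):

```python
import random

# Discrete analogue of Tonelli/Fubini with counting measures: iterated sums
# of a summable array agree with the sum over the product of the index sets.
random.seed(0)
rows, cols = 50, 80
f = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

total = sum(f[i][j] for i in range(rows) for j in range(cols))
sum_rows_first = sum(sum(row) for row in f)                               # over j, then i
sum_cols_first = sum(sum(f[i][j] for i in range(rows)) for j in range(cols))  # over i, then j
```

Up to floating-point rounding, all three quantities coincide.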

2.6 Infinite products of probability measures


Lemma 2.35. Let Ω be a non-empty set and A ⊆ 2^Ω a semi-ring with Ω ∈ A. Suppose that µ : A → [0, 1] is a map with µ(Ω) = 1. Then µ is a measure if and only if for every sequence An ∈ A of pairwise disjoint sets with Ω = ⋃_{n≥1} An, one has ∑_{n=1}^∞ µ(An) = 1.

Proof. Clearly any measure must satisfy this property, so the "only if" part is clear. For the "if" part, we may first take A1 = Ω and An = ∅ for all n ≥ 2, which immediately implies µ(∅) = 0. We need to show that µ is σ-additive. Let Bn ∈ A be a sequence of pairwise disjoint sets such that B = ⋃_{n≥1} Bn ∈ A. As A is a semi-ring, we have Ω \ B = A1 ∪ · · · ∪ Aℓ for pairwise disjoint sets A1, . . . , Aℓ ∈ A. Then the assumption implies on the one hand that 1 = µ(B) + ∑_{n=1}^ℓ µ(An). On the other hand, if we set A_{ℓ+k} = Bk for k ≥ 1, then the sequence (An)_{n≥1} defines a pairwise disjoint covering of Ω, so 1 = ∑_{n=1}^∞ µ(An). Comparing these two equations, we see that µ(B) = ∑_{n>ℓ} µ(An) = ∑_{n=1}^∞ µ(Bn), which shows our claim.
Definition 2.36. Let I be a non-empty index set and (Ωi, Mi, µi) a probability space for every i ∈ I. We denote Ω = ∏_{i∈I} Ωi, and set

A = {∏_{i∈I} Ei | Ei ∈ Mi for all i ∈ I, Ei = Ωi for all but finitely many i ∈ I}.



The σ-algebra generated by A is denoted by ⊗_{i∈I} Mi. We consider the map µ : A → [0, 1] given by µ(∏_{i∈I} Ei) = ∏_{i∈I} µi(Ei). Note that these products are well-defined as all but finitely many factors are assumed to be 1.

Theorem 2.37. Adopt the notation from the above definition. Then A is a semi-ring and µ is a measure on A. In particular, it extends uniquely to a probability measure µ on ⊗_{i∈I} Mi. One also denotes µ = ⊗_{i∈I} µi.

Proof. The “in particular” part is due to Corollary 2.18. By definition, every
set in A is a product of subsets of the spaces Ωi which are proper subsets
only over finitely many indices. In particular, given any sequence An, there
are countably many indices in I so that over every other index i ∈ I, the
projection of every set An to the i-th coordinate yields Ωi. Considering the
semi-ring axioms for A and the axioms of being a measure for µ, we can see
that it is enough to consider the case where I is countable. As we already
know that the claim is true if I is finite, we may from now on assume I = N.

The fact that A is a semi-ring now follows directly from the exercises. We
hence need to show that µ is a measure on A, where our goal is to appeal to
the condition in the above lemma. Suppose that An ∈ A is a sequence of pairwise disjoint sets with Ω = ⋃_{n≥1} An. It is our intention to show ∑_{n=1}^∞ µ(An) = 1.

For each n ≥ 1 let us write An = ∏_{i=1}^∞ A_{n,i} and pick i_n ≥ 1 such that A_{n,i} = Ωi whenever i > i_n. For all m, n ∈ N and ω = (ωi)_i ∈ Am we claim to have the equation

∏_{i≤i_m} χ_{A_{n,i}}(ωi) · ∏_{i>i_m} µi(A_{n,i}) = δ_{n,m}.

Indeed, if n = m, then all the involved factors are equal to one, so the equation holds. Assume n ≠ m. We observe χ_{A_k}((ωi)_i) = ∏_{i≤i_k} χ_{A_{k,i}}(ωi). As 1 = ∑_{k=1}^∞ χ_{A_k}, we conclude for (ωi)_i ∈ Am that 0 = ∏_{i≤i_n} χ_{A_{n,i}}(ωi). So either i_n ≤ i_m, in which case the desired equality above follows immediately. Or, if i_n > i_m, then every tuple of the form (ω1, . . . , ω_{i_m}, α_{i_m+1}, . . . ) is also in Am, so analogously

0 = ∏_{i≤i_m} χ_{A_{n,i}}(ωi) · ∏_{i=i_m+1}^{i_n} χ_{A_{n,i}}(αi).

By consecutively integrating over αi for i = i_m + 1, . . . , i_n, we recover the equation we wish to show.

Finally, let us assume for a contradiction that ∑_{n=1}^∞ µ(An) ≠ 1. We have

∑_{n=1}^∞ µ(An) = ∑_{n=1}^∞ ∏_{i=1}^∞ µi(A_{n,i})
= ∑_{n=1}^∞ ∫_{Ω1} χ_{A_{n,1}}(ω1) · (∏_{i=2}^∞ µi(A_{n,i})) dµ1(ω1)
= ∫_{Ω1} ∑_{n=1}^∞ χ_{A_{n,1}}(ω1) · (∏_{i=2}^∞ µi(A_{n,i})) dµ1(ω1).

As this quantity is not 1, there must be some ω1 ∈ Ω1 for which

∑_{n=1}^∞ χ_{A_{n,1}}(ω1) · (∏_{i=2}^∞ µi(A_{n,i})) ≠ 1.


By iterating this process, we can come up with a tuple ω = (ωi)_i ∈ Ω satisfying

∑_{n=1}^∞ [∏_{i≤k} χ_{A_{n,i}}(ωi) · ∏_{i>k} µi(A_{n,i})] ≠ 1

for all k ≥ 1. However, since ω ∈ Am for some m, it follows from our previous observation for k = i_m that precisely one of these summands is 1 and the others are zero. This leads to a contradiction. Hence we have indeed that ∑_{n=1}^∞ µ(An) = 1.

3 Probability
3.1 Probability spaces and random variables
Definition 3.1. A probability space is a measure space (Ω, M, µ) with µ(Ω) = 1. In this context,

µ is called a probability measure,
Ω is called a sample space,
the elements of Ω are called outcomes,
the elements of M are called events,
the value µ(A) is called the probability of (the event) A. (To be even more suggestive, one often writes P instead of µ.)

The following easy special case illustrates how the most elementary probabilistic experiments, such as tossing a coin only finitely many times, can be thought of in this framework. We omit the easy proof.

Proposition 3.2. Let Ω be a non-empty countable set and M = 2^Ω. Let p : Ω → [0, 1] be any map with the property ∑_{ω∈Ω} p(ω) = 1. Then the assignment 2^Ω ∋ A ↦ P(A) = ∑_{ω∈A} p(ω) defines a probability measure. Conversely, if P is a probability measure on Ω, then p(ω) = P({ω}) defines a map with ∑_{ω∈Ω} p(ω) = 1.
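Proposition 3.2 can be sketched directly in code; the probability mass function below (a fair die) is a hypothetical example, not part of the notes:

```python
from fractions import Fraction

# A pmf p on a countable sample space induces a probability measure via
# P(A) = sum of p(w) over w in A, as in Proposition 3.2.
p = {w: Fraction(1, 6) for w in range(1, 7)}   # fair six-sided die

def P(A):
    """Probability of an event A, a subset of the sample space."""
    return sum(p[w] for w in A)

assert P(p.keys()) == 1                # P(Omega) = 1
assert P({2, 4, 6}) == Fraction(1, 2)  # probability of an even roll
```

Exact rational arithmetic via `Fraction` avoids any floating-point slack in the additivity checks.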

For more involved probabilistic questions, the sample space is not neces
sarily countable, which makes it somewhat more difficult to come up with the
right choices of events and probability measures. The following can be seen
as a model for tossing a coin infinitely often.

Example 3.3 (infinite coin tossing). For each toss, a coin can only come up as
heads or tails, which we conveniently denote as the outcomes 0 and 1. Since
we want to model tossing the coin infinitely often, the outcomes are
sequences having value 0 or 1. This gives rise to the sample space Ω = {0,
1}N .

A rather obvious example of an event is when the first k tosses are equal to some k-tuple ω ∈ {0, 1}^k. The associated subset of Ω is given as Aω = {ω} × {0, 1}^{N>k}. For example, A1 is the event where the first coin toss comes up as tails, and A_{0,1,1} is the event where the first three tosses come up as heads → tails → tails. Let M be the event σ-algebra generated by all such sets. We note that a lot of other natural choices for events are automatically in M. For example, the event that the n-th coin toss comes up as heads is given by

{0, 1}^{n−1} × {0} × {0, 1}^{N>n} = ⋃_{ω∈{0,1}^{n−1}} A_{(ω,0)}.

As for determining the probability, we want all of our coin tosses to be fair, meaning that there is always an equal chance that either heads or tails comes up. In particular, the coin tosses should all be independent from each other. If µ : M → [0, 1] is supposed to be a probability measure modelling this behavior, then we can agree on µ(A0) = µ(A1) = 1/2. Inductively, we may conclude that for all ω ∈ {0, 1}^n, one should have µ(Aω) = 2^{−n}. In this case, if we view {0, 1} as a discrete space, we may give Ω the product topology, in which case M is the Borel σ-algebra. If µ2 is the probability measure on {0, 1} that assigns the value 1/2 to both singletons, then µ is in fact the infinite product measure µ = ⊗_{n=1}^∞ µ2.
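The cylinder probabilities µ(Aω) = 2^{−n} can be checked by Monte Carlo simulation; the tuple `w` and the trial count below are illustrative choices:

```python
import random

# Monte Carlo sketch of Example 3.3: estimate mu(A_w) for the cylinder
# event "the first k tosses equal w"; the product formula gives 2**(-k).
random.seed(1)
w = (0, 1, 1)                        # heads, tails, tails
trials = 200_000
hits = sum(
    all(random.randint(0, 1) == wi for wi in w)   # one trial: k fresh fair bits
    for _ in range(trials)
)
estimate = hits / trials             # close to 2**-3 = 0.125
```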

Definition 3.4. Let (Ω,M,P) be a probability space and W a topological


space. A W -valued random variable X is a measurable map X : Ω → W .
For a Borel subset B ⊆ W , we will frequently use the popular notation (X ∈
B) for the preimage X−1(B).

Remark 3.5. One sometimes also says "stochastic variable". In the particular case W = Rn, one calls it a vector random variable, and for W = R, a real random variable. The case W = RN is referred to as a random sequence. In the case of a real random variable X, we will also freely play around with the above notation; for example, (|X| ≤ 1) is written instead of (X ∈ [−1, 1]), etc.

Remark 3.6. From our previous study on measurable maps we can observe
that real random variables are closed under addition and multiplication, and
pointwise limits. General random variables are closed under the same type
of operations under which measurable maps are closed.

Definition 3.7. Let (Ω,M,P) be a probability space and X a W -valued


random variable. The distribution of X is the Borel probability measure PX
on W given by PX(B) = P(X ∈ B). If X and Y are two W -valued random
variables (defined on possibly different probability spaces), we say that X
and Y are identically distributed if PX = PY .

Definition 3.8. Let X be a real random variable. Then its (cumulative) distribution function FX : R → R is defined as FX(t) = PX((−∞, t]) = P(X ≤ t).

Indeed, in the situation above, the measure PX on R is a Lebesgue–Stieltjes measure, so we know from the previous chapter that FX is an increasing and right-continuous function.

Definition 3.9. Let X be a real random variable on a probability space


(Ω,M,P). We say that X has an expected value (or mean), if X is integrable as
a function X : Ω → R. In this case, its expected value (or mean) is defined
as

E(X) = ∫ Ω X dP.

Proposition 3.10. Let X be a real random variable. Then X has an expected


value if and only if id : R → R is PX-integrable. In this case, its expected
value is E(X) = ∫ R x dPX(x). In particular, the expected value only depends
on the distribution of X.

Proof. This is a direct consequence of Theorem 1.54 and Remark 1.55.

Proposition 3.11. Let W be a topological space and X a W -valued random


variable with distribution PX . Let f : W → R be a Borel measurable
function. Then f ◦ X = f(X) has an expected value if and only if f is PX-
integrable. In that case, E(f(X)) = ∫ W f dPX .

Notation 3.12. Certain expected values get a special name. Let X be a real
random variable.

i. Given k ≥ 0, we call E(|X|^k) the k-th absolute moment. If it exists, then E(X^k) is called the k-th moment.

ii. Suppose X has an expected value mX = E(X). Then Var(X) = E((X − mX)^2) is called the variance of X.

iii. More generally, suppose X and Y are two real random variables on the same probability space, for which the second moments exist. Then it follows from Hölder's inequality that all of X, Y, XY have expected values, and the value Cov(X, Y) = E((X − mX)(Y − mY)) is called the covariance of X and Y.

3.2 Independence
Definition 3.13. Let (Ω,M,P) be a probability space. Two events A, B ∈ M
are called independent, if P(A ∩ B) = P(A)P(B).

Example 3.14. In the context of seeing measure theory as a way to assign a


“volume” to certain sets, this notion has no clear meaning. The concept does
however become relevant if one adopts the probabilistic point of view. For
example, keep in mind Example 3.3 modelling the infinite coin tossing. In
that context, we may for example define A to be the event where both the first
and second toss comes up as heads, and B the event where both the third and
fourth toss comes up as tails. In other words,

A = {0} × {0} × {0, 1}^{N≥3},

B = {0, 1} × {0, 1} × {1} × {1} × {0, 1}^{N≥5}.

In our model we assume that the individual coin tosses are not supposed to influence each other, and hence we should certainly view these events as independent. Indeed, we have here

A ∩ B = {0} × {0} × {1} × {1} × {0, 1}^{N≥5},

so by the properties of the product measure µ we can see here that µ(A) = 1/4, µ(B) = 1/4, and µ(A ∩ B) = 1/16. Similarly, all pairs of events defined by the outcomes of coin tosses happening at distinct times will be independent in this model.

Definition 3.15. Let (Ω,M,P) be a probability space. A family of events


{Ai}_{i∈I} ⊆ M is called independent, if for all pairwise distinct indices i1, . . . , in ∈ I, one has P(A_{i1} ∩ A_{i2} ∩ · · · ∩ A_{in}) = P(A_{i1})P(A_{i2}) · · · P(A_{in}).

Definition 3.16. Let Ω be a non-empty set. For a sequence An ∈ 2^Ω, we define

lim sup_{n→∞} An = ⋂_{n≥1} ⋃_{k≥n} Ak and lim inf_{n→∞} An = ⋃_{n≥1} ⋂_{k≥n} Ak.
Example 3.17. Let us motivate this again from the point of view of our infinite coin tossing model. Let Hn be the event that the n-th toss comes up heads. Then the event lim sup_{n→∞} Hn describes the situation where heads comes up infinitely many times, and the event lim inf_{n→∞} Hn describes the situation where heads comes up in all but finitely many tosses. For these reasons, it is not uncommon in a probabilistic context to use the notation

lim sup_{n→∞} An = (An, infinitely often) = (An, i.o.)

and

lim inf_{n→∞} An = (An, almost always) = (An, a.a.).

Proposition 3.18. Let (Ω, M, P) be a probability space, and An ∈ M a sequence. Then

P(lim inf_{n→∞} An) ≤ lim inf_{n→∞} P(An) ≤ lim sup_{n→∞} P(An) ≤ P(lim sup_{n→∞} An).

Proof. See exercises.

Lemma 3.19 (Borel–Cantelli Lemma). Let (Ω, M, P) be a probability space, and An ∈ M a sequence.

(i) Suppose ∑_{n=1}^∞ P(An) < ∞. Then P(lim sup_{n→∞} An) = 0.

(ii) Suppose ∑_{n=1}^∞ P(An) = ∞. If {An}_{n∈N} is an independent family, then P(lim sup_{n→∞} An) = 1.

Proof. We prove (i) in the exercise sessions, so we only need to prove (ii). We will use the fact from the exercise sessions that the family of complements {A_n^c}_{n∈N} is also independent. Moreover, we are about to use the well-known inequality 1 − x ≤ e^{−x} for all x ∈ R. We observe for all n ≥ 1 that

P(⋂_{k=n}^∞ A_k^c) = lim_{m→∞} P(⋂_{k=n}^m A_k^c) = lim_{m→∞} ∏_{k=n}^m P(A_k^c)
= lim_{m→∞} ∏_{k=n}^m (1 − P(Ak))
≤ lim_{m→∞} ∏_{k=n}^m exp(−P(Ak))
= lim_{m→∞} exp(−∑_{k=n}^m P(Ak)) = 0.

We conclude that ⋃_{n≥1} ⋂_{k≥n} A_k^c = lim inf_{n→∞} A_n^c is a null set. Therefore lim sup_{n→∞} An = (lim inf_{n→∞} A_n^c)^c is co-null, which shows our claim.

Example 3.20. Let us have yet another look at our infinite coin tossing model and what we observed in Example 3.14. Let Hn denote the event where the n-th coin toss comes up as heads. Then the family {Hn}_{n∈N} is independent, each event has probability 1/2, and hence it follows from the above that the event (Hn, infinitely often) has probability 1.

Definition 3.21. Let (Ω,M,P) be a probability space. A family of sub-σ


algebras Mi ⊆ M for i ∈ I is called independent, if every collection {Ai}i∈I
⊆ M with Ai ∈ Mi is independent.

Definition 3.22. Let (Ω,M,P) be a probability space. Let I be an index set,


and let Xi be a random variable on (Ω,M,P) with values in the topological
space Wi for i ∈ I. We call the family {Xi}i∈I independent, if for every
collection {Si}i∈I of Borel sets Si ⊆ Wi, we have that the family of events
{(Xi ∈ Si)}i∈I is independent. (In other words, if we denote by Bi the Borel
σ-algebra on Wi, this condition means that the family of pullback σ-algebras
{X∗ i (Bi)}i∈I is independent in the above sense.)

Theorem 3.23. Let {Pi}i∈I be a family of Borel probability measures on R.


Then there exists a probability space (Ω,M,P) with an independent family of
real random variables {Xi}i∈I such that Pi = PXi.

Proof. We set (Ω,M,P) = ⊗ i∈I(R,B,Pi) and define Xi : Ω → R to be the


projection onto the i-th component. It is an easy exercise to see that this
yields an independent family with the desired property.

Since the proof of the following is very easy, we leave it as an exercise.

Proposition 3.24. Let (Ω,M,P) be a probability space. Let I be an index set,


and let Xi be a random variable on (Ω,M,P) with values in the topological
space Wi for i ∈ I. For each i ∈ I, let Vi be another topological space and fi :
Wi → Vi a Borel measurable map. If the family of random variables {Xi}i∈I
is independent, then so is {fi(Xi)}i∈I .

Proposition 3.25. Let X and Y be two mutually independent, real random


variables defined on a probability space (Ω,M,P). If both X and Y have a
mean, then so does XY , and in fact E(XY ) = E(X)E(Y ).

Proof. Consider the pullback σ-algebras

MX = {(X ∈ S) | S ⊆ R Borel}, MY = {(Y ∈ S) | S ⊆ R Borel},

which are both sub-σ-algebras of M. Given any A ∈ MX and B ∈ MY, it follows by independence that

∫_Ω χA χB dP = P(A ∩ B) = P(A)P(B) = ∫_Ω χA dP · ∫_Ω χB dP.

By linearity, it follows that for any MX-measurable simple function s : Ω → [0, ∞) and MY-measurable simple function t : Ω → [0, ∞), one has the equation

∫_Ω st dP = ∫_Ω s dP · ∫_Ω t dP.

By the Monotone Convergence Theorem, this even holds when s and t are not assumed to be simple. Applying this to s = |X| and t = |Y| yields E(|XY|) = E(|X|)E(|Y|) < ∞, so indeed XY has a mean. Furthermore, if we decompose X = X+ − X− and Y = Y+ − Y−, then

E(XY) = ∫_Ω XY dP = ∫_Ω (X+ − X−)(Y+ − Y−) dP
= ∫_Ω X+Y+ + X−Y− − X−Y+ − X+Y− dP
= E(X+)E(Y+) + E(X−)E(Y−) − E(X−)E(Y+) − E(X+)E(Y−)
= (E(X+) − E(X−))(E(Y+) − E(Y−)) = E(X)E(Y).
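The identity E(XY) = E(X)E(Y) for independent variables can be verified numerically; the particular distributions below are hypothetical choices for illustration:

```python
import random

# Numerical illustration of Proposition 3.25: for independent X and Y,
# the empirical mean of XY is close to the product of the empirical means.
random.seed(0)
n = 200_000
xs = [random.uniform(0, 1) for _ in range(n)]          # X ~ Uniform(0, 1)
ys = [random.choice((-2.0, 2.0)) for _ in range(n)]    # Y = ±2 with prob 1/2

mean_x = sum(xs) / n                                   # close to 1/2
mean_y = sum(ys) / n                                   # close to 0
mean_xy = sum(x * y for x, y in zip(xs, ys)) / n       # close to mean_x * mean_y
```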

3.3 Law of Large Numbers


Before we come to the main point of this subsection, we need to cover some
observations that are useful in computations.

Theorem 3.26 (Chebyshev's inequality). Let X be a real random variable on the probability space (Ω, M, P). Suppose that f : R → R≥0 is an even measurable function whose restriction to R≥0 is increasing. Let a ≥ 0 be any number with f(a) > 0. Then

P(|X| ≥ a) ≤ E(f(X))/f(a).

Proof. We simply observe:

E(f(X)) = ∫_Ω f ◦ X dP ≥ ∫_Ω (f ◦ X) χ_{(|X|≥a)} dP ≥ f(a) ∫_Ω χ_{(|X|≥a)} dP = f(a) P(|X| ≥ a).

Corollary 3.27. Let X be a real random variable on the probability space (Ω, M, P).

i. For all a, p > 0, one has P(|X| ≥ a) ≤ E(|X|^p)/a^p.

ii. If X has an expected value, then for all a > 0, one has P(|X − mX| ≥ a) ≤ Var(X)/a^2.

Proof. The first part follows from the general Chebyshev inequality for f(x) = |x|^p. The second part follows when applying it to X − mX as the real random variable and the function f(x) = x^2.
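Corollary 3.27(ii) can be sanity-checked numerically for X uniform on [0, 1], an illustrative choice with m_X = 1/2, Var(X) = 1/12, and P(|X − 1/2| ≥ 0.4) = 0.2 exactly:

```python
import random

# The empirical tail frequency must stay below the Chebyshev bound
# Var(X)/a^2, which here is quite loose: 0.2 versus roughly 0.52.
random.seed(2)
n = 100_000
samples = [random.uniform(0, 1) for _ in range(n)]
a = 0.4
tail_freq = sum(abs(x - 0.5) >= a for x in samples) / n   # close to 0.2
chebyshev_bound = (1 / 12) / a ** 2                       # about 0.52
```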

Definition 3.28. Let Xn be a sequence of real random variables defined on a probability space (Ω, M, P). Given a real random variable X on the same probability space, we say that Xn converges to X in probability if, for all ε > 0, one has P(|X − Xn| > ε) → 0 as n → ∞. One writes Xn →p X.

Theorem 3.29 (Weak Law of Large Numbers). Let Xn be a sequence of identically distributed real random variables defined on a probability space (Ω, M, P) that are pairwise independent. Suppose that every (or any) Xn has a mean m ∈ R. Then it follows that (1/n) ∑_{k=1}^n Xk →p m.

Proof. Since they are identically distributed, it follows that both the mean and the variance of Xn are the same for every n ≥ 1. We may assume without loss of generality m = 0. Let us first assume that the variance of Xn is finite and equal to σ^2 for all n ≥ 1. For every n ≥ 1, denote X̄n = (1/n) ∑_{k=1}^n Xk. We have that X̄n also has mean 0, and hence

Var(X̄n) = E(X̄n^2) = (1/n^2) E(∑_{j,k=1}^n Xj Xk) = nσ^2/n^2 = σ^2/n.

Here we used Proposition 3.25 for j ≠ k. By applying Corollary 3.27(ii), we may hence see

P(|X̄n| ≥ ε) ≤ E(X̄n^2)/ε^2 = σ^2/(nε^2) → 0 as n → ∞

for all ε > 0. Now for the general case, we allow the possibility that Xn has infinite variance. Let us still assume m = 0. For all n, N ≥ 1, write Xn = Xn^{≤N} + Xn^{>N}, where Xn^{≤N} = Xn · χ_{(|Xn|≤N)}. Then by the Monotone Convergence Theorem

aN := E(|Xn^{>N}|) = E(|Xn|) − E(|Xn^{≤N}|) → 0 as N → ∞,

where we are using that these values do not depend on n as the Xn were identically distributed. Let us also denote

X̄n^{≤N} = (1/n) ∑_{j=1}^n Xj^{≤N} and X̄n^{>N} = (1/n) ∑_{j=1}^n Xj^{>N}.

We have in particular for any ε > 0 that

P(|X̄n^{>N}| ≥ ε) ≤ E(|X̄n^{>N}|)/ε ≤ aN/ε.

So, given any δ > 0 with δ < 1 and any N ≥ 1 large enough such that aN ≤ δε/2, it follows that P(|X̄n^{>N}| ≥ ε/2) ≤ δ for all n ≥ 1. Hence

P(|X̄n| ≥ ε) ≤ P(|X̄n^{>N}| + |X̄n^{≤N}| ≥ ε)
≤ δ + P(|X̄n^{>N}| + |X̄n^{≤N}| ≥ ε and |X̄n^{>N}| ≤ ε/2)
≤ δ + P(|X̄n^{≤N}| ≥ ε/2)
= δ + P(|X̄n^{≤N} − (E(X̄n^{≤N}) + E(X̄n^{>N}))| ≥ ε/2)
≤ δ + P(|X̄n^{≤N} − E(X̄n^{≤N})| ≥ ε(1 − δ)/2)
→ δ as n → ∞.

Here we have used that E(X̄n^{≤N}) + E(X̄n^{>N}) = E(X̄n) = 0 and |E(X̄n^{>N})| ≤ aN ≤ δε/2, and that X̄n^{≤N} has finite variance, so the last probability falls into our previous subcase. As δ > 0 was arbitrary, this verifies X̄n →p 0 in general.
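The weak law can be illustrated by estimating the deviation probability P(|X̄n| > ε) for fair ±1 coin variables (so m = 0) at two sample sizes; all parameters below are arbitrary illustrative choices:

```python
import random

# Monte Carlo illustration of the Weak Law: the probability that the
# sample mean of n fair ±1 variables deviates from 0 by more than eps
# shrinks as n grows.
random.seed(3)

def deviation_prob(n, eps, trials=1000):
    """Estimate P(|mean of n fair ±1 variables| > eps) by simulation."""
    bad = 0
    for _ in range(trials):
        mean = sum(random.choice((-1, 1)) for _ in range(n)) / n
        if abs(mean) > eps:
            bad += 1
    return bad / trials

p_small_n = deviation_prob(n=50, eps=0.2)     # noticeably positive
p_large_n = deviation_prob(n=2000, eps=0.2)   # essentially zero
```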

There is also a stronger law, i.e., a theorem with a stronger conclusion than
the weak law of large numbers. We will prove this strong law under an
additional assumption. 10

Theorem 3.30 (Strong Law of Large Numbers). Let Xn be an independent sequence of identically distributed real random variables defined on a probability space (Ω, M, P). Suppose that every (or any) Xn has a fourth moment, meaning that E(Xn^4) < ∞. If m = E(Xn) is the mean of all these random variables, then it follows that P((1/n) ∑_{k=1}^n Xk → m) = 1.

Proof. We first note that applying Hölder's inequality twice yields E(|Xn|)^4 ≤ E(Xn^2)^2 ≤ E(Xn^4) =: M4, hence the assumptions on Xn ensure that m exists. As before, we assume without loss of generality m = 0 by replacing each Xn by Xn − m, if necessary. Denote X̄n = (1/n) ∑_{k=1}^n Xk. Then

E(X̄n^4) = (1/n^4) ∑_{i,j,k,l=1}^n E(Xi Xj Xk Xl).

Considering the summand over the tuple (i, j, k, l), it follows from the independence of the sequence Xn and Proposition 3.25 that it is zero if there is one entry in this tuple that is different from all the other three. In other words, the summand can only be non-zero if all four entries agree, or if the tuple has two different indices occurring exactly twice. Two different entries can possibly occur in the pattern (k, k, l, l), (k, l, k, l) or (k, l, l, k), leading to 3n(n − 1) possible summands. Using Proposition 3.25 once again, the expression hence simplifies to

E(X̄n^4) = (1/n^4) (∑_{k=1}^n E(Xk^4) + 3 ∑_{k≠l} E(Xk^2)E(Xl^2)).

By the very beginning of the proof, each summand in the bracket can be estimated above by M4, and there are a total of n + 3n(n − 1) ≤ 3n^2 summands, hence

E(X̄n^4) ≤ 3n^2 M4/n^4 = 3M4/n^2.

If we apply Corollary 3.27 for p = 4, it follows for every ε > 0 that

P(|X̄n| ≥ ε) ≤ 3M4/(n^2 ε^4).


10The conclusion of the theorem is actually true under the same assumptions
we had for the weak law. The proof of the general case is however quite
demanding, which is why we add some common extra assumptions allowing
for a much simpler proof. Anyone who is interested in the proof of the
general case is referred to, for example, “Theorem 22.1” in the book
“Probability and Measure” by Patrick Billingsley.

The right side is a summable sequence of non-negative numbers. Hence it follows by the Borel–Cantelli Lemma that

P(|X̄n| ≥ ε, infinitely often) = 0.

However, we note that by the definition of convergence of sequences, we have

(X̄n → 0)^c = ⋃_{ε∈Q, ε>0} (|X̄n| ≥ ε, infinitely often),

and hence it really follows that X̄n → 0 occurs almost surely.
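The strong law can be illustrated along a single sample path: the running means of repeated fair die rolls (a hypothetical choice, with mean m = 3.5) settle near m:

```python
import random

# One sample path for the Strong Law: running means (1/k) * sum of the
# first k die rolls, recorded every 50,000 rolls.
random.seed(4)
n_rolls = 200_000
running_sum = 0
running_means = []
for k in range(1, n_rolls + 1):
    running_sum += random.randint(1, 6)
    if k % 50_000 == 0:
        running_means.append(running_sum / k)

final_mean = running_means[-1]   # close to 3.5
```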

3.4 Central Limit Theorem

Definition 3.31. Given a real random variable X, one defines its characteristic function φX : R → C via φX(t) = E(e^{itX}).
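For a discrete random variable the characteristic function is a finite sum; for example, X = ±1 with probability 1/2 each gives φX(t) = (e^{it} + e^{−it})/2 = cos(t). A small sketch (the helper `phi` and the test point are illustrative):

```python
import cmath
import math

# The characteristic function of a discrete variable, E(e^{itX}), computed
# directly as a weighted sum of complex exponentials.
def phi(t, values, probs):
    """E(e^{itX}) for a discrete distribution given by values and probs."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))

t = 0.7
val = phi(t, values=(-1.0, 1.0), probs=(0.5, 0.5))   # equals cos(0.7)
```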

Theorem 3.32 (Lévy's Inversion Theorem). Let X be a real random variable with characteristic function φX and distribution PX. Then one has for all a, b ∈ R with a < b that

lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) φX(t) dt = (1/2)PX({a}) + PX((a, b)) + (1/2)PX({b}).

Proof of Theorem 3.32. Note first that due to l'Hôpital's rule, we have lim_{t→0} (e^{−ita} − e^{−itb})/t = i(b − a), so the function in the integral is to be understood as a continuous function in this sense. Using Fubini's theorem and φX(t) = E(e^{itX}), we may write

∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) φX(t) dt = ∫_R (∫_{−T}^{T} ((e^{it(x−a)} − e^{it(x−b)})/(it)) dt) dPX(x).

For the purpose of the proof, we define the function κ : R → R via κ(x) = ∫_0^x (sin t)/t dt for x ≥ 0, and κ(x) = −κ(−x) when x < 0. We will appeal (without proof) to a fact from calculus stating that lim_{x→∞} κ(x) = π/2. We obviously have the identity (d/dx) κ(x) = sin(x)/x for all x ≥ 0. Using the fact that cos is an even function, we hence observe that

∫_{−T}^{T} ((e^{it(x−a)} − e^{it(x−b)})/(it)) dt
= ∫_{−T}^{T} ((cos(t(x − a)) − cos(t(x − b)))/(it) + (sin(t(x − a)) − sin(t(x − b)))/t) dt
= ∫_{−T}^{T} ((sin(t(x − a)) − sin(t(x − b)))/t) dt
= 2κ(T(x − a)) − 2κ(T(x − b)).

In particular,

∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) φX(t) dt = 2 ∫_R (κ(T(x − a)) − κ(T(x − b))) dPX(x).

In order to determine what happens within the integral when T → ∞, we note that due to the initial remark about κ we obtain

lim_{T→∞} κ(T(x − a)) − κ(T(x − b)) = 0 if x < a or x > b, π if a < x < b, and π/2 if x = a or x = b.

Since PX is a probability measure, it follows from the Dominated Convergence Theorem that

lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/(it)) φX(t) dt = ∫_R ((1/2) χ_{{a,b}} + χ_{(a,b)}) dPX,

which is precisely the claim.

We will use the following fact from real analysis without proof:11
Theorem 3.33. A monotone function f : R → R can have at most countably
many points of discontinuity.

Corollary 3.34. Two real random variables are identically distributed if and
only if their characteristic functions are equal.

Proof. The “only if” part is clear, so we show the “if” part. Let X and Y be two real random variables. The distribution function FX(t) = PX((−∞, t]) is increasing, and therefore by the above theorem, it is discontinuous in at most countably many points. In analogy to Example 2.27, this means that we may have PX({t}) ≠ 0 for at most countably many t ∈ R. The same observation is of course true for Y in place of X. Let W ⊆ R be the subset of all points t with PX({t}) = 0 = PY({t}). Then W is co-countable and in particular dense. It is then an easy exercise to show that the semi-ring

AW = {(a, b] | a, b ∈ W, a < b} ⊆ 2^R

generates the Borel σ-algebra. By Theorem 2.17, it follows that PX = PY holds if and only if PX and PY agree on AW. If we assume φX = φY, then this is a direct consequence of Theorem 3.32.

11For those interested, see for example “Theorem 4.30” in the book “Principles of Mathematical Analysis” (third edition) by Walter Rudin.

Definition 3.35. A sequence Pn of Borel probability measures on R is said to weakly converge to a Borel probability measure P, if one has

lim_{n→∞} ∫_R f dPn = ∫_R f dP

for every compactly supported continuous function f : R → R. We write Pn →w P.

If Fn is the distribution function for Pn and F the distribution function for P, we say that Fn converges to F weakly, written Fn →w F, if Pn →w P.


Definition 3.36. A sequence of real random variables Xn is said to converge in distribution to a real random variable X, written Xn →d X, if PXn →w PX.

Remark. WARNING! Unlike the notions of convergence we considered before, convergence of real random variables in distribution is not well-behaved with respect to standard operations such as addition or multiplication. That is, if Xn →d X and Yn →d Y holds, it is not necessarily true that Xn + Yn →d X + Y or XnYn →d XY.

We also note that it is shown in the exercises that for a sequence Pn → w P,


the same kind of limit behavior follows for f being any bounded continuous
function on R, not necessarily compactly supported. We will subsequently
use this characterization of weak convergence without further mention.

Theorem 3.37. Let Fn, n ≥ 1, and F be distribution functions with respect to


Borel probability measures Pn and P. Let C(F ) ⊆ R be the set of all points
over which F is continuous. Then Fn → w F if and only if Fn(x) → F (x) for
all x ∈ C(F ).

Proof. Let us first assume Fn →w F holds. We shall first show the following intermediate claim: if A ⊆ R is a closed interval, then lim sup_{n→∞} Pn(A) ≤ P(A). Indeed, we may define a pointwise decreasing sequence of piecewise linear, compactly supported functions fk : R → R with fk|A = 1 and lim_{k→∞} fk(t) = 0 for all t ∉ A. Then

lim sup_{n→∞} Pn(A) ≤ lim sup_{n→∞} ∫_R fk dPn = ∫_R fk dP for every k ≥ 1.

Here we used the fact that Pn → w P. Letting k → ∞ on the right side, we


obtain P(A) as the limit by the Dominated Convergence Theorem, so the
intermediate claim is proved.

Now let x ∈ C(F). Then it follows from the intermediate claim that

lim sup_{n→∞} Fn(x) = lim sup_{n→∞} Pn((−∞, x]) ≤ P((−∞, x]) = F(x).

Since x is a continuity point of F, we also have P({x}) = 0 and hence

F(x) = P((−∞, x)) = 1 − P([x, ∞))
≤ 1 − lim sup_{n→∞} Pn([x, ∞))
= lim inf_{n→∞} Pn((−∞, x))
≤ lim inf_{n→∞} Fn(x).

Conversely, let us assume that Fn(x) → F (x) holds for all x ∈ C(F ). By
definition, we may easily observe Pn((a, b]) → P((a, b]) for all a, b ∈ C(F )
with a < b. Therefore, it follows that

∫ R f dPn → ∫ R f dP
holds whenever f belongs to the linear subspace spanned by all indicator
functions χ(a,b] for a, b ∈ C(F ). Since C(F ) is dense in R, it is easy to see
that the closure of this linear subspace with respect to the sup-norm ∥ · ∥∞
contains the space of continuous compactly supported functions. So (with a
standard ε/2-argument) we may conclude that the above limit behavior even
holds for f being any compactly supported continuous function.

Definition 3.38. A sequence of Borel probability measures (Pn)n on R is called tight, if

lim_{R→∞} lim inf_{n→∞} Pn([−R, R]) = 1.

Remark. In the exercise sessions, we will prove the following analytical statement, which is useful for the proof of the next theorem. Let Fn : R → [0, 1] be a sequence of increasing right-continuous functions satisfying the limit formula lim_{R→∞} Fn(R) − Fn(−R) = 1.

If the sequence Fn(x) is convergent for a dense set of numbers x ∈ R, then there exists an increasing right-continuous function F : R → [0, 1] such that Fn(x) → F(x) holds whenever F is continuous in x.

Suppose that a limit function F as above exists and that

lim_{R→∞} lim inf_{n→∞} Fn(R) − Fn(−R) = 1.

Then F is the distribution function for a Borel probability measure on R.

Theorem 3.39 (Helly's Selection Theorem). For any tight sequence (Pn)n of Borel probability measures on R, there exists a subsequence (Pnk)k and a Borel probability measure P on R such that Pnk →w P.

Proof. Let Fn be the distribution function of Pn for every n ≥ 1. Then the tightness of (Pn)n translates to

lim_{R→∞} lim inf_{n→∞} Fn(R) − Fn(−R) = 1.

Of course this tightness criterion holds for every subsequence as well. By Theorem 3.37 and the above remark, it suffices to show that there is some increasing sequence of natural numbers nk such that Fnk converges pointwise on a dense set of real numbers, for example on the rational numbers Q. Let N ∋ ℓ ↦ qℓ be an enumeration of Q. By Bolzano–Weierstrass, we know that there is some increasing sequence of numbers n(1, k) such that Fn(1,k)(q1) converges as k → ∞. Applying Bolzano–Weierstrass again, we know that (n(1, k))k admits a subsequence (n(2, k))k such that Fn(2,k)(q2) converges as k → ∞. Proceed like this inductively, and find finer and finer subsequences (n(ℓ, k))k such that Fn(ℓ,k)(qℓ) converges as k → ∞. Finally define nk = n(k, k). Then (nk)k is a subsequence of (n(ℓ, k))k (up to the finitely many indices k ≤ ℓ) for every ℓ ≥ 1, so indeed Fnk(qℓ) converges as k → ∞, for every ℓ ≥ 1. This finishes the proof.

Lemma 3.40. Let X be a real random variable. Then we have for all R > 0
the estimate

P(|X| > 2R) ≤ R ∫ −1/R 1/R 1 − φX(t) dt.

Proof. Using Fubini’s theorem we compute



R ∫ −1/R 1/R 1 − φX(t) dt

2 − R ∫ −1/R 1/R φX(t) dt

= 2 − R ∫ −1/R 1/R ∫ R eitx dPX(x) dt

= 2 − R ∫ R ∫ −1/R 1/R cos(tx) dt dPX(x) = 2 − 2R ∫ R ∫ 0 1/R cos(tx) dt


dPX(x) = 2 − 2R ∫ R sin(x/R) x dPX(x) = 2E ( 1 − sin(X/R) X/R )

Note that we have sin(X/R) X/R then |x/R| > 2, in which case | sin(x/R)| ≤ 1 ≤
|x|/2R, hence 1− sin(x/R) x/R ≥ 2 1 . This leads to the inequality χ(|X|>2R) ≤
2(1 − sin(X/R) X/R ), and hence forming the expected value on both sides
yields the claim.

≤ 1. Moreover, if any x ∈ R satisfies |x| > 2R,
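As a sanity check of the lemma (my own illustration, not from the notes): for the symmetric two-point variable X = ±a with probability 1/2 each, we have φX(t) = cos(at) and P(|X| > 2R) ∈ {0, 1}, and the right-hand side can be approximated by a Riemann sum:

```python
import math

def rhs(a, R, steps=20000):
    # R * ∫_{-1/R}^{1/R} (1 - cos(a t)) dt, via a midpoint Riemann sum
    h = (2.0 / R) / steps
    total = 0.0
    for k in range(steps):
        t = -1.0 / R + (k + 0.5) * h
        total += (1.0 - math.cos(a * t)) * h
    return R * total

# Check P(|X| > 2R) ≤ R ∫ (1 - φ_X(t)) dt on a grid of a and R
for a in [0.5, 1.0, 3.0, 10.0]:
    for R in [0.1, 0.5, 1.0, 2.0]:
        lhs = 1.0 if abs(a) > 2 * R else 0.0   # P(|X| > 2R) for X = ±a
        assert lhs <= rhs(a, R) + 1e-6
```

Here the integral even has the closed form 2(1 − sin(a/R)/(a/R)), which the Riemann sum reproduces to high accuracy.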

Theorem 3.41 (Lévy's Continuity Theorem). Let Xn and X be real random
variables for n ≥ 1. Then Xn →d X holds if and only if φXn(t) → φX(t) for
all t ∈ R.

Proof. The direction “⇒” holds by definition of the characteristic functions
and by what it means to converge in distribution. For the converse, let us
assume φXn(t) → φX(t) for all t ∈ R. Let us first show that the sequence
PXn is tight. Indeed, using the above lemma we observe for R > 0 that

PXn([−R, R]) = 1 − P(|Xn| > R)
≥ 1 − (R/2) ∫_{−2/R}^{2/R} (1 − φXn(t)) dt
→ 1 − (R/2) ∫_{−2/R}^{2/R} (1 − φX(t)) dt  as n → ∞.

Here we also used the Dominated Convergence Theorem. Using the fact that
φX is a continuous function with φX(0) = 1, we see that the integral
(R/2) ∫_{−2/R}^{2/R} (1 − φX(t)) dt goes to zero as R → ∞. Hence the above inequality
yields

lim_{R→∞} liminf_{n→∞} PXn([−R, R]) = 1,

or in other words the tightness of the sequence PXn.

In order to show Xn →d X we proceed by contradiction, and suppose that
Xn does not converge to X in distribution. This means that for some
compactly supported continuous function f : R → R, one has that ∫_ℝ f dPXn
does not converge to ∫_ℝ f dPX. Thus, after passing to a subsequence, we
may assume without loss of generality that

liminf_{n→∞} | ∫_ℝ f dPXn − ∫_ℝ f dPX | > 0.
By Theorem 3.39, we may again pass to a subsequence and assume without
loss of generality that Xn →d Y for some real random variable Y. By the
“⇒” part, we obtain φX(t) = lim_{n→∞} φXn(t) = φY(t) for all t ∈ R. By
Corollary 3.34, it follows that PX = PY, which contradicts the assumption
above. We conclude that Xn →d X must hold in the first place.
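A concrete illustration of the equivalence (my own sketch, using the standard facts that N(0, σ²) has characteristic function e^{−σ²t²/2} and a distribution function expressible through the error function; the helper names `phi` and `cdf` are mine): for Xn ~ N(0, 1 + 1/n), both the characteristic functions and the distribution functions converge to those of X ~ N(0, 1).

```python
import math

def phi(sigma2, t):
    # characteristic function of N(0, sigma2)
    return math.exp(-sigma2 * t * t / 2.0)

def cdf(sigma2, x):
    # distribution function of N(0, sigma2), via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * sigma2)))

n = 10**4
ts = [k / 10.0 for k in range(-50, 51)]
xs = [k / 10.0 for k in range(-40, 41)]

# pointwise convergence of the characteristic functions φ_{X_n} → φ_X ...
assert max(abs(phi(1 + 1/n, t) - phi(1.0, t)) for t in ts) < 1e-3
# ... is accompanied by convergence of the distribution functions
assert max(abs(cdf(1 + 1/n, x) - cdf(1.0, x)) for x in xs) < 1e-3
```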

Lemma 3.42. For all x ∈ R and n ≥ 0, we have the inequality

|e^{ix} − ∑_{k=0}^{n} (ix)^k/k!| ≤ min( 2|x|^n/n! , |x|^{n+1}/(n+1)! ).

Proof. We proceed by induction in n. We note that it suffices to consider the
case x ≥ 0, since we may obtain the claim for x < 0 by complex conjugation of
the involved terms. Let us first prove that 2|x|^n/n! is an upper
bound. For n = 0, this is equal to 2, and since e^{ix} has modulus one, this follows
from the triangle inequality. Now assume that this inequality holds for a given
number n ≥ 0 and all x ≥ 0. We then compute

|e^{ix} − ∑_{k=0}^{n+1} (ix)^k/k!| = | ∫_0^x i( e^{it} − ∑_{k=0}^{n} (it)^k/k! ) dt |
≤ ∫_0^x | e^{it} − ∑_{k=0}^{n} (it)^k/k! | dt
≤ ∫_0^x 2t^n/n! dt = 2x^{n+1}/(n+1)!.

We can also proceed by induction to show the upper bound |x|^{n+1}/(n+1)!.
Here one can first consider n = −1, in which case both the left and the right side
equal 1. One can then perform the induction step (n − 1) → n with a completely
analogous calculation as above, namely

|e^{ix} − ∑_{k=0}^{n} (ix)^k/k!| = | ∫_0^x i( e^{it} − ∑_{k=0}^{n−1} (it)^k/k! ) dt |
≤ ∫_0^x | e^{it} − ∑_{k=0}^{n−1} (it)^k/k! | dt
≤ ∫_0^x t^n/n! dt = x^{n+1}/(n+1)!.
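Both bounds of Lemma 3.42 can be verified numerically (my own sanity check, scanning a grid of x and n):

```python
import cmath
import math

def taylor_error(x, n):
    # |e^{ix} - sum_{k=0}^{n} (ix)^k / k!|
    s = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * x) - s)

# Check the error against min(2|x|^n/n!, |x|^{n+1}/(n+1)!) on a grid
for n in range(0, 8):
    for x in [k / 4.0 for k in range(-40, 41)]:
        bound = min(2 * abs(x) ** n / math.factorial(n),
                    abs(x) ** (n + 1) / math.factorial(n + 1))
        assert taylor_error(x, n) <= bound + 1e-9
```

Note which bound is smaller depends on |x|: for |x| ≤ 2(n+1) the factorial bound wins, which is exactly how the lemma is used in the proof of Lemma 3.43.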

Lemma 3.43. Let X be a real random variable such that E(X) = 0 and E(X²) = 1.
Then one has the limit formula

lim_{n→∞} φX(t/√n)^n = e^{−t²/2},  t ∈ R.

Proof. As an application of Lemma 3.42 (with n = 2 and x = tX), we obtain for all t ∈ R the estimate

|e^{itX} − ( 1 + itX − t²X²/2 )| ≤ min( t²X² , |t|³|X|³/6 ) = t² min( X² , |t||X|³/6 ).

In particular, since X has a second absolute moment, we see that the random
variable on the left has a mean. Moreover, the expected value of the random
variable X⁰ₜ = min( X² , |t||X|³/6 ) tends to zero as t → 0 as a consequence
of the Dominated Convergence Theorem. So by integrating and applying the
triangle inequality, we see

|φX(t) − 1 + t²/2| ≤ E( |e^{itX} − 1 − itX + t²X²/2| ) ≤ t² E(X⁰ₜ).

If we substitute t → t/√n, this leads to the observation that

xn = n( φX(t/√n) − (1 − t²/(2n)) ) → 0  as n → ∞,

since |xn| ≤ t² E(X⁰ₛ) for s = t/√n, and E(X⁰ₛ) → 0 as s → 0.

Using the fact from calculus that one has convergence

e^x = lim_{n→∞} (1 + x/n)^n

uniformly over bounded intervals, we finally conclude that

φX(t/√n)^n = ( 1 + (φX(t/√n) − 1) )^n = ( 1 − (t²/2 − xn)/n )^n → e^{−t²/2}  as n → ∞.
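A concrete instance of the lemma (my own check): for a Rademacher variable X, taking the values ±1 with probability 1/2 each, one has E(X) = 0, E(X²) = 1 and φX(t) = cos(t), so the lemma predicts cos(t/√n)^n → e^{−t²/2}:

```python
import math

def phi_power(t, n):
    # φ_X(t/√n)^n for the Rademacher variable X = ±1, where φ_X(t) = cos(t)
    return math.cos(t / math.sqrt(n)) ** n

for t in [0.0, 0.5, 1.0, 2.0]:
    target = math.exp(-t * t / 2.0)
    errs = [abs(phi_power(t, n) - target) for n in (10, 100, 1000, 10000)]
    assert errs[0] >= errs[-1]   # the approximation improves with n
    assert errs[-1] < 1e-4       # and is already very close at n = 10000
```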
Theorem 3.44 (Central Limit Theorem). Let Xn be an independent sequence
of identically distributed real random variables with mean E(Xn) = 0 and
variance E(Xn²) = 1. Consider the standard normal variable N given by

PN((a, b]) = (1/√(2π)) ∫_a^b e^{−t²/2} dt.

Then one has

(1/√n) ∑_{k=1}^{n} Xk →d N.

Proof. Since the Xn are identically distributed, they all have the same characteristic
function, which we shall denote by φX. Denote Yn = n^{−1/2} ∑_{k=1}^{n} Xk,
so that we wish to show Yn →d N. Using that the Xn are independent,
we observe for all t ∈ R that

φYn(t) = E(e^{itYn}) = E(e^{i(t/√n) ∑_{k=1}^{n} Xk})
= φ_{X1+···+Xn}(t/√n) = ∏_{k=1}^{n} φXk(t/√n) = φX(t/√n)^n.

Given that the standard normal variable has the characteristic function φN(t)
= e^{−t²/2}, the proof is complete by combining Lemma 3.43 and Theorem 3.41.
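The theorem can be watched in action with a small Monte Carlo experiment (my own sketch, again with Rademacher summands; the sample sizes are arbitrary):

```python
import math
import random

random.seed(0)

def normal_cdf(x):
    # Φ(x) for the standard normal, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, samples = 500, 4000
values = []
for _ in range(samples):
    s = sum(random.choice((-1, 1)) for _ in range(n))   # Σ X_k, Rademacher X_k
    values.append(s / math.sqrt(n))                     # normalize by √n

# the empirical distribution function of (1/√n) Σ X_k should be close to Φ
for x in (-1.0, 0.0, 0.5, 1.0):
    empirical = sum(1 for v in values if v <= x) / samples
    assert abs(empirical - normal_cdf(x)) < 0.06
```

The tolerance 0.06 is generous: it absorbs both the Monte Carlo noise of 4000 samples and the lattice effect of the ±1 summands at n = 500.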

Corollary 3.45. Let Xn be an independent sequence of identically distributed
real random variables with mean E(Xn) = 0 and finite variance E(Xn²) = σ².
Then one has

(1/√n) ∑_{k=1}^{n} Xk →d N(0, σ),

where N(0, σ) is the real random variable given by the distribution

P((a, b]) = (1/(σ√(2π))) ∫_a^b e^{−t²/(2σ²)} dt.
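For instance (my own illustration): if the Xn are uniform on [−1, 1], then E(Xn) = 0 and σ² = E(Xn²) = 1/3, so by the corollary the normalized sums should behave like N(0, σ) with σ = 1/√3:

```python
import math
import random

random.seed(1)

n, samples = 500, 4000
sigma = math.sqrt(1.0 / 3.0)   # standard deviation of Uniform[-1, 1]
values = [sum(random.uniform(-1, 1) for _ in range(n)) / math.sqrt(n)
          for _ in range(samples)]

# the normalized sums have mean ≈ 0 and variance ≈ σ² = 1/3 ...
mean = sum(values) / samples
var = sum((v - mean) ** 2 for v in values) / samples
assert abs(mean) < 0.05
assert abs(var - 1.0 / 3.0) < 0.05

# ... and their distribution function is close to that of N(0, σ):
# P(v ≤ σ) should be approximately Φ(1) ≈ 0.8413
empirical = sum(1 for v in values if v <= sigma) / samples
assert abs(empirical - 0.8413) < 0.05
```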
