Notes GSzabo
Contents
1.1 σ-Algebras and Measurable Spaces
3 Probability
3.1 Probability spaces and random variables
3.2 Independence
E ∩ F = (E^c ∪ F^c)^c ,
E \ F = (E^c ∪ F)^c .
Example 1.3. The entire power set 2^Ω of Ω is the largest possible σ-algebra on Ω, whereas {∅, Ω} is the smallest one. We may also consider

M = { E ⊆ Ω | E or E^c is countable } ,
Proof. It is a trivial exercise to prove the first part of the statement. For the second part, if S is given, consider the family { M ⊆ 2^Ω | M is a σ-algebra with S ⊆ M }. Since this family contains 2^Ω, it is non-empty, and hence the intersection of this family is the smallest possible σ-algebra that contains S.
T1. ∅, Ω ∈ T .
T2. For any family of sets O ⊆ T , one has ⋃ O ∈ T .
T3. For O1, O2 ∈ T , one has O1 ∩ O2 ∈ T .
Then Ω has a
Proof. In light of the previous remark, the second part of the statement
follows automatically if we can show that the metric topology on Ω has a
countable base consisting of open balls. As we assumed Ω to be separable,
choose a countable dense set D ⊆ Ω, and consider
B = {B(x, r) | x ∈ D, 0 < r ∈ Q} .
This is a countable family of open balls, and we claim that it is a base for the
metric topology. Indeed, let O ⊆ Ω be an open set. For x ∈ D ∩ O, set Rx =
{0 < r ∈ Q | B(x, r) ⊆ O}. Then evidently
⋃ {B(x, r) | x ∈ D ∩ O, r ∈ Rx} ⊆ O.
Singleton sets are also Borel since they are closed. To summarize, there are
in general many more Borel sets than open sets. Note that in light of the
previous remark, the Borel-σ-algebra on R is generated by all open intervals.
Proposition 1.11. Let (Ω1,M1) and (Ω2,M2) be two measurable spaces and f : Ω1 → Ω2 a map. Suppose that M2 is the σ-algebra generated by a set S ⊆ 2^{Ω2}. Then f is measurable if and only if for all O ∈ S, one has

f^{-1}(O) ∈ M1.
Proof. The “only if” part is trivial, so let us consider the “if” part. Consider the set

M = { O ⊆ Ω2 | f^{-1}(O) ∈ M1 } .
Theorem 1.13. Let (Ω,M) be a measurable space and (Y, d) a metric space.
We equip Y with the Borel-σ-algebra associated to the metric topology. Suppose that a sequence of measurable functions fn : Ω → Y converges to a map
f : Ω → Y pointwise. Then f is measurable.
Proof. It suffices to show that f^{-1}(C) ∈ M for every closed set C ⊆ Y. For such C, write

C = ⋂_{k∈N} { y ∈ Y | d(y, C) < 1/k } =: ⋂_{k∈N} Ck.
Then each of the sets Ck is open. For every x ∈ Ω, we use fn(x) → f(x) and
observe
x ∈ f^{-1}(C) ⇔ f(x) ∈ C ⇔ ∀ k ∈ N : f(x) ∈ Ck ⇔ ∀ k ∈ N ∃ N ∈ N ∀ n ≥ N : fn(x) ∈ Ck,

where the last equivalence uses fn(x) → f(x) and the fact that each Ck is open. Hence f^{-1}(C) = ⋂_{k∈N} ⋃_{N∈N} ⋂_{n≥N} fn^{-1}(Ck) ∈ M.
Notation 1.14. We will equip [0, ∞] := [0, ∞) ∪ {∞} with the topology of the one-point compactification, that is, we define a subset O ⊆ [0, ∞] to be open when the following holds: either ∞ ∉ O and O is open in [0, ∞), or ∞ ∈ O and [0, ∞] \ O is a compact subset of [0, ∞). We extend addition and multiplication via

x + ∞ := ∞,   x · ∞ := ∞ (x > 0),   0 · ∞ := 0.
Then the addition map + : [0, ∞] × [0, ∞] → [0, ∞] is continuous, but this is not true for the multiplication map. We also extend the usual order relation “≤” of numbers to [0, ∞] in the obvious way.
1.3 Measures on σ-algebras, Measure Spaces
Definition 1.15. Let (Ω,M) be a measurable space. A (positive) measure µ
on (Ω,M) is a map M → [0, ∞] satisfying:

i. µ(∅) = 0.
ii. If En ∈ M is a sequence of pairwise disjoint sets, then µ(⋃_{n∈N} En) = ∑_{n∈N} µ(En). (σ-additivity)

The triple (Ω,M, µ) is called a measure space. If µ(Ω) < ∞, we call µ a finite
measure, and if more specifically µ(Ω) = 1, we call it a probability measure
and the triple (Ω,M, µ) a probability space. If there exists a sequence En ∈ M
with µ(En) < ∞ and Ω = ⋃ n∈N En, then we say that µ is σ-finite.
the series ∑_{n∈N} µ(En) is a series in [0, ∞], which we may define as the supremum sup_{N≥1} ∑_{n=1}^{N} µ(En).
σ-additivity implies finite additivity: If E1, E2 ∈ M are two disjoint
sets, then
µ(E1 ∪ E2) = µ(E1 ∪ E2 ∪ ∅ ∪ ∅ ∪ . . . ) = µ(E1) + µ(E2) + µ(∅) + µ(∅) + · · · = µ(E1) + µ(E2),

since µ(∅) = 0.
1Keep in mind that we implicitly force everything to be commutative, so the
order of addition and multiplication does not matter here by default.
µ(E) = µ(E ∪ ∅ ∪ ∅ ∪ . . . )
= µ(E) + ∞ · µ(∅).
If µ(E) < ∞, then the above can only happen when µ(∅) = 0.
Example 1.17 (Counting measure). For any non-empty set Ω, we can consider the σ-algebra 2^Ω and define a measure µ via
µ(E) = #E if E is finite, and µ(E) = ∞ if E is infinite.
δx(E) = 1 if x ∈ E, and δx(E) = 0 if x ∉ E.
Proposition 1.20. Let (Ω,M, µ) be a measure space. Then µ has the following extra properties:
µ ( ⋃_{n∈N} En ) = µ ( ⋃_{n∈N} Fn ) = ∑_{n∈N} µ(Fn) = sup_{n∈N} ∑_{k≤n} µ(Fk) = sup_{n∈N} µ(En),

and in the decreasing case,

µ(E1) − µ ( ⋂_{n∈N} En ) = µ ( ⋃_{n∈N} (E1 \ En) ) = sup_{n∈N} µ(E1 \ En) = µ(E1) − inf_{n∈N} µ(En).

This implies the claim.

Remark 1.21. In condition (v) above, it is really
necessary to assume that at least one of the sets En has finite measure. For
example, we may consider Ω = N with the counting measure µ, and the sets
En = {k ∈ N | k ≥ n}. Then µ(En) = ∞ for all n, the sequence is decreasing,
and ⋂ n∈N En = ∅.
Proposition 1.22. Let (Ω,M) be a measurable space with two measures µ1,
µ2. For any numbers α1, α2 ≥ 0, the map

E ↦ α1 µ1(E) + α2 µ2(E)

is a measure.
Notation 1.24. For brevity, we may also say “P holds a.e.” (a.e.=almost
everywhere) or “P (x) holds for (µ-)almost all x”.
χE : Ω → {0, 1} ,  χE(x) = 1 if x ∈ E, and χE(x) = 0 if x ∉ E.
s = ∑_{λ∈s(Ω)} λ · χ_{s^{-1}(λ)}.
Proof. The “only if” part is clear. Since we assumed the σ-algebra on Y to
contain all singleton sets, it follows in particular that s−1(y) ∈ M for all y ∈
Y.
For the “if” part, let A be a measurable subset of Y . Then s−1(A) = ⋃
λ∈s(Ω) s−1(A ∩ {λ}). By assumption this is a finite union of sets of the form
s−1(λ) for λ ∈ s(Ω), and hence if these are measurable, then so is s−1(A).
∫_Ω s dµ := ∑_{λ∈s(Ω)} λ · µ(s^{-1}(λ)).
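To make the definition concrete, here is a small sketch that evaluates this λ-sum for a simple function on a finite set with the counting measure. The helper names (`step_integral`, `counting`) are illustrative and not from the notes.

```python
from collections import defaultdict

def step_integral(s, omega, mu):
    """Integral of a simple function s over a finite set omega:
    the sum over the values lambda of s of lambda * mu(s^{-1}(lambda))."""
    preimages = defaultdict(list)
    for x in omega:
        preimages[s(x)].append(x)
    return sum(lam * mu(pts) for lam, pts in preimages.items())

# Counting measure on a finite sample space: mu(E) = #E.
counting = len

omega = range(10)
s = lambda x: 2.0 if x < 3 else 0.5   # simple function with values {2.0, 0.5}

# 3 points with value 2.0 and 7 points with value 0.5:
# integral = 2.0*3 + 0.5*7 = 9.5
print(step_integral(s, omega, counting))
```

The grouping into preimages mirrors the canonical form of s from the text.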
Proof. (i): If the union ⋃_{j=1}^{k} Aj is not Ω, then we may set A_{k+1} = Ω \ ⋃_{j=1}^{k} Aj and α_{k+1} = 0. Then we still have s = ∑_{j=1}^{k+1} αj · χ_{Aj} and the sets A1, . . . , A_{k+1} are pairwise disjoint. Clearly our claim boils down to the same claim for this expression of s, so we may simply assume without loss of generality that we had Ω = ⊔_{j=1}^{k} Aj to begin with.
Due to this assumption, we can notice that every value λ ∈ s(Ω) has to equal one of the coefficients αj, and conversely every such coefficient with Aj ≠ ∅ is in s(Ω). Upon grouping all such indices together, we can observe

s^{-1}(λ) = ⊔_{j : αj = λ} Aj,  λ ∈ s(Ω).

Hence

∫_Ω s dµ = ∑_{λ∈s(Ω)} λ · µ(s^{-1}(λ)) = ∑_{λ∈s(Ω)} λ · ∑_{j : αj = λ} µ(Aj) = ∑_{j=1}^{k} αj µ(Aj).
(ii): We keep the earlier assumption that Ω = ⊔_{j=1}^{k} Aj. We can write t = ∑_{i=1}^{ℓ} βi χ_{Bi} for finitely many coefficients β1, . . . , βℓ ≥ 0 and sets B1, . . . , Bℓ ∈ M with Ω = ⊔_{i=1}^{ℓ} Bi, for instance via the canonical form of t. Then we can write
s = ∑_{j=1}^{k} αj · ∑_{i=1}^{ℓ} χ_{Aj ∩ Bi} ,   t = ∑_{i=1}^{ℓ} βi · ∑_{j=1}^{k} χ_{Aj ∩ Bi} ,

s + t = ∑_{j=1}^{k} ∑_{i=1}^{ℓ} (αj + βi) χ_{Aj ∩ Bi} ,

where Ω = ⊔_{i=1}^{ℓ} ⊔_{j=1}^{k} (Aj ∩ Bi). Using part (i) and the additivity of µ,
we obtain
∫_Ω (s + t) dµ = ∑_{j=1}^{k} ∑_{i=1}^{ℓ} (αj + βi) µ(Aj ∩ Bi) = ∑_{j=1}^{k} αj µ(Aj) + ∑_{i=1}^{ℓ} βi µ(Bi) = ∫_Ω s dµ + ∫_Ω t dµ.
(iii): We keep in mind that every summand αjχAj can itself be understood as
a positive measurable step function whose integral is equal to αjµ(Aj) by
part (i). Since we have already proved in part (ii) that the integral
construction on such functions is additive, this yields the desired sum
expression.
s = ∑_{j=1}^{k} αj χ_{Aj} ,   t = ∑_{i=1}^{ℓ} βi χ_{Bi}

with Ω = ⊔_{j=1}^{k} Aj = ⊔_{i=1}^{ℓ} Bi, for example via their canonical forms. Then we can also write

s = ∑_{j=1}^{k} αj · ∑_{i=1}^{ℓ} χ_{Aj ∩ Bi} ,   t = ∑_{i=1}^{ℓ} βi · ∑_{j=1}^{k} χ_{Aj ∩ Bi} .
ν ( ⋃_{n∈N} En ) = µ ( A ∩ ⋃_{n∈N} En ) = µ ( ⋃_{n∈N} (A ∩ En) ) = ∑_{n∈N} µ(A ∩ En) = ∑_{n∈N} ν(En).
Definition 1.32. Let (Ω,M, µ) be a measure space. For a measurable function f : Ω → [0, ∞], we define its integral as

∫_Ω f dµ = sup { ∫_Ω s dµ | s is a positive measurable step function with s ≤ f } .

We call f integrable when ∫_Ω f dµ < ∞.
Remark 1.33. We should first convince ourselves that the new definition of the integral does not contradict the old one in the case where f is assumed to be a step function. But indeed, s = f is the largest step function with s ≤ f , so the supremum above is attained at s = f and agrees with the integral from Definition 1.28.
Secondly, we remark that the above definition a priori makes sense even
when f is not assumed to be measurable. However, we will see in the
exercises that the resulting notion of integral will have some undesirable
properties when evaluated on non-measurable functions, for example not
being additive.
Proof. We already know that the “if” part is true. For the “only if” part,
assume that f is measurable. We claim that it suffices to consider the special case f = id as a function [0, ∞] → [0, ∞]. Indeed, if we can realize id = sup_{n∈N} tn for an increasing sequence of positive measurable step functions tn, then

f = id ◦ f = lim_{n→∞} tn ◦ f = sup_{n∈N} (tn ◦ f),

where sn = tn ◦ f is an increasing sequence of positive measurable step functions. But we can come up with the following sequence tn, which is easily seen to do the trick:

tn = ∑_{k=1}^{n·2^n} (k − 1) 2^{−n} · χ_{[(k−1)2^{−n}, k·2^{−n})} + n · χ_{[n,∞]}.
sup_{n∈N} ∫_Ω sn dµ.
4In exactly one instance later, we will use an extra property that we arrange
in our proof. Namely, we can find (sn)n with |f(x) − sn(x)| ≤ 2−n for all x ∈ Ω
with f(x) ≤ n, and sn(x) = n whenever f(x) ≥ n.
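The dyadic sequence tn from the proof can be sketched numerically. Assuming the reconstruction tn(x) = min(n, ⌊2^n x⌋/2^n), one can check the monotonicity and the 2^{−n} error bound mentioned in footnote 4:

```python
import math

def t_n(n, x):
    """Dyadic approximation from the proof: truncate at n and round
    x down to a multiple of 2^{-n}.  t_n increases pointwise to id."""
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n

x = math.pi
approx = [t_n(n, x) for n in range(1, 8)]
# t_n(x) increases towards x, with error at most 2^{-n} once n >= x
assert all(a <= b for a, b in zip(approx, approx[1:]))
assert all(abs(x - t_n(n, x)) <= 2**-n for n in range(4, 8))
```

Each tn takes only finitely many values, so it is a positive measurable step function on [0, ∞].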
Proposition 1.20(iv) that
c ∫ Ω s dµ cν(Ω)
Proof. Both f and g can be realized as f = sup_{n∈N} sn and g = sup_{n∈N} tn for increasing sequences of positive measurable step functions, so that f + g = sup_{n∈N} (sn + tn). Applying the Monotone Convergence Theorem and the additivity of the integral on step functions, we get

∫_Ω (f + g) dµ = sup_{n∈N} ∫_Ω (sn + tn) dµ = sup_{n∈N} ( ∫_Ω sn dµ + ∫_Ω tn dµ ) = ∫_Ω f dµ + ∫_Ω g dµ.
For the converse, assume that f(x) = 0 for µ-almost all x ∈ Ω, which means that the set E above is a null set. If s : Ω → [0, ∞) is a simple measurable function with s ≤ f , then it means in particular that s^{-1}(λ) ⊆ E for all λ ≠ 0. This immediately implies ∫_Ω s dµ = 0, but hence also ∫_Ω f dµ = 0 by definition.
u = u+ − u− , v = v+ − v− , u+u− = 0, v+v− = 0,
Definition 1.40. Let (Ω,M, µ) be a measure space. We say that a measurable function f : Ω → C is integrable, if |f| is integrable in the sense of Definition 1.32. In this case we define the integral as

∫_Ω f dµ = ∫_Ω u+ dµ − ∫_Ω u− dµ + i ( ∫_Ω v+ dµ − ∫_Ω v− dµ ) ,
where we use the decomposition from Remark 1.39. The set of all such
functions is denoted L1(Ω,M, µ) or L1(µ).
Theorem 1.41. L1(Ω,M, µ) is a complex vector space with the usual operations. Moreover the integral

L1(Ω,M, µ) → C,  f ↦ ∫_Ω f dµ

is a linear map.
Proof. The fact that L1 is a vector space is left as an exercise. Let us proceed
to prove linearity of the integral.
∫_Ω h+ dµ + ∫_Ω f− dµ + ∫_Ω g− dµ = ∫_Ω h− dµ + ∫_Ω f+ dµ + ∫_Ω g+ dµ.

All of these summands are finite, and hence we can rearrange this equation to

∫_Ω h+ dµ − ∫_Ω h− dµ = ∫_Ω f+ dµ − ∫_Ω f− dµ + ∫_Ω g+ dµ − ∫_Ω g− dµ,

and hence ∫_Ω h dµ = ∫_Ω f dµ + ∫_Ω g dµ. It is also clear from the definition
that ∫ Ω f + ig dµ = ∫ Ω f dµ + i ∫ Ω g dµ. These two equations imply together
that the integral is indeed additive.
= α1 ∫_Ω f dµ − α2 ∫_Ω g dµ + i ( α1 ∫_Ω g dµ + α2 ∫_Ω f dµ )
= α ( ∫_Ω f dµ + i ∫_Ω g dµ )
= α ∫_Ω (f + ig) dµ.
0 ≤ ∫_Ω f dµ = ∫_Ω u dµ ≤ ∫_Ω u+ dµ ≤ ∫_Ω |f| dµ, using u+ ≤ |f| in the last step.
Proposition 1.43. Let (Ω,M, µ) be a measure space and f : Ω → C a
measurable function.
(1/µ(E)) ∫_Ω f χE dµ ∈ F.
Proof. From Proposition 1.43 we get that indeed N ⊆ L1(Ω,M, µ), and the
second equivalence of the statement is clear.
Definition 1.46. Let (Ω,M, µ) be a measure space. Then in light of the above, we define the quotient vector space L1(Ω,M, µ) as the space of integrable functions modulo the subspace N of null functions. The map

L1(Ω,M, µ) → C,  [f] ↦ ∫_Ω f dµ

is a well-defined linear map, which we will call the integral. Since f ∈ N happens precisely when |f| ∈ N , it makes sense to define |[f]| = [|f|]. Consequently, the semi-norm ∥ · ∥1 given by

∥f∥1 = ∫_Ω |f| dµ

becomes a norm (cf. Proposition 1.38) on the L1-space.
The “≥” relation is on the other hand clear from the fact that the integral is
monotone. This finishes the proof.
5But this is justified by the fact that we will mostly form integrals, which do
not depend on the representative for such a coset.
∫_Ω (lim inf_{n→∞} fn) dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.

Proof. Denote gk = inf_{n≥k} fn, and recall that lim inf_{n→∞} fn = sup_{k≥1} gk. We thus see that lim inf_{n→∞} fn is a measurable function. If n ≥ k, then evidently gk ≤ fn, so ∫_Ω gk dµ ≤ ∫_Ω fn dµ. In particular, this is true for arbitrarily large n, so ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ. Using the Monotone Convergence Theorem, we thus see

∫_Ω lim inf_{n→∞} fn dµ = sup_{k≥1} ∫_Ω gk dµ ≤ lim inf_{n→∞} ∫_Ω fn dµ.
Remark 1.51. The Dominated Convergence Theorem really only holds for
sequences of functions, and its analogous generalizations for more general
families of functions (such as nets) are false. A counterexample is discussed
in the exercise sessions.
Theorem 1.52. Let (Ω,M, µ) be a measure space. Suppose that a sequence fn
∈ L1(Ω,M, µ) satisfies the Cauchy criterion in the semi-norm ∥·∥1. Then
there exists a subsequence (fnk)k and a function f ∈ L1(Ω,M, µ) such that
Passing to a further subsequence, we may assume ∥f_{n_{k+1}} − f_{n_k}∥1 ≤ 2^{−k} for all k ≥ 1, and we set g = ∑_{k=1}^{∞} |f_{n_{k+1}} − f_{n_k}|. By the Monotone Convergence Theorem, ∫_Ω g dµ < ∞, so the set E = { x ∈ Ω | g(x) < ∞ } has a null complement, and

f(x) = lim_{k→∞} f_{n_k}(x) = f_{n_1}(x) + ∑_{k=1}^{∞} ( f_{n_{k+1}}(x) − f_{n_k}(x) ),  x ∈ E,

is well defined and measurable on E. We extend f to a measurable function
on Ω by defining it to be zero on the complement of E. We get by the triangle
inequality that for all k ≥ 1, the function fnk is dominated (on E) by the
integrable function |fn1|+g. Therefore it follows from the Dominated
Convergence Theorem that ∥f − f_{n_k}∥1 → 0 as k → ∞. Since the sequence (fn)n was assumed to satisfy the Cauchy criterion in ∥ · ∥1 and we just showed that a subsequence converges to f in this semi-norm, it follows that also ∥f − fn∥1 → 0. This finishes the proof.
φ∗M1 := { E ⊆ Ω2 | φ^{-1}(E) ∈ M1 }

is a σ-algebra, and

φ∗µ1(E) := µ1(φ^{-1}(E))

defines a measure on it. These are called the push-forward σ-algebra and the push-forward measure with respect to φ. With respect to this measure space structure on Ω2, φ becomes a measurable map, and for every measurable function f : Ω2 → [0, ∞], the following equation makes sense and holds:

∫_{Ω2} f dµ2 = ∫_{Ω1} f ◦ φ dµ1.
In other words, the claim holds for functions of the form f = χE. By linearity
of the integral, the desired equation holds for all positive measurable step
functions in place of f . Now let f be as general as in the statement, and write
f = supn∈N sn for an increasing sequence of positive measurable step
functions sn : Ω2 → [0, ∞), using Proposition 1.35. Clearly we also have f ◦
φ = supn∈N sn ◦ φ. Then by Proposition 1.36, we see that
Remark 1.55. In the above theorem, we pushed forward the measure space
structure from Ω1 to get one on Ω2 which makes the statement of the theorem
true. It may of course happen that we have an a priori given measure space
(Ω2,M2, µ2) and a measurable map φ : Ω1 → Ω2 with the property that
µ1(φ−1(E)) = µ2(E) for all E ∈ M2. Convince yourself that the statement of
the theorem will still be true!
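As a toy illustration of the push-forward and the transformation formula (the weights, the map φ, and the function f below are invented for the example), one can check ∫ f d(φ∗µ1) = ∫ f∘φ dµ1 for a finite measure given by point masses:

```python
from collections import Counter

# A finite measure on Omega1 given by point masses (here: uniform weights).
omega1 = [0, 1, 2, 3, 4, 5]
mu1 = {x: 1/6 for x in omega1}          # illustrative weights, not from the notes

phi = lambda x: x % 2                    # measurable map Omega1 -> Omega2 = {0, 1}

# Push-forward measure: (phi_* mu1)(E) = mu1(phi^{-1}(E)), accumulated per point.
pushed = Counter()
for x, w in mu1.items():
    pushed[phi(x)] += w

f = lambda y: 10.0 if y == 0 else 3.0

lhs = sum(f(y) * w for y, w in pushed.items())       # integral of f w.r.t. phi_* mu1
rhs = sum(f(phi(x)) * w for x, w in mu1.items())     # integral of f∘phi w.r.t. mu1
assert abs(lhs - rhs) < 1e-12
```

For discrete measures the formula reduces to regrouping the sum along the fibers of φ, which is exactly the step-function case of the proof.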
2 Carathéodory’s Construction of
Measures
2.1 Measures on Semi-Rings and Rings
Definition 2.1. Let Ω be a non-empty set. A semi-ring on Ω is a set of subsets
A ⊆ 2Ω such that
1. ∅ ∈ A.
2. If E, F ∈ A, then E ∩ F ∈ A.
3. If E, F ∈ A, then E \ F is a finite disjoint union of sets in A.
Example 2.2. The set A of all subsets of Ω with at most one element forms a semi-ring. More interestingly, if n ≥ 2, then the set of half-open cubes in R^n forms a semi-ring as well.
1. ∅ ∈ R.
2. If E, F ∈ R, then E ∪ F ∈ R.
3. If E, F ∈ R, then E \ F ∈ R.
Lemma 2.5. Let A be a semi-ring over Ω. Then the smallest ring R over Ω containing A is given as the collection of all finite disjoint unions of sets in A.
E \ F ∈ R.
Now from this we also get E∪F ∈ R because one can write it as a disjoint
union E ∪ F = E ∪ (F \ E), and it is clear that R is closed under disjoint
unions of its elements.
i. µ(∅) = 0.
ii. If En ∈ A is a sequence of pairwise disjoint sets with ⋃ n≥1 En ∈ A,
then µ(⋃ n≥1 En) = ∑ n≥1 µ(En).
(σ-additivity)
Ej = Ej ∩ E = ⋃_{ℓ=1}^{m} (Ej ∩ Fℓ),

µ(E) = ∑_{j=1}^{m} µ0(Aj)
ν ( ⋃_{n≥1} En ) ≤ ∑_{n=1}^{∞} ν(En).  (σ-subadditivity)
7Keep in mind that by convention, inf ∅ := ∞. For certain sets E there may
not exist any sequence En with these properties.
Proof. One gets µ∗(∅) = 0 by choosing En = ∅. If E ⊆ F ⊆ Ω are two sub
sets, then evidently there are at least as many ways to cover E by sequences
in R as for F , which leads directly to µ∗(E) ≤ µ∗(F ).
Since ε > 0 was arbitrary, this shows the claim.

Definition 2.11. Let ν be an outer measure on a set Ω. We say that a set E ⊆ Ω is ν-measurable, if for every subset A ⊆ Ω, we have

ν(A) = ν(A ∩ E) + ν(A \ E).
ν(A) = ν ( (A ∩ (E ∪ F )) ∪ (A \ (E ∪ F )) )
ν(A) = ν ( A ∩ ⋃_{n≤m} En ) + ν ( A \ ⋃_{n≤m} En )
= ∑_{n=1}^{m} ν(A ∩ En) + ν ( A \ ⋃_{n≤m} En )
≥ ∑_{n=1}^{m} ν(A ∩ En) + ν ( A \ ⋃_{n≥1} En ).

Letting m → ∞ and using the σ-subadditivity of ν, this yields, with E = ⋃_{n≥1} En,

ν(A) ≥ ∑_{n=1}^{∞} ν(A ∩ En) + ν(A \ E) ≥ ν(A ∩ E) + ν(A \ E) ≥ ν(A).
In particular we get the equality ν(A) = ν(A ∩ E) + ν(A \ E), which yields E
∈ M as A was arbitrary. Furthermore, if we insert A = E, then we also have
ν(E) = ∑∞ n=1 ν(En), which shows that ν is σ-additive on M. In particular it
is indeed a measure when restricted to M.
Definition 2.13. A measure space (Ω,M, µ) is called complete, if for all sets
E ⊆ F ⊆ Ω, one has that F ∈ M and µ(F ) = 0 implies E ∈ M.
Proof. Suppose E ⊆ F ∈ M are given with µ(F ) = 0. Then we observe for all
A ⊆ Ω that ν(A) ≤ ν(A ∩ E) + ν(A \ E) ≤ ν(A ∩ F ) + ν(A \ E) ≤ 0 + ν(A).
So we see that these are all equalities. Since A was arbitrary, this implies
that E ∈ M.
Proof. We see right away for all E ∈ R that µ∗(E) ≤ µ(E) since we can write
E = E ∪ ∅ ∪ ∅ ∪ . . . . On the other hand, if En ∈ R is any sequence of sets
with E ⊆ ⋃ n≥1 En, then it follows from σ-subadditivity and monotonicity of
µ that
By taking the infimum over all possible choices of such sequences, we arrive
at µ(E) = µ∗(E). Since E ∈ R was arbitrary, we have just shown µ∗|R = µ.
Now we need to show that every set E ∈ R is µ∗ -measurable. Let A ⊆ Ω be
any set. Since we always have µ∗(A) ≤ µ∗(A ∩ E) + µ∗(A \ E), we may
assume without loss of generality that µ∗(A) < ∞. Let An ∈ R be a sequence of sets with A ⊆ ⋃_{n≥1} An. Then

µ∗(A ∩ E) + µ∗(A \ E) ≤ ∑_{n≥1} ( µ(An ∩ E) + µ(An \ E) ) = ∑_{n≥1} µ(An).
If we take the infimum over all possible such sequences An, then the right
side approaches the value µ∗(A), and hence we get the equality µ∗(A) =
µ∗(A ∩ E) + µ∗(A \ E). Since A ⊆ Ω and E ∈ R were arbitrary, this finishes
the proof.
But since this holds for any choice of (En)n, we obtain µ1(E) ≤ µ∗(E).
Proof. This is immediate from the definition of both R and µ, and is left as an
exercise.
µ∗(A) = µ∗(φ−1(A))
The reverse
Since M is arbitrary, this leads to ∑_{n=1}^{∞} µ0(En) ≤ µ0(E).

Let ε > 0 with ε < b − a. Then in particular

[a + ε, b] ⊂ ⋃_{n≥1} (an, bn + 2^{−n}ε).
The right-hand side is an open covering of the compact set on the left side,
and hence there is some N ≥ 1 such that [a+ε, b] ⊂ ⋃N n=1(an, bn +2−nε).
We change the ordering of the intervals appearing in this union by the
following inductive procedure: Choose k1 ∈ {1, . . . , N} to be the index so
that
a_{k1} = max { aj | a + ε ∈ (aj, bj + 2^{−j}ε) } .

If b < b_{k1} + 2^{−k1}ε, then the procedure stops here. Otherwise, choose k2 ∈ {1, . . . , N} to be the index so that

a_{k2} = max { aj | b_{k1} + 2^{−k1}ε ∈ (aj, bj + 2^{−j}ε) } .

If b < b_{k2} + 2^{−k2}ε, then
the procedure stops here. Otherwise one continues inductively until the procedure stops after L ≤ N steps. This yields an injective map k : {1, . . . , L} → {1, . . . , N} such that [a + ε, b] ⊂ ⋃_{n=1}^{L} (a_{kn}, b_{kn} + 2^{−kn}ε) and such that for all n < L, we have a_{k_{n+1}} < b_{kn} + 2^{−kn}ε. From this we can estimate

b − a − ε ≤ ∑_{n=1}^{L} ( b_{kn} + 2^{−kn}ε − a_{kn} ) ≤ ε + ∑_{n=1}^{N} (bn − an),

and therefore

b − a ≤ 2ε + ∑_{n=1}^{N} (bn − an) ≤ 2ε + ∑_{n=1}^{∞} µ0(En).
Since ε > 0 was arbitrary, this finally implies µ0(E) = ∑∞ n=1 µ0(En) and
finishes the proof.
(c, b] = ⋂_{n≥1} (c, bn].
Remark 2.23. WARNING! The σ-algebra of Lebesgue sets is indeed bigger than the Borel σ-algebra.9
Proof. From these properties it follows that if such a function F exists at all, then it has to be given by the formula

F(t) = µ((0, t]) if t > 0,  F(0) = 0,  F(t) = −µ((t, 0]) if t < 0.
We claim that this function has indeed the right properties. Let a, b ∈ R with a
< b. We aim to show µ((a, b]) = F (b) − F (a). If a ≥ 0, then
9This is not so easy to see, but an example is discussed here, for whoever is
interested:
https://www.math3ma.com/blog/lebesgue-but-not-borel.
µ((a, b]) = µ((0, b] \ (0, a]) = µ((0, b]) − µ((0, a]) = F(b) − F(a). If b < 0, we can prove this analogously. If a < 0 ≤ b, then we have µ((a, b]) = µ((a, 0] ∪ (0, b]) = µ((a, 0]) + µ((0, b]) = F(b) − F(a). The fact that F is increasing follows immediately from the fact that µ has nonnegative values. The right-continuity follows from

lim_{n→∞} F(a + εn) = lim_{n→∞} µ((0, a + εn]) = µ((0, a]) = F(a)

for any sequence εn > 0 with εn → 0. Here we used the continuity property of µ as a measure with respect to countable decreasing intersections.
∑_{n=1}^{M} µ(En) = ∑_{n=1}^{M} ( F(bn) − F(an) ) = F(bM) − F(a1) ≤ F(b) − F(a) = µ(E).
for every n ≥ 1 a small δn > 0 such that F (bn + δn) − F (bn) ≤ 2−nε. Then
[a + ε, b] ⊂ E = ⋃_{n=1}^{∞} En ⊂ ⋃_{n=1}^{∞} (an, bn + δn)
The rest of the claim follows from the Carathéodory construction, exactly as
in the proof of Theorem 2.22, see also Corollary 2.18.
Example 2.27. For the choice F = idR, we recover the Lebesgue measure. On
the other hand, if a > 0 is some chosen number and we set
F(t) = 0 if t < a, and F(t) = 1 if t ≥ a,
then one can show that we recover the Dirac measure µ = δa. For a ≤ 0 one
also recovers this measure with the function
F(t) = −1 if t < a, and F(t) = 0 if t ≥ a.
Proposition 2.29. Let (Ωi,Mi, µi) be two measure spaces for i = 1, 2. Then the set of measurable rectangles { E1 × E2 | E1 ∈ M1, E2 ∈ M2 } forms a semi-ring, on which µ0(E1 × E2) := µ1(E1) · µ2(E2) defines a measure.
Theorem 2.30. Let (Ωi,Mi, µi) be two measure spaces for i = 1, 2. Then the
measure µ0 on the measurable rectangles extends to a measure on the product
σ-algebra µ1 ⊗ µ2 : M1 ⊗ M2 → [0, ∞].
λ(d) = λ ⊗ λ ⊗ · · · ⊗ λ  (d times)
with respect to the measure space (R,L, λ). The Lebesgue σ-algebra L(d) on
Rd is the one consisting of all λ(d)-measurable sets in the sense of Definition 2.11, which contains the Borel σ-algebra. If the dimension d is clear
from context, we may sometimes slightly abuse notation and just write λ for
the Lebesgue measure on Rd .
Let t = (t1, . . . , td) = (t′ , td) ∈ Rd = Rd−1 ×R. Denote by µ0 the product
measure defined on the measurable rectangles. If E ∈ L(d−1) and F ∈ L are
measurable, then (E × F ) + t = (E + t′) × (F + td), and so µ0((E × F ) + t) =
λ(d−1)(E + t′)λ(F + td) = λ(d−1)(E)λ(F ) = µ0(E ×F ). It now follows
directly from Proposition 2.19 and Proposition 2.20 that λ(d)(A + t) = λ(d)
(A) for all A ∈ L(d) , concluding the proof.
Theorem 2.33 (Tonelli; see exercises). Let (Ωi,Mi, µi) be two σ-finite measure spaces for i = 1, 2. Let f : Ω1 × Ω2 → [0, ∞] be a M1 ⊗ M2-
measurable function. Denote fx = f(x, _) : Ω2 → [0, ∞] and f y = f(_, y) : Ω1
→ [0, ∞].
Then:
c. The function Ω1 → [0, ∞] given by x → ∫ Ω2 fx dµ2 is M1-measurable.
d. The function Ω2 → [0, ∞] given by y → ∫ Ω1 f y dµ1 is M2-
measurable.
e. One has the equalities

∫_{Ω1×Ω2} f d(µ1 ⊗ µ2) = ∫_{Ω1} ( ∫_{Ω2} fx dµ2 ) dµ1(x) = ∫_{Ω2} ( ∫_{Ω1} f^y dµ1 ) dµ2(y).
Theorem 2.34 (Fubini). Let (Ωi,Mi, µi) be two σ-finite measure spaces for i
= 1, 2. Let f : Ω1 × Ω2 → C be a M1 ⊗ M2-measurable function. Denote fx
= f(x, _) : Ω2 → C and f y = f(_, y) : Ω1 → C. Then the following are
equivalent:
If any (or every) one of these statements holds, then we have that
Proof. It follows from Tonelli’s theorem that the integrals appearing in (A),
(B), and (C) are always the same, so their finiteness are indeed equivalent.
Now let us assume that these statements are true. By splitting f up into its real
and imaginary parts, let us assume without loss of generality that f is a real
function.
By part (c) in Tonelli’s theorem and Proposition 1.38, it follows that the set
E of all x ∈ Ω1, for which fx is integrable, is in M1 and its complement is a
null set. Let f = f+ − f− be the canonical decomposition into positive
functions as in Remark 1.39. Then it is very easy to see that (f+)x = (fx)+ and (f−)x = (fx)−. So (f+)x and (f−)x are both integrable whenever x ∈ E.
Moreover the map
The remaining equality follows exactly in the same way by exchanging the
roles of Ω1 and Ω2.
Proof. Clearly any measure must satisfy this property, so the “only if” part is
clear. For the “if” part, we may first take A1 = Ω and An = ∅ for all n ≥ 2,
which immediately implies µ(∅) = 0. We need to show that µ is σ-additive.
Let Bn ∈ A be a sequence of pairwise disjoint sets such that B = ⋃ n≥1 Bn ∈
A. As A is a semi-ring, we have Ω \ B = A1 ∪ · · · ∪ Aℓ for pairwise
disjoint sets A1, . . . , Aℓ ∈ A. Then the assumption implies on the one hand
that 1 = µ(B) + ∑ℓ n=1 µ(An). On the other hand, if we set Aℓ+k = Bk for k
≥ 1, then the sequence (An)n≥1 defines a pairwise disjoint covering of Ω, so
1 = ∑∞ n=1 µ(An). Comparing these two equations, we see that µ(B) = ∑
n>ℓ µ(An) = ∑∞ n=1 µ(Bn), which shows our claim.
Definition 2.36. Let I be a non-empty index set and (Ωi,Mi, µi) a probability space for every i ∈ I. We denote Ω = ∏_{i∈I} Ωi, and set
Theorem 2.37. Adopt the notation from the above definition. Then A is a
semi-ring and µ is a measure on A. In particular, it extends uniquely to a
probability measure µ on ⊗ i∈I Mi. One also denotes µ = ⊗ i∈I µi.
Proof. The “in particular” part is due to Corollary 2.18. By definition, every
set in A is a product of subsets of the spaces Ωi which are proper subsets
only over finitely many indices. In particular, given any sequence An, there
are countably many indices in I so that over every other index i ∈ I, the
projection of every set An to the i-th coordinate yields Ωi. Considering the
semi-ring axioms for A and the axioms of being a measure for µ, we can see
that it is enough to consider the case where I is countable. As we already
know that the claim is true if I is finite, we may from now on assume I = N.
The fact that A is a semi-ring now follows directly from the exercises. We
hence need to show that µ is a measure on A, where our goal is to appeal to
the condition in the above lemma. Suppose that An ∈ A is a sequence of
pairwise disjoint sets with Ω = ⋃ n≥1 An.
For each n ≥ 1 let us write An = ∏_{i=1}^{∞} A_{n,i} and pick i_n ≥ 1 such that A_{n,i} = Ωi whenever i > i_n. For all m, n ∈ N and ω = (ωi)i ∈ Am we claim to have the equation

∏_{i≤i_m} χ_{A_{n,i}}(ωi) · ∏_{i>i_m} µi(A_{n,i}) = 1 if n = m, and = 0 if n ≠ m.
Indeed, if n = m, then all the involved factors are equal to one, so the equation holds. Assume n ≠ m. We observe χ_{A_k}((ωi)i) = ∏_{i≤i_k} χ_{A_{k,i}}(ωi). As 1 = ∑_{k=1}^{∞} χ_{A_k}, we conclude for (ωi)i ∈ Am that 0 = ∏_{i≤i_n} χ_{A_{n,i}}(ωi). So either i_n ≤ i_m, in which case the desired equation above follows immediately. Or, if i_n > i_m, then every tuple of the form (ω1, . . . , ω_{i_m}, α_{i_m+1}, . . . ) is also in Am, so analogously

0 = ∏_{i≤i_m} χ_{A_{n,i}}(ωi) · ∏_{i=i_m+1}^{i_n} χ_{A_{n,i}}(αi).
From the claim we deduce that for every ω = (ωi)i ∈ Ω and every k ∈ N, we have

∑_{n=1}^{∞} [ ∏_{i≤k} χ_{A_{n,i}}(ωi) · ∏_{i>k} µi(A_{n,i}) ] = 1.

Integrating this identity in the variables ω1, . . . , ωk with respect to µ1 ⊗ · · · ⊗ µk and applying the Monotone Convergence Theorem, we obtain ∑_{n=1}^{∞} µ(An) = 1, which is the condition in the above lemma. This finishes the proof.
3 Probability
3.1 Probability spaces and random variables
Definition 3.1. A probability space is a measure space (Ω,M, µ) with µ(Ω) = 1. In this context, Ω is called the sample space, the sets in M are called events, and µ(E) is the probability of the event E ∈ M.
For more involved probabilistic questions, the sample space is not necessarily countable, which makes it somewhat more difficult to come up with the
right choices of events and probability measures. The following can be seen
as a model for tossing a coin infinitely often.
Example 3.3 (infinite coin tossing). For each toss, a coin can only come up as
heads or tails, which we conveniently denote as the outcomes 0 and 1. Since
we want to model tossing the coin infinitely often, the outcomes are
sequences having value 0 or 1. This gives rise to the sample space Ω = {0,
1}N .
A rather obvious example of an event is when the first k tosses are equal to
some k-tuple ω ∈ {0, 1}k . The associated subset of Ω is given as Aω = {ω}
× {0, 1}N>k . For example, A1 is the event where the first coin toss comes
up as tails, or A0,1,1 is the event where the first three tosses come up as
heads → tails → tails. Let M be the event σ-algebra generated by all such
sets. We note that a lot of other natural choices for events are automatically in
M. For example, the event that the n-th coin toss comes up as heads is given
by
⋃_{ω∈{0,1}^{n−1}} A_{(ω,0)}.
As for determining the probability, we want all of our coin tosses to be fair,
meaning that there is always an equal chance that either heads or tails comes
up. In particular, the coin tosses should all be independent from each other. If
µ : M → [0, 1] is supposed to be a probability measure modelling this
behavior, then we can agree on µ(A0) = µ(A1) = 1/2. Inductively, we may conclude that for all ω ∈ {0, 1}^n , one should have µ(Aω) = 2^{−n} . In this case, if we view {0, 1} as a discrete space, we may give Ω the product topology, in which case M is the Borel σ-algebra. If µ2 : 2^{{0,1}} → [0, 1] is the measure that assigns the value 1/2 to both singletons, then µ is in fact the infinite product measure µ = ⊗_{n=1}^{∞} µ2.
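A quick Monte Carlo sketch of this model (illustrative, not part of the notes): estimating µ(Aω) for the cylinder ω = (0, 1, 1) should give roughly 2^{−3}.

```python
import random

random.seed(0)

def toss_prefix(k):
    """Simulate the first k fair coin tosses (0 = heads, 1 = tails)."""
    return tuple(random.randint(0, 1) for _ in range(k))

# Estimate mu(A_omega) for the cylinder fixing the first 3 tosses to
# (0, 1, 1); the model predicts 2^{-3} = 0.125.
target = (0, 1, 1)
trials = 100_000
hits = sum(toss_prefix(3) == target for _ in range(trials))
estimate = hits / trials
assert abs(estimate - 0.125) < 0.01
```

Only finitely many coordinates are ever simulated, mirroring the fact that the cylinder sets Aω depend on finitely many tosses.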
Remark 3.5. One sometimes also says “stochastic variable”. In the particular case W = R^n , one calls it a vector random variable, and for W = R, a real
random variable. The case W = RN is referred to as a random sequence. In
the case of a real random variable X, we will also freely play around with
the above notation, for example (|X| ≤ 1) is written instead of (X ∈ [−1, 1]),
etc.
Remark 3.6. From our previous study on measurable maps we can observe
that real random variables are closed under addition and multiplication, and
pointwise limits. General random variables are closed under the same type
of operations under which measurable maps are closed.
E(X) = ∫ Ω X dP.
Notation 3.12. Certain expected values get a special name. Let X be a real
random variable.
3.2 Independence
Definition 3.13. Let (Ω,M,P) be a probability space. Two events A, B ∈ M
are called independent, if P(A ∩ B) = P(A)P(B).
In our model we assume that the individual coin tosses are not supposed to
influence each other, and hence we should certainly view these events as
independent. Indeed, we have here
so by the properties of the product measure µ we can see here that µ(A) = 1/4, µ(B) = 1/4, and µ(A ∩ B) = 1/16. Similarly, all pairs of events defined by the outcomes of coin tosses happening at distinct times will be independent in this model.
lim sup_{n→∞} An = ⋂_{n≥1} ⋃_{k≥n} Ak ,   lim inf_{n→∞} An = ⋃_{n≥1} ⋂_{k≥n} Ak .
Example 3.17. Let us motivate this again from the point of view of our
infinite coin tossing model. Let Hn be the event where the n-th toss
comes up heads. Then the event lim supn→∞ Hn describes the situation
where heads comes up infinitely many times, and the event lim infn→∞ Hn
describes the situation where heads comes up in all but finitely many tosses.
For these reasons, it is not uncommon in a probabilistic context to use the notation

(An infinitely often) := lim sup_{n→∞} An   and   (An eventually) := lim inf_{n→∞} An.

Theorem (Borel–Cantelli). Let (Ω,M,P) be a probability space and An ∈ M a sequence.

i. If ∑_{n=1}^{∞} P(An) < ∞, then P(lim sup_{n→∞} An) = 0.
ii. If the family {An}_{n∈N} is independent and ∑_{n=1}^{∞} P(An) = ∞, then P(lim sup_{n→∞} An) = 1.
Proof. We prove (i) in the exercise sessions, so we only need to prove (ii).
We will use the fact from the exercise sessions that the family of complements {A^c_n}_{n∈N} is also independent. Moreover, we are about to use the
well-known inequality 1 − x ≤ e−x for all x ∈ R. We observe for all n ≥ 1
that
P ( ⋂_{k=n}^{∞} A^c_k ) = ∏_{k=n}^{∞} P(A^c_k) = ∏_{k=n}^{∞} (1 − P(Ak)) ≤ ∏_{k=n}^{∞} e^{−P(Ak)} = exp ( − ∑_{k=n}^{∞} P(Ak) ) = 0.

Therefore

P ( lim sup_{n→∞} An ) = 1 − P ( ⋃_{n≥1} ⋂_{k=n}^{∞} A^c_k ) = 1.
Example 3.20. Let us have yet another look at our infinite coin tossing model
and what we observed in Example 3.14. Let Hn denote the event where the n-
th coin toss comes up as heads. Then the family {Hn}n∈N is independent, each event has probability 1/2, and hence it follows from the above that the event (Hn infinitely often) has probability 1.
MY = {(Y ∈ S) | S ⊆ R Borel} ,
which are both sub-σ-algebras of M. Given any A ∈ MX and B ∈ MY , it
follows by independence that
∫ Ω st dP = ∫ Ω s dP · ∫ Ω t dP.
By the Monotone Convergence Theorem, this even holds when s and t are not
assumed to be simple. Applying this to s = |X| and t = |Y | yields E(|XY |) =
E(|X|)E(|Y |) < ∞, so indeed XY has a mean. Furthermore if we decompose X
= X+ − X− and Y = Y + − Y − , then
E(XY )
∫_Ω f ◦ X dP ≥ f(a) · P(|X| ≥ a).
(Ω,M,P).
P(|X − mX| ≥ a) ≤ (1/a²) Var(X).
Proof. The first part follows from the general Chebyshev inequality for f(x) = |x|^p . The second part follows when applying it to X − mX as the real random variable and the function f(x) = x² .
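An empirical sanity check of the Chebyshev bound (the sampling setup below is invented for illustration):

```python
import random

random.seed(1)

# Sample a real random variable: sum of 10 fair +-1 tosses (mean 0, variance 10).
def sample():
    return sum(random.choice((-1, 1)) for _ in range(10))

n = 50_000
xs = [sample() for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

a = 5.0
tail = sum(abs(x - mean) >= a for x in xs) / n
# Chebyshev: P(|X - m| >= a) <= Var(X) / a^2
assert tail <= var / a**2 + 0.01   # small slack for sampling error
```

Here the bound Var(X)/a² = 10/25 = 0.4 is quite loose; the actual tail frequency is closer to 0.1, which is typical of Chebyshev-type estimates.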
Proof. Since they are identically distributed, it follows that both the mean and
the variance of Xn are the same for every n ≥ 1. We may assume without loss
of generality m = 0. Let us first assume that the variance of Xn is finite and
equal to σ2 for all n ≥ 1. For every n ≥ 1, denote ¯Xn = n 1 ∑n k=1 Xk. We
have that ¯Xn also has mean 0, and hence
Var( ¯Xn) = E( ¯X2 n) = n2E 1 ( j,k=1 n∑ XjXk ) = nσ2 n2 = σ2 n . Here we
used Proposition 3.25 for j = k. By applying Corollary 3.27(ii), we may
hence see P(| ¯Xn| ≥ ε) ≤ E( ε2 ¯X2 n) = nε2 σ2 n→∞ −→ 0 for all ε > 0.
Now for the general case, we allow the possibility that Xn has infinite
variance. Let us still assume m = 0. For all n, N ≥ 1, write Xn = X≤N n +
X>N n , where X≤N n = Xn · χ(|Xn|≤N). Then by the Monotone Convergence
Theorem aN := E(|X>N n |) = E(|Xn|) − E(|X≤N n |) N→∞ −→ 0, where we
are using that these values do not depend on n as the Xn were identically
distributed. Let us also denote
So, given any δ > 0 with δ < 1 and any N ≥ 1 large enough such that aN ≤ δε/2 for all n ≥ 1, it follows that P(|X̄n^{>N}| ≥ ε/2) ≤ δ. Hence
P(| ¯Xn| ≥ ε)
≤ δ + P (| ¯X≤N n | ≥ ε/2 )
n→∞ −→ δ.
Here we have used that Xn^{≤N} has finite variance, so the above falls into
our previous subcase. As δ > 0 was arbitrary, this verifies X̄n →p 0 in
general.
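For fair ±1 coin flips the tail probability in the weak law can be computed exactly, since the partial sum is a shifted binomial variable. A sketch of this illustrative case, compared against the Chebyshev bound σ²/(nε²) from the proof:

```python
from math import comb

# With Xk uniform on {-1, +1} and B the number of +1's among n flips,
# the sample mean is (2B - n)/n with B ~ Binomial(n, 1/2).
def tail(n, eps):
    # P(|sample mean| >= eps), computed exactly
    hits = sum(comb(n, b) for b in range(n + 1) if abs(2 * b - n) >= eps * n)
    return hits / 2 ** n

eps = 0.1
for n in (100, 400, 1600):
    # Chebyshev bound from the proof: sigma^2/(n eps^2), here sigma^2 = 1
    print(n, tail(n, eps), 1 / (n * eps ** 2))
```

The exact tails decay to zero, and much faster than the Chebyshev bound, consistent with the statement of the weak law.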
There is also a stronger law, i.e., a theorem with a stronger conclusion than
the weak law of large numbers. We will prove this strong law under an
additional assumption.
Proof. We first note that applying Hölder’s inequality twice yields E(|Xn|)⁴ ≤
E(Xn²)² ≤ E(Xn⁴) =: M⁴, hence the assumptions on Xn ensure that m
exists.
Considering the summand over the tuple (i, j, k, l), it follows from the
independence of the sequence Xn and Proposition 3.25 that it is zero if there
is one entry in this tuple that is different from the other three. In other
words, the summand can only be non-zero if all four entries agree, or if the
tuple has two different indices occurring exactly twice. Two different entries
can possibly occur in the patterns (k, k, l, l), (k, l, k, l) or (k, l, l, k),
leading to 3n(n − 1) possible summands. Using Proposition 3.25 once again,
the expression hence simplifies to

∑_{k=1}^n E(Xk⁴) + 3 ∑_{k≠l} E(Xk²)E(Xl²).
By the very beginning of the proof, each summand in the bracket can be
estimated above by M⁴, and there are a total of n + 3n(n − 1) ≤ 3n² summands,
hence

E((∑_{k=1}^n Xk)⁴) ≤ 3n²M⁴.
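The count of non-vanishing summands used above (n all-equal tuples plus 3n(n − 1) two-pair tuples) can be confirmed by brute force for small n:

```python
from itertools import product

# Count tuples (i, j, k, l) in {0,...,n-1}^4 in which no entry differs
# from all three others, i.e. every entry occurs at least twice.
def count_surviving(n):
    return sum(
        1
        for t in product(range(n), repeat=4)
        if all(t.count(x) >= 2 for x in t)
    )

for n in (2, 3, 5):
    assert count_surviving(n) == n + 3 * n * (n - 1)
print("count matches n + 3n(n-1) for n = 2, 3, 5")
```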
Definition 3.31. Given a real random variable X, one defines its
characteristic function φX : R → C via φX(t) = E(e^{itX}).
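As a concrete example (not from the notes): for X uniform on {−1, +1} the definition gives φX(t) = (e^{it} + e^{−it})/2 = cos t, which a short computation confirms:

```python
import cmath
import math

# Characteristic function of the fair coin variable X uniform on {-1, +1}:
# phi_X(t) = E(e^{itX}) = (e^{it} + e^{-it}) / 2 = cos(t).
def phi_coin(t):
    return (cmath.exp(1j * t) + cmath.exp(-1j * t)) / 2

for t in (0.0, 0.7, 2.0):
    assert abs(phi_coin(t) - math.cos(t)) < 1e-12
print("phi_X(t) = cos(t) verified at sample points")
```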
For the purpose of the proof, we define the function κ : R → R via

κ(x) = ∫_0^x (sin t)/t dt.

∫_{−T}^{T} (e^{it(x−a)} − e^{it(x−b)})/(it) dt
We will use the following fact from real analysis without proof:11
Theorem 3.33. A monotone function f : R → R can have at most countably
many points of discontinuity.
Corollary 3.34. Two real random variables are identically distributed if and
only if their characteristic functions are equal.
Proof. The “only if” part is clear, so we show the “if” part. Let X and Y be
two real random variables. The distribution function FX(t) = PX((−∞, t]) is
increasing, and therefore by the above theorem, it is discontinuous in at most
countably many points. In analogy to Example 2.27, this means that we may
have PX({t}) ≠ 0 for at most countably many t ∈ R. The same observation is
of course true for Y in place of X. Let W ⊆ R be the subset of all points t
with PX({t}) = 0 = PY ({t}). Then W is co-countable and in particular
dense. It is then an easy exercise to show that the semi-ring
AW = {(a, b] | a, b ∈ W, a < b} ⊆ 2^R
generates the Borel σ-algebra of R.
11 For those interested, see for example “Theorem 4.30” in the book
“Principles of Mathematical Analysis” (third edition) by Walter Rudin.
Pn → w P.
Proof. Let us first assume Pn →w P holds. We shall first show the following
intermediate claim: if A ⊆ R is a closed interval, then lim sup_{n→∞} Pn(A) ≤
P(A). Indeed, we may define a pointwise decreasing sequence of piecewise
linear functions fk : R → R with fk|A = 1 and limk→∞ fk(t) = 0 for all t /∈ A.
Then

Pn(A) ≤ ∫_R fk dPn → ∫_R fk dP as n → ∞, hence lim sup_{n→∞} Pn(A) ≤ ∫_R fk dP for all k ≥ 1.

Letting k → ∞, the Dominated Convergence Theorem gives ∫_R fk dP → P(A), which
proves the claim.
Now let x ∈ C(F ). Then it follows from the intermediate claim that

lim sup_{n→∞} Fn(x) = lim sup_{n→∞} Pn((−∞, x]) ≤ P((−∞, x]) = F(x).

Similarly,

lim inf_{n→∞} Fn(x) = lim inf_{n→∞} (1 − Pn([x, ∞))) ≥ 1 − P([x, ∞)) = F(x),

where the last equality uses P({x}) = 0, as x is a continuity point of F.
Together this yields Fn(x) → F(x).
Conversely, let us assume that Fn(x) → F (x) holds for all x ∈ C(F ). By
definition, we may easily observe Pn((a, b]) → P((a, b]) for all a, b ∈ C(F )
with a < b. Therefore, it follows that
∫ R f dPn → ∫ R f dP
holds whenever f belongs to the linear subspace spanned by all indicator
functions χ(a,b] for a, b ∈ C(F ). Since C(F ) is dense in R, it is easy to see
that the closure of this linear subspace with respect to the sup-norm ∥ · ∥∞
contains the space of continuous compactly supported functions. So (with a
standard ε/2-argument) we may conclude that the above limit behavior even
holds for f being any compactly supported continuous function.
lim_{R→∞} lim inf_{n→∞} (Fn(R) − Fn(−R)) = 1.

Then F is the distribution function for a Borel probability measure on R.

Theorem 3.39 (Helly’s Selection Theorem). For any tight sequence (Pn)n of
Borel probability measures on R, there exists a subsequence (Pnk)k and a
Borel probability measure P on R such that Pnk →w P.
lim_{R→∞} lim inf_{n→∞} (Fn(R) − Fn(−R)) = 1.

Of course this tightness criterion holds for every subsequence as well. By
Theorem 3.37 and the
above remark, it suffices to show that there is some increasing sequence of
natural numbers nk such that Fnk converges pointwise on a dense set of real
numbers, for example on the rational numbers Q. Let N ∋ ℓ → qℓ be an
enumeration of Q. By Bolzano–Weierstrass, we know that there is some
increasing sequence of numbers n(1,k) such that Fn(1,k)(q1) converges as k
→ ∞. Applying Bolzano–Weierstrass again, we know that (n(1, k))k admits a
subsequence (n(2, k))k such that Fn(2,k)(q2) converges as k → ∞. Proceed
like this inductively, and find finer and finer subsequences (n(ℓ, k))k such
that Fn(ℓ,k)(qℓ) converges as k → ∞. Finally define nk = n(k, k). Then (nk)k
is a subsequence of (n(ℓ, k))k (up to the finitely many indices k ≤ ℓ) for every
ℓ ≥ 1, so indeed Fnk(qℓ) converges as k → ∞, for every ℓ ≥ 1. This finishes
the proof.
Lemma 3.40. Let X be a real random variable. Then we have for all R > 0
the estimate
Note that sin(y)/y ≤ 1 holds for all y ∈ R, so that 1 − sin(X/R)/(X/R) ≥ 0
everywhere. If |x| > 2R, then |x/R| > 2, in which case |sin(x/R)/(x/R)| ≤
1/|x/R| < 1/2, hence 1 − sin(x/R)/(x/R) ≥ 1/2. This leads to the inequality
χ(|X|>2R) ≤ 2(1 − sin(X/R)/(X/R)), and forming the expected value on both
sides yields the claim.
Here we also used the Dominated Convergence Theorem. Using the fact that
φX is a continuous function with φX(0) = 1, we see that the integral
(R/4) ∫_{−2/R}^{2/R} (1 − φX(t)) dt goes to zero as R → ∞. Hence the above
inequality yields

lim_{R→∞} lim inf_{n→∞} PXn([−R, R]) = 1,
We can also proceed by induction to show the upper bound |x|^{n+1}/(n+1)!.
One can first consider n = −1, in which case both the left and the right side
equal 1. One can then perform the induction step (n − 1) → n with a completely
analogous calculation as above, namely

|e^{ix} − ∑_{k=0}^n (ix)^k/k!| = |∫_0^x i(e^{it} − ∑_{k=0}^{n−1} (it)^k/k!) dt| ≤ ∫_0^{|x|} t^n/n! dt = |x|^{n+1}/(n+1)!.
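The bound |e^{ix} − ∑_{k=0}^{n} (ix)^k/k!| ≤ |x|^{n+1}/(n+1)! is easy to probe numerically; a small sketch with illustrative values of x and n:

```python
import cmath
import math

# Remainder of the degree-n Taylor polynomial of e^{ix} at 0.
def remainder(x, n):
    partial = sum((1j * x) ** k / math.factorial(k) for k in range(n + 1))
    return abs(cmath.exp(1j * x) - partial)

for x in (0.5, 2.0, 5.0):
    for n in (0, 1, 2, 5, 8):
        bound = abs(x) ** (n + 1) / math.factorial(n + 1)
        assert remainder(x, n) <= bound + 1e-9
print("Taylor remainder bound holds at all tested (x, n)")
```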
Lemma 3.43. Let X be a real random variable such that E(X²) = 1 and E(X)
= 0. Then one has the limit formula

lim_{n→∞} φX(t/√n)ⁿ = e^{−t²/2},  t ∈ R.
In particular, since X has a second absolute moment, we see that the random
variable on the left has a mean. Moreover, the expected value of the random
variable Xt := min(X², t|X|³/6) tends to zero as t → 0 as a consequence
of the Dominated Convergence Theorem. So by integrating and applying the
triangle inequality, we see
xn := n(φX(t/√n) − (1 − t²/(2n))) → 0 as n → ∞.

Using the elementary limit e^x = lim_{n→∞} (1 + x/n)ⁿ, which also holds in
the form (1 + yn/n)ⁿ → e^y whenever yn → y, we compute

φX(t/√n)ⁿ = (1 + (φX(t/√n) − 1))ⁿ = (1 + (−t²/2 + xn)/n)ⁿ → e^{−t²/2} as n → ∞.
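For the coin variable X uniform on {−1, +1} (so E(X) = 0 and E(X²) = 1) we have φX(t) = cos t, and the lemma predicts cos(t/√n)ⁿ → e^{−t²/2}. A quick numerical check of this illustrative case:

```python
import math

# Check cos(t/sqrt(n))^n -> e^{-t^2/2} for the fair-coin characteristic
# function phi_X(t) = cos(t); t = 1.3 is an arbitrary sample point.
t = 1.3
target = math.exp(-t * t / 2)
approx = None
for n in (10, 1_000, 100_000):
    approx = math.cos(t / math.sqrt(n)) ** n
    print(n, approx, target)  # the gap shrinks as n grows
```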
Theorem 3.44 (Central Limit Theorem). Let Xn be an independent sequence
of identically distributed real random variables with mean E(Xn) = 0 and
variance E(Xn²) = 1. Consider the standard normal variable N given by the
distribution

P((a, b]) = (1/√(2π)) ∫_a^b e^{−t²/2} dt.

Then

(1/√n) ∑_{k=1}^n Xk →d N.
Proof. Since the Xn are identically distributed, they all have the same
characteristic function, which we shall denote by φX. Denote Yn = n^{−1/2}
∑_{k=1}^n Xk, so that we wish to show Yn →d N. Using that the Xn are
independent, we observe for all t ∈ R that

φYn(t) = E(e^{itYn}) = ∏_{k=1}^n E(e^{itXk/√n}) = φX(t/√n)ⁿ.
Given that the standard normal variable has the characteristic function
φN(t) = e^{−t²/2}, the proof is complete by combining Lemma 3.43 and
Theorem 3.41.
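The theorem can be checked exactly for coin flips (an illustrative case): the distribution function of (1/√n) ∑ Xk is a rescaled binomial and should be close to the normal distribution function Φ.

```python
from math import comb, erf, sqrt

# For Xk uniform on {-1, +1}, Sn = 2B - n with B ~ Binomial(n, 1/2), so the
# distribution function of Sn/sqrt(n) can be computed exactly and compared
# with Phi(x) = (1 + erf(x/sqrt(2)))/2.
def F_n(n, x):
    cutoff = (n + x * sqrt(n)) / 2  # Sn/sqrt(n) <= x  <=>  B <= cutoff
    hits = sum(comb(n, b) for b in range(n + 1) if b <= cutoff)
    return hits / 2 ** n

def Phi(x):
    return (1 + erf(x / sqrt(2))) / 2

n = 2500
for x in (-1.0, 0.0, 1.0):
    print(x, F_n(n, x), Phi(x))  # the columns agree to about 1/sqrt(n)
```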
(1/√n) ∑_{k=1}^n Xk →d N(0, σ),

where N(0, σ) is the real random variable given by the distribution

P((a, b]) = (1/(σ√(2π))) ∫_a^b e^{−t²/(2σ²)} dt.