The above theorem, which is ubiquitous in the construction of measures, is Theorem 11.2, pg 166, of Billingsley (1995).
Definition 4. A sequence (A_n) of sets increases to a set A, denoted by A_n ↑ A,
if A₁ ⊂ A₂ ⊂ …, and A = ∪_{n=1}^{∞} A_n. Similarly, A_n ↓ A is also defined.
Definition 5. A family M of subsets of Ω is a monotone class if
• (closed under monotone union) An ∈ M and An ↑ A implies that A ∈ M,
• (closed under monotone intersection) An ∈ M and An ↓ A implies that
A ∈ M.
The following is Theorem 3.4, pg 43, of Billingsley (1995).
Theorem 1.2 (Monotone class theorem). If F is a field, and M is a monotone
class, then F ⊂ M implies that σ(F) ⊂ M.
Theorem 1.3 (Uniqueness). Suppose F is a field, and µ1 , µ2 are measures on
(Ω, σ(F)) which agree on F and are σ-finite on F. Then
µ1 = µ2 .
Proof. Follows from Theorem 1.2.
Definition 6. A non-empty collection S of subsets of Ω is a semi-field if
A, B ∈ S implies A ∩ B ∈ S and A ∈ S implies
Ac = A1 ∪ . . . ∪ An ,
for some disjoint A1 , . . . , An ∈ S.
Exercise 1.2. If S is a semi-field, show that
F = {A1 ∪ . . . ∪ An : A1 , . . . , An ∈ S are disjoint}
is the smallest field generated by S.
The following corollary of Theorems 1.1 and 1.3 will be most useful for us.
Corollary 1.1. Suppose that S is a semi-field on Ω, and µ : S → [0, ∞] is a
countably additive function. Then, there exists a measure µ∗ on (Ω, σ(S)) such
that
µ∗ (A) = µ(A) for all A ∈ S .
Furthermore, if µ is σ-finite on S, then µ∗ is unique.
Exercise 1.3. Prove Corollary 1.1.
Exercise 1.4. Let µ1 and µ2 be measures on (R, B(R)) defined by
µ1 (A) = #(A ∩ Q) , µ2 (A) = 2#(A ∩ Q) , A ∈ B(R) .
Show that µ1 , µ2 are σ-finite and agree on S = {(a, b] ∩ R : −∞ ≤ a ≤ b ≤ ∞}
which is a semi-field but they do not agree on (R, σ(S)). Corollary 1.1 thus fails
if µ1 and µ2 are σ-finite, but not on S, and everything else holds.
Definition 7. For d ≥ 1, B(Rd ) is the Borel σ-field on Rd , that is, the σ-field
generated by all open sets in R^d. A measure µ on (R^d, B(R^d)) is Radon if
µ(K) < ∞ for all compact K ⊂ Rd .
The following, which is Theorem 12.4, pg 176, of Billingsley (1995), is fun-
damental to probability theory.
Theorem 1.4. If F : R → R is a non-decreasing right continuous function,
then there exists a unique Radon measure µ on (R, B(R)) such that
µ (a, b] = F (b) − F (a) , −∞ < a < b < ∞ .
Furthermore, if F (∞) = 1 and F (−∞) = 0, then µ is a probability measure.
In Theorem 1.4, the following conventions are used:
F(∞) = lim_{x→∞} F(x) if it exists, and F(−∞) = lim_{x→−∞} F(x) if it exists.
Unless mentioned otherwise, F (∞) and F (−∞) will mean the above throughout
the course.
Exercise 1.5. Show that a Radon measure µ on (Rd , B(Rd )) is regular, that is,
µ(A) = inf {µ(U ) : U open, and U ⊃ A} = sup {µ(F ) : F closed, and F ⊂ A} ,
for all A ∈ B(Rd ).
Exercise 1.6. Show that µ1 as in Exc 1.4 is σ-finite but not regular. Thus not
all σ-finite measures on (R, B(R)) are regular.
Exercise 1.7. Suppose that µ1 , µ2 are Radon measures on (Rd , B(Rd )) such
that
µ1 (U c ) = µ2 (U c ) = 0
for some open set U ⊂ Rd . If
µ1 (R) ≤ µ2 (R) ,
for all rectangles R = (a1 , b1 ] × . . . × (ad , bd ] ⊂ U with a1 , . . . , ad , b1 , . . . , bd ∈ Q,
then show that
µ1 (A) ≤ µ2 (A) , A ∈ B(Rd ) .
1.2 Integration
Definition 8. Let R̄ = R ∪ {−∞, ∞} and
B(R̄) = σ( B(R) ∪ { {−∞}, {∞} } ) .
That is, B(R̄) is the smallest σ-field which is a superset of B(R) and contains
the singleton sets {−∞}, {∞}. Given a measurable space (Ω, A) and a function
f : Ω → R̄, f is A-measurable if
f⁻¹A ∈ A for all A ∈ B(R̄) ,
where f⁻¹A = {ω ∈ Ω : f(ω) ∈ A}.
Given a measure space (Ω, A, µ) and a measurable f : Ω → [0, ∞], the
integral of f with respect to µ will be denoted by any of the following:
∫ f dµ , ∫ f(ω) dµ(ω) , ∫ f(ω) µ(dω) , ∫_Ω f dµ , etc.
For a general measurable f : Ω → R̄, the integral is defined as ∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ, where x⁺ = x ∨ 0 and x⁻ =
(−x) ∨ 0 for all x ∈ R̄. The above is defined even when the right hand side is
±∞; “∞ − ∞” is the only case when it is undefined.
We say f is integrable if ∫ f⁺ dµ < ∞ and ∫ f⁻ dµ < ∞, which happens
if and only if
∫ |f| dµ < ∞ .
The above are Theorems 16.2, 16.3 and 16.4 on pg 208-209 of Billingsley
(1995), respectively.
Definition 9. Suppose (Ω1 , A1 , µ) is a measure space, (Ω2 , A2 ) is a measurable
space and T : Ω1 → Ω2 is a measurable map, that is,
T −1 A ∈ A1 for all A ∈ A2 .
The push forward measure of µ under T is the measure µ ◦ T −1 on (Ω2 , A2 )
defined by
µ ◦ T −1 (A) = µ T −1 A , A ∈ A2 .
Theorem 1.6. Suppose (Ω1 , A1 , µ) is a measure space, (Ω2 , A2 ) is a measurable
space and T : Ω1 → Ω2 is a measurable map. Then, for a measurable f : Ω2 →
R̄,
∫_{Ω₁} f(T(x)) dµ(x) = ∫_{Ω₂} f(y) d(µ ∘ T⁻¹)(y) ,
whenever either side is defined.
Definition 10. If µ and ν are measures on (Ω, A), then µ is absolutely continuous
with respect to ν, written µ ≪ ν, if ν(A) = 0 implies µ(A) = 0 for all A ∈ A.
Definition 12. If (Ω1 , A1 ) and (Ω2 , A2 ) are measurable spaces, then the product
σ-field A1 ⊗ A2 is defined by
A1 ⊗ A2 = σ (A1 × A2 : A1 ∈ A1 , A2 ∈ A2 ) .
B(R) ⊗ B(R) = B(R²) .
Exercise 1.12. If (Ω₁, A₁, µ₁) and (Ω₂, A₂, µ₂) are σ-finite measure spaces,
then show that there exists a unique measure µ₁ ⊗ µ₂ on (Ω₁ × Ω₂, A₁ ⊗ A₂)
satisfying (µ₁ ⊗ µ₂)(A₁ × A₂) = µ₁(A₁) µ₂(A₂) for all A₁ ∈ A₁, A₂ ∈ A₂.
Convention
The usual convention for iterated integrals is the following:
∫_{Ω₁} ∫_{Ω₂} f(ω₁, ω₂) µ₂(dω₂) µ₁(dω₁) = ∫_{Ω₁} ( ∫_{Ω₂} f(ω₁, ω₂) µ₂(dω₂) ) µ₁(dω₁) ,
that is, the left hand side above means the right hand side.
The theorems of Tonelli and Fubini are subsumed in Theorem 18.3, pg 234,
Billingsley (1995).
Definition 13. The Lebesgue measure λ is the measure on (R, B(R)) satisfying
λ (a, b] = b − a , for all − ∞ < a ≤ b < ∞ ,
the existence and uniqueness of which is guaranteed by Theorem 1.4 by taking
F (x) = x. For a Borel measurable function f : (a, b) → R̄, where −∞ ≤ a <
b ≤ ∞, its Lebesgue integral is defined as
∫_a^b f(x) dx = ∫_{(a,b)} f(x) λ(dx) ,
then
∫_a^b F′(x) dx = F(b) − F(a) .
If (1.2) holds with b = ∞, then the above holds as well with F (b) replaced by
limx→∞ F (x) which necessarily exists, and likewise if −∞ = a < b ≤ ∞.
The above is Theorem 7.21, page 149, of Rudin (1987).
Theorem 1.11 (Change of variable). If U, V ⊂ R are open sets, ψ : U → V is
a C 1 bijection whose derivative ψ 0 never vanishes, then
∫_U f∘ψ(x) |ψ′(x)| dx = ∫_V f(y) dy ,
for every non-negative measurable f : V → R.
Soln.: Let
I = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)} dx dy
(Tonelli) = ∫_{−∞}^{∞} e^{−x²} ( ∫_{−∞}^{∞} e^{−y²} dy ) dx .
For a fixed x ∈ R \ {0}, put y = xz using Theorem 1.11 to get
∫_{−∞}^{∞} e^{−y²} dy = |x| ∫_{−∞}^{∞} e^{−z²x²} dz .
Thus
I = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| e^{−x²(1+z²)} dz dx
(Tonelli) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} |x| e^{−x²(1+z²)} dx dz .
For a fixed z ∈ R,
∫_{−∞}^{∞} |x| e^{−x²(1+z²)} dx = 2 ∫_0^{∞} x e^{−x²(1+z²)} dx
(Theorem 1.11: y = x², dy = 2x dx) = ∫_0^{∞} e^{−y(1+z²)} dy
(Theorem 1.10) = [ −e^{−y(1+z²)}/(1+z²) ]_{y=0}^{y=∞}
= 1/(1+z²) .
Therefore
I = ∫_{−∞}^{∞} 1/(1+z²) dz
(Theorem 1.10) = [ tan⁻¹ z ]_{−∞}^{∞}
= π/2 − (−π/2) = π .
This completes the solution of the exercise.
An immediate consequence of the above exercise is
∫_{−∞}^{∞} e^{−x²} dx = √π .
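This value can also be checked numerically. The following is a minimal sketch, not part of the original notes, assuming Python with numpy and scipy available; it simply compares the quadrature value with √π.

import numpy as np
from scipy.integrate import quad

# ∫_{-∞}^{∞} e^{-x²} dx, computed by adaptive quadrature
value, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(value, np.sqrt(np.pi))  # both ≈ 1.7724538509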
Definition 14. The Lebesgue measure λd is the d-fold product of the one-
dimensional Lebesgue measure λ, that is, λd is the unique measure on the space
(Rd , B(Rd )) satisfying
λ_d(A₁ × … × A_d) = ∏_{i=1}^{d} λ(A_i) , A₁, …, A_d ∈ B(R) .
For stating the next result, a Jacobian matrix has to be first defined. Con-
sider an open set U ⊂ Rd and a function F : U → Rd . Denote by f1 , . . . , fd the
coordinate functions of F, that is, F(x) = (f₁(x), …, f_d(x)), x ∈ U.
If the first partial derivatives of F exist, that is, ∂f_i(x)/∂x_j exists for all x ∈ U
and 1 ≤ i, j ≤ d, then its Jacobian matrix at x, denoted by J(x), is the d × d matrix
J(x) = ( ∂f_i(x)/∂x_j )_{1≤i,j≤d} , x ∈ U ,
that is, the (i, j)-th entry of J(x) is ∂fi (x)/∂xj . The statement of the theorem
is the following, of which Theorem 1.11 is a special case.
Theorem 1.12. For open subsets U and V of Rd , let T : U → V be a bijection
which is continuously differentiable, that is, the first partial derivatives of T
exist and are continuous. Assume that its Jacobian matrix J(x) is non-singular
for all x ∈ U . Then for any non-negative measurable function f : V → R,
∫_U f(T(x)) |det(J(x))| dx = ∫_V f(y) dy ,
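As a quick illustration of Theorem 1.12, here is a hedged numerical sketch (assuming numpy/scipy; not part of the notes) with the polar-coordinate map T(r, θ) = (r cos θ, r sin θ), whose Jacobian determinant is r, and f(y) = ‖y‖²; both sides should be approximately π/2.

import numpy as np
from scipy.integrate import dblquad

# Left hand side: ∫_U f(T(x)) |det J(x)| dx over U = (0,1) × (0,2π);
# here f(T(r,θ)) = r² and |det J| = r, so the integrand is r³.
lhs, _ = dblquad(lambda th, r: r**3, 0, 1, 0, 2 * np.pi)

# Right hand side: ∫_V f(y) dy over the open unit disc (the deleted ray is Lebesgue-null).
rhs, _ = dblquad(lambda y, x: x**2 + y**2, -1, 1,
                 lambda x: -np.sqrt(1 - x**2), lambda x: np.sqrt(1 - x**2))
print(lhs, rhs, np.pi / 2)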
R = [a1 , b1 ] × . . . × [ad , bd ] ⊂ Rd ,
Fact 1.3. Suppose that U ⊂ Rd is open, R ⊂ U is a closed rectangle and
T : U → Rd is continuously differentiable such that
where Jij (z) is the (i, j)-th entry of the Jacobian matrix J(z) of T at z for all
z ∈ U and 1 ≤ i, j ≤ d. Then,
Recall that the function A → A−1 , from the space of d × d non-singular
matrices to itself, is continuous. Since J(x) is non-singular for all x ∈ U , the
map x 7→ J(x)−1 is continuous on U . Thus,
f : R × {z ∈ Rd : kzk = 1} → Rd ,
defined by
In other words,
‖J(x)⁻¹ z‖ ≤ c ‖z‖ , x ∈ R , z ∈ R^d . (1.8)
Denote by Jij (x) the (i, j)-th entry of J(x) for all x ∈ U and 1 ≤ i, j ≤ d.
Uniform continuity of Jij (·) on R ensures the existence of δ2 > 0 such that
|J_ij(x) − J_ij(x′)| ≤ ε/(cd) for all x, x′ ∈ R , ‖x − x′‖ ≤ δ₂ . (1.9)
Let 0 < δ ≤ min{δ1 , δ2 } be such that δ −1 (bi − ai ) is an integer for every i.
Choosing such a δ is possible because bi − ai is rational; if pi , qi are positive
integers with bi − ai = pi /qi , letting
δ = 1/(n q₁ ⋯ q_d)
works for large n, for example.
Consider the square
R = Q1 ∪ . . . ∪ Qk ,
The above is precisely the advantage of working with the L∞ norm.
For i = 1, . . . , k, fix xi ∈ Qi and define
where
Qεi = B(1+ε)δ/2 (xi ) , i = 1, . . . , k .
Proceeding towards proving (1.12), fix i ∈ {1, . . . , k}, and use Fact 1.3 along
with (1.9) to claim that for all z ∈ Qi ,
‖T(z) − T(x_i) − J(x_i)(z − x_i)‖ ≤ (ε/c) ‖z − x_i‖ .
Since the left hand side above equals ‖T(z) − φ_i(z)‖, it follows that
‖T(z) − φ_i(z)‖ ≤ (ε/c) ‖z − x_i‖ , z ∈ Q_i . (1.13)
Therefore, for z ∈ Qi ,
‖φ_i⁻¹ ∘ T(z) − z‖ = ‖φ_i⁻¹ ∘ T(z) − φ_i⁻¹ ∘ φ_i(z)‖
= ‖J(x_i)⁻¹ (T(z) − φ_i(z))‖
≤ c ‖T(z) − φ_i(z)‖
≤ ε ‖z − x_i‖ ,
(1.8) and (1.13) implying the inequalities in the penultimate line and the last
line, respectively. Thus, for z ∈ Q_i,
‖φ_i⁻¹ ∘ T(z) − x_i‖ ≤ ‖φ_i⁻¹ ∘ T(z) − z‖ + ‖z − x_i‖ ≤ (1 + ε) ‖z − x_i‖ ,
so that
φ_i⁻¹ ∘ T(z) ∈ Q_i^ε , z ∈ Q_i ,
the second line following from the translation-invariance of the Lebesgue mea-
sure, and Fact 1.1 and the observation that Qεi is a rectangle implying the last
line. This is the crux of the proof in that it shows how the modulus of the
determinant of the Jacobian appears. Further, (1.11) shows Qεi is a square of
side-length (1 + ε)δ. Therefore,
Thus,
λ(T(R)) = Σ_{i=1}^{k} λ(T(Q_i))
≤ Σ_{i=1}^{k} (1 + ε)^d |det(J(x_i))| λ(Q_i)
≤ (1 + ε)^d Σ_{i=1}^{k} ( ε + min_{z∈Q_i} |det(J(z))| ) λ(Q_i)
≤ (1 + ε)^d ( ε λ(R) + ∫_R |det(J(x))| dx ) ,
(1.7) and that δ ≤ δ1 implying the penultimate line. Since the above holds for
all ε > 0, letting ε ↓ 0 completes the proof of Step 1.
While Step 1. was mostly based on analysis and linear algebra, the proof of
Step 2. is standard in measure theory and follows from Exc 1.7.
Proof of Step 2. Define measures µ and ν on Rd by
and
ν(B) = ∫_{B∩U} |det(J(x))| dx , B ∈ B(R^d) .
The claim (1.5) is equivalent to
In view of Exc 1.7, it suffices to show that the claim holds for any compact
rectangle with rational corners, that is,
Proof of Step 3. First let f : V → R be a non-negative simple function, that is,
f = Σ_{i=1}^{k} α_i 1_{A_i} ,
Apply (1.17) to this g to get
∫_U f∘T(x) |det(J(x))| dx ≤ ∫_V g∘T⁻¹(y) |det( J(T⁻¹y)⁻¹ )| dy
= ∫_V f(y) |det( J(T⁻¹y) )| |det( J(T⁻¹y)⁻¹ )| dy
= ∫_V f(y) dy .
This completes the proof of Step 4. and that of Theorem 1.12 as well.
p T . . . T (n times)H = 2−n−1 , n = 0, 1, 2, . . . ,
and
P(A) = Σ_{ω∈A} p(ω) , A ⊂ Ω .
Thus (Ω, P(Ω), P ) is the probability space associated with this experiment.
As in the above examples, if the sample space Ω is countable and p(ω) ≥ 0
is the probability of the outcome ω for all ω ∈ Ω, where
Σ_{ω∈Ω} p(ω) = 1 ,
then letting
P(A) = Σ_{ω∈A} p(ω) , A ⊂ Ω ,
Ω, P(Ω), P is the natural probability space. The above method, however, fails
for an uncountable sample space. For random experiments with such a sample
space, measure theory is essential, as illustrated in the next example.
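The countable construction just described is easy to carry out directly. The following minimal sketch (in Python, not part of the notes) builds (Ω, P(Ω), P) for two tosses of a fair coin and evaluates the probability of an event.

from itertools import product

omega = list(product("HT", repeat=2))   # sample space Ω
p = {w: 0.25 for w in omega}            # p(ω) ≥ 0 with Σ_ω p(ω) = 1

def P(A):
    # P(A) = Σ_{ω ∈ A} p(ω) for any A ⊂ Ω
    return sum(p[w] for w in A)

at_least_one_head = [w for w in omega if "H" in w]
print(P(at_least_one_head))             # 0.75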
Example 2.3. A fair coin is tossed infinitely often. That is, for n = 1, 2, 3, . . .,
there is a n-th toss which yields either a head or a tail. The sample space of
this experiment is
Ω = (ω1 , ω2 , ω3 , . . .) : ωn ∈ {H, T } , n = 1, 2, . . . ,
That is, Eω1 ...ωn is the event that the first toss yields ω1 , the outcome of the
second toss is ω2 , and so on till the n-th toss. Define
S = {∅} ∪ Eω1 ...ωn : n ∈ N, ω1 , . . . , ωn ∈ {H, T } .
The following several exercises show that there is a unique probability measure
P on (Ω, σ(S)) satisfying
A \ B = C1 ∪ . . . ∪ Ck ,
Definition 15. Given a non-empty set Ω and a non-empty collection C of
subsets of Ω, C has the “Cantor intersection property” if the following holds.
Whenever C₁, C₂, C₃, … ∈ C are such that C₁ ⊃ C₂ ⊃ C₃ ⊃ … and C_n ≠ ∅ for
all n, it holds that
∩_{n=1}^{∞} C_n ≠ ∅ .
For example, the collection of all compact sets of Rd has the Cantor inter-
section property.
Exercise 2.2. Suppose S is a semi-field having the Cantor intersection property.
If A₁, A₂, … ∈ S are disjoint and ∪_{n=1}^{∞} A_n ∈ S, show that A_n = ∅ for all
but finitely many n's.
Hint.: Assume A1 , A2 , . . . ∈ S are disjoint,
A = ∪_{n=1}^{∞} A_n ∈ S ,
A \ A1 = C1 ∪ . . . ∪ Ck ,
because otherwise A is the union of finitely many An ’s which would imply that
all but finitely many of An ’s are empty. Let B1 = Ci . Apply Exc 2.1 to B1 \ A2
to get B2 ∈ S such that B2 ⊂ B1 \ A2 and
∅ ≠ B_n ⊂ A_{n+1} ∪ A_{n+2} ∪ … , n ≥ 1 .
Exercise 2.3. Let Ω and S be as in Example 2.3. Show that S is a semi-field
having the Cantor intersection property.
Exercise 2.4. Let Ω and S be as in Example 2.3. Define P : S → [0, 1] by
P (E) = 0 if E = ∅ ,
ω11 , . . . , ωk11 −1 , u, u, u, . . . ∈ Ai .
Exercise 2.5. Let Ω and S be as in Example 2.3. Use Exc 2.2 and 2.3 to show
if A1 , A2 , . . . ∈ S are disjoint such that
∪_{n=1}^{∞} A_n ∈ S ,
then all but finitely many of An ’s are empty. Hence argue that P defined in Exc
2.4 is countably additive on S.
Exercise 2.6. Let Ω and S be as in Example 2.3. Use the corollary of Theorems
1.1 and 1.3 to show that there exists a unique probability measure P on (Ω, σ(S))
satisfying (2.1).
Now that the natural association of a probability space with a random ex-
periment is understood, we shall define a random variable and its C.D.F.
Definition 16. A random variable X defined on a probability space (Ω, A, P )
is a measurable function X : Ω → R̄ such that
P( X⁻¹({−∞, ∞}) ) = 0 . (2.3)
The C.D.F. of X is the function F : R → [0, 1] given by
F(x) = P( X⁻¹(−∞, x] ) , x ∈ R .
and
F(−∞) = lim_{x→−∞} F(x) whenever it exists .
Proof of Theorem 2.1. The proof of 1.–4. is easy and is left as an exercise when
F is a C.D.F. Conversely, for an F : R → [0, 1] satisfying 1.–4., Theorem 1.4
guarantees the existence of a unique probability measure P on (R, B(R)) satis-
fying
P ((a, b]) = F (b) − F (a) , −∞ < a < b < ∞ .
Letting Ω = R, A = B(R) and X : R → R to be the identity function, it is easy
to see that F is the C.D.F. of X which is a random variable on (Ω, A, P ). This
completes the proof.
Henceforth, (Ω, A, P ) will be the probability space underlying any random
variable talked about, unless explicitly mentioned otherwise. Theorem 2.1 guar-
antees that such a probability space exists whenever a few conditions are satis-
fied.
Definition 17. Given a possibly improper random variable X, its distribution,
usually denoted by P (X ∈ ·) or P ◦ X −1 (·), is the push forward measure of P
on (R̄, B(R̄)) under X, that is,
P (X ∈ B) = P ◦ X −1 (B) = P ({ω ∈ Ω : X(ω) ∈ B}) , B ∈ B(R̄) .
For a Borel function f : R̄ → R̄, we denote by
∫_{R̄} f(x) P(X ∈ dx) or ∫_{R̄} f(x) P∘X⁻¹(dx) ,
the integral of f with respect to P ∘ X⁻¹, whenever it is defined.
= ∫_{ {(ω,x) ∈ Ω×R : 0 ≤ x < X(ω)} } P(dω) ⊗ dx ,
the last line following from Tonelli and Exc 2.8. Use Tonelli again to write
E(X) = ∫_0^{∞} ∫_Ω 1[0 ≤ x < X(ω)] P(dω) dx
= ∫_0^{∞} P(X > x) dx ,
the second equality follows from the observation that for x ≥ 0, X > x ⇐⇒
X + > x. Similarly,
E(X⁻) = ∫_0^{∞} P(X⁻ > x) dx
= ∫_{−∞}^{0} P(X⁻ > −x) dx
(X⁻ > −x ⟺ X < x for x ≤ 0) = ∫_{−∞}^{0} P(X < x) dx
= ∫_{−∞}^{0} P(X ≤ x) dx ,
the last line following from the fact that P (X ≤ ·) and P (X < ·) differs on a set
which is at most countable and hence has Lebesgue measure zero. Recalling that
X = X + − X − and either of E(X + ) and E(X − ) is finite, the proof follows.
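The tail-probability formula just proved can be sanity-checked numerically; the sketch below (assuming numpy/scipy, not part of the notes) does so for X ∼ N(1, 1), whose mean is 1.

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

X = norm(loc=1.0, scale=1.0)
pos, _ = quad(lambda x: X.sf(x), 0, np.inf)     # ∫_0^∞ P(X > x) dx
neg, _ = quad(lambda x: X.cdf(x), -np.inf, 0)   # ∫_{-∞}^0 P(X ≤ x) dx
print(pos - neg)                                # ≈ 1.0 = E(X)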
The following theorem relates the expectation with its distribution.
Theorem 2.3. For a possibly improper random variable X and a Borel function
f : R̄ → R̄,
E(f(X)) = ∫_{R̄} f(x) P(X ∈ dx) ,
whenever either side is defined.
Proof. Follows from the inequality |αX + βY | ≤ |α||X| + |β||Y | and the fact
that integral on a measure space is monotone and linear.
If Definition 18 were replaced by any other definition, for example, by (2.4),
then proving the above theorem would have become extremely difficult.
Definition 19. For a random variable X with a finite mean µ, its variance is
defined as
Var(X) = E((X − µ)²) .
The standard deviation of X is √Var(X).
Theorem 2.5. The variance of X is defined and finite if and only if E(X 2 ) <
∞, in which case,
Var(X) = E(X²) − (E(X))² . (2.5)
For the proof, the following fact is needed.
Fact 2.1 (Cauchy-Schwarz inequality). If f, g are measurable functions from a
measure space (Ω, A, µ) to R, then
∫ |fg| dµ ≤ ( ∫ f² dµ )^{1/2} ( ∫ g² dµ )^{1/2} .
for random variables X and Y . Take Y to be identically 1 and square both sides
to get
(E(|X|))² ≤ E(X²) . (2.6)
If E(X 2 ) < ∞, then (2.6) shows X has a finite expectation. Let µ = E(X).
Write
(X − µ)2 = X 2 − 2µX + µ2 .
Since X 2 and X have a finite expectation, the linearity of expectation implies
so does the left hand side and
then writing
X 2 = (X − µ)2 + 2µX − µ2 ,
it follows that E(X 2 ) < ∞. This shows the “only if” part and thus completes
the proof.
The formula (2.5) is used almost always for calculating variance. A word of
caution: (2.6) should not be misinterpreted as
( ∫ |f| dµ )² ≤ ∫ f² dµ ,
when µ is not a probability measure. The above is clearly false, for example,
if µ is the Lebesgue measure on R and f (x) = x−1 1(x > 1) because then the
right hand side is finite whereas the left hand side is not.
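The counterexample is easy to see numerically as well; a small sketch (assuming numpy/scipy): ∫ f² dλ is finite while the truncated integrals of |f| grow without bound.

import numpy as np
from scipy.integrate import quad

square_integral, _ = quad(lambda x: x**-2, 1, np.inf)
print(square_integral)                      # 1.0, so ∫ f² dλ < ∞

for b in (1e2, 1e4, 1e6):
    partial, _ = quad(lambda x: 1 / x, 1, b)
    print(b, partial)                       # grows like log b, so ∫ |f| dλ = ∞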
Definition 20. A random variable X is discrete if there exists a countable set
C ⊂ R such that P (X ∈ C) = 1. The probability mass function of a discrete
random variable X is the function f : R → [0, 1] defined by
f (x) = P (X = x) , x ∈ R .
Theorem 2.6. If X is a discrete random variable, then for any measurable
f : R → R̄,
E(f(X)) = Σ_{x∈R} f(x) P(X = x) , (2.7)
whenever the left hand side is defined, where the sum on the right hand side is to
be interpreted as the sum over those x for which P (X = x) > 0. In particular,
if X has an expectation, then
E(X) = Σ_{x∈R} x P(X = x) .
In other words,
P(X ∈ dx)/µ(dx) = P(X = x) ,
that is, P (X = ·) is the Radon-Nikodym derivative of P (X ∈ ·) with respect to
µ. Exc 1.10 shows that for a measurable f : R → R̄,
∫_R f(x) P(X ∈ dx) = ∫_R f(x) P(X = x) µ(dx) ,
whenever either side is defined. The left hand side equals E(f(X)) by Theorem
2.3, and the right hand side is simply Σ_{x∈C} f(x)P(X = x). Since P(X = x) = 0
for x ∉ C, this completes the proof of (2.7). The second claim being a special
case of (2.7), the proof follows.
Example 2.4. A coin with probability p of heads is tossed infinitely often. Proceeding as in Example 2.3, construct the probability space for this experiment.
That is, if Ω and S are as therein, show that P : S → [0, 1], defined by
P(E_{ω₁…ω_n}) = ∏_{i=1}^{n} [ p 1(ω_i = H) + q 1(ω_i = T) ] , n ∈ N , ω₁, …, ω_n ∈ {H, T} ,
The distribution of X is called Poisson(λ). Show that its mean and variance
both equal λ.
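A quick numerical check of this exercise (not a proof) is given below, truncating the Poisson series at a large N; it is a sketch assuming numpy and scipy.

import numpy as np
from scipy.stats import poisson

lam = 3.5
k = np.arange(200)                      # truncation of the support
pmf = poisson.pmf(k, lam)
mean = np.sum(k * pmf)                  # Σ x P(X = x)
var = np.sum((k - mean) ** 2 * pmf)     # Σ (x - E X)² P(X = x)
print(mean, var)                        # both ≈ 3.5 = λ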
Definition 21. A random variable X is continuous if
P (X = x) = 0 for all x ∈ R .
Exercise 2.11. If X is a random variable with C.D.F. F , show that for all
x ∈ R,
P (X = x) = F (x) − F (x−) .
Hence argue that X is a continuous random variable if and only if F is a con-
tinuous function.
Definition 22. A Borel function f : R → [0, ∞) is the density of a random
variable X if
P(X ∈ B) = ∫_B f(x) dx , for all B ∈ B(R) .
Exercise 2.12. 1. If X has density f, show that
P(X ∈ dx)/dx = f(x) , x ∈ R ,
that is, f is the Radon-Nikodym derivative of P (X ∈ ·) with respect to the
Lebesgue measure.
2. Prove that if f and g are densities of X, then f = g a.e. In other words, a
density is unique up to a set of Lebesgue measure zero.
3. Prove that a non-negative Borel function f on R is a density of X having
C.D.F. F if and only if
∫_{−∞}^{x} f(t) dt = F(x) , x ∈ R .
Theorem 2.7. A random variable has a density if and only if its C.D.F. is
absolutely continuous.
The proof uses the following exercise.
Exercise 2.13. If h is an integrable function on a measure space (Ω, A, µ), then
given ε > 0 there exists δ > 0 such that
∫_A |h| dµ ≤ ε ,
whenever A ∈ A satisfies µ(A) ≤ δ.
P(X ∈ ·) ≪ λ , (2.8)
λ being the Lebesgue measure. To prove this, fix any B ∈ B(R) with λ(B) = 0.
We shall prove P (X ∈ B) = 0 by showing for any ε > 0
P (X ∈ B) ≤ ε . (2.9)
Fix ε > 0. Absolute continuity of F implies there exists δ > 0 such that
Σ_{i=1}^{n} |F(y_i) − F(x_i)| ≤ ε ,
whenever (x₁, y₁), …, (x_n, y_n) are disjoint intervals with Σ_{i=1}^{n} (y_i − x_i) ≤ δ.
Since λ(B) = 0 and the Lebesgue measure is regular, see Exc 1.5, there exists
an open set U ⊂ R with U ⊃ B and λ(U) ≤ δ. An open subset of R is the union
of countably many disjoint open intervals, that is,
U = ∪_{i≥1} (x_i, y_i) ,
showing that
ε ≥ Σ_{i=1}^{n} [F(y_i) − F(x_i)]
(absolute continuity implies continuity) = Σ_{i=1}^{n} [F(y_i −) − F(x_i)]
= Σ_{i=1}^{n} P(x_i < X < y_i)
= P( X ∈ ∪_{i=1}^{n} (x_i, y_i) ) .
Since
P( X ∈ ∪_{i=1}^{n} (x_i, y_i) ) ↑ P(X ∈ U) ≥ P(X ∈ B) ,
Theorem 2.8. A random variable X with C.D.F. F has a density if and only
if
∫_{−∞}^{∞} f(x) dx = 1 ,
where
f(x) = dF(x)/dx if F is differentiable at x , and f(x) = 0 otherwise . (2.10)
In that case, f is the density of X.
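Before turning to the proof, here is a hedged numerical illustration of Theorem 2.8 (assuming numpy/scipy) with the Exponential(2) C.D.F.: the f of (2.10), computed by numerical differentiation, integrates to 1, so it is the density.

import numpy as np
from scipy.integrate import quad

lam = 2.0

def F(x):
    return 1 - np.exp(-lam * x) if x >= 0 else 0.0

def f(x, h=1e-6):
    # numerical version of (2.10); F is differentiable at every x ≠ 0
    return (F(x + h) - F(x - h)) / (2 * h) if x != 0 else 0.0

total = quad(f, -10, 0)[0] + quad(f, 0, np.inf)[0]
print(total)   # ≈ 1, so f is the density of X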
The proof uses the following facts.
Fact 2.2 (Theorem 31.2, pg 404, Billingsley (1995)). A non-decreasing function
F : [a, b] → R is differentiable a.e. on (a, b). If f is as in (2.10), then f is Borel
measurable, non-negative and satisfies
∫_a^b f(x) dx ≤ F(b) − F(a) .
(d/dx) F(x) = f(x) for almost all x ∈ (a, b) .
Proof of Theorem 2.8. We start with proving the “if” part. Assume
∫_{−∞}^{∞} f(x) dx = 1 , (2.11)
Keeping b fixed and letting a → −∞, MCT shows that the left hand side goes
to the corresponding integral from −∞ to b. Since F is a C.D.F., F (−∞) = 0,
showing that
∫_{−∞}^{b} f(x) dx ≤ F(b) . (2.12)
Thus
F(a) ≤ 1 − ∫_a^{∞} f(x) dx
(using (2.11)) = ∫_{−∞}^{a} f(x) dx
≤ F(a) ,
Exc 2.12.3 shows f is the density of X. This proves the “if” part.
For the “only if” part, assume X has a density g. That is,
∫_{−∞}^{x} g(t) dt = P(X ≤ x) = F(x) , x ∈ R .
Fact 2.3 shows that the left hand side is differentiable a.e. on (a, b) and
(d/dx) F(x) = g(x) for a.e. x ∈ (a, b) .
Since this is true for all a, b with −∞ < a < b < ∞, the above equality holds
a.e. on R. A comparison with (2.10) shows f = g a.e. Therefore
∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} g(x) dx = 1 ,
4. Given ε > 0 there exists δ > 0 such that
Proof. Exc.
The equivalence of 2. and 5. in the above theorem is important from the
point of view of analysis as well.
Theorem 2.10. If X has density f , then for any measurable g : R → R̄,
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx ,
f(x) = P(X ∈ dx)/dx .
Example 2.6. A random variable X follows standard normal or standard Gaus-
sian if its density is
f(x) = (1/√(2π)) e^{−x²/2} , x ∈ R .
It is easy to see that E(X) = 0 and
E(X²) = ∫_{−∞}^{∞} x² f(x) dx
= √(2/π) ∫_0^{∞} x² e^{−x²/2} dx
= √(2/π) ∫_0^{∞} x · x e^{−x²/2} dx .
Integrating by parts with the help of the observation that
(d/dx)( −e^{−x²/2} ) = x e^{−x²/2} ,
we get
E(X²) = √(2/π) ( [ x(−e^{−x²/2}) ]_0^{∞} − ∫_0^{∞} (−e^{−x²/2}) dx )
= √(2/π) ∫_0^{∞} e^{−x²/2} dx
= 1 .
Thus the mean and variance of the standard normal distribution are 0 and 1,
respectively.
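The computation in Example 2.6 can be verified by quadrature; a sketch assuming numpy/scipy, not part of the notes:

import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
mean, _ = quad(lambda x: x * f(x), -np.inf, np.inf)
second_moment, _ = quad(lambda x: x**2 * f(x), -np.inf, np.inf)
print(mean, second_moment)   # ≈ 0 and ≈ 1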
The distribution with density
f(x) = (1/(σ√(2π))) exp( −(x − µ)²/(2σ²) ) , x ∈ R ,
P(B|A) = P(B ∩ A)/P(A) .
This is the same as
that is,
(1 − F(s + t)) = (1 − F(s))(1 − F(t)) , s, t ≥ 0 .
Letting G = log(1 − F), the condition is G(s + t) = G(s) + G(t) for s, t ≥ 0.
It follows that
G(r) = −λr , r ∈ [0, ∞) ∩ Q ,
where λ = −G(1) = −log(1 − F(1)) ≥ 0. Right continuity of G implies
G(x) = −λx , x ≥ 0 ,
that is,
F(x) = 1 − e^{−λx} , x ≥ 0 .
Since F (∞) = 1, it is necessary that λ > 0. As X ≥ 0, F (x) = 0 for x < 0.
For λ > 0, X follows Exponential(λ) if its C.D.F. F is given by
F(x) = 1 − e^{−λx} for x ≥ 0 , and F(x) = 0 for x < 0 ,
and its mean and variance are λ⁻¹ and λ⁻², respectively.
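A Monte Carlo sketch (assuming numpy, not part of the notes) of the facts just stated: the memoryless property P(X > s + t) = P(X > s) P(X > t) and the moments of Exponential(λ).

import numpy as np

rng = np.random.default_rng(0)
lam = 1.5
x = rng.exponential(scale=1 / lam, size=10**6)

s, t = 0.7, 1.2
print((x > s + t).mean(), (x > s).mean() * (x > t).mean())   # memorylessness
print(x.mean(), 1 / lam)      # mean ≈ 1/λ
print(x.var(), 1 / lam**2)    # variance ≈ 1/λ²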
Definition 25. For α > 0, define
Γ(α) = ∫_0^{∞} x^{α−1} e^{−x} dx .
Definition 26. For α, β > 0, define
B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx . (2.13)
Exercise 2.15. 1. Show that the RHS of (2.13) is finite for α, β > 0.
2. Show that
B(1/2, 1/2) = π ,
by substituting θ = sin⁻¹√x in the RHS of (2.13).
Example 2.9. For α, β > 0, X follows Beta(α, β) if its density is
f(x) = (1/B(α, β)) x^{α−1} (1 − x)^{β−1} , 0 < x < 1 .
When α = β = 1/2, show that X has C.D.F.
F(x) = 0 for x < 0 , F(x) = (2/π) sin⁻¹√x for 0 ≤ x ≤ 1 , and F(x) = 1 for x > 1 .
This is why the Beta(1/2, 1/2) distribution is also called the “arc-sine law”.
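The claimed C.D.F. can be checked by integrating the Beta(1/2, 1/2) density numerically; a sketch assuming numpy/scipy:

import numpy as np
from scipy.integrate import quad
from scipy.special import beta

density = lambda t: t**-0.5 * (1 - t)**-0.5 / beta(0.5, 0.5)
for x in (0.1, 0.5, 0.9):
    F_x, _ = quad(density, 0, x)
    print(F_x, 2 / np.pi * np.arcsin(np.sqrt(x)))   # the two columns agree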
3 Independence
In this chapter the concept of independence is studied, which is a fundamental
concept in probability theory. Unless mentioned otherwise, (Ω, A, P ) is the
probability space underlying everything we talk about. For example, all random
variables are defined on this space and any collection of sets we talk about is a
subset of A, unless the contrary is explicitly stated.
Definition 27. A collection of σ-fields A1 , . . . , An are independent if
P(A₁ ∩ … ∩ A_n) = ∏_{i=1}^{n} P(A_i) ,
for all A1 ∈ A1 , . . . , An ∈ An .
The following is an important observation which connects the usually given
definition of independence of events with the above.
Exercise 3.1. If A1 , . . . , An are independent σ-fields and A1 , . . . , An belong to
A1 , . . . , An , respectively, show that
The above when n = 2 should be compared with Definition 24.
Theorem 3.1. If S1 , . . . , Sn are semi-fields such that
P( ∩_{i=1}^{n} A_i ) = ∏_{i=1}^{n} P(A_i) , for all A₁ ∈ S₁, …, A_n ∈ S_n ,
for all A ∈ A. Thus µ1 and µ2 are finite measures on (Ω, A), which agree on S1
by the hypothesis of the theorem. As S1 is a semi-field, Corollary 1.1 implies
that µ1 and µ2 agree on σ(S1 ). In other words, (3.1) holds.
We shall now show inductively that for i = 1, . . . , n,
P( ∩_{i=1}^{n} A_i ) = ∏_{i=1}^{n} P(A_i) , (3.2)
Definition 28. Random variables X1 , . . . , Xn , all of which by convention are
defined on (Ω, A, P ), are independent if σ(X1 ), . . . , σ(Xn ) are independent σ-
fields, where σ(X) is the smallest σ-field with respect to which X is measurable
for any X : Ω → R̄, that is,
σ(X) = { X⁻¹B : B ∈ B(R̄) } .
= Σ_{x₁∈{a₁,b₁}} ⋯ Σ_{x_n∈{a_n,b_n}} ∏_{i=1}^{n} (−1)^{1(x_i=a_i)} 1(X_i ≤ x_i)
= Σ_{(x₁,…,x_n)∈{a₁,b₁}×…×{a_n,b_n}} (−1)^{#{1≤i≤n : x_i=a_i}} 1(X₁ ≤ x₁, …, X_n ≤ x_n) .
Taking expectation on both sides and using (3.4), the solution follows.
Proof of Theorem 3.2. The “only if” part is trivial because (3.3) follows from
the observation that X1−1 (−∞, x1 ], . . . , Xn−1 (−∞, xn ] belong to σ(X1 ), . . . . . .
. . . , σ(Xn ), respectively. For the “if” part, assume (3.3). The first observation
is that (3.3) holds for x1 , . . . , xn ∈ R̄ because if xi = −∞ for one or more i,
then both sides are zero and if xi → ∞, then both sides of (3.3) increase to the
respective quantities obtained by putting xi = ∞ for those i’s. Thus, (3.3) can
be assumed to hold for all x1 , . . . , xn ∈ R̄ without loss of generality.
Let S_i = { X_i⁻¹(a_i, b_i] : −∞ ≤ a_i ≤ b_i ≤ ∞ } for i = 1, …, n. For A₁ ∈
S₁, …, A_n ∈ S_n, that is, A_i = X_i⁻¹(a_i, b_i] for some −∞ ≤ a_i ≤ b_i ≤ ∞,
P(A₁ ∩ … ∩ A_n)
= P(a₁ < X₁ ≤ b₁, …, a_n < X_n ≤ b_n)
= Σ_{(x₁,…,x_n)∈{a₁,b₁}×…×{a_n,b_n}} (−1)^{#{1≤i≤n : x_i=a_i}} P(X₁ ≤ x₁, …, X_n ≤ x_n)
= Σ_{(x₁,…,x_n)∈{a₁,b₁}×…×{a_n,b_n}} (−1)^{#{1≤i≤n : x_i=a_i}} ∏_{i=1}^{n} P(X_i ≤ x_i)
= Σ_{(x₁,…,x_n)∈{a₁,b₁}×…×{a_n,b_n}} ∏_{i=1}^{n} (−1)^{1(x_i=a_i)} P(X_i ≤ x_i)
= ∏_{i=1}^{n} Σ_{x_i∈{a_i,b_i}} (−1)^{1(x_i=a_i)} P(X_i ≤ x_i)
= ∏_{i=1}^{n} ( P(X_i ≤ b_i) − P(X_i ≤ a_i) )
= ∏_{i=1}^{n} P(a_i < X_i ≤ b_i)
= P(A₁) ⋯ P(A_n) ,
Exc 3.3 implying the third line and the fourth line following from (3.3) which
holds for all x1 , . . . , xn ∈ R̄. Theorem 3.1 shows σ(S1 ), . . . , σ(Sn ) are indepen-
dent, which is the same as independence of X1 , . . . , Xn .
Exercise 3.4. Let Ω = (0, 1], A be the collection of Borel subsets of (0, 1] and
P be the restriction of Lebesgue measure to (0, 1].
1. For all ω ∈ Ω, show that there exist unique X1 (ω), X2 (ω), . . . ∈ {0, 1, 2}
such that
ω = Σ_{n=1}^{∞} 3^{−n} X_n(ω) ,
2. For n ≥ 1 and i1 , . . . , in ∈ {0, 1, 2}, show that for ω ∈ Ω,
X₁(ω) = i₁, …, X_n(ω) = i_n ⟺ Σ_{j=1}^{n} 3^{−j} i_j < ω ≤ 3^{−n} + Σ_{j=1}^{n} 3^{−j} i_j .
∞
X o
ω= 3−n xn for some x1 , x2 , . . . ∈ {0, 1}
n=1
For a collection {Ai : i ∈ I} of σ-fields, denote
∨_{i∈I} A_i = σ( ∪_{i∈I} A_i ) .
are independent.
Proof. The first step of the proof is to show the claim when I1 , . . . , Ik are finite
sets. Suppose
Ii = {ni1 , . . . , niki } , i = 1, . . . , k .
Define
where
respectively.
Now let I1 , . . . , Ik be disjoint non-empty subsets of I. Define
F_i = ∪_{J⊂I_i, J finite} ( ∨_{j∈J} A_j ) , i = 1, …, k .
Once again, Theorem 3.1 implies independence of σ(F1 ), . . . , σ(Fk ) and com-
pletes the proof.
An immediate corollary of the above theorem is the following.
Corollary 3.1. If {A_i : i ∈ I} is an independent collection of σ-fields and for
each α ∈ Θ, ∅ ≠ I_α ⊂ I is such that
I_α ∩ I_β = ∅ , for all α, β ∈ Θ , α ≠ β ,
then { ∨_{i∈I_α} A_i : α ∈ Θ } is independent.
Exercise 3.6. 1. If A_i is a σ-field for all i ∈ I, show that A ∈ ∨_{i∈I} A_i if
and only if
A ∈ ∨_{i∈I₀} A_i , that is, A ∈ σ(A_i : i ∈ I₀) ,
for some countable I₀ ⊂ I.
P (Y ∈ B) = P (Z ∈ B) , B ∈ B(R) .
E(XY ) = E(X)E(Y ) .
Proof. As the first step, we show that for simple non-negative independent ran-
dom variables X and Y ,
E(XY ) = E(X)E(Y ) .
Since X and Y are simple and non-negative,
X = Σ_{i=1}^{m} α_i 1_{A_i} , Y = Σ_{i=1}^{n} β_i 1_{B_i} ,
= E(X)E(Y ) ,
the independence of σ(X) and σ(Y ) being used in the second line.
The second step is to show that for non-negative independent random vari-
ables X and Y , E(XY ) = E(X)E(Y ). There exist σ(X)-measurable simple
random variables sn such that 0 ≤ sn ↑ X and σ(Y )-measurable simple random
variables tn such that 0 ≤ tn ↑ Y . As sn and tn are independent by Exc 3.7,
the first step implies
E(sn tn ) = E(sn )E(tn ) .
Observing that 0 ≤ sn tn ↑ XY , letting n → ∞ and using MCT, it follows that
E(XY ) = E(X)E(Y ).
Finally suppose X and Y are independent and integrable. Then |X| and |Y |
are independent. The second step shows
Theorem 3.5. If X and Y are random variables with finite variance, then
Cov(X, Y ) is defined and
|Cov(X, Y)| ≤ √( Var(X) Var(Y) ) .
2. If X and Y are independent and integrable, then Cov(X, Y ) exists and
equals zero.
3. If X has a finite variance, Cov(X, X) = Var(X).
4. If Cov(X, Y ) is defined, then
Cov(αX + γ, βY + δ) = αβCov(X, Y ) , α, β, δ, γ ∈ R .
Proof. Exc.
Definition 32. For random variables X and Y whose variances are finite and
positive, their correlation is
Cov(X, Y )
Corr(X, Y ) = p .
Var(X)Var(Y )
Exercise 3.8. For a random variable X, show that Var(X) = 0 if and only if
X is a degenerate random variable, that is, for some c ∈ R, X = c a.s.
Theorem 3.7. Suppose X and Y are non-degenerate random variables with
finite variances.
1. For α, β, γ, δ ∈ R with α, β 6= 0,
|Corr(X, Y )| ≤ 1 .
1. being used in the second line and the last line follows from the observation
that Corr(X, X) = 1 which is a restatement of Theorem 3.6.3. Thus aX +bY = c
a.s. for some a, b 6= 0 implies Corr(X, Y ) = ±1.
Conversely, suppose Corr(X, Y ) = ±1. Define
X₀ = (X − E(X))/√Var(X) , Y₀ = (Y − E(Y))/√Var(Y) .
X/√Var(X) − Y/√Var(Y) = E(X)/√Var(X) − E(Y)/√Var(Y) a.s.
X/√Var(X) + Y/√Var(Y) = E(X)/√Var(X) + E(Y)/√Var(Y) a.s.
This proves the “if” part and thus completes the proof.
Now we proceed towards showing that given a countable collection of CDFs,
there exist independent random variables with those CDFs. The first step in
that direction is the following result, which is an alternate way of proving the
second part of Theorem 2.1.
Theorem 3.8. Let F be a C.D.F. and define F⁻(y) = inf{x ∈ R : F(x) ≥ y} for 0 < y < 1.
because then it would follow from the fact that 0 < U < 1 a.s. that for x ∈ R,
P (F ← (U ) ≤ x) = P (U ≤ F (x)) = F (x) .
x0 ∈ {x ∈ R : F (x) ≥ y0 } .
Thus, x0 ≥ inf{x ∈ R : F (x) ≥ y0 } = F ← (y0 ), proving the “⇐” part, that is,
the “if” part.
For the reverse implication of (3.7), we shall show that y0 > F (x0 ) ⇒
F ← (y0 ) > x0 . Assume y0 > F (x0 ). Right continuity of F implies there exists
x1 > x0 with F (x1 ) < y0 . As F is non-decreasing,
F((−∞, x₁]) ⊂ [0, F(x₁)] ⊂ [0, y₀) .
{x : F (x) ≥ y0 } ⊂ (x1 , ∞) .
Thus, inf{x : F (x) ≥ y0 } ≥ x1 > x0 . That is, F ← (y0 ) > x0 , as desired. This
proves the “⇒” implication, that is, the “only if” part of (3.7), which completes
the proof.
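Theorem 3.8 is the basis of inverse-C.D.F. sampling. The following sketch (assuming numpy; the Exponential(1) example is an illustration, not part of the notes) pushes Uniform(0, 1) samples through F⁻ and compares empirical and true C.D.F. values.

import numpy as np

rng = np.random.default_rng(1)
U = rng.uniform(size=10**6)

F_inv = lambda y: -np.log(1 - y)      # F⁻ for F(x) = 1 - e^{-x}, x ≥ 0
X = F_inv(U)

for x in (0.5, 1.0, 2.0):
    print((X <= x).mean(), 1 - np.exp(-x))   # empirical C.D.F. vs F(x)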
The next step in the same direction gives an alternate way of generating an
Uniform(0, 1) random variable.
Theorem 3.9. If X1 , X2 , . . . are i.i.d. from Bernoulli(1/2), that is, they take
values 0 and 1 with probability 1/2 each, then
U = Σ_{n=1}^{∞} 2^{−n} X_n
follows Uniform(0, 1).
With U_n = Σ_{k=1}^{n} 2^{−k} X_k, we first show that
P(U_n = 2^{−n} i) = 2^{−n} , i = 0, 1, …, 2ⁿ − 1 . (3.8)
Since U1 = X1 /2, that is, U1 takes values 0 and 1/2 each with probability
1/2, (3.8) trivially holds for n = 1. Assume (3.8) for some n as the induction
hypothesis. Notice
Un+1 − Un = 2−n−1 Xn+1 . (3.9)
Since Un is measurable with respect to σ(X1 , . . . , Xn ) which is independent of
σ(Xn+1 ) by Corollary 3.1, Un is independent of Un+1 −Un . Another implication
of (3.9) and the fact Un ∈ {2−n i : i = 0, 1, . . . , 2n − 1} is that
U_{n+1} ∈ {2^{−n} i : i = 0, 1, …, 2ⁿ − 1} if U_{n+1} − U_n = 0 , and
U_{n+1} ∈ {2^{−n} i + 2^{−n−1} : i = 0, 1, …, 2ⁿ − 1} otherwise. (3.10)
Since {2−n i : i = 0, 1, . . . , 2n − 1} ∩ {2−n i + 2−n−1 : i = 0, 1, . . . , 2n − 1} = ∅,
for i = 0, 1, . . . , 2n − 1, (3.10) implies
the penultimate line follows from (3.9), whereas (3.8) for n implies the last line.
A similar calculation with (3.10) shows
Observing that
P(U ≤ x) = lim_{n→∞} P(U_n ≤ x)
= lim_{n→∞} 2^{−n} ([2ⁿ x] + 1)
= x ,
(3.8) implying the second line, [z] denoting the largest integer less than or equal
to z. Monotonicity of the CDF implies P (U ≤ 0) = 0 and P (U ≤ 1) = 1.
Hence,
P(U ≤ x) = 0 for x ≤ 0 , P(U ≤ x) = x for 0 < x < 1 , and P(U ≤ x) = 1 for x ≥ 1 ,
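Theorem 3.9 can also be simulated directly; the sketch below (assuming numpy) truncates the binary expansion at 40 bits and compares the empirical C.D.F. of U with that of Uniform(0, 1).

import numpy as np

rng = np.random.default_rng(2)
n_bits, n_samples = 40, 10**5
bits = rng.integers(0, 2, size=(n_samples, n_bits))     # i.i.d. Bernoulli(1/2)
weights = 2.0 ** -np.arange(1, n_bits + 1)
U = bits @ weights                                      # truncation of Σ 2⁻ⁿ Xₙ

for x in (0.25, 0.5, 0.75):
    print((U <= x).mean(), x)                           # ≈ x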
Proof. Let (Ω, A, P ) be the probability space associated with infinite tosses of
a fair coin as in Example 2.3. Define for n = 1, 2, . . .,
The map T is measurable, that is, T −1 B ∈ A for all B ∈ B(RN ), because for all
n ≥ 1 and A1 , . . . , An ∈ B(R),
T⁻¹(A₁ × … × A_n × R × R × …) = ∩_{i=1}^{n} X_i⁻¹ A_i ∈ A .
Let ℙ = P ∘ T⁻¹. Then, for A₁, …, A_n as above,
ℙ(A₁ × … × A_n × R × R × …) = P( T⁻¹(A₁ × … × A_n × R × R × …) )
= P( ∩_{i=1}^{n} X_i⁻¹ A_i )
= ∏_{i=1}^{n} P(X_i ∈ A_i)
= ∏_{i=1}^{n} P_i(A_i) ,
The above infinite product could be defined because P₁, P₂, … are all probability measures.
We conclude this chapter by pointing out that measure theory is indispens-
able for a rigorous treatment of probability theory, which is now amply clear.
The following are a few instances where usage of measure theory was necessary.
1. For studying the simple random experiment of infinite tosses of a fair coin,
as in Example 2.3.
2. For defining expectation of a general random variable, that is, one which
is neither discrete nor has a density.
3. Showing linearity of expectation is very difficult, even for random variables
having a density, without the measure theoretic definition.
4. Answering the question of when a random variable has a density is im-
possible without the Radon-Nikodym theorem.
5. Last but not least, the study of independence in this chapter became
much easier thanks to measure theory.
4 Several random variables
Definition 35. For random variables X1 , . . . , Xd , which by convention are de-
fined on the same probability space (Ω, A, P ), the joint CDF of (X1 , . . . , Xd ) is
a function F : Rd → [0, 1] defined by
F (x1 , . . . , xd ) = P (X1 ≤ x1 , . . . , Xd ≤ xd ) , x1 , . . . , xd ∈ R .
A joint CDF will often be referred to simply by ‘CDF’. For this chapter, we
introduce the following notations:
lim_{x_{i₁}→−∞, …, x_{i_k}→−∞} F(x₁, …, x_d) = 0 ,
Proof. Exercise.
The following measure-theoretic fact is a d-dimensional generalization of
Theorem 1.4.
Fact 4.1. If F : Rd → R is a function which is continuous from above and
satisfies ∆R F ≥ 0 for all R ∈ H, then there exists a unique Radon measure µ
on (Rd , B(Rd )) such that
Neither the above fact nor the following theorem, which is built on it and
gives a converse of Theorem 4.1, will be used much in the course. Nonetheless,
a proof of the above fact is given in Subsection 9.2 of the Appendix.
Theorem 4.2. If F : Rd → [0, 1] satisfies 1.-4. of Theorem 4.1, then there exist
random variables X1 , . . . , Xd defined on some probability space such that F is
the joint CDF of (X1 , . . . , Xd ).
Proof. Since F satisfies 1. and 2., the preceding fact implies there exists a Radon
measure µ on (Rd , B(Rd )) such that
µ(R) = ∆R F , R ∈ H .
Our first goal is to show µ is a probability measure. To that end, rewrite the
above for R = (−m, n]^d, where m, n ∈ N, as
µ((−m, n]^d) = Σ_{(x₁,…,x_d)∈{−m,n}^d} (−1)^{#{i : x_i = −m}} F(x₁, …, x_d) , m, n ∈ N .
In the right hand side above, if n is fixed and m → ∞, then every term except
F (n, . . . , n) goes to zero by 3. of Theorem 4.1, which F satisfies by hypothesis.
As µ is a measure, the left hand side increases to µ((−∞, n]d ) as m → ∞.
Therefore,
µ((−∞, n]^d) = F(n, n, …, n) , n ≥ 1 . (4.3)
Let n → ∞ and use 4. to conclude µ(Rd ) = 1. In other words, µ is a probability
measure.
Let Ω = Rd , A = B(Rd ) and P = µ. Define Xi : Rd → R by
Xi (x1 , . . . , xd ) = xi , (x1 , . . . , xd ) ∈ Rd ,
As the above is the same as saying F is the CDF of (X1 , . . . , Xd ), the proof
follows.
Definition 36. For random variables X1 , . . . , Xd , the joint distribution of
(X1 , . . . , Xd ) is the measure on (Rd , B(Rd )) given by P ◦(X1 , . . . , Xd )−1 , that is,
the measure pushed forward to Rd by (X1 , . . . , Xd ). For a measurable function
f : Rd → R, the integral of f with respect to the measure P ◦ (X1 , . . . , Xd )−1 , if
defined, is denoted by
∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x₁, …, x_d) P(X₁ ∈ dx₁, …, X_d ∈ dx_d) .
For random variables Y₁, …, Y_d, (X₁, …, X_d) =ᵈ (Y₁, …, Y_d) means
The following result is similar to its one-dimensional analogue.
Theorem 4.3. 1. For random variables X1 , . . . , Xd and a Borel measurable
f : Rd → R,
E(f(X₁, …, X_d)) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x₁, …, x_d) P(X₁ ∈ dx₁, …, X_d ∈ dx_d) ,
if and only if the CDFs of (X1 , . . . , Xd ) and (Y1 , . . . , Yd ) are the same.
Proof. 1. Follows from Theorem 1.6.
2. Follows from (4.2) and the observation that
{ ( ∏_{i=1}^{d} (a_i, b_i] ) ∩ R^d : −∞ ≤ a_i ≤ b_i ≤ ∞ , i = 1, …, d }
is a semi-field and for every set in the above class, there exist sets in H increasing
to that.
Definition 37. For discrete random variables X1 , . . . , Xd , the joint PMF of
(X1 , . . . , Xd ) is the function p : Rd → [0, 1] defined by
p(x1 , . . . , xd ) = P (X1 = x1 , . . . , Xd = xd ) , x1 , . . . , xd ∈ R .
for all x1 , . . . , xk ∈ R. If f is the joint density of (X1 , . . . , Xd ), then for k =
1, . . . , d − 1, g : Rk → R defined by
g(x₁, …, x_k) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x₁, …, x_d) dx_{k+1} … dx_d , x₁, …, x_k ∈ R ,
The following is a converse of the above, and in fact, slightly stronger than
that.
Theorem 4.7. If X1 , . . . , Xd are discrete random variables for which there exist
c ∈ R and functions p1 , . . . , pd : R → R such that
P(X₁ = x₁, …, X_d = x_d) = c ∏_{i=1}^{d} p_i(x_i) , x₁, …, x_d ∈ R ,
Thus,
|c| ∏_{i=1}^{d} ∫_{−∞}^{∞} |f_i(x_i)| dx_i = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} |c| ∏_{i=1}^{d} |f_i(x_i)| dx₁ … dx_d = 1 .
This allows integrating both sides of (4.4) over (x₁, …, x_d) ∈ R^d which yields
c α₁ ⋯ α_d = 1 , (4.6)
where
α_i = ∫_{−∞}^{∞} f_i(x) dx , i = 1, …, d .
Theorem 4.5 shows that for i = 1, . . . , d, the density gi of Xi can be obtained
by fixing xi and integrating the right hand side of (4.4) over all other variables.
That is,
g_i(x_i) = c ( ∏_{j∈{1,…,d}\{i}} α_j ) f_i(x_i) , x_i ∈ R ;
(4.6) implies
g_i(x) = α_i⁻¹ f_i(x) , x ∈ R .
Use this to rewrite (4.4) as
f(x₁, …, x_d) = c ∏_{i=1}^{d} α_i g_i(x_i)
= ∏_{i=1}^{d} g_i(x_i) ,
for all x1 , . . . , xd ∈ R, the last line following from (4.6). Therefore, for B1 , . . .
. . . , Bd ∈ B(R),
P(X₁ ∈ B₁, …, X_d ∈ B_d) = ∫_{B₁} ⋯ ∫_{B_d} f(x₁, …, x_d) dx_d … dx₁
= ∫_{B₁} ⋯ ∫_{B_d} ∏_{i=1}^{d} g_i(x_i) dx_d … dx₁
= ∏_{i=1}^{d} ∫_{B_i} g_i(x_i) dx_i
= ∏_{i=1}^{d} P(X_i ∈ B_i) ,
the equality in the last line holds because g1 , . . . , gd are the respective densities
of X1 , . . . , Xd . Thus, X1 , . . . , Xd are independent.
If
∫_{−∞}^{∞} f_i(x) dx = 1 , i = 1, …, d ,
for almost all (x1 , . . . , xm , y1 , . . . , yn ) ∈ Rm+n , where fX and fY are the den-
sities of (X1 , . . . , Xm ) and (Y1 , . . . , Yn ), respectively. Conversely, if there exist
measurable g : Rm → R and h : Rn → R and c ∈ R such that
f (x1 , . . . , xm , y1 , . . . , yn ) = cg(x1 , . . . , xm )h(y1 , . . . , yn ) ,
for almost all (x1 , . . . , xm , y1 , . . . , yn ) ∈ Rm+n , then show that (X1 , . . . , Xm )
and (Y1 , . . . , Yn ) are independent. Besides, if
∫_{R^m} g(x) dx = 1 = ∫_{R^n} h(x) dx ,
then show that c = 1, that g, h are non-negative a.e., and that g ∨ 0 and h ∨ 0
are the respective densities of (X1 , . . . , Xm ) and (Y1 , . . . , Yn ).
Theorem 4.8. Suppose X = (X1 , . . . , Xd ) is a random vector with P (X ∈
U ) = 1 for some open set U ⊂ Rd . Let ψ : U → V be a bijection for some open
set V ⊂ Rd . Let T : V → U be the inverse of ψ. Assume T is continuously
differentiable and its Jacobian matrix J(y) at y ∈ V , defined by
∂T (y)
J(y) = ,
∂y
is non-singular for all y ∈ V . Then the joint density of Y = (Y1 , . . . , Yd ) = ψ(X)
is
g(y) = f∘T(y) |det(J(y))| for y ∈ V , and g(y) = 0 for y ∉ V .
the penultimate line following from Theorem 1.12. Hence the proof follows.
Example 4.1. Let X ∼ Gamma(α) and Y ∼ Gamma(β) independently of each
other. We want to find the distribution of W = X/(X + Y ).
Theorem 4.8 is the only tool at our disposal, which is valid for one-one
functions from an open subset of R2 to R2 . Therefore, we define an auxiliary
random variable Z = X + Y . Thus, (W, Z) = ψ(X, Y ) where ψ : U → V is a
bijection defined by
ψ(x, y) = ( x/(x + y) , x + y ) , (x, y) ∈ U ,
and U = (0, ∞)² and V = (0, 1) × (0, ∞) are open sets. The inverse of ψ is
T : V → U defined by T(w, z) = (wz, (1 − w)z), (w, z) ∈ V,
showing |det J(w, z)| = z for (w, z) ∈ V. The joint density of (X, Y) is
f(x, y) = (1/(Γ(α)Γ(β))) e^{−x−y} x^{α−1} y^{β−1} , (x, y) ∈ U .
where
h₁(w) = (1/B(α, β)) w^{α−1} (1 − w)^{β−1} 1(0 < w < 1) , w ∈ R ,
h₂(z) = (1/Γ(α + β)) e^{−z} z^{α+β−1} 1(z > 0) , z ∈ R ,
and
c = B(α, β) Γ(α + β) / ( Γ(α) Γ(β) ) .
Since h1 and h2 are densities of Beta(α, β) and Gamma(α + β), respectively,
Theorem 4.7 shows that c = 1 and W and Z are independent with respective
densities h1 and h2 . In particular, this means X/(X + Y ) follows Beta(α, β).
Furthermore, c = 1 reconfirms that
B(α, β) = Γ(α) Γ(β) / Γ(α + β) , α, β > 0 .
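Example 4.1 is easy to confirm by simulation; the following sketch (assuming numpy, not part of the notes) checks the Beta(α, β) moments of W = X/(X+Y) and the near-zero correlation between W and Z = X + Y.

import numpy as np

rng = np.random.default_rng(3)
a, b, n = 2.0, 5.0, 10**6
X = rng.gamma(shape=a, scale=1.0, size=n)
Y = rng.gamma(shape=b, scale=1.0, size=n)
W, Z = X / (X + Y), X + Y

print(W.mean(), a / (a + b))                          # Beta(α, β) mean
print(W.var(), a * b / ((a + b)**2 * (a + b + 1)))    # Beta(α, β) variance
print(np.corrcoef(W, Z)[0, 1])                        # ≈ 0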
Theorem 4.9. If X = (X1 , . . . , Xd ) has density f , A is a d × d non-singular
matrix and
Y = AX + µ ,
for some fixed µ ∈ Rd , then Y = (Y1 , . . . , Yd ) has density
g(y) = (1/|det(A)|) f( A⁻¹(y − µ) ) , y ∈ R^d .
ψ(x) = Ax + µ , x ∈ Rd ,
T (y) = A−1 (y − µ) , y ∈ Rd ,
The next result is a striking application of Theorem 4.8 which is very useful
in statistics.
Theorem 4.10. If X1 , . . . , Xn are i.i.d. from standard normal for n ≥ 2, and
n n
1X X
X̄ = Xi , and S = (Xi − X̄)2 ,
n i=1 i=1
The density of X is
f(x) = (2π)^{−n/2} exp( −(1/2) Σ_{i=1}^{n} x_i² ) = (2π)^{−n/2} exp( −(1/2) xᵀx ) ,
for all x = (x₁, …, x_n) ∈ R^n. Theorem 4.9 shows that the density of Y is
g(y) = (1/|det(Pᵀ)|) f(Pᵀ y)
(since det(Pᵀ) = ±1) = (2π)^{−n/2} exp( −(1/2) (Pᵀy)ᵀ(Pᵀy) )
= (2π)^{−n/2} exp( −(1/2) yᵀ P Pᵀ y )
= (2π)^{−n/2} exp( −(1/2) yᵀ y ) ,
for y ∈ R^n, the last line following from the fact that P is an orthogonal matrix.
Thus,
g(y) = ∏_{i=1}^{n} (1/√(2π)) e^{−y_i²/2} , y = (y₁, …, y_n) ∈ R^n . (4.7)
Write
S = Σ_{i=1}^{n} ( X_i² − 2 X_i X̄ + (X̄)² )
= Σ_{i=1}^{n} X_i² − n (X̄)²
= Σ_{i=1}^{n} Y_i² − Y₁²
= Σ_{i=2}^{n} Y_i² ,
Definition 38. If Z₁, …, Z_n are i.i.d. from standard normal, the distribution of Σ_{i=1}^{n} Z_i² is called χ²_n.
Exercise 4.2. Show that S, which is as in Theorem 4.10, has the χ2n−1 distri-
bution.
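A simulation sketch (assuming numpy, not part of the notes) of Theorem 4.10 and Exercise 4.2: the statistic S should have the χ²_{n−1} mean n − 1 and variance 2(n − 1).

import numpy as np

rng = np.random.default_rng(4)
n, reps = 10, 10**5
samples = rng.standard_normal(size=(reps, n))
S = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print(S.mean(), n - 1)        # ≈ 9
print(S.var(), 2 * (n - 1))   # ≈ 18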
Exercise 4.3. Let X1 , . . . , Xn be i.i.d. from standard normal, and
X1
..
X= . .
Xn
Y = µ + Σ1/2 X . (4.10)
X ∼ Nn (µ, Σ) .
Writing
Y = Σ^{−1/2} X − Σ^{−1/2} µ ,
Theorem 4.9 with A = Σ^{−1/2} and µ replaced by −Σ^{−1/2} µ therein implies that
the density of Y is
g(y) = (1/det(Σ^{−1/2})) f( Σ^{1/2} (y + Σ^{−1/2} µ) )
= (1/det(Σ^{−1/2})) f( Σ^{1/2} y + µ )
( det(Σ^{−1/2}) = (det(Σ))^{−1/2} ) = (2π)^{−n/2} exp( −(1/2) (Σ^{1/2} y)ᵀ Σ⁻¹ (Σ^{1/2} y) )
= (2π)^{−n/2} exp( −(1/2) yᵀ y ) ,
the last line follows from the fact that Σ1/2 is symmetric and
Like in (4.7) and the subsequent argument, Theorem 4.7 shows Y1 , . . . , Yn are
i.i.d. from standard normal, which completes the proof.
The next theorem shows that if X ∼ Nn (µ, Σ), then µ and Σ are the “mean
vector” and the “covariance matrix” of X, respectively.
Theorem 4.12. If X ∼ Nn (µ, Σ) where
then
E(Xi ) = µi , i = 1, . . . , n ,
Cov(Xi , Xj ) = σij , 1 ≤ i, j ≤ n .
X = µ + Σ1/2 Y ,
or
X_i = µ_i + Σ_{j=1}^{n} θ_{ij} Y_j , i = 1, …, n ,
where Σ1/2 = ((θij ))1≤i,j≤n . Since Y1 , . . . , Yn are zero mean random variables,
it immediately follows E(Xi ) = µi for i = 1, . . . , n. Theorem 3.6 shows that for
fixed 1 ≤ i, j ≤ n,
Cov(X_i, X_j) = Cov( Σ_{k=1}^{n} θ_{ik} Y_k , Σ_{l=1}^{n} θ_{jl} Y_l )
= Σ_{k=1}^{n} Σ_{l=1}^{n} θ_{ik} θ_{jl} Cov(Y_k, Y_l)
= Σ_{k=1}^{n} θ_{ik} θ_{jk} ,
the last line following from the fact that Y1 , . . . , Yn are independent and each
has variance one. Recalling that θik is the (i, k)-th entry of Σ1/2 which is a
symmetric matrix, write
n
X n
X
θik θjk = θik θkj
k=1 k=1
Cov(Xi , Xj ) = σij , 1 ≤ i, j ≤ n .
BX + µ ∼ Nm µ, BB T .
Proof. Let us first deal with the case m = n. In this case, B is a non-singular
matrix as Rank(B) = m. The density of X is
f(x) = (2π)^{−n/2} exp( −(1/2) xᵀ x ) , x ∈ R^n .
Theorem 4.9 implies that the density of Y = BX + µ is
g(y) = (1/|det(B)|) f( B⁻¹(y − µ) )
= (1/(|det(B)| (2π)^{n/2})) exp( −(1/2) (B⁻¹(y − µ))ᵀ (B⁻¹(y − µ)) )
= (1/(|det(B)| (2π)^{n/2})) exp( −(1/2) (y − µ)ᵀ (BBᵀ)⁻¹ (y − µ) )
= (1/((2π)^{n/2} √det(Σ))) exp( −(1/2) (y − µ)ᵀ Σ⁻¹ (y − µ) ) ,
Thus for y (1) ∈ Rm and y (2) ∈ Rn−m , which are column vectors by convention,
and letting (1)
y
y = (2) , (4.14)
y
we get
(y − µ̃)ᵀ (AAᵀ)⁻¹ (y − µ̃) = (y⁽¹⁾ − µ)ᵀ (BBᵀ)⁻¹ (y⁽¹⁾ − µ) + (y⁽²⁾)ᵀ (CCᵀ)⁻¹ y⁽²⁾ . (4.15)
Recall (4.12) to write the density of (Y1 , . . . , Yn ) as
g(y) = (1/((2π)^{n/2} √det(AAᵀ))) exp( −(1/2) (y − µ̃)ᵀ (AAᵀ)⁻¹ (y − µ̃) )
= (1/((2π)^{n/2} √det(AAᵀ))) exp( −(1/2) (y⁽¹⁾ − µ)ᵀ (BBᵀ)⁻¹ (y⁽¹⁾ − µ) )
× exp( −(1/2) (y⁽²⁾)ᵀ (CCᵀ)⁻¹ y⁽²⁾ ) ,
where g1 and g2 are the densities of Nm (µ, BB T ) and Nn−m (0, CC T ), respec-
tively. Exc 4.1 shows that (Y1 , . . . , Ym ) and (Ym+1 , . . . , Yn ) are independent
from Nm (µ, BB T ) and Nn−m (0, CC T ), respectively. Since
Y1
..
. = BX + µ ,
Ym
BX + µ ∼ Nm (µ, BB T ) ,
Write
BX = B(X − µ) + Bµ = BΣ1/2 Σ−1/2 (X − µ) + Bµ = AY + Bµ ,
where A = BΣ1/2 . Since A is an m × n matrix with Rank(A) = m because Σ1/2
is non-singular, Theorem 4.13 shows
AY + Bµ ∼ Nm Bµ, AAT .
Observing that
AAT = BΣB T ,
the proof follows.
An immediate corollary of the above theorem is the following.
Corollary 4.1. If X1 , . . . , Xn are independent and Xi ∼ N (µi , σi2 ) for i =
1, …, n, then
Σ_{i=1}^{n} X_i ∼ N( Σ_{i=1}^{n} µ_i , Σ_{i=1}^{n} σ_i² ) .
Exercise 4.7. Suppose (X1 , . . . , Xn ) ∼ Nn (µ, Σ) where µ = (µ1 , . . . , µn ) and
Σ = ((σij ))1≤i,j≤n .
1. Show that Xi ∼ N (µi , σii ) for i = 1, . . . , n.
2. If i1 , . . . , ik ∈ {1, . . . , n} are distinct, show that
(Xi1 , . . . , Xik ) ∼ Nk (µij )1≤j≤k , ((σij ,ij0 ))1≤j,j 0 ≤k .
if and only if
Cov(Xi , Xj ) = 0 for all i ∈ A, j ∈ B .
Exercise 4.8. Suppose X and Y are i.i.d. from standard normal. Let ρ ∈
(−1, 1) and set
Z = ρX + √(1 − ρ²) Y .
Show that
(X, Z) ∼ N₂( (0, 0)ᵀ , [ 1 ρ ; ρ 1 ] ) .
Exercise 4.9. If X1 , . . . , Xn are i.i.d. from standard normal, P is an m × n
matrix with 1 ≤ m < n and P P T = Im , and
(Y1 , . . . , Ym ) = Y = P X ,
show that
Σ_{i=1}^{m} Y_i² ∼ χ²_m ,
Σ_{i=1}^{n} X_i² − Σ_{i=1}^{m} Y_i² ∼ χ²_{n−m} ,
and
Σ_{i=1}^{m} Y_i² and Σ_{i=1}^{n} X_i² − Σ_{i=1}^{m} Y_i² are independent.
The last topic to be studied in this chapter is order statistics. We start with
defining the same.
Definition 40. The ascending sort map is a map T : Rn → Rn which sorts
the entries of a vector in ascending order, that is, T (x1 , . . . , xn ) = (y1 , . . . , yn )
means y1 ≤ . . . ≤ yn and (y1 , . . . , yn ) is a permutation of (x1 , . . . , xn ) for all
(x1 , . . . , xn ) ∈ Rn . For random variables X1 , . . . , Xn , their order statistics
X(1) , . . . , X(n) are defined by
X(1) , . . . , X(n) = T (X1 , . . . , Xn ) .
The ascending order map is a continuous function from Rn to Rn and thus
Borel measurable. Therefore, if X1 , . . . , Xn are random variables, which by
convention are defined on the same probability space, their order statistics are
random variables as well.
Theorem 4.15. If X1 , . . . , Xn are i.i.d. from some density f , then the density
of their order statistics (X(1) , . . . , X(n) ) is
g(x₁, …, x_n) = n! f(x₁) ⋯ f(x_n) if x₁ < x₂ < … < x_n , and g(x₁, …, x_n) = 0 otherwise .
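A hedged simulation check of Theorem 4.15 (assuming numpy, not part of the notes) with n = 2 and the Uniform(0, 1) density: the order statistics land in a rectangle R ⊂ {x₁ < x₂} with probability n!·area(R).

import numpy as np

rng = np.random.default_rng(5)
X = np.sort(rng.uniform(size=(10**6, 2)), axis=1)    # (X_(1), X_(2))

a1, b1, a2, b2 = 0.1, 0.3, 0.5, 0.9                  # rectangle inside {x₁ < x₂}
hit = (X[:, 0] > a1) & (X[:, 0] <= b1) & (X[:, 1] > a2) & (X[:, 1] <= b2)
print(hit.mean(), 2 * (b1 - a1) * (b2 - a2))         # ≈ 0.16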
The proof uses the following exercise which is a special case of Exc 1.7.
Exercise 4.10. Suppose P, Q are finite measures on (Rn , B(Rn )) such that for
some open set U ,
P (U c ) = 0 = Q(U c ) .
If P (R) = Q(R) for all R = (a1 , b1 ]×. . .×(an , bn ] ⊂ U with −∞ < ai < bi < ∞,
i = 1, . . . , n, show that P and Q agree on B(Rn ).
Proof of Theorem 4.15. Letting U = {(x1 , . . . , xn ) ∈ Rn : x1 < . . . < xn }, the
proof would follow from the above exercise once it is shown that
P( (X_{(1)}, …, X_{(n)}) ∈ U^c ) = 0 = ∫_{U^c} g(x₁, …, x_n) dx₁ … dx_n , (4.16)
and
P( (X_{(1)}, …, X_{(n)}) ∈ R ) = ∫_R g(x₁, …, x_n) dx₁ … dx_n , (4.17)
for all
Consequently,
This along with the obvious fact that X_{(1)} ≤ … ≤ X_{(n)} shows that
P( (X_{(1)}, …, X_{(n)}) ∈ U^c ) = 0 .
That
∫_{U^c} g(x₁, …, x_n) dx₁ … dx_n = 0
follows tautologically from the definition of g. That is, (4.16) holds.
For (4.17), fix R as in (4.18). An immediate consequence of R ⊂ U is that
a1 < b1 ≤ a2 < b2 ≤ . . . ≤ an < bn and hence
(ai , bi ] ∩ (aj , bj ] = ∅ , 1 ≤ i < j ≤ n . (4.19)
Therefore,
P( (X_{(1)}, …, X_{(n)}) ∈ R )
= P( a_i < X_{(i)} ≤ b_i , i = 1, …, n )
= P( ∪_{π permutation of {1,…,n}} { a_i < X_{π(i)} ≤ b_i , i = 1, …, n } )
= Σ_{π permutation of {1,…,n}} P( a_i < X_{π(i)} ≤ b_i , i = 1, …, n ) ,
the penultimate line following from the fact that X(1) (ω), . . . , X(n) (ω) is a per-
mutation of X1 (ω), . . . , Xn (ω) for every ω ∈ Ω, and the last line follows from
the observation that for distinct permutations π and π′,
{ a_i < X_{π(i)} ≤ b_i , i = 1, …, n } ∩ { a_i < X_{π′(i)} ≤ b_i , i = 1, …, n } = ∅ ,
which is another consequence of (4.19).
For a fixed permutation π, the independence of X_{π(1)}, …, X_{π(n)} implies
P( (X_{(1)}, …, X_{(n)}) ∈ R ) = Σ_{π permutation of {1,…,n}} ∏_{i=1}^{n} P( a_i < X_{π(i)} ≤ b_i )
( X_{π(i)} =ᵈ X₁ , i = 1, …, n ) = Σ_{π permutation of {1,…,n}} ∏_{i=1}^{n} ∫_{a_i}^{b_i} f(x) dx
= n! ∏_{i=1}^{n} ∫_{a_i}^{b_i} f(x) dx
= n! ∫_{a₁}^{b₁} ⋯ ∫_{a_n}^{b_n} f(x₁) ⋯ f(x_n) dx_n … dx₁
= ∫_R g(x₁, …, x_n) dx₁ … dx_n .
Thus (4.17) holds for every R as in (4.18). This completes the proof.
Exercise 4.11. If (X1 , . . . , Xn ) is a random vector in Rn with joint density f
and (X(1) , . . . , X(n) ) is its order statistic, show that the density of the latter is
g(x₁, …, x_n) = Σ_{π permutation of {1,…,n}} f( x_{π(1)}, …, x_{π(n)} ) if x₁ < … < x_n , and g(x₁, …, x_n) = 0 otherwise .
5 Conditional expectation
Following the usual convention, (Ω, A, P ) is the probability space underlying
everything we talk about, unless specifically mentioned otherwise. As in Defi-
nition 24, for A, B ∈ A with P (A) > 0, the conditional probability of B given
that A has occurred is
P(B|A) = P(B ∩ A)/P(A) .
Suppose now that 0 < P (A) < 1 and we want to define the conditional prob-
ability of B given that we know whether A has occurred or not. If A has
occurred, then the above is the natural definition of the said conditional prob-
ability, whereas it is
P(B ∩ A^c)/P(A^c)
if A has not occurred, that is, A^c has occurred. In other words,
( P(B ∩ A)/P(A) ) 1_A + ( P(B ∩ A^c)/P(A^c) ) 1_{A^c} (5.1)
is a natural definition of the conditional probability of B given the knowl-
edge of whether A has occurred or not. The said conditional probability is
thus a random variable depending on 1A and 1Ac alone. To generalize this, if
A₁, A₂, A₃, … are mutually exclusive and exhaustive events of positive probability, that is,
A_i ∩ A_j = ∅ for all i ≠ j , ∪_{i=1}^{∞} A_i = Ω , and P(A_i) > 0 , i = 1, 2, … , (5.2)
then a similar reasoning says that the conditional probability of B given the
knowledge which one of A1 , A2 , . . . has occurred is
Z = Σ_{i=1}^{∞} ( P(B ∩ A_i)/P(A_i) ) 1_{A_i} ,
Therefore,
P(B ∩ E) = Σ_{i∈N} P(B ∩ A_i) = Σ_{i∈N} ∫_{A_i} ( P(B ∩ A_i)/P(A_i) ) dP = Σ_{i∈N} ∫_{A_i} Z dP
= ∫_E Z dP .
The conditional probability of B, given the knowledge of which one of A1 , A2 , . . .
has occurred, is thus a random variable Z that is σ({A1 , A2 , . . .})-measurable
and satisfies
∫_E Z dP = P(B ∩ E) for all E ∈ σ({A₁, A₂, …}) .
Since the right hand side is the same as ∫_E 1_B dP, and interpreting the conditional probability of B as the “conditional expectation” of 1_B, a natural candidate for the latter is thus a σ({A₁, A₂, …})-measurable random variable Z
satisfying
∫_E Z dP = ∫_E 1_B dP for all E ∈ σ({A₁, A₂, …}) .
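For a finite partition, the candidate Z above is just the block average of 1_B. The following sketch (assuming numpy, with a toy discrete uniform space, not part of the notes) illustrates the defining property ∫_E Z dP = P(B ∩ E).

import numpy as np

omega = np.arange(12)                                   # Ω = {0,...,11}, P uniform
B = (omega % 3 == 0)                                    # the event B
partition = [omega < 4, (omega >= 4) & (omega < 8), omega >= 8]   # A₁, A₂, A₃

Z = np.zeros(len(omega))
for A in partition:
    Z[A] = B[A].mean()                                  # P(B ∩ Aᵢ)/P(Aᵢ) on Aᵢ

E = partition[0] | partition[2]                         # an event in σ({A₁, A₂, A₃})
print(Z[E].sum() / len(omega), (B & E).mean())          # both equal P(B ∩ E)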
Reasoning along similar lines, for an integrable random variable X and a
σ-field F ⊂ A, the conditional expectation of X given F should be defined as
an F-measurable random variable Z which satisfies
∫_E Z dP = ∫_E X dP for all E ∈ F .
The following theorem guarantees the existence of such Z and its uniqueness
upto zero probability sets.
Theorem 5.1. For an integrable random variable X and a σ-field F ⊂ A, there
exists an integrable random variable Z which is F-measurable and satisfies
∫_E Z dP = ∫_E X dP for all E ∈ F . (5.3)
If Z′ is another F-measurable and integrable random variable such that the above
holds with Z replaced by Z′, then Z′ = Z a.s.
Proof. Write X = X + − X − where X + = X ∨ 0 and X − = (−X) ∨ 0. Since X
is integrable, so are X + and X − . Define measures µ+ and µ− on (Ω, F) by
µ⁺(E) = ∫_E X⁺ dP , µ⁻(E) = ∫_E X⁻ dP for all E ∈ F .
Thus µ⁺ and µ⁻ are finite measures on (Ω, F) and each of them is absolutely
continuous with respect to P. Theorem 1.7, which is the Radon-Nikodym theorem, implies there exist F-measurable functions Z₁ and Z₂ from Ω to [0, ∞)
satisfying
∫_E Z₁ dP = µ⁺(E) , and ∫_E Z₂ dP = µ⁻(E) , for all E ∈ F .
Since Z − Z 0 is F-measurable, the above implies Z − Z 0 = 0 a.s. This completes
the proof.
Definition 41. For an integrable random variable X and a σ-field F ⊂ A,
the conditional expectation of X given F, denoted by E(X|F), is an integrable
random variable Z which is F-measurable and satisfies (5.3). Theorem 5.1 guar-
antees the existence of such Z and its uniqueness upto sets of zero probability.
While the above definition is motivated by (5.1), the same can be arrived at
by another route of reasoning. Recall that for any X ∈ L2 (Ω),
that is, E(X) is the unique α ∈ R at which the right hand side is minimized.
When no additional information is available, minimization over the set of real
numbers makes sense. However, if for a σ-field F ⊂ A, we know for each
E ∈ F whether it has occurred or not, then the class should be expanded to all
F-measurable functions, because any F-measurable function is now “known”.
The following result makes this idea precise.
Theorem 5.2. If E(X 2 ) < ∞, and F ⊂ A is a σ-field, then
E(X|F) = arg min { ∫ (X − Y)² dP : Y is measurable with respect to F } a.s.
Our first task is to show that the above Z actually minimizes the L2 distance
from X over all F-measurable functions. Indeed, for a random variable Y with
E(Y 2 ) = ∞, it holds that
∫ (X − Y)² dP = ∞
Thus,
Z = arg min { ∫ (X − Y)² dP : Y is measurable with respect to F } .
To complete the proof, all that remains to show is Z = E(X|F) a.s. Since Z is
F-measurable, this would follow once (5.3) is shown to hold for this Z.
Results in functional analysis show that Z as in (5.4) is an orthogonal pro-
jection onto L2 (Ω, F, P ). That is, X − Z belongs to the orthogonal complement
of L2 (Ω, F, P ). Since 1E ∈ L2 (Ω, F, P ) for any E ∈ F, it thus follows that
∫ (X − Z) 1_E dP = 0 ,
the equality in the last line is implied by the fact that Y is the Radon-Nikodym
derivative of µ with respect to P on (Ω, A) and the inequality follows from the
hypothesis that XY is integrable. Thus XZ is P -integrable.
Thus, for all E ∈ F, XZ1E is P -integrable, showing by a similar argument
that
∫_{(Ω,F)} X 1_E Z dP = ∫_{(Ω,F)} X 1_E dµ , (5.5)
because X1E is F-measurable. A similar argument shows
∫_{(Ω,A)} X 1_E Y dP = ∫_{(Ω,A)} X 1_E dµ . (5.6)
Once again, the right hand sides of (5.5) and (5.6) are equal by Exc 1.8. This
shows
∫_E XZ dP = ∫_E XY dP , E ∈ F .
2. For α ∈ R,
E(αX|F) = αE(X|F) a.s.
3. If X ≥ 0 a.s., then
E(X|F) ≥ 0 a.s.
4. If X ≤ Y a.s., then
E(X|F) ≤ E(Y |F) a.s.
The following is the so-called tower property and is in line with the intuition
that conditional expectation given F of an L2 random variable is projection
onto L2 (Ω, F, P ) as in Theorem 5.2.
Theorem 5.5 (Tower property). If F ⊂ G ⊂ A and F, G are σ-fields, then for
any integrable X,
E( E(X|G) | F ) = E(X|F) a.s.
Proof. Let Y = E(X|G) and Z = E(X|F). Then for any A ∈ F,
∫_A Z dP = ∫_A X dP
(Y = E(X|G) and A ∈ F ⊂ G)    = ∫_A Y dP .
Proof. Follows from Theorem 5.5 by taking F = {∅, Ω} and observing that for
any integrable random variable Z,
E(Z|F) = E(Z) .
the last line again following from the independence of F and G and that Z1A
and 1B are measurable with respect to them, respectively. Thus, (5.7) holds for
all E ∈ S.
Since S is a semi-field, (5.7) can easily be shown to hold for all E in the field
generated by S. Finally, standard arguments using Theorem 1.2, which is the
monotone class theorem, completes the proof.
Corollary 5.2. If X is integrable and G is a σ-field independent of σ(X), then
Definition 42. If X and Y are random variables and the former is integrable,
define
E(X|Y ) = E X|σ(Y ) .
Theorem 5.7. Suppose X and Y are independent and f : R2 → R is a Borel
function such that
E (|f (X, Y )|) < ∞ .
Then
Y ∈ { y ∈ R : ∫_R |f (x, y)| P (X ∈ dx) < ∞ }    a.s.    (5.8)
Further,
E ( f (X, Y ) | Y ) = g(Y ) ,
where
g(y) = ∫_R f (x, y) P (X ∈ dx)  if  ∫_R |f (x, y)| P (X ∈ dx) < ∞ ,  and  g(y) = 0  otherwise.
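As a quick numerical illustration of this substitution rule (a sketch under assumed choices, not part of the notes): take X and Y independent standard normal and f(x, y) = x²y, so that g(y) = y E(X²) = y. The defining property of conditional expectation can then be checked against a bounded test function of Y.

import numpy as np

# Sketch of Theorem 5.7 with assumed f and distributions:
# X, Y independent N(0,1), f(x, y) = x² y, so g(y) = ∫ x² y P(X ∈ dx) = y.
rng = np.random.default_rng(1)
n = 10**6
X = rng.standard_normal(n)
Y = rng.standard_normal(n)

f_XY = X**2 * Y
g_Y = Y

# E( f(X,Y) h(Y) ) should match E( g(Y) h(Y) ) for bounded Borel h, e.g. h = 1(Y > 0)
h = (Y > 0).astype(float)
print(np.mean(f_XY * h), np.mean(g_Y * h))   # both ≈ E(Y 1(Y>0)) = 1/√(2π) ≈ 0.399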
Thus,
∫_R ∫_R |f (x, y)| µ_X (dx) µ_Y (dy) = E ( |f (X, Y )| ) < ∞ .
Tonelli’s theorem implies
µ_Y ( { y ∈ R : ∫_R |f (x, y)| µ_X (dx) < ∞ } ) = 1 ,
Since E( e^{|X|} ) < ∞ because X follows normal, Y e^{XY} is integrable. The tower
property implies
E( Y e^{XY} ) = E( E( Y e^{XY} | Y ) )
Exercise 5.2. For independent X and Y and 0 < p < ∞, show that
|X + Y |p ≤ 2p (|X|p + |Y |p ) ,
73
and doesn’t really need independence.
For the “⇒” part, assume
E (|X + Y |p ) < ∞ .
In particular, the above set is non-empty. Thus, there exists y ∈ R such that
Since
|X|p ≤ 2p (|X + y|p + |y|p ) ,
we get E(|X|p ) < ∞. This also shows E(|Y |p ) < ∞ and thus proves the “⇒”
part.
Exercise 5.3. A random variable X is infinitely divisible if for all fixed n =
1, 2, . . ., there exist i.i.d. random variables Xn1 , . . . , Xnn defined on some prob-
ability space such that
d
X = Xn1 + . . . + Xnn . (5.9)
If X is an infinitely divisible random variable with mean zero and variance one,
show that
E(X 4 ) ≥ 3 .
Hint. Use the above exercise and HW3/15.
Exercise 5.4. If X and Y are independent and either of them is a continuous
random variable, show that
X 6= Y a.s.
Exercise 5.5. If
(X, Y ) ∼ N2 ( (0, 0)^T , ( 1  ρ ; ρ  1 ) ) ,
show that E(Y |X) = ρX.
Definition 43. For random variables Xn and X, if it holds that
P ( { ω ∈ Ω : lim_{n→∞} Xn (ω) = X(ω) } ) = 1 ,
then it is said that Xn → X a.s.
Theorem 6.1. If Xn → X a.s. and |Xn | ≤ Y for some Y which has finite
expectation, then show that X has a finite expectation, and
Proof. Exercise.
Definition 44. A sequence of random variables (Xn ) converges in probability
to X, written Xn −→_P X, if for every ε > 0,
lim_{n→∞} P ( |Xn − X| > ε ) = 0 .
Exercise 6.3. If Xn −→_P X and Xn −→_P X′, then show that X = X′ a.s.
Theorem 6.2. If Xn → X a.s., then Xn −→_P X.
Proof. Assume that Xn → X a.s. Fix ε > 0. Clearly,
and hence
1(|Xn − X| > ε) → 0 a.s.
Theorem 6.1 shows
lim E (1(|Xn − X| > ε)) = 0 ,
n→∞
that is,
P (|Xn − X| > ε) → 0 .
P
Since this holds for all ε > 0, it follows that Xn −→ X which completes the
proof.
Example 6.1. Let Ω = (0, 1], A = B((0, 1]) and P be the restriction of Lebesgue
measure to (0, 1]. Define for all ω ∈ Ω,
X1 (ω) = 1( 0 < ω ≤ 1/2 ) ,
X2 (ω) = 1( 1/2 < ω ≤ 1 ) ,
X3 (ω) = 1( 0 < ω ≤ 1/4 ) ,
X4 (ω) = 1( 1/4 < ω ≤ 1/2 ) ,
X5 (ω) = 1( 1/2 < ω ≤ 3/4 ) ,
X6 (ω) = 1( 3/4 < ω ≤ 1 ) ,
and so on.
Then, Xn −→_P 0 but
P ( lim_{n→∞} Xn = 0 ) = 0 .
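A short simulation sketch of this example (not part of the notes): the interval lengths shrink, so P(Xn = 1) → 0, yet every ω falls in some interval of every dyadic block, so Xn(ω) keeps returning to 1 and the sequence converges at no ω.

import numpy as np

# Sketch of Example 6.1: the "typewriter" sequence of indicators of dyadic intervals.
rng = np.random.default_rng(2)
omega = rng.uniform(size=5)                   # a few sample points of Ω = (0, 1]

blocks = []                                   # block k holds the 2^k indicators of length-2^{-k} intervals
for k in range(1, 12):
    block = []
    for j in range(2**k):
        a, b = j / 2**k, (j + 1) / 2**k
        block.append(((omega > a) & (omega <= b)).astype(int))
    blocks.append(np.array(block))

print(2.0**-11)                               # P(Xn = 1) within the last block: tiny, so Xn → 0 in probability
print(blocks[-1].max(axis=0))                 # yet every ω takes the value 1 again in that block: no a.s. limit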
1. For 1 ≤ q ≤ p, Xn → X in Lq .
P
2. As n → ∞, Xn −→ X.
76
Proof. 1. Follows from the fact that for any random variable Z,
kZkq ≤ kZkp ,
φ (E(|Z|q )) ≤ E (φ(|Z|q )) ,
77
Thus
(1/n) Σ_{i=1}^n Xi → µ as n → ∞ ,
then
P ({ω ∈ Ω : ω ∈ An for infinitely many n}) = 0 .
Proof. Let
Bn = An ∪ An+1 ∪ . . . , n ≥ 1 ,
and
∞
\
B∞ = Bn .
n=1
Clearly,
B∞ = {ω ∈ Ω : ω ∈ An for infinitely many n} .
Furthermore, since Bn ↓ B∞ , it follows that
P (B∞ ) = lim_{n→∞} P (Bn ) ≤ lim_{n→∞} Σ_{k=n}^∞ P (Ak ) = 0 ,
Xnk → X a.s. ,
as k → ∞.
Proof. Since Xn −→_P X, there exists n1 such that
P ( |Xn1 − X| > 1 ) ≤ 1/2 .
There exists N2 such that
P ( |Xn − X| > 1/2 ) ≤ 2^{−2} for all n ≥ N2 .
Define n2 = N2 ∨ (n1 + 1). Proceeding similarly, we get positive integers n1 <
n2 < n3 < . . . such that
P ( |Xnk − X| > 1/k ) ≤ 2^{−k} for all k .
Hence,
Σ_{k=1}^∞ P ( |Xnk − X| > 1/k ) < ∞ .
Thus,
Xnk → X a.s. ,
as k → ∞. This completes the proof.
P
Exercise 6.7. If Xn −→ X and |Xn | ≤ Y for some Y with E(Y ) < ∞, then
prove that
lim E(Xn ) = E(X) .
n→∞
Exercise 6.8. Prove or disprove the following claim. If Xn and X are ran-
dom variables such that any subsequence {Xnk : k ≥ 1} of Xn has a further
subsequence {Xnkl : l ≥ 1} such that
Xnkl → X a.s. ,
then Xn → X a.s.
Exercise 6.9. Show that the following are equivalent for random variables Xn
and X.
P
1. As n → ∞, Xn −→ X.
2. Every subsequence {Xnk : k ≥ 1} of {Xn : n ≥ 1} has a further subse-
quence {Xnkl : l ≥ 1} such that as l → ∞,
Xnkl → X a.s.
P
Xnkl −→ X .
Theorem 6.8. If X, X1 , X2 , . . . are random variables such that
Σ_{n=1}^∞ P ( |Xn − X| > ε ) < ∞ for all ε > 0 ,
then Xn → X a.s.
Proof. Let
Z = lim sup |Xn − X| ,
n→∞
which is a possibly improper random variable. The proof would follow if it can
be shown that Z = 0 a.s., which is the same as
For ε > 0,
The hypothesis in conjunction with the Borel-Cantelli lemma shows that the
right hand side is zero. Thus (6.1) follows, which completes the proof.
Theorem 6.9 (Strong law of large numbers (SLLN) for finite fourth moment).
If X1 , X2 , . . . are i.i.d. random variables with finite fourth moment, show that
as n → ∞,
(1/n) Σ_{i=1}^n Xi → E(X1 ) a.s. and in L4 .
for the following reasons. Markov’s inequality would show for ε > 0 and n =
1, 2, . . .,
! !4
n n
X 1 X
P Xi > ε ≤ ε−4 E Xi .
i=1
n i=1
Besides, (6.2) would show that
!4
n
1X
lim E Xi = 0,
n→∞ n i=1
E(Xi Xj Xk Xl ) 6= 0
i = j 6= k = l, i = k 6= j = l or i = l 6= j = k .
Thus X ≥ 0 and
E(X) = (1/c) ∫_e^∞ dx / ( x (log x)² )
( y = log x, dy = dx/x )    = (1/c) ∫_1^∞ dy / y²
= 1/c < ∞ .
For any ε > 0,
E( X^{1+ε} ) = (1/c) ∫_e^∞ dx / ( x^{1−ε} (log x)² ) = ∞
because
lim_{x→∞} x / ( x^{1−ε} (log x)² ) = ∞ ,
and
∫_e^∞ dx / x = ∞ .
Thus E(X) < ∞ = E( X^{1+ε} ) for all ε > 0. That is, X has finite mean but any
higher moment is infinite.
Theorem 6.10 (SLLN). For i.i.d. random variables X1 , X2 , . . . with finite mean
µ,
(1/n) Σ_{i=1}^n Xi → µ a.s. ,
as n → ∞.
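A one-path Monte Carlo sketch (not part of the notes; the exponential distribution is an arbitrary choice with finite mean): the running averages along a single simulated path settle at µ.

import numpy as np

# Sketch of the SLLN along one simulated path: i.i.d. Exponential(mean 2), so µ = 2.
rng = np.random.default_rng(3)
n = 10**6
X = rng.exponential(scale=2.0, size=n)

running_mean = np.cumsum(X) / np.arange(1, n + 1)
print(running_mean[[99, 9_999, 999_999]])    # → 2 as the sample size grows along this path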
For proving the SLLN, the following inequality will be used.
Theorem 6.11 (Kolmogorov maximal inequality). Let X1 , . . . , Xn be indepen-
dent random variables with finite variance. Then, for any α > 0,
P ( max_{1≤k≤n} |Sk − E(Sk )| ≥ α ) ≤ α^{−2} Var(Sn ) ,
where
Sk = Σ_{i=1}^k Xi , 1 ≤ k ≤ n .
Proof of Theorem 6.11. WLOG, assume that X1 , . . . , Xn are zero mean. We
start with the observation that
[ max_{1≤k≤n} |Sk | ≥ α ] = ∪_{k=1}^n Ak ,
where
Ak = [|Sk | ≥ α > |Sj | for all 1 ≤ j ≤ k − 1] , k = 1, . . . , n .
Since A1 , . . . , An are disjoint, it follows that
Var(Sn ) = E(Sn² )
≥ Σ_{k=1}^n E ( Sn² 1_{Ak} )
= Σ_{k=1}^n [ E ( (Sn − Sk )² 1_{Ak} ) + E ( Sk² 1_{Ak} ) + 2E ( (Sn − Sk ) Sk 1_{Ak} ) ]
≥ Σ_{k=1}^n [ E ( Sk² 1_{Ak} ) + 2E ( (Sn − Sk ) Sk 1_{Ak} ) ] .
Since Sn − Sk and Sk 1Ak are independent and the former has zero mean, it
follows that
E ((Sn − Sk )Sk 1Ak ) = 0 ,
and hence
Var(Sn ) ≥ Σ_{k=1}^n E ( Sk² 1_{Ak} )
≥ Σ_{k=1}^n E ( α² 1_{Ak} )
= α² P ( max_{1≤k≤n} |Sk | ≥ α ) .
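Before moving on, here is a quick empirical check of the maximal inequality (a sketch with arbitrary choices, not part of the notes): for a zero-mean random walk, the probability that the running maximum of |Sk| exceeds α stays below Var(Sn)/α².

import numpy as np

# Sketch: empirical check of Kolmogorov's maximal inequality for S_k = X_1 + ... + X_k,
# with X_i i.i.d. N(0, 1), so Var(S_n) = n.
rng = np.random.default_rng(4)
n, reps, alpha = 50, 10**5, 10.0
X = rng.standard_normal((reps, n))
S = np.cumsum(X, axis=1)

lhs = np.mean(np.max(np.abs(S), axis=1) >= alpha)
print(lhs, n / alpha**2)                      # empirical P(max_k |S_k| ≥ α) vs the bound Var(S_n)/α²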
Xn′ := Xn 1( |Xn | ≤ n ) ,
and
Sn′ := Σ_{k=1}^n Xk′ .
Notice that
Σ_{n=1}^∞ P ( Xn ≠ Xn′ ) = Σ_{n=1}^∞ P ( |X1 | > n )
≤ Σ_{n=1}^∞ ∫_{n−1}^n P ( |X1 | > s ) ds    (6.5)
= ∫_0^∞ P ( |X1 | > s ) ds
= E( |X1 | ) < ∞ ,
(6.5) following from the observation that P (|X1 | > n) ≤ P (|X1 | > s) for s ≤ n.
From the Borel-Cantelli lemma, it follows that
and hence,
lim_{n→∞} n^{−1} E( Sn′ ) = 0 .
For r ≥ 1, set
Zr := max_{2^{r−1} ≤ k < 2^r} | Sk′ − E(Sk′ ) | .
Since
(1/k) | Sk′ − E(Sk′ ) | ≤ 2^{−(r−1)} Zr for all 2^{r−1} ≤ k ≤ 2^r ,
it suffices to show that
2^{−r} Zr → 0 a.s.
The above will follow from Theorem 6.8 if it can be shown that
Σ_{r=1}^∞ P [ |Zr | > 2^r ε ] < ∞
for any ε > 0. Kolmogorov’s inequality implies that
Σ_{r=1}^∞ P [ |Zr | > 2^r ε ] ≤ Σ_{r=1}^∞ P ( max_{1≤k≤2^r} | Sk′ − E(Sk′ ) | > 2^r ε )
≤ ε^{−2} Σ_{r=1}^∞ 4^{−r} Var( S′_{2^r} )
= ε^{−2} Σ_{r=1}^∞ 4^{−r} Σ_{j=1}^{2^r} Var( Xj′ )
= ε^{−2} Σ_{j=1}^∞ Var( Xj′ ) Σ_{r=⌈log2 j⌉}^∞ 4^{−r}
≤ K Σ_{j=1}^∞ j^{−2} Var( Xj′ ) ,
(6.7) following from the fact that for k ≥ 2,
Σ_{j=k}^∞ j^{−2} ≤ Σ_{j=k}^∞ ∫_{j−1}^j x^{−2} dx = 1/(k − 1) ≤ 2/k ,
and
Σ_{j=1}^∞ j^{−2} ≤ 1 + ∫_1^∞ x^{−2} dx = 2 ,
86
S∞
Since n=1 (A1 ∨ . . . ∨ An ) is a field,
S∞ Theorem 3.1 shows that T is independent
of the sigma-field generated by n=1 (A1 ∨ . . . ∨ An ). Observing that
∞ ∞
!
[ _
σ (A1 ∨ . . . ∨ An ) = An ⊃ T ,
n=1 n=1
Sn = X1 + . . . + Xn , n = 1, 2, . . . ,
show that
P lim sup Sn = ∞ = 0 or 1 ,
n→∞
1
Convince yourself that the above is not necessarily true with n Sn replaced by
Sn .
then
P (An occurs for infinitely many n) = 1 .
Proof. Since
∞ _
\ ∞
E = [An occurs for infinitely many n] = {Ak , Ack , ∅, Ω} ,
n=1 k=n
Recall that for any n ≥ 1,
P ( ∪_{i=1}^n Ai ) = 1 − P ( ∩_{i=1}^n Ai^c )
= 1 − Π_{i=1}^n ( 1 − P (Ai ) )
( 1 − x ≤ e^{−x} for all x ∈ R )    ≥ 1 − Π_{i=1}^n e^{−P (Ai )}
= 1 − exp ( − Σ_{i=1}^n P (Ai ) )
→ 1 ,
as n → ∞ because Σ_{i=1}^∞ P (Ai ) = ∞ .
Let α1 , α2 , . . . ∈ (0, 1) be such that
Π_{i=1}^∞ αi > 0 .
For example, αi = e^{−1/i²} for i = 1, 2, . . . satisfies the above. The above calcula-
tions show there exists n1 such that
P ( ∪_{i=1}^{n1} Ai ) ≥ α1 .
Since Σ_{i=n1+1}^∞ P (Ai ) = ∞, a similar calculation shows there exists n2 > n1
such that
P ( ∪_{i=n1+1}^{n2} Ai ) ≥ α2 .
Clearly,
E ⊃ ∩_{k=1}^∞ ∪_{i=n_{k−1}+1}^{n_k} Ai .
Therefore,
P (E) ≥ P ( ∩_{k=1}^∞ ∪_{i=n_{k−1}+1}^{n_k} Ai )
= Π_{k=1}^∞ P ( ∪_{i=n_{k−1}+1}^{n_k} Ai )
≥ Π_{k=1}^∞ αk > 0 .
If Y1 , Y2 , . . . are such that σ(Xn : n ≥ 1), σ(Y1 ), σ(Y2 ), . . . are independent and
Yn takes values 1 and −1, each with probability 1/2 for n = 1, 2, . . ., show that
n
X
Xi Yi → Z , as n → ∞ ,
i=1
89
For m + 1 ≤ i < j ≤ n, independence of σ(X1 , X2 , . . .), σ(Yi ), σ(Yj ) shows
Thus,
n
X
E (Zn − Zm )2 = E(Xi2 ) .
i=m+1
which is possible from the given hypothesis, it holds that for N ≤ m < n,
n
X ∞
X
2 2
E(Xi2 ) ≤ ε ,
E (Zn − Zm ) = E(Xi ) ≤
i=m+1 i=N +1
90
and in that case its expectation is defined as
E(Z) = ∫_Ω Z dP .
Throughout this chapter, <(z) and =(z) will denote the real and imaginary
parts of z for z ∈ C and ι = √(−1). That is, for z = x + ιy where x, y ∈ R,
<(z) = x , =(z) = y .
Hence prove that Z is B(C)-measurable if and only if <(Z) and =(Z) are
B(R)-measurable.
2. For a C-valued integrable random variable Z, show that
|E(Z)| ≤ E(|Z|) .
Theorem 7.1. Let φX be the characteristic function of a random variable X.
Then
1. φX (0) = 1 and |φX (t)| ≤ 1 for all t,
2. φX is uniformly continuous,
3. aX + b has the characteristic function φaX+b given by
E(eιtX ) ≤ E |eιtX | = 1 .
lim E eihX − 1 = 0 ,
h→0
E[cos λX] = 1 .
92
Thus, for a fixed t ∈ R,
eι(t+λ)X = eιtX a.s.
Taking expectation of both sides shows φX (t+λ) = φX (t), from which 2. follows.
2.⇒1. Trivial because φX (0) = 1.
Exercise 7.3. If the CHF φX of X satisfies
ψX (t) = E etX , t ∈ R .
then {t ∈ R : ψ(t) < ∞} ⊃ (α, β). If α < β then ezx is µ-integrable for z with
α < <(z) < β and f : {z ∈ C : α < <(z) < β} → C defined by
f (z) = ∫_{−∞}^∞ e^{zx} µ(dx) ,    (7.1)
and
f (z) = Σ_{n=0}^∞ (z^n / n!) ∫_{−∞}^∞ x^n µ(dx) , z ∈ C, |z| < (−α) ∧ β .    (7.2)
Proof. Monotonicity and positivity of the exponential function on R implies
that for t1 < t2 < t3 ,
e t2 x ≤ e t1 x + e t3 x , x ∈ R , (7.3)
showing that
ψ(t2 ) ≤ ψ(t1 ) + ψ(t3 ) .
Thus, {t ∈ R : ψ(t) < ∞} is a convex subset of R which contains (α, β).
If α < <(z) < β for some z ∈ C, then the above shows ψ(<(z)) < ∞ and
thus Z ∞ Z ∞
|ezx | µ(dx) = ex<(z) µ(dx) = ψ(<(z)) < ∞ .
−∞ −∞
DCT for complex-valued function shows that f defined by (7.1) is continuous. A
standard application of Fubini and Morera’s theorem in conjunction with (7.3)
proves f is holomorphic.
For the final claim, assume α < 0 < β. For 0 < t < (−α) ∧ β and n ≥ 1,
|x|^n ≤ t^{−n} n! ( |tx|^n / n! ) ≤ n! t^{−n} e^{|tx|} ≤ n! t^{−n} ( e^{tx} + e^{−tx} ) , x ∈ R .
Thus,
∫_R |x|^n µ(dx) ≤ n! t^{−n} ∫_{−∞}^∞ ( e^{tx} + e^{−tx} ) µ(dx) = n! t^{−n} ( ψ(t) + ψ(−t) ) < ∞ ,
and
∫_R e^{t|x|} µ(dx) ≤ ψ(t) + ψ(−t) < ∞ ,
DCT shows that
∫_R lim_{n→∞} Σ_{i=0}^n (1/i!) (zx)^i µ(dx) = lim_{n→∞} ∫_R Σ_{i=0}^n (1/i!) (zx)^i µ(dx)
= lim_{n→∞} Σ_{i=0}^n (1/i!) z^i ∫_R x^i µ(dx)
= Σ_{i=0}^∞ (1/i!) z^i ∫_R x^i µ(dx) .
Since this holds for all z ∈ C with |z| ≤ t and t is arbitrary in (0, (−α) ∧ β),
(7.2) follows.
Corollary 7.1. If µ is a probability measure such that α < 0 < β, where α, β
are as in Theorem 7.3, then the MGF ψ of µ satisfies
ψ(t) = Σ_{n=0}^∞ (t^n / n!) ∫_{−∞}^∞ x^n µ(dx) , t ∈ R, |t| < (−α) ∧ β ,
Remark 2. Only DCT and no complex analysis is used in the proof of (7.2),
and therefore for Corollary 7.1.
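A quick numerical sanity check of this expansion (a sketch, not from the notes): for the standard normal, the moments are m_{2k} = (2k − 1)!! and m_{2k+1} = 0, and the truncated series reproduces ψ(t) = e^{t²/2}.

import math

# Sketch: check the moment expansion of the MGF for the standard normal.
def normal_moment(n):
    # E(X^n) for X ~ N(0,1): 0 for odd n, (n-1)!! for even n
    return 0.0 if n % 2 else float(math.prod(range(1, n, 2)))

t = 0.7
series = sum(t**n / math.factorial(n) * normal_moment(n) for n in range(40))
print(series, math.exp(t**2 / 2))             # the truncated series ≈ e^{t²/2}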
Exercise 7.4. Show that the characteristic function φ of standard normal is
φ(t) := e^{−t²/2} , t ∈ R ,    (7.4)
in each of the following different ways.
1. Recall (from the solution of HW4/12) that the MGF of standard normal
is
ψ(t) = e^{t²/2} , t ∈ R ,    (7.5)
whose analytic continuation to C is
ψ̃(z) = e^{z²/2} , z ∈ C .
Use Theorem 7.3 to arrive at (7.4). This line of argument essentially
justifies replacing t by ιt in (7.5).
2. Derive from HW1/18 that
∫_{−∞}^∞ e^{ιtx − x²/2} dx = √(2π) e^{−t²/2} , t ∈ R ,
Proof of Theorem 7.4. Fix a, b satisfying the hypotheses. Before giving the ac-
tual proof, let us start with a sketch of the proof; every step in the sketch will
eventually be justified. For T > 0,
∫_{−T}^T ( (e^{−ιta} − e^{−ιtb}) / (ιt) ) φ(t) dt
= ∫_{−T}^T ( (e^{−ιta} − e^{−ιtb}) / (ιt) ) ∫_{−∞}^∞ e^{ιtx} µ(dx) dt
= ∫_{−∞}^∞ ∫_{−T}^T ( (e^{ιt(x−a)} − e^{ιt(x−b)}) / (ιt) ) dt µ(dx)    (7.6)
= ∫_{−∞}^∞ ∫_{−T}^T t^{−1} ( sin(t(x − a)) − sin(t(x − b)) ) dt µ(dx)    (7.7)
= ∫_{−∞}^∞ 2 ( sgn(x − a) S(T |x − a|) − sgn(x − b) S(T |x − b|) ) µ(dx) ,    (7.8)
S being as in Lemma 7.1, provided (7.6)-(7.8) can be justified. The said lemma
implies that
lim_{T→∞} ( sgn(x − a) S(T |x − a|) − sgn(x − b) S(T |x − b|) )
= (π/2) ( sgn(x − a) − sgn(x − b) )
= π  if a < x < b ,   0  if x < a or x > b ,   π/2  if x = a or x = b .
Thus,
lim_{T→∞} ∫_{−T}^T ( (e^{−ιta} − e^{−ιtb}) / (ιt) ) φ(t) dt
= lim_{T→∞} ∫_{−∞}^∞ 2 ( sgn(x − a) S(T |x − a|) − sgn(x − b) S(T |x − b|) ) µ(dx)
= ∫_{−∞}^∞ ( 2π 1(a < x < b) + π 1(x ∈ {a, b}) ) µ(dx)    (7.9)
= 2π µ( (a, b] ) ,
For justifying the interchange of integrals in (7.6), notice that
| (e^{ιt(x−a)} − e^{ιt(x−b)}) / (ιt) | ≤ b − a .    (7.10)
Therefore,
∫_{−∞}^∞ ∫_{−T}^T | (e^{ιt(x−a)} − e^{ιt(x−b)}) / (ιt) | dt µ(dx) ≤ (b − a) ∫_{−∞}^∞ ∫_{−T}^T 1 dt µ(dx)
(Tonelli)    = 2T (b − a) < ∞ .
holds for x = a as well because in that case both sides vanish. The above holds
with a replaced by b, which establishes (7.8).
Finally, (7.9) is justified by the observation that
97
which follows from Lemma 7.1 and the fact that S(·) is a continuous function.
Since
|sgn(x − a)S(T |x − a|) − sgn(x − b)S(T |x − b|)| ≤ 2K ,
and µ is a finite measure, DCT justifies the interchange of limit and integral in
(7.9). This completes the proof.
The following is an immediate corollary of Theorem 7.4.
Corollary 7.2. If µ1 and µ2 are probability measures on (R, B(R)) with respec-
tive CHFs φ1 and φ2 , then
Theorem 7.5. Suppose µ1 and µ2 are probability measures on (R, B(R)) with
respective MGFs ψ1 and ψ2 . If there exists θ > 0 such that
then µ1 = µ2 .
Proof. For i = 1, 2, define fi : {z ∈ C : |<(z)| < θ} → C by
Z
fi (z) = ezx µi (dx) ,
R
which is possible because the MGFs of µ1 and µ2 are finite on [−θ, θ]. Theorem
7.3 shows that f1 and f2 are holomorphic. Since ψi is the restriction of fi to
(−θ, θ), the assumption implies f1 and f2 agree on an uncountable set. Thus,
f1 = f2 . As {ιt : t ∈ R} is contained in the domains of f1 and f2 , it follows that
f1 (ιt) = f2 (ιt) , t ∈ R .
The above is the same as saying the CHFs of µ1 and µ2 are identical. Corollary
7.2 completes the proof.
Theorem 7.6 (Inversion theorem for densities). If the characteristic function
φ of a probability measure µ is integrable on R, that is,
Z ∞
|φ(t)|dt < ∞ ,
−∞
then f defined by Z ∞
1
f (x) := e−ιtx φ(t)dt ,
2π −∞
is a density of µ.
Proof. Let F be the CDF of µ, that is,
Proof of Step 1. Suffices to show that for all x ∈ R, µ{x} = 0. Fix x ∈ R. Let
a < x ≤ b be such that µ{a, b} = 0. By the preceding result, it follows that
µ( (a, b] ) = lim_{T→∞} (1/2π) ∫_{−T}^T ( (e^{−ιta} − e^{−ιtb}) / (ιt) ) φ(t) dt .
Notice that for all T ≥ 0,
| ∫_{−T}^T ( (e^{−ιta} − e^{−ιtb}) / (ιt) ) φ(t) dt | ≤ ∫_{−T}^T | (e^{−ιta} − e^{−ιtb}) / (ιt) | |φ(t)| dt ≤ (b − a) ∫_{−∞}^∞ |φ(t)| dt ,
and the RHS converges to zero as n → ∞. This completes the proof of Step
1.
Step 2. The function F is differentiable, and
F 0 (x) = f (x) .
Proof of Step 2. Fix x ∈ R and h 6= 0. Then, by Step 1 and the preceding
result, it follows that
( F (x + h) − F (x) ) / h = (1/h) µ( (x, x + h] ) = lim_{T→∞} (1/2π) ∫_{−T}^T ( (e^{−ιtx} − e^{−ιt(x+h)}) / (ιth) ) φ(t) dt .
Since
| ( (e^{−ιtx} − e^{−ιt(x+h)}) / (ιth) ) φ(t) | = |φ(t)| |e^{−ιth} − 1| / |th| ≤ |φ(t)| ,    (7.12)
the inequality following from (7.10) by putting b = h and a = 0. As |φ(t)| is
integrable on R, by DCT, it follows that
( F (x + h) − F (x) ) / h = (1/2π) ∫_{−∞}^∞ ( (e^{−ιtx} − e^{−ιt(x+h)}) / (ιth) ) φ(t) dt .    (7.13)
Since,
lim_{h→0} ( e^{−ιt(x+h)} − e^{−ιtx} ) / h = (d/dx) e^{−ιtx} = −ιt e^{−ιtx} ,
it follows that the integrand in (7.13) converges to e−ιtx φ(t) as h → 0. By
(7.12), the modulus of the integrand in (7.13) is bounded above by |φ(t)|. DCT
allows the limit as h → 0 to be interchanged with the integral in the RHS of
(7.13), which completes the proof of Step 2.
Arguments similar to those in the proof of Theorem 7.1.2 show that f is a
continuous function. By Step 2, it follows that for all real a < b,
µ( (a, b] ) = F (b) − F (a) = ∫_a^b f (x) dx ,
Exercise 7.5. Use Corollary 7.3 to give a fifth proof of the fact
∫_{−∞}^∞ e^{−x²/2} dx = √(2π) ;
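Related to the inversion theorem for densities, here is a small numerical sketch (not from the notes): for the standard normal, whose CHF e^{−t²/2} is integrable, the inversion integral f(x) = (1/2π) ∫ e^{−ιtx} φ(t) dt reproduces the N(0, 1) density.

import numpy as np

# Sketch: numerical inversion of an integrable CHF (standard normal) into its density.
t = np.linspace(-40.0, 40.0, 200001)
dt = t[1] - t[0]
phi = np.exp(-t**2 / 2)                       # CHF of N(0,1), integrable on R

x = 1.3
f_x = (np.exp(-1j * t * x) * phi).sum().real * dt / (2 * np.pi)
print(f_x, np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))   # both ≈ 0.1714, the N(0,1) density at x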
Theorem 7.7 (Inversion theorem on Rd ). If φ is the CHF of probability measure
µ on Rd , then for ∆ = [a1 , b1 ] × . . . × [ad , bd ] where aj < bj for j = 1, . . . , d, and
µ(∂∆) = 0 where ∂∆ is the boundary of ∆,
µ(∆) = (2π)^{−d} lim_{T→∞} ∫_{[−T,T]^d} Π_{j=1}^d ( (e^{−ιt_j a_j} − e^{−ιt_j b_j}) / (ιt_j) ) φ(t1 , . . . , td ) dt1 . . . dtd .
Proof. We shall proceed along the lines of the proof of Theorem 7.4. Let ∆ =
[a1 , b1 ] × . . . × [ad , bd ] satisfy the hypotheses. For T > 0,
∫_{[−T,T]^d} Π_{j=1}^d ( (e^{−ιt_j a_j} − e^{−ιt_j b_j}) / (ιt_j) ) φ(t1 , . . . , td ) dt1 . . . dtd    (7.14)
= ∫_{[−T,T]^d} Π_{j=1}^d ( (e^{−ιt_j a_j} − e^{−ιt_j b_j}) / (ιt_j) ) ∫_{R^d} exp ( ι Σ_{j=1}^d t_j x_j ) µ(dx1 , . . . , dxd ) dt1 . . . dtd
= ∫_{[−T,T]^d} ∫_{R^d} Π_{j=1}^d ( (e^{ιt_j (x_j − a_j )} − e^{ιt_j (x_j − b_j )}) / (ιt_j) ) µ(dx1 , . . . , dxd ) dt1 . . . dtd .    (7.15)
Recall (7.10) to write
d
eιtj (xj −aj ) − eιtj (xj −bj )
Z Z Y
µ(dx1 , . . . , dxd ) dt1 , . . . , dtd
[−T,T ]d Rd j=1
ιtj
Z Z d
Y
≤ (bj − aj ) µ(dx1 , . . . , dxd ) dt1 , . . . , dtd
[−T,T ]d Rd j=1
d
Y
= (2T )d (bj − aj ) < ∞ .
j=1
µ(dx1 , . . . , dxd ) ,
101
(7.8) implying the last equality.
Denote for x1 , . . . , xd ∈ R,
ψT (x1 , . . . , xd )
d
Y
= 2 (sgn(xj − aj )S(T |xj − aj |) − sgn(xj − bj )S(T |xj − bj |)) ,
j=1
=: ψ∞ (x1 , . . . , xd ) .
For (x1 , . . . , xd ) in the interior of ∆, that is, if aj < xj < bj for all j, then
sgn(xj − aj ) − sgn(xj − bj ) = 2 , j = 1, . . . , d ,
and hence
ψ∞ (x1 , . . . , xd ) = (2π)d .
On the other hand, if (x1 , . . . , xd ) ∈ ∆c , then there exists j for which either
xj < aj or xj > bj and hence for that j,
sgn(xj − aj ) − sgn(xj − bj ) = 0 .
In other words,
ψ∞ (x1 , . . . , xd ) = 0 , (x1 , . . . , xd ) ∈ ∆c .
Thus,
(7.16) and that µ(∂∆) = 0 imply the second line. The left hand side of the first
line above is the same as the quantity in (7.14). That is, we have shown
lim_{T→∞} ∫_{[−T,T]^d} Π_{j=1}^d ( (e^{−ιt_j a_j} − e^{−ιt_j b_j}) / (ιt_j) ) φ(t1 , . . . , td ) dt1 . . . dtd = (2π)^d µ(∆) ,
Theorem 7.8 (Uniqueness theorem on Rd ). If µ1 and µ2 are probability mea-
sures on Rd with identical CHFs, then µ1 = µ2 .
Proof. Let µ1 and µ2 be probability measures on Rd with identical CHFs. To
show µ1 = µ2 , in view of Theorem 4.3.2, it suffices to prove that
Example 7.2. Let Z1 , . . . , Zd be i.i.d. from standard normal and Z = (Z1 ,
. . . , Zd ). Fix a d × d symmetric non-negative definite (n.n.d.) matrix Σ and
µ ∈ Rd and define
X = µ + Σ1/2 Z , (7.20)
where elements of Rd are to be interpreted as column vectors by convention. Let
us calculate the CHF of X. Fix λ ∈ Rd and write
λ^T X = λ^T µ + θ^T Z ,
where
θ = Σ^{1/2} λ .
Recall that θ^T Z follows N (0, ‖θ‖² ), where ‖ · ‖ is the L2 -norm, if ‖θ‖ > 0; θ^T Z
is degenerate at zero otherwise. Assuming for a moment that t = ‖θ‖ > 0,
E ( e^{ιθ^T Z} ) = E ( exp ( ιt θ^T Z / ‖θ‖ ) )
( because θ^T Z / ‖θ‖ ∼ N (0, 1) )    = e^{−t²/2}
= e^{−‖θ‖²/2}
= exp ( −(1/2) θ^T θ )
= exp ( −(1/2) λ^T Σλ ) .
The above definition is consistent with Definition 39 in the following sense.
If Σ is p.d. and X ∼ Nd (µ, Σ) according to Definition 50, then the density of X
is f as in Definition 39. Indeed, (7.20) should be compared with (4.10) to see
this immediately.
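A minimal sampling sketch of (7.20) (not from the notes; the particular µ and Σ are arbitrary choices): draw i.i.d. standard normal coordinates, apply a symmetric square root of Σ, and shift by µ; the empirical mean and covariance should match.

import numpy as np

# Sketch: sampling X = µ + Σ^{1/2} Z as in (7.20), with Σ^{1/2} the symmetric square root.
rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                # symmetric and non-negative definite

w, V = np.linalg.eigh(Sigma)                  # spectral decomposition Σ = V diag(w) V^T
root = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

Z = rng.standard_normal((10**6, 2))           # rows are i.i.d. N_2(0, I) vectors
X = mu + Z @ root                             # rows are i.i.d. N_2(µ, Σ) vectors

print(X.mean(axis=0))                         # ≈ µ
print(np.cov(X, rowvar=False))                # ≈ Σ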
Remark 4. The Nd (µ, Σ) is called a “singular normal distribution” if Σ is
n.n.d. but not p.d. It should be noted that a singular normal distribution in one
dimension is a degenerate distribution.
Exercise 7.6. Show that a Nd (µ, Σ) distribution has a density if and only if Σ
is p.d.
Theorem 7.10. For an Rd -valued random variable X, µ ∈ Rd and a d × d
n.n.d. matrix Σ,
Proof. For the “⇒ part”, assume X ∼ Nd (µ, Σ) and fix λ ∈ Rd . Then for t ∈ R,
E ( e^{ιt⟨λ,X⟩} ) = E ( e^{ι⟨tλ,X⟩} )
= exp ( ι(tλ)^T µ − (1/2)(tλ)^T Σ(tλ) )
= e^{ιtθ − σ²t²/2} ,
Thus ⟨λ, X⟩ =^d ⟨λ, Y ⟩ for all λ ∈ R^d . The Cramér-Wold device shows
X =^d Y ,
105
2. For all t ∈ R, =(φ(t)) = 0, that is, φ is a real function.
3. The function φ is even, that is, φ(−t) = φ(t) for all t ∈ R.
Exercise 7.9. If X1 , X2 , . . . are i.i.d. from the Cauchy distribution, show that
there does not exist a random variable Z such that
n
1X P
Xi −→ Z , n → ∞ .
n i=1
106
Exercise 8.2. If Xn ⇒ Y and Xn ⇒ Z, show that Y =^d Z.
Theorem 8.1. If Xn −→_P X, then Xn ⇒ X.
Proof. Fix x ∈ R such that P (X = x) = 0. It suffices to show for all such x,
P (X ∈ [w, y]) ≤ ε .
Clearly,
[X > y] ∩ [|Xn − X| ≤ y − x] ⊂ [Xn > x] .
Take complements of both sides to get
Thus,
P
Let n → ∞ and use the fact Xn −→ X to argue
the right inequality following from the choice of w and y. Since ε is arbitrary,
we get
lim sup P (Xn ≤ x) ≤ P (X ≤ x) .
n→∞
Since
[Xn > x] ∩ [|Xn − X| ≤ x − w] ⊂ [X > w] ,
proceeding along similar lines would yield
from which (8.1) would follow and would complete the proof.
Exercise 8.3. If X follows standard normal and Xn = −X, show that
P
Xn ⇒ X but Xn −→
6 X.
107
Theorem 8.2. For probability measures µ1 , µ2 , . . . , µ∞ on R, µn ⇒ µ∞ if and
only if
lim_{n→∞} ∫ f dµn = ∫ f dµ∞ ,    (8.2)
Letting n → ∞,
Z
lim inf µn ((−∞, x]) ≥ lim inf f dµn
n→∞ n→∞
Z
(by (8.2)) = f dµ∞
µ∞ ([a, b]c ) ≤ ε .
xi − xi−1 ≤ δ , i = 1, . . . , k ;
108
this is possible because a and b have been chosen to be continuity points of µ∞ .
Thus, for n = 1, 2, . . . , ∞,
| ∫_{(a,b]} f (x) µn (dx) − Σ_{i=1}^k f (xi ) µn ( (xi−1 , xi ] ) |    (8.4)
= | Σ_{i=1}^k ∫_{(xi−1 ,xi ]} [ f (x) − f (xi ) ] µn (dx) |
≤ Σ_{i=1}^k ∫_{(xi−1 ,xi ]} | f (x) − f (xi ) | µn (dx)
≤ Σ_{i=1}^k µn ( (xi−1 , xi ] ) max_{x∈[xi−1 ,xi ]} | f (x) − f (xi ) |
≤ ε Σ_{i=1}^k µn ( (xi−1 , xi ] )
= ε µn ( (a, b] ) ≤ ε ,
the inequality in the penultimate line following from the choice of δ and that
xi − xi−1 ≤ δ , i = 1, . . . , k.
Thus, for n = 1, 2, . . .,
| ∫_{(a,b]} f (x) µn (dx) − ∫_{(a,b]} f (x) µ∞ (dx) |
≤ | ∫_{(a,b]} f (x) µn (dx) − Σ_{i=1}^k f (xi ) µn ( (xi−1 , xi ] ) |
+ | ∫_{(a,b]} f (x) µ∞ (dx) − Σ_{i=1}^k f (xi ) µ∞ ( (xi−1 , xi ] ) |
+ | Σ_{i=1}^k f (xi ) µn ( (xi−1 , xi ] ) − Σ_{i=1}^k f (xi ) µ∞ ( (xi−1 , xi ] ) |
≤ 2ε + | Σ_{i=1}^k f (xi ) µn ( (xi−1 , xi ] ) − Σ_{i=1}^k f (xi ) µ∞ ( (xi−1 , xi ] ) | .
and hence
lim_{n→∞} | Σ_{i=1}^k f (xi ) µn ( (xi−1 , xi ] ) − Σ_{i=1}^k f (xi ) µ∞ ( (xi−1 , xi ] ) | = 0 .    (8.5)
109
Therefore,
lim sup_{n→∞} | ∫_{(a,b]} f (x) µn (dx) − ∫_{(a,b]} f (x) µ∞ (dx) | ≤ 2ε .    (8.6)
and
lim sup_{n→∞} | ∫_{(a,b]^c} f (x) µn (dx) | ≤ K lim sup_{n→∞} µn ( (a, b]^c )
Since ε is arbitrary, (8.2) follows. This proves the “only if part” and therefore
completes the proof.
Theorem 8.3 (Lévy Continuity theorem). Let µn , µ be probability measures on
R with characteristic functions φn , φ. Then, µn ⇒ µ if and only if
Proof. The “only if” part follows trivially from Theorem 8.2. For the “if” part,
assume that
lim φn (t) = φ(t) for all t ∈ R .
n→∞
Step 1. Let Fn be the c.d.f. of µn . There exist integers 1 ≤ n1 < n2 < . . . such
that
lim Fnk (r) exists for all r ∈ Q .
k→∞
and
G(x) := inf{H(r) : r > x, r ∈ Q} .
Then, G is a non-decreasing right continuous function.
110
Proof of Step 2. Non-decreasing is immediate. For right continuity, fix x ∈ R
and ε > 0. Clearly, there exists r ∈ (x, ∞) ∩ Q such that
H(r) ≤ G(x) + ε .
Clearly,
G((x + r)/2) ≤ H(r) ≤ G(x) + ε .
Thus, G is right continuous at x.
Proof of Step 3. Fix a continuity point x of G and ε > 0. Therefore, there exist
w < x < y such that
G(x) − ε ≤ G(w)
≤ H(r1 )
= lim Fnk (r1 )
k→∞
≤ lim inf Fnk (x)
k→∞
≤ lim sup Fnk (x)
k→∞
≤ lim Fnk (r2 )
k→∞
= H(r2 )
≤ G(y) (8.7)
≤ G(x) + ε ,
the inequality in (8.7) following from the fact that for all r ∈ (y, ∞) ∩ Q,
H(r) ≥ H(r2 ) ,
that is, H(r2 ) is a lower bound of the set of which G(y) is the infimum. Letting
ε ↓ 0 completes the proof of Step 3.
111
Proof of Step 4. Observe that for u > 0,
(1/u) ∫_{−u}^u ( 1 − φn (t) ) dt = (1/u) ∫_{−u}^u ∫_R ( 1 − e^{ιtx} ) µn (dx) dt
= ∫_R (1/u) ∫_{−u}^u ( 1 − e^{ιtx} ) dt µn (dx)
( interpreting (sin 0)/0 = 1 )    = 2 ∫_R ( 1 − (sin ux)/(ux) ) µn (dx)
( (sin z)/z ≤ 1, z ∈ R )    ≥ 2 ∫_{[|x|≥2/u]} ( 1 − (sin ux)/(ux) ) µn (dx)
( (sin ux)/(ux) ≤ |sin ux|/|ux| ≤ 1/|ux| )    ≥ 2 ∫_{[|x|≥2/u]} ( 1 − 1/|ux| ) µn (dx)
≥ 2 ∫_{[|x|≥2/u]} (1/2) µn (dx)
= µn [ |x| ≥ 2/u ] .
lim_{n→∞} (1/u) ∫_{−u}^u ( 1 − φn (t) ) dt = (1/u) ∫_{−u}^u ( 1 − φ(t) ) dt ≤ (1/u) ∫_{−u}^u | 1 − φ(t) | dt .
lim sup_{n→∞} µn { x : |x| ≥ a } ≤ lim_{n→∞} (1/u) ∫_{−u}^u ( 1 − φn (t) ) dt
≤ (1/u) ∫_{−u}^u | 1 − φ(t) | dt
≤ ε .
Let x ≤ −a be a continuity point of G. Since G is non-decreasing,
G(−∞) ≤ G(x)
(By Step 3) = lim Fnk (x)
k→∞
≤ lim sup Fnk (−a)
k→∞
= lim sup µnk ((−∞, −a])
k→∞
≤ lim sup µn ((−a, a)c )
n→∞
≤ ε.
Since ε is arbitrary and G is non-negative, it follows that G(−∞) = 0. A similar
argument shows that if y ≥ −a is a continuity point of G, then G(y) ≥ 1 − ε,
and hence G(∞) = 1. This proves Step 5.
Step 6. As k → ∞, µnk =⇒ µ.
Proof of Step 6. Steps 2 and 5 in conjunction with Theorem 1.4 imply there
exists a probability measure ν on R such that
ν(−∞, x] = G(x), x ∈ R .
Step 3 implies that
µnk =⇒ ν .
By the already proven “only if” part, it follows that
Z
lim φnk (t) = eιtx ν(dx) for all t ∈ R .
k→∞
shows Z
φ(t) = eιtx ν(dx) for all t ∈ R .
Step 7 clearly completes the proof of the “if” part, and thereby proves
the theorem.
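As a small numerical illustration of the continuity theorem (a sketch, not from the notes): the CHF of n^{−1/2}(ε_1 + · · · + ε_n), with i.i.d. ±1 signs ε_i, is cos(t/√n)^n, which converges pointwise to the standard normal CHF e^{−t²/2}; by the theorem the normalized sums converge weakly, a preview of the CLT below.

import numpy as np

# Sketch: pointwise convergence of CHFs, as in the Lévy continuity theorem.
t = np.linspace(-3.0, 3.0, 7)
for n in (10, 100, 10_000):
    phi_n = np.cos(t / np.sqrt(n)) ** n       # CHF of n^{-1/2}(ε_1 + ... + ε_n), ε_i = ±1 with prob. 1/2
    print(n, np.max(np.abs(phi_n - np.exp(-t**2 / 2))))   # → 0 as n grows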
Exercise 8.5. Suppose that µ1 , µ2 , . . . are probability measures on R with CHFs
φ1 , φ2 , . . ., respectively. Assume
Show that there exists a probability measure µ whose CHF is φ if and only if φ
is continuous at zero and in that case µn ⇒ µ.
Theorem 8.4 (Central limit theorem (CLT) on R for i.i.d.). Let X1 , X2 , . . .
be i.i.d. random variables with mean µ and variance σ 2 ∈ (0, ∞). Then, as
n → ∞,
( Σ_{j=1}^n Xj − nµ ) / ( n^{1/2} σ ) =⇒ Z ,
where Z follows standard normal.
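A Monte Carlo sketch of the statement (not from the notes; the exponential distribution is an arbitrary choice with σ² ∈ (0, ∞)): the empirical CDF of the normalized sums is close to Φ.

import numpy as np

# Sketch: CLT for i.i.d. Exponential(1) summands (µ = 1, σ² = 1).
rng = np.random.default_rng(6)
n, reps = 1000, 10**5
X = rng.exponential(scale=1.0, size=(reps, n))
T = (X.sum(axis=1) - n * 1.0) / (np.sqrt(n) * 1.0)   # (Σ X_j − nµ) / (n^{1/2} σ)

for z in (0.0, 1.0, 2.0):
    print(z, np.mean(T <= z))                 # ≈ Φ(z): 0.5, 0.841, 0.977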
Lemma 8.1. For all θ ∈ R,
| e^{ιθ} − ( 1 + ιθ − θ²/2 ) | ≤ 2 min( θ², |θ|³ ) .
Proof. Notice that
| e^{ιθ} − ( 1 + ιθ − θ²/2 ) | ≤ | cos θ − ( 1 − θ²/2 ) | + | sin θ − θ | .
Denote
R1 := cos θ − ( 1 − θ²/2 ) ,
R2 := sin θ − θ .
Applying Taylor to cos θ shows the existence of ξ, ξ′ satisfying
cos θ = 1 − θ²/2 + (θ³/6) sin ξ    (8.8)
= 1 − (θ²/2) cos ξ′ .    (8.9)
Equations (8.8) and (8.9) respectively show that
|R1 | ≤ |θ|³/6 ≤ |θ|³ ,
|R1 | ≤ (θ²/2)( 1 + | cos ξ′ | ) ≤ θ² .
Therefore,
|R1 | ≤ min( θ², |θ|³ ) .
Applying Taylor to sin θ shows the existence of η, η 0 satisfying
sin θ = θ − (θ³/6) cos η
= θ − (θ²/2) sin η′ .
Thus,
|R2 | ≤ min( θ², |θ|³ ) ,
and this completes the proof.
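A grid-based numerical check of the bound in Lemma 8.1 (a sketch, not from the notes):

import numpy as np

# Sketch: verify |e^{ιθ} − (1 + ιθ − θ²/2)| ≤ 2 min(θ², |θ|³) on a grid of θ values.
theta = np.linspace(-10.0, 10.0, 100001)
lhs = np.abs(np.exp(1j * theta) - (1 + 1j * theta - theta**2 / 2))
rhs = 2 * np.minimum(theta**2, np.abs(theta) ** 3)
print(bool(np.all(lhs <= rhs + 1e-12)))       # True on this grid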
Lemma 8.2. For y, z ∈ C with |y| ∨ |z| ≤ 1, and n ∈ N,
|y n − z n | ≤ n|y − z| .
n−1
X
|y n − z n | = (y − z) y n−1−j z j
j=0
n−1
X
= |y − z| y n−1−j z j
j=0
≤ n|y − z| ,
| E[ e^{ιtX1/√n} ] − E[ 1 + (ιt/√n) X1 − (t²/2n) X1² ] |
≤ E | e^{ιtX1/√n} − ( 1 + (ιt/√n) X1 − (t²/2n) X1² ) |
(by Lemma 8.1)    ≤ 2E ( min( t² X1²/n , |t|³ |X1|³/n^{3/2} ) ) ,
that is,
| φ(t/√n) − ( 1 − t²/2n ) | ≤ 2E ( min( t² X1²/n , |t|³ |X1|³/n^{3/2} ) ) .
By Lemma 8.2, it follows that for n > t² ,
| φ(t/√n)^n − ( 1 − t²/2n )^n | ≤ n | φ(t/√n) − ( 1 − t²/2n ) |
≤ 2E ( min( t² X1² , |t|³ |X1|³/n^{1/2} ) ) .
116
The following result shows, among other things, that if weak convergence
on Rd were defined by CDFs as in Definition 51, then that would have been
equivalent to Definition 52.
Theorem 8.6 (Portmanteau theorem). For probability measures µ1 , µ2 , . . . , µ∞
on Rd with respective CDFs F1 , F2 , . . . , F∞ , the following are equivalent.
1. As n → ∞, µn ⇒ µ∞ .
2. For any closed set F ⊂ Rd ,
lim sup µn (F ) ≤ µ∞ (F ) .
n→∞
lim inf µn (U ) ≥ µ∞ (U ) .
n→∞
F is continuous at x ⇐⇒ µ(∂Ex ) = 0 ,
117
Proof of 1⇒2. Assume 1, that is, µn ⇒ µ∞ . Let k · k be any norm on Rd and
define
d(F, x) = inf{kx − yk : y ∈ F } , x ∈ Rd .
Fix ε > 0 and define f : Rd → R by
fε (x) = 1 , if x ∈ F ,
and
fε (x) = 0 , if d(F, x) ≥ ε .
In other words,
1F (x) ≤ fε (x) ≤ 1 (d(F, x) < ε) .
Thus,
Z
lim sup µn (F ) = lim sup 1F (x)µn (dx)
n→∞ n→∞
Z
≤ lim fε (x)µn (dx)
n→∞
Z
(as µn ⇒ µ∞ ) = fε (x)µ∞ (dx)
≤ µ∞ {x ∈ Rd : d(F, x) < ε} .
As ε ↓ 0,
{x ∈ Rd : d(F, x) < ε} ↓ {x ∈ Rd : d(F, x) = 0} = F ,
the set theoretic equality following from the fact that F is closed. Thus,
Therefore,
lim sup µn (F ) ≤ µ∞ (F ) .
n→∞
and
lim inf µn (U ) ≥ µ∞ (U ) , U ⊂ Rd open . (8.12)
n→∞
118
where Ā and A◦ are the closure and interior of A, respectively. Invoke (8.12)
with U = A◦ to get
(8.11) implying the last line. This in conjunction with (8.13) shows
Thus, 2⇒4.
Proof of 4⇒5. Assume 4. Let x = (x1 , . . . , xd ) be a continuity point of F∞ .
Exc 8.7 shows that
µ∞ (∂Ex ) = 0 ,
where Ex = (−∞, x1 ] × . . . × (−∞, xd ]. The hypothesis 4 which has been
assumed shows that
lim µn (Ex ) = µ∞ (Ex ) ,
n→∞
Thus 4⇒5.
Proof of 5⇒1. Assume 5. Let
C = {x ∈ Rd : F∞ is countinuous at x} .
µn (R) = ∆R Fn , n = 1, . . . , ∞ . (8.15)
Let
Ci = z ∈ R : µ∞ {(x1 , . . . , xd ) ∈ Rd : xi = z} > 0 , i = 1, . . . , d .
119
Clearly, C1 , . . . , Cd are countable sets and hence
c
D = (C1 ∪ . . . ∪ Cd )
In view of Exc 8.7, this means Dd ⊂ C. Combine this with (8.14) and (8.15) to
get
d
Y
lim µn (R) = µ∞ (R) , R = (ai , bi ] , a1 , b1 , . . . , ad , bd ∈ D . (8.16)
n→∞
i=1
xi − xi−1 ≤ δ , i = 1, . . . , k .
Set
Yd
H= (xij −1 , xij ] : 1 ≤ i1 , . . . , id ≤ k .
j=1
For R ∈ H, (8.18) and that k · k has been chosen to be the max-norm imply
Since the k d many rectangles in H are disjoint and their union is (a, b]d , we get
Z XZ
f dµn = f dµn , n = 1, 2, . . . , ∞ .
(a,b]d R∈H R
120
Proceeding like in (8.4)-(8.5) with the help of the above three claims, the ana-
logue of (8.6) can be shown, which is
lim sup_{n→∞} | ∫_{(a,b]^d} f dµn − ∫_{(a,b]^d} f dµ∞ | ≤ 2ε .
Thus
lim sup_{n→∞} | ∫_{R^d} f dµn − ∫_{R^d} f dµ∞ | ≤ 2(K + 1)ε .
µnk ⇒ µ , k → ∞ .
121
Proof. We shall proceed like in the proof of Lévy continuity theorem. Let Fn
be the CDF of µn , that is,
Define
G(r) = lim Fnk (r) , r ∈ Qd ,
k→∞
d
and F : R → [0, 1] by
We shall show that F is a CDF, that is, it satisfies the assumptions of Theorem
4.2, from which it would follows that F induces a probability measure µ on Rd .
It will be shown that µnk ⇒ µ. This is achieved in the following few steps.
Step 1. The function F is continuous from above, that is,
G(r1 , . . . , rd ) ≤ F (x1 , . . . , xd ) + ε .
Define
d
1^
δ= (ri − xi ) .
2 i=1
F (x) ≤ F (y)
≤ G(r1 , . . . , rd )
≤ F (x) + ε .
122
Proof of Step 2. Fix R as above and ε > 0. Let x = (x1 , . . . , xd ) ∈ E =
{a1 , b1 } × . . . × {ad , bd }. There exist rationals r1 > x1 , . . . , rd > xd such that
Set
d
^
δx = (ri − xi ) .
i=1
R0 = (t1 , u1 ] × . . . × (td , ud ] ,
(8.19) implies
|∆R F − ∆R0 G| ≤ ε .
Since
∆R0 G = lim ∆R0 Fnk ≥ 0 ,
k→∞
it follows that
∆R F ≥ −ε .
As ε is arbitrary, Step 2 follows.
Step 3. As x1 → ∞, . . . , xd → ∞, F (x1 , . . . , xd ) → 1. On the other hand, as
Vd
i=1 xi → −∞, F (x1 , . . . , xd ) → 0.
Proof of Step 3. This is the only step in which tightness of {µn } is used. Since
0 ≤ F (x) ≤ 1 for all x ∈ Rd , it suffices to show that for ε > 0 there exists
a, b ∈ R such that for x = (x1 , . . . , xd ) ∈ Rd ,
F (x) ≥ 1 − ε if ∧_{i=1}^d xi > b ,    (8.20)
and
F (x) ≤ ε if ∧_{i=1}^d xi < a .    (8.21)
Fix ε > 0. Tightness implies there exists a compact set K such that
lim inf µn (K) ≥ 1 − ε .
n→∞
Since K is compact and hence bounded, there exist a, b ∈ Q with a < b and
K ⊂ (a, b]d . Thus
G(b, . . . , b) = lim Fnk (b, . . . , b)
k→∞
= lim µnk (−∞, b]d
k→∞
≥ lim inf µn (K)
n→∞
≥ 1 − ε.
The definition of F and that G is non-decreasing imply that
d
Y
F (z) ≥ G(v) if z = (z1 , . . . , zd ) ∈ Rd , and v ∈ Qd ∩ (−∞, zi ] . (8.22)
i=1
Thus,
F (x1 , . . . , xd ) ≥ G(b, . . . , b) for all x1 ≥ b, . . . , xd ≥ b ,
showing that (8.20) holds. Fix x = (x1 , . . . , xd ) ∈ Rd with xj < a for some fixed
j. Let r ∈ Q be such that r > (x1 ∨ . . . ∨ xd ) and define y = (y1 , . . . , yd ) where
(
r, i 6= j ,
yi =
a, i = j .
the last line following from the argument that yj = a and K ⊂ (a, b]d show
((−∞, y1 ] × . . . × (−∞, yd ]) ∩ K = ∅ ,
and hence
(−∞, y1 ] × . . . × (−∞, yd ] ⊂ K c .
Finally,
lim sup µn (K c ) = 1 − lim inf µn (K) ≤ ε ,
n→∞ n→∞
124
Steps 1-3 in conjunction with Theorem 4.2 show that there exists a proba-
bility measure µ on Rd satisfying
F (w1 , . . . , wd ) ≥ F (x1 , . . . , xd ) − ε ,
and
F (y1 , . . . , yd ) ≤ F (x1 , . . . , xd ) + ε .
Let r1 , . . . , rd , s1 , . . . , sd ∈ Q be such that wi < ri < xi < si < yi for i = 1, . . . , d.
Thus,
F (x) − ε ≤ F (w1 , . . . , wd )
(definition of F ) ≤ G(r1 , . . . , rd )
= lim Fnk (r1 , . . . , rd )
k→∞
≤ lim inf Fnk (x)
k→∞
≤ lim sup Fnk (x)
k→∞
≤ lim Fnk (s1 , . . . , sd )
k→∞
= G(s1 , . . . , sd )
(by (8.22)) ≤ F (y1 , . . . , yd )
≤ F (x) + ε .
Since ε is arbitrary, (8.23) follows, which completes the proof of Theorem 8.7.
The following is a generalization of Theorem 7.9, and hence this also is called
the Cramér-Wold device.
Theorem 8.8 (Cramér-Wold device for weak convergence). For Rd -valued ran-
dom variables X1 , X2 , . . . X∞ , Xn ⇒ X∞ if and only if
125
Exercise 8.8. 1. If F, F1 , F2 , . . . are functions from Rd to [0, 1], show that
for every continuity point x of F if and only if every subsequence {Fnk } of {Fn }
has a further subsequence {Fnkl } such that
Xn = (Xn1 , . . . , Xnd ) , n = 1, . . . , ∞ .
For fixed i ∈ {1, . . . , d}, letting λ be the vector whose i-th coordinate is 1 and
rest are 0, (8.24) implies
We shall first show that {P ◦Xn−1 : n = 1, 2, . . .} is tight, that is, given ε > 0,
a compact K ⊂ Rd will be obtained satisfying
126
Let K = [−α, α]d . Thus,
(8.27) implying the inequality in the last line. Thus, (8.26) holds. In other
words, {P ◦ Xn−1 : n = 1, 2, . . .} is tight.
By Exc 8.8.2, it suffices to show that every subsequence {Xnk } of {Xn } has
a further subsequence converging weakly to X∞ . Fix a subsequence {Xnk }.
Since {P ◦ Xn−1 : n = 1, 2, . . .} is tight, so is {P ◦ Xn−1
k
: k = 1, 2, . . .}. Theorem
8.7 implies {Xnk } has a subsequence {Xnkl : l = 1, 2, . . .} such that
Xnkl ⇒ Y , l → ∞ ,
for some Rd -valued random variable Y . The already proven “only if” part of
this theorem implies
127
Theorem 8.9 (CLT in Rd ). Suppose X1 , X2 , . . . are i.i.d. random variables
taking values in Rd such that each coordinate of X1 has mean zero and finite
variance. Then
n
1 X
√ Xi ⇒ Z , n → ∞
n i=1
and notice that hλ, X1 i, hλ, X2 i, . . . are i.i.d. Denoting X1 = (X11 , . . . , X1d ), the
assumption that X11 , . . . , X1d are zero mean implies
E (hλ, X1 i) = 0 .
= λT Σλ .
128
The last theorem of this course is Lindeberg’s CLT, which is a generalization
of Theorem 8.4 in that the assumption of identical distribution therein is relaxed.
Theorem 8.10 (Lindeberg’s CLT). Suppose that for n = 1, 2, . . ., Xn1 , . . . , Xnn
are independent R-valued random variables satisfying the following:
E(Xni ) = 0 , i = 1, . . . , n, n = 1, 2, . . . ,
lim_{n→∞} Σ_{i=1}^n E ( Xni² ) = σ² < ∞ ,
and
lim_{n→∞} Σ_{i=1}^n E ( Xni² 1( |Xni | > ε ) ) = 0 , for every ε > 0 .    (8.29)
Then, as n → ∞,
Σ_{i=1}^n Xni ⇒ Z ,
where Z ∼ N (0, σ 2 ).
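A simulation sketch of a (non-identically distributed) triangular array satisfying these conditions (not from the notes; the scales and distributions are arbitrary choices): X_{ni} = c_i ξ_i/√n with bounded c_i and i.i.d. ±1 signs ξ_i. Since |X_{ni}| ≤ max_i c_i/√n, the Lindeberg sums vanish for large n, and the row sums are approximately N(0, σ²).

import numpy as np

# Sketch: one row of a triangular array satisfying the Lindeberg condition.
rng = np.random.default_rng(7)
n, reps = 500, 20_000
c = np.where(np.arange(1, n + 1) % 2 == 1, 1.0, 2.0)   # bounded, non-identical scales
xi = rng.choice([-1.0, 1.0], size=(reps, n))            # independent, mean 0, variance 1

X = xi * c / np.sqrt(n)                       # X_{n1}, ..., X_{nn}; |X_{ni}| ≤ 2/√n so Lindeberg holds
sigma2 = np.sum(c**2) / n                     # Σ_i E(X_{ni}²) = 2.5 here
S = X.sum(axis=1)

print(sigma2, S.var(), np.mean(S <= 0.0))     # ≈ 2.5, ≈ 2.5, ≈ 0.5 (row sums ≈ N(0, σ²))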
The assumption (8.29) is called Lindeberg’s condition. The family {Xni :
1 ≤ i ≤ n, n = 1, 2, . . .} is called a triangular array, which is why, Theorem 8.10
is also known as CLT for triangular arrays. Theorem 8.4 follows from Theorem
8.10 as claimed in the following exercise.
Exercise 8.9. Suppose X1 , X2 , . . . are i.i.d. zero mean random variables with
finite variance σ 2 . Define
1
Xni = √ Xi , 1 ≤ i ≤ n, n ≥ 1 .
n
Show that {Xni : 1 ≤ i ≤ n, n = 1, 2, . . .} satisfies the assumptions of Theorem
8.10 and hence argue that Theorem 8.4 is a special case of that.
Theorem 8.10 can be proven along the lines of the proof of Theorem 8.4, that
is, with the help of the Lévy continuity theorem. For pedagogical reasons, we
shall prove it using Lindeberg’s principle which completely bypasses the Fourier
analytic method, that is, the use of characteristic functions. The following two
exercises, for example, can be easily solved using the Lévy continuity theorem,
though the solutions hinted at don’t use it.
Exercise 8.10. If Xn ∼ N (0, σn2 ) and 0 ≤ σn → σ < ∞, show that Xn ⇒ X
where X ∼ N (0, σ 2 ).
d
Hint. If Z ∼ N (0, 1), then Xn = σn Z → σZ.
Exercise 8.11. Suppose X, X1 , X2 , . . . are random variables such that for all
thrice differentiable bounded f : R → R whose first three derivatives are bounded,
it holds that
lim E (f (Xn )) = E (f (X)) .
n→∞
129
Show that Xn ⇒ X.
Hint. Let
1,
x ≤ 0,
f (x) = (1 − x4 )4 , 0 < x < 1,
0, x ≥ 1.
for all thrice differentiable f : R → R such that f and its first three derivatives
are bounded, where
Xn
Sn = Xni , n ≥ 1.
i=1
Fix such f .
Let (Z1 , Z2 , . . .) be a collection of i.i.d. standard normal random variables
which is independent of the triangular array {Xni : 1 ≤ i ≤ n, n ≥ 1}. Set
q
σni = E(Xni 2 ) , 1 ≤ i ≤ n , n = 1, 2, . . . ,
and v
u n
uX
σn = t 2 ,n ≥ 1.
σni
i=1
Since
n
X
σni Zi ∼ N (0, σn2 ) , n = 1, 2, . . . , (8.31)
i=1
130
where
Yi = Σ_{j=i+1}^n Xnj + Σ_{j=1}^i σnj Zj , i = 0, 1, . . . , n ,
with the usual interpretation of the sum as zero if the lower limit exceeds the
upper limit. Thus,
| E ( f (Sn ) − f ( Σ_{i=1}^n σni Zi ) ) | ≤ Σ_{i=1}^n | E( f (Yi−1 ) − f (Yi ) ) | .    (8.33)
Yi = W + σni Zi ,
and
Yi−1 = W + Xni ,
where
n
X i−1
X
W = Xnj + σnj Zj .
j=i+1 j=1
131
Therefore,
K E ( Xni² ∧ |Xni |³ ) ≥ E | f (Yi−1 ) − ( f (W ) + Xni f ′(W ) + (1/2) Xni² f ″(W ) ) |
≥ | E( f (Yi−1 ) ) − E ( f (W ) + Xni f ′(W ) + (1/2) Xni² f ″(W ) ) |
= | E( f (Yi−1 ) ) − E( f (W ) ) − (1/2) σni² E( f ″(W ) ) | ,
the last line following from the independence of W and Xni and that the mean
and variance of Xni are zero and σni² , respectively. A similar calculation shows
| E( f (Yi ) ) − E( f (W ) ) − (1/2) σni² E( f ″(W ) ) | ≤ K E ( |σni Zi |³ ) = C σni³ ,
Thus, (8.32) would follow, which would complete the proof, once the following
are shown:
lim_{n→∞} Σ_{i=1}^n σni³ = 0 ,    (8.36)
and
lim_{n→∞} Σ_{i=1}^n E ( Xni² ∧ |Xni |³ ) = 0 .    (8.37)
132
Hence
max_{1≤i≤n} σni² ≤ ε² + max_{1≤i≤n} E ( Xni² 1( |Xni | > ε ) )
≤ ε² + Σ_{i=1}^n E ( Xni² 1( |Xni | > ε ) ) .
Since ε is arbitrary,
2
lim max σni = 0,
n→∞ 1≤i≤n
Since ε is arbitrary, (8.37) follows. This in conjunction with (8.36) shows (8.32),
which completes the proof.
Remark 5. The above proof is transparent in that it displays the property of
normal that has been used. Indeed, (8.31) does use the fact that the sum of
independent normal random variables also follows normal.
Exercise 8.12. If X1 , X2 , . . . are i.i.d. and P (X1 = 0) < 1, show that there
does not exist a random variable Z such that
n
X
Xi ⇒ Z , n → ∞ .
i=1
133
Exercise 8.14. Show that the Lindeberg condition (8.29) is implied by the Lya-
punov condition
n
X
E |Xni |2+δ = 0 for some δ > 0 .
lim
n→∞
i=1
lim npn (1 − pn ) = ∞ ,
n→∞
show that
X − npn
p n ⇒Z,
npn (1 − pn )
where Z ∼ N (0, 1).
Exercise 8.16. Suppose X is as in Exc 5.3, that is, it is infinitely divisible,
E(X) = 0 and Var(X) = 1. Show that
E(X 4 ) = 3 ⇐⇒ X ∼ N (0, 1) .
4 3
E(Xn1 )= .
n2
Use Exc 8.14.
Exercise 8.17. A coin with probability of head p ∈ (0, 1) is tossed infinitely
many times. Let Xn be the number of the toss on which the n-th head is obtained.
Show that
−1/2 n
n Xn − ⇒Z,
p
where Z ∼ N (0, σ 2 ) for some σ 2 . Calculate σ 2 .
Exercise 8.18. There are k boxes numbered 1, . . . , k and an infinite supply of
balls. The balls are thrown, one by one, randomly into one of the boxes. Let
Xn1 , . . . , Xnk denote the number of balls in Boxes 1, . . . , k, respectively, after
the first n balls are thrown. Show that
n n
n−1/2 Xn1 − , . . . , Xnk − ⇒ (Z1 , . . . , Zk ) ,
k k
where (Z1 , . . . , Zk ) ∼ Nk (0, Σ) for some k × k matrix Σ. Calculate Σ.
Exercise 8.19. If Xn ∼ Binomial(n, pn ) and
134
Exercise 8.20. Suppose that X1 , X2 , . . . are i.i.d. random variables with den-
sity
f (x) = e−1 x−2 1(x > e−1 ), x ∈ R .
Show that as n → ∞, √
1/ n
(X1 . . . Xn ) ⇒Z,
where Z follows the log-normal distribution, that is, log Z follows standard nor-
mal.
Exercise 8.21. If X1 , X2 . . . are i.i.d. with zero mean and finite positive vari-
ance, show that there does not exist a random variable Z such that
n
1 X P
√ Xi −→ Z , n → ∞ .
n i=1
Exercise 8.22. Suppose Xn are random variables with all moments finite such
that
lim E(Xnk ) = mk ∈ R , k ∈ {1, 2, . . .} .
n→∞
If there exists a unique probability measure µ on R such that
Z
xk µ(dx) = mk , k = 1, 2, . . . ,
R
Hint. Write
|fn − f | = fn + f − 2(fn ∧ f ) .
135
9 Appendix
9.1 Proof of Fact 1.3
Proof of Fact 1.3. Since k·k is the L∞ norm, it suffices to show that the absolute
value of each entry of the d × 1 vector T (x) − T (y) − J(x)(x − y) is at most.
dαkx − yk. In other words, it suffices to show that if f : U → R is continuously
differentiable, and
where
∂f (x)
fi (x) = , x ∈ U, i = 1, . . . , d ,
∂xi
then
d
X
f (x) − f (y) − fi (x)(xi − yi ) ≤ dαkx − yk , x, y ∈ R .
i=1
fi (ξ˜i ) − fi (x) ≤ α , i = 1, . . . , d .
136
Therefore,
| f (x) − f (y) − Σ_{i=1}^d fi (x)(xi − yi ) |
= | f (x⁰ ) − f (x^d ) − Σ_{i=1}^d fi (x)(xi − yi ) |
= | Σ_{i=1}^d ( f (x^{i−1} ) − f (x^i ) ) − Σ_{i=1}^d fi (x)(xi − yi ) |
= | Σ_{i=1}^d ( fi (ξ̃i ) − fi (x) ) (xi − yi ) |
≤ Σ_{i=1}^d | fi (ξ̃i ) − fi (x) | |xi − yi |
≤ dα max_{1≤i≤d} |xi − yi |
= dα ‖x − y‖ .
This completes the proof.
and
∆R F ≥ 0 for all R ∈ H , (9.2)
where
H = {(a1 , b1 ] × . . . × (ad , bd ] : −∞ < ai < bi < ∞ for i = 1, . . . , d} ,
X
∆R F = (−1)#{i:xi =ai } F (x1 , . . . , xd ) , (9.3)
(x1 ,...,xd )∈{a1 ,b1 }×...×{ad ,bd }
137
That is, sgn(x, R) is zero unless x is a vertex of R.
Rewrite (9.3) as
X
∆R F = sgn(x, R)F (x) .
x=(x1 ,...,xd )∈{a1 ,b1 }×...×{ad ,bd }
Let
d
Y
Rk1 ,...,kd = (ai,ki −1 , ai,ki ] , 1 ≤ k1 ≤ n1 , . . . , 1 ≤ kd ≤ nd . (9.4)
i=1
X n1
X nd
X
F (x) ... sgn (x, Rk1 ,...,kd ) , (9.6)
x∈A k1 =1 kd =1
Qd
where A = i=1 {ai,0 , ai,1 , . . . , ai,ni }. Let A0 = {a1 , b1 } × . . . × {ad , bd } and
observe that for x ∈ A0 , there exists unique k1 , . . . , kd such that
X X n1
X nd
X
sgn(x, R)F (x) + F (x) ... sgn (x, Rk1 ,...,kd ) .
x∈A0 x∈A\A0 k1 =1 kd =1
Since the first term above is the same as ∆R F , (9.5) would follow once it is
shown that
Xn1 nd
X
... sgn (x, Rk1 ,...,kd ) = 0 , x ∈ A \ A0 . (9.7)
k1 =1 kd =1
138
Thus for 1 ≤ k1 ≤ n1 , . . . , 1 ≤ kd ≤ nd , x is not a vertex of Rk1 ,...,kd by (9.4),
unless ki equals either ui or ui + 1, that is,
Further,
sgn x, Rk1 ,...,ki−1 ,ui ,ki+1 ,...,kd = − sgn x, Rk1 ,...,ki−1 ,ui +1,ki+1 ,...,kd .
(9.5) being used again in the last line. This completes the proof of Step 1.
Step 2. If R1 , R2 ∈ H and R1 ⊂ R2 , then ∆R1 F ≤ ∆R2 F .
Proof of Step 2. Follows from (9.2) and Step 1 by observing that R2 \ R1 =
S1 ∪ . . . ∪ Sn for some disjoint S1 , . . . , Sn ∈ H.
Step 3. If R = (a1 , b1 ] × . . . × (ad , bd ] ∈ H and for ε > 0, Rε = (a1 , b1 + ε] ×
. . . × (ad , bd + ε], then
lim ∆Rε F = ∆R F .
ε↓0
139
For the next several steps, fix n = (n1 , . . . , nd ) ∈ Zd and let
Ωn = (n1 − 1, n1 ] × . . . × (nd − 1, nd ] ,
and
Sn = {∅} ∪ {R ∈ H : R ⊂ Ωn } .
Step 5. The collection Sn is a semi-field on Ωn and µn : Sn → [0, ∞) defined
by
µn (R) = ∆R F , ∅ =
6 R ∈ Sn ,
and µn (∅) = 0 is a finitely additive set function.
Proof of Step 5. That Sn is a semi-field is immediate. Finite additivity of µn
follows from Step 1.
Step 6. Let Fn = {A1 ∪ . . . ∪ Ak : A1 , . . . , Ak ∈ Sn are disjoint}. Then Fn is a
field on Ωn . Extend µn to Fn by
k
X
µn (A1 ∪ . . . ∪ Ak ) = µn (Ai ) , A1 , . . . , Ak ∈ Sn are disjoint .
i=1
Then µn is well defined on Fn , that is, different representations yield the same
definition, is finitely additive on Fn , monotone on Fn , that is, µn (A) ≤ µn (B)
for A, B ∈ Fn with A ⊂ B and finitely sub-additive on Fn , that is,
k
X
µn (A1 ∪ . . . ∪ Ak ) ≤ µn (Ai ) , A1 , . . . , Ak ∈ Fn .
i=1
Proof of Step 6. That Fn is a field follows from Step 5 which says Sn is a semi-
field. If A1 , . . . , Ak ∈ Sn are disjoint and so are B1 , . . . , Bl ∈ Sn such that
A1 ∪ . . . ∪ Ak = B1 ∪ . . . ∪ Bl ,
then Step 5 shows
k
X k X
X l l
X
µn (Ai ) = µn (Ai ∩ Bj ) = µn (Bj ) .
i=1 i=1 j=1 j=1
140
Step 7. The set function µn is countably additive on Sn .
Proof of Step 7. Let R1 , R2 , . . . ∈ Sn be disjoint such that
R = R1 ∪ R2 ∪ . . . ∈ Sn .
Fix δ > 0. Use Step 3 to get εi > 0 such that ∆R̃i F ≤ ∆Ri F + 2−i δ where
141
Monotonicity and finite sub-additivity of µn shown in Step 6 implies
k
X
µn (R0 ) ≤ µn (R̃i ∩ Ωn )
i=1
k
X
= ∆R̃i ∩Ωn F
i=1
k
X
(Step 2) ≤ ∆R̃i F
i=1
X∞
∆Ri F + 2−i δ
(choice of εi ) ≤
i=1
∞
X
=δ+ µn (Ri ) .
i=1
Letting a01 ↓ a1 , . . . , a0d ↓ ad and using Step 4, (9.8) follows. This completes the
proof of Step 7.
Step 8. The set function µn can be extended to a measure on (Ωn , σ(Sn )).
Proof of Step 8. Follows from Step 7 and Corollary 1.1 of the Caratheodory
extension theorem.
Step 9. If X
µ(A) = µn (A ∩ Ωn ) , A ∈ B(Rd ) ,
n∈Zd
142
showing (9.9). To see that µ is Radon, for any compact set K ⊂ Rd , there exists
n ∈ N such that R = (−n, n]d ⊃ K. Thus
µ(K) ≤ µ(R) = ∆R F ,
by (9.9). This shows µ is a Radon measure and completes the proof of Step
9.
Step 10. The measure µ is the only measure on (Rd , B(Rd )) satisfying (9.9).
Proof of Step 10. Suppose µ0 is a measure on (Rd , B(Rd )) such that (9.9) holds
with µ replaced by µ0 . Then µ and µ0 agree on H, and hence on
( d
)
Y
d
S= R ∩ (ai , bi ] : −∞ ≤ ai ≤ bi ≤ ∞ ,
i=1
because for every set in S there exist sets in H increasing to the former. Further,
µ and µ0 are σ-finite on H and hence on S which is a semi-field that generates
B(Rd ). Corollary 1.1 shows µ and µ0 agree on B(Rd ), as claimed in Step 10.
Steps 9 and 10 complete the proof of the fact.
Remark 6. A function F satisfying (9.1) and (9.2) is not necessarily mono-
tonic. For example, F : R2 → R defined by
F (x, y) = xy ,
satisfies (9.1) and (9.2), and in fact induces the Lebesgue measure on R2 , though
F is not monotonic because
143
Applying Theorem 5.7 to the function f : R2 → R defined by f (x, y) = 1(x = y)
and using the independence of X and Y yields
where Z ∞
g(y) = 1(x = y)P (X ∈ dx) , y ∈ R .
−∞
A moment’s thought reveals that for all y ∈ R,
g(y) = P (X = y) = 0 ,
(10.1) implying the second equality. Thus g(Y ) = 0. Taking expectation of both
sides of (10.2) and using the tower property of conditional expectation show
E(Z) = 0 ,
HW 5/20
Suppose that X and Y are integrable random variables defined on a probability
space (Ω, A, P ) such that
E (X|σ(Y )) = Y, a.s. ,
E (Y |σ(X)) = X, a.s.
Thus,
Therefore,
144
because (Y − X)1(X ≤ c < Y ) ≥ 0.
Reversing the roles of X and Y , it can be shown that
Exc 4.9
If X1 , . . . , Xn are i.i.d. from standard normal, P is an m × n matrix with 1 ≤
m < n and P P T = Im , and
(Y1 , . . . , Ym ) = Y = P X ,
show that
m
X
Yi2 ∼ χ2m ,
i=1
n
X m
X
Xi2 − Yi2 ∼ χ2n−m ,
i=1 i=1
and
m
X n
X m
X
Yi2 , Xi2 − Yi2 are independent.
i=1 i=1 i=1
T
Soln.: The assumption P P = Im means that the m rows of P form an
orthonormal set in Rn . This orthonormal set can be extended to an orthonormal
basis of Rn . In other words, there exists an (n − m) × n matrix Q such that
P
R = [Q ] is an orthogonal matrix, that is RRT = I.
Define
[Z1 Z2 . . . Zn ]T = RX .
145
Since R is an orthogonal matrix, Z1 , . . . , Zn are i.i.d. from standard normal.
The definition of R implies Y1 = Z1 , . . . , Ym = Zm . Thus,
m
X m
X
Yi2 = Zi2 ∼ χ2m ,
i=1 i=1
Exc. 5.3
A random variable X is infinitely divisible if for all fixed n = 1, 2, . . ., there
exist i.i.d. random variables Xn1 , . . . , Xnn defined on some probability space
such that
d
X = Xn1 + . . . + Xnn .
If X is an infinitely divisible random variable with mean zero and variance one,
show that
E(X 4 ) ≥ 3 .
Soln.: Assume WLOG that
E(X 4 ) < ∞ , (10.4)
as there is nothing to prove otherwise. Fix n ≥ 2 and write
d
X = Xn1 + Y ,
where Y = Xn2 + . . . + Xnn . Use Exc 5.2 with p = 4, (10.4) and the fact that
Xn1 is independent of Y to infer
4
E(Xn1 ) < ∞.
d d
Since Xn1 = . . . = Xnn , it follows that
4
E(Xni ) < ∞ , i = 1, . . . , n .
An immediate consequence of the above is that Xn1 has finite mean and
variance. Further,
0 = E(X) = nE(Xn1 ) ,
146
shows E(Xn1 ) = 0. That Xn1 , . . . , Xnn are i.i.d. shows
1 = Var(X) = nVar(Xn1 ) ,
that is, E(Xn1² ) = 1/n. Proceed like in (6.3)-(6.4) to obtain
E(X⁴ ) = E ( ( Σ_{i=1}^n Xni )⁴ )
= n E(Xn1⁴ ) + 3n(n − 1) ( E(Xn1² ) )²    (10.5)
≥ 3n(n − 1) ( E(Xn1² ) )²
= 3 (n − 1)/n .
Letting n → ∞ shows E(X 4 ) ≥ 3.
Exc 6.4
If Xn → Y in Lp for some 1 ≤ p < ∞ and Xn → Z a.s., show that
Y = Z a.s.
P
Soln.: Since Xn → Y in Lp , it follows that Xn −→ Y . As Xn → Z a.s.,
P
Xn −→ Z. Thus, Y and Z are both limits in probability of Xn . Hence Y = Z
a.s.
Exc 6.12
If X1 , X2 , X3 , . . . are independent random variables, then show that
∞
X
Xn → X a.s. ⇐⇒ P (|Xn − X| > ε) < ∞ for all ε > 0 .
n=1
Thus, X is σ(Xn , Xn+1 , Xn+2 , . . .)-measurable for every n. That is, X is mea-
surable with respect to
∞
\
T = σ(Xn , Xn+1 , Xn+2 , . . .) .
n=1
147
Kolmogorov’s zero-one law implies T is a trivial σ-field, and hence X is a de-
generate random variable. That is, there exists a ∈ R with X = a a.s.
The assumption thus becomes
Xn → a a.s.
Exc 6.14
If X1 , X2 , . . . are i.i.d. from standard exponential, and X(n,1) , . . . , X(n,n) are the
order statistics of X1 , . . . , Xn , then show that
Denoting
Zi = 1(Xi ≤ log 2 − ε) , i = 1, 2, . . . ,
it thus follows that
n
!
X
P X(n,[n/2]) ≤ log 2 − ε = P Zi ≥ [n/2]
i=1
n
!
eε
X
−(log 2−ε)
µ = E(Z1 ) = 1 − e =1− =P (Zi − µ) ≥ [n/2] − nµ
2 i=1
n
!
1X [n/2]
=P (Zi − µ) ≥ −µ .
n i=1 n
Since ε > 0,
1 [n/2]
µ< = lim ,
2 n→∞ n
148
[n/2]
µ< n for large n. For such n,
n
!
1X [n/2]
P (Zi − µ) ≥ −µ
n i=1 n
!4
n 4
1 X [n/2]
≤P (Zi − µ) ≥ −µ
n i=1 n
!4
−4 n
[n/2] X
≤ −µ n−4 E (Zi − µ) (Markov inequality)
n i=1
−4
[n/2]
n−4 nE (Z1 − µ)4 + 3n(n − 1)(Var(Z1 ))2 ,
= −µ
n
which shows
∞
X
P X(n,[n/2]) ≤ log 2 − ε < ∞ .
n=1
The above two inequalities in conjunction with Theorem 6.8 show that
Exc 6.15
1. If X1 , X2 , . . . are independent and αn → ∞ such that
n
1 X P
Xi −→ X as n → ∞ ,
αn i=1
149
2. Hence or otherwise, prove that if X1 , X2 , . . . are i.i.d. from standard nor-
mal, then there does not exist a random variable Z such that
n
1 X P
√ Xi −→ Z , n → ∞ .
n i=1
Soln.:
1. Since
n
1 X P
Xi −→ X as n → ∞ ,
αn i=1
there exist 1 ≤ n1 < n2 < . . . such that as k → ∞,
nk
1 X
Xi → X a.s.
αnk i=1
WLOG, assume
nk
1 X
X = lim sup Xi .
k→∞ αnk i=1
Since αn → ∞, for any fixed n = 1, 2, . . .,
n−1
1 X
Xi → 0 , k → ∞ .
αnk i=1
Hence
nk
1 X
X = lim sup Xi ,
k→∞ αnk i=n
150
converges in probability and hence in distribution to a degenerate random
variable. This is a contradiction because
n
1 X
√ Xi ∼ N (0, 1) for n = 1, 2, . . . ,
n i=1
Exc 7.5
Use Corollary 7.3 to give a fifth proof of the fact
Z ∞ √
2
e−x /2 dx = 2π .
−∞
Soln.: Suppose for this exercise that we didn’t know the value of
Z ∞
2
e−x /2 dx .
−∞
Thus,
1 −x2 /2
f (x) = e ,x ∈ R,
c
is a density.
Let µ be the probability measure on R whose density is f . Then, the MGF
of µ is
ψ(t) = ∫_{−∞}^∞ e^{tx} f (x) dx
= (1/c) ∫_{−∞}^∞ e^{tx − x²/2} dx
= (1/c) e^{t²/2} ∫_{−∞}^∞ e^{−(x−t)²/2} dx
= e^{t²/2} .
151
Since µ is a probability measure with a continuous density f and the CHF
φ of µ is integrable on R, Corollary 7.3 implies
∫_{−∞}^∞ e^{ιtx} φ(t) dt = 2π f (x) , x ∈ R .
Putting x = 0 yields
∫_{−∞}^∞ e^{−t²/2} dt = 2π f (0) = 2π/c .
Since the extreme left hand side equals c, the above implies
c = 2π/c ,
that is, c = √(2π). This completes the solution.
Exc 7.6
Show that a Nd (µ, Σ) distribution has a density if and only if Σ is p.d.
Soln.: If Σ is p.d., then it is known that the density of Nd (µ, Σ) is
−1/2 1
f (x) = (2π)d det(Σ) exp − (x − µ)T Σ−1 (x − µ) , x ∈ Rd .
2
If X ∼ Nd (µ, Σ) and Σ is not p.d., then there exists λ ∈ Rd \ {0} with
λT Σλ = 0 .
In other words, Var(λT X) = 0. Thus,
X ∈ {x ∈ Rd : λT x = λT µ} a.s.
Since {x ∈ Rd : λT x = λT µ} is a set of zero Lebesgue measure in Rd , X cannot
have a density.
Exc 8.14
Show that the Lindeberg condition (8.29) is implied by the Lyapunov condition
n
X
E |Xni |2+δ = 0 for some δ > 0 .
lim
n→∞
i=1
152
Remark 7. Convince yourself, using a variant of Example 6.2, that the Lya-
punov condition is not implied by the Lindeberg condition.
Exc 8.16
Suppose X is as in Exc 5.3, that is, it is infinitely divisible, E(X) = 0 and
Var(X) = 1. Show that
E(X 4 ) = 3 ⇐⇒ X ∼ N (0, 1) .
Soln.: The “⇒ part” is the only one which needs a proof, as we know that the
fourth moment of standard normal is 3. Assume E(X 4 ) = 3. For n ≥ 1, let
Xn1 , . . . , Xnn be i.i.d. such that
d
X = Xn1 + . . . + Xnn .
Clearly, Xni has mean zero and variance 1/n for n = 1, 2, . . . and i = 1, . . . , n.
Recall (10.5) to conclude
E(Xn1⁴ ) = (1/n) ( E(X⁴ ) − 3(n − 1)/n )
= 3/n² .
Thus,
n
X
4 3
E(Xni )= → 0,n → ∞.
i=1
n
Exc 8.14 shows that the conditions of Lindberg CLT hold with σ 2 = 1. There-
fore,
Xn
Xni ⇒ Z ,
i=1
d
where Z ∼ N (0, 1). It is obvious that Z = X, that is, X follows standard
normal. This proves the “⇒ part”.
Exc 8.17
A coin with probability of head p ∈ (0, 1) is tossed infinitely many times. Let
Xn be the number of the toss on which the n-th head is obtained. Show that
n
n−1/2 Xn − ⇒Z,
p
Zi = Xi − Xi−1 , i ∈ N .
153
Thus for n = 1, 2, . . ., Xn = Z1 + . . . + Zn . For n ≥ 1 and k1 , . . . , kn ∈ N, the
event [Z1 = k1 , . . . , Zn = kn ] simply means that in the first k1 + . . . + kn tosses,
the heads occur at the tosses number k1 , k1 + k2 , . . . , k1 + . . . + kn and rest are
tails. Thus,
P (Z1 = k1 , . . . , Zn = kn ) = pn q k1 +...+kn −n , k1 , . . . , kn ∈ N ,
Since
∞
X p
pq k−1 = = 1,
1−q
k=1
P (Z1 = k) = pq k−1 , k ∈ N ,
Exc 8.18
There are k boxes numbered 1, . . . , k and an infinite supply of balls. The balls
are thrown, one by one, randomly into one of the boxes. Let Xn1 , . . . , Xnk
denote the number of balls in Boxes 1, . . . , k, respectively, after the first n balls
are thrown. Show that
n n
n−1/2 Xn1 − , . . . , Xnk − ⇒ (Z1 , . . . , Zk ) ,
k k
where (Z1 , . . . , Zk ) ∼ Nk (0, Σ) for some k × k matrix Σ. Calculate Σ.
154
Soln.: Define
Exc 8.22
Suppose Xn are random variables with all moments finite such that
155
The hypothesis is that
Z
lim xk µn (dx) = mk , k ≥ 1 . (10.8)
n→∞ R
which is equivalent to
inf µn ([−T, T ]) ≥ 1 − ε .
n≥1
µnij ⇒ ν , j → ∞ , (10.9)
for some probability measure ν on R. The claim would follow once it is shown
that ν = µ. Showing that m1 , m2 , m3 , . . . are the moments of ν would suffice for
that because the hypothesis includes that µ is the unique measure with those
as moments. The first step, however, is to show that all the moments of ν are
finite, towards which we shall now proceed.
To show that all moments of ν are finite, it suffices to prove it for even
moments, that is, for k = 1, 2, . . .,
Z ∞
x2k ν(dx) < ∞ .
−∞
An implication of (10.9) is
Z Z
lim f (x)µnij (dx) = f (x)ν(dx) ,
j→∞ R R
156
for all bounded continuous f : R → R. Fix k = 1, 2, . . . and 0 < T < ∞ and
derive from the above that
∫_R ( |x| ∧ T )^{2k} ν(dx) = lim_{j→∞} ∫_R ( |x| ∧ T )^{2k} µ_{n_{i_j}} (dx)
≤ lim_{j→∞} ∫_R x^{2k} µ_{n_{i_j}} (dx)
= m_{2k} ,
MCT implies
Z Z
2k 2k
x ν(dx) = lim (|x| ∧ T ) ν(dx) ≤ m2k < ∞ .
R T →∞ R
Thus, the even moments of ν are finite, and hence so are odd moments as well.
In the next step, we shall show that for k = 1, 2, . . ., the k-th moment of ν
is mk . Fix k = 1, 2, . . . and define for all T > 0, fT : R → R by
k
x ,
|x| ≤ T ,
k
fT (x) = (−T ) , x < −T ,
k
T , x>T.
Since |fT (x)| ≤ |x|k and |x|k is integrable with respect to ν, DCT shows that
Z Z
lim fT (x)ν(dx) = xk ν(dx) .
T →∞ R R
In order to infer from (10.10) that the k-th moment of µnij converges to
that of ν, an estimate of the above form with ν replaced by µnij is also needed,
uniformly over j. To that end, observe that for any x ∈ R and T > 0,
157
Thus, for n = 1, 2, . . .,
Z Z
k
fT (x) − x µn (dx) ≤ 2 |x|k µn (dx)
R {x:|x|>T }
Z
≤ 2T −k x2k µn (dx) ,
R
Thus (10.10) and (10.11), in conjunction with the above and an approximation
1/k
of xk by fT (x) for T = T0 ∨ 2Cε−1 , imply that
Z Z
lim sup xk µnij (dx) − xk ν(dx) ≤ 2ε .
j→∞ R R
References
Billingsley, P. (1995). Probability and Measure. Wiley, New York, 3rd edition.
Rudin, W. (1987). Real and Complex Analysis. McGraw Hill Book Company,
third edition.