Probability II Upload Week 9
Let $\{X_n\}_{n\geq 1}$ be a sequence of random variables, and let $X$ be another random variable defined on the same probability space. We will discuss only real-valued random variables.
Convergence in quadratic mean (q.m.): Suppose that $\{X_n\}_{n\geq 1}$ and $X$ are such that $E[X_n^2] < \infty$ for all $n \geq 1$ and $E[X^2] < \infty$. Then, the sequence $\{X_n\}_{n\geq 1}$ converges to $X$ in q.m. if
$$\lim_{n\to\infty} E\left[(X_n - X)^2\right] = 0.$$
This is denoted by $X_n \xrightarrow{q.m.} X$ (or, $X_n \xrightarrow{L^2} X$).
By definition, $X_n \xrightarrow{q.m.} X$ if and only if $X_n - X \xrightarrow{q.m.} 0$ as $n \to \infty$.
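As a quick numerical illustration (this concrete example is ours, not part of the notes): take $X \sim N(0,1)$ and $X_n = X + Z_n/n$ with $Z_n \sim N(0,1)$ independent of $X$. Then $E[(X_n - X)^2] = E[Z_n^2]/n^2 = 1/n^2 \to 0$, so $X_n \xrightarrow{q.m.} X$. A Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example (not from the notes): X ~ N(0,1), X_n = X + Z_n / n with
# Z_n ~ N(0,1) independent.  Then E[(X_n - X)^2] = 1/n^2 -> 0 (q.m. convergence).
m = 200_000  # Monte Carlo sample size
X = rng.standard_normal(m)

for n in [1, 10, 100]:
    Z = rng.standard_normal(m)
    Xn = X + Z / n
    msd = np.mean((Xn - X) ** 2)  # Monte Carlo estimate of E[(X_n - X)^2]
    print(f"n={n:4d}  E[(X_n - X)^2] ~ {msd:.6f}  (exact: {1 / n**2:.6f})")
```

The printed estimates should track the exact values $1/n^2$ up to Monte Carlo error.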
Convergence in probability (P): The sequence $\{X_n\}_{n\geq 1}$ converges to $X$ in probability, denoted by $X_n \xrightarrow{P} X$, if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\{\omega : |X_n(\omega) - X(\omega)| > \varepsilon\} = 0.$$
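A numerical sketch of this definition (again an assumed example, not from the notes): with $X_n = X + Z_n/\sqrt{n}$, $X, Z_n \sim N(0,1)$ independent, we have $|X_n - X| = |Z_n|/\sqrt{n}$, so $P(|X_n - X| > \varepsilon) = P(|Z| > \varepsilon\sqrt{n}) \to 0$ for every fixed $\varepsilon > 0$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed example: X_n = X + Z_n / sqrt(n), so |X_n - X| = |Z_n| / sqrt(n)
# and P(|X_n - X| > eps) -> 0 for every fixed eps > 0.
m = 200_000
eps = 0.1
X = rng.standard_normal(m)

for n in [1, 100, 10_000]:
    Z = rng.standard_normal(m)
    Xn = X + Z / np.sqrt(n)
    p_hat = np.mean(np.abs(Xn - X) > eps)  # estimates P(|X_n - X| > eps)
    print(f"n={n:6d}  P(|X_n - X| > {eps}) ~ {p_hat:.4f}")
```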
Exercise 1. Let $\{X_n\}_{n\geq 1}$ be a sequence of random variables with $E(X_n) = \mu_n$ and $\operatorname{Var}(X_n) = \sigma_n^2$ for $n = 1, 2, \ldots$ Suppose that $\lim_{n\to\infty} \mu_n = \mu$ and $\lim_{n\to\infty} \sigma_n^2 = 0$. Then, $X_n \xrightarrow{P} \mu$ as $n \to \infty$.
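Before proving the exercise, it can help to see it numerically on a concrete instance (our assumption, not part of the exercise): $X_n \sim N(\mu_n, \sigma_n^2)$ with $\mu_n = 1 + 1/n \to \mu = 1$ and $\sigma_n^2 = 1/n \to 0$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed concrete instance of Exercise 1: X_n ~ N(mu_n, sigma_n^2) with
# mu_n = 1 + 1/n -> mu = 1 and sigma_n^2 = 1/n -> 0.
mu, eps, m = 1.0, 0.2, 200_000

for n in [1, 100, 10_000]:
    mu_n, sigma_n = 1 + 1 / n, np.sqrt(1 / n)
    Xn = rng.normal(mu_n, sigma_n, size=m)
    p_hat = np.mean(np.abs(Xn - mu) > eps)  # estimates P(|X_n - mu| > eps)
    print(f"n={n:6d}  P(|X_n - mu| > {eps}) ~ {p_hat:.4f}")
```

For the proof itself, a useful starting point is the triangle inequality $|X_n - \mu| \leq |X_n - \mu_n| + |\mu_n - \mu|$ combined with Chebyshev's inequality.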
Transformations: Let $h : \mathbb{R} \to \mathbb{R}$ be a continuous function.
(1) If $X_n \xrightarrow{P} X$, then $h(X_n) \xrightarrow{P} h(X)$ as $n \to \infty$.
(2) If $X_n \xrightarrow{q.m.} X$, then $h(X_n)$ does not necessarily converge to $h(X)$ in q.m.
(Think of a counter-example!)
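Property (1) can be seen numerically (our assumed example, not from the notes): with $h(x) = x^2$ (continuous) and $X_n = X + Z_n/\sqrt{n} \xrightarrow{P} X$ as before, the estimated probability $P(|h(X_n) - h(X)| > \varepsilon)$ shrinks with $n$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed illustration of (1): h(x) = x^2 is continuous and
# X_n = X + Z_n / sqrt(n) -> X in probability, so h(X_n) -> h(X) in probability.
m, eps = 200_000, 0.1
X = rng.standard_normal(m)
h = lambda x: x ** 2

for n in [1, 100, 10_000]:
    Xn = X + rng.standard_normal(m) / np.sqrt(n)
    p_hat = np.mean(np.abs(h(Xn) - h(X)) > eps)  # estimates P(|h(X_n)-h(X)| > eps)
    print(f"n={n:6d}  P(|h(X_n) - h(X)| > {eps}) ~ {p_hat:.4f}")
```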
2. Weak law of large numbers
Let $X_1, X_2, \ldots$ be i.i.d. random variables (independent random variables each having the same marginal distribution). Assume that the second moment of $X_1$ is finite. Then, $\mu = E[X_1]$ and $\sigma^2 = \operatorname{Var}(X_1)$ are well-defined. (Why?)
Let $S_n = X_1 + \cdots + X_n$ (partial sums) and $\bar{X}_n = \frac{S_n}{n} = \frac{X_1 + \cdots + X_n}{n}$ (sample mean). Then, by the properties of expectation and variance, we have
$$E[S_n] = n\mu, \quad \operatorname{Var}(S_n) = n\sigma^2, \quad E[\bar{X}_n] = \mu, \quad \text{and} \quad \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}.$$
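These identities follow from linearity of expectation and from the fact that variances of independent random variables add. A short derivation:

```latex
\begin{align*}
E[S_n] &= \sum_{k=1}^{n} E[X_k] = n\mu,
&
\operatorname{Var}(S_n) &= \sum_{k=1}^{n} \operatorname{Var}(X_k) = n\sigma^2
\quad \text{(by independence)},
\\
E[\bar{X}_n] &= \frac{1}{n}\, E[S_n] = \mu,
&
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n^2}\, \operatorname{Var}(S_n) = \frac{\sigma^2}{n}.
\end{align*}
```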
In particular, $\mathrm{s.d.}(\bar{X}_n) = \sigma/\sqrt{n}$ decreases with $n$. If we apply Chebyshev's inequality to $\bar{X}_n$, we get for any $\delta > 0$ that
$$0 \leq P\{|\bar{X}_n - \mu| \geq \delta\} \leq \frac{\sigma^2}{\delta^2 n}.$$
This goes to zero as $n \to \infty$ (with $\delta > 0$ fixed). This means that for large $n$, the sample mean is unlikely to be far from $\mu$ (sometimes called the "population mean"). This is consistent with our intuitive idea that if we toss a coin (with probability of head $p$) many times, the observed fraction of heads gives an increasingly reliable guess of the value of $p$.
Weak law of large numbers (Jacob Bernoulli): With the above notation, for any $\delta > 0$, we have
$$0 \leq P\{|\bar{X}_n - \mu| \geq \delta\} \leq \frac{\sigma^2}{\delta^2 n} \to 0 \quad \text{as } n \to \infty.$$
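The theorem and the Chebyshev bound can be checked by simulation on the coin-tossing example. A sketch under assumed parameters ($p$, $\delta$, and the sample sizes are our choices): with $X_k$ i.i.d. $\mathrm{Ber}(p)$ we have $\mu = p$ and $\sigma^2 = p(1-p)$, and the empirical deviation probability should sit below $\sigma^2/(\delta^2 n)$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo check of the weak law (assumed setup): X_k i.i.d. Ber(p),
# so mu = p and sigma^2 = p(1 - p).
p, delta, m = 0.5, 0.1, 100_000
sigma2 = p * (1 - p)

for n in [10, 100, 1000]:
    # m independent sample means, each computed from n coin tosses
    Xbar = rng.binomial(n, p, size=m) / n
    p_hat = np.mean(np.abs(Xbar - p) >= delta)  # estimates P(|Xbar_n - p| >= delta)
    bound = sigma2 / (delta ** 2 * n)           # Chebyshev upper bound
    print(f"n={n:5d}  P ~ {p_hat:.4f}  <=  bound {bound:.4f}")
```

For small $n$ the bound exceeds $1$ and says nothing; as $n$ grows, both the empirical probability and the bound shrink, with the empirical probability typically far below the bound (Chebyshev is not tight here).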
This is very general, in the sense that we only assume the existence of the variance. If the $X_k$ are assumed to have more moments, one can get better bounds. For example, when the $X_k$ are i.i.d. $\mathrm{Ber}(p)$, we have the following theorem.