Lecture 6: September 9
Lecturer: Siva Balakrishnan
Today we will start off by deriving some of the implications between the different modes of
convergence. Then we will “prove” the CLT.
Convergence in quadratic mean implies convergence in probability: suppose $X_n$ converges in quadratic mean to $X$ and fix $\epsilon > 0$. By Markov's inequality,
$$P(|X_n - X| \geq \epsilon) = P(|X_n - X|^2 \geq \epsilon^2) \leq \frac{E(X_n - X)^2}{\epsilon^2} \to 0,$$
showing convergence in probability.
At a high level, the convergence-in-qm requirement penalizes $X_n$ for large deviations from $X$ both by how frequently the deviations occur and by their magnitude. On the other hand, convergence in probability only penalizes you for how frequently the deviations occur, and hence is a weaker notion of convergence.
Counterexample to the reverse: Suppose we take $U \sim U[0,1]$ and define $X_n = \sqrt{n}\, I_{[0,1/n]}(U)$; then $X_n$ converges in probability to 0 but does not converge in quadratic mean to 0.

To see this, for any $\epsilon > 0$ (and $n$ large enough that $\sqrt{n} \geq \epsilon$),
$$P(|X_n| \geq \epsilon) = P(\sqrt{n}\, I_{[0,1/n]}(U) \geq \epsilon) = P(U \in [0, 1/n]) = \frac{1}{n} \to 0.$$
On the other hand,
$$E[X_n^2] = n\, P(U \in [0, 1/n]) = 1 \not\to 0.$$
Observe that most of the time the RV $X_n$ takes the value 0, but when it does not it takes a huge value.
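As a quick numerical sanity check (an illustration, not part of the notes), we can simulate this counterexample and watch $P(|X_n| \geq \epsilon)$ shrink to 0 while $E[X_n^2]$ stays at 1:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1
for n in [10, 100, 1000]:
    u = rng.uniform(size=200_000)
    x_n = np.sqrt(n) * (u <= 1.0 / n)   # X_n = sqrt(n) * I_[0,1/n](U)
    # frequency of a deviation shrinks like 1/n, but the second moment stays 1
    print(n, np.mean(np.abs(x_n) >= eps), np.mean(x_n**2))
```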
Convergence in probability implies convergence in distribution. This one is a little more involved but perhaps also useful to know. The idea, roughly, is to trap the CDF of $X_n$ by the CDF of $X$ evaluated on an interval whose length converges to 0.
We fix a point $x$ where the CDF $F_X$ is continuous. Choose an arbitrary $\epsilon > 0$. We have that,
\begin{align*}
F_{X_n}(x) = P(X_n \leq x) &= P(X_n \leq x, X \leq x + \epsilon) + P(X_n \leq x, X > x + \epsilon) \\
&\leq P(X \leq x + \epsilon) + P(|X_n - X| \geq \epsilon) \\
&= F_X(x + \epsilon) + P(|X_n - X| \geq \epsilon).
\end{align*}
Now,
\begin{align*}
F_X(x - \epsilon) = P(X \leq x - \epsilon) &= P(X \leq x - \epsilon, X_n \leq x) + P(X \leq x - \epsilon, X_n > x) \\
&\leq F_{X_n}(x) + P(|X_n - X| \geq \epsilon).
\end{align*}
Letting $n \to \infty$, the $P(|X_n - X| \geq \epsilon)$ terms vanish by convergence in probability, so
$$F_X(x - \epsilon) \leq \liminf_n F_{X_n}(x) \leq \limsup_n F_{X_n}(x) \leq F_X(x + \epsilon).$$
Now since $\epsilon > 0$ was arbitrary, we can take the limit as $\epsilon \to 0$ and use continuity of $F_X$ at $x$ to conclude the desired convergence in distribution.
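A small simulation (an illustration, not part of the proof) makes the trapping idea concrete: if $X_n = X + \text{noise}$ with $|X_n - X| \leq 1/n$, the CDF of $X_n$ at any continuity point is squeezed toward the CDF of $X$:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=200_000)    # X ~ N(0, 1)
f_x = np.mean(x <= 0.0)         # empirical F_X(0), about 1/2
for n in [2, 10, 100]:
    x_n = x + rng.uniform(-1.0, 1.0, size=x.size) / n   # |X_n - X| <= 1/n
    # empirical F_{X_n}(0) approaches F_X(0) as the perturbation shrinks
    print(n, np.mean(x_n <= 0.0), f_x)
```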
Counterexample to reverse: This of course is almost trivial since two random variables
having the same distribution does not in any sense mean that they are close (see Lecture 4
notes for an example).
An important caveat: one exception is that when $X$ is deterministic, convergence in distribution does imply convergence in probability. Concretely, fix $\epsilon > 0$ and consider the case when $X = c$. Then
\begin{align*}
P(|X_n - c| > \epsilon) &= P(X_n > c + \epsilon) + P(X_n < c - \epsilon) \\
&\leq F_{X_n}(c - \epsilon) + 1 - F_{X_n}(c + \epsilon) \\
&\to F_X(c - \epsilon) + 1 - F_X(c + \epsilon) = 0,
\end{align*}
using convergence in distribution and the fact that the distribution function $F_X$ is continuous at both $c + \epsilon$ and $c - \epsilon$ (since $X = c$, $F_X$ is continuous everywhere except at $c$).
There are some important consequences of the fact that convergence in distribution is weaker
than convergence in probability.
Concretely, for convergence in probability (and stronger forms of convergence) it is the case that if $X_n$ converges in probability to $X$ and $Y_n$ converges in probability to $Y$, then $X_n + Y_n$ converges in probability to $X + Y$; the same is true of products, i.e. $X_n Y_n$ converges in probability to $XY$.
These statements are not true for convergence in distribution, i.e. if Xn converges in dis-
tribution to X and Yn converges in distribution to Y then Xn + Yn does not necessarily
converge in distribution to X + Y .
The one exception to this is known as Slutsky's theorem. It says that if $Y_n$ converges in distribution to a constant $c$, and $X_n$ converges in distribution to $X$, then $X_n + Y_n$ converges in distribution to $X + c$ and $X_n Y_n$ converges in distribution to $cX$.
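Slutsky's theorem is easy to see in simulation (a sketch, not from the notes): here $X_n$ is a standardized average converging in distribution to $N(0,1)$, and $Y_n$ converges to the constant $c = 2$:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, k = 50_000, 200
# X_n: standardized averages of Exp(1) RVs, converging in distribution to N(0, 1)
x_n = np.sqrt(k) * (rng.exponential(size=(reps, k)).mean(axis=1) - 1.0)
# Y_n: converges in probability (hence in distribution) to the constant c = 2
y_n = 2.0 + rng.normal(scale=1.0 / np.sqrt(k), size=reps)
# Slutsky: X_n + Y_n behaves like N(0,1) + 2, and X_n * Y_n like 2 * N(0,1)
print(np.mean(x_n + y_n), np.var(x_n + y_n))   # about 2 and 1
print(np.mean(x_n * y_n), np.var(x_n * y_n))   # about 0 and 4
```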
If we have that $X_n$ converges in probability to some constant $c$, it is not necessarily the case that $E[X_n]$ converges to $c$.

Here is an example of this non-convergence. Let $X_n$ be 0 with probability $1 - 1/n$ and $n^2$ with probability $1/n$. Then $X_n$ converges to 0 in probability, but $E[X_n] = n \to \infty$.
This is a manifestation of the same phenomenon as we saw in the counterexample to qm convergence. On the events when $|X_n| \geq \epsilon$ it takes a huge value, and this affects the moments but does not affect the convergence in probability (which only cares about how frequently the violation occurs).
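Simulating this example (a numerical illustration, not part of the notes) shows both effects at once: the deviation frequency shrinks like $1/n$ while the sample mean of $X_n$ tracks $n$:

```python
import numpy as np

rng = np.random.default_rng(2)
for n in [10, 100, 1000]:
    # X_n = n^2 with probability 1/n, else 0
    x_n = np.where(rng.uniform(size=500_000) < 1.0 / n, float(n) ** 2, 0.0)
    # P(X_n != 0) = 1/n -> 0, but E[X_n] = n -> infinity
    print(n, np.mean(x_n != 0), np.mean(x_n))
```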
Recall the statement of the CLT: if $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2 < \infty$, $\widehat{\mu} = \frac{1}{n} \sum_{i=1}^n X_i$, and $S_n = \sqrt{n}(\widehat{\mu} - \mu)/\sigma$, then $S_n$ converges in distribution to $Z \sim N(0,1)$. Two remarks:

1. The central limit theorem is incredibly general. It does not matter what the distribution of the $X_i$ is: the standardized average $S_n$ converges in distribution to a Gaussian (under fairly mild assumptions).
2. The most general version of the CLT does not require any assumption about the mgf; it just requires that the mean and variance are finite. In lecture we will instead prove a weaker version that assumes the mgf exists in a neighborhood of 0.
We should try to understand why the CLT might be useful. Roughly, the CLT allows us to make approximate probability statements about averages using corresponding statements about standard normals. At a high level, instead of using a different tail bound for each type of average (sub-Gaussian, sub-exponential, bounded, etc.) we can now just use the Gaussian CDF, although our results will only be approximate.

I will introduce a simple use case: we will discuss this idea again later on in more detail when we discuss confidence intervals.
Suppose for now that we are averaging i.i.d. RVs with known variance $\sigma^2$ (and unknown mean $\mu$). Typically one would also estimate the variance, but this will not change much. We would like to construct a confidence interval for the unknown mean. For some parameter $\alpha$, this is an interval $C_\alpha$ such that,
$$P(\mu \in C_\alpha) \geq 1 - \alpha.$$
One might guess that we would center such an interval around the sample average $\widehat{\mu}$, but the main difficulty is that we do not know the distribution of $\widehat{\mu}$. We can see that,
$$P(\mu \in [\widehat{\mu} - t, \widehat{\mu} + t]) = P(|\widehat{\mu} - \mu| \leq t).$$
So we would like to choose $t$ to make this probability at least $1 - \alpha$. One can construct such intervals using tail bounds (see HW3) but we will instead construct an approximate interval using the CLT. Using the CLT we know that the distribution of $\widehat{\mu} - \mu$ converges to a normal with mean 0 and variance $\sigma^2/n$, i.e.
$$P(|\widehat{\mu} - \mu| \leq t) \approx P\left( |Z| \leq \frac{\sqrt{n}\, t}{\sigma} \right).$$
So if we choose $t = \sigma z_{\alpha/2}/\sqrt{n}$, where $z_{\alpha/2}$ is the point satisfying $P(|Z| \leq z_{\alpha/2}) = 1 - \alpha$, then with $C_\alpha = [\widehat{\mu} - t, \widehat{\mu} + t]$ we obtain
$$P(\mu \in C_\alpha) \approx 1 - \alpha.$$
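This recipe can be checked by simulation (a sketch assuming i.i.d. exponential data with known variance, not part of the notes): even though the data are far from Gaussian, the CLT interval covers $\mu$ at close to the nominal rate.

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha = 100, 0.05
mu, sigma = 2.0, 2.0            # Exp(scale=2) has mean 2 and sd 2
z = 1.96                        # z_{alpha/2} for alpha = 0.05
t = z * sigma / np.sqrt(n)      # half-width of the CLT interval
trials = 20_000
mu_hat = rng.exponential(scale=mu, size=(trials, n)).mean(axis=1)
coverage = np.mean(np.abs(mu_hat - mu) <= t)
print(coverage)                 # close to 1 - alpha = 0.95
```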
6.3.2 Preliminaries
Sanity Checks: Before we prove the theorem there are two very simple sanity checks that one might consider. The random variable $S_n$ has mean 0, and variance:
$$E[S_n^2] = \frac{n}{\sigma^2}\, E(\widehat{\mu} - \mu)^2 = 1.$$
So in some sense the normalizations (of subtracting $\mu$ and dividing by $\sigma/\sqrt{n}$) make sense. You should convince yourself that if you did not multiply by $\sqrt{n}$ this would have a degenerate limit (i.e. would converge in distribution to a point mass at 0). Multiplying by $\sqrt{n}$ enlarges the fluctuations of the average around the expectation at just the right rate.
The other sanity check is to just notice that if $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, then $S_n$ would have distribution exactly equal to that of $Z$. Roughly, if there is going to be a "universal limit", i.e. if the average is going to converge to a single distribution (irrespective of the distribution of $X$), then it has to be the Gaussian distribution (just because we know that the average of Gaussians is Gaussian).
Calculus with mgfs: We need a few simple facts about mgfs that we will quickly prove.
Fact 1: If $X$ and $Y$ are independent with mgfs $M_X$ and $M_Y$ then $Z = X + Y$ has mgf $M_Z(t) = M_X(t) M_Y(t)$.

Proof: We note that,
$$M_Z(t) = E[e^{t(X+Y)}] = E[e^{tX}]\, E[e^{tY}] = M_X(t) M_Y(t),$$
using independence.
Fact 2: If $X$ has mgf $M_X$ then $Y = a + bX$ has mgf $M_Y(t) = \exp(at) M_X(bt)$.

Proof: We just use the definition,
$$M_Y(t) = E[e^{t(a + bX)}] = e^{at}\, E[e^{(bt)X}] = \exp(at) M_X(bt).$$
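Facts 1 and 2 are easy to check by Monte Carlo (an illustration, not part of the notes); here $X$ is standard normal (mgf $e^{t^2/2}$) and $Y$ an independent Exp(1) (mgf $1/(1-t)$ for $t < 1$):

```python
import numpy as np

rng = np.random.default_rng(4)
t = 0.3
x = rng.normal(size=1_000_000)        # X ~ N(0, 1)
y = rng.exponential(size=1_000_000)   # Y ~ Exp(1), independent of X
# Fact 1: M_{X+Y}(t) = M_X(t) M_Y(t) for independent X and Y
lhs = np.mean(np.exp(t * (x + y)))
rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))
print(lhs, rhs, np.exp(t**2 / 2) / (1 - t))   # all close to the exact value
# Fact 2: M_{a+bX}(t) = exp(at) M_X(bt), here with a = 1, b = 2
a, b = 1.0, 2.0
print(np.mean(np.exp(t * (a + b * x))), np.exp(a * t) * np.mean(np.exp(b * t * x)))
```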
Fact 3: We will not prove this one (strictly speaking one needs to invoke the dominated convergence theorem) but it should be familiar to you. The $r$-th derivative of the mgf at 0 gives us moments, i.e.
$$M_X^{(r)}(0) = E[X^r].$$
Fact 4: The most important result that we also will not prove is that we can show
convergence in distribution by showing convergence of the mgfs.
Formally, let $X_1, X_2, \ldots$ be a sequence of RVs with mgfs $M_{X_1}, M_{X_2}, \ldots$. If for all $t$ in an open interval around 0 we have that $M_{X_n}(t) \to M_X(t)$, then $X_n$ converges in distribution to $X$.
6.3.3 Proof
We will follow the proof from John Rice's textbook (Mathematical Statistics and Data Analysis). Larry's notes have a nearly identical proof. First we recall that the mgf of a standard normal is simply $M_Z(t) = \exp(t^2/2)$.
Note that,
$$M_{S_n}(t) = \left[ M_{(X - \mu)}\left( \frac{t}{\sigma \sqrt{n}} \right) \right]^n,$$
using Facts 1 and 2. Now, one should imagine $t$ as small and fixed, so $t/(\sigma \sqrt{n})$ is quite close to 0. Taylor expanding the mgf around 0, and using Fact 3, we obtain
$$M_{S_n}(t) = \left[ 1 + \frac{t}{\sigma \sqrt{n}}\, E(X - \mu) + \frac{t^2}{2 n \sigma^2}\, E(X - \mu)^2 + \frac{t^3}{6 n^{3/2} \sigma^3}\, E(X - \mu)^3 + \ldots \right]^n.$$
Since $E(X - \mu) = 0$ and $E(X - \mu)^2 = \sigma^2$, while the remaining terms are of order $n^{-3/2}$ or smaller,
$$M_{S_n}(t) \approx \left( 1 + \frac{t^2}{2n} \right)^n \to \exp(t^2/2),$$
which is the mgf of a standard normal, so by Fact 4 we are done.
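Both the final limit and the CLT itself can be checked numerically (a quick illustration, not part of the notes):

```python
import numpy as np

# The limit in the last step: (1 + t^2/(2n))^n -> exp(t^2/2), the mgf of N(0,1)
t = 1.0
for n in [10, 100, 10_000]:
    print(n, (1 + t**2 / (2 * n)) ** n, np.exp(t**2 / 2))

# The CLT itself: standardized averages of Exp(1) RVs look standard normal
rng = np.random.default_rng(5)
k = 500  # sample size per average
s_n = np.sqrt(k) * (rng.exponential(size=(20_000, k)).mean(axis=1) - 1.0)
# compare P(S_n <= 1) with the standard normal CDF at 1 (about 0.8413)
print(np.mean(s_n <= 1.0))
```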