Lecture 4 - Part III
1 / 49
Outline
References
Chapter 2: Weak Law of Large Numbers and the Central Limit Theorem
Textbooks
Larsen, R., & Marx, M. (2012). Introduction to mathematical statistics and its
applications. Pearson.
Stock, J., & Watson, M. (2015). Introduction to econometrics. Pearson.
Textbook Clarification
I Larsen and Marx (2012) is a more advanced auxiliary text if you want to go
beyond the material here.
Chapter 2: Normal, χ-Squared, t and F Distributions
The Normal Distribution
The Normal Distribution
I The probability density function of a normally distributed random variable X
with mean µX and variance σX² is f(x) = (1/√(2πσX²)) · exp(−(x − µX)²/(2σX²))
I and 95% of its probability falls into the interval [µX − 1.96σX , µX + 1.96σX ]
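The 1.96 rule can be checked numerically. A minimal sketch using only the Python standard library, with the standard normal CDF written via the error function:

```python
from math import erf, sqrt

def normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(muX - 1.96*sigmaX <= X <= muX + 1.96*sigmaX) reduces, after
# standardizing, to P(-1.96 <= Z <= 1.96) for a standard normal Z
p = normal_cdf(1.96) - normal_cdf(-1.96)
print(round(p, 4))  # 0.95
```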
The Standard Normal Distribution
The Standard Normal Distribution
I What does that mean?
I and d is the standardized value of c: d = (c − µY)/σY
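A short sketch of this standardization step (the values of µY, σY, and c below are hypothetical, chosen only for illustration):

```python
from math import erf, sqrt

def normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu_Y, sigma_Y = 30.0, 5.0   # hypothetical mean and sd of Y
c = 38.0                    # hypothetical threshold

# d is the standardized value of c
d = (c - mu_Y) / sigma_Y

# P(Y <= c) equals P(Z <= d) for a standard normal Z
print(d, round(normal_cdf(d), 4))
```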
Probability Density Functions of χ2 Distributions
The Student t Distribution
Probability Density Functions of t Distributions
The F Distribution
Probability Density Functions of F Distributions
Chapter 2: Random Sampling and the Distribution of the
Sample Average
Sample vs Population
The key idea in statistics is that we use information from a sample in order to
obtain insights about the population!
I Almost all the techniques discussed later on involve averages or weighted
averages of data from a sample.
I This section first talks about random sampling and then the distribution of the
sample mean
Random Sampling
I The simplest form of random sampling is to randomly select n units from the
population, with each unit selected with the same probability
I Example:
I If we randomly select days on which we record our commuting time, ...
I then we obtain a series of (y1 , . . . , yn ) records of commuting time.
I Each yi is a realization of random variable Yi which is the commuting time on
day i.
I and since we chose the days at random (in advance), the commuting time on
one day provides no information about commuting time on another day.
I This means that the random variables (Y1 , . . . , Yn ) are independent.
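A minimal sketch of simple random sampling in Python (the population of 365 days and the sample size n = 10 are made-up values):

```python
import random

random.seed(0)
days_in_year = list(range(1, 366))   # hypothetical population: the days of a year
n = 10

# simple random sampling: every day has the same probability of being chosen,
# and the days are chosen in advance, independently of the commuting times
sampled_days = random.sample(days_in_year, n)
print(sorted(sampled_days))
```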
Independently and identically distributed (i.i.d.)
I The previous example explains why simple random sampling yields a sample
whose observations are independently and identically distributed
The Sampling Distribution of the Sample Average
Mean and Variance of y
I Since y is a random variable we can calculate the expected value (mean) and
variance:
E[y] = (1/n) Σᵢ₌₁ⁿ E[yi] = (1/n) · nµY = µY
I Here we use the fact that each yi was drawn from the same distribution and
thus has the same mean µY
For the variance we have:

Var[y] = Var[(1/n) Σᵢ₌₁ⁿ yi] = (1/n²) Σᵢ₌₁ⁿ Var[yi] + (1/n²) Σᵢ₌₁ⁿ Σⱼ≠ᵢ Cov[yi, yj]

Since the yi were drawn independently, all the covariance terms are zero, so:

Var[y] = (1/n²) Σᵢ₌₁ⁿ Var[yi] = (1/n²) · nσY² = σY²/n

For the standard deviation we take the square root: σy = σY/√n
Mean and Variance of y
Let’s state the results again:
I The expected value of the sample mean (y ) is given by:
E[y ] = µY
I The variance and standard deviation of the sample mean (y) are given by:

Var[y] = σy² = σY²/n

Std.Dev[y] = σy = σY/√n

I These results hold whatever the common distribution of the random variables
Y1 , ..., Yn is! If it were a normal distribution we could even say that

y ∼ N(µY , σY²/n)
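Both results can be checked by simulation. A sketch with a made-up population mean and standard deviation (a normal population is used here for convenience, but the mean and variance results hold for any common distribution):

```python
import random

random.seed(0)
mu_Y, sigma_Y = 4.0, 3.0             # hypothetical population mean and sd
n, reps = 25, 20_000                 # sample size and number of replications

# draw many samples of size n and record each sample average
ybars = []
for _ in range(reps):
    sample = [random.gauss(mu_Y, sigma_Y) for _ in range(n)]
    ybars.append(sum(sample) / n)

mean_ybar = sum(ybars) / reps
var_ybar = sum((y - mean_ybar) ** 2 for y in ybars) / reps
print(mean_ybar)                     # close to mu_Y = 4.0
print(var_ybar)                      # close to sigma_Y**2 / n = 0.36
```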
Mean and Variance of y
Chapter 2: Weak Law of Large Numbers and the Central
Limit Theorem
The sampling distribution of y
I We said that only if we know that Y1 , ..., Yn are each normally distributed
can we say that we know the distribution of y, namely:

y ∼ N(µY , σY²/n)

I If we don’t know the exact distribution of each Y1 , ..., Yn , we can only derive
two characteristics of the distribution of y, namely its mean and its variance:

E[y] = µY

Var[y] = σy² = σY²/n

I But if we want to make statements about a likely range within which y falls,
we need more than E[y] and Var[y]. We need to know the (sampling) distribution of y
The asymptotic distribution of y
I There are two remarkable results in probability theory which allow us to learn
more about the behaviour of y if we assume that the sample becomes very large: n → ∞
The Law of Large Numbers
I The first of these results is the (weak) law of large numbers which says that
if the sample size is large, y will be very close to µY with high probability.
Central Limit Theorem
I The second of these results is the central limit theorem (CLT). It says that,
when the sample size is large, the sampling distribution of the standardized
sample average (y − µY)/σy is approximately normal.
I Remarkably, the CLT holds independent of how the random variables Y are
distributed.
I This result will allow us to perform hypothesis tests and make statements
about how certain we are about results obtained from our sample!
The CLT in Action
I Let’s carry out a small Monte Carlo experiment to see whether this is all true!
I The setting:
I Draw a random sample of size n from a uniform distribution over the interval
[1, 100] (for that type of distribution we have µ = 50.5 and σ 2 = 816.75) and
compute the standardized sample average.
I Generate 10,000 draws (samples) and see how the standardized sample
averages are distributed.
I So a practical question is how large should each of these 10,000 samples be?
What should we choose for n? Let’s start with n=5.
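A sketch of this experiment using only the Python standard library (the histogram inspection from the slides is replaced here by checking that the standardized averages have mean ≈ 0 and variance ≈ 1):

```python
import random
from math import sqrt

random.seed(0)
mu, sigma2 = 50.5, 816.75            # mean and variance of Uniform[1, 100]
n, draws = 5, 10_000                 # sample size and number of samples

# for each sample: draw n uniforms, average them, standardize the average
z = []
for _ in range(draws):
    ybar = sum(random.uniform(1, 100) for _ in range(n)) / n
    z.append((ybar - mu) / sqrt(sigma2 / n))

# the CLT says z should be approximately N(0, 1) as n grows
mean_z = sum(z) / draws
var_z = sum((x - mean_z) ** 2 for x in z) / draws
print(mean_z, var_z)                 # close to 0 and 1, respectively
```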
10,000 samples, each of size n = 5
10,000 samples, each of size n = 20
10,000 samples, each of size n = 50
What happens if we sample from a very uneven distribution?
Richest 100 observations from a Pareto vs Normal Distribution
10,000 samples, each of size n = 5
10,000 samples, each of size n = 20
10,000 samples, each of size n = 100
10,000 samples, each of size n = 2000
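The slow convergence for a heavy-tailed distribution can be reproduced with a sketch like the following. The Pareto shape α = 8 with scale 1 is an assumption, chosen so that the mean, the variance, and the skewness estimate are all well behaved; skewness 0 is the normal benchmark:

```python
import random
from math import sqrt

random.seed(0)
alpha = 8.0                                     # hypothetical Pareto shape (scale 1)
mu = alpha / (alpha - 1)                        # population mean
var = alpha / ((alpha - 1) ** 2 * (alpha - 2))  # population variance

def pareto_draw():
    """Inverse-CDF sampling from a Pareto(alpha) distribution with scale 1."""
    return (1.0 - random.random()) ** (-1.0 / alpha)

def skew_of_standardized_average(n, draws=1000):
    """Skewness of the standardized sample average over many Monte Carlo draws."""
    z = []
    for _ in range(draws):
        ybar = sum(pareto_draw() for _ in range(n)) / n
        z.append((ybar - mu) / sqrt(var / n))
    m = sum(z) / draws
    return sum((x - m) ** 3 for x in z) / draws

# a normal distribution has skewness 0: the standardized average is still
# clearly skewed for n = 5, but roughly symmetric for n = 2000
skew_small = skew_of_standardized_average(5)
skew_large = skew_of_standardized_average(2000)
print(skew_small, skew_large)
```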
Central Limit Theorem
Further Thoughts on Sampling
Sampling Problems
I Selection bias results when a subset of the units in the population is
less/more likely to be included in the sample (and the research design does
not or cannot take this into account)
I pure internet surveys
I survivorship bias (in firm data)
I differential non-response bias results for example when the probability of
participating in a survey is systematically linked to some characteristics of the
units in the population. (wealthy households less likely to participate in surveys
on household finances)
I population characteristics
I e.g.: the true proportion intending to vote for a certain party in the next election
(can be more complicated like an average treatment effect)
I it is unknown and we try to estimate it from the sample
I sample characteristics
I values we obtain from the sample (like the proportion of people intending to vote
for X)
I we use them to make statements about the population (statistical significance
of an effect, margin of error around the voting intention we observe in the sample)
If your sample is biased (see above), your conclusions about the population will
be biased!