ECO 313A - 2025 - 01 Inference Distributions and CIs
• Eish – what does it mean? It means I sum up the values of X (e.g. the
ages), and then divide by the number of observations.
• So 21 + 25 + 24 + 22 + 22 + 24 + 23 = 161
• 161 / 7 = 23 = X̄
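The arithmetic above can be checked in a few lines of Python. This is only a sketch; the variable names (`ages`, `sample_mean`) are my own choices for illustration.

```python
# Ages from the example above (seven observations).
ages = [21, 25, 24, 22, 22, 24, 23]

total = sum(ages)          # sum of the X values
n = len(ages)              # number of observations
sample_mean = total / n    # X-bar

print(total, n, sample_mean)   # 161 7 23.0
```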
Probability distribution of age
Now look at them together…
Basically the
same picture…
How can we use the probability distribution?
Suppose I select
someone from the class
at random…. What is:
• Prob(Age = 21)?
• Prob(Age = 23)?
• Prob(Age = 20)?
What about Prob(22 ≤ Age ≤ 24)? (In other words, the probability
that someone selected at random is 22 or 23 or 24 years old.)
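A rough sketch of how such probabilities fall out of a frequency distribution. Note the assumption: the seven ages below are the small example used earlier for the sample mean, standing in for the class distribution in the graph, so the numbers printed are illustrative only. The helper name `prob` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Stand-in data (assumption): NOT the actual class distribution from the graph.
ages = [21, 25, 24, 22, 22, 24, 23]
counts = Counter(ages)
n = len(ages)

def prob(lo, hi):
    """Probability that a randomly selected age lies in [lo, hi]."""
    hits = sum(c for age, c in counts.items() if lo <= age <= hi)
    return Fraction(hits, n)

print(prob(21, 21))   # 1/7
print(prob(20, 20))   # 0  (nobody in this data is 20)
print(prob(22, 24))   # 5/7
```

The interval probability is just the sum of the individual probabilities for 22, 23 and 24.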
Uniform distribution – example 1
The question of inference
• Suppose you want to know the mean age of every person who
lives in Alice
– Alice is your population, therefore…
– you want to know the population mean, μ (‘mu’)
• There are 15 143 people (supposedly)
• How are you going to do it?
• What are your options?
The question of inference…
• Basically there are three options:
• First, you can collect data from everyone (i.e. conduct a census)
so that you can calculate the population mean, μ, directly.
We can write:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$$
where N represents the size of the population. (Compare this to our
formula for the sample mean:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
)
• Second, we can stop one person on the street and ask him/her
his/her age. Problem – how ‘representative’ is one person?
• Third, we can draw a sample, calculate the sample mean, then
declare that our sample mean is an estimate of the (unknown)
population mean, i.e. X̄ is an estimate of μ.
• (You can see that the so-called second option is a special case
of the 3rd option in which n=1)
• The big hairy question in statistics is, how good an estimate is
X̄ of μ?
– However, we don’t yet have the tools to answer that…
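To make the three options concrete, here is a minimal simulation sketch. Everything about the population is an assumption: we do not know the real ages, so the code invents them with `random.randint(0, 90)`. The sample size of 100 and the seed are arbitrary too.

```python
import random
import statistics

random.seed(313)  # arbitrary seed so the sketch is repeatable

N = 15_143  # population size quoted above
# Hypothetical ages (assumption): the real age distribution is unknown.
population = [random.randint(0, 90) for _ in range(N)]

mu = statistics.fmean(population)         # option 1: census, gives mu exactly
one_person = random.choice(population)    # option 2: a 'sample' with n = 1
sample = random.sample(population, 100)   # option 3: a sample with n = 100
x_bar = statistics.fmean(sample)          # the estimate of mu

print(f"mu = {mu:.2f}, one person = {one_person}, x_bar = {x_bar:.2f}")
```

Rerunning with a different seed changes `one_person` and `x_bar` but not the logic: only the census reproduces μ exactly.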
[Diagram: roadmap. (1) The standard normal distribution; (2) Standardising a normally distributed variable; (3) Sampling distributions; (4) The Central Limit Theorem; Confidence intervals etc.]
The normal distribution
• The normal distribution plays a central role in the answer, but we
have to get there through a circuitous route.
• First, the so-called normal distribution is actually a family of
distributions which follow a common distribution function:
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-(X-\mu)^2 / 2\sigma^2}$$
• Don’t worry about the detail or try to ‘understand’ this function. All
it means is that if you have a random variable X which is ‘normally
distributed’ with mean μ and variance σ², then f(X) tells you the
height of the graph for different possible values of X.
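If it helps to see the function as something you can evaluate, here is a small sketch. The function name `normal_pdf` is hypothetical; the cross-check uses `statistics.NormalDist` from the Python standard library, which takes the standard deviation rather than the variance.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Height of the normal curve with mean mu and variance sigma2 at x."""
    sigma = math.sqrt(sigma2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / (sigma * math.sqrt(2 * math.pi))

# Height at the mean when mu = 0 and variance = 1: 1 / sqrt(2*pi)
print(round(normal_pdf(0, 0, 1), 4))   # 0.3989

# Cross-check against the standard library for mean 4, variance 5
print(math.isclose(normal_pdf(3.0, 4, 5), NormalDist(4, math.sqrt(5)).pdf(3.0)))   # True
```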
Pause: but where does the normal distribution ‘come from’?
[see Lane, n.d. (on BlackBoard)]
• Different explanations, eg the normal distribution “occurs naturally
in many situations”. An early case was measurement errors in
astronomy, which Galileo (17th c.) studied and characterised.
• Abraham de Moivre, an “18th century statistician and consultant to
gamblers”, in effect discovered the idea of the Central Limit Theorem
(see below), and then sought a formula that closely matched what
he observed.
• Independently, the mathematicians Adrain and Gauss developed
formulae for the normal distribution, in 1808 and 1809 respectively.
• However, why some variables in the world follow a normal
distribution is somewhat contentious (see e.g. Lyon, 2014).
Welcome to the family…
• In general we write X ~ ℕ(μ, σ²)
• Which we read as ‘X follows a normal distribution with mean μ
and variance σ²’.
• (The convention is that, for any distribution, the Greek letter μ
(‘mu’) represents the population mean, and the Greek letter σ
(‘sigma’), when squared, represents the population variance.)
• So we can imagine an infinite number of normal distributions
by supposing different values of μ and σ², e.g. X₁ ~ ℕ(4, 5),
X₂ ~ ℕ(−250, 12), X₃ ~ ℕ(78, 99), etc.
The Standard Normal Distribution
• But there is one member of the family of normal distributions that is
regarded as special, namely the ‘standard normal distribution’, which
is X~ℕ(0, 1), that is, where the mean of X is 0 and the variance is 1.
• The convention is to write 𝐙~ℕ(0, 1)
• A few useful facts about normal distributions:
– They are infinite in both directions
– They are symmetric
– The total area ‘under them’ is 1 (as with any probability
distribution, by definition)
• We can apply these to the standard normal distribution…
Prob(-∞ ≤ Z ≤ ∞) = ? Prob(0 ≤ Z ≤ ∞) = ?
• Or in respect of the die, what is Prob(3 ≤ x ≤ 5)?
Prob(0 ≤ Z ≤ 1.51) = ?
In other words, the probability that a std normal variable Z is between 0 and
1.51 is 0.4345:
Prob(0 ≤ Z ≤ 1.51) = 0.4345
• Let’s mess around a bit. If this is true, what about…
Prob(Z ≤ 1.51) = 0.4345 + 0.5 = 0.9345
Prob(Z ≥ 1.51) = 1 – 0.9345 = 0.5 – 0.4345 = 0.0655
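You can reproduce these table look-ups without a table. A minimal sketch using `statistics.NormalDist` from the Python standard library (not something the course requires; the variable names are mine):

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal distribution

p_0_to_151 = Z.cdf(1.51) - Z.cdf(0)   # Prob(0 <= Z <= 1.51)
p_below = Z.cdf(1.51)                 # Prob(Z <= 1.51)
p_above = 1 - Z.cdf(1.51)             # Prob(Z >= 1.51)

print(round(p_0_to_151, 4), round(p_below, 4), round(p_above, 4))
# 0.4345 0.9345 0.0655
```

Here `cdf(z)` gives Prob(Z ≤ z), so the table value for the area between 0 and z is `cdf(z)` minus 0.5.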
Let’s try one more, but the other way around...
• Prob(-k ≤ Z ≤ k) = 0.95; in other words we want to find the value k (and
thus -k) such that there is a 95% chance that a randomly selected value
Z is between -k and k.
• Given how our table is constructed, we change this into an
equivalent question:
Prob(0 ≤ Z ≤ k) = 0.95/2 = 0.475
Now we work backwards; we find the value in the table as close as possible to
0.475, and determine the value of k from that, i.e. 1.96
• So we can write: Prob(-1.96 ≤ Z ≤ 1.96) = 0.95
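Working backwards from a probability to a value of Z is what an inverse CDF does. A short sketch, again with the standard library's `statistics.NormalDist` (an illustration, not part of the course materials):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Prob(-k <= Z <= k) = 0.95 leaves 0.025 in each tail, so Prob(Z <= k) = 0.975
k = Z.inv_cdf(0.975)

print(round(k, 2))                      # 1.96
print(round(Z.cdf(k) - Z.cdf(-k), 4))   # 0.95
```

Note that `inv_cdf` wants the whole area to the left of k (0.975), not the 0.475 between 0 and k that the table uses.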
‘Standardising’ a normally distributed variable
• Suppose you have a variable X that is normally distributed, X ~ ℕ(3, 14)
• The problem is, we can’t determine Prob(0 ≤ X ≤ 1.42) because there is
no probability table for this distribution, nor for X ~ ℕ(−2, 5), nor for
X ~ ℕ(301, 7.5), etc., etc.
• However, if we know the mean and variance of X, in principle we can
standardise the distribution of X to create a standard normal variable.
This is how:
$$\frac{X - \mu_X}{\sigma_X} = Z \sim \mathbb{N}(0, 1)$$
(But what is this σ_X thing? It’s the ‘standard deviation’ of X = the square
root of the variance of X.)
• Standardisation will become very useful to us in a short while….
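As a sketch of what standardising buys us, the code below works out Prob(0 ≤ X ≤ 1.42) for X ~ ℕ(3, 14) by converting both end points to Z values and using only the standard normal. The helper name `standardise` is hypothetical, and the direct calculation at the end is just a cross-check.

```python
import math
from statistics import NormalDist

mu_x, var_x = 3, 14
sigma_x = math.sqrt(var_x)   # standard deviation = square root of the variance

def standardise(x):
    """Convert a value of X into the corresponding value of Z."""
    return (x - mu_x) / sigma_x

Z = NormalDist()  # standard normal
p_via_z = Z.cdf(standardise(1.42)) - Z.cdf(standardise(0))

# Cross-check: ask the library about X directly (it takes sigma, not variance)
X = NormalDist(mu_x, sigma_x)
p_direct = X.cdf(1.42) - X.cdf(0)

print(round(p_via_z, 4), math.isclose(p_via_z, p_direct))
```

Both end points come out negative after standardising (they lie below the mean of 3), which is why a table that only covers Z ≥ 0 needs the symmetry facts listed earlier.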
The Sampling Distribution of X̄
• Apart from being the sample mean, X̄ is a random variable!
Why? Because every time I draw a different sample I end up
with a different X̄.
• So for a given sample size n, one can imagine that, like X, X̄
has its own distribution. We call this a ‘sampling distribution’.
• What may not be at all obvious is that the sampling
distribution may look very different from the distribution of X
itself.
• Let’s recall our frequency distribution of ECO 313 students’ Age:
• What if I draw 5000 samples of size n=5 from this distribution, and
calculate an X̄ for each one?
[Figure: Sampling distribution of X̄, n=5]
[Figure: Sampling distribution of X̄, n=15]
[Figure: Sampling distribution of X̄, n=30]
[Figure, four panels: Original distribution of Age; Sampling distribution of mean Age, n=5; Sampling distribution of mean Age, n=15; Sampling distribution of mean Age, n=30]
• The larger the sample size, the less the sampling distribution
of X̄ resembles the original distribution of Age, and the
more it resembles … (wait for it…) a normal distribution.
• What is not so visible from these graphs is that the variance
of the sampling distribution declines as the sample size goes
up:

Sample size (n)    Var(X̄)
5                  2.31
15                 0.61
30                 0.22

These two features help explain why, if you can afford it,
you would prefer a larger sample size!
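The experiment described above (5000 samples, one X̄ per sample) is easy to simulate. A sketch under a loud assumption: the `ages` and `weights` below are made up, not the ECO 313 class data, so the variances printed will not match the table. Sampling is with replacement.

```python
import random
import statistics

random.seed(2025)  # arbitrary seed so the sketch is repeatable

# Hypothetical, skewed age distribution (assumption: NOT the class data).
ages = [21, 22, 23, 24, 25, 30]
weights = [30, 25, 20, 10, 10, 5]

def sample_means(n, draws=5000):
    """Draw `draws` samples of size n (with replacement); return their means."""
    return [statistics.fmean(random.choices(ages, weights=weights, k=n))
            for _ in range(draws)]

# Variance of the simulated sampling distribution for each sample size
var_by_n = {n: statistics.pvariance(sample_means(n)) for n in (5, 15, 30)}
for n, v in var_by_n.items():
    print(n, round(v, 3))
```

The pattern to look for is the one in the table: the variance of X̄ shrinks as n grows. Plotting a histogram of `sample_means(30)` would show the bell shape emerging.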
The Central Limit Theorem
• As the sample size increases, the distribution of the sample mean
(X̄) approaches (‘→’) the normal distribution with mean μ_X and
variance σ_X²/n. Or,
$$\bar{X} \to \mathbb{N}\left(\mu_X, \frac{\sigma_X^2}{n}\right)$$
• [Hmm, why all this fuss about the sample mean? Because the same
principles apply to econometric estimates, which we’ll get to
soon…]
• [For a proof of the CLT, see Filmus, 2010 (on BlackBoard).]
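A quick numerical sanity check of the two parameters in the theorem, not a proof. The population here is a fair six-sided die, chosen only because it is clearly not normal; the sample size of 30 and the 20 000 repetitions are arbitrary.

```python
import random
import statistics

random.seed(7)  # arbitrary seed so the sketch is repeatable

# Hypothetical population: a fair six-sided die (flat, not bell-shaped).
faces = [1, 2, 3, 4, 5, 6]
mu_x = statistics.fmean(faces)        # population mean, 3.5
var_x = statistics.pvariance(faces)   # population variance, 35/12

n, draws = 30, 20000
means = [statistics.fmean(random.choices(faces, k=n)) for _ in range(draws)]

# The mean of the sample means should be near mu_x,
# and their variance should be near var_x / n.
print(round(statistics.fmean(means), 2), mu_x)
print(round(statistics.pvariance(means), 3), round(var_x / n, 3))
```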
That’s nice, but recall, we only have probability tables for
the standard normal distribution!
• True, but just as we standardised a normally distributed
variable, we can standardise an almost-normally distributed
sample mean.
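• Perform the same trick as before – subtract the mean (in this
case of X̄), and divide by the standard deviation (again, of X̄):
$$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_X}{\sigma_X / \sqrt{n}} = (\bar{X} - \mu_X) \cdot \frac{\sqrt{n}}{\sigma_X} = \bar{Z} \to \mathbb{N}(0, 1)$$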
• Question: the previous slide implied that $\mu_{\bar{X}} = \mu_X$, but
$\sigma_{\bar{X}} \ne \sigma_X$; what’s going on?
Putting it all together – Confidence Intervals!
• Recall that Prob(-1.96 ≤ Z ≤ 1.96) = 0.95
• This holds true approximately for our Z̄ because our Z̄ only
approximates the standard normal distribution, so:
$$0.95 \approx \text{Prob}(-1.96 \le \bar{Z} \le 1.96)$$
• Now, let’s substitute:
$$0.95 \approx \text{Prob}\left(-1.96 \le (\bar{X} - \mu_X) \cdot \frac{\sqrt{n}}{\sigma_X} \le 1.96\right)$$
• And now we simplify:
$$\begin{aligned}
0.95 &\approx \text{Prob}\left(-1.96 \le (\bar{X} - \mu_X) \cdot \frac{\sqrt{n}}{\sigma_X} \le 1.96\right) \\
&= \text{Prob}\left(-1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le \bar{X} - \mu_X \le 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&= \text{Prob}\left(-\bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le -\mu_X \le -\bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&= \text{Prob}\left(\bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \ge \mu_X \ge \bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&= \text{Prob}\left(\bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le \mu_X \le \bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&\approx 0.95
\end{aligned}$$
But please note: in practice you don’t need to do this
derivation over and over again. All you need to do is apply
the formula at the end.
• But what does it mean?
$$\text{Prob}\left(\bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le \mu_X \le \bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \approx 0.95$$
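Applying the formula is a one-liner once you have X̄, σ_X and n. A minimal sketch: the function name `confidence_interval` and the numbers in the example (sample mean 23, population standard deviation 2, n = 49) are hypothetical, and σ_X is assumed to be known, as in the formula.

```python
import math
from statistics import NormalDist

def confidence_interval(x_bar, sigma_x, n, level=0.95):
    """Interval x_bar +/- k * sigma_x / sqrt(n), with sigma_x assumed known."""
    k = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    half_width = k * sigma_x / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical numbers: sample mean 23, population sd 2, sample size 49
lo, hi = confidence_interval(23, 2, 49)
print(round(lo, 2), round(hi, 2))   # 22.44 23.56
```

The interval is centred on X̄ and its half-width is 1.96 · σ_X/√n, so quadrupling n halves the width.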
Brief aside for future use…
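• Question 1: If ‘a’ is a constant, and μ_X is the mean of X, then
what is the mean of aX?
• Answer:
$$\begin{aligned}
\mu_{aX} &= \frac{1}{N}\sum_{i=1}^{N} aX_i = \frac{1}{N}\left(aX_1 + aX_2 + \cdots + aX_N\right) \\
&= \frac{1}{N}\, a \left(X_1 + X_2 + \cdots + X_N\right) \\
&= a \cdot \frac{1}{N}\sum_{i=1}^{N} X_i = a\mu_X. \quad \text{(Like duh!)}
\end{aligned}$$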
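A numerical spot check of the result, using the seven ages from the start of these notes and an arbitrary constant a = 3 (both are just illustrative choices):

```python
import statistics

x = [21, 25, 24, 22, 22, 24, 23]   # the seven ages used earlier
a = 3                               # arbitrary constant

mu_x = statistics.fmean(x)                       # mean of X
mu_ax = statistics.fmean([a * xi for xi in x])   # mean of aX

print(mu_x, mu_ax, a * mu_x)   # 23.0 69.0 69.0
```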
• Question 2: If ‘a’ is a constant, and σ_X² is the variance of X,
then what is the variance of aX?
• Answer:
Define
$$\sigma_{aX}^2 = \frac{1}{N}\sum_{i=1}^{N} (aX_i - a\mu_X)^2$$