ECO 313A - 2025 - 01 Inference Distributions and CIs

The document covers fundamental concepts in statistics, including variables, distributions, and inference. It explains how to calculate descriptive statistics such as mean and variance, and introduces the normal distribution and its significance in statistical inference. Additionally, it discusses the Central Limit Theorem and confidence intervals, emphasizing their importance in estimating population parameters from sample statistics.

Uploaded by

veliswadetsha
Copyright
© © All Rights Reserved

ECO 313A

Variables, distributions, inference, and confidence intervals


– What is a variable?
– Means, variances and distributions
– Theoretical probability distributions
– Inference – using sample stats to estimate population parameters
• Normal / standard normal distributions
• Standardisation of a normally distributed variable
• Sampling distributions
• The Central Limit Theorem
• Confidence intervals
What is a variable?
• Some defined quality/condition that varies between one person and another, one country
and another, over time, etc.
– The price of bread?
– Household income?
– Personality?
– The number of kilos in a ton?
– People’s age?
– ECO 313 students’ age?
– Gross domestic product?
– Kg of fertilizer used per hectare?
– The distance between East London and KWT?
– Well-being?
• It has to vary, and it has to be defined/definable….
• Also note: different types of variables – gender, number of children, vs income [i.e. categorical, discrete, continuous; ordinal, cardinal…]
What is a variable?
• Some defined quality/condition that varies between one person and another, one country
and another, over time, etc.
– The price of bread? √
– Household income? √
– Personality? √? X?
– The number of kilos in a ton? X
– People’s age? √
– ECO 313 students’ age? √
– Gross domestic product? √
– Kg of fertilizer used per hectare? √
– The distance between East London and KWT? X
– Well-being? √? X?
• It has to vary, and it has to be defined/definable….
• Also note: different types of variables – gender, number of children, vs income [i.e. categorical, discrete, continuous; ordinal, cardinal…]
How can we ‘know a variable’?
• We can look at its values
• We can calculate its average (mean) value
• We can calculate its variance
• … and other ‘descriptive statistics’….
• Broadly, we can know/measure/estimate its distribution

• But to begin, we need to look at some of the values of that variable….
Suppose our variable of interest is the Age of
students in ECO 313…
• We can imagine a table:
Observation Age
1 ?
2 ?
3 ?
4 ?
5 ?
6 ?
7 ?
• Suppose:
Observation Age
1 21
2 25
3 24
4 22
5 22
6 24
7 23
And we can calculate some descriptive statistics
• The mean:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

• Eish – what does it mean? It means I sum up the values of X (e.g. the ages), and then divide by the number of observations.
• So 21 + 25 + 24 + 22 + 22 + 24 + 23 = 161
• 161 / 7 = 23 = X̄

• But what does it mean????! MEASURE OF ‘CENTRAL TENDENCY’


• And variance?

$$S_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

Obs   Age   Age – AvgAge   (Age – AvgAge)²
1     21    -2             4
2     25     2             4
3     24     1             1
4     22    -1             1
5     22    -1             1
6     24     1             1
7     23     0             0
Sum                        12

So Var = 12/7 = 1.71; but what does it ‘mean’???! MEASURE OF ‘DISPERSION’
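The two calculations above can be checked with a few lines of Python. This is only a sketch using the seven ages from the table; the variable names are illustrative, and the variance divides by n (as on the slide), not by n − 1.

```python
import statistics

# The seven ages from the table above.
ages = [21, 25, 24, 22, 22, 24, 23]

# Sample mean: sum of the values divided by the number of observations.
mean_age = sum(ages) / len(ages)

# Variance as defined on the slide: the average squared deviation
# from the mean (divide by n, not n - 1).
var_age = sum((x - mean_age) ** 2 for x in ages) / len(ages)

# statistics.pvariance uses the same divide-by-n definition.
assert abs(var_age - statistics.pvariance(ages)) < 1e-12

print(mean_age)           # 23.0
print(round(var_age, 2))  # 1.71
```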


But we can also summarise using a different type of table, i.e. a frequency table

Age   Frequency/count
21    1
22    2
23    1
24    2
25    1

This table is one way of representing the distribution of age. Another way is graphically…
[Figure: Frequency distribution of age]

[Figure: Probability distribution of age]

Now look at them together… Basically the same picture…
How can we use the probability distribution?

Suppose I select someone from the class at random…. What is:
• Prob(Age = 21)?
• Prob(Age = 23)?
• Prob(Age = 20)?

What about Prob(22 ≤ Age ≤ 24)? In other words, the probability that someone selected at random is 22 or 23 or 24 years old?

Prob(22 ≤ Age ≤ 24) = 0.23 + 0.14 + 0.15 = 0.52


What about Prob(21 ≤ Age ≤ 42)?

Prob(21 ≤ Age ≤ 42) = ?

What about Prob(22 ≤ Age ≤ 42)?

Prob(22 ≤ Age ≤ 42) = 0.23 + 0.14 + 0.15 + … = ??


‘Theoretical probability distributions’

Uniform distribution – example 1

We can write Prob(x=heads) = 0.5; but it so happens that we can also write Prob(x=tails) = 0.5
But in what sense is this distribution ‘theoretical’?

Uniform distribution – example 2

We can write Prob(x=1) = Prob(x=2) = … = 1/6

So now what is Prob(3 ≤ x ≤ 5)?

Normal distribution (but we’ll come back to this just now…)

The question of inference
• Suppose you want to know the mean age of every person who
lives in Alice
– Alice is your population, therefore…
– you want to know the population mean, μ (‘mu’)
• There are 15 143 people (supposedly)
• How are you going to do it?
• What are your options?

The question of inference…
• Basically there are three options:
• First, you can collect data from everyone (i.e. conduct a census) so that you can calculate the population mean, μ, directly. We can write:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$$
where N represents the size of the population. Compare this to our formula for the sample mean:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
• Second, we can stop one person on the street and ask him/her his/her age. Problem – how ‘representative’ is one person?
• Third, we can draw a sample, calculate the sample mean, then declare that our sample mean is an estimate of the (unknown) population mean, i.e. X̄ is an estimate of μ.
• (You can see that the so-called second option is a special case of the third option in which n=1.)
• The big hairy question in statistics is, how good an estimate is X̄ of μ?
– However, we don’t yet have the tools to answer that…

[Diagram: the route to the answer]
(1) The standard normal distribution
(2) Standardising a normally distributed variable
(3) Sampling distributions
(4) The Central Limit Theorem
→ Confidence intervals etc.

The normal distribution
• The normal distribution plays a central role in the answer, but we
have to get there through a circuitous route.
• First, the so-called normal distribution is actually a family of
distributions which follow a common distribution function:
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-(X-\mu)^2/(2\sigma^2)}$$
• Don’t worry about the detail or try to ‘understand’ this function. All
it means is that if you have a random variable X which is ‘normally
distributed’ with mean μ and variance σ2, then f(X) tells you the
height of the graph for different possible values of X.
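For anyone who wants to see the function in action, here is a minimal sketch that evaluates f(X) directly. The function name `normal_pdf` and the example inputs are illustrative only, not part of the course material.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Height of the normal curve at x, for mean mu and variance sigma2."""
    sigma = math.sqrt(sigma2)
    exponent = -((x - mu) ** 2) / (2 * sigma2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# Height of the curve at its peak when mu = 0 and sigma^2 = 1.
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 4))  # 0.3989

# The curve is symmetric around the mean.
print(math.isclose(normal_pdf(2.0, 3.0, 14.0), normal_pdf(4.0, 3.0, 14.0)))  # True
```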
Pause: but where does the normal distribution ‘come from’?
[see Lane, n.d. (on BlackBoard)]
• Different explanations, eg the normal distribution “occurs naturally
in many situations”. An early case was measurement errors in
astronomy, which Galileo (17th c.) studied and characterised.
• Abraham de Moivre, an “18th century statistician and consultant to
gamblers”, in effect discovered the idea of the Central Limit Theorem
(see below), and then sought a formula that closely matched what
he observed.
• Independently, mathematicians Adrain and Gauss developed formulae for the normal distribution, in 1808 and 1809, respectively.
• However, why some variables in the world follow a normal distribution is somewhat contentious (see e.g. Lyon, 2014).
Welcome to the family…

• In general we write X ~ ℕ(μ, σ²)
• Which we read as ‘X follows a normal distribution with mean μ and variance σ²’.
• (The convention is that, for any distribution, the Greek letter μ (‘mu’) represents the population mean, and the Greek letter σ (‘sigma’), when squared, represents the population variance.)
• So we can imagine an infinite number of normal distributions by supposing different values of μ and σ², e.g. X₁ ~ ℕ(4, 5), X₂ ~ ℕ(−250, 12), X₃ ~ ℕ(78, 99), etc.

The Standard Normal Distribution
• But there is one member of the family of normal distributions that is
regarded as special, namely the ‘standard normal distribution’, which
is X~ℕ(0, 1), that is, where the mean of X is 0 and the variance is 1.
• The convention is to write Z ~ ℕ(0, 1)

• A few useful facts about normal distributions:
– They are infinite in both directions
– They are symmetric
– The total area ‘under them’ is 1 (as with any probability
distribution, by definition)
• We can apply these to the standard normal distribution…

Prob(-∞ ≤ Z ≤ ∞) = ?      Prob(0 ≤ Z ≤ ∞) = ?
Prob(-∞ ≤ Z ≤ 0) = ?      Prob(0 ≤ Z ≤ 1.51) = ?

Prob(-∞ ≤ Z ≤ ∞) = 1      Prob(0 ≤ Z ≤ ∞) = 0.5
Prob(-∞ ≤ Z ≤ 0) = 0.5    Prob(0 ≤ Z ≤ 1.51) = ?????

• Conceptually, this is similar to what we’re doing when we ask what is Prob(22 ≤ Age ≤ 24)? In effect, we add the ‘areas’ within the distribution so long as they conform to the condition that 22 ≤ Age ≤ 24:

Prob(22 ≤ Age ≤ 24) = 0.23 + 0.14 + 0.15 = 0.52

• Or in respect of the die, what is Prob(3 ≤ x ≤ 5)?

Prob(3 ≤ x ≤ 5) = 1/6 + 1/6 + 1/6 = 3/6 = 0.5
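The ‘add up the areas’ idea for a discrete distribution can be sketched in Python as follows. The helper name `prob_between` is made up for this illustration, and the die is assumed to be fair.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6 (a uniform distribution).
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob_between(dist, low, high):
    """Sum the probabilities of all outcomes x with low <= x <= high."""
    return sum(p for x, p in dist.items() if low <= x <= high)

print(prob_between(die, 3, 5))  # 1/2
print(prob_between(die, 1, 6))  # 1
```

The same helper would work for the Age distribution, given a dictionary that maps each age to its probability.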


• One big difference between these normal distributions and the
distributions presented previously (e.g. Age, coin, die) is that this
assumes a continuous variable whereas those assumed a discrete
variable.
• This creates a weird situation. For a discrete distribution, we can ask
what is the probability that a person chosen at random from a
distribution is aged 22, and write Prob(Age=22) = 0.23.
• But for a continuous distribution such as that of Z, Prob(Z=2) = 0.
Why? What about Prob(Z=2.0000000000000001)? See the problem?
There are an infinite number of values that Z could take on between
any two other numbers; if they each have a positive probability, then
the total probability would necessarily exceed 1. By a lot.
• So in general, for a continuous distribution we can ask what is Prob(0 ≤ G ≤ 0.005), but it is not useful to ask what is Prob(G = 0.005), since the answer is always 0….
Let’s return to the situation below – how do we answer it?

Prob(0 ≤ Z ≤ 1.51) = ?

• Almost every statistics textbook in the world has a probability table for the standard normal distribution. The one we use in this course happens to look as follows:

[Table image: probability table for the standard normal distribution]

In other words, the probability that a std normal variable Z is between 0 and 1.51 is 0.4345.

Prob(0 ≤ Z ≤ 1.51) = 0.4345

• Let’s mess around a bit. If this is true, what about…

Prob(Z ≤ 1.51) = 0.4345 + 0.5 = 0.9345

Prob(Z ≥ 1.51) = 1 – 0.9345 = 0.5 – 0.4345 = 0.0655
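These table look-ups can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

p_0_to_151 = Z.cdf(1.51) - Z.cdf(0.0)  # Prob(0 <= Z <= 1.51)
p_below = Z.cdf(1.51)                  # Prob(Z <= 1.51)
p_above = 1.0 - Z.cdf(1.51)            # Prob(Z >= 1.51)

print(round(p_0_to_151, 4))  # 0.4345
print(round(p_below, 4))     # 0.9345
print(round(p_above, 4))     # 0.0655
```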
Let’s try one more, but the other way around...
• Prob(-k ≤ Z ≤ k) = 0.95; in other words we want to find the value k (and
thus -k) such that there is a 95% chance that a randomly selected value
Z is between -k and k.

• Given how our table is constructed, we change this into an equivalent question:
Prob(0 ≤ Z ≤ k) = 0.95/2 = 0.475

Now we work backwards; we find the value in the table as close as possible to 0.475, and determine the value of k from that, i.e. 1.96

• So we can write: Prob(-1.96 ≤ Z ≤ 1.96) = 0.95
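Working backwards from a probability to a value of k can also be done in software. A minimal sketch, assuming Python's `statistics.NormalDist` (its `inv_cdf` method works from the cumulative probability Prob(Z ≤ k), so we ask for 0.5 + 0.475 = 0.975):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Prob(-k <= Z <= k) = 0.95 leaves 0.025 in each tail,
# so Prob(Z <= k) = 0.975.
k = Z.inv_cdf(0.975)
print(round(k, 2))  # 1.96

# Check: the area between -k and k is 0.95.
area = Z.cdf(k) - Z.cdf(-k)
print(round(area, 4))  # 0.95
```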

‘Standardising’ a normally distributed variable
• Suppose you have a variable X that is normally distributed, X ~ ℕ(3, 14)
• The problem is, we can’t determine Prob(0 ≤ X ≤ 1.42) because there is no probability table for this distribution, nor for X ~ ℕ(−2, 5), nor for X ~ ℕ(301, 7.5), etc., etc.
• However, if we know the mean and variance of X, in principle we can standardise the distribution of X to create a standard normal variable. This is how:
$$\frac{X - \mu_X}{\sigma_X} = Z \sim \mathbb{N}(0, 1)$$
(But what is this σX thing? It’s the ‘standard deviation’ of X = the square root of the variance of X.)
• Standardisation will become very useful to us in a short while….
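As a worked illustration of standardisation, the sketch below converts the two endpoints of Prob(0 ≤ X ≤ 1.42) for X ~ ℕ(3, 14) into Z values and then uses the standard normal distribution only. The helper name `standardise` is illustrative, and `NormalDist` stands in for the printed table.

```python
import math
from statistics import NormalDist

mu_x, var_x = 3.0, 14.0     # X ~ N(3, 14): mean 3, variance 14
sigma_x = math.sqrt(var_x)  # standard deviation = square root of the variance
Z = NormalDist()            # standard normal

def standardise(x):
    """Convert a value of X into the corresponding value of Z."""
    return (x - mu_x) / sigma_x

z_low, z_high = standardise(0.0), standardise(1.42)

# Prob(0 <= X <= 1.42) equals Prob(z_low <= Z <= z_high).
prob = Z.cdf(z_high) - Z.cdf(z_low)

print(round(z_low, 2), round(z_high, 2))  # -0.8 -0.42
print(round(prob, 3))
```

Both Z values are negative because 0 and 1.42 both lie below the mean of 3.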

The Sampling Distribution of X̄
• Apart from being the sample mean, X̄ is a random variable! Why? Because every time I draw a different sample I end up with a different X̄.
• So for a given sample size n, one can imagine that, like X, X̄ has its own distribution. We call this a ‘sampling distribution’.
• What may not be at all obvious is that the sampling distribution may look very different from the distribution of X itself.

• Let’s recall our frequency distribution of ECO 313 students’ Age:

[Figure: frequency distribution of Age]

• What if I draw 5000 samples of size n=5 from this distribution, and calculate an X̄ for each one?

[Figure: Sampling distribution of X̄, n=5]
[Figure: Sampling distribution of X̄, n=15]
[Figure: Sampling distribution of X̄, n=30]

[Figure panels: Original distribution of Age; Sampling distribution of mean Age, n=5; Sampling distribution of mean Age, n=15; Sampling distribution of mean Age, n=30]

What do you observe?

• The larger the sample size, the less the sampling distribution of X̄ resembles the original distribution of Age, and the more it resembles … (wait for it…) a normal distribution.
• What is not so visible from these graphs is that the variance of the sampling distribution declines as the sample size goes up:

Sample size (n)   Var(X̄)
5                 2.31
15                0.61
30                0.22

These two features help explain why, if you can afford it, you would prefer a larger sample size!
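The experiment described above (draw many samples, compute X̄ for each, look at the spread of the X̄’s) can be sketched as a small simulation. Note the assumptions: the class data behind the graphs are not reproduced in these notes, so the seven ages from the earlier table serve as a stand-in population, and samples are drawn with replacement. The variances printed will therefore differ from those in the table above; only the pattern (variance falling as n rises) is the point.

```python
import random
import statistics

random.seed(313)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in population: the seven ages from the earlier table.
population = [21, 25, 24, 22, 22, 24, 23]

def sampling_variance(n, draws=5000):
    """Draw `draws` samples of size n (with replacement), compute the mean
    of each sample, and return the variance of those sample means."""
    means = [statistics.fmean(random.choices(population, k=n))
             for _ in range(draws)]
    return statistics.pvariance(means)

results = {n: sampling_variance(n) for n in (5, 15, 30)}

pop_var = statistics.pvariance(population)
for n, v in results.items():
    # Compare the simulated variance of the sample mean with pop_var / n.
    print(n, round(v, 3), round(pop_var / n, 3))
```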
The Central Limit Theorem
• As the sample size increases, the distribution of the sample mean (X̄) approaches (‘→’) the normal distribution with mean μX and variance σX²/n. Or,
$$\bar{X} \to \mathbb{N}\left(\mu_X, \frac{\sigma_X^2}{n}\right)$$
• [Hmm, why all this fuss about the sample mean? Because the same principles apply to econometric estimates, which we’ll get to soon…]
• [For a proof of the CLT, see Filmus, 2010 (on BlackBoard).]

That’s nice, but recall, we only have probability tables for
the standard normal distribution!
• True, but just as we standardised a normally distributed
variable, we can standardise an almost-normally distributed
sample mean.
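• Perform the same trick as before – subtract the mean (in this case of X̄), and divide by the standard deviation (again, of X̄):
$$\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_X}{\sigma_X / \sqrt{n}} = (\bar{X} - \mu_X) \cdot \frac{\sqrt{n}}{\sigma_X} = \bar{Z} \to \mathbb{N}(0, 1)$$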
• Question: the previous slide implied that μX̄ = μX, but σX̄ ≠ σX; what’s going on?
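One possible line of reasoning is sketched below. It rests on assumptions not stated on the slides: that X₁, …, Xₙ are independent draws, each with mean μX and variance σX², that the mean of a sum is the sum of the means, and that the variance of a sum of independent variables is the sum of their variances. It also uses the scaling rule for the variance of aX from the aside at the end of these slides, with a = 1/n.

```latex
\begin{aligned}
\mu_{\bar{X}} &= E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
  = \frac{1}{n}\cdot n\,\mu_X = \mu_X \\
\sigma_{\bar{X}}^2 &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
  = \frac{1}{n^2}\cdot n\,\sigma_X^2 = \frac{\sigma_X^2}{n} \\
\sigma_{\bar{X}} &= \frac{\sigma_X}{\sqrt{n}}
\end{aligned}
```

So averaging leaves the centre of the distribution where it was but shrinks its spread by a factor of √n.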

Putting it all together – Confidence Intervals!
• Recall that Prob(-1.96 ≤ Z ≤ 1.96) = 0.95
• This holds true approximately for our Z̄ because our Z̄ only approximates the standard normal distribution, so:
0.95 ≈ Prob(-1.96 ≤ Z̄ ≤ 1.96)
• Now, let’s substitute:
$$0.95 \approx \text{Prob}\left(-1.96 \le (\bar{X} - \mu_X) \cdot \frac{\sqrt{n}}{\sigma_X} \le 1.96\right)$$

• And now we simplify:
$$\begin{aligned}
0.95 &\approx \text{Prob}\left(-1.96 \le (\bar{X} - \mu_X) \cdot \frac{\sqrt{n}}{\sigma_X} \le 1.96\right) \\
&= \text{Prob}\left(-1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le \bar{X} - \mu_X \le 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&= \text{Prob}\left(-\bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le -\mu_X \le -\bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&= \text{Prob}\left(\bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \ge \mu_X \ge \bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&= \text{Prob}\left(\bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le \mu_X \le \bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \\
&\approx 0.95
\end{aligned}$$
But please note: in practice you don’t need to do this derivation over and over again. All you need to do is apply the formula at the end.
• But what does it mean?
$$\text{Prob}\left(\bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}} \le \mu_X \le \bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\right) \approx 0.95$$

• It means that we are approximately 95% certain that the true, unknown population mean is within the interval indicated, i.e.
$$\left[\ \bar{X} - 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\ ,\quad \bar{X} + 1.96 \cdot \frac{\sigma_X}{\sqrt{n}}\ \right]$$

Approx 95% confident that μX is here

Finally! This is how we answer the question of how good X̄ is as an estimate of μX! This also helps explain why you would tend to want a large n…. (How so?)
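Applying the formula is mechanical, as the following sketch suggests. The function name `confidence_interval` and the inputs (a sample mean of 23 and a population standard deviation of 3.4) are hypothetical; the sketch also treats σX as known, which is what the formula above assumes.

```python
import math
from statistics import NormalDist

def confidence_interval(x_bar, sigma_x, n, level=0.95):
    """Approximate confidence interval for the population mean,
    treating the population standard deviation sigma_x as known."""
    k = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = k * sigma_x / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs: same sample mean and sigma, two sample sizes.
for n in (30, 120):
    low, high = confidence_interval(23.0, 3.4, n)
    print(n, round(low, 2), round(high, 2))
```

Because n enters through √n in the denominator, quadrupling the sample size halves the width of the interval, which is one way to answer the ‘How so?’ above.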
Question: how do we know if our X̄’s & Z̄’s approach a normal distribution ‘enough’?
• Generally we do not know; remember, we usually don’t observe
the sampling distribution, we simply know that our particular
sample must come from such a distribution.
• There is a common rule of thumb in statistics that a minimum
sample size should be n = 30, implying that this is enough for the
benefits of the Central Limit Theorem to kick in; but this does not
have a solid theoretical basis.
• However, when we get to econometrics, we will discuss a couple of
ways of determining whether the sampling distribution is
adequately close to ‘normal’.

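Brief aside for future use…
• Question 1: If ‘a’ is a constant, and μX is the mean of X, then what is the mean of aX?
• Answer:
$$\begin{aligned}
\mu_{aX} &= \frac{1}{N} \sum_{i=1}^{N} aX_i = \frac{1}{N}\left(aX_1 + aX_2 + \cdots + aX_N\right) \\
&= \frac{1}{N}\, a \left(X_1 + X_2 + \cdots + X_N\right) \\
&= a \cdot \frac{1}{N} \sum_{i=1}^{N} X_i = a\mu_X
\end{aligned}$$
(Like duh!)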
• Question 2: If ‘a’ is a constant, and σX² is the variance of X, then what is the variance of aX?
• Answer:
Define
$$\sigma_{aX}^2 = \frac{1}{N} \sum_{i=1}^{N} (aX_i - a\mu)^2$$
Note:
$$(aX_1 - a\mu)^2 = a^2 X_1^2 + a^2 \mu^2 - 2a^2 X_1 \mu = a^2 (X_1 - \mu)^2$$
So
$$\sigma_{aX}^2 = \frac{1}{N} \sum_{i=1}^{N} a^2 (X_i - \mu)^2 = \frac{a^2}{N} \sum_{i=1}^{N} (X_i - \mu)^2 = a^2 \sigma_X^2$$
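Both results in this aside can be checked numerically. The sketch below reuses the seven ages from the start of these notes and an arbitrary constant a = 3; neither choice is special.

```python
import math
import statistics

ages = [21, 25, 24, 22, 22, 24, 23]
a = 3  # hypothetical constant

scaled = [a * x for x in ages]

# Mean of aX should equal a times the mean of X.
mean_ok = math.isclose(statistics.fmean(scaled), a * statistics.fmean(ages))

# Variance of aX should equal a squared times the variance of X
# (pvariance divides by N, matching the definition used above).
var_ok = math.isclose(statistics.pvariance(scaled),
                      a ** 2 * statistics.pvariance(ages))

print(mean_ok, var_ok)  # True True
```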
