002 Probability-and-Statistics-Part-4-Statistics
PART 4 - STATISTICS
Statistics
What is Statistics?
● A parameter is a characteristic of a
population. Often we want to understand
parameters.
● A statistic is a characteristic of a sample.
Often we apply statistical inferences to the
sample in an attempt to describe the
population.
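A minimal sketch of the parameter/statistic distinction, using made-up data. The population values, the seed, and the sample size of 100 are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 synthetic heights in cm (made-up data).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a characteristic computed from every member of the population.
population_mean = statistics.fmean(population)

# Statistic: the same characteristic computed from a random sample.
sample = random.sample(population, 100)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually land near the population mean without matching it exactly, which is why inference is needed to describe the population from the sample.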
Variable
Selection Bias
Undercoverage Bias: making too few
observations or omitting entire
segments of a population
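Undercoverage can be illustrated with a small simulation. This is only a sketch: the two segments, their sizes, and their averages are invented for the example.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population with two segments (made-up numbers):
# segment A averages about 40, segment B averages about 60.
segment_a = [random.gauss(40, 5) for _ in range(7_000)]
segment_b = [random.gauss(60, 5) for _ in range(3_000)]
population = segment_a + segment_b
true_mean = statistics.fmean(population)

# Undercoverage: the sampling frame omits segment B entirely.
biased_sample = random.sample(segment_a, 200)
biased_mean = statistics.fmean(biased_sample)

# For comparison, a sample drawn from the whole population.
fair_sample = random.sample(population, 200)
fair_mean = statistics.fmean(fair_sample)

print(f"true mean:   {true_mean:.1f}")
print(f"fair sample: {fair_mean:.1f}")
print(f"undercovered sample: {biased_mean:.1f}")
```

Taking more observations from segment A alone would not fix the estimate, because the omitted segment never has a chance to be observed.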
Sampling Bias
Self-selection Bias: people who
volunteer may differ significantly from
those in the population who don’t
Healthy-user Bias: the sample may come
from a healthier segment of the overall
population – people who walk/jog, work
outside, follow healthier behaviors, etc.
Survivorship Bias
If a population appears to improve over time,
it may be because weaker members left
the population through death, expulsion,
relocation, etc., not because anyone improved.
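A short simulation can make the survivorship effect concrete. The cohort, its scores, and the rule that the 200 lowest scorers leave are all hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical cohort: 1,000 members, each with a fixed score (made-up data).
scores = [random.gauss(50, 10) for _ in range(1_000)]
mean_before = statistics.fmean(scores)

# Assumption: the 200 lowest-scoring members leave the population.
survivors = sorted(scores)[200:]
mean_after = statistics.fmean(survivors)

# No individual score changed, yet the population average went up.
print(f"mean before: {mean_before:.1f}")
print(f"mean after:  {mean_after:.1f}")
```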
A Classic Puzzle
● At the start of World War I, British soldiers
wore cloth caps.
● The war office became alarmed at the high number
of head injuries, so they issued metal helmets
to all soldiers.
A Classic Puzzle
● Random
● Stratified Random
● Cluster
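The three sampling methods listed above can be sketched as follows. The population of member IDs and the four groups are hypothetical, and the same grouping is reused as both strata and clusters only to keep the example short.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 (member_id, group) pairs in 4 groups.
population = [(member_id, member_id % 4) for member_id in range(1_000)]

# Random sampling: every member has the same chance of being selected.
simple_sample = random.sample(population, 100)

# Stratified random sampling: split the population into strata,
# then draw a random sample from within each stratum.
strata = {}
for member in population:
    strata.setdefault(member[1], []).append(member)
stratified_sample = []
for group_members in strata.values():
    stratified_sample.extend(random.sample(group_members, 25))

# Cluster sampling: randomly choose whole groups, keep all their members.
chosen_clusters = random.sample(sorted(strata), 2)
cluster_sample = [m for m in population if m[1] in chosen_clusters]

print(len(simple_sample), len(stratified_sample), len(cluster_sample))
```

Stratified sampling guarantees that every group is represented, while cluster sampling observes only the chosen groups but observes them completely.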
Random Sampling
stratum
strata
Stratified Random Sampling Example
Central Limit Theorem
Population mean 𝜇 = 3.5
As we collect multiple samples, each sample mean x̄
will fall somewhere close to the population mean.
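A small simulation of this behavior, assuming the population is the roll of a fair six-sided die (an assumption, chosen because its mean is 3.5). The sample size of 30 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is repeatable

def sample_mean(n_rolls):
    """Mean of n_rolls rolls of a fair six-sided die."""
    return statistics.fmean([random.randint(1, 6) for _ in range(n_rolls)])

# Collect many samples and record the mean of each one.
sample_means = [sample_mean(30) for _ in range(2_000)]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

# Theory: one roll has standard deviation sqrt(35/12), about 1.708,
# so the mean of 30 rolls has standard deviation about 1.708 / sqrt(30).
print(f"mean of sample means:   {mean_of_means:.3f}")
print(f"spread of sample means: {spread_of_means:.3f}")
```

Although single rolls are uniformly distributed, the sample means pile up around 3.5 in a roughly bell-shaped pattern.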
Proof of CLT Available on Wikipedia
For those who are curious, the full proof of
the Central Limit Theorem is available at
https://en.wikipedia.org/wiki/Central_limit_theorem
Standard Error
POPULATION = 10,000
𝑁 = # population members
𝑃 = population parameter
𝜎 = pop. standard deviation
SAMPLE = 100
𝑛 = # sample members
p̂ = sample statistic
SE_p̂ = standard error of the
sample statistic
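As a worked illustration, the standard error of a sample proportion is commonly computed as sqrt(p̂(1 − p̂)/n). The slide does not state this formula, so treat it as an assumption about which statistic is meant; the count of 55 is hypothetical.

```python
import math

N = 10_000  # population size, from the slide
n = 100     # sample size, from the slide

# Hypothetical result: 55 of the 100 sampled members have the trait.
successes = 55
p_hat = successes / n

# Standard error of the sample proportion.
# (With n much smaller than N, the finite-population correction is ignored.)
se_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"p_hat = {p_hat:.2f}, SE = {se_p_hat:.4f}")
```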
● If for the population of Australia,
[Figure: confidence interval spanning p̂ − 2·SE_p̂ to p̂ + 2·SE_p̂, with P marked]
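The interval of two standard errors on either side of p̂ can be sketched as below. The helper name `proportion_interval` and the sample counts are hypothetical; the multiplier of 2 follows the figure and roughly corresponds to 95% confidence.

```python
import math

def proportion_interval(successes, n, multiplier=2.0):
    """Return (low, high) = p_hat -/+ multiplier * SE for a sample proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - multiplier * se, p_hat + multiplier * se

# Hypothetical sample: 55 of 100 members have the trait.
low, high = proportion_interval(55, 100)
print(f"interval: ({low:.3f}, {high:.3f})")
```

If the population parameter P really lies inside intervals built this way about 95% of the time, a single interval gives a plausible range for P rather than a single guess.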
Point Estimators
𝐻1: ≠ null
Critical values: 𝑍 = −1.96 and 𝑍 = 1.96, with area 0.025 in each tail
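The values ±1.96 come from splitting a significance level of 0.05 across both tails of the standard normal distribution. A quick check with the standard library (the function name `reject_null` is hypothetical):

```python
from statistics import NormalDist

alpha = 0.05  # total significance level, 0.025 in each tail

# Critical value that leaves alpha / 2 in the upper tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(z, critical=z_critical):
    """Two-tailed decision: reject when z falls beyond either critical value."""
    return abs(z) > critical

print(f"critical values: -{z_critical:.2f} and {z_critical:.2f}")
print(reject_null(2.3), reject_null(-1.2))
```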
Tests of Mean vs. Proportion
● Mean
when we look for an average or a
specific value in a population, we are
dealing with means
● Proportion
whenever we say something like "35%" or
"most", we are dealing with proportions
Test Statistics
In a P-value test:
● take the test statistic
● use it to determine the P-value
● compare the P-value to the
level of significance 𝛼
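The three steps above can be sketched for a one-sample test of a proportion. The null value of 0.50, the sample counts, and the choice of a two-sided z-test are assumptions made for this example.

```python
import math
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a result at least as extreme as z in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical setup: H0 says P = 0.50; we observe 60 of 100.
p0, n, successes = 0.50, 100, 60
p_hat = successes / n

# Step 1: take the test statistic.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Step 2: use it to determine the P-value.
p_value = two_sided_p_value(z)

# Step 3: compare the P-value to the level of significance.
alpha = 0.05
reject = p_value < alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```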
Hypothesis Testing – P-value Test
● One-sample t-test
Tests the null hypothesis that the
population mean is equal to a specified
value 𝜇 based on a sample mean x̄
Types of Student’s t-test
(repeated measurements)
o two samples have been matched or
"paired"
One-Sample Student’s t-test
● Compare to a t-score
𝑡 ≶ t_{n−1, α}
𝑡 = t-statistic
t_{n−1, α} = t-critical
𝑛 − 1 = degrees of freedom
𝛼 = significance level
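A sketch of the comparison using the usual one-sample statistic t = (x̄ − 𝜇) / (s / √n). The data and the helper name `one_sample_t` are hypothetical, and the critical value 2.262 is the standard table entry for a two-tailed test with 9 degrees of freedom at a significance level of 0.05.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t-statistic, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical measurements, tested against a specified mean of 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5, 5.2, 5.9]
t_stat, df = one_sample_t(sample, mu0=5.0)

# t-critical for df = 9, two-tailed, alpha = 0.05 (from a t-table).
t_critical = 2.262
reject = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```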
Independent Two-Sample t-test
variance
df = n₁ + n₂ − 2
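The degrees of freedom n₁ + n₂ − 2 match the equal-variance (pooled) form of the test, so the sketch below assumes that form. The two groups and the helper name `pooled_t` are hypothetical.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Equal-variance two-sample t-test: return (t-statistic, df)."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variances (n - 1 divisor)
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their own df.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / se
    return t, df

# Hypothetical measurements from two independent groups.
group_a = [20, 22, 19, 24, 25]
group_b = [28, 27, 30, 26, 29]
t_stat, df = pooled_t(group_a, group_b)

print(f"t = {t_stat:.3f}, df = {df}")
```

When the two groups clearly have unequal variances, a different form of the test (with a different df) is normally used instead.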
Student’s t-Distribution
Z-Distribution
t-Distribution
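To compare the two curves numerically, the sketch below evaluates the standard density formulas for both distributions. The function name `t_pdf` and the chosen degrees of freedom are illustrative.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow in the gamma function for large df.
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

z_pdf = NormalDist().pdf  # density of the standard normal (Z) distribution

for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: center {t_pdf(0, df):.4f}, tail at 3: {t_pdf(3, df):.5f}")
print(f"Z      : center {z_pdf(0):.4f}, tail at 3: {z_pdf(3):.5f}")
```

With few degrees of freedom the t-distribution is lower in the center and heavier in the tails than Z; as the degrees of freedom grow, the two curves become nearly identical.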
Example Exercise
Student’s t-test Example