Summary Week 2
SAMPLING
CONFIDENCE INTERVALS FOR µ
Ana Amaro
TOPICS
• SAMPLING (our environment) vs CENSUS
• A - Technical language
• B - Confidence Intervals
A - TECHNICAL LANGUAGE
some… a recap from Statistics I or from your bachelor's degree
1. Data/Variable types
2. Random variables
3. Distribution
4. Statistical Population vs Sample
5. Sample size (n) vs Statistical Population size (N or infinite)
6. Parameters and Estimates
7. Estimators
1. VARIABLE TYPES
• Metric OR Quantitative
• measured with numbers
• Continuous
• they have an infinite number of values between any two values
• e.g. Weight
• Discrete
• they are obtained by counting
• e.g. # siblings
• Categorical OR Qualitative
• measured with two or more categories
• Nominal
• the categories have no intrinsic ordering
• e.g. Gender
• Ordinal
• The categories are intrinsically ordered
• e.g. Perception
2. RANDOM VARIABLES
• A random variable is a variable whose ‘next value’ is unknown
• e.g.
• the weight of a person is a random variable (we could name it Weight or even X)
• when you step onto the scale you are not 100% sure of the outcome
• The size of the sample (n) is always less than the size of the statistical population (N or infinite).
5. PARAMETERS AND ESTIMATES
(FOR A SPECIFIC RANDOM VARIABLE)
• PARAMETERS
• statistical population characteristics of interest
• e.g.
• the mean weight of European Women (µ)
• the standard deviation of weight of European Women (σ)
• the proportion of Women in Europe (π)
• ESTIMATES
• values computed from a sample to approximate the parameters
• e.g. the mean computed from the sample (the sample mean) is an estimate of the statistical population mean
6. ESTIMATORS
• Rules for calculating an estimate of a given parameter
• e.g.
• $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
• $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
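The two rules above can be sketched in Python. This is a minimal illustration, not course material: the function names `sample_mean` and `sample_variance` and the five weights are made up for the example.

```python
def sample_mean(xs):
    """Rule for X-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Rule for S^2: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


# Hypothetical sample of five weights in kg (made-up values).
weights = [62.0, 58.5, 71.0, 66.5, 60.0]
print(sample_mean(weights))      # the estimate x-bar, 63.6
print(sample_variance(weights))  # the estimate s^2, about 26.175
```

Applying the rule (the estimator) to one concrete sample gives one number (the estimate); a different sample would give a different number.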
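SAMPLING from X
Parameters
• $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
• $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Estimators (Random Variables!!!)
• $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
• $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Estimates
• $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$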
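To see why estimators are random variables while parameters are fixed numbers, here is a small simulation sketch. The population of 1000 weights is simulated, and the values 65 and 10 are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical finite statistical population of N = 1000 weights (kg).
population = [random.gauss(65, 10) for _ in range(1000)]

# Parameters: computed once from the whole population (variance divides by N).
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)


def estimates(n):
    """Draw one random sample of size n and return (x-bar, s^2)."""
    sample = random.sample(population, n)  # sampling without replacement
    # statistics.variance divides by n - 1, matching the s^2 formula.
    return statistics.fmean(sample), statistics.variance(sample)


first = estimates(50)
second = estimates(50)
print(mu, sigma2)  # fixed values for this population
print(first)       # estimates from one sample
print(second)      # estimates from another sample: different numbers
```

The parameters do not change between runs of `estimates`, but each new sample produces new estimates, which is what makes the estimator itself a random variable.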
B - CONFIDENCE INTERVALS FOR µ
1. Confidence level vs level of significance
2. Sampling distributions: Normal / t-Student
3. Central Limit Theorem
4. Confidence Intervals
a. standard error of the mean (for infinite and finite statistical population)
b. margin of error
c. lower confidence limit (LCL) and upper confidence limit (UCL)
d. width/range
1. CONFIDENCE LEVEL (1-α)x100% VS LEVEL OF SIGNIFICANCE 0 < α < 1
• We will compute a RANGE of numbers where µ CAN most likely fit
• BUT we will not be sure…
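One way to read "most likely, but not sure" is to repeat the sampling many times and count how often the computed range contains µ. The sketch below assumes a Normal population with known σ and the usual z-based interval; the values of `MU`, `SIGMA` and `N_SAMPLE` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the sketch is repeatable

MU, SIGMA, N_SAMPLE = 65.0, 10.0, 40  # hypothetical mean, sd and sample size
CONFIDENCE = 0.95                     # confidence level (1 - alpha)
ALPHA = 1 - CONFIDENCE                # level of significance

# Half-width of the range, using the standard Normal quantile.
z = NormalDist().inv_cdf(1 - ALPHA / 2)
margin = z * SIGMA / math.sqrt(N_SAMPLE)

trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(MU, SIGMA) for _ in range(N_SAMPLE)]
    x_bar = fmean(sample)
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1  # this sample's range contains the true mean

coverage = hits / trials
print(coverage)  # expected to land near CONFIDENCE, not exactly on it
```

Roughly (1-α)x100% of the ranges contain µ and about αx100% miss it, which is why a single interval is never a certainty.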
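3. CENTRAL LIMIT THEOREM
• we SAMPLE from X
• If n is at least 30 then $\bar{X}$ is almost $N\left(\mu;\ \frac{\sigma}{\sqrt{n}}\right)$
• otherwise, unless X itself is Normally distributed, nothing can be said about the shape of the $\bar{X}$ distribution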
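The statement can be checked with a short simulation. As an assumption for the example, X is taken to be exponential with rate 1 (clearly not Normal, with mean 1 and standard deviation 1), and n = 30.

```python
import math
import random
from statistics import fmean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

N_SAMPLE = 30       # sample size n
REPETITIONS = 5000  # number of independent samples drawn
MU = SIGMA = 1.0    # an exponential with rate 1 has mean 1 and sd 1

# One sample mean per repetition.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(N_SAMPLE)])
    for _ in range(REPETITIONS)
]

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)
print(mean_of_means, MU)                         # close to mu
print(sd_of_means, SIGMA / math.sqrt(N_SAMPLE))  # close to sigma / sqrt(n)
```

The sample means cluster around µ with a spread close to σ/√n, even though the individual observations are far from Normal.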
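4. CONFIDENCE INTERVALS
mandatory: $\bar{X} \sim N$
• Standard Error of the mean = Standard Deviation of $\bar{X}$
• $\bar{X} \sim N\left(\mu;\ \frac{\sigma}{\sqrt{n}}\right)$
• $\bar{X} \sim N\left(\mu;\ \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}\right)$ for finite statistical populations
• Margin of Error
• $ME = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
• Estimate: $ME = t_{n-1;\alpha/2}\,\frac{s}{\sqrt{n}}$
• $z_{\alpha/2}$ and $t_{n-1;\alpha/2}$: the reliability factor
• LCL = $\bar{x}$ − ME and UCL = $\bar{x}$ + ME
• Range = UCL − LCL = 2ME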
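The formulas above can be put together in a few lines. This is a sketch under stated assumptions: the helper names `standard_error` and `confidence_interval` and the numbers (x̄ = 63.6, σ = 10, n = 100) are hypothetical, and the reliability factor is passed in by the caller so that either a z-score or a t-score can be used.

```python
import math
from statistics import NormalDist


def standard_error(sd, n, population_size=None):
    """sd / sqrt(n), times sqrt((N - n) / (N - 1)) when N is finite."""
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


def confidence_interval(x_bar, sd, n, reliability_factor, population_size=None):
    """Return (LCL, UCL, ME) for the mean."""
    me = reliability_factor * standard_error(sd, n, population_size)
    return x_bar - me, x_bar + me, me


# z-based reliability factor for a 95% confidence level (alpha = 0.05).
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical numbers: x-bar = 63.6 kg, sigma = 10 kg (known), n = 100.
lcl, ucl, me = confidence_interval(63.6, 10.0, 100, z)
print(round(z, 2), round(me, 2), round(lcl, 2), round(ucl, 2))
# prints: 1.96 1.96 61.64 65.56
```

For a t-based interval, pass s as `sd` and the value of t with n−1 degrees of freedom (for example, read from a t-Student table) as `reliability_factor`.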
A summary to support the Margin of Error computation
So IF the GOAL is to compute a Confidence Interval for μ (the mean value of X), with a specific confidence level, I need to compute the Margin of Error.
I NEED a SAMPLE (randomly selected from the statistical population of the X random variable) and with the sample I will compute
• the mean value, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• the variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ (and its square root, s)
these are estimates of the parameters μ, σ² and σ, respectively
• if X is Normally distributed
• $\bar{X}$ is also Normally distributed
• if σ is known (not likely to occur) I will use a z-score
• if σ is UNknown I will use s instead and a t-score
• mandatory for small samples (n<30)
• for big samples, as the shape of a t-Student with many degrees of freedom is close to a Standard Normal, I CAN use a z-score (as it is simpler) instead of the t-score
• if X is NOT Normally distributed (e.g. a Bernoulli or something else)
• WE NEED a BIG SAMPLE (n≥30) so that the Central Limit Theorem applies:
$\bar{X}$ is approximately Normally distributed (this is enough to proceed!)
• if σ is known (again, not likely to occur) I will use a z-score
• if σ is UNknown I will also use a z-score (with s in place of σ)
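The decision rules in this summary can be written as one small function. It is only a sketch of the logic as stated here: the name `choose_score` is hypothetical, and "big sample" is taken to mean n of at least 30.

```python
def choose_score(x_is_normal, sigma_known, n):
    """Return 'z', 't', or None when the summary gives no way to proceed."""
    big_sample = n >= 30  # assumed threshold for a "big" sample

    if x_is_normal:
        if sigma_known:
            return "z"
        # sigma unknown: t-score is mandatory for small samples; for big
        # samples the t-score stays valid but a z-score is a simpler choice.
        return "z" if big_sample else "t"

    # X not Normal: the Central Limit Theorem needs a big sample.
    if big_sample:
        return "z"  # with sigma if known, otherwise with s in its place
    return None


print(choose_score(x_is_normal=True, sigma_known=False, n=12))    # t
print(choose_score(x_is_normal=True, sigma_known=False, n=200))   # z
print(choose_score(x_is_normal=False, sigma_known=False, n=200))  # z
print(choose_score(x_is_normal=False, sigma_known=True, n=12))    # None
```

The `None` case is the one where X is not Normal and the sample is small, so neither the Normal assumption nor the Central Limit Theorem supports the interval.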