
Summary Week 2

The document covers key concepts in sampling and confidence intervals, including definitions of data types, random variables, and the distinction between statistical populations and samples. It explains the importance of parameters and estimates, along with the rules for calculating estimates (estimators). Additionally, it discusses confidence intervals for the mean, including confidence levels, sampling distributions, and the Central Limit Theorem.

Uploaded by

manelpaismelo

WEEK 2

SAMPLING
CONFIDENCE INTERVALS FOR µ
Ana Amaro
TOPICS
• SAMPLING (our environment) vs CENSUS
• A - Technical language
• B - Confidence Intervals
A - TECHNICAL LANGUAGE
A recap of some concepts from Statistics I or a bachelor's course
1. Data/Variable types
2. Random variables
3. Distribution
4. Statistical Population vs Sample
5. Sample size (n) vs Statistical Population size (N or infinite)
6. Parameters and Estimates
7. Estimators
1. VARIABLE TYPES
• Metric OR Quantitative
  • measured with numbers
  • Continuous
    • they have an infinite number of values between any two values
    • e.g. Weight
  • Discrete
    • they are obtained by counting
    • e.g. number of siblings

• Categorical OR Qualitative
  • measured with two or more categories
  • Nominal
    • the categories have no intrinsic ordering
    • e.g. Gender
  • Ordinal
    • the categories are intrinsically ordered
    • e.g. Perception
2. RANDOM VARIABLES
• A random variable is a
  • variable whose ‘next value’ is unknown
    • e.g. the weight of a person is a random variable (we could name it Weight or even X)
    • when you step onto the scale you are not 100% sure of the outcome

  • function that assigns a value to each of an experiment's outcomes (the outcome is also unknown)
    • e.g. X = f(the outcome of tossing a coin)
    • If you get “head” then X=1
    • If you get “tail” then X=2
3. DISTRIBUTION
(FOR A SPECIFIC RANDOM VARIABLE)
• is a function or a listing that shows the random variable's
  • possible values and
  • how often they occur.
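As an illustration of "a listing of possible values and how often they occur", here is a minimal Python sketch. The random variable (number of heads in two tosses), the fair-coin assumption, and the function name `distribution_of_heads` are assumptions made for this example, not part of the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution_of_heads(n_tosses):
    """List each possible number of heads with its probability.

    Assumes a fair coin, so every sequence of tosses is equally likely.
    """
    outcomes = list(product("HT", repeat=n_tosses))  # all toss sequences
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {value: Fraction(count, total) for value, count in sorted(counts.items())}


dist = distribution_of_heads(2)
for value, probability in dist.items():
    print(value, probability)
```

For two tosses the listing is 0 with probability 1/4, 1 with probability 1/2 and 2 with probability 1/4; the probabilities add up to 1.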
4. STATISTICAL POPULATION VS SAMPLE
• The Statistical Population is the entire set of values that a specific RANDOM VARIABLE can assume;
  • it is the entire pool of values from which the sample is drawn

• The Sample is a subset of the Statistical Population


• The Sample is selected RANDOMLY from the Statistical Population.

• The size of the sample (n) is always less than the size of the statistical population (N or infinite).
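Random selection from a finite statistical population could be sketched as follows. The population values, the helper name `draw_sample` and the seed are hypothetical and serve only to make the example reproducible.

```python
import random


def draw_sample(population, n, seed=None):
    """Draw a simple random sample of size n, without replacement."""
    if not 0 < n < len(population):
        raise ValueError("n must be positive and smaller than the population size N")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical finite statistical population with N = 1000 values.
population = list(range(1, 1001))
sample = draw_sample(population, n=30, seed=42)
print(len(sample), "values drawn out of", len(population))
```

Every value in the sample comes from the population, and n = 30 is smaller than N = 1000, in line with the definitions above.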
5. PARAMETERS AND ESTIMATES
(FOR A SPECIFIC RANDOM VARIABLE)
• PARAMETERS
  • statistical population characteristics of interest
  • e.g.
    • the mean weight of European Women (µ)
    • the standard deviation of the weight of European Women (σ)
    • the proportion of Women in Europe (π)

• ESTIMATES (SAMPLE STATISTICS)
  • the corresponding characteristics of interest computed with the SAMPLE
    • the mean weight of the European Women in the sample (x̄)
    • the standard deviation of the weight of the European Women in the sample (s)
    • the proportion of Women in the sample (p)

e.g. the mean computed with the sample (x̄) is an estimate of the statistical population mean (µ)
6. ESTIMATORS
• Rules for calculating an estimate of a given parameter
  • e.g.
    • $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
    • $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

• These Rules are usually described mathematically BUT they can be described in regular language
• They are RANDOM VARIABLES
IN SHORT AND CONSIDERING X

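• Parameters (statistical population of X)
  • $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
  • $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

• SAMPLING from X leads to

• Estimators (Random Variables!!!)
  • $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
  • $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

• Estimates
  • $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$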
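Applying the estimators to one observed sample yields the estimates. The sketch below assumes a small hypothetical sample of weights (in kg); the function names are illustrative only.

```python
import math


def sample_mean(xs):
    """Estimate of the population mean: x_bar = (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Estimate of the population variance, dividing by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


weights = [60, 62, 64, 66, 68]  # hypothetical sample, n = 5
x_bar = sample_mean(weights)
s2 = sample_variance(weights)
s = math.sqrt(s2)
print(x_bar, s2, round(s, 3))
```

For this sample the estimates are x̄ = 64 and s² = 10, so s is about 3.162. A different random sample would give different estimates, which is why the estimators are random variables.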
B - CONFIDENCE INTERVALS FOR µ
1. Confidence level vs level of significance
2. Sampling distributions: Normal / t-Student
3. Central Limit Theorem
4. Confidence Intervals
a. standard error of the mean (for infinite and finite statistical population)
b. margin of error
c. lower confidence limit (LCL) and upper confidence limit (UCL)
d. width/range

Estimates
• $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
1. CONFIDENCE LEVEL (1−α)×100% VS LEVEL OF SIGNIFICANCE α (0 ≤ α ≤ 1)
• We will compute a RANGE of numbers where µ most likely lies
• BUT we will not be sure…

• That “RANGE” of values will be called the CONFIDENCE INTERVAL for µ
• A “trust” measure will need to be assigned to this RANGE
  • We will call it CONFIDENCE LEVEL (from 0% to 100%)
  • A good one: above 90% (below 90% is poor)
  • 95% is the most common, for a number of reasons

• Let us describe the level of confidence as (1−α)×100%
  • α is a probability (the level of significance)
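The link between a confidence level and α can be sketched in a few lines of Python. The function names are assumptions for this example. The second function looks up the Standard Normal value that leaves α/2 in each tail, which is the reliability factor z used in the confidence intervals section.

```python
from statistics import NormalDist


def significance_level(confidence_level_pct):
    """Convert a confidence level such as 95 (%) into alpha such as 0.05."""
    if not 0 < confidence_level_pct < 100:
        raise ValueError("confidence level must be strictly between 0% and 100%")
    return 1 - confidence_level_pct / 100


def z_critical(confidence_level_pct):
    """Standard Normal value that leaves alpha/2 in each tail."""
    alpha = significance_level(confidence_level_pct)
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (90, 95, 99):
    print(level, round(significance_level(level), 2), round(z_critical(level), 3))
```

A 95% confidence level corresponds to α = 0.05 and a z value of about 1.96.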
2. SAMPLING DISTRIBUTIONS
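• X distribution can be Normal OR not Normal
  • in either case X has a mean value (µ_X) and a standard deviation (σ_X)

• $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is our NEW random variable with
  • $\mu_{\bar{X}} = \mu_X$ and
  • $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$

• and we need it to be NORMAL: $\bar{X} \sim N\left(\mu; \frac{\sigma}{\sqrt{n}}\right)$
  • $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1)$
  • $\frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}$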
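A short derivation shows where the mean and standard deviation of the sample mean come from. It assumes the observations X_1, ..., X_n are independent and all have the same distribution as X, an assumption the slides do not state explicitly.

```latex
\begin{aligned}
E(\bar{X}) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\,\mu_X}{n} = \mu_X \\
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\,\sigma_X^2}{n^2} = \frac{\sigma_X^2}{n} \\
\sigma_{\bar{X}} &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma_X}{\sqrt{n}}
\end{aligned}
```

The second line uses independence: the variance of a sum of independent variables is the sum of their variances.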
3. CENTRAL LIMIT THEOREM
• X distribution can be Normal OR not Normal (e.g. Bernoulli or other)
• let's focus on X NOT Normal (Bernoulli or anything else NOT Normal)
• and let's SAMPLE from X
  • Sample size? n
• IF n is HUGE, $\bar{X}$ is almost $N\left(\mu; \frac{\sigma}{\sqrt{n}}\right)$
• As n increases, $\bar{X}$ gets closer to $N\left(\mu; \frac{\sigma}{\sqrt{n}}\right)$
3. CENTRAL LIMIT THEOREM PRACTICAL
• X is not Normal (its distribution can have any other shape)

• we SAMPLE from X
  • If n is at least 30 then $\bar{X}$ is almost $N\left(\mu; \frac{\sigma}{\sqrt{n}}\right)$
  • otherwise nothing can be said about the shape of the $\bar{X}$ distribution
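The practical rule can be checked with a small simulation. This is a sketch under stated assumptions: X is Bernoulli with a hypothetical p = 0.3, the sample size is n = 30, and the experiment is repeated 10,000 times with a fixed seed. It only compares the centre and spread of the simulated sample means with µ and σ/√n; it does not test Normality formally.

```python
import random
import statistics


def simulate_sample_means(p, n, repetitions, seed=0):
    """Draw many Bernoulli(p) samples of size n and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        means.append(sum(sample) / n)
    return means


p, n = 0.3, 30
means = simulate_sample_means(p, n, repetitions=10_000)

mu = p                          # mean of a Bernoulli(p)
sigma = (p * (1 - p)) ** 0.5    # standard deviation of a Bernoulli(p)
print("mean of sample means:", round(statistics.mean(means), 3), "vs mu:", mu)
print("sd of sample means:", round(statistics.stdev(means), 3),
      "vs sigma/sqrt(n):", round(sigma / n ** 0.5, 3))
```

The simulated mean and standard deviation of the sample means should land close to µ and σ/√n, even though X itself takes only the values 0 and 1.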
4. CONFIDENCE INTERVALS
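• mandatory: $\bar{X} \sim N$
• Standard Error of the mean = Standard Deviation of $\bar{X}$
  • $\bar{X} \sim N\left(\mu; \frac{\sigma}{\sqrt{n}}\right)$
  • $\bar{X} \sim N\left(\mu; \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}\right)$ for finite statistical populations
• Margin of Error
  • $ME = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
  • Estimate: $ME = t_{n-1;\alpha/2}\,\frac{s}{\sqrt{n}}$
  • $z_{\alpha/2}$ and $t_{n-1;\alpha/2}$ are the reliability factor
• LCL = $\bar{x} - ME$ and UCL = $\bar{x} + ME$
• Range = UCL − LCL = 2ME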
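The steps above (standard error, margin of error, LCL and UCL) can be sketched in Python. The numbers are hypothetical: x̄ = 64, a known σ = 10, n = 25 and a 95% confidence level. The reliability factor is passed in as an argument, so a t value taken from a table or from statistical software can be used in place of z when σ is estimated by s.

```python
import math
from statistics import NormalDist


def standard_error(sd, n, population_size=None):
    """sd / sqrt(n), times sqrt((N - n) / (N - 1)) when a finite N is given."""
    se = sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


def confidence_interval(x_bar, sd, n, reliability_factor, population_size=None):
    """Return (LCL, UCL, ME) for the mean."""
    me = reliability_factor * standard_error(sd, n, population_size)
    return x_bar - me, x_bar + me, me


z = NormalDist().inv_cdf(1 - 0.05 / 2)  # 95% confidence level, alpha = 0.05
lcl, ucl, me = confidence_interval(x_bar=64.0, sd=10.0, n=25, reliability_factor=z)
print(round(lcl, 2), round(ucl, 2), round(me, 2))
```

Here the standard error is 10/√25 = 2, so ME is about 1.96 × 2 = 3.92 and the interval runs from about 60.08 to 67.92. Passing `population_size=100` would shrink the standard error by the factor √(75/99).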
A summary to support the Margin of Error computation

So IF the GOAL is to compute a Confidence Interval for µ (the mean value of X), with a specific confidence level, I need to compute the Margin of Error.

I NEED a SAMPLE (randomly selected from the statistical population of the X random variable) and with the sample I will compute
• the mean value, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• the variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ (and its square root, s)
These are estimates of the parameters µ, σ² and σ.

• if X is Normally distributed
  • $\bar{X}$ is also Normally distributed
  • if σ is known (not likely to occur) I will use a z-score
  • if σ is UNknown I will use s instead and a t-score
    • mandatory for small samples (n<30)
    • for big samples, as the shape of a t-Student with many degrees of freedom is close to a Standard Normal, I CAN use a z-score (as it is simpler) instead of the t-score
• if X is NOT Normally distributed (e.g. a Bernoulli or something else)
  • WE NEED a BIG SAMPLE (n≥30) so that the Central Limit Theorem applies: $\bar{X}$ is approximately Normally distributed (this is enough to proceed!)
  • if σ is known (again, not likely to occur) I will use a z-score
  • if σ is UNknown I will also use a z-score (with s in place of σ)
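The decision rules in this summary can be written as a small function. This is only a sketch of the rules as stated here: the function name, the argument names and the `prefer_z_for_large_samples` switch are assumptions, and the threshold of 30 follows the "at least 30" rule of thumb from the Central Limit Theorem slide.

```python
def reliability_factor_type(x_is_normal, sigma_known, n,
                            prefer_z_for_large_samples=False):
    """Return "z", "t", or None when no interval can be built this way."""
    if x_is_normal:
        if sigma_known:
            return "z"
        # sigma unknown: s replaces sigma and the t-score applies;
        # for big samples a z-score may be used as a simpler approximation.
        if n >= 30 and prefer_z_for_large_samples:
            return "z"
        return "t"
    # X not Normal: a big sample is needed for the Central Limit Theorem.
    if n >= 30:
        return "z"
    return None  # nothing can be said about the shape of the sample mean


print(reliability_factor_type(x_is_normal=True, sigma_known=False, n=12))
print(reliability_factor_type(x_is_normal=False, sigma_known=False, n=50))
print(reliability_factor_type(x_is_normal=False, sigma_known=True, n=12))
```

The three calls print `t`, `z` and `None`: a small Normal sample with unknown σ needs the t-score, a big non-Normal sample relies on the Central Limit Theorem, and a small non-Normal sample does not support this method.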
