5-6. Sampling Error and Confidence Interval 1
Haomin Yang
School of Public Health
Fujian Medical University
Content
• Sampling error and Sampling distribution
• Central limit theorem
• Standard error
• t distribution
• Point estimation
• Confidence Interval estimation
Population and sample
• Population: the entire set of individuals that one
intends to study.
The relationship between the population and sample
• Population (the complete set) → sampling → Sample (a subset of the population)
• Sample → inference → Population
Why use sample?
• Cost
• Time
• Feasibility of reaching all individuals
Aims of sampling
Importance of sampling
• sampling?
Importance of sampling
Even if the data that you have is very “big”, it might represent only a part
of the population and may not be representative of the whole!
Importance of sampling
• Generally, it is costly and labor-intensive to study the
entire population, and in some cases even impossible,
because the number of individuals may be infinite.
Advantages of sampling
• Lower cost
• Faster data collection
• Higher quality of data
Population
Sample size
Sampling techniques
• Random sampling
Simple random sampling
Systematic sampling
Stratified random sampling
Cluster sampling
• Non-random sampling
Convenience Sampling
Judgement Sampling
Snowball sampling
Simple random sampling
Systematic sampling
Stratified random sampling
• The population is first split into groups.
The members from each group are
chosen randomly.
• Groups (strata) should not overlap
• Equal importance and variance of data in
each group
• Draw more precise conclusions by
ensuring that every subgroup is properly
represented in the sample.
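The stratified procedure above can be sketched in a few lines of Python. This is only an illustration under assumptions: the population of doctors, the three hospital levels, the 10% sampling fraction, and the helper name `stratified_sample` are all hypothetical and do not come from the slides.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        # Each unit goes into exactly one group, so the groups do not overlap.
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))       # random choice within the stratum
    return sample

# Hypothetical population: 1000 doctors in three hospital levels.
doctors = (
    [(i, "primary") for i in range(600)]
    + [(i, "secondary") for i in range(600, 900)]
    + [(i, "tertiary") for i in range(900, 1000)]
)
sample = stratified_sample(doctors, stratum_of=lambda d: d[1], fraction=0.1)

counts = defaultdict(int)
for _, level in sample:
    counts[level] += 1
print(dict(counts))
```

Because the same fraction is drawn from every group, each subgroup appears in the sample in proportion to its size in the population.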
E.g., working conditions of doctors
Cluster sampling
• The entire population is divided into clusters
or sections and then the clusters are
randomly selected. All the elements of the
cluster are used for sampling. Clusters are
identified using details such as age, sex,
location (geographic cluster)
• Give all the clusters equal chances of being
selected
• Instead of sampling individuals from each
subgroup, you randomly select entire
subgroups.
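A minimal sketch of cluster sampling, for comparison with stratified sampling. The hospitals, the doctor labels, and the helper name `cluster_sample` are hypothetical; the point is only that whole clusters are selected at random and every member of a selected cluster enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly select whole clusters; all members of a chosen cluster are included."""
    rng = random.Random(seed)
    # Every cluster has the same chance of being selected.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical clusters: 10 hospitals with 20 doctors each.
hospitals = {
    f"hospital_{h}": [f"doctor_{h}_{d}" for d in range(20)]
    for h in range(10)
}
chosen, sample = cluster_sample(hospitals, n_clusters=3)
print(chosen, len(sample))
```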
Cluster sampling
E.g., working conditions of doctors
Non-random sampling
Exercise
Sampling error
Sampling research
• Statistical inference refers to reaching conclusions about a
population based on a sample.
• Sample surveys study only a small segment of a population,
so there is always a certain amount of inaccuracy in the
information obtained.
Sampling Error = (Response Error) + (Frame Error) + (Chance Error)
How to Reduce Sampling Error?
• Increasing sample size
As the sample size increases, the sampling error tends to
decrease. There is no sampling error if the sample size
equals the population size.
• Stratification
With stratified sampling, every group is represented in the
sample, so the sampling error is reduced.
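The effect of sample size can be illustrated with a small simulation. The finite population of 2,000 simulated heights, the number of repeats, and the function name `mean_abs_error` are assumptions made for this sketch only. Samples are drawn without replacement, so the error vanishes when the sample is the whole population.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical finite population of 2,000 heights (cm).
population = [rng.gauss(165.7, 3.21) for _ in range(2000)]
mu = statistics.fmean(population)

def mean_abs_error(n, repeats=300):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = [
        abs(statistics.fmean(rng.sample(population, n)) - mu)  # without replacement
        for _ in range(repeats)
    ]
    return statistics.fmean(errors)

for n in (10, 100, 1000, len(population)):
    print(n, round(mean_abs_error(n), 4))
```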
• Sampling error is the reason why we have to use
statistics.
Sampling distribution
Sampling distribution
Sampling Distribution
• A sampling distribution is a distribution of a statistic over
all possible samples.
Simulation test
Population: $X \sim N(165.70,\ 3.21^2)$
[Figure: histogram of sample means; x-axis: sample mean, y-axis: frequency]
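A simulation of this kind can be reproduced roughly as follows. The population parameters are the ones shown on the slide; the sample size of 10 and the 5,000 repetitions are assumptions, since the slide text does not state them.

```python
import random
import statistics

MU, SIGMA = 165.70, 3.21   # population parameters from the slide
N, REPEATS = 10, 5000      # assumed sample size and number of repeated samples

rng = random.Random(1)
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPEATS)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 2))
print("sigma / sqrt(n):     ", round(SIGMA / N ** 0.5, 2))
```

The mean of the sample means should land close to the population mean, and their standard deviation close to σ/√n.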
The simulation test indicates that:
Central limit theorem 1
If a population follows a normal distribution with mean
equal to μ and standard deviation equal to σ, the
sampling distribution of the sample mean $\bar{x}$ is also
a normal distribution, with mean equal to μ and
standard deviation equal to the population standard
deviation divided by the square root of the sample
size:
$X \sim N(\mu, \sigma^2) \;\Rightarrow\; \bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$
Central limit theorem 1
• The population follows a
normal distribution.
Central limit theorem 2
For simple random samples of n observations taken
from a population with mean equal to μ and standard
deviation equal to σ, regardless of the population’s
distribution, provided the sample size n is sufficiently
large, the distribution of the sample mean $\bar{x}$ will be
approximately normal with mean equal to μ and standard
deviation equal to the population standard deviation
divided by the square root of the sample size.
$\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$ (approximately)
Central limit theorem 2
• The population follows a
uniform distribution.
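The uniform case can be checked numerically. This is a sketch under assumptions: a uniform population on [0, 1], sample sizes 2 and 30, and 4,000 repetitions are illustrative choices, not values from the slides. For each sample size it compares the spread of the sample means with σ/√n and reports the share of sample means within 1.96 standard errors of μ, which would be about 0.95 under a normal distribution.

```python
import random
import statistics

rng = random.Random(7)
A, B = 0.0, 1.0                  # assumed uniform population on [0, 1]
mu = (A + B) / 2                 # population mean
sigma = (B - A) / 12 ** 0.5      # population SD of a uniform distribution

def sample_means(n, repeats=4000):
    """Means of `repeats` simple random samples of size n from the uniform population."""
    return [
        statistics.fmean([rng.uniform(A, B) for _ in range(n)])
        for _ in range(repeats)
    ]

for n in (2, 30):
    means = sample_means(n)
    se = sigma / n ** 0.5        # theoretical SD of the sample mean
    within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)
    print(n, round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3), round(se, 3), round(within, 3))
```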
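Standard error of the mean: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$, estimated from a sample by $S_{\bar{X}} = \dfrac{S}{\sqrt{n}}$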
• SE gives us a way to quantify how much variability we
expect to see in a sampling distribution
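In practice the SE is usually estimated from a single sample as S/√n, where S is the sample standard deviation. A minimal sketch follows; the height values and the function name `standard_error` are made up for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: S / sqrt(n), with S the sample SD (n - 1 denominator)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample of 9 heights (cm).
heights = [162.1, 168.4, 165.0, 170.2, 163.7, 166.9, 164.3, 167.5, 161.8]
print(round(statistics.fmean(heights), 2), round(standard_error(heights), 2))
```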
Exercise
• Random samples of size 225 are drawn from a population
with mean 100 and standard deviation 20. Find the mean and
standard deviation of the sample mean.
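One way to check the answer is to apply the rule that the sample mean has mean μ and standard deviation σ/√n. The function name below is a hypothetical helper, not part of the slides.

```python
import math

def sampling_distribution_of_mean(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for samples of size n."""
    return mu, sigma / math.sqrt(n)

mean, se = sampling_distribution_of_mean(mu=100, sigma=20, n=225)
print(mean, round(se, 4))  # 100 1.3333
```

Here √225 = 15, so the standard deviation of the sample mean is 20/15 ≈ 1.33.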
Exercise
1. Random samples of size 121 are taken. Find the mean and
standard deviation of the sample mean.
Exercise
True/false
• The standard error of the mean is smaller when N=20 than when N=10
• You choose 20 students from the population and calculate the mean of their test
scores. You repeat this process 100 times and plot the distribution of the means. In
this case, the sample size is 100
• In your school, 40% of students watch TV at night. You randomly ask 5 students
every day if they watch TV at night. Every day, you would find that 2 of the 5 do
watch TV at night
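The third statement can be examined with a short calculation. The sketch assumes the 5 students are chosen independently, each with probability 0.4 of watching TV at night, so the count follows a binomial distribution; `binomial_pmf` is a helper written for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_two = binomial_pmf(2, 5, 0.4)
print(round(p_exactly_two, 4))  # 0.3456
```

Under these assumptions, exactly 2 of 5 is the outcome on only about a third of days, so the sample result varies from day to day because of sampling error.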
What is the point of all this?
• Why look at the properties of repeated samples from a
population?
• In practice, we do not know the value of the population
parameter of interest.