5-6.sampling Error and Confidence Interval 1

Uploaded by

yanghm669

Sampling Error and Confidence Interval

Haomin Yang
School of Public Health
Fujian Medical University

Content
• Sampling error and Sampling distribution
• Central limit theorem
• Standard error
• t distribution
• Point estimation
• Confidence Interval estimation

Population and sample
• Population: the entire set of individuals that one intends to study. Its members are homogeneous in the defining characteristic but vary from one another.

• Sample: a representative part of the population, i.e. a subset of the population.
The relationship between the population and sample

[Diagram: Population (the complete set) → sampling → Sample (a subset of the population) → inference → Population]

Samples are taken from populations to provide estimates of population parameters. We then use the sample data to make inferences about the population.
Why use a sample?

• Cost
• Time
• Feasibility: it may be impossible to reach every individual

• Further questions when using samples
– How should the sample be selected from the population?
– How many individuals are enough?
Aims of sampling

• Reduce the cost of research

• Generalize to a larger population

• In some cases (e.g. industrial production), analysis may be destructive, so sampling is needed
Sampling research
• In statistics, sampling is the selection of a subset (a statistical
sample) of individuals from within a statistical population to
estimate characteristics of the whole population.

• We need to evaluate the precision of our estimates: this is the aim of this lecture
Importance of sampling

• Traditionally, the marginal costs of data collection and processing were high

• In the era of big data, it is easier and faster to collect, store and process large amounts of data

• So is sampling still needed?
Importance of sampling

Even if the data that you have is very “big”, it might represent only a part
of the population and may not be representative of the whole!
Importance of sampling

• Throwing computational resources at a problem may not always solve it

• Focus on reducing sampling bias, not only on increasing the sample size

• Sometimes, less is more!
Advantages of sampling
• Lower cost
• Faster data collection
• High quality of data

Generally, it is costly and labor-intensive to study the entire population, and in some cases even impossible because the number of individuals in the population may be infinite.
Population

All people or items with the characteristic one wishes to understand
• E.g. all people at FJMU
• Dimensions: time? space?

• Broad or narrow: define carefully
- A demographically mixed and geographically dispersed population is difficult to access
Sampling frame

The sampling frame is the actual list of individuals from which the sample will be drawn. Ideally, it should include the entire target population.

E.g. working conditions of doctors in hospital A
Sample size

The number of individuals in your sample depends on the size of the population and on how precisely you want the results to represent the population as a whole.

• Sample size calculator (www.openepi.com)

• The larger the sample size, the more accurately and confidently you can make inferences about the population
Sampling techniques

• Random sampling
– Simple random sampling
– Systematic sampling
– Stratified random sampling
– Cluster sampling
• Non-random sampling
– Convenience sampling
– Judgement sampling
– Snowball sampling
Simple random sampling

• Representativeness: all individuals in the population have an equal chance of being selected

• Assign a number to each individual and select individuals using random numbers

• Limitation: requires a complete list of all individuals. If that is not possible, use another sampling approach
E.g. Working conditions of doctors

Assign a number from 1 to 1000 to every doctor in the hospital database, and use a random number generator to select 100 numbers.
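The selection step in this example can be sketched in a few lines of Python. This is only an illustration: the list `frame` of IDs 1 to 1000 stands in for the hospital database, and the seed is an arbitrary choice made so the draw can be repeated.

```python
import random

# Hypothetical sampling frame: doctor IDs 1..1000, standing in for the database.
frame = list(range(1, 1001))

rng = random.Random(42)          # fixed seed only so the sketch is reproducible
srs = rng.sample(frame, k=100)   # draws 100 distinct IDs, each equally likely
```

`random.sample` draws without replacement, so no doctor can be selected twice.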

Systematic sampling

• Members of the population are put in some order. A starting point is selected at random, and every nth member is selected to be in the sample.
• The sample is spread evenly over the list

• Risk: if the population has a hidden periodic pattern and the sampling interval happens to coincide with it, the sample is not representative
• E.g. hospitalizations in the first week of October
E.g. Working conditions of doctors

All doctors are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100.
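A minimal sketch of the same procedure in Python, assuming the alphabetical list is represented simply by positions 1 to 1000. The variable names and the seed are illustrative choices, not part of the original example.

```python
import random

N, n = 1000, 100                 # frame size and sample size from the example
k = N // n                       # sampling interval: every 10th person
rng = random.Random(0)           # arbitrary seed for reproducibility
start = rng.randint(1, k)        # random starting point among the first k positions
systematic = list(range(start, N + 1, k))
```

Whatever the starting point, the list contains exactly 100 positions spaced 10 apart.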

Stratified random sampling
• The population is first split into groups (strata). Members are then chosen randomly from each group.
• Groups must not overlap
• Equal importance and variance of data in each group
• Allows more precise conclusions by ensuring that every subgroup is properly represented in the sample.
E.g. Working conditions of doctors

The hospital has 600 female doctors and 400 male doctors. You want to ensure that the sample reflects the gender balance of the hospital, so you sort the population into two strata based on gender. Then you use random sampling within each stratum, selecting 60 women and 40 men.
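The proportional allocation used here (60 of 600 and 40 of 400) can be sketched as follows. The ID ranges assigned to each stratum are made up for illustration, and the total sample size of 100 is taken from the example.

```python
import random

rng = random.Random(1)           # arbitrary seed for reproducibility
# Hypothetical frame: IDs 1-600 are female doctors, 601-1000 are male doctors.
strata = {"female": list(range(1, 601)), "male": list(range(601, 1001))}
total = sum(len(ids) for ids in strata.values())
n = 100

# Proportional allocation: each stratum contributes in proportion to its size.
sample = {name: rng.sample(ids, k=round(n * len(ids) / total))
          for name, ids in strata.items()}
```

With these sizes the allocation works out to whole numbers; with other sizes the rounded counts may not sum exactly to `n` and would need adjusting.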

Cluster sampling
• The entire population is divided into clusters or sections, and then clusters are randomly selected. All the elements of each selected cluster are included in the sample. Clusters are identified using details such as age, sex or location (geographic cluster)
• All clusters have an equal chance of being selected
• Instead of sampling individuals from each subgroup, you randomly select entire subgroups.
Cluster sampling

This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters. It is difficult to guarantee that the sampled clusters are really representative of the whole population.
E.g. Working conditions of doctors

The hospital group has clinics in 10 communities across the city (all with roughly the same number of doctors in similar roles). You don't have the time to go to every clinic to collect your data, so you use random sampling to select 3 clinics: these are your clusters.
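A rough sketch of this selection in Python. The clinic contents are invented (30 doctors per clinic is an assumption; the example only says the clinics are of roughly equal size), and the point is only that whole clinics are drawn, not individual doctors.

```python
import random

rng = random.Random(2)           # arbitrary seed for reproducibility
# Hypothetical data: 10 clinics with 30 doctors each.
clinics = {c: [f"clinic{c}-doctor{d}" for d in range(1, 31)] for c in range(1, 11)}

chosen = rng.sample(sorted(clinics), k=3)                      # randomly pick 3 whole clinics
cluster_sample = [doc for c in chosen for doc in clinics[c]]   # take every doctor in them
```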

Non-random sampling

Exercise

Suppose you are going to conduct a study of FJMU students, asking for their opinion on influenza vaccination. First, formulate your research question. Then, describe how you would carry out the sampling of students using the following methods:
(a) simple random sampling
(b) stratified sampling
(c) cluster sampling
Sampling error

Sampling research
• Statistical inference means reaching conclusions about a population based on a sample.

• Sampling error exists in any sampling research.

Sampling error
• The difference between statistics from different samples, as well as the difference between a sample statistic and the population parameter, is called sampling error.
• It cannot be avoided, but it can be estimated.
• Sample surveys study only a small segment of a population, so there is always a certain amount of inaccuracy in the information obtained

Sampling Error = (Response Error) + (Frame Error) + (Chance Error)
How to Reduce Sampling Error?
• Increasing sample size
As the sample size increases, the sampling error tends to become smaller. There is no sampling error if the sample size equals the population size

• Stratification
With stratified sampling, all groups are represented in the sample, so the sampling error is reduced.
• Sampling error is the reason why we have to use statistics.

• Sampling error is a consequence of
– the population distribution of the variables
– the sampling method used to investigate the population.
Sampling distribution

• The probability distribution of a statistic (usually the mean) obtained from a large number of samples (each of size N) drawn from a specific population
Sampling Distribution
• A sampling distribution is a distribution of a statistic over
all possible samples.

• To get a sampling distribution,
– 1. Take a sample of size N (a given number like 5, 10, or 1000) from a population.
– 2. Compute the statistic (e.g., the mean) and record it.
– 3. Repeat steps 1 and 2 many times (in principle, infinitely many times for a large population).
– 4. Plot the resulting sampling distribution, a distribution of a statistic over repeated samples.
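The four steps can be sketched as a small Python function. The function name `sampling_distribution`, the finite population of integers and the number of repeats are all assumptions made for illustration. A finite number of repeats only approximates the true sampling distribution.

```python
import random
import statistics

def sampling_distribution(population, n, statistic, repeats, seed=0):
    """Approximate the sampling distribution of `statistic` by repeated sampling."""
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        sample = rng.sample(population, k=n)   # step 1: draw a sample of size n
        values.append(statistic(sample))       # step 2: compute and record the statistic
    return values                              # step 3 is the loop; step 4 would plot `values`

# Hypothetical finite population: the integers 1..10000.
population = list(range(1, 10001))
means = sampling_distribution(population, n=10, statistic=statistics.mean, repeats=2000)
medians = sampling_distribution(population, n=10, statistic=statistics.median, repeats=2000)
```

Passing the statistic as an argument makes the point that any statistic, not only the mean, has a sampling distribution.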
Simulation test
• Population: $X \sim N(165.70, 3.21^2)$, i.e. a normal distribution with $\mu = 165.70$ and $\sigma = 3.21$
• Repeatedly draw 100 independent random samples from the same population, each with sample size $n = 20$.
• Mean and standard deviation of each sample:
1: 165.82, 3.06
2: 164.98, 3.04
3: 165.75, 3.07
...
99: 165.82, 3.14
100: 165.92, 3.18
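A simulation of this kind can be reproduced approximately with the Python standard library. The sketch below uses the population parameters and sample sizes stated above. The seed is arbitrary, so the individual sample means will not match the values listed on the slide.

```python
import random
import statistics

rng = random.Random(2024)        # arbitrary seed; results differ from the slide's values
mu, sigma, n, n_samples = 165.70, 3.21, 20, 100

sample_means = []
for _ in range(n_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]   # one random sample of size 20
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)   # compare with sigma / sqrt(n) ≈ 0.72
```

The mean of the 100 sample means should land close to 165.70, and their spread should be much smaller than the population standard deviation of 3.21.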
Simulation test
[Figure: histogram of the sample means; x-axis: sample mean, y-axis: frequency; population $X \sim N(165.70, 3.21^2)$]
The simulation indicates that:

• The sample means differ from the population mean.

• The sample means differ from each other.

• The mean of the sample means is approximately equal to the population mean.
The simulation indicates that:

• The range of the sample means is narrower than that of the original population distribution.

• The distribution of the sample means is symmetric about the population mean, higher around the center and lower on both sides. It is a normal distribution.
Central limit theorem 1
If a population follows a normal distribution with mean μ and standard deviation σ, the sampling distribution of the sample mean $\bar{X}$ is also a normal distribution, with mean μ and standard deviation equal to the population standard deviation divided by the square root of the sample size:
$X \sim N(\mu, \sigma^2) \;\Rightarrow\; \bar{X} \sim N(\mu, \sigma^2/n)$
Central limit theorem 1
• The population follows a normal distribution.

• The variation of the sample means decreases as the sample size n increases.
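The shrinking variation can be checked numerically. The following sketch assumes a standard normal population (μ = 0, σ = 1) and a helper named `sd_of_sample_means`, both chosen for illustration. It compares the empirical standard deviation of the sample means with σ/√n.

```python
import random
import statistics

def sd_of_sample_means(n, mu=0.0, sigma=1.0, repeats=4000, seed=0):
    """Empirical SD of the mean of n draws from N(mu, sigma^2)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(repeats)]
    return statistics.stdev(means)

for n in (4, 16, 64):
    empirical = sd_of_sample_means(n)
    theoretical = 1.0 / n ** 0.5          # sigma / sqrt(n) with sigma = 1
    print(f"n={n:3d}  empirical={empirical:.3f}  theoretical={theoretical:.3f}")
```

Each fourfold increase in n should roughly halve the spread of the sample means.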

Central limit theorem 2
For simple random samples of n observations taken from a population with mean μ and standard deviation σ, regardless of the population's distribution, provided the sample size n is sufficiently large, the distribution of the sample mean $\bar{X}$ will be approximately normal, with mean μ and standard deviation equal to the population standard deviation divided by the square root of the sample size:
$\bar{X} \sim N(\mu, \sigma^2/n)$ (approximately)
Central limit theorem 2
• The population follows a uniform distribution.

• The variation of the sample means decreases as the sample size n increases.
Central limit theorem 2
• The population follows an exponential distribution.

• The variation of the sample means decreases as the sample size n increases.
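One way to see the sample mean becoming more normal for a skewed population is to track the skewness of the sample means, which is zero for a normal distribution. The sketch below assumes an exponential population with rate 1 and a helper named `mean_skewness`. These are illustrative choices, not taken from the slides.

```python
import random
import statistics

def mean_skewness(n, repeats=5000, seed=0):
    """Skewness of the sample mean for samples of size n from Exp(rate=1)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(repeats)]
    m, s = statistics.fmean(means), statistics.pstdev(means)
    return statistics.fmean(((x - m) / s) ** 3 for x in means)

skew_small = mean_skewness(2)     # theory: 2 / sqrt(2)  ≈ 1.41
skew_large = mean_skewness(50)    # theory: 2 / sqrt(50) ≈ 0.28
```

The theoretical values in the comments come from the fact that the mean of n exponential variables follows a gamma distribution with skewness 2/√n. The simulated values will only be close to them.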
Central limit theorem 2
• The population follows a U-shaped distribution.

• The variation of the sample means decreases as the sample size n increases.
Sampling Distribution
• The sampling distribution shows the relation between the value of a statistic and its probability, over all possible samples of size N drawn from a population.
[Figure: hypothetical distribution of sample means; vertical axis: f(M)]
Sampling Distribution Mean and SD
• The mean of the sampling distribution is defined in the same way as for any other distribution (the expected value).
• The SD of the sampling distribution is the standard error, an important and useful quantity.
• The variance of the sampling distribution is the expected value of the squared deviation from the mean, i.e. a mean square.
Standard error
• The variation of the sample mean, i.e. the standard deviation of the sample mean, is called the standard error of the mean (SE), denoted by $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.

• The standard deviation of the original variable is σ.
Standard error
• The standard error is used to measure the sampling error. It is affected by both the standard deviation and the sample size.

• The population standard deviation is fixed and cannot be changed. To reduce the standard error, the only thing we can do is increase the sample size: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
Standard error

• In practice, the population standard deviation σ is usually unknown and is replaced by the sample standard deviation S as an approximation:

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \;\longrightarrow\; S_{\bar{X}} = \dfrac{S}{\sqrt{n}}$
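As a small illustration, the estimated standard error can be computed directly from a single sample. The function name `standard_error` and the height values below are invented for this sketch.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: S / sqrt(n)."""
    s = statistics.stdev(sample)          # sample SD (n - 1 in the denominator)
    return s / math.sqrt(len(sample))

# Hypothetical measurements, for illustration only.
heights = [165.2, 168.1, 163.4, 166.0, 170.3, 164.8, 167.5, 162.9]
se = standard_error(heights)
```

`statistics.stdev` uses the n − 1 denominator, which is the usual choice when S estimates an unknown σ.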

• The SE gives us a way to quantify how much variability we expect to see in a sampling distribution

• A point estimate is useless without some associated measure of uncertainty. A standard error is one such measure
Exercise
• Random samples of size 225 are drawn from a population with mean 100 and standard deviation 20. Find the mean and standard deviation of the sample mean.

• Random samples of size 64 are drawn from a population with mean 32 and standard deviation 5. Find the mean and standard deviation of the sample mean.

Exercise

A population has mean 75 and standard deviation 12.

1. Random samples of size 121 are taken. Find the mean and standard deviation of the sample mean.

2. How would the answers to part 1 change if the size of the samples were 400 instead of 121?

Exercise

• If the standard error of the mean is 10 for N=12, what is the standard error of the mean for N=22?

• If the standard error of the mean is 50 for N=25, what is it for N=64?
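One possible way to approach questions of this type is to recover σ from the known standard error and then recompute the standard error for the new sample size. The helper `rescale_se` below is a hypothetical name. It relies only on the relation SE = σ/√N and on σ staying the same.

```python
import math

def rescale_se(se_old, n_old, n_new):
    """SE at a new sample size, using SE = sigma / sqrt(N) with sigma unchanged."""
    sigma = se_old * math.sqrt(n_old)     # recover the population SD
    return sigma / math.sqrt(n_new)

se_22 = rescale_se(10, 12, 22)   # about 7.39
se_64 = rescale_se(50, 25, 64)   # 31.25
```

Equivalently, the new standard error is the old one multiplied by √(N_old / N_new).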

True/false
• The standard error of the mean is smaller when N=20 than when N=10.

• You choose 20 students from the population and calculate the mean of their test scores. You repeat this process 100 times and plot the distribution of the means. In this case, the sample size is 100.

• The median has a sampling distribution.

• In your school, 40% of students watch TV at night. You randomly ask 5 students every day if they watch TV at night. Every day, you would find that 2 of the 5 do watch TV at night.
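For the last statement, it can help to compute how likely each possible count is. The sketch below assumes the 5 students are sampled independently, each with a 40% chance of watching TV at night, so that the count follows a binomial distribution. That modelling assumption is not stated on the slide.

```python
from math import comb

p, n = 0.40, 5
# Binomial probability of exactly k "yes" answers among n students.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
prob_exactly_two = pmf[2]    # 10 * 0.4**2 * 0.6**3 = 0.3456
```

Under this assumption, finding exactly 2 of 5 is the single most likely outcome, but it happens on only about a third of days.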

What is the point of all this?
• Why look at the properties of repeated samples from a population?
• We usually do not know the population parameter of interest.

• We therefore need to understand:
– how point estimates behave under repeated sampling (i.e. sampling distributions),
– how 'sampling error' and 'standard error' relate to sampling distributions.
