Unit 5 - Selecting Sample From Population
(MAT 202)
JUNYMER C. PLANTADO, PHD
Objectives
1. Differentiate parametric and non-parametric statistics; sample and population, probability and
non-probability sampling techniques.
2. Determine the appropriate sampling size for a specified research objective.
3. Enumerate the different probability sampling techniques.
4. Enumerate the different non-probability sampling techniques.
5. Determine the appropriate sampling technique for a specified research objective.
Parametric and Non-Parametric Statistics
❑ Parametric Statistics – the branch of Statistics which assumes that sample data come from a
population that follows a probability distribution based on a fixed set of parameters. Most well-known
elementary statistical methods are parametric (e.g., Student’s t-test).
❑ Non-parametric Statistics – refers to statistical methods in which the data are not required to fit a
normal distribution. These methods often use ordinal data (e.g., Wilcoxon test).
❑ In parametric statistics, the information about the distribution of the population is known and is
based on a fixed set of parameters. In nonparametric statistics, the information about the
distribution of a population is unknown, and the parameters are not fixed, which makes it
necessary to test the hypothesis for the population.
❑ To decide whether to use parametric or nonparametric statistics, you should consider several
criteria about the sample data and the assumptions, and carefully evaluate the validity of those
assumptions.
Concept of Sampling
[Figure: levels of sampling. Population → Target Population (determined by some criteria) → Accessible Population (available to the researcher) → Sample (identified with sampling method); Sampling Frame; Research Participants/Respondents, Subjects, or Informants]
Sampling: Definition & Purpose
❑ Sampling
• It is the process of selecting a number of
individuals for a study in such a way that they
represent the larger group from which they
were selected.
• The process which involves taking a part of the
population, making observations on this
representative group, and then generalizing
the findings to the bigger group.
Important Concepts and Terminologies
❑ Population
• refers to a group that has one or more characteristics in common.
❑ Target population
• consists of all the people with a common characteristic to whom investigators plan to
generalize their results.
❑ Accessible population
• the population that researchers delineate within the target population and to which they have
access.
❑ Sample
• comprises individuals, items, or events selected from a larger group (population).
❑ Sample frame
• actual list of sampling units from which the sample, or some stage of the sampling, is selected.
Important Concepts and Terminologies
cont.
❑ Observation unit
• the actual respondents of a study from whom information is collected.
❑ Parameter
• the summary description of a given variable in a population.
❑ Sampling error
• refers to the chance variations that occur in sampling; does not suggest that a mistake has
been made in the sampling process.
❑ Generalizability
• the extent to which the results of one study can be applied to other populations or situations
(similar to the concept of external validity).
Reasons for Sampling
Research Study. An investigation on the percentage of Filipino students that have reliable access to
the internet.
Important Sampling Principles
Problems with Sampling
Basic Steps in Sampling
Identification of Population
Determine Sample Size (How big should the sample be?)
How Large Should the Sample Be?
This is a difficult question to answer, but the safe answer is “it should be large
enough.”
Sample accuracy versus sample precision
Key issues:
1. Heterogeneity of population
2. Number of subgroups
3. Size of subgroups
4. Precision of sample statistics
If the sample is too small, results of the study may not be generalizable to the
population.
For estimation, sample size depends on:
1. Maximum error estimate
2. Population standard deviation
3. Degree of confidence
How Large Should the Sample Be?
The larger the population size, the smaller the percentage of the population
required to get a representative sample.
For a small population, say N = 100 or fewer, there is little point in sampling;
survey the entire population.
If the population size is around 500 (N = 500), give or take 100, 50% should be
sampled.
If the population size is around 1,500, 20% should be sampled.
Beyond a certain point (about N = 5,000), the population size is almost
irrelevant and a sample size of 400 will be adequate.
How Large Should the Sample Be?
Avoiding Sampling Error and Sampling Bias
▪ Selecting random samples does not guarantee that they will be representative
of the population.
▪ Sampling error, which is beyond the control of researcher, is a reality of
random sampling.
▪ No sample will have a composition precisely identical to that of a population. But
if the sample selected is sufficiently large, the chances are that it will closely
represent the population.
▪ By chance, a sample may differ significantly from the population on some
important variable. If this happens, we should stratify on that variable.
▪ Sampling bias is different from sampling error. It does not result from the
random differences between samples and populations.
▪ Sampling bias is non-random and is generally the fault of the researcher (i.e.,
some aspect of the sampling creates bias in the data).
Guidelines in Determining Adequate Sampling
When the population is more or less homogeneous and only the typical, normal,
or average is desired to be known, a smaller sample is enough. However, if
differences are desired to be known, a larger sample is needed.
When the population is more or less heterogeneous and only the typical,
normal, or average is desired to be known, a larger sample is needed. However,
if only their differences are desired to be known, a smaller sample is sufficient.
The proportion of the population to be sampled varies inversely with the size of the population. A larger
proportion is required of a smaller population and a smaller proportion may do
for a bigger population.
For greater accuracy and reliability of results, a bigger sample is desired.
• In biological and chemical experiments, such as testing the effects of drugs
and other substances, the use of few persons is more desirable to
determine the reactions of humans to such drugs and other substances
being tested.
Determining Sample Size
Source: R.V. Krejcie and D.W. Morgan (1970). Determining sample size for research activities. Educational and
Psychological Measurement. Copyright 1970 by Sage Publications.
Determining Sample Size
Computed sample sizes for Different Population (N) at 0.01 Level of Probability to a
Population of 0.50.
Determining Sample Size
Slovin’s Formula
Formula (based on Slovin, 1960)
$$n = \frac{N}{1 + N e^{2}} \qquad \text{(Eq. 20)}$$
where n = sample size
N = population size
e = desired margin of error (% allowance for non-precision)
Example 1.
If in your research the population (N) is 9,000 and the margin of error that you will
allow is 2%, what will be your sample size (n)?
Solution
Using Eq. 20,
$$n = \frac{N}{1 + N e^{2}} = \frac{9000}{1 + 9000(0.02)^{2}} = \frac{9000}{4.6} \approx 1957$$
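The computation in Example 1 can be checked with a few lines of Python. This is a minimal sketch: the function name `slovin_sample_size` and the choice to round up to the next whole unit are assumptions made for illustration, not part of the original slides.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula, n = N / (1 + N * e^2), rounded up to a whole unit."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be a fraction between 0 and 1")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Example 1: N = 9,000 and e = 2%
print(slovin_sample_size(9000, 0.02))
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the one requested.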
Determining Sample Size
Note:
Cochran’s formula is considered especially appropriate in situations with large
populations. A sample of any given size provides more information about a smaller
population than a larger one, so there’s a ‘correction’ through which the number
given by Cochran’s formula can be reduced if the whole population is relatively small.
Example 2.
Suppose we are doing a study on the inhabitants of a large town, and want
to find out how many households serve breakfast in the mornings. We don’t have
much information on the subject to begin with, so we’re going to assume that half of
the families serve breakfast: this gives us maximum variability. So p = 0.5 and q = 1 − p = 0.5. Now let’s
say we want 95% confidence, and at least 5 percent (plus or minus) precision. A 95%
confidence level gives us a Z value of 1.96, per the normal tables, so, using Eq. 21, we get
$$n_0 = \frac{Z^{2} p q}{e^{2}}$$
$$n_0 = \frac{(1.96)^{2}(0.5)(0.5)}{(0.05)^{2}} = 384.16 \approx 385$$
A random sample of 385 households in our target population should be enough to give us the
confidence level we need.
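The sketch below reproduces the arithmetic of Example 2 and illustrates why p = 0.5 is described as giving maximum variability: the product p(1 − p), and therefore the required sample size, is largest there. The function name `cochran_n0` and the grid of p values are illustrative assumptions.

```python
import math


def cochran_n0(z: float, p: float, e: float) -> float:
    """Cochran's formula n0 = Z^2 * p * q / e^2 with q = 1 - p (unrounded)."""
    return z ** 2 * p * (1 - p) / e ** 2


# Example 2: 95% confidence (Z = 1.96), p = 0.5, precision e = 0.05
n0 = cochran_n0(1.96, 0.5, 0.05)
print(math.ceil(n0))

# The required sample size for several assumed proportions
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, math.ceil(cochran_n0(1.96, p, 0.05)))
```

Assuming p = 0.5 when nothing is known about the proportion is therefore the conservative choice.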
Example 3.
In the previous example, if there were just 1000 households in the target
population, we would apply the correction for small populations mentioned in the note above to reduce the required sample size.
Source: Charan, J., & Biswas, T. (2013). How to calculate sample size for different study designs in medical research? Indian
Journal of Psychological Medicine, 35(2), 121. https://doi.org/10.4103/0253-7176.116232
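The formula used for Example 3 does not survive in this text. As a hedged illustration, the sketch below assumes the commonly used finite-population correction, n = n0 / (1 + (n0 − 1)/N); both that formula and the function names are assumptions rather than something confirmed by the slides.

```python
import math


def cochran_n0(z: float, p: float, e: float) -> float:
    """Cochran's formula n0 = Z^2 * p * (1 - p) / e^2 (unrounded)."""
    return z ** 2 * p * (1 - p) / e ** 2


def finite_population_correction(n0: float, population: int) -> float:
    """Assumed correction: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / population)


# Example 3: same settings as Example 2, but only N = 1000 households
n0 = cochran_n0(1.96, 0.5, 0.05)
n = finite_population_correction(n0, 1000)
print(math.ceil(n))
```

Under this assumption the required sample shrinks noticeably, which matches the note that a sample of a given size tells us more about a small population than about a large one.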
Determining Sample Size
Example 4.
Let us assume that a researcher wants to estimate the proportion of patients
having hypertension in the pediatric age group in a city. According to previously
published studies, the actual proportion of hypertensives may not be more than 15%. The
researcher wants to calculate the sample size with a precision/absolute error of
5% and a type I error of 5%.
Using Eq. 22,
$$n = \frac{Z_{1-\alpha/2}^{2}\, p(1-p)}{d^{2}}$$
$$n = \frac{(1.96)^{2}(0.15)(1-0.15)}{(0.05)^{2}} = 195.92 \approx 196$$
It means that, for this cross-sectional study, the researcher has to take at least 196 subjects. If the
researcher wants to increase the error (decrease precision), then the denominator will increase and
hence the sample size will decrease.
Source: Charan, J., & Biswas, T. (2013). How to calculate sample size for different study designs in medical research? Indian
Journal of Psychological Medicine, 35(2), 121. https://doi.org/10.4103/0253-7176.116232
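The closing remark of Example 4, that a larger allowed error gives a smaller sample, can be seen numerically. This is a small sketch; the function name `proportion_sample_size` and the second precision value d = 0.10 are assumptions chosen for illustration.

```python
import math


def proportion_sample_size(z: float, p: float, d: float) -> float:
    """n = Z^2 * p * (1 - p) / d^2 for estimating a proportion (unrounded)."""
    return z ** 2 * p * (1 - p) / d ** 2


# Example 4 (d = 0.05), then the same study with the error doubled (d = 0.10)
for d in (0.05, 0.10):
    n = proportion_sample_size(1.96, 0.15, d)
    print(f"d = {d:.2f}: n = {n:.2f}, rounded up to {math.ceil(n)}")
```

Because d enters the formula squared, doubling the allowed error cuts the required sample to about a quarter.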
Determining Sample Size
Example 5.
Suppose the same researcher is interested in knowing the average systolic
blood pressure of children of the same city. If the researcher wants to estimate
the average systolic blood pressure in the pediatric age group of that city at a 5%
type I error and a precision of 5 mmHg on either side (more or less than the mean
systolic BP), and the standard deviation, based on previously done studies, is 25 mmHg,
then the sample size is calculated using Eq. 23:
$$n = \frac{Z_{1-\alpha/2}^{2}\, SD^{2}}{d^{2}}$$
$$n = \frac{(1.96)^{2}(25)^{2}}{(5)^{2}} = 96.04 \approx 96$$
So the researcher will have to take the blood pressure of at least 96 children to know the
average systolic blood pressure in the pediatric age group.
Source: Charan, J., & Biswas, T. (2013). How to calculate sample size for different study designs in medical research? Indian
Journal of Psychological Medicine, 35(2), 121. https://doi.org/10.4103/0253-7176.116232
Determining Sample Size
Other Formulae:
Formula:
$$n = \frac{N X}{X + N - 1} \qquad \text{(Eq. 24)}$$
where
$$X = \frac{Z_{\alpha/2}^{2} \cdot p(1-p)}{MOE^{2}} \qquad \text{(Eq. 24.a)}$$
and
$Z_{\alpha/2}$ = critical value of the normal distribution at $\alpha/2$
MOE = margin of error
p = sample proportion
N = population size
Here are the z-scores for the most common confidence levels:
Confidence Level    Critical Value (z score)
90%                 1.645
95%                 1.96
99%                 2.576
Source: Daniel WW (1999). Biostatistics: A Foundation for Analysis in the Health Sciences. 7th edition. New York:
John Wiley & Sons.
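As a rough sketch of how Eq. 24 might be evaluated, the code below derives the critical value from the confidence level with Python's standard library instead of reading it from the table. The function names and the example inputs (N = 1000, p = 0.5, MOE = 0.05) are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """Two-sided critical value Z_{alpha/2} of the standard normal distribution."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_eq24(population: int, p: float, moe: float,
                     confidence: float = 0.95) -> int:
    """Eq. 24: n = N * X / (X + N - 1), with X from Eq. 24.a, rounded up."""
    z = critical_z(confidence)
    x = z ** 2 * p * (1 - p) / moe ** 2
    return math.ceil(population * x / (x + population - 1))


# Critical values for the three confidence levels in the table
for level in (0.90, 0.95, 0.99):
    print(level, round(critical_z(level), 3))

# Hypothetical inputs: N = 1000, p = 0.5, MOE = 0.05 at 95% confidence
print(sample_size_eq24(1000, 0.5, 0.05))
```

Computing the critical value avoids transcription slips when a confidence level other than 90%, 95%, or 99% is needed.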
Determining Sample Size
Other Formulae:
Formula for the Minimum Sample Size for an Interval Estimate of the Population
Mean:
$$n = \left(\frac{Z_{\alpha/2} \cdot \sigma}{E}\right)^{2} \qquad \text{(Eq. 25)}$$
where $Z_{\alpha/2}$ = critical value of the normal distribution at $\alpha/2$
E = maximum error of estimate
$\sigma$ = population standard deviation
Here are the z-scores for the most common confidence levels:
Confidence Level    Critical Value (z score)
90%                 1.645
95%                 1.96
99%                 2.576
Source: Daniel WW (1999). Biostatistics: A Foundation for Analysis in the Health Sciences. 7th edition. New York:
John Wiley & Sons.
Determining Sample Size
Example 6.
A scientist wishes to estimate the average depth of a river. He wants to be
99% confident that the estimate is accurate within 2 ft. From the previous study, the
standard deviation of the depths measured was 4.38 ft.
Since $\alpha = 0.01$, $Z_{\alpha/2} = 2.58$, $\sigma = 4.38$, and $E = 2$, plugging these values into Eq. 25,
$$n = \left(\frac{Z_{\alpha/2} \cdot \sigma}{E}\right)^{2} = \left(\frac{(2.58)(4.38)}{2}\right)^{2} = 31.92$$
Thus, at the 99% level of confidence, the scientist needs a sample of at least 32 measurements.
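Eq. 25 is easy to evaluate directly. The sketch below returns the unrounded value so that the rounding decision stays visible; the function name `mean_sample_size` is an assumption, and the second call reuses the figures of Example 5 (Z = 1.96, SD = 25 mmHg, d = 5 mmHg).

```python
import math


def mean_sample_size(z: float, sigma: float, error: float) -> float:
    """Eq. 25: n = (Z * sigma / E)^2, returned unrounded."""
    return (z * sigma / error) ** 2


# Example 6: 99% confidence (Z = 2.58), sigma = 4.38 ft, E = 2 ft
n_river = mean_sample_size(2.58, 4.38, 2)
print(round(n_river, 2), math.ceil(n_river))

# Example 5 figures: Z = 1.96, SD = 25 mmHg, d = 5 mmHg
n_bp = mean_sample_size(1.96, 25, 5)
print(round(n_bp, 2))
```

Note that Example 5 reports 96 by rounding 96.04 to the nearest integer, whereas rounding up, as done in Example 6, would give 97; which convention to follow is a judgment call that should be stated explicitly in a study.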
Sampling Techniques
❑ Probability Sampling
▪ Type of sample in which "every person, object, or event in the population
has a nonzero chance of being selected."
▪ When probability sampling is used, inferential statistics allow estimation of
the extent to which the findings based on the sample are likely to differ from
the total population.
Sampling Techniques
▪ Advantages
(1) Sample mean is an unbiased estimate of the population mean
(2) Simple and easy method
▪ Disadvantages
(1) The sample chosen may be widely spread (geographically dispersed), thus entailing high
transportation cost
(2) A population frame or list is needed
(3) Hard to achieve in practice
Sampling Techniques
▪ When to use?
➢ If the population is not widely dispersed
➢ If the population is more or less homogeneous with respect to the
characteristics under study (use stratified sampling if individuals are
heterogeneous)
Sampling Techniques
Systematic Sampling
▪ It is a method of selecting a sample by taking every kth unit from an ordered
population, the first unit being selected at random.
▪ Steps:
Sampling Techniques
Systematic Sampling
▪ Let’s assume that we have N = 100 and that you want n = 20.
▪ To use systematic sampling,
(1) The population must be listed in random order. Number the units in the
population from 1 to N.
(2) Decide on the sample size (n) that you want or need ➔ n = 20
(3) Compute the sampling interval (k) using the formula:
$$k = \frac{N}{n} = \frac{100}{20} = 5$$
(4) Randomly select an integer from 1 to k (starting point) ➔ let’s say you randomly select 4.
(5) Then, select the 4th unit as the starting point and take every kth unit thereafter.
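The five steps above can be sketched in Python as follows. The function name `systematic_sample`, its `start` parameter, and the use of integer division for k are assumptions made for illustration; the population here is simply the unit numbers 1 to 100.

```python
import random


def systematic_sample(units, n, start=None, rng=None):
    """Take every k-th unit from an ordered list, beginning at a random start.

    units : ordered list of population units (numbered 1..N by position)
    n     : desired sample size
    start : optional 1-based starting position between 1 and k
    """
    rng = rng or random.Random()
    k = len(units) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    if start is None:
        start = rng.randint(1, k)  # random start in 1..k
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # positions start, start + k, start + 2k, ... (n positions in total)
    return [units[pos - 1] for pos in range(start, start + k * n, k)]


population = list(range(1, 101))  # N = 100
sample = systematic_sample(population, n=20, start=4)
print(sample)
```

With the starting point fixed at 4, the selected units are the 4th, 9th, 14th, and so on; leaving `start` unset draws the starting point at random, as in step (4).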
Sampling Techniques
Systematic Sampling
[Figure: illustration of systematic sampling with the starting point marked]
Sampling Techniques
Systematic Sampling
▪ Advantages
(1) Selection of the sample is administratively easier, quicker, and
cheaper
(2) It is possible to select a sample in the field without a sampling frame
(3) It may also be more precise than simple random sampling.
(4) There is simply no easier way to do random sampling in some
situations
▪ Disadvantages
(1) A systematic sample may give poor precision when unsuspected
periodicity is present in the population (e.g., every 25th household in
the sample or population is rich)
(2) If the population is not in random order, one cannot validly ...
Sampling Techniques
Stratified Sampling
▪ There may often be factors which divide up the population into sub-
populations (groups/strata) and we may expect the measurement of
interest to vary among the different sub-populations.
▪ This has to be accounted for when we select a sample from the population
in order that we obtain a sample that is representative of the population.
This is achieved by stratified sampling.
▪ A stratified sample is obtained by taking samples from each stratum or
sub-group of a population.
▪ Stratified sampling can be done through:
(1) Proportional allocation
(2) Quota or non-proportional allocation
Sampling Techniques
Stratified Sampling
▪ Example: Suppose that in a company there are the following staff:
(A) male, full-time = 90; (B) male, part-time = 18; (C) female, full-time = 9;
and (D) female, part-time = 63, for a total of N = 180. You want to have a
sample size of n = 80.
A = 90/180 x 80 = 40    C = 9/180 x 80 = 4
B = 18/180 x 80 = 8     D = 63/180 x 80 = 28
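Proportional allocation for the staff example can be sketched as below. The function name `proportional_allocation` and the dictionary keys are assumptions for illustration; each stratum receives a share of the sample equal to its share of the population.

```python
def proportional_allocation(strata_sizes: dict, n: int) -> dict:
    """Allocate a total sample of n across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    return {name: round(size * n / total) for name, size in strata_sizes.items()}


staff = {
    "A: male, full-time": 90,
    "B: male, part-time": 18,
    "C: female, full-time": 9,
    "D: female, part-time": 63,
}
allocation = proportional_allocation(staff, n=80)
print(allocation)
print(sum(allocation.values()))
```

In this example the shares are whole numbers. In general, rounding each stratum separately may make the total differ from n by a unit or two, so the allocation should be checked and adjusted by hand when that happens.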
▪ Advantages
(1) Ensures adequate sample size for subgroups
(2) Inferences can be drawn about population and about specific
subgroups
(3) Very likely a more efficient statistical estimate; precision is increased if
variability within strata is smaller (more homogeneous) than variability between strata
(4) Because strata are independent, different sampling approaches can be used for different subgroups
Sampling Techniques
Stratified Sampling
▪ Disadvantages
(1) Complexity in implementation and estimation; different strata can be
difficult to identify, which is a problem if the strata are not clearly defined.
Sampling Techniques
Cluster Sampling
▪ Cluster sampling is a sampling technique where the entire population is
divided into groups, or clusters, and a random sample of these clusters is
selected. All observations in the selected clusters are included in the sample.
▪ A requirement of a cluster sample is inclusion of all the members of the
cluster.
▪ Cluster sampling is typically used when the researcher cannot get a
complete list of the members of a population they wish to study but can get
a complete list of groups or “clusters” of the population.
▪ It is also used when a random sample would produce a list of subjects so
widely scattered that surveying them would prove to be far too
expensive.
▪ The sample cluster may be chosen by random sampling or systematic
sampling with random start.
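A minimal sketch of one-stage cluster sampling follows: clusters are drawn at random, and every member of each chosen cluster enters the sample. The function name `cluster_sample`, the toy population of 10 blocks with 10 households each, and the seed are assumptions for illustration.

```python
import random


def cluster_sample(clusters: dict, n_clusters: int, seed=None) -> list:
    """Randomly choose n_clusters clusters and include ALL of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # only a list of clusters is needed
    return [member for name in chosen for member in clusters[name]]


# Hypothetical frame: 10 blocks, each with 10 households
blocks = {
    f"block_{i}": [f"block_{i}_house_{j}" for j in range(1, 11)]
    for i in range(1, 11)
}
sample = cluster_sample(blocks, n_clusters=3, seed=7)
print(len(sample))
```

The sampling frame here is just the list of block names, which reflects the point that no list of individual elements is required.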
Sampling Techniques
Cluster Sampling
▪ Advantages
(1) No need to construct a list of elements in the population; the frame for
cluster sampling is simply a list of clusters.
(2) It is cheaper because field cost would be minimized by the elements
being physically closer together than elements selected by random or
stratified sampling.
▪ Disadvantages
(1) It is not as effective as random sampling and stratified sampling in
ensuring representativeness.
(2) Larger sampling error than simple random sampling.
Sampling Techniques
Multi-Stage Sampling
▪ In multistage sampling, we combine several techniques of sampling into
two or more phases of selection.
For example, in a household survey, you wish to select a sample of 30
households from a certain town. Suppose further that the town can be
divided into N = 10 blocks of M = 10 households per block. (The units need not
have the same number of elements.) A sample of n = 5 blocks is selected
either by simple random sampling or by systematic sampling with a random
start. Then from each selected block, a sample of m = 6 households is
selected, giving 5 x 6 = 30 households. Note that sampling is done at each stage.
This is an example of two-stage sampling with blocks as first-stage or primary units, and households as
second-stage or secondary units.
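The household example can be sketched as two rounds of simple random sampling. The function name `two_stage_sample`, the block and household labels, and the seed are assumptions for illustration; systematic selection with a random start could be substituted at the first stage.

```python
import random


def two_stage_sample(blocks: dict, n_blocks: int, m_per_block: int, seed=None) -> dict:
    """Stage 1: sample blocks. Stage 2: sample households within each chosen block."""
    rng = random.Random(seed)
    chosen_blocks = rng.sample(sorted(blocks), n_blocks)  # primary units
    return {
        block: rng.sample(blocks[block], m_per_block)     # secondary units
        for block in chosen_blocks
    }


# Hypothetical town: N = 10 blocks of M = 10 households each
town = {
    f"block_{i}": [f"block_{i}_house_{j}" for j in range(1, 11)]
    for i in range(1, 11)
}
sample = two_stage_sample(town, n_blocks=5, m_per_block=6, seed=42)
print(sum(len(households) for households in sample.values()))
```

A household list is needed only for the five blocks actually selected, which is the frame-related advantage usually cited for multistage designs.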
Sampling Techniques
Multi-Stage Sampling
▪ Advantages
(1) It is more efficient and flexible than single stage sampling.
(2) Except for the first stage units, a sampling frame is required only for
those units selected in order to sample the sub-units.
(3) Transportation costs are greatly reduced especially when first stage
units are geographically distant from one another.
▪ Disadvantages
(1) Complexity in theory, which may be difficult to apply in the field
(2) Estimation procedures are difficult for non-statisticians to follow.
Sampling Techniques
❑ Non-probability Sampling
▪ Type of sample in which “not all elements of the population have an equal
chance of being selected.”
▪ Samples are selected on the basis of the investigator’s judgment about
achieving particular research objectives.
▪ Has less chance of obtaining a representative sample because the units are
not selected through equal chances.
Sampling Techniques
Snowball Sampling
▪ Highly specialized method of sampling.
▪ It involves starting a process with one individual or group and using their
contacts to develop the sample.
▪ Selecting participants by finding one or two participants and then asking
them to refer you to others.
▪ Chosen for specific purpose.
Sampling Techniques
Quota Sampling
▪ Similar to stratified sampling.
▪ Subjects are recruited because they match a requirement or quota.
▪ The researcher pre-defines attributes of the elements in the population that
should be sampled. For example, it could be all women in a company, or it
could be voters in the age range of 18-22 years in a region, etc. This is a
fast way of collecting samples but leaves room for bias.
▪ Chosen for specific purpose.
How to use Excel Sampling to find a Sample
Example 7.
For the sake of discussion, let us again use the data set of the temperatures of 50 provinces.
Solution
Step 1. Enter your data items into Excel. You can
enter your data into rows or columns. Ensure the
rows and columns are even; for example, enter data
into column A to cell 12 and column B to cell 12.