Sampling
In sampling, we select a group of individuals from a target population. This group of individuals
forms a sample. Why? When the population is large (say, all people in a country), it is usually not
feasible to study every individual. To keep the study manageable, we select individuals who represent
the population. By studying and analyzing this sample, we aim to characterize the whole population.
• Population: All individuals or outcomes that fall within the scope of the study.
• Sampling Frame: The accessible part of the target population under study. We derive the sample
from the sampling frame.
• Sample: Subset of a population, selected through various techniques that we will cover in this
guide.
Advantages of Sampling
Sampling brings advantages in both speed and accuracy. While we are inclined to think that
studying every individual in the whole population will be more accurate, we tend to overlook the many
sources of error that can arise in a study of the whole population. Further, in most cases, it is
simply not feasible to study the whole population.
A sample can improve accuracy in three ways. First, we can deploy trained field workers on whom we
can rely to collect the observations. Second, we can scientifically monitor biases and remove them.
Third, because we collect a limited number of observations, we reduce the mistakes that come from
processing the data. Moreover, the smaller size of the sample means that we can supervise the work
effectively and end up with clean, usable data.
Errors in sample selection
Selecting a sample that closely represents the population is critical to business problem-solving. Here
are some common errors in sample selection:
• Cyclical, business-induced errors: If we are studying buying behaviour, samples taken around
Christmas and Diwali will not be representative of the overall behaviour.
• Specification error: If the study is about sales of toys and we survey only the mothers, the
results may not be accurate, as children also influence the buying behaviour.
• Sample frame error: This error happens when we select the wrong sub-population. For instance,
suppose our study aims to understand whether the population favours a new policy that has been
introduced in India, and we survey only people who speak English. The result may not be accurate,
as ~90% of the country’s population does not speak English.
Let’s understand the sampling process
1. Define target population: Based on the objective of the study, clearly scope the target population.
For instance, if we are studying a regional election, the target population would be all people
domiciled in the region who are eligible to vote.
2. Define Sampling Frame: The sampling frame consists of the members of the overall population
who can actually be reached. In the above example, the sampling frame would consist of all the
people from the population who are in the region and can participate in the study.
3. Select Sampling Technique: Now that we have the sampling frame in place, we want to select an
appropriate sampling technique.
4. Determine Sample Size: For the sample to closely represent the whole population and to keep
sampling error low, it needs to be of an appropriate size. What is an appropriate size? It depends
on factors such as the complexity of the population under study.
5. Collect the Data: Data collection is critical to solving the business case. We should attempt to
ensure that we don’t have too many empty fields in our data, and we document the reasons in cases
where the data is missing.
6. Assess response rate: Closely monitor the response rate so that you can make timely changes to
your data collection approach and reach the sample size you determined.
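The guide does not prescribe a formula for steps 4 and 6. One commonly used rule of thumb for surveys
that estimate a proportion is Cochran's formula, sketched below under that assumption. The function
names, the default z-value of 1.96 (95% confidence), and the 40% response rate in the usage lines are
illustrative choices, not values from this guide.

```python
import math


def cochran_sample_size(margin_of_error, z=1.96, proportion=0.5, population_size=None):
    """Estimate the sample size needed to measure a proportion.

    proportion=0.5 is the most conservative guess (it maximizes the result).
    If population_size is given, a finite population correction is applied.
    """
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    if population_size is not None:
        # Finite population correction: fewer responses are needed
        # when the population itself is small.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)


def invitations_needed(target_responses, expected_response_rate):
    """Number of people to contact so that, at the expected response
    rate, we still reach the target number of completed responses."""
    return math.ceil(target_responses / expected_response_rate)


# Hypothetical usage: 5% margin of error, 95% confidence.
needed = cochran_sample_size(margin_of_error=0.05)
needed_small_population = cochran_sample_size(0.05, population_size=10_000)
to_contact = invitations_needed(needed, expected_response_rate=0.4)
print(needed, needed_small_population, to_contact)
```

If the observed response rate falls below the assumed one, rerunning `invitations_needed` with the
observed rate gives a rough idea of how many more people would need to be contacted.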
Popular Sampling Techniques
The various ways in which we can select samples can be divided into two types:
1. Probability Sampling: Some researchers refer to this as random sampling.
2. Non-Probability sampling: This is also referred to as non-random sampling.
1. Probability sampling
1.1 Simple Random sampling:
Here, as the name suggests, we pick the sample at random. There is no pattern; it is a purely
random selection. For instance, suppose you want to survey vaccination uptake. You could put the
names of all 100 eligible people in a hat and pull out a few to form the sample.
Let’s look at the two subtypes of simple random sampling:
1.1.1 Simple random sampling with replacement
Here, for a sample of size N, you select an element of the population and return it to the
population before the next draw. This implies that each element of the population could
theoretically be selected more than once. Each time we select an individual, we have the whole
population available to select from. Typically, when the population itself is small, we use this
technique.
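A minimal sketch of sampling with replacement, using Python's standard `random` module. The
`sample_with_replacement` helper, the placeholder names, and the seed are hypothetical and only
serve the illustration.

```python
import random


def sample_with_replacement(population, sample_size, seed=None):
    """Draw sample_size elements; every draw sees the full population."""
    rng = random.Random(seed)  # seeded generator so the draw is repeatable
    # choices() picks each element independently, so repeats are possible.
    return rng.choices(population, k=sample_size)


# Hypothetical population: 100 eligible people, as in the hat example.
names = [f"person_{i}" for i in range(1, 101)]
drawn = sample_with_replacement(names, sample_size=10, seed=42)
print(drawn)
```

Because each element goes back into the pool, the sample size may even exceed the population size.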
1.1.2 Simple random sampling without replacement
Here, once you select an individual from the population, you don’t return it. With each
selection, the available population decreases by 1, so no individual can be selected more than
once. For a sample of size N, we repeat the selection process N times. When the population size is
large, we go for this without-replacement method of simple random sampling.
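A minimal sketch of sampling without replacement, again with the standard `random` module. The
`sample_without_replacement` helper and the placeholder names are hypothetical.

```python
import random


def sample_without_replacement(population, sample_size, seed=None):
    """Draw sample_size distinct elements; a drawn element is not returned."""
    rng = random.Random(seed)
    # sample() never picks the same position twice and raises ValueError
    # if sample_size is larger than the population.
    return rng.sample(population, sample_size)


names = [f"person_{i}" for i in range(1, 101)]
drawn = sample_without_replacement(names, sample_size=10, seed=42)
print(drawn)
```

Unlike the with-replacement case, the sample size here can never exceed the population size.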
1.2 Stratified random sampling
When we have supplemental information available to aid with the sample design, we can consider
using stratified random sampling. As the name suggests, we divide the population into strata, or
groups, based on characteristics by which we can identify them, such as age group, gender, or
occupation. We then randomly select elements from each of these groups to create the sample. If
your population has a lot of variation, stratified random sampling is a good choice.
For instance, suppose the government wants feedback on a new education policy they are going to
pursue. It will not be sufficient to survey only the stakeholders of government schools, which might
be easier to accomplish. The sample would need representation from all strata on which the policy
might have implications like private, semi-private, minority, international schools, in addition to
government schools.
We have three types of stratified random sampling:
1.2.1 Proportionate Stratified Random Sampling
Here, we sample from each stratum in proportion to its representation in the whole population under study.
For instance.
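As a hypothetical illustration of proportionate allocation (the school counts below are invented
and do not come from this guide), the following sketch groups the units by stratum and draws from
each stratum a share of the sample equal to its share of the population. The function name and the
rounding rule are assumptions.

```python
import random
from collections import Counter, defaultdict


def proportionate_stratified_sample(units, stratum_of, sample_size, seed=None):
    """Draw from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        share = len(members) / len(units)
        # Rounding means the total can differ slightly from sample_size
        # when the shares do not divide evenly.
        k = min(round(sample_size * share), len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Invented population: 1,000 schools across three strata.
school_counts = {"government": 600, "private": 300, "international": 100}
schools = [(f"{kind}_{i}", kind) for kind, n in school_counts.items() for i in range(n)]

sample = proportionate_stratified_sample(schools, lambda s: s[1], sample_size=50, seed=7)
allocation = Counter(kind for _, kind in sample)
print(allocation)
```

With these invented counts, the strata make up 60%, 30%, and 10% of the population, so a sample
of 50 is split in the same ratio.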
2. Non-probability sampling
In this kind of sampling, we intentionally do not require that each element in the population
has an equal chance of being picked for the sample.
2.1 Quota sampling
In this, we divide the population into quotas (groups) that represent the population, and these
form the basis of the elements we select for the sample. This might look similar to stratified
random sampling, but the important difference is that, once the fixed quotas are set, the elements
that fill each quota are not selected at random. A quota could be something like all males above 20
or children between 12 and 18 years of age. Quota sampling saves time and resources and is a quick
way to get the study started.
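A minimal sketch of quota sampling, assuming that each quota is filled by the first matching
respondents we happen to reach (one common way to do it, not something this guide specifies). The
respondent records, the `quota_of` rule, and the quota sizes are invented for the illustration.

```python
def quota_sample(respondents, quota_of, quotas):
    """Fill each quota with the first matching respondents encountered.

    No random draw is involved, which is what makes this non-probability sampling.
    """
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    sample = []
    for person in respondents:
        group = quota_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota is full
    return sample


def quota_of(person):
    """Assign a respondent to one of the two example quotas, or to none."""
    if person["gender"] == "male" and person["age"] > 20:
        return "males_over_20"
    if 12 <= person["age"] <= 18:
        return "children_12_to_18"
    return None


# Invented respondents, in the order they were reached.
respondents = [
    {"name": "R1", "gender": "male", "age": 34},
    {"name": "R2", "gender": "female", "age": 15},
    {"name": "R3", "gender": "male", "age": 45},
    {"name": "R4", "gender": "female", "age": 52},
    {"name": "R5", "gender": "male", "age": 61},
    {"name": "R6", "gender": "male", "age": 13},
]
picked = quota_sample(respondents, quota_of, {"males_over_20": 2, "children_12_to_18": 2})
print([p["name"] for p in picked])
```

Here R5 is skipped because the quota for males above 20 is already full, and R4 is skipped because
she belongs to neither quota.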
2.2 Snowball sampling
This is one of the most interesting non-probabilistic techniques. You first select, at random, members
for the sample. Suppose you selected 3 members. Now, these three will suggest more names for the
study, and this creates a chain effect. Snowball sampling is useful in cases where it is difficult to
locate people, or they do not wish to be identified. For instance, in medical research where you are
studying a rare disease, you might find that snowball sampling is the only way you can get to the
desired sample size.
There are three sub-categories in snowball sampling:
2.2.1 Linear snowball sampling
The chain grows linearly: each member in the sample refers one more member.
2.2.2 Exponential non-discriminative snowball sampling
This is a one-to-many relationship. Each member in the study refers multiple members, and all of
them are included in the study. This creates an exponential effect on the size of the sample. It
may also introduce bias into the sampling, and researchers cannot be sure that the sample is
representative of the population under study.
2.2.3 Exponential discriminative snowball sampling
Here, while we ask each member to provide multiple referrals, we select only one of them and
discard the rest. By doing this, researchers attempt to reduce the chance of bias in the sample.
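The three snowball variants can be sketched on a small referral network. Everything here is
hypothetical: the `referrals` dictionary, the member labels, and the mode names. In the
discriminative mode the one referral to keep is chosen at random, which is an assumption, since in
practice the researcher may apply their own screening criteria.

```python
import random
from collections import deque


def snowball_sample(referrals, seeds, mode, max_size, seed=None):
    """Grow a sample by following referrals from the initial members.

    mode: "linear"         -> each member contributes one referral
          "exponential"    -> every referral is included
          "discriminative" -> one of several referrals is kept, the rest dropped
    """
    rng = random.Random(seed)
    sample = list(seeds)
    selected = set(sample)
    queue = deque(sample)
    while queue and len(sample) < max_size:
        member = queue.popleft()
        candidates = [p for p in referrals.get(member, []) if p not in selected]
        if not candidates:
            continue
        if mode == "linear":
            chosen = candidates[:1]
        elif mode == "exponential":
            chosen = candidates
        elif mode == "discriminative":
            chosen = [rng.choice(candidates)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        for person in chosen:
            if len(sample) >= max_size:
                break
            selected.add(person)
            sample.append(person)
            queue.append(person)
    return sample


# Invented referral network: who each member would name.
referrals = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}

linear = snowball_sample(referrals, ["A"], "linear", max_size=10)
exponential = snowball_sample(referrals, ["A"], "exponential", max_size=10)
discriminative = snowball_sample(referrals, ["A"], "discriminative", max_size=10, seed=3)
print(linear, exponential, discriminative)
```

On this small network the exponential variant reaches everyone, while the linear and
discriminative variants follow a single chain of referrals.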
2.3 Judgment sampling
Here, the researcher applies their qualified opinion and judgment to decide who should be part of
the sample. This is typically used where the sample should consist of experts or other highly
knowledgeable individuals. The approach is to identify the experts and form the sample from them.
2.4 Convenience sampling
Here, we prioritize the accessibility of the element above other considerations. The researcher selects
the elements based on convenience. This is typically used in the initial phases of the survey, where
the researcher intends to gain quick feedback on the design of the survey. It helps to quickly prototype
the survey design.