What Is Sampling
Sampling is a process used in statistical analysis in which a predetermined number of
observations are taken from a larger population. The methodology used to sample from a larger
population depends on the type of analysis being performed but may include simple random
sampling or systematic sampling.
Sampling is used instead of a complete enumeration of the population for the following reasons:
1. Time saving: It is difficult and slow to contact each and every individual in the whole population.
2. Cost saving: The cost of studying all the items (objects or individuals) in a population may be
prohibitive.
3. Physically impossible: Some populations are so large that they are effectively infinite, so it is
physically impossible to check every item, as with populations of fish, birds, snakes, or mosquitoes.
Similarly, it is difficult to study populations whose members are constantly moving, being born, or
dying.
4. Destructive nature of items: Some items are destroyed by the test itself. For example, a steel
wire is stretched until it breaks and the breaking point is recorded to establish its minimum tensile
strength. Similarly, many electric and electronic components are destroyed when they are tested.
Time, cost, and the destructive nature of such tests make it impossible to study the entire
population.
5. Qualified and expert staff: Enumeration requires highly qualified and expert staff, who are not
always available. National and international research organizations and agencies hire staff for
enumeration purposes, which is sometimes costly and time-consuming (as the activity must be
rehearsed), and it is not always easy to recruit highly qualified staff.
6. Reliability: With a scientific sampling technique the sampling error can be minimized, and the
non-sampling error in a sample survey can also be kept low, because qualified investigators can be
employed.
Types/techniques/methods of sampling
There are many sampling techniques, which are grouped into two categories:
1. Probability Sampling
2. Non-Probability Sampling
The difference between the two is whether or not the sample selection is based on randomization.
With randomization, every element has a known, nonzero chance (in the simplest designs, an equal
chance) of being picked and becoming part of the sample for study.
1. Probability Sampling
This sampling technique uses randomization to make sure that every element of the population
gets an equal chance to be part of the selected sample. It is also known as random
sampling. Probability sampling can be further divided into five types:
a) Simple Random Sampling: Every element has an equal chance of being selected as part of the
sample. It is used when we don’t have any prior information about the target population. For
example: random selection of 20 students from a class of 50 students. Each student has an equal
chance of being selected. The chance of being picked on the first draw is 1/50, and the probability
of ending up in the sample of 20 is 20/50 = 0.4.
(Figure: Simple Random Sampling)
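The classroom example can be sketched in a few lines of Python. This is only an illustration: the roster of student IDs 1 to 50 and the fixed seed are assumptions made for the example. The standard-library call `random.Random.sample` draws without replacement, so every student has the same inclusion probability n/N = 20/50.

```python
import random

# Hypothetical class roster: student IDs 1..50 (illustrative only).
students = list(range(1, 51))

rng = random.Random(42)             # fixed seed so the draw is repeatable
sample = rng.sample(students, 20)   # 20 distinct students, drawn without replacement

# For simple random sampling, each student's inclusion probability is n / N.
inclusion_probability = len(sample) / len(students)

print(sorted(sample))
print(inclusion_probability)        # 0.4
```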
b) Stratified Sampling: This technique divides the elements of the population into small
subgroups (strata) based on similarity, in such a way that the elements within a stratum are
homogeneous while the strata differ from one another (heterogeneous between strata). The
elements are then randomly selected from each of these strata. We need prior information
about the population to create the subgroups.
(Figure: Stratified Sampling)
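A minimal sketch of stratified sampling, assuming proportional allocation (the same fraction is drawn from every stratum). The helper name `stratified_sample`, the population of 100 students, and the two strata are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # group units by stratum label

    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(len(members) * fraction))  # take at least one unit per stratum
        sample.extend(rng.sample(members, k))       # simple random sample inside the stratum
    return sample

# Hypothetical population: 60 first-year and 40 second-year students.
population = [(i, "first-year" if i < 60 else "second-year") for i in range(100)]
sample = stratified_sample(population, stratum_of=lambda u: u[1], fraction=0.1)
print(len(sample))  # 10
```

With a 10% fraction the sample holds 6 first-year and 4 second-year students, so the strata keep the same proportions as in the population.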
c) Cluster Sampling: The entire population is divided into clusters or sections, and then
clusters are randomly selected. All the elements of the selected clusters are included in the
sample. Clusters are identified using details such as age, sex, or location.
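Cluster sampling can be done in the following ways:
Single-stage cluster sampling: clusters are randomly selected and all the elements of the selected
clusters are sampled.
Two-stage cluster sampling: first we randomly select clusters, and then from those selected
clusters we randomly select elements for sampling.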
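The difference between keeping whole clusters and subsampling within them can be sketched as follows. The 10 clusters of 8 households each, the number of clusters drawn, and the within-cluster sample size are all assumptions made for the example.

```python
import random

# Hypothetical population: 10 clusters (e.g. city blocks) of 8 households each.
clusters = {c: [f"block{c}-house{h}" for h in range(8)] for c in range(10)}

rng = random.Random(1)
chosen = rng.sample(sorted(clusters), 3)   # randomly select 3 clusters

# Keep every element of each selected cluster.
single_stage = [unit for c in chosen for unit in clusters[c]]

# Alternatively, draw a random subsample of 4 elements inside each selected cluster.
two_stage = [unit for c in chosen for unit in rng.sample(clusters[c], 4)]

print(len(single_stage), len(two_stage))   # 24 12
```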
d) Systematic Sampling: Here the selection of elements is systematic rather than random, except
for the first element. All the elements are first put together in a sequence, and the elements of
the sample are then chosen at regular intervals from that sequence. Because the starting point is
random, each element has an equal chance of being selected.
For a sample of size n, we divide our population of size N into subgroups of k elements, where
k = N/n is the sampling interval.
We select our first element randomly from the first subgroup of k elements and then take every
k-th element after it.
For example, with k = 4, if the first selected position is n1 = 3, then
n2 = n1 + k = 3 + 4 = 7
n3 = n2 + k = 7 + 4 = 11
(Figure: Systematic Sampling)
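The positions 3, 7, 11, ... follow from a fixed interval k = N/n and a starting position chosen in the first block of k elements. The sketch below reproduces that arithmetic; the function name `systematic_sample`, its `first` parameter, and the population of 20 numbered elements are assumptions made for illustration.

```python
import random

def systematic_sample(population, n, first=None, seed=None):
    """Take every k-th element, where k = N // n is the sampling interval.

    `first` is the 1-based starting position inside the first block of k
    elements; if omitted it is chosen at random.
    """
    k = len(population) // n
    if k < 1:
        raise ValueError("sample size n must not exceed the population size")
    if first is None:
        first = random.Random(seed).randint(1, k)   # random start in 1..k
    elif not 1 <= first <= k:
        raise ValueError("first must lie in the first block of k elements")
    positions = [first + i * k for i in range(n)]   # first, first+k, first+2k, ...
    return [population[p - 1] for p in positions]   # convert 1-based to 0-based index

# Hypothetical population of N = 20 elements numbered 1..20; n = 5 gives k = 4.
population = list(range(1, 21))
print(systematic_sample(population, n=5, first=3))  # [3, 7, 11, 15, 19]
```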
e) Multi-Stage Sampling: This is a combination of two or more of the methods described above,
applied in successive stages. The population is divided into multiple clusters, and these clusters
are then further divided and grouped into various subgroups (strata) based on similarity. One or
more clusters can be randomly selected from each stratum. This process continues until the
clusters can’t be divided any further. For example, a country can be divided into states, cities,
and urban and rural areas, and all the areas with similar characteristics can be merged together
to form a stratum.
(Figure: Multi-Stage Sampling)
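One way to picture the successive stages is a nested random selection: states first, then cities within the chosen states, then households within the chosen cities. The sketch below is a simplified illustration under assumed data (6 states, 5 cities per state, 20 households per city) and assumed sample sizes at each stage. It leaves out the grouping of similar areas into strata.

```python
import random

# Hypothetical nested population: state -> city -> list of households.
country = {
    f"state{s}": {
        f"s{s}-city{c}": [f"s{s}-c{c}-household{h}" for h in range(20)]
        for c in range(5)
    }
    for s in range(6)
}

rng = random.Random(7)
sample = []
for state in rng.sample(sorted(country), 2):                # stage 1: pick 2 states
    for city in rng.sample(sorted(country[state]), 2):      # stage 2: pick 2 cities per state
        sample.extend(rng.sample(country[state][city], 5))  # stage 3: pick 5 households per city

print(len(sample))  # 20
```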
2. Non-Probability Sampling
It does not rely on randomization. This technique relies more on the researcher’s ability to
select elements for a sample. The outcome of sampling might be biased, because not all elements
of the population have an equal chance of being part of the sample. This type of sampling is also
known as non-random sampling. The main types are:
a) Convenience Sampling
b) Purposive Sampling
c) Quota Sampling
d) Referral/Snowball Sampling
a) Convenience Sampling: Here the sample is selected based on availability. This method is used
when the elements of interest are hard to find or costly to reach, so the researcher takes
whichever elements are convenient. For example, researchers prefer this method during the initial
stages of survey research, as it is quick and easy to deliver results.
b) Purposive Sampling: This is based on the intention or purpose of the study. Only those
elements that best suit the purpose of the study are selected from the population.
For example: if we want to understand the thought process of people who are interested in
pursuing a master’s degree, then the selection criterion would be “Are you interested in a Master’s
in..?” All the people who respond with a “No” will be excluded from our sample.
c) Quota Sampling: This type of sampling depends on some pre-set standard. It aims to select a
representative sample from the population: the proportion of a characteristic or trait in the
sample should be the same as in the population. Elements are selected until the exact proportions
of certain types of data are obtained or sufficient data in the different categories are collected.
For example: if our population has 45% females and 55% males, then our sample should reflect the
same percentages of males and females.
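The 45%/55% example can be turned into counts and filled in arrival order. The sketch below is illustrative only: the sample size of 20, the function name `quota_sample`, and the stream of respondents are assumptions. Respondents are accepted as they arrive rather than drawn at random, which is why quota sampling is non-probabilistic.

```python
def quota_sample(stream, category_of, quotas):
    """Accept units in arrival order until every category's quota is filled."""
    remaining = dict(quotas)          # copy so the caller's quotas are not modified
    sample = []
    for unit in stream:
        category = category_of(unit)
        if remaining.get(category, 0) > 0:
            sample.append(unit)
            remaining[category] -= 1
        if not any(remaining.values()):
            break                     # all quotas met, stop recruiting
    return sample

# A sample of 20 that mirrors 45% females / 55% males needs 9 females and 11 males.
quotas = {"F": 9, "M": 11}

# Hypothetical stream of respondents in the order they happen to be encountered.
respondents = [(i, "F" if i % 3 == 0 else "M") for i in range(100)]

sample = quota_sample(respondents, category_of=lambda r: r[1], quotas=quotas)
print(len(sample))  # 20
```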
d) Referral/Snowball Sampling: This technique is used in situations where the population is
largely unknown, rare, or hard to reach. We therefore take help from the first element we select
from the population and ask them to recommend other elements who fit the description of the
sample needed. This referral process continues, increasing the size of the sample like a
snowball.
For example: it is used for highly sensitive topics such as HIV/AIDS, where people will not
openly discuss the topic or take part in surveys to share information about it.
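The referral process can be pictured as following recommendations outward from an initial participant. The sketch below walks a hypothetical referral map breadth-first; the names, the map itself, and the `max_size` cap are assumptions made for illustration, and real referral chains are rarely this tidy.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from seed participants and follow their referrals until max_size is reached."""
    seen = set(seeds)
    queue = deque(seeds)
    sample = []
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):   # people this participant recommends
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral map: who each participant recommends.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
print(snowball_sample(["A"], referrals, max_size=5))  # ['A', 'B', 'C', 'D', 'E']
```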
Errors in research design
Several potential sources of error can affect a research design. A good research design attempts
to control the various sources of error.
Nonsampling Error
Nonsampling errors can be attributed to sources other than sampling, and they may be random or
nonrandom. They result from a variety of reasons, including errors in problem definition,
approach, scales, questionnaire design, interviewing methods, and data preparation and analysis.
For example, the researcher designs a poor questionnaire, which contains several questions that
lead the respondents to give biased answers. Nonsampling errors consist of nonresponse errors and
response errors.
Nonresponse error arises when some of the respondents included in the sample do not respond.
The primary causes of nonresponse are refusals and not-at-homes.
Response error arises when respondents give inaccurate answers or their answers are
misrecorded or misanalyzed.
Errors made by the researcher include surrogate information, measurement, population
definition, sampling frame, and data analysis errors.
Surrogate information error may be defined as the variation between the information needed
for the marketing research problem and the information sought by the researcher. For example,
instead of obtaining information on consumer choice of a new brand (needed for the marketing
research problem), the researcher obtains information on consumer preferences since the choice
process cannot be easily observed.
Measurement error may be defined as the variation between the information sought and the
information generated by the measurement process employed by the researcher. While seeking to
measure consumer preferences, the researcher employs a scale that measures perceptions rather
than preferences.
Population definition error may be defined as the variation between the actual population
relevant to the problem at hand and the population as defined by the researcher. The problem of
appropriately defining the population may be far from trivial, as illustrated by the case of affluent
households.
Sampling frame error may be defined as the variation between the population defined by the
researcher and the population as implied by the sampling frame (list) used. For example, the
telephone directory used to generate a list of telephone numbers does not accurately represent the
population of potential consumers because of unlisted, disconnected, and new numbers in
service.
Data analysis error encompasses errors that occur while raw data from questionnaires are
transformed into research findings. For example, an inappropriate statistical procedure is used,
resulting in incorrect interpretation and findings.
Response errors made by the interviewer include respondent selection, questioning, recording,
and cheating errors.
Respondent selection error occurs when interviewers select respondents other than those
specified by the sampling design or in a manner inconsistent with the sampling design. For
example, in a readership survey, a nonreader is selected for the interview but is classified as a
reader of the Wall Street Journal in the 15- to 19-years category in order to meet a difficult quota
requirement.
Questioning error denotes errors made in asking questions of the respondents or in not probing
when more information is needed. For example, while asking questions an interviewer does not
use the exact wording given in the questionnaire.
Recording error arises due to errors in hearing, interpreting, and recording the answers given by
the respondents. For example, a respondent indicates a neutral response (undecided), but the
interviewer misinterprets that to mean a positive response (would buy the new brand).
Cheating error arises when the interviewer fabricates answers to a part or all of the interview.
For example, an interviewer does not ask the sensitive questions related to the respondent’s debt
but later fills in the answers based on personal assessment.
Response errors made by the respondent comprise inability and unwillingness errors.
Inability error results from the respondent’s inability to provide accurate answers. Respondents
may provide inaccurate answers because of unfamiliarity, fatigue, boredom, faulty recall,
question format, question content, and other factors. For example, a respondent cannot recall the
brand of yogurt purchased four weeks ago.
Unwillingness error arises from the respondent’s unwillingness to provide accurate information.
Respondents may intentionally misreport their answers because of a desire to provide socially
acceptable answers, avoid embarrassment, or please the interviewer. For example, a respondent
intentionally misreports reading Time magazine in order to impress the interviewer.