PFPI Unit 2
UNIT : 2
Traditions of Inquiry
TOPICS
● Quantitative and qualitative approaches- overview, differences and convergences in
mixed methods
● Issues and techniques of sampling in quantitative and qualitative approaches
● Issues of quality and ethics in quantitative and qualitative approaches
● Role of reflexivity in knowledge generation
Quantitative research can be outlined as a distinctive research strategy. In very broad terms, it
entails the collection of numerical data, takes a deductive view of the relationship between
theory and research, shows a predilection for a natural science approach (and for positivism
in particular), and holds an objectivist conception of social reality.
Qualitative research is a research strategy that usually emphasizes words rather than
quantification in the collection and analysis of data. As a research strategy it is broadly
inductivist, constructionist, and interpretivist, but qualitative researchers do not always subscribe
to all three of these features.
Nature of the data: Soft data (i.e., words, sentences, photos, symbols) dictate qualitative
research strategies and data collection techniques that differ from hard data (in the form of
numbers), for which quantitative approaches are used.
Sample: A small set of cases a researcher selects from a large pool and generalizes to the
population.
Sampling is a method used in empirical studies to select a small number of cases, study them in
detail, and use what is learned to understand a larger set of cases. Most books emphasize its
use in quantitative research, where the goal is to create a representative sample that closely
reproduces or represents features of interest in a larger population.
Probability sampling is a method used in quantitative research to create representative samples
using precise sampling procedures based on probability mathematics, so that results from the
cases examined in detail can be generalized to the entire population.
Quantitative studies aim to determine how many cases of a population fit into specific
categories. Probability samples are used in quantitative research because they are efficient and
cost-effective: a well-designed probability sample saves time and money while delivering results
nearly identical to those from studying every case. For example, if we want to study the 18
million people in the United States diagnosed with diabetes, findings from a well-designed sample
of 1,800 can be generalized to all 18 million, and studying 1,800 people is far more efficient.
Probability samples can be highly accurate for large populations, often more accurate than
trying to count every case. The U.S. government agreed that probability sampling would
produce more accurate data than the traditional census method. A careful probability sample of
30,000 has a tiny error rate, while trying to locate every one of 300,000,000 people would
introduce systematic errors. The traditional census was nevertheless conducted, for political
reasons.
Sampling in qualitative studies differs from quantitative studies as it focuses on identifying
relevant categories in a few cases rather than a mathematically accurate reproduction of the
entire population. In qualitative research, the goal is to deepen understanding about a larger
process, relationship, or social scene by providing valuable information or new aspects. The
aspects accentuate, enhance, or enrich key features or situations, opening up new theoretical
insights, revealing distinctive aspects of people or social settings, or deepening understanding
of complex situations, events, or relationships. In qualitative research, the selection of people to
be studied is based on their relevance to the research topic rather than their representativeness.
This approach allows for a more nuanced understanding of the social world and its processes.
The distinction between quantitative and qualitative studies should not be overstated, since
either kind of study may occasionally borrow the other's sampling strategy. Nonetheless, most
quantitative studies use probability samples, while most qualitative studies use nonprobability
methods and nonrepresentative strategies.
Population: The abstract idea of a large group of many cases from which a researcher draws a
sample and to which results from a sample are generalized.
Sampling Strategies
Two potential sampling mistakes should be avoided. To avoid the first, sampling in a sloppy or
improper manner, it is crucial to be meticulous and systematic. To avoid the second, choosing a
sample that does not fit the study, a sampling strategy that matches the study's purpose and data
is essential. There are two types of sampling strategies: those that aim for an accurate
representative sample, and all others. The first type is used in quantitative studies, while the
second is used in qualitative studies.
Strategies When the Goal Is to Create a Representative Sample
A representative sample aims to create data that represents many cases that cannot be directly
examined. Two methods are used: probability sampling, the preferred method, and
nonprobability sampling. Probability sampling is considered the "gold standard" for
representative samples, based on over a century of reasoning and applied mathematics. It aims
to create an accurate representative sample with mathematically predictable errors.
Nonprobability sampling is a simpler alternative, acceptable when probability sampling is
impossible, too costly, time-consuming, or impractical.
Nonprobability Sampling Techniques.
Probability samples are preferred for creating representative samples, but convenience and
quota samples are less demanding alternatives. Convenience sampling, also known as
accidental or haphazard sampling, is suitable for exploratory preliminary studies and qualitative
research studies, but often produces nonrepresentative samples, making it unsuitable for
accurate population representation. Quota samples are more suitable for more rigorous
studies. Convenience samples are popular because they are easy, cheap, and quick to obtain;
their use may also reflect ignorance about how to create a good representative sample.
For example, television programs often use person-on-the-street interviews, which do not
represent everyone. Similarly, newspapers, websites, and television programs may invite their
audiences to answer a questionnaire or poll. These methods may have entertainment value but can
yield highly misleading data that do not represent the entire population, even when a large
number of people respond. Therefore, it is crucial to choose samples that accurately represent
the population.
A sample can be nonrepresentative even when many people take part, such as 50,000 self-selected
respondents in a city with a population of 1 million. This self-selected sample is not
representative because many people may not be interested in the issue or may have limited
access to the resources needed to respond. Two key ideas to remember about representative
samples are that self-selection results in a nonrepresentative sample and that a large sample
size alone is not enough to make a representative sample.
Quota sampling is a nonprobability method for producing a quasi-representative sample. It
involves identifying relevant categories among the population to capture diversity, and
determining the "quota" for each category, thereby setting a fixed number of cases in various
sample categories at the start.
A 180-person sample from city XYZ is selected, including 25 males and 25 females under 30
years, 50 males and 50 females aged 30 to 60, and 15 males and 15 females over 60. Quota
sampling ensures diversity in the sample, unlike convenience sampling, which may have
everyone of the same age, gender, or background. This method is used in the Promises I Can
Keep study.
In another example, quota sampling was used to conduct an opinion survey of an undergraduate student
body. Three quota categories were used: gender, class, and minority/majority group status. The
number of interviews was set in advance, with proportions based on university official records. A
student interviewer confirmed the person's gender, class, and minority/majority status. If the
person fit an unfilled quota, they were included in the sample, and the interviewer proceeded to
ask survey questions. If the person did not fit the quota, the interviewer thanked them without
asking questions.
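The quota-filling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any particular study used: the quotas come from the 180-person city XYZ example, while the function name `quota_sample`, the stream of 5,000 passers-by, and the random seed are hypothetical.

```python
import random

def quota_sample(stream, quotas, key):
    """Fill fixed quotas from people encountered in arbitrary (convenience) order."""
    remaining = dict(quotas)
    sample = []
    for person in stream:
        category = key(person)
        if remaining.get(category, 0) > 0:   # person fits an unfilled quota
            sample.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):      # every quota is filled
            break
    return sample

# Quotas from the 180-person city XYZ example.
QUOTAS = {
    ("male", "under 30"): 25, ("female", "under 30"): 25,
    ("male", "30-60"): 50, ("female", "30-60"): 50,
    ("male", "over 60"): 15, ("female", "over 60"): 15,
}

rng = random.Random(0)
categories = list(QUOTAS)
# Hypothetical stream of people the interviewer happens to meet.
passersby = [{"id": i, "category": rng.choice(categories)} for i in range(5000)]
quota_result = quota_sample(passersby, QUOTAS, key=lambda p: p["category"])
print(len(quota_result))
```

Note that the people within each category are still whoever happens to come along first, which is why quota sampling remains a nonprobability method.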
Quota samples have three main weaknesses. First, they capture only a few aspects of population
diversity, such as gender and age. Second, the fixed number of cases in each category may not
accurately reflect the proportion of such cases in the total population. Third, convenience
sampling is used to select cases within each quota category, so interviewers may pick only
"friendly"-acting people who want to be chosen.
Probability Sampling Techniques.
Probability sampling is the "gold standard" for creating representative samples, but its
specialized vocabulary may be challenging to understand until learned.
The Language of Probability Sampling
A sample is drawn from a large collection of cases/units, each representing a unit of analysis in
a population, such as a person, family, neighborhood, nation, organization, or document.
The population, sometimes called the "universe," is the large collection of elements from which
the sample is drawn. Its membership is not always obvious; for example, as of 2007, around 5% of
U.S. households were "linguistically isolated." To draw a probability sample,
we start with a population, but it is an abstract concept. To study a specific population, we must
conceptualize and define it more precisely, similar to the measurement process. The target
population is the specific collection of elements we will study, similar to the conceptual definition
of the measurement process.
Populations are constantly changing, necessitating a temporal boundary. In a city, people are
dying, boarding or getting off airplanes, and driving across city boundaries. It's difficult to
determine who should be counted, especially if a long-time resident is on vacation. A
population, like those over 18 in Milwaukee, is an abstract idea that's difficult to pinpoint
concretely.
After conceptualizing a population, we need to define it operationally, much as we operationalize
a construct in measurement. This involves creating an empirically concrete list that approximates
all population elements, known as the sampling frame.
Examples of sampling frames include telephone directories, tax records, and driver's license
records. However, listing every element in a population can be difficult because an accurate
list often does not exist. A good sampling frame is crucial for accurate sampling, as any
mismatch between it and the conceptually defined population can lead to errors. The most famous
case in sampling history involved sampling frames.
The sampling ratio is the ratio of a sample size to the size of the target population. For example,
if a target population has 50,000 people and a sample has 150, the sampling ratio is 150/50,000
or 0.3 percent. For a target population of 500 and a sample of 100, the ratio is 100/500 or 20
percent. The number of elements in a sampling frame is the best estimate of the target
population size. A population parameter is a characteristic of the entire population, such as
the percentage who smoke or who believe in UFOs. Parameters are estimated from sample data, and
the value computed from the sample is called a statistic.
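The sampling ratio arithmetic above can be checked with a short Python sketch. The function name `sampling_ratio` is chosen here for illustration only.

```python
def sampling_ratio(sample_size, population_size):
    """Ratio of the sample size to the size of the target population."""
    return sample_size / population_size

print(f"{sampling_ratio(150, 50_000):.1%}")   # the 150-from-50,000 example
print(f"{sampling_ratio(100, 500):.1%}")      # the 100-from-500 example
```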
Random Sampling
Probability theory in applied mathematics relies on random processes. In everyday language,
random can mean unpredictable, unusual, unexpected, or haphazard. In mathematics, a random
process is one in which each element has an equal probability of selection, and for a true
random process we can calculate the probability of outcomes with precision.
Random samples are the most likely to yield a sample that truly represents the entire
population, and they allow for the calculation of the sampling error, which is the deviation
between the sample data and an ideal population parameter.
Probability samples use random selection processes, which require more precision, time, and
effort than nonrandom selection. The formal mathematical procedure specifies exactly which
person to pick for the sample, and that person may be difficult to locate. Random sampling is
therefore not thoughtless or haphazard: it may require multiple calls at different times and on
different days to reach a specific person.
Five Ways to Sample Randomly
1. Simple Random. Other probability samples are modeled on the simple random sample. To
draw one, we specify the population and target population, identify the specific sampling
elements, create an accurate sampling frame, and use a true random process to select
elements. The challenge lies in locating each selected element, which may
require revisiting or calling back as many as five times to contact a selected household.
To select elements from a sampling frame, create a list of random numbers ranging from
1 to the highest number in the frame. For example, to sample 150 households from
15,000 households, we need a list of 150 random numbers, generated by a true random
process from 1 to 15,000.
Two main methods to obtain a list of random numbers are using a random-number table,
which is available in statistics and research methods books, and using computer
programs, which are often free and readily available, to generate lists of random
numbers.
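As a rough illustration of the computer-based approach, the following Python sketch draws 150 household numbers from a frame of 15,000 without replacement. It assumes the frame is simply numbered 1 to 15,000; the seed is arbitrary, and the standard-library generator is pseudorandom rather than a true random process, which is usually acceptable for sampling purposes.

```python
import random

FRAME_SIZE = 15_000    # households in the sampling frame, numbered 1..15,000
SAMPLE_SIZE = 150

rng = random.Random(42)   # arbitrary seed so the draw can be repeated
# random.sample selects without replacement, so no household appears twice.
selected_ids = rng.sample(range(1, FRAME_SIZE + 1), SAMPLE_SIZE)
print(sorted(selected_ids)[:10])
```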
After sampling an element from the sampling frame, we could return it to the frame so
that it has a chance of being selected again. This is called "random sampling with
replacement." In random sampling without replacement, we discard or ignore elements
already selected for the sample. In social science, simple random sampling is without
replacement. An example of simple random sampling is drawing marbles from a jar, where
the population parameter is the percentage of red marbles in the jar.
To estimate the population parameter, a random process is used to select 100 marbles.
The sample size is 100, and the population parameter is the percentage of red versus
white marbles. Suppose the sample has 52 white and 48 red marbles while the population
parameter is 50/50: the sample statistic is close to, but not exactly equal to, the
parameter.
To understand what happens across repeated sampling, 130 different samples are drawn
from the jar, revealing a clear pattern. The most common mix of red and white marbles is
50/50, and samples close to that split are more frequent than samples far from it. The
population parameter therefore appears to be 50 percent white and 50 percent red marbles.
Mathematical proofs and empirical tests show that the pattern found in Chart 1 always
appears. The sampling distribution is a distribution of different samples, revealing the
frequency of different sample outcomes from many separate random samples. This
pattern becomes clearer as more independent random samples are drawn from a
population.
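The marble experiment can be simulated to see a sampling distribution emerge. This is only a sketch: the jar size of 5,000 marbles and the seed are assumptions made for illustration, while the 130 samples of 100 marbles each follow the example above.

```python
import random
from collections import Counter

rng = random.Random(1)
jar = ["red"] * 2500 + ["white"] * 2500   # assumed jar size; true split is 50/50

# Draw 130 independent samples of 100 marbles and count the red ones in each.
red_counts = [rng.sample(jar, 100).count("red") for _ in range(130)]

distribution = Counter(red_counts)
for reds in sorted(distribution):
    print(f"{reds:3d} red: {'*' * distribution[reds]}")
```

The printed rows form a rough text histogram; outcomes near 50 red marbles should have the longest bars, with fewer samples far from the true split.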
The sampling distribution shows that sample outcomes at or near the true population
parameter, such as the 50/50 split, are the most common over many separate samples.
Some samples deviate from the parameter, but they are less common. With many random
samples, the sampling distribution looks like a normal or bell-shaped curve, which is
theoretically important and used throughout statistics. Standard statistical charts
provide the odds of getting a specific number of marbles. The central limit theorem from
mathematics states that as the number of random samples increases, the pattern of
samples and its relationship to the population parameter become increasingly predictable.
The central limit theorem is about many samples, but in practice we typically draw only
one random sample. Because we know what the distribution of many samples would look
like, the theorem allows us to calculate the probability that a particular sample is off
from the population parameter. Random sampling does not guarantee that every sample
perfectly represents the population; it means that most random samples will be close to
the population parameter most of the time. Estimating the chance that a particular
sample is unrepresentative allows us to create confidence intervals, which are ranges
around a specific point used to estimate a population parameter.
Because a single sample rarely matches the population parameter exactly, the statistics
of random processes use a range to estimate the true parameter with a high level of
confidence, typically 95%. The sampling distribution is the key idea that determines
both the sampling error and the confidence interval. For example, a sample may not
provide a perfect measure of the population parameter, but we can be 95 percent certain
that the true parameter is no more than 2% different from the sample result. This allows
us to state a specific range around the sample estimate, with a specific degree of
confidence, within which the population parameter lies.
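One common way to compute such a range for a percentage is the normal-approximation confidence interval. The sketch below applies it to the marble sample of 48 red out of 100; the function name `proportion_ci` and the choice of this particular formula are assumptions made for illustration, since the text does not specify a formula.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

low, high = proportion_ci(48, 100)    # 48 red marbles in a sample of 100
print(f"{low:.3f} to {high:.3f}")     # the true 50/50 split lies inside this range
```

Under the same approximation, a margin of about 2 percentage points requires roughly 2,400 cases when the true proportion is near 50 percent.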
Systematic Sampling.
Systematic sampling is simple random sampling with a shortcut: a sampling interval
creates a quasirandom selection method. The interval, 1 in k, tells us how to select
elements from a sampling frame by skipping elements in the frame before selecting one
for the sample. For example, to sample 300 names from 900, we begin at a random
starting point and select every third name, a sampling interval of 3. Sampling intervals
are easy to compute: divide the population size by the sample size.
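A minimal Python sketch of the 300-from-900 example follows. The list of names, the seed, and the function name `systematic_sample` are hypothetical, and the sketch assumes the frame size divides evenly by the sample size.

```python
import random

def systematic_sample(frame, sample_size, rng):
    """Pick every k-th element after a random start within the first interval."""
    k = len(frame) // sample_size      # sampling interval (1 in k)
    start = rng.randrange(k)           # random start: position 0 .. k-1
    return frame[start::k][:sample_size]

names = [f"name_{i}" for i in range(1, 901)]    # hypothetical frame of 900 names
chosen = systematic_sample(names, 300, random.Random(7))
print(chosen[:5])
```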
A simple random sample and a systematic sample usually yield nearly the same results.
However, systematic sampling should not be used when the elements in a sampling frame
are organized in a cycle or pattern. For example, if the sampling frame is a list of
married couples with the male listed first and the female second, an even-numbered
sampling interval can produce an unrepresentative sample that includes only wives. In an
example frame with 20 males and 20 females, the simple random sample yielded three males
and seven females, while the systematic sample yielded five males and five females. To
check how much the results vary, a new sample can be drawn with different random numbers
and a different random start.
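The married-couples problem is easy to reproduce. In this sketch the frame of 20 couples, the interval of 4, and the starting position are all hypothetical values chosen to show how a cyclical frame can defeat systematic sampling.

```python
# Frame of 20 married couples, male listed first and female second (40 entries).
couples_frame = []
for i in range(1, 21):
    couples_frame += [("husband", i), ("wife", i)]

interval = 4    # an even interval lines up with the two-person cycle
start = 1       # a start position that happens to land on a wife
picked = couples_frame[start::interval]

roles = {role for role, _ in picked}
print(len(picked), roles)   # every selected element has the same role
```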
Stratified Sampling
Stratified sampling involves dividing a population into subpopulations (strata) based on
supplementary information, and then drawing a random sample from each
subpopulation. This method controls the relative size of each stratum rather than
leaving it to chance, ensuring representativeness and fixed proportions within a sample.
However, the necessary information about strata is not always available. If stratum
information is accurate, stratified sampling produces more representative samples than
simple random sampling. For example, if a population is 51 percent female and 49 percent
male, stratified sampling ensures a 51 to 49 percent gender ratio in the sample,
resulting in smaller sampling errors than simple random sampling.
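The 51/49 example can be sketched as proportionate stratified sampling. The population of 10,000, the sample size of 200, and the function name `stratified_sample` are assumptions for illustration.

```python
import random

def stratified_sample(strata, sample_size, rng):
    """Draw a simple random sample from each stratum, sized by its population share."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportionate allocation; with other numbers, rounding may need adjusting.
        n = round(sample_size * len(members) / total)
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical population that is 51 percent female and 49 percent male.
population = {
    "female": [f"F{i}" for i in range(5100)],
    "male": [f"M{i}" for i in range(4900)],
}
strat = stratified_sample(population, 200, random.Random(3))
print({name: len(members) for name, members in strat.items()})
```

Unlike a simple random sample of 200, which would only approximate the 51/49 split, this design fixes the split in advance.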
Stratified sampling is also used when a stratum of interest is a small percentage of a
population and random processes could miss it by chance. For example, suppose we draw a
sample of 200 from 20,000 college students, of whom 400 (2 percent) are divorced women
with children under the age of 5. A representative sample would contain 4 such students
(2 percent of 200), but a simple random sample could miss them by chance. Stratifying
ensures that the sample represents the population with regard to important strata.
In special situations, the proportion of a stratum in a sample may differ from its true
proportion in the population. For example, if the population contains 0.5 percent Aleuts,
the researcher oversamples so that Aleuts make up 10% of the sample. This type of
disproportionate stratified sample cannot be generalized directly to the population
without special adjustments.
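One common adjustment is to weight each case by the ratio of its stratum's population share to its sample share. The text does not say which adjustment is meant, so the following is only a sketch under that assumption, using the Aleut figures above; the function name `design_weight` is hypothetical.

```python
def design_weight(population_share, sample_share):
    """Weight that restores a stratum to its true share of the population."""
    return population_share / sample_share

aleut_weight = design_weight(0.005, 0.10)   # 0.5% of population, 10% of sample
other_weight = design_weight(0.995, 0.90)   # everyone else

# After weighting, the Aleut share of the weighted sample matches the population.
weighted_aleut_share = 0.10 * aleut_weight
print(round(aleut_weight, 3), round(other_weight, 3), round(weighted_aleut_share, 3))
```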
Large surveys sometimes oversample deliberately. For example, the 1987 General Social
Survey oversampled African Americans, resulting in a larger sample of African Americans
that is more likely to reflect the full diversity of the African American subpopulation.
Cluster Sampling
Cluster sampling addresses the challenges of a dispersed population and the high cost
of reaching a sampled element. Instead of using a single sampling frame, the sampling
design involves multiple stages and clusters. A cluster is a unit that contains final
sampling elements but can be treated temporarily as a sampling element itself. The
process involves randomly sampling clusters and then randomly sampling elements from
within the selected clusters. This approach has practical advantages: we can often
create a good sampling frame of clusters even when it is impossible to create one for
sampling elements, and the elements within each cluster are physically close to one
another, which reduces the cost of locating or reaching each element.
Cluster sampling is a method of obtaining a sample of a population by drawing several
samples in stages. In a three-stage sample, stage 1 involves random sampling of large
clusters, stage 2 involves random sampling of small clusters within each selected large
cluster, and the last stage is a sampling of elements from within the sampled small
clusters.
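The three stages can be sketched as nested random draws. Everything concrete here is hypothetical: the frame of 10 districts with 8 blocks of 30 households each, the numbers drawn at each stage, and the function name `three_stage_sample`.

```python
import random

def three_stage_sample(frame, n_large, n_small, n_elements, rng):
    """frame maps large cluster -> small cluster -> list of elements."""
    chosen = []
    for large in rng.sample(sorted(frame), n_large):                 # stage 1
        small_clusters = frame[large]
        for small in rng.sample(sorted(small_clusters), n_small):    # stage 2
            chosen += rng.sample(small_clusters[small], n_elements)  # stage 3
    return chosen

# Hypothetical frame: 10 districts, each with 8 blocks of 30 households.
frame = {
    f"district_{d}": {
        f"block_{d}_{b}": [f"hh_{d}_{b}_{h}" for h in range(30)]
        for b in range(8)
    }
    for d in range(10)
}
cluster_result = three_stage_sample(frame, 4, 3, 5, random.Random(11))
print(len(cluster_result))   # 4 districts x 3 blocks x 5 households
```

Only the sampled blocks need a household list, which is the practical advantage of working in stages.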
Cluster sampling is usually less expensive than simple random sampling but is less
accurate due to the introduction of sampling errors at each stage. To determine the best
design for a sample, it is essential to consider the number of clusters and the number of
elements within clusters. A design with more clusters is better because elements within
clusters tend to be similar to each other. If few clusters are chosen, many similar
elements could be selected, which would be less representative of the total population.
When we sample from a large geographical area and must travel to each element, cluster
sampling significantly reduces travel costs. However, there is a trade-off between
accuracy and cost. For example, Alan, Ricardo, and Barbara each personally interview a
sample of 1,500 students representing the population of all college students in North
America. Alan's sample is highly accurate, while Ricardo's sample is slightly less
accurate for one-third of the cost.
Within-Household Sampling.
Once a household or similar unit has been sampled, an individual within it must be
selected in a way that avoids bias. The first person to answer the phone, door, or mail
should be chosen only if who answers first is the result of a truly random process. This
is rarely the case, as certain people are seldom at home and, in some households, one
person is more likely than others to answer.
Researchers use within-household sampling to ensure that after a random household is
chosen, the individual within the household is also selected randomly. The most common
method is using a selection table specifying the person to pick, such as the oldest male
or youngest female, after determining the size and composition of the household.
Probability Proportionate to Size (PPS).
Cluster sampling can be either proportionate (unweighted) or weighted by cluster size.
In proportionate or unweighted cluster sampling, every cluster is the same size, so
giving each cluster an equal chance of selection also gives each element an equal
chance. More commonly, clusters differ in size, and the selection probabilities must be
adjusted. For example, Barbara's method of drawing a random sample of 300 colleges from
a list of 3,000 colleges gave each college an equal chance of being selected. However,
each student did not have an equal chance of being selected, because colleges have
different numbers of students. Barbara's method therefore violated the principle of
random sampling that each element has an equal chance to be selected into the sample.
If she uses probability proportionate to size (PPS) and samples correctly, each final
sampling element or student will have an equal probability of being selected. This is
achieved by adjusting the chances of selecting a college in the first stage of sampling:
large colleges with more students have a greater chance of being selected, while small
colleges have a smaller chance.
In summary, when clusters differ in size, PPS sampling is needed to ensure equal chances
for each element.
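The effect of PPS on each student's chance of selection can be checked with simple arithmetic. This sketch makes several simplifying assumptions that are not in the text: three hypothetical colleges, a single college drawn at the first stage, and a fixed 50 students drawn within the selected college.

```python
college_sizes = {"small": 1_000, "medium": 5_000, "large": 20_000}  # hypothetical
total_students = sum(college_sizes.values())
STUDENTS_PER_COLLEGE = 50   # fixed number drawn within the selected college

def student_probability(size, college_probability):
    """Chance the college is drawn times chance the student is drawn within it."""
    return college_probability * (STUDENTS_PER_COLLEGE / size)

# Equal chance for every college: students at small colleges are favored.
equal_chance = {name: student_probability(size, 1 / len(college_sizes))
                for name, size in college_sizes.items()}

# PPS: a college's chance is proportional to its enrollment.
pps = {name: student_probability(size, size / total_students)
       for name, size in college_sizes.items()}

print(equal_chance)
print(pps)   # the same probability for every student
```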
Random-Digit Dialing.
Random-digit dialing (RDD) is a sampling technique used in research projects where the
general public is interviewed by telephone. It does not use the published telephone
directory as the sampling frame, because a directory misses three kinds of people: those
without telephones, those
who have recently moved, and those with unlisted numbers. In advanced industrialized
nations, 95 percent of people have a telephone. Unlisted numbers can be used by those
who want to avoid collection agencies, are very wealthy, or want privacy. In some urban
areas in the United States, the percentage of unlisted numbers is 50 percent.
RDD works in the United States by identifying active area codes and exchanges and
randomly selecting four-digit numbers. However, the researcher can select any number
in an exchange, which means that some selected numbers are out of service,
disconnected, pay phones, or numbers for businesses. This means spending much time
reaching numbers that are disconnected, are for businesses, and so forth. Research
organizations often use computers to select random digits and dial the phone
automatically, but a human must still listen and find out whether the number is a working
residential one.
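The number-generation step can be sketched as follows. The area codes and exchanges listed are placeholders rather than real active prefixes, and the function name `random_digit_number` is hypothetical; a real study would first compile the active area codes and exchanges.

```python
import random

# Placeholder area-code/exchange pairs; not real active prefixes.
ACTIVE_PREFIXES = [("555", "201"), ("555", "347")]

def random_digit_number(rng):
    """Pick an active prefix, then append a random four-digit line number."""
    area, exchange = rng.choice(ACTIVE_PREFIXES)
    line = rng.randrange(10_000)               # 0000 through 9999
    return f"({area}) {exchange}-{line:04d}"

rng = random.Random(5)
numbers = [random_digit_number(rng) for _ in range(10)]
print(numbers)
```

Many generated numbers would still turn out to be disconnected or nonresidential, so screening each number remains a separate step.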
The sampling element in RDD is the phone number, not the person or the household.
Several families or individuals can share the same phone number, and in other
situations, each person may have a separate phone number. After a working residential
phone is reached, a second stage of sampling, within household sampling, is necessary
to select the person to be interviewed.
Decision Regarding Sample Size
The size of a sample for social research depends on various factors, including
population characteristics, the type of data analysis, and the desired confidence in
sample accuracy. A large sample size alone does not guarantee a representative sample:
a large sample without random sampling or with a poor sampling frame is less
representative than a smaller one with careful random sampling and an excellent
sampling frame. One method of determining sample size is to make assumptions about
the population and use statistical equations about random sampling processes.
Smaller samples can be equally effective when they are drawn from homogeneous
populations and the data analysis is simple, involving only one or a few variables.
A rule of thumb is another method to decide a sample size, which is based on past
experience with samples that have met the requirements of the statistical method. A
major principle of sample size is that the smaller the population, the larger the sampling
ratio has to be for a sample that has a high probability of yielding the same results as the
entire population. Larger populations permit smaller sampling ratios for equally good
samples because as the population size grows, the returns in accuracy for sample size
decrease.
In practical terms, for small populations (under 500), a large sampling ratio (about 30
percent) or 150 people is needed, while for large populations (over 150,000) we can
obtain equally good accuracy with a smaller sampling ratio (1 percent). For very large
populations (more than 10 million), we can achieve accuracy with tiny sampling ratios
(0.025 percent) or samples of about 2,500.
A related principle is that for small samples, a small increase in sample size produces a
big gain in accuracy. An equal increase in sample size improves accuracy more for small
samples than for large ones.
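The diminishing returns can be illustrated with the approximate 95 percent margin of error for a percentage. This sketch assumes simple random sampling from a large population and a true proportion near 50 percent; the function name `margin_of_error` and the sample sizes compared are chosen for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

# Add the same 50 cases to a small, a medium, and a larger sample.
gains = []
for smaller, larger in [(50, 100), (1_000, 1_050), (2_000, 2_050)]:
    gain = margin_of_error(smaller) - margin_of_error(larger)
    gains.append(gain)
    print(f"{smaller:>5} -> {larger:>5}: margin of error shrinks by {gain:.4f}")
```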
Plans for data analysis also influence the required sample size. For example, if we want
to analyze many small subgroups within the population, we will need a large sample
because each subgroup is only a small proportion (e.g., 10 percent) of the entire sample.
A rule of thumb is to have about 50 cases for each subgroup we wish to analyze.
Making Inferences
Probability sampling allows us to make inferences from the sample to the population; the
subfield of statistical data analysis concerned with such inferences is called
inferential statistics. We directly observe data in the sample but are not interested in
the sample alone. A gap exists
between what we concretely have (variables measured in sample data) and what is of
real interest (population parameters). The logic of measurement can be expressed as a
gap between abstract constructs and concrete indicators. Measures of concrete,
observable data are approximations for abstract constructs, which are used to estimate
what is of real interest (constructs and causal laws). Conceptualization and
operationalization bridge the gap in measurement, just as the use of sampling frames,
the sampling process, and inference bridge the gap in sampling.
In order to integrate the logic of sampling with the logic of measurement, we directly
observe measures of constructs and empirical relationships in samples. We infer or
generalize from what we observe empirically in samples to the abstract causal laws and
parameters in the population. There is an analogy between the logic of sampling and the
logic of measurement for validity. In measurement, we want valid indicators of
constructs, while in sampling, we want samples that have little sampling error. A good
sample has little sampling error and permits estimates that deviate little from population
parameters.
Sampling error is related to confidence intervals. If two samples are identical except one
is much larger, the larger one will have a smaller sampling error and narrower
confidence intervals. Likewise, if two samples are identical except that the cases in one
are more similar to each other, the one with greater homogeneity will have a smaller
sampling error and narrower confidence intervals.
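Both effects can be read off the usual formula for the standard error of a sample mean. The notation is an assumption introduced here, not taken from the text: s is the standard deviation of the cases in the sample, n is the sample size, and x-bar is the sample mean.

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}},
\qquad
\text{95\% confidence interval} \approx \bar{x} \pm 1.96 \, \frac{s}{\sqrt{n}}
```

A larger n increases the denominator, and greater homogeneity means a smaller s in the numerator; either change shrinks the standard error and narrows the interval.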
Strategies When the Goal Differs from Creating a Representative Sample
Qualitative research often doesn't require a representative sample from a large number
of cases, preferring nonprobability samples. These samples don't require a
predetermined sample size and require limited knowledge about the larger population.
Nonprobability sampling gradually selects cases based on the specific content of a case,
unlike probability sampling that requires a preplanned mathematical approach. Table 4
displays various nonprobability sampling techniques.
Purposive or Judgmental Sampling
Purposive sampling, also known as judgmental sampling, is a valuable method for
special situations in exploratory or field research. It uses an expert's judgment to
select cases, or selects cases with a specific purpose in mind. Purposive sampling is
not suitable for creating representative samples or for picking "average" or "typical"
cases. It is appropriate for selecting unique, especially informative cases, for example
when choosing particular magazines for a content analysis of cultural themes.
In the study Promises I Can Keep, eight neighborhoods were selected using
purposive sampling. It is often used to select members of difficult-to-reach, specialized
populations, such as prostitutes, as it is impossible to list all prostitutes and sample
randomly. Researchers use local knowledge and local experts to locate potential
prostitutes for inclusion in the research project. Purposive sampling is also used to
identify specific types of cases for in-depth investigation to gain a deeper understanding
of types.
Snowball Sampling
Researchers are often interested in interconnected networks of people or organizations,
such as scientists, elites, organized crime families, board members of major banks and
corporations, or college students who have had sexual relations with one another. Each
person or unit is connected to another through a direct or indirect linkage, so that
most are within an interconnected web of linkages. Researchers can represent such
networks using a sociogram, a diagram of circles connected with lines, with the circles
representing each person or case and the lines representing friendship or other linkages.
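Snowball sampling begins with one or a few people or cases and spreads outward along
these linkages, as each person names others. The process stops when no new names are
given, indicating a closed network, or when the network becomes too large to study. The
sample includes those named by at least one other person in the network as being a close
friend.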
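The referral process can be sketched as a walk through a network of named friends. The names, the referral lists, and the function name `snowball_sample` are hypothetical, and the optional size cap is one possible way to handle a network that is too large to study.

```python
from collections import deque

def snowball_sample(named_by, seeds, max_size=None):
    """named_by maps each person to the close friends they name."""
    sample, queue = list(seeds), deque(seeds)
    seen = set(seeds)
    while queue:
        person = queue.popleft()
        for friend in named_by.get(person, []):
            if friend not in seen:
                if max_size is not None and len(sample) >= max_size:
                    return sample        # network too large to keep following
                seen.add(friend)
                sample.append(friend)
                queue.append(friend)
    return sample                         # no new names: a closed network

referrals = {
    "Ana": ["Ben", "Chen"],
    "Ben": ["Ana", "Dee"],
    "Chen": ["Dee"],
    "Dee": ["Ben"],
    "Eli": ["Ana"],   # names Ana but is named by no one, so is never reached
}
snowball = snowball_sample(referrals, ["Ana"])
print(snowball)
```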