AE 9 AY 2021-2022 Module 2 (Complete)
AE 9
LEARNER’S
MODULE
Student’s Name:
CHAPTER 3
SAMPLING and SAMPLING DISTRIBUTIONS
Objectives
SAMPLING
In sample surveys, there are mainly two methods of sampling: random sampling and nonrandom sampling.
Suppose we have a list of 100 students and we want to select 10 of them. If we write the names of all 100
students on pieces of paper, put them in a hat, mix them, and then draw 10 names, the result will be a random
sample of 10 students. However, if we arrange the names of these 100 students alphabetically and pick the
first 10 names, it will be a nonrandom sample because the students who are not among the first 10 have no
chance of being selected in the sample.
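The contrast between the two selections above can be sketched in a few lines of Python. This is only an illustration: the roster of placeholder names and the seed value are assumptions, not part of the module.

```python
import random

# Hypothetical roster: 100 placeholder student names.
students = [f"Student {i:03d}" for i in range(1, 101)]

rng = random.Random(2021)  # fixed seed only so the draw can be reproduced

# Random sample: every student can be drawn (like names mixed in a hat).
random_sample = rng.sample(students, k=10)

# Nonrandom sample: the first 10 names on the alphabetical list.
nonrandom_sample = sorted(students)[:10]

print(random_sample)
print(nonrandom_sample)
```

Rerunning with a different seed changes the random sample, while the nonrandom sample always consists of the same first 10 names.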
A random sample is usually a representative sample. The random sample should be selected so that it is
representative of the population, i.e., the sample has the same characteristics as the population. Note
that for a random sample, each member of the population may or may not have the same chance of being
included in the sample.
Two types of nonrandom sampling are convenience sampling and judgment sampling.
In convenience sampling, the most accessible members of the population are selected to obtain
the results quickly. For example, an opinion poll may be conducted in a few hours by collecting
information from certain shoppers at a single shopping mall.
In judgment sampling, the members are selected from the population based on the judgment
and prior knowledge of an expert. Although such a sample may happen to be a representative
sample, the chances of it being so are small. If the population is large, it is not an easy task to
select a representative sample based on judgment.
Sample size is the number of elements in the sample, denoted by n. Meanwhile, population size,
denoted by N, is the number of elements in the population.
If the target population is too large, simple random sampling may not be practical to use. For
example, suppose we need to get a random sample of 150 households from a list of 45,000 households.
Listing all 45,000 names would consume a very large amount of time. Here, we can use systematic
random sampling.
The procedure to select a systematic random sample is as follows. In the example just mentioned,
we would arrange all 45,000 households alphabetically (or based on some other characteristic).
Since the sample size should equal 150, the ratio of population to sample size is 45,000/150 = 300.
Using this ratio, we randomly select one household from the first 300 households in the arranged
list using either method. Suppose by using either of the methods, we select the 210th household.
We then select the 210th household from every 300 households in the list. In other words, our
sample includes the households with numbers 210, 510, 810, 1110, 1410, 1710, and so on.
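The selection rule just described can be sketched in Python as follows. The function name systematic_sample is a hypothetical helper introduced for illustration, and positions are counted from 1 as in the text.

```python
import random

N = 45_000          # population size (households in the arranged list)
n = 150             # desired sample size
k = N // n          # sampling interval: 45,000 / 150 = 300

def systematic_sample(start, interval, population_size):
    """Return positions start, start + interval, ... up to population_size."""
    return list(range(start, population_size + 1, interval))

# Reproduce the example in the text, where the random start happened to be 210.
positions = systematic_sample(210, k, N)
print(positions[:6])   # [210, 510, 810, 1110, 1410, 1710]
print(len(positions))  # 150

# In practice the start is drawn at random from the first k households.
random_start = random.randint(1, k)
```

Whatever the random start, the list of positions contains exactly 150 households, one from each block of 300.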
We will now have three subpopulations, which are usually called strata. We then select one sample
from each subpopulation or stratum. The collection of all three samples selected from the three
strata gives the required sample, called the stratified random sample.
For example, suppose we are to conduct a survey of households in Aklan in Western Visayas. First,
we divide the whole province into its 17 municipalities which are called clusters or primary units.
We make sure that all clusters are similar and, hence, representative of the population. We then
select at random, say, 5 clusters from 17.
Next, we randomly select certain households from each of these 5 clusters and conduct a survey of
these selected households. This is called cluster sampling.
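A minimal sketch of this two-stage selection is shown below. The cluster labels, the 200 households per cluster, and the 20 households drawn from each chosen cluster are all made-up values for illustration; only the 17 clusters and the choice of 5 come from the example.

```python
import random

rng = random.Random(7)  # seed fixed for reproducibility only

# Hypothetical frame: 17 clusters, each with 200 household IDs.
clusters = {
    f"Municipality {m:02d}": [f"M{m:02d}-H{h:03d}" for h in range(1, 201)]
    for m in range(1, 18)
}

# Stage 1: select 5 of the 17 clusters at random.
chosen = rng.sample(sorted(clusters), k=5)

# Stage 2: select households at random within each chosen cluster
# (20 per cluster here, an arbitrary choice).
cluster_sample = {name: rng.sample(clusters[name], k=20) for name in chosen}

for name, households in cluster_sample.items():
    print(name, households[:3])
```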
What Are The Types Of Sampling Techniques In Statistics - Random, Stratified, Cluster, Systematic: https://www.youtube.com/watch?v=9PaR1TsvnJs
Sampling: Stratified random sampling (With Computations): https://www.youtube.com/watch?v=lJqV1vrxtHc
We know that the value of a population parameter is always constant. For example, for any population data set, there is
only one value of the population mean, 𝜇. However, we cannot say the same about the sample mean, 𝑥̅ . We would
expect different samples of the same size drawn from the same population to yield different values of the sample mean,
𝑥̅ . The value of the sample mean for any one sample will depend on the elements included in that sample. Consequently,
the sample mean, 𝑥̅ , is a random variable. Therefore, like other random variables, the sample mean possesses a
probability distribution, which is more commonly called the sampling distribution of 𝑥̅ . Other sample statistics, such
as the proportion and standard deviation, also possess sampling distributions.
Illustrative Example:
Consider the population of midterm scores of five students: 70, 78, 80, 80, 95. Consider all possible samples of three
scores each that can be selected, without replacement, from that population. The total number of possible samples is
$$ {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = 10. $$
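Suppose we assign the letters A, B, C, D, and E to the scores of the five students in the order listed, so that
A = 70, B = 78, C = 80, D = 80, and E = 95. Then the 10 possible samples of three scores each are
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE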
These 10 samples and their respective means are listed below. The
mean of each sample is obtained by dividing the sum of the three
scores included in that sample by 3. For instance, the mean of the first
sample ABC is (70 + 78 + 80)/3 = 76.
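The listing of samples and means can be reproduced with a short Python sketch. It assumes the letters follow the order in which the scores are given (A = 70, B = 78, C = 80, D = 80, E = 95), which agrees with the mean of 76 for sample ABC.

```python
from itertools import combinations

scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# All samples of size 3 drawn without replacement, with their means.
sample_means = {
    "".join(letters): sum(scores[s] for s in letters) / 3
    for letters in combinations(scores, 3)
}

for label, mean in sample_means.items():
    print(f"{label}: {mean:.2f}")
```

The loop prints one line per sample, in the same order as the list ABC, ABD, ..., CDE.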
These relative frequencies are used as probabilities and listed in the table below. This table gives the
sampling distribution of 𝑥̅ .
P(𝑥̅ = 81.67) = 0.20.
The mean and standard deviation of 𝑥̅ are, respectively, the mean and standard deviation of the means of
all samples of the same size selected from a population. Therefore, the mean and standard deviation of 𝑥̅
are the mean and standard deviation of the sampling distribution of 𝑥̅ , respectively. The standard
deviation of 𝑥̅ is also called the standard error of 𝑥̅ .
If we calculate the mean and standard deviation of the 10 values of 𝑥̅ listed in the illustrative example above,
we obtain the mean and standard deviation of the sampling distribution of 𝑥̅ .
$$ \sigma_{\bar{x}} = \sqrt{\frac{(76.00-80.60)^2 + (76.00-80.60)^2 + \cdots + (85.00-80.60)^2}{10}} = 3.30 $$
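As a check on these figures, the sketch below rebuilds the sampling distribution of the sample mean from the five midterm scores and computes its mean and standard deviation. The variable names are illustrative, and the standard deviation divides by 10 (the number of samples), as in the formula above.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

population = [70, 78, 80, 80, 95]

# Means of all 10 samples of size 3 (without replacement).
means = [sum(s) / 3 for s in combinations(population, 3)]

# Sampling distribution: relative frequency of each distinct mean.
counts = Counter(round(m, 2) for m in means)
distribution = {m: c / len(means) for m, c in sorted(counts.items())}

# Mean and standard deviation of the 10 sample means (divide by 10).
mu_xbar = sum(means) / len(means)
sigma_xbar = sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))

print(distribution)
print(round(mu_xbar, 2), round(sigma_xbar, 2))  # 80.6 3.3
```

The mean of the sample means, 80.60, equals the population mean (70 + 78 + 80 + 80 + 95)/5 = 80.60.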
The shape of the sampling distribution of 𝑥̅ relates to the following two cases:
1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal distribution.
When the population from which samples are drawn is normally distributed with its mean equal to 𝜇 and
standard deviation equal to 𝜎, then:
1. The mean of 𝑥̅ , which is 𝜇𝑥̅ , is equal to the population mean, 𝜇. This is why 𝑥̅ is an unbiased
estimator of 𝜇.
2. The standard deviation of 𝑥̅ , which is 𝜎𝑥̅ , is equal to 𝜎/√𝑛, assuming n/N ≤ 0.05.
3. The shape of the sampling distribution of 𝑥̅ is normal, regardless of the sample size n.
1. The spread of the sampling distribution of 𝑥̅ is smaller than the spread of the corresponding population
distribution. In other words, 𝜎𝑥̅ < 𝜎. This is obvious from the formula for 𝜎𝑥̅ . When n is greater than 1,
which is usually true, the denominator in 𝜎/√𝑛 is greater than 1. Hence, 𝜎𝑥̅ is smaller than 𝜎.
2. The standard deviation of the sampling distribution of 𝑥̅ decreases as the sample size increases. This
feature of the sampling distribution of 𝑥̅ is also obvious from the formula 𝜎𝑥̅ = 𝜎/√𝑛. This is why 𝝈𝒙̅ is
Solution:
Let 𝜇 and 𝜎 be the mean and standard deviation of the 2014 earnings of all American internal medicine
physicians, and 𝜇𝑥̅ and 𝜎𝑥̅ be the mean and standard deviation of the sampling distribution of 𝑥̅ ,
respectively. Then, from the given information,
𝜇 = $196,000 and 𝜎 = $20,000
Because the 2014 earnings of all American internal medicine physicians are approximately normally
distributed, the sampling distribution of 𝑥̅ for samples of 16 physicians is also approximately normally
distributed.
The figure below shows the population distribution and the sampling distribution of 𝑥̅ for n = 16. Since 𝜎𝑥̅ < 𝜎,
the population distribution has a wider spread but smaller height than the sampling distribution of 𝑥̅ .
(b) For n = 50, the mean and standard deviation of 𝑥̅ are, respectively,
$$ \mu_{\bar{x}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{50}} = \$2{,}828.43 $$
Again, since the population distribution is approximately normally distributed, the sampling distribution of 𝑥̅
for samples of 50 physicians is also approximately normally distributed.
(c) For n = 1,000, the mean and standard deviation of 𝑥̅ are, respectively,
$$ \mu_{\bar{x}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{1000}} = \$632.46 $$
Again, since the population distribution is approximately normally distributed, the sampling distribution of 𝑥̅
for samples of 1,000 physicians is also approximately normally distributed.
The figure below shows the population distribution and the sampling distribution of 𝑥̅ for n = 1,000. Thus,
regardless of the sample size, the sampling distribution of 𝑥̅ is normal when the population from which the
samples are drawn is normally distributed.
From the preceding calculations, we observe that the mean of the sampling distribution of 𝑥̅ is always equal
to the mean of the population ($196,000) whatever the size of the sample. However, the value of the standard
deviation of 𝑥̅ decreases from $5,000.00 to $2,828.43 and then to $632.46 as the sample size increases
from 16 to 50 and then to 1,000.
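The three standard errors quoted above can be verified with a short calculation. The helper name standard_error is an assumption made for this sketch; it simply evaluates 𝜎/√𝑛 with 𝜎 = $20,000.

```python
from math import sqrt

mu = 196_000      # population mean of earnings, in dollars
sigma = 20_000    # population standard deviation, in dollars

def standard_error(sigma, n):
    """Standard deviation of the sample mean, valid when n/N <= 0.05."""
    return sigma / sqrt(n)

for n in (16, 50, 1000):
    se = standard_error(sigma, n)
    print(f"n = {n:>5}: mean = ${mu:,}, standard error = ${se:,.2f}")
```

The mean stays at $196,000 for every sample size, while the standard error shrinks in proportion to 1/√𝑛.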
Most of the time the population from which the samples are selected is not normally distributed. In such
cases, the shape of the sampling distribution of 𝑥̅ is inferred from the central limit theorem.
According to the central limit theorem, for a large sample size, the sampling distribution of 𝑥̅ is
approximately normal, regardless of the shape of the population distribution. The mean and standard
deviation of the sampling distribution of 𝑥̅ are, respectively,
$$ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
In the case of the mean, the sample size is usually considered to be large if n ≥ 30. Also, note that
𝜎𝑥̅ = 𝜎/√𝑛 is true for n/N ≤ 0.05. We will not focus on cases where n/N > 0.05 since these are unlikely conditions.
If the population distribution is not normally distributed, the sampling distribution of 𝑥̅ with n < 30 is also not
normal. However, the sampling distribution of 𝑥̅ with n ≥ 30 is (approximately) normal because of the central
limit theorem.
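A small simulation can make the central limit theorem more concrete. The sketch below is only an illustration under assumed settings: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly right-skewed shape), and the sample size of 50, the 2,000 replications, and the seed are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)   # fixed seed so the run is repeatable

n = 50                    # sample size (n >= 30, "large" by the rule of thumb)
replications = 2000       # number of samples drawn

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

print(round(mean(sample_means), 3))    # should be near mu = 1
print(round(pstdev(sample_means), 3))  # should be near sigma / sqrt(n) = 0.141
```

The two printed values only check the center and spread of the simulated sample means. Plotting a histogram of sample_means would show the roughly bell-shaped form, even though the population itself is skewed.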
Solution:
Although the population distribution of rents paid by all tenants is not normal, the sample sizes for (a) and
(b) are both large (n ≥ 30). Hence, the central limit theorem can be applied to infer the shape of the
sampling distribution of 𝑥̅ . From the given information, 𝜇 = $1,550 and 𝜎 = $225.
The figure below shows the population distribution (a) and the sampling distribution of 𝑥̅ for n = 100 (b).
(b) Applying the central limit theorem, the sampling distribution of 𝑥̅ with n = 100 is approximately normal. The
mean and standard deviation are computed as:
$$ \mu_{\bar{x}} = \mu = \$1{,}550 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$225}{\sqrt{100}} = \$22.50 $$
The figure below shows the population distribution (a) and the sampling distribution of 𝑥̅ for n = 100 (b).
The concept of proportion is the same as the concept of relative frequency and the concept of probability of
success in a binomial experiment. The relative frequency of a category or class gives the proportion of the
sample or population that belongs to that category or class. Similarly, the probability of success in a binomial
experiment represents the proportion of the sample or population that possesses a given characteristic.
The population proportion, denoted by p, is obtained by taking the ratio of the number of elements in a
population with a specific characteristic to the total number of elements in the population. The sample
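proportion, denoted by 𝑝̂ (pronounced p hat), gives a similar ratio for a sample.
$$ p = \frac{X}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n} $$
where
N = total number of elements in the population
n = total number of elements in the sample
X = number of elements in the population that possess the specific characteristic
x = number of elements in the sample that possess the specific characteristic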
Solution:
$$ p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = 0.71 $$
Now, a sample of 240 families is taken from this city, and 158 of them are home-owners. So,
$$ \hat{p} = \frac{x}{n} = \frac{158}{240} = 0.66 $$
Sampling Distribution of 𝑝̂
Just like the sample mean 𝑥̅ , the sample proportion 𝑝̂ is a random variable. In other words, the population
proportion p is a constant as it assumes one and only one value. However, the sample proportion 𝑝̂ can
assume one of a large number of possible values depending on which sample is selected. Hence, 𝑝̂ is a
random variable and it possesses a probability distribution, which is called its sampling distribution.
Illustrative Example:
$$ \text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = 10 $$
The table below lists these 10 possible samples and the proportion of employees who know statistics for
each of those samples.
We now prepare the frequency and relative frequency distributions of 𝑝̂ . The relative frequencies are used
as probabilities and are listed below. The last table gives the sampling distribution of 𝑝̂ .
1. The mean of 𝑝̂ (denoted by 𝜇𝑝̂ ), which is also the mean of the sampling distribution of 𝑝̂ , is always equal
to the population proportion p. That is, 𝜇𝑝̂ = p.
2. The standard deviation of 𝑝̂ (denoted by 𝜎𝑝̂ ), which is also the standard deviation of the sampling
distribution of 𝑝̂ , is equal to √(p(1−p)/n), assuming n/N ≤ 0.05. That is,
$$ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} $$
Note that as n increases, 𝜎𝑝̂ decreases.
3. Applying the central limit theorem, the shape of the sampling distribution of 𝑝̂ is approximately normal
for a sufficiently large sample size, that is, np > 5 and n(1-p) > 5.
Solution:
(a) The sample proportion is the number of orders that are shipped within 12 hours (x) divided by
the number of orders in the sample (n):
$$ \hat{p} = \frac{x}{n} = \frac{102}{121} = 0.84 $$
Hence, 84% of the orders in the sample were shipped within 12 hours.
(b) Given n = 121 and p = 0.90, the standard deviation of 𝑝̂ is computed as
$$ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.90(1-0.90)}{121}} = 0.0273 $$
(c) To make an inference about the shape of the sampling distribution of 𝑝̂ , we can check if the central
limit theorem is applicable to this problem. The values of np and n(1-p) are
$$ np = 121(0.90) = 108.9 \quad \text{and} \quad n(1-p) = 121(1-0.90) = 12.1 $$
Since np and n(1-p) are both greater than 5, by the central limit theorem, we can infer that the
sampling distribution of 𝑝̂ is approximately normal with a mean of 0.90 (the population proportion p)
and a standard deviation of 0.0273.
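The computations in parts (a) to (c) can be checked with the sketch below. The variable names are illustrative; the figures (102 orders out of 121, and p = 0.90) are those used in the solution.

```python
from math import sqrt

x, n = 102, 121    # orders shipped within 12 hours, orders in the sample
p = 0.90           # population proportion given in the problem

p_hat = x / n
sigma_p_hat = sqrt(p * (1 - p) / n)

# Rule of thumb used in the text: np > 5 and n(1 - p) > 5.
approximately_normal = n * p > 5 and n * (1 - p) > 5

print(round(p_hat, 2), round(sigma_p_hat, 4), approximately_normal)
```

Note that the standard deviation uses the population proportion p, not the sample proportion 𝑝̂ .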
In this section, we will learn how to select a random sample from a given population data using
MS Excel. This involves assigning a random number unique for every observation in the data.
1. Open the workbook where the data is embedded. If the data is not embedded in an
existing workbook, the data needs to be entered in a workbook.
2. Once the data is ready, create a new column next to the last column. Assign a column
name on Row 1.
3. Click Row 2 of the new column. Enter the formula =RAND(). A random number between
0 and 1 will appear on that cell.
4. Click the fill handle in the cell in Step #3. Alternatively, you can copy that cell in Step
#3 and paste it until the last row (in the same column) of the data.
5. To avoid refreshing of the random numbers, copy all cells in the last column and paste it
as value in the same column. The values in the cells of the last column should be numbers
without the formula =RAND().
6. Order the entire data in terms of the last column in ascending order (from smallest to
largest).
7. Create another column on the right of the last column. Assign a column name on Row 1.
8. Apply a filter on the column in Step #7 depending on the number of observations we want
for our sample. For example, if we want to get a sample of size 500 from a population of
size 4,000, use the Number Filter on the last column and click “Less than or equal to…”.
A dialog box will appear and there you have to enter next to “less than or equal to” the
number 500 for our sample size. Click OK.
9. We now have our sample (e.g. size of 500) but other observations not included are only
hidden. In order to retain only those observations in the sample, copy the filtered data
and paste it on a separate worksheet or workbook. If you have large population data, it
is suggested to paste it in a separate workbook to avoid lag on your software.
10. You can delete the last two columns since they are not needed in your analysis. And now you
have your sample!
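For readers who also work outside MS Excel, the same idea (tag every observation with a random number, sort, and keep the top ranks) can be sketched in Python. This is not part of the required MS Excel output. The function name simple_random_sample and the population of 4,000 account numbers are assumptions made for illustration.

```python
import random

def simple_random_sample(records, sample_size, seed=None):
    """Mimic the spreadsheet steps: tag, sort, rank, keep the top ranks."""
    rng = random.Random(seed)
    # Steps 3-5: attach a fixed random number to every observation.
    tagged = [(rng.random(), record) for record in records]
    # Step 6: sort by the random number, smallest first.
    tagged.sort(key=lambda pair: pair[0])
    # Steps 7-9: rows ranked 1..sample_size form the sample.
    return [record for _, record in tagged[:sample_size]]

# Hypothetical population of 4,000 account numbers.
population = list(range(1, 4001))
sample = simple_random_sample(population, 500, seed=1)
print(len(sample), sample[:5])
```

Because the random numbers are generated once and stored, they do not refresh, which plays the same role as pasting the values in Step 5.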
Follow the demonstration below. Please download the data set and perform random sampling
in MS Excel following these steps. This will serve as your MS Excel output 2.
The workbook Loan Borrower Data (2015, 5-year term).xlsx contains 2015 data from a financial
institution on borrowers of a particular loan payable over 60 months. The data set consists of
137,650 borrowers, each identifiable by an account number (Column A, account_number).
Other columns include the demographic profile of the borrowers (Columns B to F) and
their loan details and performance (Columns G to Q). Our objective is to obtain a
random sample of 5,000 borrowers out of this population data.
2. Since our data is ready, we create a new column next to the column installment
(Column Q). In cell R1, we assign random_number as the name of this column.
A random number 0.238177426, which is between 0 and 1, is assigned in the first row.
4. Click the fill handle in cell R2. Alternatively, you can copy cell R2 and paste it until the
last cell for that column (cell R137650).
At this point, we are done creating two additional columns in our data: random_number
in column R (which consists of refreshing random numbers) and random_number2 in
column S (which contains fixed random numbers). We are now ready for the selection part.
In the Sort & Filter dropdown, click FILTER. An arrow button should appear on the right
side of each of the column headers.
7. We will now rank these random numbers in a new column. In Column T, we assign rank as
the column name. Then, we use auto-fill to generate the ranks: we write 1 in cell
T2 and 2 in cell T3, highlight both cells, and then click the fill handle to create an auto-fill
of ranks in that column.
8. In the column rank, we apply a filter to get the 5,000 borrowers we want in our sample.
To have a filter button on the rank column header, unclick the Filter option in Sort &
Filter, and click it again. Next, click the filter button in the rank column header and click
NUMBER FILTERS. Further, click “Less Than Or Equal To…”
Books:
Mann, P.S. (2016). Introductory Statistics (9th ed.). John Wiley & Sons, Inc.
Mendenhall III, W., Beaver, R.J., & Beaver, B.M. (2020). Introduction to Probability and Statistics: Metric
Version (15th ed.). Cengage Learning, Inc.
Online References:
https://psa.gov.ph/statistics/quickstat/national-quickstat/all/*
https://www.rappler.com/nation/octa-research-filipinos-covid-19-vaccine-willingness-february-2021
https://www.questionpro.com/blog/probability-sampling/
https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(Shafer_and_Zhang)/06%3A_Sampling_Distributions/6.03%3A_The_Sample_Proportion
Name: _________________________________________________ Score: ________________ / 35
INSTRUCTION: Answer the following items on your answer sheet. Show all necessary computations.
1. The GPAs of all 5,540 students enrolled at a university have an approximate normal
distribution with a mean of 3.02 and a standard deviation of 0.29. Let 𝑥̅ be the mean GPA of
a random sample of 60 students selected from this university.
1.a. Calculate the mean and standard deviation of 𝑥̅ , and comment on the shape of its
sampling distribution. (10pts)
1.b. Compare the shape of the sampling distribution of 𝑥̅ with a sample size of 60 and that
with a sample size of 25. (3pts)
2. According to the National Association of Colleges and Employers Spring 2015 Salary Survey,
the average starting salary for college graduates in 2014 was $48,127. Suppose that the
mean starting salary of all college graduates in 2014 was $48,127 with a standard deviation
of $9,200, and that this distribution is strongly skewed to the right. Let 𝑥̅ be the mean starting
salary of 100 randomly selected college graduates in 2014.
2.a. Calculate the mean and standard deviation of 𝑥̅ . (6pts)
2.b. Describe the shape of the sampling distribution of 𝑥̅ . (3pts)
2.c. If the sample size in getting 𝑥̅ is 25, describe the shape of the sampling distribution of 𝑥̅ .
(3pts)
3. According to a Gallup poll conducted January 5–8, 2014, 67% of American adults were
dissatisfied with the way income and wealth are distributed in America. Assume that this
percentage is true for the current population of American adults. Let 𝑝̂ be the proportion in
a random sample of 350 American adults who hold the above opinion. Calculate the mean
and standard deviation of the sampling distribution of 𝑝̂ and describe its shape. (10pts)