
AE 9 AY 2021-2022 Module 2 (Complete)

This topic, statistical analysis, gives us knowledge about how statistics works in our daily life through the gathering of data.

Uploaded by

Mae Ann Raquin



AKLAN STATE UNIVERSITY

AE 9

LEARNER’S
MODULE

KASPAROV I. REPEDRO
Instructor
August 2021

Student’s Name: ______________________________

CHAPTER 3
SAMPLING and SAMPLING DISTRIBUTIONS

STATISTICAL ANALYSIS WITH SOFTWARE ANALYSIS | 0

TABLE OF CONTENTS

CHAPTER 3: Sampling and Sampling Distributions

1. Selecting a Sample
2. Introduction to Sampling Distributions
3. Sampling Distribution of the Mean
4. Sampling Distribution of the Proportion
5. Sampling in MS Excel

Objectives

Upon accomplishing this chapter, the students must be able to:

1. Explain the importance of sampling.
2. Distinguish the different sampling methods.
3. Explain the concept of sampling distribution.
4. Apply sampling techniques to a data set using MS Excel.
5. Compute probabilities related to the sample mean and proportion using MS Excel.


CH 3 Sampling and
Sampling Distributions
As introduced earlier, the majority of statistics involves drawing inferences and making decisions about populations based on results obtained from a subset of that population, called a sample. Understanding sampling is therefore crucial for beginners in statistics.

SAMPLING

In sample surveys, there are mainly two methods of sampling: random sampling and nonrandom sampling.

RANDOM and NONRANDOM SAMPLING

Random sampling is a method of sampling in which each member of the population has some chance of being selected in the sample.

Nonrandom sampling is a method of sampling in which some members of the population may not have any chance of being selected in the sample.

Suppose we have a list of 100 students and we want to select 10 of them. If we write the names of all 100
students on pieces of paper, put them in a hat, mix them, and then draw 10 names, the result will be a random
sample of 10 students. However, if we arrange the names of these 100 students alphabetically and pick the
first 10 names, it will be a nonrandom sample because the students who are not among the first 10 have no
chance of being selected in the sample.
A random sample is usually a representative sample. The random sample from the population should be selected such that it is representative of the population, i.e., the sample has the same characteristics as the population. Note that for a random sample, each member of the population may or may not have the same chance of being included in the sample.
Two types of nonrandom sampling are convenience sampling and judgment sampling.
• In convenience sampling, the most accessible members of the population are selected to obtain the results quickly. For example, an opinion poll may be conducted in a few hours by collecting information from certain shoppers at a single shopping mall.

• In judgment sampling, the members are selected from the population based on the judgment and prior knowledge of an expert. Although such a sample may happen to be representative, the chances of it being so are small. If the population is large, it is not an easy task to select a representative sample based on judgment.


RANDOM SAMPLING TECHNIQUES
In selecting a random sample, there are different techniques
that can be used. Random sampling techniques are used to
obtain a random sample that represents the target population.
This is because all members of the population have a chance
to be included in the sample. Here, we will be discussing four
commonly used techniques.

1. Simple Random Sampling

Simple random sampling is a sampling technique in which any particular sample of a specific sample size has the same chance of being selected as any other sample of the same size.

Sample size is the number of elements in the sample, denoted by n. Meanwhile, population size,
denoted by N, is the number of elements in the population.

There are several techniques under simple random sampling. One of the most common is the lottery (drawing) method.

Lottery or fishbowl sampling. For example, if we need to select 5 students from a class of 50 (the target population), we write each of the 50 names on a separate piece of paper. Then, we place all 50 names in a bowl and mix them thoroughly. Next, we draw 1 name randomly from the bowl. We repeat this draw four more times, without putting the drawn names back. The 5 drawn names make up a simple random sample with a sample size of 5.

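The lottery method can be mimicked in code. The Python sketch below (standard library only) assumes a hypothetical class list of 50 placeholder names; `random.sample` draws without replacement, so every set of 5 names is equally likely to be chosen, just as with slips drawn from a well-mixed bowl.

```python
import random

def lottery_sample(names, n, seed=None):
    """Draw n names without replacement, like drawing slips from a bowl."""
    rng = random.Random(seed)
    # random.sample never picks the same element twice, and every subset
    # of size n is equally likely: the defining property of a simple random sample.
    return rng.sample(names, n)

# Hypothetical class list of 50 students (placeholder names).
students = [f"Student {i}" for i in range(1, 51)]
sample = lottery_sample(students, 5, seed=1)
print(sample)
```

Changing or omitting `seed` gives a different sample each time, which is exactly what a new draw from the bowl would do.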
2. Systematic Random Sampling

If the target population is too large, simple random sampling may not be practical to use. For example, suppose we need a random sample of 150 households from a list of 45,000 households. Writing all 45,000 names on slips of paper for a lottery would take a very large amount of time. Here, we can use systematic random sampling.


Systematic random sampling is a sampling technique in which the elements of the sample are taken from every kth element in the population arranged alphabetically or by some other characteristic. Here, $k = N/n$.

The procedure to select a systematic random sample is as follows. In the example just mentioned, we would arrange all 45,000 households alphabetically (or based on some other characteristic). Since the sample size should equal 150, the ratio of population size to sample size is 45,000/150 = 300. Using this ratio, we randomly select one household from the first 300 households in the arranged list (for example, by the lottery method). Suppose we select the 210th household. We then select every 300th household after it. In other words, our sample includes the households numbered 210, 510, 810, 1110, 1410, 1710, and so on.
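A minimal Python sketch of the systematic procedure, assuming the population size N is an exact multiple of the sample size n (as with 45,000/150 = 300). The function name `systematic_sample` is illustrative; passing a fixed start of 210 reproduces the household numbers listed above.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return 1-based positions of a systematic random sample of size n from N items."""
    k = N // n                                   # sampling interval k = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start among the first k items
    return [start + i * k for i in range(n)]     # then every kth item after the start

print(systematic_sample(45_000, 150, start=210)[:6])  # → [210, 510, 810, 1110, 1410, 1710]
positions = systematic_sample(45_000, 150, seed=7)
```

When N is not an exact multiple of n, this sketch uses the whole-number part of N/n as k, so the last part of the list may never be reached.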

3. Stratified Random Sampling

Suppose we need to select a sample from the
population of a city, and we want households with
different income levels to be proportionately
represented in the sample. In this case, instead of
selecting a simple random sample or a systematic
random sample, we may prefer to apply a different
technique. First, we divide the whole population into
different groups based on income levels. For example,
we may form three groups of low-, medium-, and high-
income households.

We will now have three subpopulations, which are usually called strata. We then select one sample
from each subpopulation or stratum. The collection of all three samples selected from the three
strata gives the required sample, called the stratified random sample.


Usually, the sizes of the samples selected from different strata are proportionate to the sizes of the
subpopulations in these strata. Note that the elements of each stratum are identical with regard to
the possession of a characteristic.

Note that stratified random sampling has two types:

Stratified random sampling is a sampling technique in which the entire population is divided into smaller groups (called strata; stratum in singular) that do not overlap and that together represent the entire population.
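A short Python sketch of stratified sampling with proportional allocation, where each stratum's sample size is proportional to the stratum's size, as described above. The stratum sizes (6,000 low-, 3,000 medium-, and 1,000 high-income households) and the household labels are hypothetical, chosen only for illustration.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional-allocation stratified sample.

    strata: dict mapping stratum name -> list of population members.
    Returns a dict mapping stratum name -> simple random sample from that stratum.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / N)      # sample size proportional to stratum size
        sample[name] = rng.sample(members, n_h)  # simple random sample within the stratum
    return sample

# Hypothetical city: 6,000 low-, 3,000 medium-, and 1,000 high-income households.
strata = {
    "low": [f"L{i}" for i in range(6000)],
    "medium": [f"M{i}" for i in range(3000)],
    "high": [f"H{i}" for i in range(1000)],
}
result = stratified_sample(strata, 500, seed=3)
print({name: len(s) for name, s in result.items()})  # → {'low': 300, 'medium': 150, 'high': 50}
```

When the proportional sizes are not whole numbers, rounding can make the stratum samples add up to slightly more or less than n.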

4. Cluster Random Sampling

Sometimes the target population is scattered over a
wide geographical area. Consequently, if a simple
random sample is selected, it may be costly to contact
each member of the sample. In such a case, we divide
the population into different geographical groups or
clusters and, as a first step, select a random sample
of certain clusters from all clusters. We then take a
random sample of certain elements from each
selected cluster.

For example, suppose we are to conduct a survey of households in Aklan in Western Visayas. First,
we divide the whole province into its 17 municipalities which are called clusters or primary units.
We make sure that all clusters are similar and, hence, representative of the population. We then
select at random, say, 5 clusters from 17.
Next, we randomly select certain households
from each of these 5 clusters and conduct a
survey of these selected households. This is
called cluster sampling.

Cluster sampling is a sampling technique in which the entire population is divided into multiple groups (called clusters), usually by geographical area.
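The two stages of cluster sampling (random clusters first, then random households within each chosen cluster) can be sketched in Python. The municipality labels and the 200 households per municipality are placeholders, not actual Aklan data, and taking 20 households per chosen cluster is an arbitrary assumption.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample: pick clusters at random, then households within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)                   # stage 1: random clusters
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}  # stage 2: households

# Hypothetical: 17 municipalities with 200 placeholder households each.
clusters = {
    f"Municipality {m}": [f"M{m}-HH{h}" for h in range(200)] for m in range(1, 18)
}
survey = cluster_sample(clusters, 5, 20, seed=11)
print(list(survey))  # the 5 randomly selected municipalities
```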


SUPPLEMENTARY VIDEOS FOR FURTHER UNDERSTANDING

What Are The Types Of Sampling Techniques In Statistics - Random, Stratified, Cluster, Systematic: https://www.youtube.com/watch?v=9PaR1TsvnJs
Sampling: Stratified random sampling (With Computations): https://www.youtube.com/watch?v=lJqV1vrxtHc


INTRODUCTION TO SAMPLING DISTRIBUTIONS

We know that the value of a population parameter is always constant. For example, for any population data set, there is only one value of the population mean, 𝜇. However, we cannot say the same about the sample mean, 𝑥̅. We would expect different samples of the same size drawn from the same population to yield different values of the sample mean, 𝑥̅. The value of the sample mean for any one sample will depend on the elements included in that sample. Consequently, the sample mean, 𝒙̅, is a random variable. Therefore, like other random variables, the sample mean possesses a probability distribution, which is more commonly called the sampling distribution of 𝒙̅. Other sample statistics, such as the proportion and standard deviation, also possess sampling distributions.

Illustrative Example:

Consider the population of midterm scores of five students: 70, 78, 80, 80, 95. Consider all possible samples of three scores each that can be selected, without replacement, from that population. The total number of possible samples is

$${}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5\times4\times3\times2\times1}{(3\times2\times1)(2\times1)} = 10$$

Suppose we assign the letters A, B, C, D, and E to the scores of the five students, so that

A = 70, B = 78, C = 80, D = 80, E = 95

Then, the 10 possible samples of three scores each are

ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE

These 10 samples and their respective means are listed below. The
mean of each sample is obtained by dividing the sum of the three
scores included in that sample by 3. For instance, the mean of the first
sample ABC is (70 + 78 + 80)/3 = 76.


By using the values of 𝑥̅ given in the table above, we record the frequency distribution of 𝑥̅. By dividing the frequencies of the various values of 𝑥̅ by the sum of all frequencies, we obtain the relative frequencies of the classes, which are listed in the third column of the table.

These relative frequencies are used as probabilities and listed in the table below. This table gives the sampling distribution of 𝑥̅.

If we select just one sample of three scores from the population of five scores, we may draw any of the 10 possible samples. Hence, the sample mean, 𝑥̅, can assume any of the values listed in the table with the corresponding probability. For instance, the probability that the mean of a randomly selected sample of three scores is 81.67 is 0.20. This probability can be written as

$$P(\bar{x} = 81.67) = 0.20$$
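The ten sample means and the resulting sampling distribution of 𝑥̅ can be recomputed with a short Python sketch (standard library only). The dictionary `scores` simply encodes students A to E from the example, and rounding to two decimals matches the values used in the text.

```python
from collections import Counter
from itertools import combinations

# Midterm scores of the five students, labelled A to E as in the example.
scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# All 5C3 = 10 samples of three students, drawn without replacement.
samples = list(combinations(scores, 3))
means = {"".join(s): round(sum(scores[k] for k in s) / 3, 2) for s in samples}

# Frequency of each value of x-bar, then relative frequency (probability).
freq = Counter(means.values())
dist = {xbar: count / len(samples) for xbar, count in sorted(freq.items())}

for label, xbar in means.items():
    print(label, xbar)
print(dist)
```

The printed distribution assigns probability 0.2 to 81.67, matching P(𝑥̅ = 81.67) = 0.20.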


Sampling Distribution of the Mean

The mean and standard deviation of 𝑥̅ are, respectively, the mean and standard deviation of the means of all samples of the same size selected from a population. Therefore, the mean and standard deviation of 𝒙̅ are the mean and standard deviation of the sampling distribution of 𝒙̅, respectively. The standard deviation of 𝒙̅ is also called the standard error of 𝒙̅.

• Mean of 𝑥̅ = mean of the sampling distribution of 𝑥̅ = $\mu_{\bar{x}}$
• Standard deviation of 𝑥̅ = standard deviation of the sampling distribution of 𝑥̅, or standard error of 𝑥̅ = $\sigma_{\bar{x}}$

If we calculate the mean and standard deviation of the 10 values of 𝑥̅ listed in the illustrative example above,
we obtain the mean and standard deviation of the sampling distribution of 𝑥̅ .

We obtain the mean of the sampling distribution of 𝑥̅ as:

$$\mu_{\bar{x}} = \frac{76.00 + 76.00 + \cdots + 84.33 + 85.00}{10} = 80.60$$

and the standard deviation of the sampling distribution of 𝑥̅ as:

$$\sigma_{\bar{x}} = \sqrt{\frac{(76.00-80.60)^2 + (76.00-80.60)^2 + \cdots + (85.00-80.60)^2}{10}} = 3.30$$
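The mean and standard deviation above can be checked with Python's `statistics` module. `pstdev` divides by the number of values (10), matching the formula, because the 10 sample means make up the entire sampling distribution.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [70, 78, 80, 80, 95]
xbars = [sum(s) / 3 for s in combinations(population, 3)]  # the 10 sample means

mu_xbar = mean(xbars)        # mean of the sampling distribution of x-bar
sigma_xbar = pstdev(xbars)   # divides by 10, as in the formula above

print(round(mean(population), 2), round(mu_xbar, 2), round(sigma_xbar, 2))
# sigma / sqrt(n) for comparison; it assumes n/N <= 0.05, but here n/N = 3/5.
print(round(pstdev(population) / sqrt(3), 2))
```

The mean of 𝑥̅ equals the population mean (80.60). The shortcut σ/√n gives about 4.67 rather than 3.30 because this tiny population violates the condition n/N ≤ 0.05 that the shortcut relies on.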

The shape of the sampling distribution of 𝑥̅ relates to the following two cases:

1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal distribution.


Sampling Distribution of 𝒙̅ from a Normally Distributed Population

When the population from which samples are drawn is normally distributed with its mean equal to 𝜇 and standard deviation equal to 𝜎, then:

1. The mean of 𝑥̅, which is $\mu_{\bar{x}}$, is equal to the population mean, 𝜇. This is why 𝒙̅ is an unbiased estimator of 𝝁.
2. The standard deviation of 𝑥̅, which is $\sigma_{\bar{x}}$, is equal to $\sigma/\sqrt{n}$, assuming $n/N \le 0.05$.
3. The shape of the sampling distribution of 𝑥̅ is normal, regardless of the sample size n.

Following are two important observations regarding the sampling distribution of 𝑥̅.

1. The spread of the sampling distribution of 𝑥̅ is smaller than the spread of the corresponding population distribution. In other words, $\sigma_{\bar{x}} < \sigma$. This is obvious from the formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$: when n is greater than 1, which is usually true, the denominator $\sqrt{n}$ is greater than 1. Hence, $\sigma_{\bar{x}}$ is smaller than 𝜎.

2. The standard deviation of the sampling distribution of 𝑥̅ decreases as the sample size increases. This feature is also obvious from the formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Because the values of 𝑥̅ cluster more tightly around 𝜇 as n grows, the sample mean 𝒙̅ is a consistent estimator of the population mean 𝝁.
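Both observations can be checked by simulation. The Python sketch below assumes a hypothetical normal population with 𝜇 = 100 and 𝜎 = 20, draws 10,000 samples for each sample size, and compares the standard deviation of the simulated sample means with σ/√n.

```python
import random
from math import sqrt
from statistics import pstdev

MU, SIGMA = 100, 20   # hypothetical normal population

def simulated_se(n, reps=10_000, seed=42):
    """SD of x-bar over many random samples of size n from a normal population."""
    rng = random.Random(seed)
    xbars = [sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n for _ in range(reps)]
    return pstdev(xbars)

ses = {n: simulated_se(n) for n in (4, 16, 64)}
for n, se in ses.items():
    print(f"n = {n:>2}: simulated SD of x-bar = {se:.2f}, sigma/sqrt(n) = {SIGMA / sqrt(n):.2f}")
```

Each simulated value is smaller than 𝜎 = 20 and shrinks as n grows, close to 10, 5, and 2.5 respectively.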

Example: Earnings of Internal Medicine Physicians

According to the 2015 Physician Compensation Report by Medscape, American internal medicine
physicians earned an average of $196,000 in 2014. Suppose that the 2014 earnings of all American internal
medicine physicians are approximately normally distributed with a mean of $196,000 and a standard
deviation of $20,000. Let 𝑥̅ be the mean 2014 earnings of a random sample of American internal medicine
physicians. Calculate the mean and standard deviation of 𝑥̅ and describe the shape of its sampling
distribution when the sample size is (a) 16, (b) 50, and (c) 1000.

Solution:
Let 𝜇 and 𝜎 be the mean and standard deviation of the 2014 earnings of all American internal medicine
physicians, and 𝜇𝑥̅ and 𝜎𝑥̅ be the mean and standard deviation of the sampling distribution of 𝑥̅ ,
respectively. Then, from the given information,
𝜇 = $196,000 and 𝜎 = $20,000


(a) For n = 16, the mean and standard deviation of 𝑥̅ are, respectively,

$$\mu_{\bar{x}} = \mu = \$196{,}000 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{16}} = \$5{,}000$$

Because the 2014 earnings of all American internal medicine physicians are approximately normally
distributed, the sampling distribution of 𝑥̅ for samples of 16 physicians is also approximately normally
distributed.
The figure below shows the population distribution and the sampling distribution of 𝑥̅ for n = 16. Since $\sigma_{\bar{x}} < \sigma$, the population distribution has a wider spread but smaller height than the sampling distribution of 𝑥̅.

(b) For n = 50, the mean and standard deviation of 𝑥̅ are, respectively,

$$\mu_{\bar{x}} = \mu = \$196{,}000 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{50}} = \$2{,}828.43$$

Again, since the population distribution is approximately normally distributed, the sampling distribution of 𝑥̅
for samples of 50 physicians is also approximately normally distributed.


The figure above shows the population distribution and the sampling distribution of 𝑥̅ for n = 50. Since 𝜎𝑥̅ < 𝜎,
the population distribution has a wider spread but smaller height than the sampling distribution of 𝑥̅ .

(c) For n = 1,000, the mean and standard deviation of 𝑥̅ are, respectively,

$$\mu_{\bar{x}} = \mu = \$196{,}000 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{1000}} = \$632.46$$

Again, since the population distribution is approximately normally distributed, the sampling distribution of 𝑥̅
for samples of 1,000 physicians is also approximately normally distributed.
The figure below shows the population distribution and the sampling distribution of 𝑥̅ for n = 1,000. Thus, regardless of the sample size, the sampling distribution of 𝑥̅ is normal when the population from which the samples are drawn is normally distributed.

From the preceding calculations, we observe that the mean of the sampling distribution of 𝑥̅ is always equal
to the mean of the population ($196,000) whatever the size of the sample. However, the value of the standard
deviation of 𝑥̅ decreases from $5,000.00 to $2,828.43 and then to $632.46 as the sample size increases
from 16 to 50 and then to 1,000.
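The three standard errors in this example all come from the same formula, which a short Python sketch reproduces. The helper name `standard_error` is illustrative.

```python
from math import sqrt

def standard_error(sigma, n):
    """sigma_xbar = sigma / sqrt(n), valid when n/N <= 0.05."""
    return sigma / sqrt(n)

MU, SIGMA = 196_000, 20_000  # population mean and SD of 2014 earnings, in dollars
for n in (16, 50, 1000):
    print(f"n = {n:>4}: mean of x-bar = ${MU:,}, sigma_xbar = ${standard_error(SIGMA, n):,.2f}")
```

The printed values are $5,000.00, $2,828.43, and $632.46, while the mean of 𝑥̅ stays at $196,000.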


Sampling Distribution of 𝒙̅ from a Population that is Not Normally Distributed

Most of the time the population from which the samples are selected is not normally distributed. In such
cases, the shape of the sampling distribution of 𝑥̅ is inferred from the central limit theorem.

According to the central limit theorem, for a large sample size, the sampling distribution of 𝑥̅ is approximately normal, regardless of the shape of the population distribution. The mean and standard deviation of the sampling distribution of 𝑥̅ are, respectively,

$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

In the case of the mean, the sample size is usually considered to be large if n ≥ 30. Also, note that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ holds when $n/N \le 0.05$. We will not focus on cases where $n/N > 0.05$, since these conditions are uncommon in practice.

If the population distribution is not normally distributed, the sampling distribution of 𝑥̅ with n < 30 is also not
normal. However, the sampling distribution of 𝑥̅ with n ≥ 30 is (approximately) normal because of the central
limit theorem.

Example: Rents Paid by Tenants in a City

The mean rent paid by all tenants in a small city is $1,550 with a standard deviation of $225. However, the population distribution of rents for all tenants in this city is skewed to the right. Let 𝑥̅ be the mean rent paid by a random sample of tenants in that city. Calculate the mean and standard deviation of 𝑥̅ and describe the shape of its sampling distribution when the sample size is (a) 30 and (b) 100.

Solution:

Although the population distribution of rents paid by all tenants is not normal, the sample sizes for (a) and (b) are both large (n ≥ 30). Hence, the central limit theorem can be applied to infer the shape of the sampling distribution of 𝑥̅. From the given information,

𝜇 = $1,550 and 𝜎 = $225


(a) Applying the central limit theorem, the sampling distribution of 𝑥̅ with n = 30 is approximately normal. The mean and standard deviation are computed as:

$$\mu_{\bar{x}} = \mu = \$1{,}550 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$225}{\sqrt{30}} = \$41.08$$

The figure below shows the population distribution (a) and the sampling distribution of 𝑥̅ for n = 30 (b).

(b) Applying the central limit theorem, the sampling distribution of 𝑥̅ with n = 100 is approximately normal. The mean and standard deviation are computed as:

$$\mu_{\bar{x}} = \mu = \$1{,}550 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$225}{\sqrt{100}} = \$22.50$$

The figure below shows the population distribution (a) and the sampling distribution of 𝑥̅ for n = 100 (b).
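The central limit theorem can also be seen by simulation. The module does not specify the exact shape of the rent distribution, so the Python sketch below assumes a right-skewed gamma distribution with the same mean ($1,550) and standard deviation ($225); `random.gammavariate(alpha, beta)` has mean alpha·beta and variance alpha·beta². The standard deviations of the simulated sample means should land close to $41.08 and $22.50.

```python
import random
from math import sqrt
from statistics import pstdev

MU, SIGMA = 1550, 225
# Assumption: a gamma distribution with this mean and SD stands in for the right-skewed rents.
ALPHA = (MU / SIGMA) ** 2   # shape parameter
BETA = SIGMA ** 2 / MU      # scale parameter

def sample_means(n, reps=5_000, seed=0):
    """Means of `reps` random samples of size n drawn from the assumed rent distribution."""
    rng = random.Random(seed)
    return [sum(rng.gammavariate(ALPHA, BETA) for _ in range(n)) / n for _ in range(reps)]

results = {n: sample_means(n) for n in (30, 100)}
for n, xbars in results.items():
    print(n, round(sum(xbars) / len(xbars), 1), round(pstdev(xbars), 2), round(SIGMA / sqrt(n), 2))
```

A histogram of either list of sample means would look approximately normal even though each individual rent comes from a skewed distribution.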


Sampling Distribution of the Proportion

The concept of proportion is the same as the concept of relative frequency and the concept of probability of
success in a binomial experiment. The relative frequency of a category or class gives the proportion of the
sample or population that belongs to that category or class. Similarly, the probability of success in a binomial
experiment represents the proportion of the sample or population that possesses a given characteristic.

The population proportion, denoted by p, is obtained by taking the ratio of the number of elements in a population with a specific characteristic to the total number of elements in the population. The sample proportion, denoted by 𝒑̂ (pronounced p hat), gives a similar ratio for a sample.

$$p = \frac{X}{N} \quad\text{and}\quad \hat{p} = \frac{x}{n}$$

where

N = total number of elements in the population
n = total number of elements in the sample
X = number of elements in the population that possess a specific characteristic
x = number of elements in the sample that possess a specific characteristic

Example: Families Owning Homes

Suppose a total of 789,654 families live in a particular city and 563,282 of them own homes. A sample of
240 families is selected from this city, and 158 of them own homes. Find the proportion of families who own
homes in the population and in the sample.

Solution:

For the population of the city,

N = population size = 789,654

X = number of families in the population who own homes = 563,282


The population proportion of families in this city who own homes is

$$p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = 0.71$$

Now, a sample of 240 families is taken from this city, and 158 of them are home-owners. So,

n = sample size = 240

x = number of families in the sample who own homes = 158

The sample proportion of families in this city who own homes is

$$\hat{p} = \frac{x}{n} = \frac{158}{240} = 0.66$$
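Both proportions are simple ratios, as a one-function Python sketch shows.

```python
def proportion(count, total):
    """Share of elements with the characteristic: p = X/N or p-hat = x/n."""
    return count / total

p = proportion(563_282, 789_654)   # population proportion of home-owning families
p_hat = proportion(158, 240)       # sample proportion
print(round(p, 2), round(p_hat, 2))  # → 0.71 0.66
```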

Sampling Distribution of 𝒑̂

Just like the sample mean 𝑥̅, the sample proportion 𝑝̂ is a random variable. The population proportion p is a constant, as it assumes one and only one value. However, the sample proportion 𝑝̂ can assume one of a large number of possible values depending on which sample is selected. Hence, 𝑝̂ is a random variable, and it possesses a probability distribution, which is called its sampling distribution.

Illustrative Example:

Boe Consultant Associates has five employees. The table below gives the names of these five employees and information concerning their knowledge of statistics.

If we define the population proportion, p, as the proportion of employees who know statistics, then

$$p = \frac{3}{5} = 0.60$$

Note that this population proportion, p = 0.60, is constant. As long as the population does not change, this value of p will not change.


Now, suppose we draw all possible samples of three employees each and compute the proportion of employees, for each sample, who know statistics. The total number of samples of size three that can be drawn from the population of five employees is

$$\text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = 10$$

The table below lists these 10 possible samples and the proportion of employees who know statistics for each of those samples.

We now prepare the frequency and relative frequency distributions of 𝑝̂. The relative frequencies are used as the probabilities listed below. The last table gives the sampling distribution of 𝑝̂.
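The sampling distribution of 𝑝̂ in this example can be enumerated in Python. The employee labels E1 to E5 are placeholders, and which three of them know statistics is an assumption that does not affect the result; `Fraction` keeps the probabilities exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Placeholder labels; True means the employee knows statistics (3 of 5, so p = 0.60).
knows_stats = {"E1": True, "E2": True, "E3": True, "E4": False, "E5": False}

samples = list(combinations(knows_stats, 3))                         # 5C3 = 10 samples
p_hats = [Fraction(sum(knows_stats[e] for e in s), 3) for s in samples]

counts = Counter(p_hats)
dist = {p_hat: Fraction(c, len(samples)) for p_hat, c in sorted(counts.items())}
for p_hat, prob in dist.items():
    print(f"p-hat = {float(p_hat):.2f}  P = {float(prob):.2f}")

# The mean of the sampling distribution of p-hat equals p.
mu_p_hat = sum(p_hat * prob for p_hat, prob in dist.items())
print(float(mu_p_hat))  # → 0.6
```

The enumeration gives 𝑝̂ = 0.33, 0.67, and 1.00 with probabilities 0.30, 0.60, and 0.10.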


When we are dealing with proportions, we consider the following:

1. The mean of 𝑝̂ (denoted by $\mu_{\hat{p}}$), which is also the mean of the sampling distribution of 𝑝̂, is always equal to the population proportion, p. That is, $\mu_{\hat{p}} = p$. This is why 𝒑̂ is an unbiased estimator of p.

2. The standard deviation of 𝑝̂ (denoted by $\sigma_{\hat{p}}$), which is also the standard deviation of the sampling distribution of 𝑝̂, is equal to $\sqrt{p(1-p)/n}$, assuming $n/N \le 0.05$. That is,

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Note that as n increases, the value of $\sigma_{\hat{p}}$ decreases; thus, 𝒑̂ is also a consistent estimator of p.

3. Applying the central limit theorem, the shape of the sampling distribution of 𝑝̂ is approximately normal for a sufficiently large sample size, that is, when np > 5 and n(1 − p) > 5.

Example: Shipping of Orders

An online retailer claims that 90% of all orders are shipped within 12 hours of being received. A consumer group placed 121 orders of different sizes and at different times of day; 102 orders were shipped within 12 hours.

(a) Compute the sample proportion 𝑝̂ of items shipped within 12 hours.

(b) Compute the standard deviation of 𝑝̂ .
(c) Describe the shape of the sampling distribution of 𝑝̂ . Use p = 0.90, corresponding to the
assumption that the retailer’s claim is valid.

Solution:

(a) The sample proportion is the number of orders that are shipped within 12 hours (x) divided by the number of orders in the sample (n):

$$\hat{p} = \frac{x}{n} = \frac{102}{121} = 0.84$$

Hence, 84% of the orders in the sample were shipped within 12 hours.

(b) With n = 121 and p = 0.90, the standard deviation of 𝑝̂ is computed as $\sqrt{p(1-p)/n}$.


Computing for $\sigma_{\hat{p}}$, we have

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.90)(1-0.90)}{121}} = \sqrt{\frac{(0.90)(0.10)}{121}} = 0.0273$$

(c) To make an inference about the shape of the sampling distribution of 𝑝̂, we can check whether the central limit theorem is applicable to this problem. The values of np and n(1 − p) are

$$np = 121(0.90) = 108.9 \approx 109 \quad\text{and}\quad n(1-p) = 121(1-0.90) = 12.1 \approx 12$$

Since np and n(1 − p) are both greater than 5, by the central limit theorem, we can infer that the sampling distribution of 𝑝̂ is approximately normal with a mean of $\mu_{\hat{p}} = p = 0.90$ and a standard deviation of 0.0273.
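All three parts of the shipping example can be checked with a few lines of Python (standard library only).

```python
from math import sqrt

n, x, p = 121, 102, 0.90   # sample size, orders shipped within 12 hours, claimed proportion

p_hat = x / n                                 # (a) sample proportion
sigma_p_hat = sqrt(p * (1 - p) / n)           # (b) standard deviation of p-hat
large_enough = n * p > 5 and n * (1 - p) > 5  # (c) condition for the normal approximation

print(round(p_hat, 2), round(sigma_p_hat, 4), round(n * p, 1), round(n * (1 - p), 1), large_enough)
```

The sampling distribution is centred at p = 0.90, the claimed value, not at the observed 𝑝̂ = 0.84.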



SELECTING A SAMPLE IN MICROSOFT EXCEL

In this section, we will learn how to select a random sample from a given population data set using MS Excel. This involves assigning a random number to every observation in the data.

Follow these steps in selecting a random sample in MS Excel. A demonstration follows after the steps.

1. Open the workbook where the data is stored. If the data is not yet in a workbook, enter it into one.
2. Once the data is ready, create a new column next to the last column. Assign a column name in Row 1.
3. Click Row 2 of the new column. Enter the formula =RAND(). A random number between 0 and 1 will appear in that cell.
4. Double-click (or drag down) the fill handle of the cell in Step #3. Alternatively, you can copy that cell and paste it down to the last row (in the same column) of the data.
5. To stop the random numbers from refreshing, copy all cells in the last column and paste them as values in the same column. The cells in the last column should then contain numbers, not the formula =RAND().
6. Sort the entire data by the last column in ascending order (from smallest to largest).
7. Create another column to the right of the last column. Assign a column name in Row 1, and fill the column with the ranks 1, 2, 3, … down to the last row.
8. Apply a filter on the column in Step #7 according to the number of observations we want in our sample. For example, if we want a sample of size 500 from a population of size 4,000, use the Number Filter on the last column and click “Less Than Or Equal To…”. In the dialog box that appears, enter the sample size, 500, next to “is less than or equal to”. Click OK.
9. We now have our sample (e.g., of size 500), but the observations not included are only hidden. To retain only the observations in the sample, copy the filtered data and paste it into a separate worksheet or workbook. If the population data is large, pasting into a separate workbook is suggested to avoid lag in the software.
10. You can delete the last two columns since they are not needed in your analysis. And now you have your sample!
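For readers who also use Python, the same idea (attach a fixed random number to every row, sort by it, and keep ranks 1 to n) can be sketched with the standard library. This is an illustrative analog of the Excel steps, not part of the Excel procedure, and the population of 4,000 row IDs is hypothetical. Sorting by independent uniform random numbers puts the rows in a random order, so the first n rows form a simple random sample.

```python
import random

def random_key_sample(rows, n, seed=None):
    """Mimic the spreadsheet steps: random number per row, sort, keep ranks 1..n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]    # like =RAND(), then paste as values
    keyed.sort(key=lambda pair: pair[0])             # sort smallest to largest
    # Rank the sorted rows and keep those with rank <= n (the number filter).
    return [row for rank, (_, row) in enumerate(keyed, start=1) if rank <= n]

population = list(range(1, 4001))   # hypothetical population of 4,000 row IDs
sample = random_key_sample(population, 500, seed=2021)
print(len(sample), len(set(sample)))  # → 500 500
```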

Follow the demonstration below. Please download the data set and perform random sampling
in MS Excel following these steps. This will serve as your MS Excel output 2.


RANDOM SAMPLING DEMONSTRATION

The workbook Loan Borrower Data (2015, 5-year term).xlsx contains 2015 data from a financial institution on borrowers of a particular loan payable over 60 months. It consists of 137,650 borrowers, each identifiable by an account number (Column A, account_number). The other columns contain the demographic profile of the borrowers (Columns B to F) and their loan details and performance (Columns G to Q). Our objective is to obtain a random sample of 5,000 borrowers out of this population data.

Demonstration of Random Sampling in Loan Borrower Data:

1. Open the workbook Loan Borrower Data (2015, 5-year term).xlsx.

2. Since our data is ready, we create a new column next to the column installment
(Column Q). In cell R1, we assign random_number as the name of this column.


3. To obtain the random numbers in column random_number, we start with cell R2. In cell
R2, we enter the formula =RAND().

A random number 0.238177426, which is between 0 and 1, is assigned in the first row.

4. Double-click the fill handle in cell R2 to fill the formula down the column. Alternatively, you can copy cell R2 and paste it down to the last cell of that column (cell R137651, since Row 1 holds the column names).


5. Notice that from Step #3 to Step #4, the random number in cell R2 changed, because =RAND() recalculates whenever the worksheet is recalculated. To stop the random numbers from refreshing, we copy all cells in the column random_number and paste them as values (under Paste Options) into cell S2 to make a new column. We name this new column random_number2. The cells in the column random_number2 should contain numbers, not the formula =RAND().

In the formula bar, a cell in random_number2 should not show =RAND().

At this point, we are done creating two additional columns in our data: random_number
in column R (which consists of refreshing random numbers) and random_number2 in
column S (which contains fixed random numbers). We are now ready for the selection part.


6. Sort the entire data by the column random_number2 in ascending order (from smallest to largest). Click any cell in the column random_number2 and click the Sort & Filter option.

In the Sort & Filter dropdown, click FILTER. An arrow button should appear on the right side of each column header.

Click the arrow button in the random_number2 header and click Sort Smallest to Largest to put the random numbers in random_number2 in ascending order.


Now, the entire data is sorted by the column random_number2, from the smallest random number to the largest.

7. We will now rank these random numbers in a new column. In cell T1, we name this new column rank. Then, we use auto-fill to generate the ranks: we write 1 in cell T2 and 2 in cell T3, highlight both cells, and then double-click the fill handle to auto-fill the ranks down the column.

8. In the column rank, we apply a filter to get the 5,000 borrowers we want in our sample. To get a filter button on the rank column header, turn off the Filter option in Sort & Filter and then turn it on again. Next, click the filter button in the rank column header and click NUMBER FILTERS. Then, click “Less Than Or Equal To…”.


A Custom AutoFilter dialog box will then appear. Next to “is less than or equal to”, write the desired sample size. In our exercise, we need a sample of size 5,000, so we enter 5000. Click OK.


We will know that a sample of size 5,000 has been obtained if the row header numbers turn blue and the last visible row of the data has rank = 5000. All other observations in the data are hidden.

9. Lastly, we add a new worksheet for our final sample data.

10. In Sheet 1, we copy the filtered data and paste it on Sheet 2.


11. Delete the columns random_number, random_number2, and rank. The remaining data is
the final random sample of size 5,000. Rename Sheet 1 as Population Data and Sheet 2 as
Random Sample Data. Click SAVE.
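Steps 6 to 11 draw a simple random sample without replacement. Each row gets an independent random key, the rows are sorted on that key, and the first n rows are kept. The keys place every row in a random position, so every borrower has the same chance, n/N, of being selected. The following Python sketch mirrors the worksheet steps under stated assumptions. The records are dictionaries. The function name simple_random_sample and the borrower population are hypothetical. The seeded random.Random generator is an addition that makes results reproducible, and it stands in for the fixed random numbers in random_number2.

```python
import random
from collections import Counter


def simple_random_sample(population, n, seed=None):
    """Draw n records without replacement by sorting on random keys.

    Mirrors the worksheet steps: fixed random numbers (random_number2),
    ascending sort, ranks 1..N, filter rank <= n, drop the helper columns.
    """
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    # Step 6: attach a random key to each record and sort smallest to largest
    keyed = sorted(
        (dict(rec, random_number2=rng.random()) for rec in population),
        key=lambda rec: rec["random_number2"],
    )
    # Step 7: number the sorted records 1, 2, 3, ...
    ranked = [dict(rec, rank=i) for i, rec in enumerate(keyed, start=1)]
    # Step 8: keep only the records with rank <= n
    chosen = [rec for rec in ranked if rec["rank"] <= n]
    # Step 11: delete the helper columns, leaving the original fields
    helpers = {"random_number2", "rank"}
    return [{k: v for k, v in rec.items() if k not in helpers} for rec in chosen]


# Hypothetical population of 20,000 borrowers; draw a sample of 5,000
population = [{"borrower_id": i, "loan_amount": 1000 + 10 * (i % 50)} for i in range(20_000)]
sample = simple_random_sample(population, 5_000, seed=1)
print(len(sample), sorted(sample[0]))  # → 5000 ['borrower_id', 'loan_amount']

# Equal-chance check on a tiny hypothetical population: with N = 5 and n = 2,
# each borrower should be selected in roughly 2/5 = 40% of repeated draws.
counts = Counter()
trials = 20_000
for t in range(trials):
    for rec in simple_random_sample([{"borrower_id": b} for b in "ABCDE"], 2, seed=t):
        counts[rec["borrower_id"]] += 1
freqs = {b: counts[b] / trials for b in "ABCDE"}
print(freqs)
```

Each printed frequency should be close to 0.4. In everyday Python, random.sample(population, n) from the standard library draws a simple random sample without replacement in a single call. The longer function above exists only to match the spreadsheet steps one by one.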


REFERENCES

Books:
Mann, P.S. (2016). Introductory Statistics (9th ed.). John Wiley & Sons, Inc.

Mendenhall III, W., Beaver, R.J., & Beaver, B.M. (2020). Introduction to Probability and Statistics: Metric
Version (15th ed.). Cengage Learning, Inc.

Online References:
https://psa.gov.ph/statistics/quickstat/national-quickstat/all/*

https://www.rappler.com/nation/octa-research-filipinos-covid-19-vaccine-willingness-february-2021
https://www.questionpro.com/blog/probability-sampling/
https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(Shafer_and_Zhang)/06%3A_Sampling_Distributions/6.03%3A_The_Sample_Proportion

MODULAR ACTIVITY 2: Sampling Distributions

Name: _________________________________________________ Score: ________________ / 35

Course/Year/Section: ____________________ Date: _

INSTRUCTION: Answer the following items on your answer sheet. Show all necessary computations.

1. The GPAs of all 5,540 students enrolled at a university have an approximate normal
distribution with a mean of 3.02 and a standard deviation of 0.29. Let 𝑥̅ be the mean GPA of
a random sample of 60 students selected from this university.
1.a. Calculate the mean and standard deviation of 𝑥̅, and comment on the shape of its
sampling distribution. (10pts)
1.b. Compare the shape of the sampling distribution of 𝑥̅ with a sample size of 60 and that
with a sample size of 25. (3pts)

2. According to the National Association of Colleges and Employers Spring 2015 Salary Survey,
the average starting salary for college graduates in 2014 was $48,127. Suppose that the
mean starting salary of all college graduates in 2014 was $48,127 with a standard deviation
of $9,200, and that this distribution is strongly skewed to the right. Let 𝑥̅ be the mean starting
salary of 100 randomly selected college graduates in 2014.
2.a. Calculate the mean and standard deviation of 𝑥̅. (6pts)
2.b. Describe the shape of the sampling distribution of 𝑥̅. (3pts)
2.c. If the sample size used to compute 𝑥̅ is 25, describe the shape of the sampling
distribution of 𝑥̅. (3pts)

3. According to a Gallup poll conducted January 5–8, 2014, 67% of American adults were
dissatisfied with the way income and wealth are distributed in America. Assume that this
percentage is true for the current population of American adults. Let 𝑝̂ be the proportion in
a random sample of 350 American adults who hold the above opinion. Calculate the mean
and standard deviation of the sampling distribution of 𝑝̂ and describe its shape. (10pts)
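The three activity items rely on the standard formulas for sampling distributions. For 𝑥̅, the mean is μ and the standard deviation is σ/√n. When n/N > 0.05, that standard deviation is multiplied by the finite population correction √((N - n)/(N - 1)). The distribution of 𝑥̅ is normal when the population is normal, and approximately normal by the central limit theorem when n ≥ 30. For 𝑝̂, the mean is p and the standard deviation is √(pq/n), with q = 1 - p. The distribution of 𝑝̂ is approximately normal when np and nq are both large enough. The Python sketch below encodes these rules so that hand computations can be checked. The function names are hypothetical. The cut-offs (0.05, 30, and np > 5, nq > 5, where some texts use 10) are rules of thumb and are treated here as assumptions. The example numbers are made up and do not come from the items.

```python
import math


def xbar_distribution(mu, sigma, n, N=None):
    """Mean and standard deviation of the sampling distribution of x-bar.

    The finite population correction is applied only when N is given
    and n/N > 0.05 (a rule-of-thumb cut-off).
    """
    sd = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        sd *= math.sqrt((N - n) / (N - 1))
    return mu, sd


def xbar_shape(population_is_normal, n):
    """Rule-of-thumb description of the shape of the sampling distribution of x-bar."""
    if population_is_normal:
        return "normal for any sample size"
    if n >= 30:
        return "approximately normal (central limit theorem)"
    return "not necessarily normal; it may keep some of the population's skewness"


def phat_distribution(p, n):
    """Mean and standard deviation of the sampling distribution of p-hat (n/N <= 0.05 assumed)."""
    return p, math.sqrt(p * (1 - p) / n)


def phat_is_approximately_normal(p, n, threshold=5):
    """Rule of thumb: np > threshold and nq > threshold."""
    return n * p > threshold and n * (1 - p) > threshold


# Made-up examples (not the activity's data)
print(xbar_distribution(50, 12, 36, N=10_000))  # → (50, 2.0)
print(xbar_shape(False, 36))
mean_p, sd_p = phat_distribution(0.40, 150)
print(mean_p, round(sd_p, 4))  # → 0.4 0.04
print(phat_is_approximately_normal(0.40, 150))  # → True
```

In the first example, n/N = 36/10,000 is well below 0.05, so no correction is applied.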
