AE 9 AY 2021-2022 Module 2 (Complete)

This topic, statistical analysis provides us knowledge about how statistics works in our daily life, by gathering data.

Uploaded by

Mae Ann Raquin

AKLAN STATE UNIVERSITY

AE 9

LEARNER’S
MODULE

KASPAROV I. REPEDRO
INSTRUCTOR

August 2021

Student’s Name:
CHAPTER 3
SAMPLING and
SAMPLING DISTRIBUTIONS

STATISTICAL ANALYSIS WITH SOFTWARE ANALYSIS | 0


TABLE OF CONTENTS

CHAPTER 3: Sampling and Sampling Distributions


1. Selecting a Sample
2. Introduction to Sampling Distributions
3. Sampling Distribution of the Mean
4. Sampling Distribution of the Proportion
5. Sampling in MS Excel

Objectives

Upon completing this chapter, the students must be able to:

1. Explain the importance of sampling.
2. Distinguish the different sampling methods.
3. Explain the concept of sampling distribution.
4. Apply sampling techniques to a data set using MS Excel.
5. Compute probabilities related to the sample mean and proportion using MS Excel.



CH 3 Sampling and
Sampling Distributions
As introduced earlier, most of statistics involves drawing inferences and making decisions about a population
based on results obtained from a subset of that population, called a sample. Understanding sampling is
therefore crucial for beginners in statistics.

SAMPLING

In sample surveys, there are mainly two methods of sampling: random sampling and nonrandom sampling.

RANDOM and NONRANDOM SAMPLING

Random sampling is a method of sampling in which each member of the population has some
chance of being selected in the sample.

Nonrandom sampling is a method of sampling in which some members of the population may not
have any chance of being selected in the sample.

Suppose we have a list of 100 students and we want to select 10 of them. If we write the names of all 100
students on pieces of paper, put them in a hat, mix them, and then draw 10 names, the result will be a random
sample of 10 students. However, if we arrange the names of these 100 students alphabetically and pick the
first 10 names, it will be a nonrandom sample because the students who are not among the first 10 have no
chance of being selected in the sample.
A random sample is usually a representative sample. A random sample should be selected so that it is
representative of the population, i.e., the sample has the same characteristics as the population. Note
that in a random sample, the members of the population may or may not all have the same chance of being
included in the sample.
Two types of nonrandom sampling are convenience sampling and judgment sampling.

• In convenience sampling, the most accessible members of the population are selected to obtain
the results quickly. For example, an opinion poll may be conducted in a few hours by collecting
information from certain shoppers at a single shopping mall.

• In judgment sampling, the members are selected from the population based on the judgment
and prior knowledge of an expert. Although such a sample may happen to be a representative
sample, the chances of it being so are small. If the population is large, it is not an easy task to
select a representative sample based on judgment.



RANDOM SAMPLING TECHNIQUES
In selecting a random sample, there are different techniques
that can be used. Random sampling techniques are used to
obtain a random sample that represents the target population.
This is because all members of the population have a chance
to be included in the sample. Here, we will be discussing four
commonly used techniques.

1. Simple Random Sampling

Simple random sampling is a sampling technique in which any particular sample of a
specific sample size has the same chance of being selected as any other sample of the
same size.

Sample size is the number of elements in the sample, denoted by n. Meanwhile, population size,
denoted by N, is the number of elements in the population.

There are several techniques under simple random sampling. One of the most common is the
lottery (or fishbowl) method.

Lottery or fishbowl sampling. For example, if we need to select 5 students from a class of 50
(the target population), we write each of the 50 names on a separate piece of paper. Then, we
place all 50 names in a bowl and mix them thoroughly. Next, we draw 1 name at random from the
bowl and set it aside. We repeat this draw four more times. The 5 drawn names make up a simple
random sample with a sample size of 5.
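The lottery procedure can be imitated in software. The following is a minimal sketch in Python, not part of the original module; the function name `simple_random_sample`, the placeholder student names, and the seed are assumptions made only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical class list of 50 students (placeholder names).
students = [f"Student {i}" for i in range(1, 51)]
sample = simple_random_sample(students, 5, seed=2021)
print(sample)
```

Like drawing names from a bowl without putting them back, `random.sample` never selects the same student twice.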
2. Systematic Random Sampling

If the target population is too large, simple random sampling may not be practical to use. For
example, suppose we need to get a random sample of 150 households from a list of 45,000 households.
Writing out all 45,000 names for a lottery would consume a very large amount of time. Here, we can
use systematic random sampling.



Systematic random sampling is a sampling technique in which the elements of the sample are
taken from every kth element in the population arranged alphabetically or by some other
characteristic. Here, k = N/n.

The procedure to select a systematic random sample is as follows. In the example just mentioned,
we would arrange all 45,000 households alphabetically (or based on some other characteristic).
Since the sample size should equal 150, the ratio of population size to sample size is
k = 45,000/150 = 300. Using this ratio, we randomly select one household from the first 300
households in the arranged list (for example, by drawing lots). Suppose we select the 210th
household. We then select the 210th household from every block of 300 households in the list. In
other words, our sample includes the households with numbers 210, 510, 810, 1110, 1410, 1710,
and so on.
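The selection rule above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `systematic_sample` and its parameters are hypothetical, and the starting position 210 is taken from the example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return 1-based list positions: a random start within the first k, then every kth."""
    k = population_size // sample_size          # sampling interval k = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

positions = systematic_sample(45_000, 150, start=210)
print(positions[:6])   # [210, 510, 810, 1110, 1410, 1710]
print(positions[-1])   # 44910
```

Only the starting position is random; once it is fixed, the remaining 149 positions are determined by the interval k = 300.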

3. Stratified Random Sampling


Suppose we need to select a sample from the
population of a city, and we want households with
different income levels to be proportionately
represented in the sample. In this case, instead of
selecting a simple random sample or a systematic
random sample, we may prefer to apply a different
technique. First, we divide the whole population into
different groups based on income levels. For example,
we may form three groups of low-, medium-, and high-
income households.

We will now have three subpopulations, which are usually called strata. We then select one sample
from each subpopulation or stratum. The collection of all three samples selected from the three
strata gives the required sample, called the stratified random sample.



Usually, the sizes of the samples selected from different strata are proportionate to the sizes of the
subpopulations in these strata. Note that the elements of each stratum are identical with regard to
the possession of a characteristic.

Note that stratified random sampling has two types:

Stratified random sampling is a sampling technique in which the entire population is
divided into smaller groups (called strata; stratum in singular) that are not overlapping
and represent the entire population.
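To illustrate proportionate selection from strata, here is a minimal Python sketch. The stratum sizes (5,000 low-, 3,000 medium-, and 2,000 high-income households), the sample size of 100, and the function name `stratified_sample` are all assumptions for illustration and do not come from the module.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportionate stratified sample: each stratum contributes in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_stratum = round(n * len(members) / total)   # share of the sample for this stratum
        sample[name] = rng.sample(members, n_stratum) # simple random sample within the stratum
    return sample

# Hypothetical strata with placeholder household labels.
strata = {
    "low": [f"L{i}" for i in range(5000)],
    "medium": [f"M{i}" for i in range(3000)],
    "high": [f"H{i}" for i in range(2000)],
}
chosen = stratified_sample(strata, 100, seed=1)
print({name: len(members) for name, members in chosen.items()})
# {'low': 50, 'medium': 30, 'high': 20}
```

With stratum sizes that do not divide evenly, the rounded stratum sample sizes may not add up exactly to n, so a real application would need a rule for adjusting them.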

4. Cluster Random Sampling


Sometimes the target population is scattered over a
wide geographical area. Consequently, if a simple
random sample is selected, it may be costly to contact
each member of the sample. In such a case, we divide
the population into different geographical groups or
clusters and, as a first step, select a random sample
of certain clusters from all clusters. We then take a
random sample of certain elements from each
selected cluster.

For example, suppose we are to conduct a survey of households in Aklan in Western Visayas. First,
we divide the whole province into its 17 municipalities which are called clusters or primary units.
We make sure that all clusters are similar and, hence, representative of the population. We then
select at random, say, 5 clusters from 17.
Next, we randomly select certain households
from each of these 5 clusters and conduct a
survey of these selected households. This is
called cluster sampling.

Cluster sampling is a sampling technique in which the entire population is divided into
multiple groups (called clusters), usually by geographical area.
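The two stages of the Aklan example (select clusters, then select households within them) can be sketched as follows. The number of households per municipality (200), the number of households surveyed per cluster (10), the labels, and the function name `cluster_sample` are hypothetical values chosen only to make the sketch runnable.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick elements within each chosen cluster."""
    rng = random.Random(seed)
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen_clusters}

# 17 placeholder municipalities, each with 200 placeholder households.
clusters = {
    f"Municipality {m}": [f"M{m}-H{h}" for h in range(1, 201)]
    for m in range(1, 18)
}
survey = cluster_sample(clusters, n_clusters=5, n_per_cluster=10, seed=7)
for municipality, households in survey.items():
    print(municipality, households[:3], "...")
```

Households in the 12 municipalities that were not selected in the first stage cannot enter the sample, which is what keeps the fieldwork in fewer locations.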



SUPPLEMENTARY VIDEOS FOR FURTHER UNDERSTANDING

What Are The Types Of Sampling Techniques In Statistics - Random, Stratified, Cluster, Systematic
https://www.youtube.com/watch?v=9PaR1TsvnJs

Sampling: Stratified random sampling (With Computations)
https://www.youtube.com/watch?v=lJqV1vrxtHc



INTRODUCTION TO SAMPLING DISTRIBUTIONS

We know that the value of a population parameter is always constant. For example, for any population data set, there is
only one value of the population mean, 𝜇. However, we cannot say the same about the sample mean, 𝑥̅ . We would
expect different samples of the same size drawn from the same population to yield different values of the sample mean,
𝑥̅ . The value of the sample mean for any one sample will depend on the elements included in that sample. Consequently,
the sample mean, 𝑥̅ , is a random variable. Therefore, like other random variables, the sample mean possesses a
probability distribution, which is more commonly called the sampling distribution of 𝑥̅ . Other sample statistics, such
as the proportion and standard deviation, also possess sampling distributions.

Illustrative Example:

Consider the population of midterm scores of five students: 70, 78, 80, 80, 95. Consider all possible samples of three
scores each that can be selected, without replacement, from that population. The total number of possible samples is

\[ {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = 10. \]

Suppose we assign the letters A, B, C, D, and E to the scores of the five students, so that

A = 70, B = 78, C = 80, D = 80, E = 95

Then, the 10 possible samples of three scores each are

ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE

These 10 samples and their respective means are listed below. The
mean of each sample is obtained by dividing the sum of the three
scores included in that sample by 3. For instance, the mean of the first
sample ABC is (70 + 78 + 80)/3 = 76.
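The 10 samples and their means can be regenerated directly from the five scores. The short Python sketch below is offered as a check on the arithmetic; the variable names are illustrative.

```python
from itertools import combinations

scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# Every sample of three students, keyed by its letters, with the sample mean.
sample_means = {
    "".join(labels): sum(scores[s] for s in labels) / 3
    for labels in combinations("ABCDE", 3)
}
for name, mean in sample_means.items():
    print(f"{name}: {mean:.2f}")
# ABC: 76.00
# ABD: 76.00
# ABE: 81.00
# ACD: 76.67
# ACE: 81.67
# ADE: 81.67
# BCD: 79.33
# BCE: 84.33
# BDE: 84.33
# CDE: 85.00
```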



By using the values of 𝑥̅ given in the table above, we record the frequency distribution of 𝑥̅ . By dividing the
frequencies of the various values of 𝑥̅ by the sum of all frequencies, we obtain the relative frequencies of the
classes, which are listed in the third column of the table.

These relative frequencies are used as probabilities and listed in the table below. This table gives the
sampling distribution of 𝑥̅ .
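As a check, the frequency, relative frequency, and probability columns described here can be computed from the five scores. The following Python sketch uses exact fractions so that equal means are grouped correctly; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

scores = [70, 78, 80, 80, 95]

# Exact sample means for all 10 samples of size 3.
means = [Fraction(sum(s), 3) for s in combinations(scores, 3)]
counts = Counter(means)

# Relative frequency of each distinct mean, used as its probability.
distribution = {m: Fraction(f, len(means)) for m, f in sorted(counts.items())}

print("  mean  f  P(mean)")
for m, p in distribution.items():
    print(f"{float(m):6.2f}  {counts[m]}  {float(p):.2f}")
#   mean  f  P(mean)
#  76.00  2  0.20
#  76.67  1  0.10
#  79.33  1  0.10
#  81.00  1  0.10
#  81.67  2  0.20
#  84.33  2  0.20
#  85.00  1  0.10
```

The probabilities add up to 1, as they must for a probability distribution.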

If we select just one sample of three scores from the population of five scores, we may draw any of the
10 possible samples. Hence, the sample mean, 𝑥̅ , can assume any of the values listed in the table with
the corresponding probability. For instance, the probability that the mean of a randomly selected sample
of three scores is 81.67 is 0.20. This probability can be written as

P(𝑥̅ = 81.67) = 0.20.



Sampling Distribution of the Mean

The mean and standard deviation of 𝑥̅ are, respectively, the mean and standard deviation of the means of
all samples of the same size selected from a population. Therefore, the mean and standard deviation of 𝑥̅
are the mean and standard deviation of the sampling distribution of 𝑥̅ , respectively. The standard
deviation of 𝑥̅ is also called the standard error of 𝑥̅ .

Mean of 𝑥̅ : the mean of the sampling distribution of 𝑥̅ , denoted by 𝜇𝑥̅

Standard deviation of 𝑥̅ : the standard deviation of the sampling distribution of 𝑥̅ , or
standard error of 𝑥̅ , denoted by 𝜎𝑥̅

If we calculate the mean and standard deviation of the 10 values of 𝑥̅ listed in the illustrative example above,
we obtain the mean and standard deviation of the sampling distribution of 𝑥̅ .

We obtain the mean of the sampling distribution of 𝑥̅ as:

\[ \mu_{\bar{x}} = \frac{76.00 + 76.00 + \cdots + 84.33 + 85.00}{10} = 80.60 \]

and the standard deviation of the sampling distribution of 𝑥̅ as:

\[ \sigma_{\bar{x}} = \sqrt{\frac{(76.00-80.60)^2 + (76.00-80.60)^2 + \cdots + (85.00-80.60)^2}{10}} = 3.30 \]
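These two values can be verified numerically. The sketch below recomputes them from the five scores in Python; `pstdev` is the population standard deviation (divisor equal to the number of values), which matches the formula above.

```python
from itertools import combinations
from statistics import pstdev

scores = [70, 78, 80, 80, 95]
sample_means = [sum(s) / 3 for s in combinations(scores, 3)]

mu_xbar = sum(sample_means) / len(sample_means)   # mean of the 10 sample means
sigma_xbar = pstdev(sample_means)                 # standard deviation with divisor 10

print(round(mu_xbar, 2), round(sigma_xbar, 2))    # 80.6 3.3
print(sum(scores) / len(scores))                  # 80.6, the population mean
```

The mean of the sample means coincides with the population mean of the five scores, which previews the result that the mean of 𝑥̅ equals 𝜇.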

The shape of the sampling distribution of 𝑥̅ relates to the following two cases:

1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal distribution.



Sampling Distribution of 𝑥̅ from a Normally Distributed Population

When the population from which samples are drawn is normally distributed with its mean equal to 𝜇 and
standard deviation equal to 𝜎, then:

1. The mean of 𝑥̅ , which is 𝜇𝑥̅ , is equal to the population mean, 𝜇. This is why 𝑥̅ is an unbiased
estimator of 𝜇.
2. The standard deviation of 𝑥̅ , which is 𝜎𝑥̅ , is equal to \( \sigma/\sqrt{n} \), assuming \( n/N \le 0.05 \).

3. The shape of the sampling distribution of 𝑥̅ is normal, regardless of the sample size n.

Following are two important observations regarding the sampling distribution of 𝑥̅ .

1. The spread of the sampling distribution of 𝑥̅ is smaller than the spread of the corresponding population
distribution. In other words, 𝜎𝑥̅ < 𝜎. This is obvious from the formula for 𝜎𝑥̅ . When n is greater than 1,
which is usually true, the denominator in \( \sigma/\sqrt{n} \) is greater than 1. Hence, 𝜎𝑥̅ is smaller than 𝜎.

2. The standard deviation of the sampling distribution of 𝑥̅ decreases as the sample size increases. This
feature of the sampling distribution of 𝑥̅ is also obvious from the formula \( \sigma_{\bar{x}} = \sigma/\sqrt{n} \).
Because 𝑥̅ is unbiased and its standard deviation shrinks toward 0 as n grows, the sample mean 𝑥̅ is a
consistent estimator of the population mean 𝜇.

Example: Earnings of Internal Medicine Physicians


According to the 2015 Physician Compensation Report by Medscape, American internal medicine
physicians earned an average of $196,000 in 2014. Suppose that the 2014 earnings of all American internal
medicine physicians are approximately normally distributed with a mean of $196,000 and a standard
deviation of $20,000. Let 𝑥̅ be the mean 2014 earnings of a random sample of American internal medicine
physicians. Calculate the mean and standard deviation of 𝑥̅ and describe the shape of its sampling
distribution when the sample size is (a) 16, (b) 50, and (c) 1000.

Solution:
Let 𝜇 and 𝜎 be the mean and standard deviation of the 2014 earnings of all American internal medicine
physicians, and 𝜇𝑥̅ and 𝜎𝑥̅ be the mean and standard deviation of the sampling distribution of 𝑥̅ ,
respectively. Then, from the given information,
𝜇 = $196,000 and 𝜎 = $20,000



(a) For n = 16, the mean and standard deviation of 𝑥̅ are, respectively,

\[ \mu_{\bar{x}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{16}} = \$5{,}000 \]

Because the 2014 earnings of all American internal medicine physicians are approximately normally
distributed, the sampling distribution of 𝑥̅ for samples of 16 physicians is also approximately normally
distributed.
The figure below shows the population distribution and the sampling distribution of 𝑥̅ for n = 16. Since 𝜎𝑥̅ < 𝜎,
the population distribution has a wider spread but smaller height than the sampling distribution of 𝑥̅ .

(b) For n = 50, the mean and standard deviation of 𝑥̅ are, respectively,

\[ \mu_{\bar{x}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{50}} = \$2{,}828.43 \]

Again, since the population distribution is approximately normally distributed, the sampling distribution of 𝑥̅
for samples of 50 physicians is also approximately normally distributed.



The figure above shows the population distribution and the sampling distribution of 𝑥̅ for n = 50. Since 𝜎𝑥̅ < 𝜎,
the population distribution has a wider spread but smaller height than the sampling distribution of 𝑥̅ .

(c) For n = 1,000, the mean and standard deviation of 𝑥̅ are, respectively,

\[ \mu_{\bar{x}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$20{,}000}{\sqrt{1000}} = \$632.46 \]

Again, since the population distribution is approximately normally distributed, the sampling distribution of 𝑥̅
for samples of 1,000 physicians is also approximately normally distributed.
The figure below shows the population distribution and the sampling distribution of 𝑥̅ for n = 1,000. Thus,
regardless of the sample size, the sampling distribution of 𝑥̅ is normal when the population from which the
samples are drawn is normally distributed.

From the preceding calculations, we observe that the mean of the sampling distribution of 𝑥̅ is always equal
to the mean of the population ($196,000) whatever the size of the sample. However, the value of the standard
deviation of 𝑥̅ decreases from $5,000.00 to $2,828.43 and then to $632.46 as the sample size increases
from 16 to 50 and then to 1,000.
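The three standard errors can be reproduced with a one-line function. The Python sketch below is a simple check; the helper name `standard_error` is an illustrative choice.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean, sigma / sqrt(n), assuming n/N <= 0.05."""
    return sigma / math.sqrt(n)

sigma = 20_000  # population standard deviation of 2014 earnings, in dollars
for n in (16, 50, 1000):
    print(n, round(standard_error(sigma, n), 2))
# 16 5000.0
# 50 2828.43
# 1000 632.46
```

Because n enters through its square root, the sample size must be quadrupled to cut the standard error in half.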



Sampling Distribution of 𝑥̅ from a Population that is Not Normally Distributed

Most of the time the population from which the samples are selected is not normally distributed. In such
cases, the shape of the sampling distribution of 𝑥̅ is inferred from the central limit theorem.

According to the central limit theorem, for a large sample size, the sampling distribution of 𝑥̅ is
approximately normal, regardless of the shape of the population distribution. The mean and standard
deviation of the sampling distribution of 𝑥̅ are, respectively,

\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

In case of the mean, the sample size is usually considered to be large if n ≥ 30. Also, note that
\( \sigma_{\bar{x}} = \sigma/\sqrt{n} \) is true for n/N ≤ 0.05. We will not focus on cases where n/N > 0.05
since these are unlikely conditions.

If the population distribution is not normally distributed, the sampling distribution of 𝑥̅ with n < 30 is also not
normal. However, the sampling distribution of 𝑥̅ with n ≥ 30 is (approximately) normal because of the central
limit theorem.
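A small simulation can make the central limit theorem more concrete. The sketch below is an illustration under assumed settings that are not from the module: the population is taken to be exponential with mean 1 and standard deviation 1 (a right-skewed distribution), the sample size is n = 30, and 5,000 samples are drawn with a fixed seed.

```python
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is reproducible
n, replications = 30, 5000

# Draw many samples of size n from a right-skewed population and keep each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(round(mean_of_means, 3), round(sd_of_means, 3))  # simulated values
print(round(1 / n ** 0.5, 3))                          # theoretical sigma / sqrt(n) = 0.183
```

The simulated mean of the sample means should come out close to the population mean of 1, and the simulated standard deviation close to 0.183. A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.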

Example: Rents Paid by Tenants in a City


The mean rent paid by all tenants in a small city is $1,550 with a standard deviation of $225. However, the
population distribution of rents for all tenants in this city is skewed to the right. Let 𝑥̅ be the mean rent paid
of a random sample of tenants in that city. Calculate the mean and standard deviation of 𝑥̅ and describe
the shape of its sampling distribution when the sample size is (a) 30 (b) 100.

Solution:

Although the population distribution of rents paid by all tenants is not normal, the sample sizes for (a) and
(b) are both large (n ≥ 30). Hence, the central limit theorem can be applied to infer the shape of the
sampling distribution of 𝑥̅ . From the given information,

𝜇 = $1,550 and 𝜎 = $225



(a) Applying the central limit theorem, the sampling distribution of 𝑥̅ with n = 30 is approximately normal. The
mean and standard deviation are computed as:

\[ \mu_{\bar{x}} = \mu = \$1{,}550 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$225}{\sqrt{30}} = \$41.08 \]

The figure below shows the population distribution (a) and the sampling distribution of 𝑥̅ for n = 30 (b).

(b) Applying the central limit theorem, the sampling distribution of 𝑥̅ with n = 100 is approximately normal. The
mean and standard deviation are computed as:

\[ \mu_{\bar{x}} = \mu = \$1{,}550 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$225}{\sqrt{100}} = \$22.50 \]

The figure below shows the population distribution (a) and the sampling distribution of 𝑥̅ for n = 100 (b).



Sampling Distribution of the Proportion

The concept of proportion is the same as the concept of relative frequency and the concept of probability of
success in a binomial experiment. The relative frequency of a category or class gives the proportion of the
sample or population that belongs to that category or class. Similarly, the probability of success in a binomial
experiment represents the proportion of the sample or population that possesses a given characteristic.

The population proportion, denoted by p, is obtained by taking the ratio of the number of elements in a
population with a specific characteristic to the total number of elements in the population. The sample
proportion, denoted by 𝑝̂ (pronounced p hat), gives a similar ratio for a sample.

\[ p = \frac{X}{N} \quad \text{and} \quad \hat{p} = \frac{x}{n} \]

where

N = total number of elements in the population

n = total number of elements in the sample

X = number of elements in the population that possess a specific characteristic

x = number of elements in the sample that possess a specific characteristic

Example: Families Owning Homes


Suppose a total of 789,654 families live in a particular city and 563,282 of them own homes. A sample of
240 families is selected from this city, and 158 of them own homes. Find the proportion of families who own
homes in the population and in the sample.

Solution:

For the population of the city,

N = population size = 789,654

X = number of families in the population who own homes = 563,282



The population proportion of families in this city who own homes is

\[ p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = 0.71 \]

Now, a sample of 240 families is taken from this city, and 158 of them are home-owners. So,

n = sample size = 240

x = number of families in the sample who own homes = 158

The sample proportion of families in this city who own homes is

\[ \hat{p} = \frac{x}{n} = \frac{158}{240} = 0.66 \]

Sampling Distribution of 𝑝̂

Just like the sample mean 𝑥̅ , the sample proportion 𝑝̂ is a random variable. The population
proportion p is a constant because it assumes one and only one value. However, the sample proportion 𝑝̂ can
assume one of a large number of possible values depending on which sample is selected. Hence, 𝑝̂ is a
random variable and it possesses a probability distribution, which is called its sampling distribution.

Illustrative Example:

Boe Consultant Associates has five employees. The table below gives the names of these five employees
and information concerning their knowledge of statistics.

If we define the population proportion, p, as the proportion of employees who know statistics, then

\[ p = \frac{3}{5} = 0.60 \]

Note that this population proportion, p = 0.60, is constant. As long as the population does not change,
this value of p will not change.



Now, suppose we draw all possible samples of three employees each and compute the proportion of
employees, for each sample, who know statistics. The total number of samples of size three that can be
drawn from the population of five employees is

\[ \text{Total number of samples} = {}_5C_3 = \frac{5!}{3!\,(5-3)!} = 10 \]

The table below lists these 10 possible samples and the proportion of employees who know statistics for
each of those samples.

We now prepare the frequency and relative frequency distributions of 𝑝̂ . The relative frequencies are used
as probabilities and are listed below. The last table gives the sampling distribution of 𝑝̂ .
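The sampling distribution of 𝑝̂ for this example can be regenerated with a short Python sketch. The labels E1 to E5 are placeholders, and which three employees know statistics is an assumption; only the count (3 of 5) comes from the example, and the resulting distribution does not depend on which three they are.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Placeholder labels; True means the employee knows statistics (3 of the 5 do).
knows_statistics = {"E1": True, "E2": True, "E3": True, "E4": False, "E5": False}

# Sample proportion for each of the 10 possible samples of three employees.
p_hats = [
    Fraction(sum(knows_statistics[e] for e in sample), 3)
    for sample in combinations(knows_statistics, 3)
]
distribution = {
    p: Fraction(count, len(p_hats)) for p, count in sorted(Counter(p_hats).items())
}

for p_hat, prob in distribution.items():
    print(f"{float(p_hat):.2f}  {float(prob):.2f}")
# 0.33  0.30
# 0.67  0.60
# 1.00  0.10

mean_p_hat = sum(p_hat * prob for p_hat, prob in distribution.items())
print(float(mean_p_hat))  # 0.6
```

The mean of the 10 sample proportions equals the population proportion p = 0.60.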



When we are dealing with proportions, we consider the following:

1. The mean of 𝑝̂ (denoted by 𝜇𝑝̂ ), which is also the mean of the sampling distribution of 𝑝̂ , is always equal
to the population proportion, p. That is, \( \mu_{\hat{p}} = p \). This is why 𝑝̂ is an unbiased estimator of p.

2. The standard deviation of 𝑝̂ (denoted by 𝜎𝑝̂ ), which is also the standard deviation of the sampling
distribution of 𝑝̂ , is equal to \( \sqrt{p(1-p)/n} \), assuming \( n/N \le 0.05 \). That is,

\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]

Note that as n increases, the value of 𝜎𝑝̂ decreases; thus, 𝑝̂ is also a consistent estimator of p.

3. Applying the central limit theorem, the shape of the sampling distribution of 𝑝̂ is approximately normal
for a sufficiently large sample size, that is, np > 5 and n(1-p) > 5.

Example: Shipping of Orders


An online retailer claims that 90% of all orders are shipped within 12 hours of being received. A consumer
group placed 121 orders of different sizes and at different times of day; 102 orders were shipped within
12 hours.

(a) Compute the sample proportion 𝑝̂ of items shipped within 12 hours.


(b) Compute the standard deviation of 𝑝̂ .
(c) Describe the shape of the sampling distribution of 𝑝̂ . Use p = 0.90, corresponding to the
assumption that the retailer’s claim is valid.

Solution:

(a) The sample proportion is the number of orders that are shipped within 12 hours (x) divided by
the number of orders in the sample (n):

\[ \hat{p} = \frac{x}{n} = \frac{102}{121} = 0.84 \]

Hence, 84% of the orders in the sample were shipped within 12 hours.

(b) Given n = 121 and p = 0.90, the standard deviation of 𝑝̂ is computed as

\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.90)(1-0.90)}{121}} = \sqrt{\frac{(0.90)(0.10)}{121}} = 0.0273 \]

(c) To make an inference about the shape of the sampling distribution of 𝑝̂ , we can check if the central
limit theorem is applicable to this problem. The values of np and n(1-p) are

\[ np = 121(0.90) = 108.9 \approx 109 \quad \text{and} \quad n(1-p) = 121(1-0.90) = 12.1 \approx 12 \]

Since np and n(1-p) are both greater than 5, by the central limit theorem, we can infer that the
sampling distribution of 𝑝̂ is approximately normal. Under the retailer's claim, its mean is 𝜇𝑝̂ = p = 0.90
and its standard deviation is 0.0273. The observed sample proportion, 0.84, is one value drawn from this
distribution.
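The quantities in this example can be recomputed in a few lines of Python as a check on the arithmetic. The variable names are illustrative.

```python
import math

n, x, p = 121, 102, 0.90   # sample size, orders shipped within 12 hours, claimed proportion

p_hat = x / n                              # sample proportion
sigma_p_hat = math.sqrt(p * (1 - p) / n)   # standard deviation of p-hat under the claim

print(round(p_hat, 2), round(sigma_p_hat, 4))   # 0.84 0.0273
print(round(n * p, 1), round(n * (1 - p), 1))   # 108.9 12.1

# Rule of thumb from this module for approximate normality of p-hat.
print(n * p > 5 and n * (1 - p) > 5)            # True
```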



SELECTING A SAMPLE IN MICROSOFT EXCEL

In this section, we will learn how to select a random sample from a given population data using
MS Excel. This involves assigning a random number unique for every observation in the data.

Follow these steps in selecting a random sample in MS Excel. A demonstration follows the steps.

1. Open the workbook that contains the data. If the data is not yet in a workbook, enter it
in one first.
2. Once the data is ready, create a new column next to the last column. Assign a column
name in Row 1.
3. Click Row 2 of the new column. Enter the formula =RAND(). A random number between
0 and 1 will appear in that cell.
4. Double-click the fill handle of the cell in Step #3 to fill the column down. Alternatively,
you can copy the cell in Step #3 and paste it down to the last row (in the same column) of the data.
5. To keep the random numbers from refreshing, copy all cells in the last column and paste them
as values in the same column. The values in the cells of the last column should be numbers
without the formula =RAND().
6. Sort the entire data by the last column in ascending order (from smallest to
largest).
7. Create another column to the right of the last column. Assign a column name in Row 1 and
fill the column with the ranks 1, 2, 3, and so on, down to the last row of the data.
8. Apply a filter on the rank column in Step #7 depending on the number of observations we want
for our sample. For example, if we want to get a sample of size 500 from a population of
size 4,000, use the Number Filter on the rank column and click “Less than or equal to…”.
A dialog box will appear; next to “less than or equal to”, enter the number 500 for our
sample size. Click OK.
9. We now have our sample (e.g. size of 500), but the observations not included are only
hidden. To retain only the observations in the sample, copy the filtered data
and paste it on a separate worksheet or workbook. If you have large population data, it
is suggested to paste it in a separate workbook to avoid lag in your software.
10. You can delete the last two columns since they are not needed in your analysis. And now you
have your sample!
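For readers who want to see the logic behind these steps outside of Excel, the same idea (attach a random number to each row, sort by it, keep the first n rows) can be sketched in Python. This is an illustration only, not part of the required MS Excel output; the function name `random_sample_rows`, the seed, and the placeholder population of 4,000 row numbers are assumptions.

```python
import random

def random_sample_rows(rows, sample_size, seed=None):
    """Mimic the spreadsheet procedure: random key per row, sort by key, keep the top rows."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]    # Steps 3-5: one fixed random number per row
    keyed.sort(key=lambda pair: pair[0])             # Step 6: sort ascending by the random number
    return [row for _, row in keyed[:sample_size]]   # Steps 7-10: keep ranks 1..n, drop the helper column

# Placeholder population of 4,000 observations, identified by row number.
population = list(range(1, 4001))
sample = random_sample_rows(population, 500, seed=3)
print(len(sample), sample[:5])
```

Sorting by independent random numbers puts the rows in a random order, so the first 500 rows form a simple random sample of size 500.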

Follow the demonstration below. Please download the data set and perform random sampling
in MS Excel following these steps. This will serve as your MS Excel output 2.



RANDOM SAMPLING DEMONSTRATION

The workbook Loan Borrower Data (2015, 5-year term).xlsx contains 2015 data from a financial
institution on borrowers of a particular loan payable in 60 months. It covers
137,650 borrowers, each identifiable by account number (Column A, account_number).
Other columns include the demographic profile of the borrowers (Columns B to F) and
their loan details and performance (Columns G to Q). Our objective is to obtain a
random sample of 5,000 borrowers from this population data.

Demonstration of Random Sampling in Loan Borrower Data:

1. Open the workbook Loan Borrower Data (2015, 5-year term).xlsx.

2. Since our data is ready, we create a new column next to the column installment
(Column Q). In cell R1, we assign random_number as the name of this column.



3. To obtain the random numbers in column random_number, we start with cell R2. In cell
R2, we enter the formula =RAND().

A random number 0.238177426, which is between 0 and 1, is assigned in the first row.

4. Double-click the fill handle in cell R2. Alternatively, you can copy cell R2 and paste it until the
last cell for that column (cell R137650).



5. Notice that from Step #3 to Step #4, the random number in cell R2 changed. This happens
because =RAND() recalculates whenever the worksheet changes. To keep the random numbers
fixed, we copy the cells of the column random_number and paste them as values (in Paste
Options) starting at cell S2 to make a new column. We name this new column
random_number2. The values in the column random_number2 should be numbers without the
formula =RAND().

random_number2 in formula bar should not be =RAND().

At this point, we are done creating two additional columns in our data: random_number
in column R (which consists of refreshing random numbers) and random_number2 in
column S (which contains fixed random numbers). We are now ready for the selection part.



6. Order the entire data in terms of the column random_number2 in ascending order (from
smallest to largest). Click any cell in the column random_number2 and click the Sort &
Filter option.

In the Sort & Filter dropdown, click FILTER. An arrow button should appear on the right
side of each column header.

Click the arrow button in the random_number2 header and click Sort Smallest to Largest
to arrange the random numbers in random_number2 in ascending order.



Now, the entire data is sorted by the column random_number2 from the smallest
random number to the largest.

7. We will now rank these random numbers in a new column. In cell T1, we enter rank as the
name of the new column. Then, we use auto-fill to generate the ranks: we write 1 in cell
T2 and 2 in cell T3, highlight both cells, and then double-click the fill handle to fill
the ranks down that column.

8. In the column rank, we apply a filter to get the 5,000 borrowers we want in our sample.
To have a filter button on the rank column header, unclick the Filter option in Sort &
Filter, and click it again. Next, click the filter button in the rank column header and click
NUMBER FILTERS. Further, click “Less Than Or Equal To…”



A Custom AutoFilter dialog box will then appear. Next to “is less than or equal to”, write
the desired sample size. In our exercise, we need to get a sample of size 5000. Thus, we
enter 5000. Click OK.



We will know that a sample of size 5000 is obtained if the row numbers turn blue
and the last visible row of the data has rank = 5000. All other observations in the data
are hidden.

9. Lastly, we add a new worksheet for our final sample data.

10. In Sheet 1, we copy the filtered data and paste it on Sheet 2.



11. Delete the columns random_number, random_number2, and rank and the final random
sample of size 5000 is obtained. Rename Sheet 1 as Population Data and Sheet 2 as
Random Sample Data. Click SAVE.



REFERENCES

Books:
Mann, P.S. (2016). Introductory Statistics (9th ed.). John Wiley & Sons, Inc.

Mendenhall III, W., Beaver, R.J., & Beaver, B.M. (2020). Introduction to Probability and Statistics: Metric
Version (15th ed.). Cengage Learning, Inc.

Online References:
https://psa.gov.ph/statistics/quickstat/national-quickstat/all/*

https://www.rappler.com/nation/octa-research-filipinos-covid-19-vaccine-willingness-february-2021
https://www.questionpro.com/blog/probability-sampling/
https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(Shafer_and_Zhang)/06%3A_Sampling_Distributions/6.03%3A_The_Sample_Proportion

MODULAR ACTIVITY 2: Sampling Distributions

𝟑𝟓
Name: _________________________________________________ Score: ________________

Course/Year/Section: ____________________________________ Date: _________________

INSTRUCTION: Answer the following items on your answer sheet. Show all necessary computations.

1. The GPAs of all 5,540 students enrolled at a university have an approximate normal
distribution with a mean of 3.02 and a standard deviation of 0.29. Let 𝑥̅ be the mean GPA of
a random sample of 60 students selected from this university.
1.a. Calculate the mean and standard deviation of 𝑥̅ , and comment on the shape of its
sampling distribution. (10pts)
1.b. Compare the shape of the sampling distribution of 𝑥̅ with a sample size of 60 and that
with a sample size of 25. (3pts)

2. According to the National Association of Colleges and Employers Spring 2015 Salary Survey,
the average starting salary for college graduates in 2014 was $48,127. Suppose that the
mean starting salary of all college graduates in 2014 was $48,127 with a standard deviation
of $9,200, and that this distribution is strongly skewed to the right. Let 𝑥̅ be the mean starting
salary of 100 randomly selected college graduates in 2014.
2.a. Calculate the mean and standard deviation of 𝑥̅ . (6pts)
2.b. Describe the shape of the sampling distribution of 𝑥̅ . (3pts)
2.c. If the sample size in getting 𝑥̅ is 25, describe the shape of the sampling distribution of 𝑥̅ .
(3pts)

3. According to a Gallup poll conducted January 5–8, 2014, 67% of American adults were
dissatisfied with the way income and wealth are distributed in America. Assume that this
percentage is true for the current population of American adults. Let 𝑝̂ be the proportion in
a random sample of 350 American adults who hold the above opinion. Calculate the mean
and standard deviation of the sampling distribution of 𝑝̂ and describe its shape. (10pts)
