Sta 221 Part I
STA 221
Statistical Inference II
Texts
(1) Afonja B., Olubusoye O. E., Ossai E. and Arinola J. (2014):
“Introductory Statistics – A Learner’s Motivated Approach.” Evans
Brothers Ltd. Ibadan.
(2) Hamburg Morris (1970). “Statistical Analysis for Decision
Making.” New York: Harcourt, Brace & World, Inc.
(3) Hogg R. V and Craig A. T (1970). “Introduction to Mathematical
Statistics.” 3rd Edition. New York: Macmillan Publishing Co., Inc.
London: Collier Macmillan Publishers.
(4) Hogg R. V and Craig A. T (1995). “Introduction to Mathematical
Statistics.” 5th Edition. London: Prentice-Hall, Inc.
(5) Hogg R. V and Tanis E. A (1993). “Probability and Statistical
Inference.” New York: Macmillan Publishing Company.
(6) Larson Harold J (1982). “Introduction to Probability Theory and
Statistical Inference.” Third Edition. New York: John Wiley &
Sons.
(7) Lindgren Bernard W (1976). “Statistical Theory” Third Edition.
New York: Macmillan Publishing Co., Inc.
(8) Montgomery C. Douglas and Runger C. George (2003).
“Applied Statistics and Probability for Engineers.” 3rd Ed. John
Wiley & Sons, Inc. New York.
LECTURE ONE
Sampling
Introduction
Have you ever tasted hot soup and decided whether the soup was tasty or not?
If yes, then you are a sampler. Sampling is a part of our day-to-day life, which
we use either advertently or inadvertently. Another example is a pathologist
who takes a few drops of blood and tests them for abnormalities in the blood of
the whole body. The process of using information obtained from a smaller
quantity to make a statement about the larger quantity is called sampling. In
this lecture, we shall examine why this process is sometimes necessary and the
various techniques for doing it. We shall first learn some fundamental concepts
related to sampling.
Objectives
At the end of this lecture, you should be able to:
1. distinguish between census, population and sample;
2. discuss the reasons for sampling; and
3. discuss the various procedures of sampling.
Pre-Test
1. Have you heard of a census before? What do you understand by it?
2. Mention the different kinds of statistical investigations you are familiar
with.
CONTENT
A. Some Basic Concepts
A1. Census
A census involves a complete count (or a complete enumeration) of every
individual member of the population of interest, such as persons in a country,
households in a town, shops in a city, students in a college, and so on. Apart
from the cost and the large amount of resources (such as enumerators, clerical
assistance, etc.) that are required, the main problem is the time required to
process the data. Thus, the results are not known immediately.
A2. Population
In the statistical sense, a population is a group of items, units or subjects
that is under study. It is often referred to as the universe by a number of
statisticians and scientists. The inhabitants of a region, cars in a city,
workers in a factory, students in a university, insects in a field, etc., are a
few examples of populations. Generally, populations (or universes) are
classified into four categories:
Finite population: the number of items or units is fixed, limited and countable,
e.g. workers in a factory.
Infinite population: the number of items or units is uncountable, e.g. stars in
the sky.
Real population: the items or units in the population are all physically present
or visible.
Hypothetical population: the population results from repeated trials, e.g.
tossing a coin repeatedly results in a hypothetical population of heads and
tails, and rolling a die again and again gives rise to a hypothetical population
of numbers from 1 to 6.
A3. Sample
A sample is a part or fraction of a population selected on some basis. In
principle, a sample should be such that it is a true representative of the
population. The process of selecting a sample from the population is called
sampling, and the manner or scheme through which the required number of
units is selected is called the sampling method. The foremost purpose of
B. Sampling Methods
Several sampling methods are available, which are classified into two
categories:
Summary
In this lecture, we have defined some fundamental concepts such as census,
population and sample. To understand the characteristics of any population with
absolute accuracy, we need to have all the possible relevant information about
every member of that population. This is usually not possible because such
information is either not available or the task of data collection is not desirable
in terms of time and/or money. So we settle for a part or a fraction of the
population which is called sample. Sampling has been defined as a process of
selecting units from the population, and there are several techniques or methods
of doing this. The methods are classified into random sampling and non-
random sampling.
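As a small illustration of random sampling, the sketch below draws a simple random sample without replacement from a sampling frame. The frame of 100 student labels, the sample size of 10 and the seed are all hypothetical choices made for this example; they do not come from the lecture.

```python
import random

# Fixed seed so the illustration is repeatable (an arbitrary choice).
random.seed(1)

# Hypothetical sampling frame: a list of every unit in the population.
sampling_frame = [f"student_{i:03d}" for i in range(1, 101)]

# Simple random sample of 10 units drawn without replacement:
# every unit in the frame has the same chance of being selected.
sample = random.sample(sampling_frame, k=10)

print(sample)
```

Each run with a different seed gives a different sample, which is the source of the sample-to-sample variation studied in the next lecture.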
Post-Test
1. Briefly explain:
a. The fundamental reason for sampling
b. Some of the reasons why a sample is chosen instead of testing the
entire population.
2. Distinguish between sampling and non-sampling errors. What are their
sources? How can these errors be controlled?
3. To study the average effect of fish on human cholesterol level (in
blood), a researcher randomly selects 500 males aged 25 who have never
eaten fish more than once a week and measures their cholesterol level.
The researcher then serves each individual 8 ounces of fish every day
for one year. After one year, the researcher measures the cholesterol
level of each individual again and calculates the difference from the
previous year's value (difference = pre-diet level minus post-diet
level). Determine the
a. population
b. sample
c. variable under study and
d. the parameter of interest
4. List and explain the various sampling methods.
5. Define the following:
a. sampling unit
b. sampling frame
c. sampling interval
d. sampling method
e. sampling error
LECTURE TWO
Introduction
In the previous lecture, we discussed populations and sampling. Imagine that
you have a large population to study and that describing its characteristics by
a census is not possible. Then, in order to make statistical inferences,
samples of a given size are drawn repeatedly from the population and a
statistic is computed for each sample. The computed value of the statistic will
generally differ from sample to sample. Thus, it would be theoretically
possible to construct a frequency table showing the values assumed by the
statistic and their frequency of occurrence. This distribution of the values of
a statistic is called a sampling distribution, because the values are the
outcome of the sampling process. Since the value of the statistic depends on
which random sample happens to be drawn, the statistic is itself a random
variable.
Objectives
At the end of this lecture, you should be able to:
1. explain the concept of a sampling distribution; and
2. explain the concept of standard error and differentiate it from standard
deviation.
Pre-Test
1. Distinguish between a parameter and a statistic and give an example of each.
2. Give the formula for sample mean of n observations.
3. Sample standard deviation of the variate values x1 , x2 ,..., xn can be
computed from the formula.
CONTENT
Population Distribution
The population distribution is the distribution of the values of its members
and has mean denoted by $\mu$, variance $\sigma^2$ and standard deviation
$\sigma$. For example, a population consisting of the numbers 0, 2, 4 and 6 has
mean $\mu = 3$, variance $\sigma^2 = 5$ and standard deviation
$\sigma = \sqrt{5} \approx 2.24$.
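The population figures for 0, 2, 4 and 6 can be checked with a few lines of Python. This is only a sketch using the standard library; note that the population variance divides by N (not N − 1), which gives a variance of 5 and hence a standard deviation of √5 ≈ 2.236.

```python
import math
import statistics

population = [0, 2, 4, 6]

# Population mean: sum of the values divided by N.
mu = statistics.mean(population)

# Population variance: mean squared deviation from mu (divides by N).
sigma_sq = statistics.pvariance(population)

# Population standard deviation: square root of the variance.
sigma = math.sqrt(sigma_sq)

print("mean =", mu)
print("variance =", sigma_sq)
print("standard deviation =", round(sigma, 3))
```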
Example
A population consists of the numbers 0, 2, 4 and 6. List all possible samples of
size 2 that can be drawn:
1. with replacement
2. without replacement
Solution
1. The population size is N = 4 and the sample size is n = 2; therefore,
$4^2 = 16$ possible samples can be drawn with replacement. The list of the
possible samples is given as follows:
Sample number    Sample elements
 1               0, 0
 2               0, 2
 3               0, 4
 4               0, 6
 5               2, 0
 6               2, 2
 7               2, 4
 8               2, 6
 9               4, 0
10               4, 2
11               4, 4
12               4, 6
13               6, 0
14               6, 2
15               6, 4
16               6, 6
2. The population size is N = 4 and the sample size is n = 2; therefore,
${}^{4}C_{2} = \frac{4!}{2!\,(4-2)!} = 6$ possible samples can be drawn without
replacement. The list of the possible samples is given as follows:
Sample number    Sample elements
1                0, 2
2                0, 4
3                0, 6
4                2, 4
5                2, 6
6                4, 6
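The two lists in this example can be generated mechanically. The sketch below assumes that "with replacement" means ordered pairs (so 0, 2 and 2, 0 count separately, as in the first list) and that "without replacement" means unordered pairs of distinct elements (as in the second list); it uses `itertools.product` and `itertools.combinations` from the Python standard library.

```python
from itertools import combinations, product

population = [0, 2, 4, 6]
n = 2  # sample size

# With replacement: every ordered pair, N**n = 16 samples in total.
with_replacement = list(product(population, repeat=n))

# Without replacement: every unordered pair of distinct elements, 6 in total.
without_replacement = list(combinations(population, n))

print("With replacement:")
for number, elements in enumerate(with_replacement, start=1):
    print(number, elements)

print("Without replacement:")
for number, elements in enumerate(without_replacement, start=1):
    print(number, elements)
```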
Sampling Distribution of a sample statistic
If a particular statistic (e.g. sample mean, sample standard deviation, etc.) is
computed for each of the possible samples, the value of the statistic will differ
from sample to sample. Thus, it would be theoretically possible to construct a
frequency table showing the values assumed by the statistic and their frequency
of occurrence. This distribution of values of a statistic is called a sampling
distribution. Thus, the sampling distribution has an overall mean (where it is
centred), a standard deviation (representing its spread) and a shape, which can
be seen if the histogram is plotted. So, we can talk of the mean of the sampling
distribution of a statistic (denoted $\mu_m$ if m is the statistic), and standard deviation of sampling
Summary
In this lecture, we have learnt that:
1. there are $N^n$ possible samples of size n that can be drawn with
replacement from a population having N elements;
2. there are ${}^{N}C_{n} = \frac{N!}{n!\,(N-n)!}$ possible samples of size n
that can be drawn without replacement from a population having N
elements;
3. a sampling distribution is the probability distribution of all
possible values of a given statistic from all the distinct
possible samples of equal size drawn from a population; and
4. the standard error of a statistic measures the amount of chance error
in the sampling process.
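The points above can be checked numerically on the population 0, 2, 4, 6 used in this lecture. The sketch below takes the sample mean as the statistic and samples of size 2 drawn with replacement; it treats the standard error as the standard deviation of the sampling distribution, which is the usual definition and is assumed here.

```python
import math
from collections import Counter
from itertools import product

population = [0, 2, 4, 6]
N, n = len(population), 2

# Number of possible samples under each scheme.
count_with = N ** n             # with replacement
count_without = math.comb(N, n) # without replacement

# Sampling distribution of the mean (with replacement):
# compute the statistic for every possible sample and tabulate it.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]
distribution = Counter(sample_means)

# Mean and standard deviation of the sampling distribution.
mean_of_means = sum(sample_means) / len(sample_means)
standard_error = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)

print(count_with, count_without)
for value in sorted(distribution):
    print(value, distribution[value])
print(mean_of_means, round(standard_error, 4))
```

The frequency table is symmetric about 3, the population mean, and the standard error equals √2.5, which is the population standard deviation √5 divided by √n.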
Post-Test
1. A population consists of the following numbers 12, 7, 9, 11, and 13.
LECTURE THREE
Introduction
The sample mean is a point estimate of the population mean. For example, if you
are interested in the mean rent charged for a 2-bedroom apartment in the Bodija
area of Ibadan, you may obtain a random sample and compute the sample mean from
it. This sample mean is a single number which estimates the population mean
rent for a 2-bedroom apartment in the area. The sampling distribution of the
mean refers to the distribution of all the possible sample means that could be
obtained if you selected all possible samples of a given size. In general, the
sampling distribution of the sample mean depends on the distribution of the
population from which the sample is drawn. If a population is normally
distributed, then the sampling distribution of the sample mean is also normally
distributed regardless of the sample size. Even if the population is not
normally distributed, the sampling distribution of the sample mean tends toward
a normal distribution as the sample size becomes sufficiently large.
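The last statement can be explored with a small simulation. The sketch below is an illustration under assumed settings, not part of the lecture: it draws repeated samples of size 30 from a strongly skewed exponential population (mean 1, standard deviation 1) and checks that the sample means centre on 1, have spread close to 1/√30, and that roughly 68% of them fall within one standard error of the population mean, as a normal distribution would suggest. The seed, sample size and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

random.seed(221)  # fixed seed so the run is repeatable (arbitrary choice)

n = 30       # size of each sample (assumed)
reps = 2000  # number of repeated samples (assumed)

# Skewed, non-normal population: exponential with mean 1 and sd 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(n)

# Share of sample means within one standard error of the population mean.
within_one_se = sum(abs(m - 1.0) <= theoretical_se for m in sample_means) / reps

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
print(round(within_one_se, 3))
```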
Objectives
At the end of this lecture, you should be able to:
1. list the properties of the sampling distribution of the sample mean;
2. determine the sampling distribution of mean when population has
normal distribution;
3. determine the sampling distribution of mean when population has non-
normal distribution; and
4. determine the sampling distribution of the difference between two
sample means.
Pre-Test
1. What are the parameters of normal distribution? What information is
provided by these parameters?
2. What are the chief properties of normal distribution? Describe briefly
the importance of normal distribution in statistical analysis.
CONTENT
A. Properties of the Sampling Distribution of the Sample Mean
There are three very important properties associated with the sampling
distribution of the sample mean. These properties are the centre, spread and
shape of the sampling distribution.
1. Centre: The sample mean is an unbiased estimator
The arithmetic mean $\mu_{\bar{X}}$ of the sampling distribution of sample means
(also called the mean of means) is equal to the population mean $\mu$ regardless
of the form of the population distribution, that is, $\mu_{\bar{X}} = \mu$.
Example 1
A population consists of the numbers 0, 2, 4 and 6. The population mean μ = 3.
Now, consider all possible samples of size 2 without replacement from the
population and their means as shown in the following table.
Sample number    Sample elements    Sample mean
1                0, 2               1
2                0, 4               2
3                0, 6               3
4                2, 4               3
5                2, 6               4
6                4, 6               5
The arithmetic mean of the sampling distribution of the sample mean
is $\mu_{\bar{X}} = \frac{1 + 2 + 3 + 3 + 4 + 5}{6} = 3$.
➢ If the population is finite and the samples of fixed size n are drawn
without replacement, then
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$, where
$\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor.
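Both properties can be verified on Example 1 by enumerating every sample. The sketch below assumes the standard form of the result, namely that the standard error without replacement is (σ/√n) multiplied by √((N − n)/(N − 1)), and compares it with the standard deviation computed directly from the six sample means.

```python
import math
from itertools import combinations
from statistics import fmean, pstdev

population = [0, 2, 4, 6]
N, n = len(population), 2

# All samples of size n drawn without replacement, and their means.
sample_means = [sum(s) / n for s in combinations(population, n)]

# Centre: the mean of the sample means should equal the population mean.
mean_of_means = fmean(sample_means)
mu = fmean(population)

# Spread: standard deviation of the sample means, computed two ways.
se_enumerated = pstdev(sample_means)
sigma = pstdev(population)
se_formula = (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

print(sample_means)
print(mean_of_means, mu)
print(round(se_enumerated, 4), round(se_formula, 4))
```

For this population both routes give √(5/3), about 1.291, which is smaller than the with-replacement value σ/√n = √2.5 because the correction factor is less than 1.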
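\[
z = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_{\bar{X}_1 - \bar{X}_2}}{\sigma_{\bar{X}_1 - \bar{X}_2}}
  = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}
\]
where
\[
\mu_{\bar{X}_1 - \bar{X}_2} = \mu_{\bar{X}_1} - \mu_{\bar{X}_2} = \mu_1 - \mu_2
\]
is the mean of the sampling distribution of the difference of two means, while
\[
\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}
  = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]
is the standard error of the sampling distribution of the difference of two means.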
and
\[
\sigma_{\bar{X}_A - \bar{X}_B} = \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}
  = \sqrt{\frac{40{,}000}{50} + \frac{90{,}000}{100}} = 41.23
\]
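The arithmetic in this computation can be reproduced directly. The sketch below assumes the figures are read as variances 40,000 and 90,000 with sample sizes 50 and 100, and that the standard error of the difference is the square root of the sum of each variance divided by its sample size; the function name `se_difference` is a hypothetical helper introduced only for this illustration.

```python
import math


def se_difference(var_a, n_a, var_b, n_b):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(var_a / n_a + var_b / n_b)


# Figures from the worked computation: 40,000/50 + 90,000/100 = 800 + 900.
se = se_difference(40_000, 50, 90_000, 100)

print(round(se, 2))  # 41.23
```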
Post-Test
1. What are the properties of the sampling distribution of the sample mean?
2. Random samples of size 2 are taken from the finite population which
consists of the numbers 0, 2, 4, 6, 8, and 10.
a. Show that the mean and the standard deviation of this population are
$\mu = 5$ and $\sigma = \sqrt{35/3}$.
b. List the 15 possible samples of size 2 that can be taken from this
finite population and calculate their respective means.
c. Calculate the mean and the standard deviation of the sampling
distribution of means obtained in b.
3. The finite population in 2 above can be converted into an infinite
population if we sample with replacement.
a. List the 36 possible samples of size 2 that can be drawn with
replacement from the population.
b. Calculate the mean of each of the 36 samples obtained in part a, and
construct the sampling distribution of the mean.
c. Calculate the mean and standard deviation of the sampling
distribution of means obtained in b.
4. Assume that the heights of 300 soldiers in an army battalion are
normally distributed with mean 68 inches and standard deviation 3
inches. If 80 samples consisting of 25 soldiers each are taken, what
would be the expected mean and standard deviation of the resulting
sampling distribution of means if the sampling is done (a) with
replacement and (b) without replacement?