Chapter 5: Sampling Distributions & Hypothesis Testing
5.1 Introduction
Sampling is a statistical method of obtaining representative data (observations) from a
group. We use sampling in our day-to-day lives, knowingly or unknowingly; for instance,
we take a handful of rice to check the quality of the full lot. This is an example of
random sampling from a large population.
Population (Universe): The group of
objects (individuals) under study is called
population or universe. Universe may be
finite or infinite.
Sample: A part containing objects
(individuals), selected from the
population is called a sample.
Random Sampling: The selection of
objects (individuals) from the universe in
such a way that each object (individual)
of the universe has the same chance of being selected is called random sampling. Lottery
system is the most common example of random sampling.
Simple Sampling: Simple sampling is a special case of random sampling in which every
trial has the same probability of success (and hence of failure).
Note: Random sampling need not be simple. For example, if balls are drawn without
replacement from a bag containing balls of different kinds, the probability of
success changes with every trial. Thus the sampling, though random, is not simple.
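The note above can be illustrated with a minimal sketch. The bag contents (3 red and 2 blue balls, with "success" meaning a red ball) and the helper name `p_red` are assumptions chosen only for illustration.

```python
from fractions import Fraction

def p_red(red: int, blue: int) -> Fraction:
    """Probability of drawing a red ball from a bag with the given contents."""
    return Fraction(red, red + blue)

red, blue = 3, 2  # hypothetical bag: 3 red balls, 2 blue balls

first_draw = p_red(red, blue)
# Second draw after a red ball was drawn and NOT replaced: one red fewer.
second_without_replacement = p_red(red - 1, blue)
# Second draw after the red ball was put back: the bag is unchanged.
second_with_replacement = p_red(red, blue)

print(first_draw, second_without_replacement, second_with_replacement)
```

With replacement the probability of success stays the same from trial to trial (simple sampling); without replacement it changes, so the sampling is random but not simple.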
Hypothesis: A hypothesis is an assumption, based on limited evidence, that lends
itself to further testing and experimentation. For example, a farmer claims a significant
increase in crop production after using a particular fertilizer; after a season of
experimenting, his hypothesis may be supported or refuted. Any hypothesis may be
accepted or rejected at a specified confidence level and must be open to
refutation.
Null Hypothesis: A hypothesis which is tested for possible rejection under the
assumption of being true is known as null hypothesis. Usually the null hypothesis is
stated as ‘There is no relationship between two quantities’. It is denoted by $H_0$.
Alternative Hypothesis: It is the opposite statement of null hypothesis and denoted
by $H_1$.
Significance levels: The probability levels below which we reject a hypothesis are
called levels of significance. The most common significance levels employed in hypothesis
testing are $\alpha = 0.05$, $\alpha = 0.01$ and $\alpha = 0.0027$, in which critical (rejection) regions occupy
5%, 1% and 0.27% areas of the normal curve respectively.
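As a rough check of how these levels relate to the normal curve, the sketch below computes the two-tailed critical value of the standard normal variate for each level. It assumes Python 3.8+ (for `statistics.NormalDist`); the function name `two_tailed_critical_z` is an illustrative choice.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Return z such that P(|Z| > z) = alpha for a standard normal Z."""
    # Half of alpha lies in each tail, so look up the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.0027):
    print(f"alpha = {alpha}: reject when |z| > {two_tailed_critical_z(alpha):.2f}")
```

The three values come out close to 1.96, 2.58 and 3, which are the multipliers of the standard error used throughout this chapter.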
One Tailed and Two Tailed Tests: Either a one-tailed or a two-tailed test may be used
for accepting or rejecting a hypothesis at a given significance level. A one-tailed
test is used when the departure from the reference value is unidirectional, so that the
whole critical region lies in a single tail of the distribution; tests based on the
chi-square distribution are usually of this kind.
A two-tailed test is appropriate if the estimated value may lie on either side of the
reference value. The critical region is then divided between the two tails of the
probability curve, as with the normal distribution.
Testing of Hypothesis:
Testing of statistical hypothesis is a procedure designed for accepting or rejecting a
hypothesis on the basis of some preset values.
Step1: State the null hypothesis $H_0$ and the alternative hypothesis $H_1$ (optional), where $H_0$ is the
hypothesis of no difference, i.e. it presumes that there is no significant difference
between observed value and expected value.
Step2: Choose the most suitable test statistic for the analysis.
Step3: Take a random sample and compute the test statistic.
Step4: $H_0$ is accepted if the value of the test statistic lies in the acceptance zone and rejected if it
falls in the critical (rejection) region at the desired significance level.
5.2 Sampling Distributions
A sampling distribution is a distribution of all of the possible values of a statistic;
computed from randomly drawn samples of the same size from a population.
Some commonly used notations in sampling distributions are given below:
                      Population    Sample
Size                  $N$           $n$
Mean                  $\mu$         $\bar{x}$
Variance              $\sigma^2$    $s^2$
Standard Deviation    $\sigma$      $s$
Suppose we take various samples, each of size $n$, from a population. If $p$ and $q$ are the
probabilities of success and failure of each member of the sample, then the binomial
distribution given by $(q + p)^n$ provides the sampling distribution of the number of
successes in the sample, with mean $np$ and variance $npq$.
Mean (expected value) of number of successes $= np$
Standard deviation $= \sqrt{npq}$
Probable occurrence range at 99.73% confidence level, i.e. 0.27% significance
level, is given by: $np \pm 3\sqrt{npq}$
Probable occurrence range at 99% confidence level, i.e. 1% significance level, is
given by: $np \pm 2.58\sqrt{npq}$
Probable occurrence range at 95% confidence level, i.e. 5% significance level, is
given by: $np \pm 1.96\sqrt{npq}$
In case of proportion of successes, the mean and standard deviation of the proportion of
successes are obtained by dividing each statistic by $n$.
Mean (expected value) of proportion of successes $= p$
Standard deviation $= \sqrt{\frac{pq}{n}}$
Probable occurrence range of the proportion at 99.73% confidence level, i.e. 0.27%
significance level, is given by: $p \pm 3\sqrt{\frac{pq}{n}}$
Probable occurrence range of the proportion at 99% confidence level, i.e. 1%
significance level, is given by: $p \pm 2.58\sqrt{\frac{pq}{n}}$
Probable occurrence range of the proportion at 95% confidence level, i.e. 5%
significance level, is given by: $p \pm 1.96\sqrt{\frac{pq}{n}}$
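The probable occurrence ranges above can be sketched in code as follows. The helper names `count_range` and `proportion_range` and the lookup table `Z_MULTIPLIER` are illustrative assumptions; the multipliers 3, 2.58 and 1.96 are the ones quoted in this section.

```python
import math

# Multiplier of the standard deviation for each confidence level (in percent).
Z_MULTIPLIER = {99.73: 3.0, 99.0: 2.58, 95.0: 1.96}

def count_range(n, p, confidence=99.73):
    """Probable range of the NUMBER of successes: np +/- k*sqrt(npq)."""
    q = 1 - p
    k = Z_MULTIPLIER[confidence]
    spread = k * math.sqrt(n * p * q)
    return n * p - spread, n * p + spread

def proportion_range(n, p, confidence=99.73):
    """Probable range of the PROPORTION of successes: p +/- k*sqrt(pq/n)."""
    q = 1 - p
    k = Z_MULTIPLIER[confidence]
    spread = k * math.sqrt(p * q / n)
    return p - spread, p + spread

print(count_range(400, 0.5))             # 400 tosses of a fair coin
print(proportion_range(100, 0.1, 95.0))  # 10 defectives in a sample of 100
```

For 400 tosses of a fair coin the number of heads is expected to lie between 170 and 230 at the 99.73% level, and the second call gives limits of about 0.0412 and 0.1588 for the proportion.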
Standard Error: The standard deviation of the sampling distribution of a statistic is
known as Standard Error (S.E.).
Precision: Reciprocal of standard error is known as precision.
Probable Error: It is taken as 0.67449 times the standard error and is sometimes used to
explain the concept of sampling errors to non-specialists.
5.3 Sampling of Attributes for large samples (n > 30)
Characteristics like language, religion, habits (traits) etc. cannot be measured in numbers
as they are attributes. Sampling of attributes means testing how many members of a population
possess a particular attribute (trait), or whether two populations share an attribute
(trait) in common, and at what confidence level.
When the sample size ($n$) is large, i.e. greater than 30, and neither $p$ nor $q$ is very small,
the binomial distribution tends to the normal distribution and therefore we choose the
$z$ variate as test statistic.
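The following procedure is adopted for testing the significance of large samples in terms of
attributes.
Step1: Postulate the null hypothesis ($H_0$), if required.
Step2: If $x$ is the observed number of successes in a sample of size $n$ and $z$ is the standard normal
variate, then $z = \frac{x - np}{\sqrt{npq}}$, and $H_0$ is accepted if $z$ lies in the acceptance range, i.e.
Significance level    Acceptance range
0.27%                 $|z| < 3$
1%                    $|z| < 2.58$
5%                    $|z| < 1.96$
In the absence of any specified significance level, we may consider the 0.27% level, i.e.
take the acceptance range as $|z| < 3$.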
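The two steps above can be sketched as a small function. The name `attribute_z_test` and the figures used (540 successes in 1000 trials with hypothesised $p = 0.5$) are hypothetical and serve only to show how the conclusion depends on the chosen level.

```python
import math

def attribute_z_test(x, n, p, z_limit=3.0):
    """Return (z, accepted) for x observed successes in n trials under H0: P(success) = p.

    z_limit is the acceptance bound on |z|: 3 (0.27%), 2.58 (1%) or 1.96 (5%).
    """
    q = 1 - p
    z = (x - n * p) / math.sqrt(n * p * q)
    return z, abs(z) < z_limit

# Hypothetical data: 540 successes in 1000 trials, H0: p = 0.5
for level, limit in (("0.27%", 3.0), ("1%", 2.58), ("5%", 1.96)):
    z, accepted = attribute_z_test(540, 1000, 0.5, z_limit=limit)
    print(f"{level}: z = {z:.2f}, H0 {'accepted' if accepted else 'rejected'}")
```

Here $z$ is about 2.53, so the same data lead to acceptance of $H_0$ at the 0.27% and 1% levels but rejection at the 5% level.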
Example 1 A coin is tossed 400 times and turns up head 216 times. Discuss whether the
coin may be unbiased one.
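Solution: Let $H_0$: coin is unbiased
Here $n = 400$ and $x = 216$; if $p$ denotes probability of success, i.e. getting a head, then $p = \frac{1}{2}$ and $q = \frac{1}{2}$
$np = 200$, $\sqrt{npq} = 10$; i.e. $z = \frac{x - np}{\sqrt{npq}} = \frac{216 - 200}{10} = 1.6$
Since $|z| = 1.6 < 1.96$, $H_0$ is accepted even at 5% significance level; the coin may be regarded as unbiased.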
; i.e. 2.69
0.085 to 0.175
Probable percentage of bad oranges in the consignment is 8.5% to 17.5%
Example 4 A random sample of 100 bolts was taken from the lot manufactured by a
machine and 10 were found to be defective. Find the 95% confidence limits for the
proportion of defective bolts produced by the machine.
Solution: Let $p$ denote the proportion of defective bolts in the given sample; then $p = \frac{10}{100} = 0.1$, $q = 0.9$ and $n = 100$
Probable limits of the proportion of defective bolts in the lot at 95% confidence level are given by:
$p \pm 1.96\sqrt{\frac{pq}{n}} = 0.1 \pm 1.96 \times 0.03$, i.e. 0.0412 to 0.1588
Probable percentage of defective bolts in the lot at 95% confidence level
is 4.12% to 15.88%
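Example 5 A sample of 900 days is taken from the meteorological records of a district and 100
of them are found to be foggy. What is the probable percentage of foggy days in
the district?
Solution: Let $p$ denote the probability of a foggy day in the district, then $p = \frac{100}{900} = \frac{1}{9}$ and $q = \frac{8}{9}$
Here $n = 900$, so $\sqrt{\frac{pq}{n}} = 0.0105$ and the probable limits of $p$ are $p \pm 3\sqrt{\frac{pq}{n}} = 0.1111 \pm 0.0314$, i.e. 0.0797 to 0.1425; the probable percentage of foggy days in the district is 7.97% to 14.25%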
, 0.155
0.1712
2.8699
1.53
then and
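Result: If all possible samples of size $n$ are drawn with replacement from a finite
population of size $N$, and if $\mu$ and $\sigma$ denote the population mean and standard deviation
respectively, where
Population variance ($\sigma^2$) $= \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$,
then $E(\bar{x}) = \mu$, also $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$
Proof: Since each object is drawn at random from the same population, for any object $x_i$ of the sample;
$E(x_i) = \mu$ and $\mathrm{Var}(x_i) = \sigma^2$
Then $E(\bar{x}) = E\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) = \frac{1}{n}\sum_{i=1}^{n}E(x_i) = \mu$
Also, since draws made with replacement are independent, $\mathrm{Var}(\bar{x}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(x_i) = \frac{\sigma^2}{n}$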
Standard Error: The standard deviation of the sampling distribution is called the
standard error. Standard error of the sampling distribution of the mean is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
The mean of the sample means will be the same as the population mean from which
the samples were drawn, i.e. $\mu_{\bar{x}} = \mu$
The variance of the sampling distribution of $\bar{x}$ will be equal to the variance of the
population divided by the sample size, i.e. $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$
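These two properties can be checked exactly on a small case by listing every sample drawn with replacement. The population values below are hypothetical; the sketch only assumes the Python standard library.

```python
from itertools import product
from statistics import mean, pvariance

population = [2, 3, 6, 8, 11]  # hypothetical finite population, N = 5
n = 2                          # sample size

# Every ordered sample of size n drawn WITH replacement (N**n = 25 samples).
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mu = mean(population)           # population mean
sigma2 = pvariance(population)  # population variance (divisor N)

print("mean of sample means:", mean(sample_means), "population mean:", mu)
print("variance of sample means:", pvariance(sample_means), "sigma^2 / n:", sigma2 / n)
```

The mean of the sample means coincides with the population mean, and their variance coincides with the population variance divided by the sample size.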
Result: Central Limit Theorem: As the sample size gets large enough (30 or
higher), the sampling distribution of the mean becomes approximately normal regardless of the shape of
the population.
Remark: For large samples ($n \geq 30$), the probability distribution is taken as normal for
computational purposes.
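A small simulation can make the theorem plausible. The sketch below is only an illustration under assumed settings: a strongly skewed exponential population with mean 1 and standard deviation 1, sample size 36, 2000 repeated samples and a fixed random seed.

```python
import math
import random
from statistics import mean, pstdev

random.seed(5)  # fixed seed so that the run is repeatable

n = 36                 # size of each sample
num_samples = 2000     # number of samples drawn
mu, sigma = 1.0, 1.0   # mean and s.d. of the exponential population with rate 1

# Mean of each sample drawn from the skewed (exponential) population.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

standard_error = sigma / math.sqrt(n)
# Fraction of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - mu) < 1.96 * standard_error for m in sample_means) / num_samples

print("mean of sample means:", round(mean(sample_means), 3))
print("s.d. of sample means:", round(pstdev(sample_means), 3), "sigma/sqrt(n):", round(standard_error, 3))
print("fraction within 1.96 S.E.:", round(within, 3))
```

Although the population is far from normal, roughly 95% of the sample means should fall within 1.96 standard errors of the population mean, as they would for a normal distribution.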
Example 10 A population has mean 0.1 and standard deviation 2.1. Find the probability
that the mean of a random sample of size 900 will be negative.
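Solution: Given that population mean $\mu = 0.1$ and standard deviation $\sigma = 2.1$
Since the sample size $n = 900$ is large enough, the sampling distribution of $\bar{x}$ is approximately normal
with mean 0.1 and standard deviation $\frac{2.1}{\sqrt{900}} = 0.07$
i.e. $\mu_{\bar{x}} = 0.1$ and $\sigma_{\bar{x}} = 0.07$
$P(\bar{x} < 0) = P\left(z < \frac{0 - 0.1}{0.07}\right) = P(z < -1.43) \approx 0.0764$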
Example 11 Suppose a population has mean 10 and variance 4. What is the probability
that the sample of size 36 has mean lying between 9.8 and 10.2?
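Solution: Given that population mean $\mu = 10$ and standard deviation $\sigma = \sqrt{4} = 2$. Since
the sample size $n = 36$ is large enough, the sampling distribution of $\bar{x}$ is approximately normal with mean
10 and standard deviation $\frac{2}{\sqrt{36}} = 0.33$
i.e. $\mu_{\bar{x}} = 10$ and $\sigma_{\bar{x}} = 0.33$
$P(9.8 < \bar{x} < 10.2) = P(-0.6 < z < 0.6) \approx 0.4515$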
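Example 12 A firm produces electric bulbs whose burning life is normally distributed with mean
800 hours and a standard deviation of 40 hours. Find the probability that a random
sample of 16 bulbs will have average burning life of less than 775 hours.
Solution: Given distribution is normal with population mean $\mu = 800$ and standard
deviation $\sigma = 40$, sample size $n = 16$
$\sigma_{\bar{x}} = \frac{40}{\sqrt{16}} = 10$, so $P(\bar{x} < 775) = P\left(z < \frac{775 - 800}{10}\right) = P(z < -2.5) \approx 0.0062$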
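The probabilities asked for in Examples 10 to 12 can be evaluated numerically as a cross-check. This is a sketch assuming Python 3.8+ (`statistics.NormalDist`); the helper name `sample_mean_distribution` is an illustrative choice.

```python
import math
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Normal approximation to the sampling distribution of the mean."""
    return NormalDist(mu, sigma / math.sqrt(n))

# Example 10: mu = 0.1, sigma = 2.1, n = 900; P(sample mean < 0)
p10 = sample_mean_distribution(0.1, 2.1, 900).cdf(0)

# Example 11: mu = 10, variance 4 (sigma = 2), n = 36; P(9.8 < sample mean < 10.2)
d11 = sample_mean_distribution(10, 2, 36)
p11 = d11.cdf(10.2) - d11.cdf(9.8)

# Example 12: mu = 800, sigma = 40, n = 16; P(sample mean < 775)
p12 = sample_mean_distribution(800, 40, 16).cdf(775)

print(round(p10, 4), round(p11, 4), round(p12, 4))
```

The computed values may differ slightly in the last digit from those read off a printed normal table, because the table rounds $z$ to two decimal places.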
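For testing the significance of a sample mean, the statistic $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is used with the following acceptance ranges:
Significance level    Acceptance range
0.27%                 $|z| < 3$
1%                    $|z| < 2.58$
5%                    $|z| < 1.96$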
Remark: The test statistic $z$ can also be used to check whether a sample
taken from the given population is random or not.
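A minimal sketch of such a check is given below. The function name `mean_z_test` is hypothetical, and the figures in the usage line are those stated in Example 13 (sample of 900 with mean 3.6 mm, population mean 3.35 mm, standard deviation 2.6 mm, 1% level).

```python
import math

def mean_z_test(sample_mean, mu, sigma, n, z_limit=3.0):
    """Return (z, accepted) for H0: the sample comes from a population with mean mu.

    sigma is the population standard deviation; z_limit is the acceptance
    bound on |z|: 3 (0.27%), 2.58 (1%) or 1.96 (5%).
    """
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu) / standard_error
    return z, abs(z) < z_limit

z, accepted = mean_z_test(sample_mean=3.6, mu=3.35, sigma=2.6, n=900, z_limit=2.58)
print(round(z, 2), "H0 accepted" if accepted else "H0 rejected")
```

With these figures $|z|$ exceeds 2.58, so the hypothesis that the sample is a random sample from the given population is rejected at the 1% level.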
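Example 13 A sample of size 900 has mean 3.6 mm; could it be reasonably
regarded as a random sample from a large population whose mean is 3.35 mm and standard
deviation 2.6 mm, at 1% significance level?
Solution: Let $H_0$: sample belongs to the given population
Here population mean $\mu = 3.35$ mm, population standard deviation $\sigma = 2.6$ mm,
also sample size $n = 900$ and sample mean $\bar{x} = 3.6$ mm
Now $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.6 - 3.35}{2.6/\sqrt{900}} = 2.88$
Now $|z| = 2.88 > 2.58$, so $H_0$ is rejected at 1% significance level; the sample cannot reasonably be regarded as a random sample from the given population.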
i.e. and
Then
Also
Standard Error ( )
Then
Also
Standard Error ( )
Example 16 A random sample of 150 villages was taken from a district A having
standard deviation 32 and average population per village was found to be 440. Another
random sample of 250 villages from district B with a standard deviation of 56 gave an
average population of 480 per village. Is the difference between the averages of two
populations significant? Give reasons.
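Solution: Let $H_0$: The difference between the averages of the two populations is not significant,
i.e. $\mu_1 = \mu_2$
Here $n_1 = 150$, $\bar{x}_1 = 440$, $\sigma_1 = 32$
$n_2 = 250$, $\bar{x}_2 = 480$, $\sigma_2 = 56$
Standard error of the difference of means $= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{32^2}{150} + \frac{56^2}{250}} = 4.40$
$z = \frac{\bar{x}_1 - \bar{x}_2}{4.40} = \frac{440 - 480}{4.40} = -9.09$
Since $|z| = 9.09 > 3$, $H_0$ is rejected; the difference between the averages of the two populations is highly significant.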
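The comparison of two large-sample means in Example 16 can be sketched as follows. The function name `two_mean_z_test` is hypothetical, and the standard error $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ is the usual large-sample formula for independent samples, assumed here rather than derived.

```python
import math

def two_mean_z_test(mean1, sd1, n1, mean2, sd2, n2, z_limit=3.0):
    """Return (z, accepted) for H0: the two population means are equal.

    Assumes two independent large samples; z_limit bounds |z| for acceptance.
    """
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / standard_error
    return z, abs(z) < z_limit

# Example 16: district A (150 villages, mean 440, s.d. 32) against
# district B (250 villages, mean 480, s.d. 56)
z, accepted = two_mean_z_test(440, 32, 150, 480, 56, 250)
print(round(z, 2), "not significant" if accepted else "significant")
```

Since $|z|$ is far above 3, the difference is significant even at the 0.27% level.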
1. A coin is tossed 400 times and head turns up 225 times. Discuss whether the coin
is biased or unbiased at 5% level of significance.
2. A random sample of 600 oranges was taken from a large consignment and 60 were
found to be rotten. Show that the standard error of the proportion of bad ones in a
sample of this size is 0.012 and deduce that the percentage of bad oranges in the
consignment almost certainly lies between 6.3 and 13.7.
3. In a city 20% of a random sample of 900 school children wore spectacles and in
another city 18.5% of a random sample of 1600 school children used to wear
spectacles. Is the difference between the proportions significant?
4. In a sample of 500 people from a state, 280 take tea and the rest take coffee. Can we
assume that tea and coffee are equally popular in the state?
5. A sample of 900 members is found to have a mean of 3.4 cm. Can it be reasonably
regarded as a truly random sample from a large population with mean 3.25 cm and
S.D. 1.61 cm?
6. A sample of 100 electric bulbs produced by a manufacturer showed a mean life
time of 1190 hours with a standard deviation of 90 hours. Another sample of 75
bulbs produced by another manufacturer showed a mean life time of 1230 hours with a
standard deviation of 120 hours. Is there a difference between the mean life times
of the two brands at 5% level of significance?
7. The means of two large samples of 1000 and 2000 members are 168.75 cm and
170 cm respectively. Can these be regarded as drawn from the same population of
standard deviation 6.25 cm?
8. A stenographer states that he can take dictation at the rate of 120 words per
minute. Can we accept his claim on the basis of 100 trials in which he showed a
mean of 116 words with a standard deviation of 15 words?
9. A sample of height of 6400 soldiers has a mean of 67.85 inches and a standard
deviation of 2.56 inches, while a random sample of heights of 1600 sailors has a
mean of 68.55 inches and a standard deviation of 2.52 inches. Does this indicate
that the sailors are on average taller than the soldiers?
10. A random sample of 400 students has an average weight of 55 kg. Can we say that
the sample comes from a population with mean 58 kg and variance 9 kg²?
11. In a big city two samples of people are drawn. In the first sample, of size 100, the
average daily income is $210 with a standard deviation of $10, and in the
second sample, of size 150, the average daily income is $220 with a standard
deviation of $11. Test if there is any significant difference in average incomes.
Answers
1. Biased
3. The difference is not significant
4. The difference is highly significant
5. It cannot be regarded as a random sample
6. Yes
7. No
8. The claim is not acceptable
9. Yes
10. No
11. The difference is highly significant