
Hypothesis Testing II


CHAPTER 7

Hypothesis Testing
INSIDE THIS CHAPTER
7.1. Sampling Distributions
7.2. Standard Error
7.3. Test of Significance
7.4. Testing of Statistical Hypothesis
7.5. One-Tailed and Two-Tailed Tests
7.6. Errors in Sampling
7.7. Test of Significance for Large Samples
7.8. Test of Significance for Small Samples
7.9. The ‘t’ Distribution or Student’s ‘t’ Distribution
7.10. Application of the t-Distribution
7.11. Variance Ratio or F-Test
7.12. Analysis of Variance
7.13. Chi-Square Test (χ²-Test)
7.14. Characteristics of Chi-Square (χ²) Distribution
7.15. Uses of Chi-Square (χ²)
7.16. Conditions for Applying Chi-Square (χ²) Test
7.17. Degree of Freedom
7.18. Chi-Square Test of Goodness of Fit
7.19. Chi-Square Test as a Test of Independence

7.1. SAMPLING DISTRIBUTIONS


The sampling distribution of a statistic is the distribution of that statistic, considered as a
random variable, when derived from a random sample of size n. It may be considered as the
distribution of the statistic for all possible samples from the same population of a given size.
The sampling distribution depends on the underlying distribution of the population, the statistic
being considered, the sampling procedure employed and the sample size used. There is often
considerable interest in whether the sampling distribution can be approximated by an asymptotic
distribution, which corresponds to the limiting case as n → ∞.
For example, consider a normal population with mean µ and variance $\sigma^2$. Assume we
repeatedly take samples of a given size n from this population and calculate the arithmetic mean
$\bar{x}$ for each sample; this statistic is called the sample mean. Each sample has its own average
value, and the distribution of these averages is called the “sampling distribution of the sample
mean”. This distribution is normal, $N(\mu, \sigma^2/n)$, since the underlying population is normal, although
sampling distributions may also often be close to normal even when the population distribution
is not (see central limit theorem). An alternative to the sample mean is the sample median. When
calculated from the same population, it has a different sampling distribution to that of the mean
and is generally not normal (but it may be close for large sample sizes).
The mean of a sample from a population having a normal distribution is an example of a
simple statistic taken from one of the simplest statistical populations. For other statistics and
other populations the formulas are more complicated, and often they don’t exist in closed-form.
In such cases the sampling distributions may be approximated through Monte-Carlo simulations,
bootstrap methods, or asymptotic distribution theory.
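The Monte-Carlo idea mentioned above can be illustrated with a minimal Python sketch. The population values (mean 50, standard deviation 10), the sample size 25, the number of repetitions and the function name `simulate_sample_means` are all assumptions chosen for illustration; they do not come from the text. The sketch only checks that the simulated sample means scatter around µ with a spread close to σ/√n.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Assumed illustrative population: mean 50, standard deviation 10, samples of size 25.
means = simulate_sample_means(mu=50.0, sigma=10.0, n=25, num_samples=5000)
observed_mean = statistics.fmean(means)   # should be close to mu = 50
observed_sd = statistics.stdev(means)     # should be close to sigma / sqrt(n)
theoretical_sd = 10.0 / 25 ** 0.5         # sigma / sqrt(n) = 2.0
print(observed_mean, observed_sd, theoretical_sd)
```

Replacing `statistics.fmean(sample)` by `statistics.median(sample)` inside the loop gives, in the same way, an approximation to the sampling distribution of the sample median.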
234 L Probability and Statistics L

7.2. STANDARD ERROR


The standard deviation of the sampling distribution of a statistic is known as the standard
error. Standard error is used to measure the variability of the values of a statistic computed from
the samples of the same size drawn from the population, whereas standard deviation is used to
measure the variation of the observations of the population itself. The term may also be used
to refer to an estimate of that standard deviation, derived from a particular sample used to
compute the estimate. Standard error plays an important role in the theory of large samples and
it forms a basis of the testing of hypothesis.
The following two standard errors are frequently used in statistics:
1. Standard error of the Mean: The standard error of the mean is the standard deviation
of the sample-mean estimate of a population mean. Standard error of the mean is usually
estimated by the sample estimate of the population standard deviation (sample standard deviation)
divided by the square root of the sample size (assuming statistical independence of the values
in the sample):
$$\mathrm{SE}_{\bar{x}} = \frac{s}{\sqrt{n}}$$
where s is the sample standard deviation and
n is the size (number of observations) of the sample.
This estimate may be compared with the formula for the true standard deviation of the
sample mean:
$$\mathrm{SD}_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where σ is the standard deviation of the population.
2. Standard Error of Proportion: The standard error of a proportion is the standard deviation
of the sample proportion over all possible samples of the same size drawn from the population.
It is given by
$$\mathrm{SE}_{p} = \sqrt{\frac{P(1 - P)}{n}}, \quad \text{if } P \text{ is known,}$$
where P = population proportion and n = sample size.
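As a small numerical illustration, the two standard errors can be computed as follows. This is only a sketch: the function names `se_mean` and `se_proportion` and the input values are assumptions made for the example.

```python
import math

def se_mean(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion when the population proportion p is known."""
    return math.sqrt(p * (1.0 - p) / n)

# Assumed inputs: s = 15 with n = 100, and P = 0.5 with n = 100.
print(se_mean(15.0, 100))        # 15 / 10 = 1.5
print(se_proportion(0.5, 100))   # sqrt(0.25 / 100), about 0.05
```

Both quantities shrink in proportion to 1/√n, so quadrupling the sample size halves the standard error.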

7.3. TEST OF SIGNIFICANCE


In applied investigations, one is often interested in comparing some characteristics (such as
the mean, the variance, or a measure of association between two characters) of a group with
specified value, or in comparing two or more groups with regard to the characteristic. For
instance, one may wish to compare two varieties of wheat with regard to the mean yield per
hectare of wheat.
Once sample data have been gathered through an observational study or experiment, statistical
inference allows analysts to assess the evidence in favour of some claim about the population from
which the sample has been drawn. The methods of inference used to support or reject claims
based on sample data are known as tests of significance. An important use of sampling
theory is the test of significance, which enables us to decide, on the basis of the
results of the sample, whether
(i) the deviation between the observed sample statistic and the hypothetical parameter value,
or
(ii) the deviation between two sample statistics, is significant or might be attributed to
chance or the fluctuations of sampling.
Statistical Hypothesis: A hypothesis is a preconceived idea about the nature of a population
or about the value of its parameters. Statements such as “the heights of students
of a university are normally distributed” or “the number of road accidents per day in Delhi is 15”
are examples of hypotheses. For the purpose of decision-making a hypothesis has to be
tested and then accepted or rejected. This is done with the help of observations: we examine a
sample and make a decision on the basis of the result obtained. In other words, a statistical
hypothesis is some assumption or statement, which may or may not be true, about a population or about the
probability distribution characterizing the given population, which we want to test on the basis
of the evidence from a random sample.

7.4. TESTING OF STATISTICAL HYPOTHESIS


The testing of hypothesis is a procedure that helps us to ascertain the likelihood of a hypothesized
population parameter being correct by making use of a sample statistic. In other words, testing a
statistical hypothesis on the basis of a sample enables us to decide whether the hypothesis should
be accepted or rejected. Since the sample data give incomplete information about the population,
the result of the test need not be considered final or unchallengeable. The procedure which,
on the basis of sample results, enables us to decide whether a hypothesis is to be accepted or
rejected is called Hypothesis Testing. The following are the steps involved in hypothesis testing problems:
Step 1: Set Null and Alternative Hypothesis
For applying the test of significance, we first set up a hypothesis, a definite
statement about the population parameter, called the null hypothesis.
It asserts that there is no difference between the sample statistic and the population parameter,
and that whatever difference is there is attributable to sampling errors. It is denoted by H0. The null
hypothesis is the hypothesis which is tested for possible rejection under the assumption that
it is true. Any hypothesis which is complementary to the null hypothesis (H0) is called an
alternative hypothesis. It is denoted by H1. In other words, the negation of the null hypothesis
is the alternative hypothesis: when the null hypothesis is true, the alternative
hypothesis must be false, and when the null hypothesis is false, the alternative hypothesis
must be true.
The null hypothesis is always expressed in the form of an equation, which makes a claim
regarding the specific value of the population. Symbolically, a null hypothesis is represented as
H0 : µ = µ0
where µ is the population mean and µ0 is the hypothesized value of the population mean.
The alternative hypothesis will be
H1 : µ ≠ µ0 (two-tailed alternative)
or one of the one-sided alternatives
H1 : µ < µ0 (left-tailed) or H1 : µ > µ0 (right-tailed)
Step 2: Computation of Appropriate Statistical Test
After setting up of null and alternative hypothesis, we compute the test statistic that will be
used for statistical analysis. A statistic is calculated from the sample. To begin with we assume
that the hypothesis about the population parameter is true. We compare the value of the statistic
with the hypothetical value of the parameter. If the difference between them is small, the
hypothesis is accepted and if the difference between them is large, the hypothesis is rejected.
A statistic on which the decision whether to accept or reject a hypothesis can be based is called a
test statistic. It is important to remember that a test statistic does not prove the hypothesis to
be correct; it only furnishes evidence against the hypothesis. Some of the test statistics to be
discussed later are Z, t and Chi-Square.
Step 3: Set the Level of Significance
The next step is fixation of level of significance. The calculation of statistical significance
(significance testing) is subject to a certain degree of error. The researcher must define in advance
the probability of a sampling error (which exists in any test that does not include the entire
population). Sample size is an important component of statistical significance in that larger samples
are less prone to flukes. Only random, representative samples should be used in significance testing.
The level at which one can accept whether an event is statistically significant is known as
the significance level. In other words, the value of probability at or above which we do
not reject the null hypothesis is termed the level of significance. The level of significance,
usually denoted by α (alpha), is specified before the samples are drawn, so that the results obtained
do not influence the choice of the decision-maker. A low value of α, i.e., of the permissible
Type I error (refer to Art. 7.6), means the test is conservative, that is, the null hypothesis is
rejected only when the evidence piles up against it. We usually test at the 5% or 1% level of
significance.
Step 4: Establish Critical or Rejection Region
The next step for the researcher is to establish a critical or rejection region. The
area under the normal curve is divided into two mutually exclusive regions, as shown in Fig. 7.1.
These regions are termed the acceptance region and the rejection (or critical) region.
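[Figure: normal curve centred at µ = µ0. The acceptance region (1 – α), where H0 is accepted, lies between the critical values $-z_{\alpha/2}$ and $+z_{\alpha/2}$; the rejection regions, where H0 is rejected, lie in the two tails.]
Fig. 7.1. Acceptance and rejection regions of null hypothesis (two-tailed test)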
In other words, all possible values which a test statistic may assume can be divided into two
mutually exclusive groups: one group consisting of values which appear to be consistent with the
null hypothesis and the other having values which are unlikely to occur if H0 is true. The first
group is called the acceptance region and the second set of values is known as the rejection region
for a test. The rejection region is also called the critical region. The value(s) that separate the critical
region from the acceptance region are called the critical value(s). The critical value, which can be in
the same units as the parameter or in standardized units, is to be decided by the experimenter
keeping in view the degree of confidence he or she is willing to have in the null hypothesis.
Step 5: Decision Rule
The last step is the decision about the null hypothesis, i.e., whether to accept it or reject it.
In this regard, we compare the calculated value of the test statistic (which was found in Step 2)
with the critical value (also called the standard table value of the test statistic, as obtained in
Step 4) at level of significance α and decide as under:
(a) If we test the hypothesis at, say, the 5% level of significance and the observed set of results
has a probability of more than 5%, we consider the difference between the sample
statistic and the hypothesized population parameter as not significant. In other words, if the
calculated value of the statistic is less than the tabled value at a specified level of significance α,
then the difference is not significant and may be due to fluctuations of
sampling. So we accept the null hypothesis and reject the alternative hypothesis.
(b) On the other hand, if the observed set of results has a probability of less than 5%, we
consider the difference between the sample statistic and the hypothesized population
parameter as significant. In other words, if the calculated value of the statistic is more than
the tabled value at a specified level of significance (say, 5%), the computed value of the test
statistic falls in the rejection region. So we reject the null hypothesis and accept the
alternative hypothesis.
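The decision rule in (a) and (b) can also be phrased in terms of the probability of the observed result. The following Python sketch assumes that the test statistic Z is standard normal under H0; the function names `p_value` and `decide` and the sample Z values are assumptions made for illustration.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """Probability, under H0, of a Z value at least as extreme as the one observed."""
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z)))
    if tail == "right":
        return 1.0 - phi(z)
    if tail == "left":
        return phi(z)
    raise ValueError("tail must be 'two', 'right' or 'left'")

def decide(z, alpha=0.05, tail="two"):
    """Reject H0 when the probability of the observed result is below alpha."""
    return "reject H0" if p_value(z, tail) < alpha else "do not reject H0"

print(decide(1.2))   # probability above 5%: difference not significant
print(decide(2.5))   # probability below 5%: difference significant
```

Comparing this probability with α is equivalent to comparing the calculated Z with the tabled critical value, so either form of the rule leads to the same decision.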

7.5. ONE-TAILED AND TWO-TAILED TESTS


A test of a statistical hypothesis, where the region of rejection is on only one side of the
sampling distribution, is called a one-tailed test. For example, suppose the null hypothesis states
that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is
greater than 10. The region of rejection would consist of a range of numbers located on the right
side of sampling distribution; that is, a set of numbers greater than 10.
A test of a statistical hypothesis, where the region of rejection is on both sides of the
sampling distribution, is called a two-tailed test. For example, suppose the null hypothesis states
that the mean is equal to 10. The alternative hypothesis would be that the mean is less than 10
or greater than 10. The region of rejection would consist of a range of numbers located on both
sides of sampling distribution; that is, the region of rejection would consist partly of numbers
that were less than 10 and partly of numbers that were greater than 10.
The table below summarizes the critical values of the test statistic Z at various levels of significance.
                      Level of Significance
                      1% (0.01)       5% (0.05)       10% (0.10)
Two-tailed test       |Zα| = 2.58     |Zα| = 1.96     |Zα| = 1.645
Right-tailed test     Zα = 2.33       Zα = 1.645      Zα = 1.28
Left-tailed test      Zα = –2.33      Zα = –1.645     Zα = –1.28
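The tabled values can be reproduced from the inverse of the standard normal distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `critical_value` is an assumption made for this example.

```python
from statistics import NormalDist

def critical_value(alpha, tail="two"):
    """Critical value of Z at significance level alpha for the given tail."""
    inv = NormalDist().inv_cdf  # inverse of the standard normal CDF
    if tail == "two":
        return inv(1.0 - alpha / 2.0)   # to be compared with |Z|
    if tail == "right":
        return inv(1.0 - alpha)
    if tail == "left":
        return inv(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

for alpha in (0.01, 0.05, 0.10):
    row = [round(critical_value(alpha, t), 3) for t in ("two", "right", "left")]
    print(alpha, row)
```

To three decimals the values are 2.576 and 2.326 at the 1% level; the table quotes them rounded to 2.58 and 2.33.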

7.6. ERRORS IN SAMPLING


The main aim of the sampling theory is to draw a valid conclusion about the population
parameters on the basis of the sample results. In doing this we may commit two types of
errors:
Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it
is true. The probability of committing a Type I error is called the significance level. This
probability is also called alpha, and is often denoted by α.

Decision      Accept H0                          Reject H0
H0 true       Accept true H0 (desirable)         Reject true H0 (Type I error)
H0 false      Accept false H0 (Type II error)    Reject false H0 (desirable)
Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis
that is false. The probability of committing a Type II error is called Beta, and is often denoted
by β. The probability of not committing a Type II error is called the Power of the test.
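To make β and the power concrete, the sketch below computes the power of a two-tailed test on a mean when σ is known and the test statistic is Z = (x̄ − µ0)/(σ/√n). The function name `power_two_tailed` and the numbers (µ0 = 10, true mean 11, σ = 2) are assumptions chosen for illustration, not values from the text.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of the two-tailed Z test of H0: mu = mu0 when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # When the true mean is mu_true, Z is normal with this mean and unit variance.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # H0 is rejected when Z < -z_crit or Z > z_crit.
    return std_normal.cdf(-z_crit - shift) + 1.0 - std_normal.cdf(z_crit - shift)

power = power_two_tailed(mu0=10.0, mu_true=11.0, sigma=2.0, n=25)
beta = 1.0 - power   # probability of a Type II error
print(power, beta)
```

When `mu_true` equals `mu0` the function returns α itself, which is the probability of a Type I error; for a fixed true difference the power rises as n increases.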

7.7. TEST OF SIGNIFICANCE FOR LARGE SAMPLES


For large n (n > 30), almost all distributions, such as the Binomial, Poisson, Negative
binomial, etc., can be approximated very closely by a normal probability curve; we therefore use
the normal test of significance for large samples. If t is any statistic (function of sample values),
then for a large sample
$$Z = \frac{t - E(t)}{\sqrt{V(t)}} \sim N(0, 1)$$
Thus, if the discrepancy between the observed and the expected (hypothetical) value of a
statistic is greater than Zα times the standard error (S.E.), the hypothesis is rejected at the α level of
significance. In other words, for a two-tailed test, if
| Z | ≤ 1.96
then the hypothesis H0 is accepted at the 5% level of significance.
Under large sample test, the following are the important tests to test the significance:
1. Hypothesis Testing for Single Population Mean
2. Hypothesis Testing for Difference between Two Population Means
3. Hypothesis Testing for Single Population Proportion
4. Hypothesis Testing for Difference between Two Population Proportions
5. Hypothesis Testing for Difference between Two Population Standard Deviations

7.7.1. Hypothesis Testing for Single Population Mean


A very important assumption underlying the tests of significance for variables is that the
sample mean is asymptotically normally distributed even if the parent population from which the
sample is drawn is not normal.
If xi (i = 1, 2, 3, ..., n) is a random sample of size n from a normal population with mean
µ and variance $\sigma^2$, then the sample mean is distributed normally with mean µ and variance $\sigma^2/n$, i.e.,
$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
H0 : There is no significant difference between the sample mean ($\bar{x}$) and the population mean µ.
Test statistic:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
If σ is unknown, then $\sigma^2$ is estimated by the sample variance, i.e., $\hat{\sigma}^2 = s^2$ (for large n).
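The test statistic above is easy to evaluate numerically. The following sketch uses the figures of Example 7.1 below (n = 900, sample mean 3.4, µ = 3.25, σ = 2.61); the function name `one_sample_z` is an assumption made for this example, and the critical value 1.96 is the two-tailed 5% value from the table in Art. 7.5.

```python
from math import sqrt

def one_sample_z(sample_mean, mu0, sd, n):
    """Z = (sample mean - mu0) / (sd / sqrt(n)); sd is sigma, or s when n is large."""
    return (sample_mean - mu0) / (sd / sqrt(n))

z = one_sample_z(sample_mean=3.4, mu0=3.25, sd=2.61, n=900)
print(round(z, 3))
print("reject H0" if abs(z) > 1.96 else "accept H0")
```

The same function serves for the case of unknown σ by passing the sample standard deviation s as `sd`, which is how the later examples proceed.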

SOLVED EXAMPLES

Example 7.1. A random sample of 900 members has a mean 3.4 cms. Can it be
reasonably regarded as a sample from a large population of mean 3.25 cms and standard
deviation 2.61 cms?
Solution. Here n = 900, $\bar{x}$ = 3.4, µ = 3.25, σ = 2.61
H0 : The sample has been drawn from the normal population with mean µ = 3.25 and standard
deviation σ = 2.61.
H1 : µ ≠ 3.25 (two tailed test)
Under H0,
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.4 - 3.25}{2.61/\sqrt{900}} = 1.724$$
As the calculated value of | Z | = 1.724 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted i.e., the sample may be regarded as drawn from the normal population with mean
µ = 3.25 and standard deviation σ = 2.61.
Example 7.2. A manufacturer claims that the average mileage of scooters of his company
is 40 kms/litre. A random sample of 38 scooters of the company showed an average mileage
of 42 kms/litre. Test the claim of the manufacturer on the assumption that the mileage of
scooter is normally distributed with a standard deviation of 2 kms/litre.
Solution. Here n = 38, $\bar{x}$ = 42, µ = 40, σ = 2
H0 : The average mileage is 40 kms/litre, i.e., H0 : µ = 40 (the mileage being normally distributed
with a standard deviation of 2 kms/litre).
H1 : µ ≠ 40 (two tailed test).
Under H0,
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{42 - 40}{2/\sqrt{38}} = 6.16$$
As the calculated value of | Z | = 6.16 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected i.e., the manufacturer's claim of an average mileage of 40 kms/litre
is not supported by the sample.
Example 7.3. A stenographer claims that she can type at the rate of 120 words per
minute. Can we reject her claim on the basis of 100 trials in which she demonstrates a
mean of 116 words with a standard deviation of 15 words? Use 5% level of significance.
Solution. Here n = 100, x = 116, µ = 120, s = 15

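H0 : Stenographer’s claim is true, i.e., H0 : µ = 120
H1 : µ ≠ 120 (two tailed test)
Under H0, since σ is not known, it is replaced by s:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{116 - 120}{15/\sqrt{100}} = -2.67$$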
As the calculated value of | Z | = 2.67 > 1.96 the significant value of Z at 5% level of
significance, H0 is rejected i.e. stenographer’s claim is not true.
Example 7.4. The mean life of a sample of 400 fluorescent bulbs produced by a
company is found to be 1570 hours with a standard deviation of 150 hours. Test the
hypothesis that the mean life time of the bulbs produced by the company is 1600
against the alternative hypothesis that it is greater than 1600 hours at 1% levels of
significance.
Solution. Here n = 400, µ = 1600, $\bar{x}$ = 1570, s = 150
H0 : Mean life time of bulbs is 1600 hours, that is H0 : µ = 1600.
H1 : µ > 1600 (right tailed test)
Under H0, since σ is not known, it is replaced by s:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{1570 - 1600}{150/\sqrt{400}} = \frac{-30}{7.5} = -4$$
For a right tailed test at the 1% level of significance, H0 is rejected only if Z > 2.33 (from the
table). As the calculated value Z = –4 is not greater than 2.33, H0 is not rejected in favour of H1,
i.e., the sample gives no evidence that the mean life time of bulbs produced by the company is
higher than 1600 hours. (The sample mean is in fact below 1600 hours.)
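In a one-tailed test it is the signed value of Z, not | Z |, that is compared with the critical value. The short check below uses the figures of Example 7.4 and the 1% one-tailed critical value 2.33 from the table in Art. 7.5; the variable names are assumptions made for the illustration.

```python
from math import sqrt

# Figures of Example 7.4: n = 400, sample mean 1570, hypothesized mean 1600, s = 150.
n, x_bar, mu0, s = 400, 1570.0, 1600.0, 150.0
z = (x_bar - mu0) / (s / sqrt(n))

z_crit = 2.33                  # one-tailed critical value at the 1% level
reject_right = z > z_crit      # right-tailed test, H1: mu > 1600
reject_left = z < -z_crit      # left-tailed test, H1: mu < 1600
print(z, reject_right, reject_left)
```

With Z = –4 the right-tailed test does not reject H0, whereas a left-tailed test on the same data would.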
Example 7.5. An ambulance service claims that it takes, on the average 8.9 minutes
to reach its destination in emergency calls. To check on this claim, the agency which
licenses ambulance services has then timed on 50 emergency calls, getting a mean of 9.3
minutes with a standard deviation of 1.8 minutes. Does this constitute evidence that the
figure claimed is too low at the 1% significance level ?
Solution. Let us take the null hypothesis H0 that ‘the claimed figure is the same as that observed’ and the
alternative hypothesis that ‘the claimed figure is different from that observed’. These two hypotheses are written as:
H0 : µ = 8.9 and H1 : µ ≠ 8.9
Given n = 50, $\bar{x}$ = 9.3 and s = 1.8. Using the Z-test statistic, we get
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{9.3 - 8.9}{1.8/\sqrt{50}} = \frac{0.4}{0.2546} = 1.571$$
Since the calculated value of Z = 1.571 is less than its critical value 2.58 at 1% level, the
null hypothesis is accepted. Thus, there is no significant difference between the average time observed and
claimed. (The same conclusion holds for the right tailed alternative µ > 8.9, since 1.571 < 2.33.)
Example 7.6. The average marks in Statistics of a sample of 100 students were 51 with
a S.D. of 6 marks. Could this have been a random sample from a population with average
marks 50?
Solution. Here n = 100, x = 51, µ = 50, s = 6; σ is unknown
H0 : The sample has been drawn from the normal population with mean µ = 50 i.e., H0 :
µ = 50.
H1 : µ ≠ 50 (two tailed test)
Under H0, since σ is not known, it is replaced by s:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{51 - 50}{6/\sqrt{100}} = 1.6667$$
As the calculated value of | Z | = 1.6667 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted i.e., the sample is drawn from the normal population with mean
µ = 50.
Example 7.7. Mice with an average lifespan of 32 months will live up to 40 months
when fed by a certain nutritious food. If 64 mice fed on this diet have an average lifespan
of 38 months and standard deviation of 5.8 months, is there any reason to believe that
average lifespan is less than 40 months.
Solution. From the given data, n = 64, µ = 32, $\bar{x}$ = 38, s = 5.8
Let the null hypothesis be H0 : µ = 32
against the alternative hypothesis H1 : µ > 32 (use right tailed test).
At the 5% level of significance the table value is Zα = 1.645.
Then the test statistic, with s used in place of σ since n is large, is
$$Z = \frac{\bar{x} - \mu}{\mathrm{S.E.}(\bar{x})} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{38 - 32}{5.8/\sqrt{64}} = 8.28 > 1.645$$
The calculated value of Z is greater than the table value of Z. So we reject the null
hypothesis. Hence, we conclude that the average lifespan of mice is greater than 32 months, i.e.,
the nutritious food affects the average lifespan of mice.
Example 7.8. According to the norms established for a mechanical aptitude test,
persons who are 18 years old have an average height of 73.2 with a standard deviation of
8.6. If 45 randomly selected persons of that age averaged 76.7, test the null hypothesis
µ = 73.2 against the alternative hypothesis µ > 73.2 at the 0.01 level of significance.
Solution. From the given data,
n = 45, $\bar{x}$ = 76.7, µ = 73.2, σ = 8.6
Null hypothesis H0 : µ = 73.2
Alternative hypothesis H1 : µ > 73.2 (Use right tailed test)
At the 0.01 level of significance the table value for a right tailed test is Zα = 2.33.
The test statistic is
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{76.7 - 73.2}{8.6/\sqrt{45}} = 2.73 > 2.33$$
Calculated value of Z is greater than the table value of Z. So we reject the null hypothesis.
Hence we conclude that µ > 73.2.
Example 7.9. An oceanographer wants to check whether the depth of the ocean in a
certain region is 57.4 fathoms, as had previously been recorded. What can he conclude at
the level of significance α = 0.05, if readings taken at 40 random locations in the given
region yielded a mean of 59.1 fathoms with a standard deviation of 5.2 fathoms?

Solution. Given n = 40, $\bar{x}$ = 59.1, s = 5.2, µ = 57.4, and α = 0.05
Let the null hypothesis be H0 : µ = 57.4
The alternative hypothesis is H1 : µ ≠ 57.4
At the level of significance α = 0.05 the table value for a two tailed test is 1.96.
As σ is not given, we take the sample s.d. s as an estimate of the population s.d., since n is large.
∴ The test statistic is
$$|Z| = \frac{|\bar{x} - \mu|}{s/\sqrt{n}} = \frac{59.1 - 57.4}{5.2/\sqrt{40}} = \frac{1.7 \times 6.325}{5.2} = 2.07 > 1.96$$
The calculated value of | Z | is greater than the table value of Z.
So we reject the null hypothesis. Hence the oceanographer concludes that the mean depth of the
ocean in the region is not 57.4 fathoms.
Example 7.10. A trucking firm suspects the claim that the average life time of certain
tyres is at least 28,000 miles. To check the claim the firm puts 40 of these tyres on its trucks
and gets a mean life time of 27,463 miles with a standard deviation of 1348 miles. What
can it conclude if the probability of a Type I error is to be at most 0.01?
Solution. From the given data,
µ = 28,000, n = 40, x = 27,463, s = 1348.
Null hypothesis, H0 : µ = 28,000
Alternative hypothesis H1 : µ < 28,000 (Use left tailed test)
At the level of significance α = 0.01 the table value is –2.33.
The test statistic, with s used in place of σ since n is large, is
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{27463 - 28000}{1348/\sqrt{40}} = -2.52 < -2.33$$
So the calculated value of Z is less than the table value of Z. We reject the null hypothesis
(since it is a left tailed test).
Hence the claim that the average life time of the tyres is at least 28,000 miles is rejected.
Example 7.11. A sample of 100 iron bars is said to be drawn from a large number
of bars whose lengths are normally distributed with mean 4 feet and S.D. 0.6 ft. If the
sample mean is 4.2 ft, can the sample be regarded as a truly random sample ?
Solution. From the given data,
n = 100, µ = 4, σ = 0.6, $\bar{x}$ = 4.2
Let the null hypothesis be H0 : µ = 4
against the alternative hypothesis H1 : µ ≠ 4 (use two tailed test).
At the level of significance α = 0.05 the table value is 1.96.
Then the test statistic is
$$|Z| = \frac{|\bar{x} - \mu|}{\mathrm{S.E.}(\bar{x})} = \frac{|\bar{x} - \mu|}{\sigma/\sqrt{n}} = \frac{4.2 - 4}{0.6/\sqrt{100}} = 3.33 > 1.96$$
The calculated value of | Z | is greater than the table value of Z. So we reject the null
hypothesis. Hence we conclude that the sample does not come from the population having
mean 4 and standard deviation of 0.6.

EXERCISE 7.1

1. Explain the concept of sampling distribution and standard error. Discuss the role of standard
error in the large sample theory.
2. What is test of significance? Explain the concept of standard error related to that.
3. Distinguish between:
(a) Parameter and Statistic
(b) Standard Deviation and Standard Error
(c) Left tailed test and right tailed test
(d) Type I error and Type II error
4. Distinguish between null and alternative hypothesis. State the null and alternative hypothesis
regarding population mean that lead to (i) left tailed test, (ii) right tailed test, and (iii) two-
tailed test.
5. Explain clearly the procedure followed in testing of a hypothesis.
6. A sample of 100 students is taken from a large population. The mean height of these
students is 65 inches and the standard deviation 4 inches. Can it be reasonably regarded
that the population mean height is 66 inches? [Ans. Difference is significant]
7. A random sample of 200 tins of groundnut oil gave an average weight of 4.95 kg with a
standard deviation 0.21 kg. Should we accept the hypothesis of net weight of 5 kg per tin
at 1% level of significance? [Ans. No]
8. The heights of college students in a city are normally distributed with S.D. 6 cms. A sample
of 1000 students has mean height 158 cms. Test the hypothesis that the mean height of
college students in the city is 160 cms. [Ans. H0 accepted at 5% level]
9. An auto company decided to introduce a new six cylinder car whose mean petrol
consumption is claimed to be lower than that of the existing auto engine. It was found that
the mean petrol consumption for the 50 cars was 10 km per litre with a standard deviation
of 3.5 km per litre. Test for the company at 5% level of significance, whether the claim the
new car petrol consumption is 9.5 km per litre on the average is acceptable.
[Ans. Company’s claim is acceptable]
10. It has previously been recorded that the average depth of ocean at a particular region is
67.4 fathoms. Is there reason to believe this at 0.01 level of significance if the reading at
40 random locations in that particular region showed a mean of 69.3 with standard deviation
of 5.4 fathoms. [Ans. Null hypothesis is accepted]
11. A sample of 64 students have a mean weight of 70 kg. Can this be regarded as a sample
from a population with mean weight 65 kgs and standard deviation 25 kg.
[ Ans. Null hypothesis is accepted]
12. The mean breaking strength of the cables supplied by a manufacturer is 1800 with a standard
deviation of 100. By a new technique in the manufacturing process it is claimed that the
breaking strength of the cables has increased. In order to test this claim a sample of 50
cables is tested. It is found that the mean breaking strength is 1850. Can we support the
claim at 0.01% level of significance? [Ans. No]

7.7.2. Hypothesis Testing for Difference between Two Population Means


Let $\bar{x}_1$, $\bar{x}_2$ be the means of two independent samples of sizes n1 and n2 (both n1 and n2 are
large) from two different populations with means µ1 and µ2 and standard deviations σ1 and σ2 respectively. Therefore
$$\bar{x}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right), \qquad \bar{x}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$$
The difference ($\bar{x}_1 - \bar{x}_2$) is also a normal variate.
Under the null hypothesis H0 : There is no difference between the population means, i.e.,
H0 : µ1 = µ2, the test statistic is given by
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
Note 1: If σ1 = σ2 = σ, then
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Note 2: If σ1 and σ2 are not known and σ1 ≠ σ2, the test statistic in this case is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Note 3: If σ is not known and σ1 = σ2 = σ, we use
$$\sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$
to calculate σ.
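The two-sample statistic and the pooled estimate of Note 3 can be sketched in Python as follows. The function names `two_sample_z` and `pooled_variance` are assumptions made for this example; the numbers are those of Example 7.13 below (means 55 and 57, standard deviations 10 and 15, sizes 400 and 100).

```python
from math import sqrt

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Z for the difference of two means; sd1, sd2 are sigma_i, or s_i for large samples."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

def pooled_variance(s1, s2, n1, n2):
    """Estimate of the common variance (Note 3): (n1*s1^2 + n2*s2^2) / (n1 + n2)."""
    return (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)

z = two_sample_z(55, 57, 10, 15, 400, 100)
print(round(z, 4))
print("reject H0" if abs(z) > 1.96 else "accept H0")
```

The pooled estimate applies only under the assumption σ1 = σ2; otherwise the separate sample variances are used, as in Note 2.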

SOLVED EXAMPLES

Example 7.12. A college conducted both day and night classes intended to be identical.
A sample of 100 day students yields examination results as under:
x1 = 72.4 and s1 = 14.8
A sample of 200 night students yields examination results as under :
x 2 = 73.9 and s2 = 17.9
Are the two means statistically equal at 10% level ?
Solution. Here n1 = 100, n2 = 200
$\bar{x}_1$ = 72.4, $\bar{x}_2$ = 73.9
s1 = 14.8, s2 = 17.9
H0 : The two means are statistically equal i.e., H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0,
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{72.4 - 73.9}{\sqrt{\dfrac{(14.8)^2}{100} + \dfrac{(17.9)^2}{200}}} = \frac{72.4 - 73.9}{1.947} = -0.7702$$
As the calculated value of | Z | = 0.7702 < 1.645, the significant value of Z at 10% level
of significance, H0 is accepted i.e., the two means are statistically equal.
Example 7.13. Mean and standard deviation calculated from the weights in kg. of
students of two groups taken from two universities are given below:

                Mean    S.D.    Size
University A    55      10      400
University B    57      15      100
Test the significance of the difference between the means at 5% level.
Solution. Here, n1 = 400, n2 = 100
x1 = 55, x 2 = 57
s1 = 10, s2 = 15
H0 : The two means are statistically equal i.e., H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0,
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{55 - 57}{\sqrt{\dfrac{(10)^2}{400} + \dfrac{(15)^2}{100}}} = \frac{55 - 57}{1.5811} = -1.2649$$
As the calculated value of | Z | = 1.2649 < 1.96 the significant value of Z at 5% level of
significance, H0 is accepted i.e., the two means are statistically equal.
Example 7.14. A random sample of 1000 workers from South India show that their
mean wages are Rs. 47 per week with a standard deviation of Rs. 28. A random sample
of 1500 workers from North India gives a mean wage of Rs. 49 per week with a standard
deviation of Rs. 40. Is there any significant difference between their mean level of wages?
Solution. Here n1 = 1000, n2 = 1500
$\bar{x}_1 = 47$, $\bar{x}_2 = 49$
s1 = 28, s2 = 40
H0 : There is no significant difference between the two mean levels of wages, i.e., H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0,
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{47 - 49}{\sqrt{\dfrac{(28)^2}{1000} + \dfrac{(40)^2}{1500}}} = \frac{47 - 49}{1.36} = -1.47$$
As the calculated value of | Z | = 1.47 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., there is no significant difference between the two mean levels of
wages.
Example 7.15. A random sample of 200 villages was taken from a certain district and
the average population per village was found to be 485 with standard deviation of 50.
Another random sample of 200 villages from the same district gave an average population
of 510 per village with standard deviation of 40. Is the difference between the averages of
the two samples significant ? Justify your answer.
Solution. Here n1 = 200, n2 = 200
$\bar{x}_1 = 485$, $\bar{x}_2 = 510$
s1 = 50, s2 = 40
H0 : There is no significant difference between the mean values of the two samples, i.e.,
H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0,
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{485 - 510}{\sqrt{\dfrac{(50)^2}{200} + \dfrac{(40)^2}{200}}} = \frac{485 - 510}{4.527} = -5.52$$
As the calculated value of | Z | = 5.52 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., there is a significant difference between the mean values of the two
samples.
Example 7.16. The mean weight of 50 male students who showed above average
participation in school athletics was 68.2 kgs with a standard deviation of 2.5 kgs. While
50 male students who showed no interest in such participation had a mean weight of
67.5 kgs with a standard deviation of 2.8 kgs. Test the hypothesis that male students who
participate in school athletics are healthier than other male students.
Solution. Here n1 = 50, n2 = 50
$\bar{x}_1 = 68.2$, $\bar{x}_2 = 67.5$
s1 = 2.5, s2 = 2.8
H0 : There is no difference between the mean weights of male students who participate in
athletics and those who do not, i.e., H0 : µ1 = µ2
H1 : µ1 > µ2 (right tailed test)
Under H0,
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{68.2 - 67.5}{\sqrt{\dfrac{(2.5)^2}{50} + \dfrac{(2.8)^2}{50}}} = \frac{0.7}{0.5308} = 1.3188$$
As the calculated value of Z = 1.3188 < 1.645, the significant value of Z at 5% level of
significance for a right tailed test, H0 is accepted, i.e., the average weight of the male students who participate in school
athletics is the same as the average weight of other male students in school.
Example 7.17. A research investigator was interested in studying whether there is a
significant difference in the salaries of MBA graduates in two metropolitan cities. A random
sample of size 100 from Mumbai yields an average income of Rs. 20,150. Another random
sample of 60 from Chennai yields an average income of Rs. 20,250. The variances of
the two populations are given as σ1² = 40,000 and σ2² = 32,400 respectively.
Solution. From the given data, subscript 1 relates to MBA graduates in Mumbai and subscript 2 relates to
MBA graduates in Chennai:
n1 = 100, $\bar{x}_1 = 20{,}150$, σ1² = 40,000
and n2 = 60, $\bar{x}_2 = 20{,}250$, σ2² = 32,400
We test the significance of the difference between the two population means µ1 and µ2.
Let the null hypothesis be H0 : µ1 = µ2
against the alternative hypothesis H1 : µ1 ≠ µ2 (two tailed test).
The critical value of Z for a two tailed test at level of significance α = 0.05 is 1.96.
Then we can use the test statistic
$$|Z| = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{|20150 - 20250|}{\sqrt{\dfrac{40000}{100} + \dfrac{32400}{60}}} = \frac{100}{30.66} = 3.26 > 1.96$$
The calculated value of | Z | is greater than the table value of Z at 0.05 level of significance,
so we reject the null hypothesis. Hence we conclude that there is a significant difference between
the salaries of MBA graduates in the two metropolitan cities.
Example 7.18. An IQ test on two groups of boys and girls gave the following results:
Mean of Girls = 78, S.D. = 10, n = 30
Mean of Boys = 78, S.D. = 13, n = 70
Is there any significant difference in the mean scores of girls and boys at 5% level of significance ?
Solution. From the given data, subscript 1 relates to girls and subscript 2 relates to boys:
n1 = 30, $\bar{x}_1 = 78$, s1 = 10
and n2 = 70, $\bar{x}_2 = 78$, s2 = 13
We test the significance of the difference between the two population means µ1 and µ2.
Let the null hypothesis be H0 : µ1 = µ2
against the alternative hypothesis H1 : µ1 ≠ µ2 (two tailed test).
The critical value of Z for a two tailed test at level of significance α = 0.05 is 1.96.
Then we can use the test statistic
$$|Z| = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{|78 - 78|}{\sqrt{\dfrac{100}{30} + \dfrac{169}{70}}} = 0 < 1.96$$
The calculated value of | Z | is less than the table value of Z at 0.05 level of significance. So we
need not reject the null hypothesis. Hence we conclude that there is no significant difference
between the mean scores of the two groups, girls and boys.

EXERCISE 7.2

1. The means of two single large samples of 1000 and 2000 members are 67.5 inches and 68.0
inches respectively. Can the samples be regarded as drawn from the same population of
standard deviation 2.5 inches? [Ans. H 0 rejected at 5% level]
2. An examination was given to 50 students of Hindu College and 60 students at Hans Raj
College of Delhi University. The Hindu college has mean score 75 with S.D. 9 and Hans
Raj college has mean score 79 with S.D. 7. Is there a significant difference between the
mean score of two colleges? Test the performance at 5% level. [Ans. H 0 rejected at 5% level]
3. Intelligence Test given to two groups of boys and girls gave the following results:

            Mean    S.D.    Number
   Girls     78      10       50
   Boys      73      15      100

Is there a significant difference in mean score of boys and girls? [Ans. H0 rejected at 5% level]
4. In two large populations there are 30% and 25% respectively of fair haired people. Is this
difference likely to be hidden in samples of 1200 and 900 respectively from the two
populations? [Ans. H 0 rejected at 5% level]
5. If 60 new entrants in a given university are found to have a mean height of 68.60 inches
and 50 seniors a mean height of 69.51 inches, is the evidence conclusive that the mean
height of the seniors is greater than that of the new entrants? Assume the standard deviation
of the height to be 2.48 inches. [Ans. H0 rejected at 5% level]
6. A man buys 50 electric bulbs of ‘Wipro’ and 50 electric bulbs of ‘Philips’. He finds that
‘Wipro’ bulbs gave an average life of 1500 hours with a standard deviation of 60 hours and
‘Philips’ bulbs gave an average life of 1512 hours with a standard deviation of 80 hours.
Is there a significant difference in the mean life of the two makes of bulbs?
[Ans. H0 accepted at 5% level]
7. In a survey of buying habits, 400 women shoppers are chosen at random in super market ‘A’
located in a certain section of the city. Their average weekly food expenditure is Rs. 250
with a standard deviation of Rs. 40. For 400 women shoppers chosen at random in super
market ‘B’ in another section of the city, the average weekly food expenditure is Rs. 220
with a standard deviation of Rs. 55. Test at 1% level of significance whether the average
weekly food expenditures of the two populations of shoppers are equal.
[Ans. H0 rejected at 1% level]

7.7.3. Hypothesis Testing for Single Population Proportion


The population proportion P is the ratio of the number of elements possessing a given characteristic
to the total number of elements in the population, i.e.,
$$P = \frac{\text{Number of elements possessing the characteristic}}{\text{Total number of elements in the population}}$$
whereas the sample proportion p is the ratio of the number of elements possessing the
characteristic to the total number of elements in the sample, i.e.,
$$p = \frac{\text{Number of elements possessing the characteristic}}{\text{Total number of elements in the sample}}$$
Hypothesis testing about the population proportion is carried out very similarly to the
familiar method for hypothesis testing involving the population mean. This test is used to find
whether the difference between the sample proportion and the population proportion is significant. Let X be the number of
successes in n independent trials with constant probability P of success for each trial. Then
E(X) = nP; V(X) = nPQ; Q = 1 – P = Probability of failure
Let p = X/n be the observed proportion of successes. Then
$$E(p) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{nP}{n} = P$$
$$V(p) = V\left(\frac{X}{n}\right) = \frac{1}{n^2}V(X) = \frac{nPQ}{n^2} = \frac{PQ}{n}$$
$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}}$$
$$Z = \frac{p - E(p)}{\text{S.E.}(p)} = \frac{p - P}{\sqrt{PQ/n}} \sim N(0, 1)$$
Note 1. The probable limits for the observed proportion of successes are given by
$P \pm 3\sqrt{PQ/n}$.
Note 2. If P is not known, it is approximated by p, and the limits for the proportion in the population
(on the basis of the sample proportion) are taken as $p \pm Z_\alpha\sqrt{pq/n}$, where q = 1 – p and Zα is
the significant value of Z at level of significance α.
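The significant values of Z used in the solved examples (1.645, 1.96, 2.33, 2.58) are quantiles of the standard normal distribution. As a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`), they can be recomputed as follows; the helper name `critical_z` is hypothetical and not part of the text.

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Positive critical value of Z for level of significance alpha.

    For a two tailed test the area alpha is split equally between
    the two tails; for a one tailed test it lies in a single tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    two = critical_z(alpha)
    one = critical_z(alpha, two_tailed=False)
    print(f"alpha = {alpha}: two tailed {two:.3f}, one tailed {one:.3f}")
```

For a left tailed test the critical value is the negative of the one tailed value returned here.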

SOLVED EXAMPLES

Example 7.19. A die is thrown 9000 times and a throw of 3 or 4 is observed 3240 times.
Show that the die cannot be regarded as an unbiased one and find the limits between
which the probability of a throw of 3 or 4 lies.
Solution. Here n = 9000, X = 3240. Then
P = Probability of getting 3 or 4 with an unbiased die = 2/6 = 1/3
∴ Q = 1 – P = 2/3
p = X/n = 3240/9000 = 0.36
H0 : The die is unbiased, i.e., H0 : P = 1/3
H1 : P ≠ 1/3 (two tailed test)
Under H0,
$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.36 - \dfrac{1}{3}}{\sqrt{\dfrac{1}{3} \times \dfrac{2}{3} \times \dfrac{1}{9000}}} = \frac{0.02667}{0.00497} = 5.37$$
As the calculated value of | Z | = 5.37 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., the die is biased.
Since the die is biased, P is estimated by p, and the probable limits for P are
$$p \pm 3\sqrt{\frac{pq}{n}} = 0.36 \pm 3\sqrt{\frac{0.36 \times 0.64}{9000}} = 0.36 \pm 0.015,$$
i.e., 0.345 and 0.375.
Hence, the probability of getting 3 or 4 lies between 0.345 and 0.375.
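The test for a single proportion and the probable limits can be verified numerically. The following Python sketch is an illustration only; the function names `one_proportion_z` and `probable_limits` are hypothetical, and the data are those of Example 7.19.

```python
import math

def one_proportion_z(successes, n, p0):
    """Z statistic for H0: P = p0, using the standard error sqrt(p0*q0/n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

def probable_limits(successes, n, multiplier=3.0):
    """Limits p +/- multiplier * sqrt(p*q/n), with P estimated by the sample p."""
    p_hat = successes / n
    half_width = multiplier * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# A throw of 3 or 4 observed 3240 times in 9000 throws; P = 1/3 for a fair die.
z = one_proportion_z(3240, 9000, 1 / 3)
low, high = probable_limits(3240, 9000)
print(f"Z = {z:.2f}")
print(f"limits: {low:.3f} to {high:.3f}")
```

Passing `multiplier=1.96` instead of the default gives the 95% limits used in some of the exercises.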
Example 7.20. A wholesaler in apples claims that only 4% of the apples supplied by
him are defective. A random sample of 600 apples contained 36 defective apples. Test the
claim of the wholesaler.
Solution. Here n = 600, X = 36. Then
P = Probability of getting a defective apple = 4/100 = 0.04
∴ Q = 1 – P = 0.96
p = X/n = 36/600 = 0.06
We have to test H0 : P = 0.04
H1 : P > 0.04 (right tailed test)
Under H0,
$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.06 - 0.04}{\sqrt{0.04 \times 0.96 \times \dfrac{1}{600}}} = \frac{0.02}{0.008} = 2.5$$
As the calculated value of Z = 2.5 > 1.645, the significant value of Z at 5% level of
significance for a right tailed test, H0 is rejected, i.e., the claim of the wholesaler cannot be accepted.
Example 7.21. In a sample of 1000 people, 540 are rice eaters and the rest are wheat
eaters. Can we assume that rice and wheat are equally popular at 1% level
of significance?
Solution. Here n = 1000, X = No. of rice eaters = 540. Then
P = Probability of a rice eater = 1/2 = 0.5
∴ Q = 1 – P = 0.5
p = X/n = 540/1000 = 0.54
H0 : Both rice and wheat are equally popular, i.e., H0 : P = 0.5
H1 : P ≠ 0.5 (two tailed test)
Under H0,
$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.54 - 0.5}{\sqrt{0.5 \times 0.5 \times \dfrac{1}{1000}}} = \frac{0.04}{0.0158} = 2.53$$
As the calculated value of | Z | = 2.53 < 2.58, the significant value of Z at 1% level of
significance, H0 is accepted, i.e., rice and wheat are equally popular.
Example 7.22. A sample of 900 days is taken from meteorological records of a certain
district and 100 of them are found to be foggy. What are the probable limits to the
percentage of foggy days in the district?
Solution. The proportion of foggy days in the sample of 900 days is
p = 100/900 = 1/9 = 0.1111
∴ q = 1 – p = 8/9 = 0.8889
Since P is not known, it is estimated by p. The probable limits for P are
$$p \pm 3\sqrt{\frac{pq}{n}} = 0.1111 \pm 3\sqrt{\frac{0.1111 \times 0.8889}{900}} = 0.1111 \pm 3 \times 0.0105,$$
i.e., 0.0796 and 0.1426, or 7.96% and 14.26%.
Hence, the percentage of foggy days lies between 7.96 and 14.26.
Example 7.23. An insurance company states that 90% of its claims are settled within
30 days. A consumer group selected a simple random sample of 75 of the company’s claims
to test this statement. The consumer group found that 55 of the claims were settled within
30 days. At the 0.05 significance level, test the company’s claim that 90% of its claims are
settled within 30 days.
Solution. Here n = 75, X = No. of claims settled within 30 days = 55. Then
P = 90/100 = 0.9
∴ Q = 1 – P = 0.1
p = X/n = 55/75 = 0.7333
H0 : The claim that 90% of the company’s insurance claims are settled within 30 days is true, i.e.,
H0 : P ≥ 0.9
H1 : P < 0.9 (left tailed test)
Under H0,
$$Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.7333 - 0.9}{\sqrt{0.9 \times 0.1 \times \dfrac{1}{75}}} = -4.811$$
As the calculated value of Z = –4.811 is less than –1.645, the significant value of Z at 5%
level of significance for a left tailed test, H0 is rejected, i.e., the sample data do not support the claim that
90% of the company’s insurance claims are settled within 30 days.
Example 7.24. Twenty people were attacked by a disease and only 18 survived.
Will you reject the hypothesis that the survival rate, if attacked by this disease, is 85% in
favour of the hypothesis that it is more, at 5% level?
Solution. From the given data, n = 20, P = 0.85 and Q = 0.15.
The observed proportion p = 18/20 = 0.9.
Let the null hypothesis be H0 : P = 0.85
Alternative hypothesis H1 : P > 0.85 (right tailed test)
The table value of Z at 5% level for a right tailed test is 1.645.
$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.85 \times 0.15}{20}} = 0.08$$
∴ The test statistic
$$Z = \frac{p - P}{\text{S.E.}(p)} = \frac{0.9 - 0.85}{0.08} = 0.625 < 1.645$$
∴ The calculated Z value is less than the table value of Z. So we need not reject the null
hypothesis. Hence, we conclude that the data do not show the survival rate to be more than 85%.
Example 7.25. A social worker believes that fewer than 25% of the couples in a certain
area have ever used any form of birth control. A random sample of 120 couples was
contacted. Twenty of them said that they had used one. Test the belief of the social worker
at 0.05 level.
Solution. From the given data, n = 120.
The observed proportion p = 20/120 = 1/6 = 0.167
and the population proportion P = 25/100 = 0.25, Q = 0.75.
The null hypothesis is H0 : P = 0.25 and the alternative hypothesis is H1 : P < 0.25 (left
tailed test), since the belief of the social worker is of the less than type. The critical value of Z for a left
tailed test at 0.05 level is –1.645.
$$\text{S.E.}(p) = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.25 \times 0.75}{120}} = 0.0395$$
∴ The test statistic
$$Z = \frac{p - P}{\text{S.E.}(p)} = \frac{0.167 - 0.25}{0.0395} = -2.11 < -1.645$$
The calculated Z value is less than the table value of Z.
So, the statistic Z falls in the rejection region and we reject the null hypothesis.
Hence we conclude that the belief of the social worker is supported by the data.

EXERCISE 7.3

1. A coin is tossed 1000 times and the head comes out 550 times. Can the deviation from
expected value be due to fluctuations of sampling? [Ans. Z = 3.16; H0 rejected at 5% level, coin is biased]
2. A machine is producing bolts of which a certain fraction is defective. A random sample of
400 is taken from a large batch and is found to contain 30 defective bolts. Does this
indicate that the proportion of defectives is larger than that claimed by the manufacturer
where the manufacturer claims that only 5% of his products are defective? Find 95%
confidence limits of the proportion of defective bolts in batch.
[Ans. H0 rejected at 5% level and limits are 0.07136 and 0.02865]
3. A politician claims that she will receive 60% of the votes in an upcoming election. The
results of a simple random sample of 100 voters showed that 50 of those sampled will vote
for her. Test the politician’s claim at the 0.05 level of significance.
[Ans. H 0 rejected at 5% level]
4. An auditor claims that 10% of customer’s ledger accounts are carrying mistakes of posting
and balancing. A random sample of 600 was taken to test the accuracy of posting and
balancing and 45 mistakes were found. Are these sample results consistent with the claim of
the auditor? Use 5% level of significance. [Ans. H 0 rejected at 5% level]
5. The full-time student body of a college is 50% men and 50% women. Suppose an
introductory chemistry class contains 30 men and 20 women. Does this sample provide
sufficient evidence at the 0.05 significance level to reject the hypothesis that the proportions
of male and female students who take this course are the same as in the general student
body? [Ans. Z = 1.41; H0 accepted at 5% level]
6. A sales clerk in the department store claims that 60% of the shoppers entering the store
leave without making a purchase. A random sample of 50 shoppers showed that 35 of them
left without buying anything. Are these sample results consistent with the claim of the sales
clerk? Use 5%level of significance. [Ans. H0 accepted at 5% level]

7.7.4. Hypothesis Testing for Difference between Two Population Proportions


If the two samples are drawn from different populations, we may be interested in finding
out whether the difference between the proportion of successes is significant or not. In such a
case we take the hypothesis that the difference between p1, i.e., the proportion of successes in one
sample, and p2, i.e., the proportion of successes in another sample is due to fluctuations of sampling.
The standard error of the difference between proportions is calculated by applying the formula:
$$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where p = the pooled estimate of the actual proportion in the population. The value of p is
obtained as follows:
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \quad \text{or} \quad p = \frac{x_1 + x_2}{n_1 + n_2}$$
and q = 1 – p.
Then we can use the test statistic
$$Z = \frac{p_1 - p_2}{\text{S.E.}(p_1 - p_2)} = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
If | Z | < 1.96 (5% level of significance), the difference is regarded as due to random
sampling variation, i.e., as not significant.
The confidence limits for ‘P1 – P2’ are then given by (p1 – p2) ± Zα/2 S.E.(p1 – p2), i.e.,
$$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$
SOLVED EXAMPLES

Example 7.26. In a sample of 600 men from a certain city, 450 men are found to be
smokers. In a sample of 900 from another city, 450 are found to be smokers. Do the data
indicate that two cities are significantly different with respect to prevalence of smoking
habits among men?
Solution. Here we are given for one city
n1 = 600, p1 = proportion of smokers = 450/600 = 0.75
Also for another city,
n2 = 900, p2 = proportion of smokers = 450/900 = 0.5
$$\therefore\ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{450 + 450}{600 + 900} = \frac{900}{1500} = 0.6$$
and q = 1 – p = 1 – 0.6 = 0.4
Let us take the hypothesis that there is no significant difference in the smoking habits of the two
cities, i.e., H0 : P1 = P2 and H1 : P1 ≠ P2.
Using the Z-statistic as follows:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.75 - 0.5}{\sqrt{0.6 \times 0.4\left(\dfrac{1}{600} + \dfrac{1}{900}\right)}} = \frac{0.75 - 0.5}{\sqrt{0.0004 + 0.000267}} = \frac{0.25}{0.0258} = 9.68$$
As the calculated value of | Z | = 9.68 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., there is a significant difference in the smoking habits of the two cities.
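The pooled test for two proportions can be reproduced in a few lines. This is a minimal Python sketch under the assumptions of this section (large independent samples); the function name `two_proportion_z` is hypothetical, and the counts are those of Example 7.26.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 using the pooled estimate p."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)      # p = (x1 + x2) / (n1 + n2)
    q = 1 - pooled
    standard_error = math.sqrt(pooled * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# 450 smokers out of 600 men in one city, 450 out of 900 in another.
z = two_proportion_z(450, 600, 450, 900)
print(f"Z = {z:.2f}")
print("H0 rejected" if abs(z) > 1.96 else "H0 accepted")
```

Working from the raw counts rather than rounded proportions avoids the small rounding differences that appear when p1 and p2 are first rounded to two or three decimals.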
Example 7.27. Before an increase in excise duty on tea, 800 people out of a sample
of 1000 persons were found to be tea drinkers. After an increase in the duty, 800 persons
were known to be tea drinkers in a sample of 1200 people. Do you think that there has
been a significant decrease in the consumption of tea after the increase in the excise duty?
Solution. Here,
n1 = 1000, p1 = proportion of tea drinkers before the increase in excise duty = 800/1000 = 0.8
n2 = 1200, p2 = proportion of tea drinkers after the increase in duty = 800/1200 = 0.67
$$\therefore\ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{800 + 800}{1000 + 1200} = \frac{1600}{2200} = 0.727$$
and q = 1 – p = 1 – 0.727 = 0.273
Let us take the hypothesis that there is no significant difference in the consumption of tea
after the increase in excise duty, i.e.,
H0 : P1 = P2 and H1 : P1 ≠ P2.
Using the Z-statistic as follows:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.8 - 0.67}{\sqrt{0.727 \times 0.273\left(\dfrac{1}{1000} + \dfrac{1}{1200}\right)}} = \frac{0.8 - 0.67}{0.019} = 6.842$$
As the calculated value of | Z | = 6.842 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., there is a significant difference in the consumption of tea
after the increase in excise duty.
Example 7.28. A machine produced 20 defective articles in a batch of 400.
After overhauling it produced 10 defectives in a batch of 300. Has the machine improved?
Solution. Here,
n1 = 400, p1 = proportion of defective articles before overhauling = 20/400 = 0.050
n2 = 300, p2 = proportion of defective articles after overhauling = 10/300 = 0.033
$$\therefore\ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{20 + 10}{400 + 300} = \frac{30}{700} = 0.043$$
and q = 1 – p = 1 – 0.043 = 0.957
Let us take the hypothesis that the machine has not improved after overhauling, i.e., H0 : P1 = P2
and H1 : P1 ≠ P2.
Using the Z-statistic as follows:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.05 - 0.033}{\sqrt{0.043 \times 0.957\left(\dfrac{1}{400} + \dfrac{1}{300}\right)}} = \frac{0.05 - 0.033}{0.0155} = 1.1$$
As the calculated value of | Z | = 1.1 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., the machine has not improved after overhauling.
Example 7.29. In a year there are 956 births in a town A of which 52.5% were males,
while in town A and B combined, this proportion in a total of 1406 births was 0.496.
Is there any significant difference in the proportion of male births in the two towns?
Solution. Here,
n1 = 956, p1 = proportion of male births in town A = 0.525
n1 + n2 = 1406, so n2 = 450, and p = proportion of male births in the combined sample of n1 + n2 = 0.496
Since
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2},$$
$$\therefore\ 0.496 = \frac{956 \times 0.525 + 450 \times p_2}{1406}$$
⇒ p2 = 0.434
and q = 1 – p = 1 – 0.496 = 0.504
Let us take the hypothesis that there is no difference in the proportion of male births in the two
towns, i.e., H0 : P1 = P2 and H1 : P1 ≠ P2.
Using the Z-statistic as follows:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.525 - 0.434}{\sqrt{0.496 \times 0.504\left(\dfrac{1}{956} + \dfrac{1}{450}\right)}} = \frac{0.091}{0.0286} = 3.18$$
As the calculated value of | Z | = 3.18 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., there is a difference between the proportions of male births in
towns A and B.
Example 7.30. On a certain day, 74 trains were arriving on time at Delhi station
during the rush hours and 83 were late. At New Delhi, there were 65 on time and 107 late.
Is there any difference in the proportion arriving on time at the two stations?
Solution. Here,
n1 = 157, p1 = proportion of trains arriving on time at Delhi station = 74/(74 + 83) = 0.471
n2 = 172, p2 = proportion of trains arriving on time at New Delhi station = 65/(65 + 107) = 0.378
Mean proportion of trains arriving on time:
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{157 \times 0.471 + 172 \times 0.378}{157 + 172} = \frac{74 + 65}{329} = \frac{139}{329} = 0.423$$
and q = 1 – p = 1 – 0.423 = 0.577
Let us take the hypothesis that there is no difference in the proportions arriving on time at
the two stations, i.e., H0 : P1 = P2 and H1 : P1 ≠ P2.
Using the Z-statistic as follows:
$$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.471 - 0.378}{\sqrt{0.423 \times 0.577\left(\dfrac{1}{157} + \dfrac{1}{172}\right)}} = \frac{0.093}{0.054} = 1.722$$
As the calculated value of | Z | = 1.722 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., there is no difference in the proportions of trains arriving on time at the two
stations.
Example 7.31. If 57 out of 150 patients suffering with certain disease are cured by
allopathy and 33 out of 100 patients with same disease are cured by homeopathy, is there
reason to believe that allopathy is better than homeopathy at 0.05 level of significance.
Solution. From the given data, n1 = 150 and n2 = 100.
The observed proportion cured by allopathy is p1 = 57/150 = 0.38
and the observed proportion cured by homeopathy is p2 = 33/100 = 0.33.
We test whether the two population proportions P1 and P2 are equal.
Let the null hypothesis be H0 : P1 = P2
against the alternative hypothesis H1 : P1 > P2 (right tailed test).
The critical value of Z for a right tailed test at level of significance α = 0.05 is 1.645.
The test statistic is
$$Z = \frac{p_1 - p_2}{\text{S.E.}(p_1 - p_2)}, \quad \text{where } \text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \quad \text{and} \quad q = 1 - p.$$
Now
$$p = \frac{150 \times 0.38 + 100 \times 0.33}{150 + 100} = 0.36, \quad \text{then } q = 0.64$$
$$Z = \frac{0.38 - 0.33}{\sqrt{0.36 \times 0.64\left(\dfrac{1}{150} + \dfrac{1}{100}\right)}} = \frac{0.05}{0.062} = 0.807 < 1.645$$
The calculated Z value (0.807) is less than the table value of Z (1.645) at 0.05 level of
significance. So we need not reject the null hypothesis. Hence we conclude that there is no
significant difference between the allopathy and homeopathy treatments, i.e., there is no reason to believe
that allopathy is better than homeopathy at 0.05 level of significance.
Example 7.32. On the basis of their total scores, 200 candidates of a civil service
examination are divided into two groups, the upper 30% and the remaining 70%. Consider
the first question of the examination. Among the first group 40 had correct answer,
whereas the second group, 80 had the correct answer. On the basis of these results, can one
conclude that the first question is not good at discriminating ability of the type being
examined here ?
Solution. From the given data, n1 = 30% of 200 = 60 and n2 = 70% of 200 = 140
∴ p1 = 40/60 = 0.667, p2 = 80/140 = 0.571
Let the null hypothesis be H0 : P1 = P2
and the alternative hypothesis H1 : P1 ≠ P2 (two tailed test)
$$\therefore\ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{60 \times 0.667 + 140 \times 0.571}{60 + 140} = \frac{40 + 80}{200} = \frac{120}{200} = 0.6$$
and q = 1 – 0.6 = 0.4
$$\text{S.E.}(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.6 \times 0.4\left(\frac{1}{60} + \frac{1}{140}\right)} = 0.0756$$
∴ The test statistic
$$|Z| = \frac{|p_1 - p_2|}{\text{S.E.}(p_1 - p_2)} = \frac{0.667 - 0.571}{0.0756} = 1.27 < 1.96$$
Since the calculated value of | Z | is less than the table value of Z, we need not reject
the null hypothesis.
Hence we conclude that there is no significant difference between the two proportions, i.e.,
the first question is not good at discriminating between the candidates of the two groups.
Example 7.33. A study shows that 16 of 200 tractors produced on one assembly line
required extensive adjustments before they could be shipped, while the same was true for
14 of 400 tractors produced on another assembly line. At the 0.01 level of significance, does
this support the claim that the second production line does superior work?
Solution. Given n1 = 200, n2 = 400
The observed proportion for the first assembly line is p1 = x1/n1 = 16/200 = 0.08
The observed proportion for the second assembly line is p2 = x2/n2 = 14/400 = 0.035
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{200 \times 0.08 + 400 \times 0.035}{200 + 400} = \frac{16 + 14}{200 + 400} = 0.05, \quad \text{then } q = 0.95$$
The second line does superior work if a smaller proportion of its tractors needs adjustment, i.e., if P1 > P2.
Let the null hypothesis be H0 : P1 = P2
Alternative hypothesis H1 : P1 > P2 (right tailed test)
The critical value of Z for a right tailed test at 0.01 level is 2.33.
$$\therefore\ Z = \frac{0.08 - 0.035}{\sqrt{0.95 \times 0.05\left(\dfrac{1}{200} + \dfrac{1}{400}\right)}} = \frac{0.045}{0.0189} = 2.38 > 2.33$$
The calculated value of Z is greater than the table value of Z (2.33). So we reject
the null hypothesis. Hence we conclude that the data support the claim that the second
production line does superior work.

EXERCISE 7.4

1. In a random sample of 1000 persons from town A, 400 are found to be rice eaters and in a
sample of 800 persons from town B, 400 are found to be rice eaters. Do these data reveal a
significant difference between two towns as far as the proportion of rice consumption is
concerned? [Ans. Z = 4.24; H0 rejected at 5% level]
2. A machine puts out 10 defective articles in a sample of 200. After overhauling it produced
4 defectives in a sample of 100. Has the machine improved? [Ans. H0 accepted at 5% level]
3. In two large population, there are 30% and 25% people of blue eyed respectively. Is this
difference likely to be hidden in the sample of 1200 and 900 respectively from the two
populations? [Ans. H 0 rejected at 5% level]
4. Test the significance of the difference between proportions from the following data:

                Size of sample    No. of defectives
   Sample I          100                 24
   Sample II         300                 48

[Ans. H0 accepted at 5% level]

5. In a referendum submitted to the student body at a university 850 men and 566 women
voted. 530 of the men and 304 of the women voted yes. Does this indicate a significant
difference of opinion on the matter, at the 1% level, between men and women students?
[Ans. H0 rejected at 1% level]
6. In a random sample of 500 persons from Delhi, 200 are to be consumers of Cheese. In a
sample of 400 from Noida, 200 are found to be consumers of cheese. Discuss the question
whether the data reveal a significant difference between Delhi and Noida as far as the
proportion of cheese consumers is concerned. [Ans. H 0 rejected at 5% level]

7.7.5. Hypothesis Testing for Difference between Two Population Standard Deviations
If s1 and s2 are the standard deviations of two independent large samples, then under the null
hypothesis H0 : σ1 = σ2, i.e., the sample standard deviations do not differ significantly, the
test statistic is
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}}$$
where σ1 and σ2 are the population standard deviations.
When the population standard deviations are not known, then
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}}$$

SOLVED EXAMPLES

Example 7.34. The mean yield of two sets of plots and their variability are as given
below. Examine (i) whether the difference in the mean yields of two sets of plots is
significant and (ii) whether the difference in the variability in yields is significant.

                               Set of 40 plots    Set of 60 plots
Mean yield per plot               1258 lb.           1243 lb.
Standard deviation per plot          34                 28

Solution. Here n1 = 40, n2 = 60
$\bar{x}_1 = 1258$, $\bar{x}_2 = 1243$
s1 = 34, s2 = 28
(i) H0 : There is no difference between the mean yields of the two sets of plots, i.e., H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0,
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{1258 - 1243}{\sqrt{\dfrac{(34)^2}{40} + \dfrac{(28)^2}{60}}} = \frac{15}{6.478} = 2.315$$
As the calculated value of | Z | = 2.315 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., there is a significant difference between the mean yields of the two sets of plots.
(ii) H0 : There is no difference between the variability of the two sets of plots, i.e., H0 : σ1 = σ2 ;
H1 : σ1 ≠ σ2 (two tailed test)
Under H0,
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{34 - 28}{\sqrt{\dfrac{(34)^2}{80} + \dfrac{(28)^2}{120}}} = \frac{6}{4.580} = 1.31$$
As the calculated value of | Z | = 1.31 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., there is no significant difference between the variability of the two sets of
plots.
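The large-sample test for the difference between two standard deviations differs from the test for means only in the factor 2 in the denominators. The following is a minimal Python sketch, with the hypothetical helper name `sd_difference_z`, applied to part (ii) of Example 7.34.

```python
import math

def sd_difference_z(s1, n1, s2, n2):
    """Z statistic for H0: sigma1 = sigma2 with large samples.

    The sample standard deviations are used in the standard error,
    sqrt(s1^2 / (2*n1) + s2^2 / (2*n2)), because the population
    values are not known.
    """
    standard_error = math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))
    return (s1 - s2) / standard_error

# Variability of yields: S.D. 34 on 40 plots versus S.D. 28 on 60 plots.
z = sd_difference_z(34, 40, 28, 60)
print(f"Z = {z:.2f}")
print("H0 accepted" if abs(z) < 1.96 else "H0 rejected")
```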
Example 7.35. The mean produce of rice of a sample of 50 fields is 200 lb. per acre
with a standard deviation of 10 lb. Another sample of 75 fields gives the mean at 220 lb
with a standard deviation of 12 lb. Assuming the standard deviation of the mean field at
11 lb. for the universe, find at 1% level if the two results are consistent.
Solution. Here n1 = 50, n2 = 75
s1 = 10, s2 = 12
$\bar{x}_1 = 200$, $\bar{x}_2 = 220$
H0 : The results may be regarded as consistent, i.e., H0 : σ1 = σ2
H1 : σ1 ≠ σ2 (two tailed test)
Under H0,
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{10 - 12}{\sqrt{\dfrac{(10)^2}{100} + \dfrac{(12)^2}{150}}} = \frac{-2}{1.4} = -1.429$$
As the calculated value of | Z | = 1.429 < 2.58, the significant value of Z at 1% level of
significance, H0 is accepted, i.e., the results may be regarded as consistent.
Example 7.36. Intelligence tests of two groups of boys and girls gave the following
results:

          Mean    S.D.    Size
Girls      84      10     121
Boys       81      12      81

(i) Is the difference in mean scores significant?
(ii) Is the difference between the standard deviations significant?
Solution. Here n1 = 121, n2 = 81
x̄1 = 84, x̄2 = 81
s1 = 10, s2 = 12
(i) H0 : Sample means do not differ significantly, i.e., H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0,
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{84 - 81}{\sqrt{\dfrac{(10)^2}{121} + \dfrac{(12)^2}{81}}} = \frac{3}{1.6138} = 1.859$$
As the calculated value of |Z| = 1.859 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., the sample means do not differ significantly.
(ii) H0 : There is no difference between the variability of the two samples, i.e., H0 : σ1 = σ2
H1 : σ1 ≠ σ2 (two tailed test)
Under H0,
$$Z = \frac{\sigma_1 - \sigma_2}{\sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}} = \frac{10 - 12}{\sqrt{\dfrac{(10)^2}{242} + \dfrac{(12)^2}{162}}} = -1.7526$$
As the calculated value of |Z| = 1.7526 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., there is no significant difference between the variability of the two samples.
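Both statistics of Example 7.36 can be recomputed in a few lines. This is an illustrative sketch only; the helper names are hypothetical and not from the text. With the tabulated figures the statistic for the means evaluates to about 1.859.

```python
import math

def z_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample Z for H0: mu1 = mu2 (hypothetical helper)."""
    return (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

def z_sd_difference(sd1, n1, sd2, n2):
    """Large-sample Z for H0: sigma1 = sigma2 (hypothetical helper)."""
    return (sd1 - sd2) / math.sqrt(sd1**2 / (2 * n1) + sd2**2 / (2 * n2))

# Girls: mean 84, S.D. 10, size 121; Boys: mean 81, S.D. 12, size 81
z_means = z_mean_difference(84, 10, 121, 81, 12, 81)
z_sds = z_sd_difference(10, 121, 12, 81)
print("Z for means:", round(z_means, 4))
print("Z for S.D.s:", round(z_sds, 4))
print("both below 1.96:", abs(z_means) < 1.96 and abs(z_sds) < 1.96)
```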

EXERCISE 7.5

1. In a survey of incomes of two classes of workers, two random samples gave the following
details:

Sample    Size    Average annual income    S.D.
I         100     Rs. 582                  Rs. 24
II        100     Rs. 546                  Rs. 28

Examine whether the difference between the standard deviations is significant. [Ans. H0 accepted at 5% level]
2. Random samples drawn from two countries gave the following data relating to the heights
of adult males:

                            India     America
Mean height (in inches)     67.42     67.25
Standard deviation           2.58      2.50
Number in sample             1000      1200

Is the difference between the standard deviations significant? [Ans. H0 accepted at 5% level]
3. The yield of wheat in a random sample of 1000 farms in a certain area has S.D. of 192 kg.
Another random sample of 1000 farms gives a S.D. of 224 kg. Are the standard deviations
significantly different? [Ans. H0 rejected at 5% level]

7.8. TEST OF SIGNIFICANCE FOR SMALL SAMPLES


When the size of the sample is less than 30, the sample is called a small sample. The tests
used in the case of large samples are not applicable to small samples because the assumptions on
which they are based do not hold good for small samples. In small samples, the population standard
deviation is not known and as such it is estimated from the random sample drawn from the
population.

7.9. THE ‘t’ DISTRIBUTION OR STUDENT’S ‘t’ DISTRIBUTION


The t-test is a very important and useful test of significance for small samples. It is
mainly applied in testing the significance of (i) the mean of a sample, (ii) the difference between
the means of two independent samples, and (iii) an observed coefficient of correlation. Here the population
standard deviation is not known.
Let a small sample of size n with mean
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$$
and variance $\frac{1}{n}\sum (x - \bar{x})^2$ be drawn from a normal population having mean µ. We define the statistic ‘t’ as
$$t = \frac{\bar{x} - \mu}{\sigma_s}\,\sqrt{n}, \qquad \text{where } \sigma_s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 .$$
If ‘t’ lies within the confidence limits, the hypothesis is accepted, otherwise it is rejected.
Note: Variance of sample $= \frac{1}{n}\sum (x - \bar{x})^2$ and $\sigma_s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$
$$\therefore \quad \sigma_s^2 = \frac{n\,(\text{Variance})}{n-1}$$
or
$$\frac{\sigma_s}{\sqrt{n}} = \sqrt{\frac{\text{Variance}}{n-1}}$$
Also $(n-1)\,\sigma_s^2 = n\,(\text{Variance})$
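The relation between the two variance definitions in the Note can be verified numerically. The sketch below is an illustration added here, not part of the original text; it uses Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n − 1. The data set is an arbitrary five-value example.

```python
import statistics

data = [2, 3, 5, 8, 7]                      # illustrative values with mean 5
n = len(data)

variance = statistics.pvariance(data)       # (1/n)     * sum((x - mean)^2)
sigma_s_sq = statistics.variance(data)      # (1/(n-1)) * sum((x - mean)^2)

# Both sides equal the sum of squared deviations from the mean.
print((n - 1) * sigma_s_sq, n * variance)
```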

7.9.1. Degree of Freedom


For a fixed value of the mean, the number of free choices is called the degree of freedom.
For example, in the distribution 2, 3, 5, 8, 7 the mean is 5. To have a distribution containing 5
values with mean 5, 4 values can be chosen independently but the 5th value has to be taken in
such a way that the mean is 5. So, the degree of freedom is 4. We have defined the degree of
freedom for a fixed value of the mean, but in certain situations, such as the Chi-square test, the degree of
freedom is calculated in a different way.
So in general the degree of freedom is defined as d.f. = Number of frequencies – Number
of independent constraints on them, e.g., if we are given n frequencies subject to the linear constraint
Sum of the observed frequencies = Sum of the expected frequencies, i.e., ∑Oi = ∑Ei, then
the degree of freedom is n – 1.
If we use the given frequency distribution to compute the parameters of a theoretical
distribution then we further subtract 1 d.f. for each parameter estimated.
For example, in a Poisson distribution, if we calculate the mean and then use that value of the
mean to calculate the theoretical frequencies, the degree of freedom will be n – 1 – 1 = n – 2,
since in a Poisson distribution Mean = Variance, so the mean is the only parameter to be estimated.

7.9.2. Properties of t-distribution


1. t-distribution is unimodal distribution.
2. The probability distribution curve is symmetrical about the line t = 0.
Fig. 7.2. Curves of f(t) for the standard normal distribution and for t-distributions with v = 15 and v = 5.

3. It is a bell shaped curve just like the normal curve, with its tails a little higher above the
abscissa than those of the normal curve. Its spread decreases as the degree of freedom ‘k’ increases.
This means that for the same value of the t-variate and the normal variate, the area beyond
that value under the t-curve is larger than the area beyond it under the normal curve, as shown in Fig. 7.2.
4. t-distribution has only one parameter k, the degree of freedom.
5. The constants of the t-distribution are as follows:
Mean = 0 for k ≥ 2
Variance $\sigma^2 = \dfrac{k}{k-2}$ for k ≥ 3
6. The area under the t-distribution curve for t < t0 is determined by the equation
$$P(t < t_0) = \int_{-\infty}^{t_0} f(t)\,dt$$
Students and other readers need not actually integrate to find the area, as tables of the area
under the curve for different values of t, and vice versa, are available (see the table for
Student’s ‘t’ distribution).
7. The t-distribution tends to the normal distribution as k increases. For practical purposes, t is taken as
equivalent to the normal distribution provided k ≥ 30. The t-distribution has tremendous
utility in testing of hypothesis about one population mean or about equality of two
population means when standard deviation of population is not known.
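Properties 3 and 7 can be illustrated by evaluating the two density functions directly. The sketch below is an addition, not from the text; it assumes the standard form of the Student's t density with k degrees of freedom, f(t) = Γ((k+1)/2) / (√(kπ) Γ(k/2)) · (1 + t²/k)^(−(k+1)/2), and the function names are hypothetical.

```python
import math

def t_pdf(t, k):
    """Density of Student's t with k degrees of freedom (standard formula)."""
    # lgamma avoids overflow of the gamma function for large k.
    log_const = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                 - 0.5 * math.log(k * math.pi))
    return math.exp(log_const - (k + 1) / 2 * math.log1p(t * t / k))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Height at the centre (t = 0) and in the tail (t = 3) for increasing k.
for k in (5, 15, 30, 1000):
    print("k =", k, "f(0) =", round(t_pdf(0, k), 4), "f(3) =", round(t_pdf(3, k), 4))
print("normal", "f(0) =", round(normal_pdf(0), 4), "f(3) =", round(normal_pdf(3), 4))
```

The tail value f(3) falls towards the normal value as k grows, while the central value f(0) rises towards it, which is the behaviour sketched in Fig. 7.2.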

7.10. APPLICATION OF THE t-DISTRIBUTION


The student’s t-distribution is used in the following fields:
7.10.1. To test the significance of the mean of a random sample
7.10.2. To test the difference between mean of two independent samples
7.10.3. To test the difference between mean of two dependent samples (Paired t-Test)
7.10.4. To test the significance of an observed correlation coefficient

7.10.1. To Test the Significance of the Mean of a Random Sample


Suppose a random sample x1, x2, x3, ..., xn of size n (n ≥ 2) has been drawn from a
normal population whose variance σ² is unknown. On the basis of this random sample the
aim is to test
H0 : There is no significant difference between the sample mean x̄ and the population mean µ0, i.e.,
H0 : µ = µ0
We use the test statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$
where µ is the population mean specified by H0, x̄ is the mean of the sample and
$$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$$
with (n – 1) degrees of freedom.
The tables giving the value of t required for significance at various levels of probability
and for different degrees of freedom are called t-tables, which are given in the statistical tables of Fisher
and Yates. The computed value is compared with the tabulated value at 5% or 1% level of
significance and at (n – 1) degrees of freedom and accordingly the null hypothesis is accepted
or rejected.
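The procedure just described can be sketched as a small function working from summary statistics. This is an illustrative sketch, not the book's own code; the name `one_sample_t` is hypothetical, the tabulated value must still be read from a t-table, and the figures are those of Example 7.37.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """Return (t, degrees of freedom) for H0: mu = mu0 (hypothetical helper)."""
    t = (sample_mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Figures of Example 7.37: n = 20, sample mean 42, s = 5, hypothesised mean 45
t, df = one_sample_t(42, 45, 5, 20)
t_table = 2.09                      # tabulated t at 5% level for 19 d.f. (from the text)
reject_h0 = abs(t) > t_table
print("t =", round(t, 3), "d.f. =", df, "reject H0:", reject_h0)
```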

Fiducial Limits of Population Mean


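If tα is the value of t at level of significance α for (n – 1) degrees of freedom, then
$$\left| \frac{\bar{x} - \mu}{s/\sqrt{n}} \right| < t_\alpha \quad \text{for acceptance of } H_0,$$
that is,
$$\bar{x} - t_\alpha \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_\alpha \frac{s}{\sqrt{n}}$$
95% confidence limits (5% level of significance) are $\bar{x} \pm t_{0.05}\, s/\sqrt{n}$.
99% confidence limits (1% level of significance) are $\bar{x} \pm t_{0.01}\, s/\sqrt{n}$.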
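As a numerical illustration of these limits, the sketch below computes the 95% limits for the figures used in Example 7.37 (x̄ = 42, s = 5, n = 20, tabulated t0.05 = 2.09 for 19 d.f.). The function name `confidence_limits` is hypothetical and the tabulated value is supplied by hand, as it would be read from a t-table.

```python
import math

def confidence_limits(sample_mean, s, n, t_alpha):
    """Limits sample_mean -/+ t_alpha * s / sqrt(n) (hypothetical helper)."""
    margin = t_alpha * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

lower, upper = confidence_limits(42, 5, 20, 2.09)
print("95% limits:", round(lower, 3), "to", round(upper, 3))
# A hypothesised mean lying outside the limits is rejected at the 5% level.
print("45 inside limits:", lower < 45 < upper)
```

The hypothesised mean 45 falls outside these limits, which agrees with the rejection of H0 reached in Example 7.37.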
Remark. Instead of calculating s, we may calculate S for the sample.
Since $s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$ and $S^2 = \frac{1}{n}\sum (x - \bar{x})^2$,
∴ (n – 1)s² = nS²

SOLVED EXAMPLES

Example 7.37. A random sample of size 20 from a normal population has mean 42
and standard deviation of 5. Test the hypothesis that the population mean is 45. Use 5%
level of significance.
Solution. Here n = 20, x̄ = 42, µ = 45, s = 5
H0 : There is no significant difference between the sample mean and population mean, i.e.,
µ = 45.
H1 : µ ≠ 45 (two tailed test)
Under H0,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{42 - 45}{5/\sqrt{20}} = -2.683$$
The tabulated value of t at 5% level for 19 degrees of freedom is t0.05 = 2.09
As the calculated value of |t| = 2.683 > t0.05 for 19 degrees of freedom, H0 is rejected, i.e.,
there is a significant difference between the sample mean and population mean.
Example 7.38. The average breaking strength of steel rods is specified to be 18.5
thousand kg. For this a sample of 14 rods was tested. The mean and standard deviation
obtained were 17.85 and 1.955, respectively. Test the significance of the deviation.
Solution. Here n = 14, x̄ = 17.85, µ = 18.5, s = 1.955
H0 : There is no significant deviation in the breaking strength, i.e., µ = 18.5
H1 : µ ≠ 18.5 (two tailed test)
Under H0,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{17.85 - 18.5}{1.955/\sqrt{14}} = -1.24$$
The tabulated value of t at 5% level for 13 degrees of freedom is t0.05 = 2.16.
As the calculated value of |t| = 1.24 < t0.05 for 13 degrees of freedom, H0 is accepted, i.e., there
is no significant deviation in the breaking strength.
Example 7.39. The nine items of a sample had the following values:
45, 47, 50, 52, 48, 47, 49, 53, 51.
Does the mean of the nine items differ significantly from the assumed population mean of
47.5?
Solution. Since n = 9 (< 30), we use the t-test. Taking the assumed mean A = 49, we have

x        d = x – A     d²
45          –4         16
47          –2          4
50           1          1
52           3          9
48          –1          1
47          –2          4
49           0          0
53           4         16
51           2          4
Total        1         55

$$\therefore \quad \bar{x} = A + \frac{\sum d}{n} = 49 + \frac{1}{9} = 49.11$$
and
$$s^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{8}\left(55 - \frac{1}{9}\right) = 6.86$$
⇒ s = √6.86 = 2.62
H0 : The mean of the population from which the sample is drawn is 47.5, i.e., µ = 47.5
H1 : µ ≠ 47.5 (two tailed test)
Under H0,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{49.11 - 47.5}{2.62/\sqrt{9}} = 1.85$$
The tabulated value of t at 5% level for 8 degrees of freedom is t0.05 = 2.31
As the calculated value of |t| = 1.85 < t0.05 for 8 degrees of freedom, H0 is accepted, i.e.,
the sample mean does not differ significantly from the assumed population mean, and the
mean of the population from which the sample is drawn may be taken as 47.5.
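When the raw observations are available, as in Example 7.39, the mean and s can be computed directly rather than through an assumed mean. The following is a minimal sketch using Python's standard `statistics` module (whose `stdev` uses the divisor n − 1, matching s above); the function name `one_sample_t_from_data` is hypothetical.

```python
import math
import statistics

def one_sample_t_from_data(data, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0 from raw data."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # divisor n - 1
    return (mean - mu0) / (s / math.sqrt(n)), n - 1

items = [45, 47, 50, 52, 48, 47, 49, 53, 51]    # data of Example 7.39
t, df = one_sample_t_from_data(items, 47.5)
print("t =", round(t, 2), "with", df, "d.f.")
```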
Example 7.40. A drug manufacturer has installed a machine which automatically fills
5 gm of drug in each phial. A random sample of 10 fills was taken and it was found to contain
5.02 gm on an average in a phial. The standard deviation of the sample was 0.002 gm.
Test at 5% level of significance if the adjustment in the machine is in order.
Solution. Here n = 10, x̄ = 5.02, µ = 5, s = 0.002
H0 : The adjustment in the machine is in order, i.e., µ = 5.
H1 : µ ≠ 5 (two tailed test)
Under H0,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{5.02 - 5}{0.002/\sqrt{10}} = 31.62$$
The tabulated value of t at 5% level for 9 degrees of freedom is t0.05 = 2.26
As the calculated value of |t| = 31.62 > t0.05 for 9 degrees of freedom, H0 is rejected, i.e.,
the adjustment in the machine is not in order.
Example 7.41. The lifetime of electric bulbs for a random sample of 10 from a large
consignment gave the following data:

Item 1 2 3 4 5 6 7 8 9 10

Life in ‘000 hours 4.2 4.6 3.9 4.1 5.2 3.8 3.9 4.3 4.4 5.6

Can we accept the hypothesis that the average lifetime of bulb is 4000 hours?
Solution. H0 : There is no significant difference between the sample mean and the population mean,
i.e., µ = 4000 hrs.
H1 : µ ≠ 4000 hrs (two tailed test)
Applying the t-test (with x measured in thousands of hours, so that µ = 4):

x            4.2    4.6    3.9    4.1    5.2    3.8    3.9    4.3    4.4    5.6
x – x̄       –0.2    0.2   –0.5   –0.3    0.8   –0.6   –0.5   –0.1    0      1.2
(x – x̄)²    0.04   0.04   0.25   0.09   0.64   0.36   0.25   0.01    0      1.44

$$\bar{x} = \frac{\sum x}{n} = \frac{44}{10} = 4.4$$
and
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\frac{3.12}{9}} = 0.589$$
Under H0,
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{4.4 - 4}{0.589/\sqrt{10}} = 2.148$$
The tabulated value of t at 5% level for 9 degrees of freedom is t0.05 = 2.26
As the calculated value of |t| = 2.148 < t0.05 for 9 degrees of freedom, H0 is accepted, i.e.,
the average lifetime of the bulbs could be 4000 hrs.
Example 7.42. A machine is designed to produce insulating washers for electrical
devices of average thickness of 0.025 cm. A random sample of 10 washers was found to
have a mean thickness of 0.024 cm with a standard deviation of 0.002 cm. Test the
significance of the deviation at 5% level.
Solution. Given n = 10, µ = 0.025, x̄ = 0.024 and s = 0.002
Let the null hypothesis be H0 : µ = 0.025 against the
alternative hypothesis H1 : µ ≠ 0.025 (two tailed test)
Level of significance α = 0.05; the tabulated value is t0.05 = 2.26 for 9 d.f.
Here the given standard deviation is treated as the sample value S computed with divisor n, so
that s/√n is replaced by S/√(n − 1), since (n – 1)s² = nS². The test statistic is
$$|t| = \frac{|\bar{x} - \mu|}{S/\sqrt{n-1}} = \frac{|0.024 - 0.025|}{0.002/\sqrt{10-1}} = 1.5 < 2.26$$
The calculated ‘t’ value is less than the table value, so we need not reject the null hypothesis.
Hence we accept H0 : µ = 0.025, i.e., we conclude that there is no significant difference
between the sample mean and the population mean.

EXERCISE 7.6

1. Point out the various applications and properties of t-distribution.


2. For a random sample of size 9 drawn from a normal population, the sample mean and S.D.
are 49 and 2.4 respectively. The 95% confidence limits for the population mean are stated to be 47.04 and
50.96. Comment. [Ans. Yes, the limits given in the question are correct]
3. A soap manufacturing company was distributing a particular brand of soap through a large
number of retail shops. Before a heavy advertising campaign, the mean sale per week of
this soap was 140 dozens. After the campaign, a sample of 81 shops was taken and the
mean sale was found to be 147 dozens with a standard deviation of 16 dozens. Explain
with reasons, if you consider the advertisement effective. [Ans. H0 rejected at 5% level]
4. A courier service advertises that its average delivery time is less than 6 hours for local
deliveries. A random sample of 10 for the amount of time this courier takes to deliver
packages to an addressee across town produced the following times (rounded to the nearest
hours): 7, 3, 4, 6, 10, 5, 6, 4, 3, 8. Is this evidence sufficient to support the courier claim
at 5% level of significance? [Ans. H0 accepted at 5% level]
5. A random sample of 10 boys had the I.Q.'s 70, 120, 110, 101, 88, 83, 95, 98, 107 and 100.
Do these data support the assumption of a population mean I.Q. of 100?
[Ans. H0 accepted at 5% level]
6. An automobile tyre manufacturer claims that the average life of a particular grade of tyre is
more than 20000 km when used under normal conditions. A random sample of 16 tyres was
tested and a mean and standard deviation of 22000 km and 5000 km, respectively were
computed. Assuming the life of the tyres in km to be approximated normally distributed,
decide whether the manufacturer’s claim is valid. [Ans. H0 accepted at 5% level]
7. The wage of 10 workers taken at random from a factory are given below:
Wages (in Rs.): 578 572 570 568 572 578 570 572 596 584
Is it possible that the mean wage of workers of this factory is Rs. 580?
[Ans. H0 accepted at 5% level]
8. A machine is designed to produce insulating washers for electrical devices of average
thickness of 0.025 cm. A random sample of 10 washers was found to have an average
thickness of 0.024 cm with a standard deviation of 0.002 cm. Test the significance of the
deviation at 5% level. [Ans. H0 accepted at 5% level]
9. The manufacturer of a certain make of electric bulbs claims that his bulbs have a mean life
of 25 months with a standard deviation of 5 months. A random sample of 6 such bulbs gave
the following values:
Life in months: 24, 26, 30, 20, 20, 18.
Can you regard the producer’s claim to be valid at 1% level of significance?
[Ans. H0 accepted at 1% level]
10. The foreman of a company has estimated the average quantity of extracted iron ore to be
36.8 tonnes per shift and the sample standard deviation to be 2.8 tonnes per shift, based
upon a random selection of 4 shifts. Construct a 90% confidence interval around this
estimate. [Ans. 33.0 to 40.6]
11. The heights of 10 children selected at random from a given locality had a mean of 63.2 cm
and a variance of 6.25 cm². Test at 5% level of significance the hypothesis that the children of the given
locality are on the average less than 65 cm in height. [Ans. H0 rejected at 5% level]
12. A new drug manufacturer wants to market a new drug only if he could be quite sure that
the mean temperature of a healthy person taking the drug could not rise above 98.4°F;
otherwise he will withhold the drug.
The drug is administered to a random sample of 17 healthy persons. The mean temperature
was found to be 98.4°F with a standard deviation of 0.6°F. Assuming that the distribution
of the temperature is normal and α = 0.01, what should the manufacturer do?
[Ans. H0 accepted at 1% level]
13. A firm allows its employees to pursue additional income-earning activities such as
consultancy, tuitions, etc. in their out-of-office hours. The average earning through
these additional income-earning activities is Rs. 5000 per month per employee. A new HR
manager who has recently joined the firm feels that this amount may have changed. For
verifying his doubt, he has taken a random sample of 45 employees and computed the average
additional income of these employees. The sample mean is computed as Rs. 5500 and the
sample standard deviation is computed as Rs. 1000. Use α = 0.10 to test whether the
average additional income has changed in the population. [Ans. H0 rejected at 10% level]

7.10.2. To Test the Difference Between Mean of Two Independent Samples


When the sample sizes are small, the samples are independent (not related) and the population standard
deviation is unknown, the t-statistic can be used to test the hypothesis about the difference between
two population means. This technique is based on the assumption that the characteristic being
studied is normally distributed in both populations.
Suppose we have two independent samples of sizes n1 and n2 with means x̄1 and x̄2 and standard
deviations s1 and s2. We may be interested in testing (i) whether the two independent samples
have been drawn from populations with the same mean, or (ii) whether the sample means x̄1 and x̄2
differ significantly. To carry out the test, we calculate t as follows:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where x̄1 is the mean of the first sample, x̄2 is the mean of the second sample, and
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$
Here s1 and s2 are the sample standard deviations computed with divisor n, so that ∑(x1 – x̄1)² = n1 s1².
The significance of t for (n1 + n2 – 2) d.f. is tested in the same way as discussed in the previous
sections.
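The pooled calculation can be sketched from summary statistics as follows. This is an illustration, not the book's own code; the name `pooled_t` is hypothetical, and the function assumes, as in the formula above, that the supplied standard deviations were computed with divisor n. The figures in the usage lines are those of Example 7.43.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, d.f.) for H0: mu1 = mu2; s1 and s2 use divisor n."""
    df = n1 + n2 - 2
    pooled_variance = (n1 * s1**2 + n2 * s2**2) / df      # S^2
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error, df

# Figures of Example 7.43: salesman A (20 sales, mean 170, S.D. 20),
# salesman B (18 sales, mean 205, S.D. 25)
t, df = pooled_t(170, 20, 20, 205, 25, 18)
print("t =", round(t, 2), "with", df, "d.f.")
```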

SOLVED EXAMPLES
SOLVED

Example 7.43. Two salesmen A and B are working in a certain district. From a sample
survey conducted by the head office, the following results were obtained. State whether
there is any significant difference in the average sales between the two salesmen:

                               A      B
No. of sales                  20     18
Average sales (in Rs.)       170    205
Standard deviation (in Rs.)   20     25

Solution. Here n1 = 20, n2 = 18
x̄1 = 170, x̄2 = 205
s1 = 20, s2 = 25
H0 : There is no significant difference in the average sales between the two salesmen
Since $s_1^2 = \frac{1}{n_1}\sum (x_1 - \bar{x}_1)^2$,
$$(20)^2 = \frac{1}{20}\sum (x_1 - \bar{x}_1)^2 \;\Rightarrow\; \sum (x_1 - \bar{x}_1)^2 = 8000$$
Similarly,
$$(25)^2 = \frac{1}{18}\sum (x_2 - \bar{x}_2)^2 \;\Rightarrow\; \sum (x_2 - \bar{x}_2)^2 = 11250$$
Now,
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{1}{20 + 18 - 2}(8000 + 11250) = \frac{19250}{36} = 534.72$$
∴ S = √534.72 = 23.12
Under H0,
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{170 - 205}{23.12\sqrt{\dfrac{1}{20} + \dfrac{1}{18}}} = -\frac{35}{23.12 \times 0.3248} = -\frac{35}{7.5050} = -4.66$$
As the calculated value of |t| = 4.66 > t0.05 for 36 degrees of freedom, which is about 2.03, H0 is rejected, i.e.,
there is a significant difference in the average sales between the two salesmen.
Example 7.44. The mean life of a random sample of 10 light bulbs was found to be
1456 hours with a S.D. of 423 hours. A second sample of 17 bulbs chosen at random from
a different batch showed a mean life of 1280 hours with a S.D. of 398 hours. Is there a
significant difference between the mean life of the two batches?
Solution. Here n1 = 10, n2 = 17
x̄1 = 1456, x̄2 = 1280
s1 = 423, s2 = 398
H0 : There is no significant difference in the mean life of bulbs of the two batches
Now,
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$
$$= \frac{1}{10 + 17 - 2}\left(10 \times 423^2 + 17 \times 398^2\right) = \frac{4482158}{25} = 179286.32$$
∴ S = √179286.32 = 423.42
Under H0,
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{1456 - 1280}{423.42\sqrt{\dfrac{1}{10} + \dfrac{1}{17}}} = \frac{176}{423.42 \times 0.3985} = \frac{176}{168.69} = 1.04$$
As the calculated value of |t| = 1.04 < t0.05 for 25 degrees of freedom, which is 2.06, H0
is accepted, i.e., there is no significant difference in the mean life of bulbs of the two batches.
Example 7.45. Below are given the gains in weight (in lbs.) of lions on two diets X
and Y:
Diet X 25 32 30 32 24 14 32
Diet Y 24 34 22 30 42 31 40 30 32 35
Test at 5% level of significance whether the two diets differ significantly in increasing
weight.
Solution. H0 : The two means do not differ significantly
We have
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{189}{7} = 27 \quad \text{and} \quad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{320}{10} = 32$$

            Diet X                             Diet Y
x1     x1 – x̄1    (x1 – x̄1)²       x2     x2 – x̄2    (x2 – x̄2)²
25       –2           4             24       –8          64
32        5          25             34        2           4
30        3           9             22      –10         100
32        5          25             30       –2           4
24       –3           9             42       10         100
14      –13         169             31       –1           1
32        5          25             40        8          64
                                    30       –2           4
                                    32        0           0
                                    35        3           9
189                 266            320                  350

Now,
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{1}{7 + 10 - 2}(266 + 350) = \frac{616}{15} = 41.066$$
∴ S = √41.066 = 6.408
Under H0,
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{27 - 32}{6.408\sqrt{\dfrac{1}{7} + \dfrac{1}{10}}} = -\frac{5}{6.408 \times 0.4928} = -\frac{5}{3.157} = -1.583$$
As the calculated value of |t| = 1.583 < t0.05 for 15 degrees of freedom, which is 2.13, H0 is
accepted, i.e., the two means do not differ significantly.
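The tabular working of Example 7.45 can be reproduced from the raw data. The sketch below is an added illustration; the function name `pooled_t_from_data` is hypothetical. It forms the two sums of squared deviations, pools them over n1 + n2 − 2 degrees of freedom, and returns the statistic.

```python
import math

def pooled_t_from_data(x, y):
    """Return (t, d.f.) for H0: equal population means, from two raw samples."""
    n1, n2 = len(x), len(y)
    mean1, mean2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - mean1) ** 2 for v in x)     # sum of squared deviations, sample 1
    ss2 = sum((v - mean2) ** 2 for v in y)     # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    pooled_sd = math.sqrt((ss1 + ss2) / df)    # S
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, df

diet_x = [25, 32, 30, 32, 24, 14, 32]
diet_y = [24, 34, 22, 30, 42, 31, 40, 30, 32, 35]
t, df = pooled_t_from_data(diet_x, diet_y)
print("t =", round(t, 3), "with", df, "d.f.")
```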
Example 7.46. Two laboratories A and B carry out independent estimates of fat-
content in ice-cream made by a firm. A sample is taken from each batch, halved, and the
separate halves sent to the two laboratories. The fat-content (in grams) obtained by the
laboratories are recorded below:
Batch No. 1 2 3 4 5 6 7 8 9 10
Lab. A 7 8 7 3 8 6 9 4 7 8
Lab. B 9 8 8 4 7 7 9 6 6 6
Is there a significant difference between mean fat content obtained by the two laboratories
A and B?
Solution. H0 : There is no significant difference between the mean fat content obtained by
the two laboratories, A and B.
We have
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{67}{10} = 6.7 \quad \text{and} \quad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{70}{10} = 7.0$$
Taking the assumed means A1 = 8 and A2 = 8:

            Lab. A                              Lab. B
x1     X1 = x1 – A1    X1²          x2     X2 = x2 – A2    X2²
7          –1           1           9           1           1
8           0           0           8           0           0
7          –1           1           8           0           0
3          –5          25           4          –4          16
8           0           0           7          –1           1
6          –2           4           7          –1           1
9           1           1           9           1           1
4          –4          16           6          –2           4
7          –1           1           6          –2           4
8           0           0           6          –2           4
67        –13          49          70         –10          32

Now,
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum X_1^2 - \frac{(\sum X_1)^2}{n_1} + \sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right] = \frac{49 - \dfrac{(-13)^2}{10} + 32 - \dfrac{(-10)^2}{10}}{10 + 10 - 2} = \frac{54.1}{18} = 3.006$$
∴ S = √3.006 = 1.734
Under H0,
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{6.7 - 7}{1.734\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -\frac{0.3}{1.734 \times 0.4472} = -\frac{0.3}{0.7754} = -0.387$$
As the calculated value of |t| = 0.387 < t0.05 for 18 degrees of freedom, which is 2.10, H0
is accepted, i.e., the mean fat contents obtained by the two laboratories A and B do not differ
significantly.

EXERCISE 7.7

1. Strength tests carried out on samples of two yarns spun to the same count gave the following
results:

           Number in sample    Sample Mean    Sample Variance
Yarn A            4                 50               42
Yarn B            9                 42               56

The strengths are expressed in pounds. Is the difference in mean strengths of the
sources from which the samples are drawn significant? [Ans. H0 accepted at 5% level]
2. The mean weekly sale of the Cadbury’s chocolate bar in a chain of candy stores was 146.3
bars per store. After an advertising campaign the mean weekly sales in 22 stores for a
typical week increased to 153.7 and showed a standard deviation of 17.2. Is the evidence
conclusive that the advertising was successful? You are given that for 21 degree of freedom,
the value of t is 2.08 at 5% level of significance. [Ans. H0 accepted at 5% level]
3. Two independent samples of 8 and 7 items respectively had the following values:

Sample I 9 11 13 11 15 9 12 14

Sample II 10 12 10 14 9 8 10

Is the difference between the means of the two samples significant?
[Ans. H0 accepted at 5% level]
4. Two types of batteries, A and B, are tested for their lengths of life and the following results
were obtained:

      No. of samples    Mean    Variance
A           10           500       100
B           10           560       121

Is there a significant difference in the two means? [Ans. H0 rejected at 5% level]


5. A group of seven week-old chickens reared on a high protein diet weigh 12, 15, 11, 16, 14,
and 16 ounces; a second group of five chickens, similarly treated except that they receive a
low protein diet, weigh 8, 10, 14, 10 and 13 ounces. Test whether there is sufficient
evidence that additional protein has increased the weight of the chickens.
[Ans. H0 rejected at 5% level]

6. The heights of six randomly chosen sailors are in inches: 63, 65, 68, 69, 71 and 72. Those
of 10 randomly chosen soldiers are 61, 62, 65, 66, 69, 70, 71, 72 and 73. Discuss the light
that these data throw on the suggestion that sailors are on the average taller than soldiers.
[Ans. H0 accepted at 5% level]
7. Samples of two types of electric bulbs were tested for lengths of life and the following data
were obtained:

Type I Type II
Number in the sample 8 7
Mean of the sample (in hours) 1134 1024
Standard deviation of the sample (in hours) 35 40

Is there a significant difference in the two means? [Ans. H 0 rejected at 5% level]

8. Eight pots growing three wheat plants each were exposed to a high tension discharge, while
nine similar pots were enclosed in an earthed wire cage. The number of tillers in each pot
were as follows:

Caged        17 26 18 25 27 28 26 23 17
Electrified  16 16 22 16 21 18 15 20

See whether electrification exercises any real effect on the average number of tillers at 5% level of
significance. [Ans. H0 rejected at 5% level]

7.10.3. To Test the Difference between Mean of Two Dependent Samples (Paired t-Test)

Let us now consider the case when (i) the sample sizes are equal, i.e., n1 = n2 = n, say, and
(ii) the two samples are not independent but the observations are paired together. This test
is used in situations where there is a pairing of observations (xi, yi). For example, a stimulus is given to
some patients and their blood pressure before and after giving the stimulus is noted; with
the help of this test it is analyzed whether the stimulus is effective. Another example is the marks obtained
by the students of a class in two subjects. For dependent (related) samples, it is important that the two
samples taken in the study are of the same size. Let the null hypothesis be H0 : µ1 = µ2, i.e.,
the process is not effective. We define di = yi – xi, the difference in the observations for the ith
item.
Then we compute
$$\bar{d} = \frac{\sum d_i}{n} \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]$$
The test statistic for paired observations is defined by the following formula
$$t = \frac{|\bar{d}|}{S/\sqrt{n}}$$
with (n – 1) degrees of freedom,
where n is the number of pairs of differences.
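The paired procedure can be sketched in code as follows. This is an illustration under the definitions above, not the book's own method of computation; the function name `paired_t` is hypothetical, and `statistics.stdev` is used because it applies the divisor n − 1 required for S. The scores in the usage lines are those of Example 7.47.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, d.f.) where t = |mean(d)| / (S / sqrt(n)) and d = after - before."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    d_bar = statistics.mean(differences)
    s = statistics.stdev(differences)      # divisor n - 1
    return abs(d_bar) / (s / math.sqrt(n)), n - 1

# Scores of Example 7.47, before and after the course
before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]
t, df = paired_t(before, after)
print("t =", round(t, 2), "with", df, "d.f.")
```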

SOLVED EXAMPLES

Example 7.47. To test whether a course in statistics improved performance, a similar
test was given to 12 participants; their scores both before and after the course are given
below:
Score (Before) 44 40 61 52 32 44 70 41 67 72 53 72
Score (After) 53 38 69 57 46 39 73 48 73 74 60 78
Test at 5% level of significance if the course was useful in terms of performance on the
test.
Solution. Let us take the null hypothesis that there is no improvement due to the course.
Applying the paired t-test:
$$t = \frac{|\bar{d}|}{S/\sqrt{n}}$$

Score before the course    Score after the course    Difference (di)    d²
        44                         53                      9            81
        40                         38                     –2             4
        61                         69                      8            64
        52                         57                      5            25
        32                         46                     14           196
        44                         39                     –5            25
        70                         73                      3             9
        41                         48                      7            49
        67                         73                      6            36
        72                         74                      2             4
        53                         60                      7            49
        72                         78                      6            36
       Total                                              60           578

$$\therefore \quad \bar{d} = \frac{\sum d_i}{n} = \frac{60}{12} = 5$$
and
$$S^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{11}\left(578 - \frac{(60)^2}{12}\right) = \frac{1}{11}(278) = 25.27$$
⇒ S = 5.026
$$\therefore \quad t = \frac{5}{5.026/\sqrt{12}} = \frac{17.32}{5.026} = 3.446$$
As the calculated value of |t| = 3.446 > t0.05 for 11 degrees of freedom, which is 2.20, H0 is
rejected, i.e., the course has improved performance.
Example 7.48. You are given the marks obtained by 11 students in two tests, one
before and the other after special coaching. Do the data reveal that special coaching is effective?
Score (Before Coaching) 23 20 19 21 18 20 18 17 23 16 19
Score (After Coaching) 24 19 22 18 20 22 20 20 23 20 17
Solution. Let us take the null hypothesis that there is no improvement due to coaching.
Applying the paired t-test:
$$t = \frac{|\bar{d}|}{S/\sqrt{n}}$$

Score before coaching    Score after coaching    Difference (di)    d²
        23                       24                     1            1
        20                       19                    –1            1
        19                       22                     3            9
        21                       18                    –3            9
        18                       20                     2            4
        20                       22                     2            4
        18                       20                     2            4
        17                       20                     3            9
        23                       23                     0            0
        16                       20                     4           16
        19                       17                    –2            4
       Total                                           11           61

$$\therefore \quad \bar{d} = \frac{\sum d_i}{n} = \frac{11}{11} = 1$$
and
$$S^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{10}\left(61 - \frac{(11)^2}{11}\right) = \frac{1}{10}(50) = 5$$
⇒ S = 2.236
$$\therefore \quad t = \frac{1}{2.236/\sqrt{11}} = \frac{3.316}{2.236} = 1.483$$
As the calculated value of |t| = 1.483 < t0.05 for 10 degrees of freedom, which is 2.23, H0 is
accepted, i.e., the data do not reveal that the special coaching is effective.
Example 7.49. A certain stimulus when administered to each of 12 patients resulted
in the following increases of blood pressure:
5, 2, 8, –1, 3, 0, –2, 1, 5, 0, 4 and 6
Can it be concluded that the stimulus will, in general, be accompanied by an
increase in blood pressure?
Solution. Let us take the null hypothesis that there is no significant difference in blood
pressure before and after administering the stimulus, i.e., the stimulus is not effective.
Now,

d    5    2    8   –1    3    0   –2    1    5    0    4    6
d²  25    4   64    1    9    0    4    1   25    0   16   36

$$\therefore \quad \bar{d} = \frac{\sum d}{n} = \frac{31}{12} = 2.58$$
and
$$S^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{11}\left(185 - \frac{(31)^2}{12}\right) = \frac{1}{11}(104.9) = 9.538$$
⇒ S = 3.088
$$\therefore \quad t = \frac{2.58}{3.088/\sqrt{12}} = \frac{8.937}{3.088} = 2.894$$
As the calculated value of |t| = 2.89 > t0.05 for 11 degrees of freedom, which is 2.20,
H0 is rejected, i.e., the stimulus will, in general, be accompanied by an increase in blood pressure.
Example 7.50. An IQ test was administered to 5 persons before and after they were
trained. The results are given below:
IQ (Before training) 110 120 123 132 125
IQ (After training) 120 118 125 136 121
Test whether there is any change in IQ after the training programme.
Solution. Let us take the null hypothesis that there is no significant effect of the training.
Applying the paired t-test:
$$t = \frac{|\bar{d}|}{S/\sqrt{n}}$$

IQ (Before Training)    IQ (After Training)    Difference (di)    d²
        110                     120                  10           100
        120                     118                  –2             4
        123                     125                   2             4
        132                     136                   4            16
        125                     121                  –4            16
       Total                                         10           140

$$\therefore \quad \bar{d} = \frac{\sum d}{n} = \frac{10}{5} = 2$$
and
$$S^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{4}\left(140 - \frac{(10)^2}{5}\right) = \frac{1}{4}(120) = 30$$
⇒ S = 5.477
$$\therefore \quad t = \frac{2}{5.477/\sqrt{5}} = \frac{4.472}{5.477} = 0.816$$
As the calculated value of |t| = 0.816 < t0.05 for 4 degrees of freedom, which is 2.78, H0 is
accepted, i.e., there is no change in IQ after the training programme.

EXERCISE 7.8

1. A certain stimulus when administered to each of 9 patients resulted in the following
increases of blood pressure: 7, 3, –1, 4, –3, 5, 6, –4, and 1
Can it be concluded that the stimulus will, in general, be accompanied by an increase
in blood pressure? (Given for 8 d.f., t0.05 = 2.31). [Ans. H0 accepted at 5% level]
2. Fit and Fine Health Club has been advertising a rigorous programme for body conditioning.
The club claims that after 1 month in the programme, the average participant should be
able to do at least eight more push-ups in 2 minutes than he or she could do at the start.
Does the random sample of ten programme participants given below support the club’s
claim? Use the 0.05 level of significance. [Ans. H0 accepted at 5% level]
3. The following data show weekly production for 10 employees before change and after
change in the production technique.

Employee A B C D E F G H I J

Before change 24 26 20 21 23 30 32 25 23 23

After change 26 26 22 22 24 30 32 26 24 25

Test whether there is any significant difference in average production due to the changes in the
production technique. [Ans. H0 rejected at 5% level]
4. The sales data of an item in six shops before and after a special promotional campaign are:

Shops A B C D E F

Sales before Promotional Campaign 53 28 31 48 50 42

Sales after Promotional Campaign 58 29 30 55 56 45

Can the campaign be judged to be a success? Test at 5% level of significance. Use paired
t-test. The significant value of t for the left tail test at 5% level for 5 degrees of freedom is
2.57. [Ans. H 0 rejected at 5% level]
5. Ten persons were appointed in an electrical position in an office. Their performance was
noted by giving a test and the marks were recorded out of 50. They were given 6 months’
training and again they were given a test and marks were recorded out of 50.

Employees A B C D E F G H I J

Before training 25 20 35 15 42 28 26 44 35 48

After training 26 20 34 13 43 40 29 41 36 46

By applying the t-test can it be concluded that employees have benefited by the training?
(You are given for 9 d.f., t0.05 = 2.26) [Ans. H0 accepted at 5% level]
6. The following table gives the additional hours of sleep gained by 10 patients in an
experiment to test the effect of a drug. Do these data give evidence that the drug produces
additional hours of sleep?

Patients 1 2 3 4 5 6 7 8 9 10

Hours gained 0.7 0.1 0.2 1.2 0.1 3.4 3.7 0.8 3.8 2.0
[Ans. H0 accepted at 5% level]
7. A physical instructor claims that a particular exercise if done continuously for 7 days
reduces weight by 15 kgs. Five over weight girls did the exercise for 7 days and their
weights were observed as:

Girls 1 2 3 4 5

Weight before exercise 70 72 75 71 78

Weight after exercise 66 70 72 66 72

Is the exercise effective in reducing weight? [Ans. H 0 rejected at 5% level]


8. Memory capacity of 9 students was tested before and after a course of meditation for a
month. State whether the course was effective or not from the data below (in same units):
Before 10 15 9 3 7 12 16 17 4
After 12 17 8 5 6 11 18 20 3
[Ans. H0 accepted at 5% level]

7.10.4. To Test the Significance of an Observed Correlation Coefficient


Let r be the observed correlation coefficient in a random sample of n observations (xi, yi) from
a bivariate normal population; we need to test the hypothesis H0 that the population
correlation coefficient ρ is zero.
We can show that under H0, the test statistic t given by

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

is a t variate with (n – 2) degrees of freedom and thus we test the hypothesis accordingly.
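The statistic above is simple to evaluate numerically. The following is a minimal sketch, not part of the original text; the function name `correlation_t_statistic` is an illustrative assumption, and the resulting t must still be compared with a tabulated critical value for (n – 2) degrees of freedom.

```python
import math


def correlation_t_statistic(r: float, n: int) -> tuple[float, int]:
    """Return (t, degrees of freedom) for testing H0: population correlation = 0.

    Implements t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.
    """
    if n <= 2:
        raise ValueError("need more than two pairs of observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return t, n - 2


# Illustrative call: r = -0.5 observed in a sample of 15 pairs.
t_value, dof = correlation_t_statistic(r=-0.5, n=15)
print(round(t_value, 2), dof)
```

With r = –0.5 and n = 15 the sketch gives t of about –2.08 on 13 degrees of freedom.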

SOLVED EXAMPLES

Example 7.51. A random sample of 15 pairs of observations from a normal population
gives a correlation coefficient of –0.5. Is it likely that the variables in the population are
uncorrelated?
Solution. Let us take the hypothesis that the variables in the population are uncorrelated.
Applying the t-test,

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.5\times\sqrt{13}}{\sqrt{1-(-0.5)^2}} = -\frac{1.803}{0.8660} = -2.08$$

Number of degrees of freedom = 15 – 2 = 13.
As the calculated value of | t | = 2.08 < t0.05 for 13 degrees of freedom, which is 2.16, H0
is accepted, i.e., the variables in the population are uncorrelated.
Example 7.52. How many pairs of observations must be included in a sample in order
that an observed correlation coefficient of value 0.42 shall have a calculated value of t
greater than 2.72?
Solution. We have

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

We are given the values of t and r and we have to find n.

$$\therefore\quad \frac{0.42}{\sqrt{1-(0.42)^2}}\times\sqrt{n-2} > 2.72$$

$$\text{or}\quad \frac{0.42}{0.908}\times\sqrt{n-2} > 2.72$$

$$\text{or}\quad \sqrt{n-2} > \frac{2.72\times 0.908}{0.42} = 5.88$$

or n – 2 > $(5.88)^2$ = 34.57
or n > 36.57, i.e., n = 37
Hence we should include at least 37 pairs of observations.
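The same inequality can be solved for n directly, since $n - 2 > t^2(1 - r^2)/r^2$. The sketch below is an illustration under that rearrangement; the helper name `min_pairs_for_t` is hypothetical and not from the text.

```python
import math


def min_pairs_for_t(r: float, t_required: float) -> int:
    """Smallest n for which r*sqrt(n-2)/sqrt(1-r^2) strictly exceeds t_required."""
    if not 0.0 < abs(r) < 1.0:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    # Rearranged inequality: n - 2 must be strictly greater than this bound.
    bound = (t_required ** 2) * (1.0 - r * r) / (r * r)
    return math.floor(bound + 2) + 1


n_needed = min_pairs_for_t(r=0.42, t_required=2.72)
print(n_needed)
```

Working without intermediate rounding gives a bound slightly below the 34.57 obtained by hand, but the smallest whole number of pairs is the same, 37.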

EXERCISE 7.9

1. A random sample of 18 paired observations from a bivariate normal population gives a
correlation coefficient 0.52. Does this signify the existence of correlation in the sampled
population? [Ans. H0 accepted at 5% level]
2. A random sample of 27 pairs of observations from a normal population gives a correlation
coefficient of 0.42. Is it likely that the variables in the population are uncorrelated?
[Ans. H0 rejected at 5% level]
3. The following table gives the ages in years of 10 husbands and their wives at marriage.
Compute the correlation coefficient and test for its significance.

Husbands Age 23 27 28 29 30 31 33 35 36 39

Wives Age 18 22 23 24 25 26 28 29 30 32

[Ans. H 0 rejected at 5% level]


4. Find the least value of r in a sample of 27 pairs from a bivariate normal population significant
at 5% level. [Ans. 0.381]

7.11. VARIANCE RATIO OR F-TEST


A large number of surveys are conducted to draw conclusions about the effect of certain
factors or treatments. In testing the significance of the difference between the means of two samples,
we assumed that the two samples came from the same population or from populations with equal
variance. The F-test is used either for testing the hypothesis about the equality of two population
variances or the equality of two or more population means. The object of the F-test is to discover
whether two independent estimates of population variance differ significantly, or whether the two
samples may be regarded as drawn from normal populations having the same variance. Hence,
before applying the t-test for the significance of the difference of two means, we have to test
for the equality of population variances by using the F-test. For example, variances in product quality
resulting from two different production processes, variances in temperatures for two heating
devices, and variances in the rates of return of two types of stocks are a few areas where
comparison of variances is needed.
It was R.A. Fisher who introduced the term ‘variance’ in the analysis of statistical data in
1920. Fisher developed a technique of analyzing the variance of two or more variables for the
purpose of studying their characteristics. Since the F-test is based on the ratio of two variances, it
is also known as the variance ratio test.
Let xi (i = 1, 2, …, n1) and yj (j = 1, 2, …, n2) be two independent random samples (with
means $\bar{x}$ and $\bar{y}$ respectively) drawn from normal populations with the same variance. Let

$$S_1^2 = \frac{1}{n_1-1}\sum_{i=1}^{n_1}(x_i-\bar{x})^2, \qquad S_2^2 = \frac{1}{n_2-1}\sum_{j=1}^{n_2}(y_j-\bar{y})^2$$

The F-statistic is defined by the relation

$$F = \frac{S_1^2}{S_2^2}, \quad \text{where } S_1^2 > S_2^2$$

The numerator should always be greater than the denominator. In case $S_2^2 > S_1^2$, we have

$$F = \frac{S_2^2}{S_1^2}$$

In the first case, we say that F has (n1 – 1, n2 – 1) degrees of freedom and in the second
case, we say that F has (n2 – 1, n1 – 1) degrees of freedom.
From the above, it is concluded that the greater of the two values $S_1^2$ and $S_2^2$ is taken in the
numerator while calculating F.
The calculated value of F is compared with the table value for (n1 – 1, n2 – 1) or (n2 – 1,
n1 – 1) degrees of freedom, as the case may be, at the 5% or 1% level of significance. If the calculated
value of F is greater than the table value, then the F ratio is considered significant and the null
hypothesis is rejected. On the other hand, if the calculated value of F is less than the table value,
the null hypothesis is accepted and it is inferred that both the samples have come from populations
having the same variance.
Assumptions
1. Independent random samples are drawn from each of two normal populations.
2. The populations for each sample must be normally distributed.
3. The variability of the measurements in the two populations is the same and can be measured
by a common variance $\sigma^2$, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
4. The ratio of $\sigma_1^2$ to $\sigma_2^2$ should be greater than or equal to 1, since the larger value of
$S_1^2$ and $S_2^2$ is taken in the numerator.
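The procedure described above (compute both sample variances with divisor n – 1, put the larger in the numerator, and order the degrees of freedom to match) can be sketched in a few lines. This is an illustrative sketch only; the function name `variance_ratio` is an assumption, and the critical value must still be read from an F table. The data are those of Example 7.53.

```python
from statistics import variance  # sample variance with divisor n - 1


def variance_ratio(sample_a, sample_b):
    """Return (F, (numerator df, denominator df)) with the larger variance on top."""
    s1_sq = variance(sample_a)
    s2_sq = variance(sample_b)
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, (len(sample_a) - 1, len(sample_b) - 1)
    return s2_sq / s1_sq, (len(sample_b) - 1, len(sample_a) - 1)


method_1 = [20, 16, 26, 27, 23, 22]
method_2 = [27, 33, 42, 35, 32, 34, 38]
f_value, dof = variance_ratio(method_1, method_2)
print(round(f_value, 2), dof)
```

Here the second sample has the larger variance, so F is reported on (6, 5) degrees of freedom and is then compared with the tabulated F0.05(6, 5).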

SOLVED EXAMPLES

Example 7.53. The time taken by workers in performing a job by method I and
method II is given below:
Method I 20 16 26 27 23 22
Method II 27 33 42 35 32 34 38
Do the data show that the variances of time distribution from population from which
these samples are drawn do not differ significantly ?

Solution. From the given data, n1 = 6 and n2 = 7;

$$\bar{x} = \frac{1}{n_1}\sum x_i = \frac{134}{6} = 22.3 \quad\text{and}\quad \bar{y} = \frac{1}{n_2}\sum y_i = \frac{241}{7} = 34.4$$

Computation:

x       y       (xi − x̄)    (xi − x̄)²    (yi − ȳ)    (yi − ȳ)²
20      27      –2.3        5.29         –7.4        54.76
16      33      –6.3        39.69        –1.4        1.96
26      42      3.7         13.69        7.6         57.76
27      35      4.7         22.09        0.6         0.36
23      32      0.7         0.49         –2.4        5.76
22      34      –0.3        0.09         –0.4        0.16
        38                               3.6         12.96
134     241                 81.34                    133.72

$$S_1^2 = \frac{\sum(x_i-\bar{x})^2}{n_1-1} = \frac{81.34}{5} = 16.27 \quad\text{and}\quad S_2^2 = \frac{\sum(y_i-\bar{y})^2}{n_2-1} = \frac{133.72}{6} = 22.29$$

Let the null hypothesis be $H_0: \sigma_1^2 = \sigma_2^2$ (there is no significant difference between the two
variances) against the alternative hypothesis $H_1: \sigma_1^2 \neq \sigma_2^2$ (there is a significant difference
between the two variances).
Level of significance α = 0.05, F(6, 5) at 0.05 = 4.95
Since $S_2^2 > S_1^2$, the test statistic is

$$F = \frac{S_2^2}{S_1^2} = \frac{22.29}{16.27} = 1.37$$

The calculated value of ‘F’ is less than the table value of ‘F’ at 5% level of significance.
So we need not reject the null hypothesis H0. Hence we conclude that there is no significant
difference between the two variances at 0.05 level of significance.
Example 7.54. In a sample of 8 observations, the sum of squares of deviations from the
mean is 94.5. In another sample of 10 observations, the sum of squares of deviations from the
mean is 101.7. Test whether there is a significant difference of variance.
Solution. Let us take the hypothesis that the two population variances are equal, i.e.,
$H_0: \sigma_1^2 = \sigma_2^2$.
We have n1 = 8 and n2 = 10,
$\sum(x-\bar{x})^2 = 94.5$ and $\sum(y-\bar{y})^2 = 101.7$

$$\therefore\quad S_1^2 = \frac{1}{n_1-1}\sum(x-\bar{x})^2 = \frac{94.5}{8-1} = 13.5$$

$$\text{and}\quad S_2^2 = \frac{1}{n_2-1}\sum(y-\bar{y})^2 = \frac{101.7}{10-1} = 11.3$$

$$\text{Hence}\quad F = \frac{S_1^2}{S_2^2} = \frac{13.5}{11.3} = 1.195$$

As the calculated value of F = 1.195 < F0.05(7, 9), which is 3.29, H0 is accepted, i.e., the two
samples come from populations with the same variance.
Example 7.55. A plant has installed two machines producing polythene bags. During
the installation, the manufacturer of the machines stated that the capacity of each
machine is to produce 20 bags in a day. Owing to various factors such as different
operators working on these machines, raw material, etc., there is a variation in the number
of bags produced at the end of the day. The company researcher has taken a random
sample of bags produced in 10 days for machine I and 12 days for machine II.
The following data give the number of units produced on a sampled day by
the two machines:
Machine I 20 16 26 27 23 22 18 24 25 19
Machine II 27 33 42 35 32 34 38 28 41 43 30 37
How can the researcher determine whether the samples come from populations with equal
variances or from populations with unequal variances? Use the 5% level of significance.
Solution. Let us take the hypothesis that there is no significant difference between the
variability of production of the two machines, i.e., $H_0: \sigma_1^2 = \sigma_2^2$
Applying F-test: $F = S_2^2 / S_1^2$

Machine I               Machine II
x       (x − x̄)²        y       (y − ȳ)²
20      4               27      64
16      36              33      4
26      16              42      49
27      25              35      0
23      1               32      9
22      0               34      1
18      16              38      9
24      4               28      49
25      9               41      36
19      9               43      64
                        30      25
                        37      4
220     120             420     314

$$\bar{x} = \frac{220}{10} = 22 \quad\text{and}\quad \bar{y} = \frac{420}{12} = 35$$

$$\text{Now}\quad S_1^2 = \frac{1}{n_1-1}\sum(x-\bar{x})^2 = \frac{120}{10-1} = 13.33$$

$$\text{and}\quad S_2^2 = \frac{1}{n_2-1}\sum(y-\bar{y})^2 = \frac{314}{12-1} = 28.55$$

$$\therefore\quad F = \frac{28.55}{13.33} = 2.14$$

As the calculated value of F = 2.14 < F0.05(11, 9), which is 3.16, H0 is accepted, i.e., there is
no significant difference between the variances of production of the two machines. The difference
observed in the samples may be due to chance.
Example 7.56. Most individuals are aware of the fact that the average annual repair
cost for an automobile depends on the age of the automobile. A researcher is interested
in finding out whether the variance of the annual repair costs also increases with the age
of the automobile. A sample of 25 automobiles that are 4 years old showed a sample variance
for the annual repair costs of Rs. 850 and a sample of 25 automobiles that are 2 years old
showed a sample variance for the annual repair costs of Rs. 300. Test the hypothesis that
the variance in annual repair costs is more for the older automobiles, at the 0.01 level of significance.
Solution. Let us take the hypothesis that there is no significant difference in the variance
of repair costs, i.e., $H_0: \sigma_1^2 = \sigma_2^2$
We have n1 = 25 and n2 = 25,
$S_1^2 = 850$ and $S_2^2 = 300$

$$\therefore\quad F = \frac{S_1^2}{S_2^2} = \frac{850}{300} = 2.833$$

As the calculated value of F = 2.833 > F0.01(24, 24), which is 2.66, H0 is rejected, i.e., the
variance in annual repair costs is significantly greater for the older automobiles.
Example 7.57. The daily wages (in Rs.) of workers in two cities are as follows:

          Size of the sample    Standard deviation of wages
City A    22                    2.9
City B    16                    3.8

Test at 5% level, the equality of variances of the wage distribution in the two cities.
Solution. Let us take the hypothesis that there is equality of variances of the wage distribution
in the two cities, i.e., $H_0: \sigma_1^2 = \sigma_2^2$
We have n1 = 22 and n2 = 16,
s1 = 2.9 and s2 = 3.8
As we have,

$$S_1^2 = \frac{n_1}{n_1-1}s_1^2 = \frac{22}{21}(2.9)^2 = 8.81$$

$$\text{and}\quad S_2^2 = \frac{n_2}{n_2-1}s_2^2 = \frac{16}{15}(3.8)^2 = 15.40$$

Applying F-test:

$$F = \frac{S_2^2}{S_1^2} = \frac{15.40}{8.81} = 1.75$$

As the calculated value of F = 1.75 < F0.05(15, 21), which is 2.18, H0 is accepted, i.e., there
is equality of variances of the wage distribution in the two cities.
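When only sample sizes and standard deviations are reported, as in this example, the conversion $S^2 = \frac{n}{n-1}s^2$ can be applied before forming the ratio. The sketch below assumes, as the worked solution does, that the quoted standard deviations use divisor n; the function names are illustrative and not from the text.

```python
def unbiased_variance(n: int, sd_divisor_n: float) -> float:
    """Convert a standard deviation computed with divisor n into S^2 (divisor n - 1)."""
    return n / (n - 1) * sd_divisor_n ** 2


def f_from_summary(n1: int, sd1: float, n2: int, sd2: float):
    """Return (F, (numerator df, denominator df)) from summary statistics."""
    s1_sq = unbiased_variance(n1, sd1)
    s2_sq = unbiased_variance(n2, sd2)
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, (n1 - 1, n2 - 1)
    return s2_sq / s1_sq, (n2 - 1, n1 - 1)


# City A: n = 22, s = 2.9; City B: n = 16, s = 3.8
f_wages, dof_wages = f_from_summary(22, 2.9, 16, 3.8)
print(round(f_wages, 2), dof_wages)
```

The larger variance belongs to City B, so the ratio is referred to the F table with (15, 21) degrees of freedom.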
Example 7.58. The following figures relate to the number of units of an item produced
per shift by two workers A and B for a number of days

A 16 17 18 19 20 21 22 24 26 27
B 19 22 23 25 26 28 29 30 31 32 35 36
Can it be inferred that worker A is more stable compared to worker B? Give your
answer using F-test at 5% level of significance. [Use F0.05(11, 9) = 3.16]
Solution. Let us take the hypothesis that the two workers are equally stable, i.e., $H_0: \sigma_1^2 = \sigma_2^2$
Applying F-test: $F = S_2^2 / S_1^2$

Worker A                Worker B
x       (x − x̄)²        y       (y − ȳ)²
16      25              19      81
17      16              22      36
18      9               23      25
19      4               25      9
20      1               26      4
21      0               28      0
22      1               29      1
24      9               30      4
26      25              31      9
27      36              32      16
                        35      49
                        36      64
210     126             336     298

$$\bar{x} = \frac{210}{10} = 21 \quad\text{and}\quad \bar{y} = \frac{336}{12} = 28$$

$$\text{Now}\quad S_1^2 = \frac{1}{n_1-1}\sum(x-\bar{x})^2 = \frac{126}{10-1} = 14 \quad\text{and}\quad S_2^2 = \frac{1}{n_2-1}\sum(y-\bar{y})^2 = \frac{298}{12-1} = 27.09$$

$$\therefore\quad F = \frac{S_2^2}{S_1^2} = \frac{27.09}{14} = 1.94$$

As the calculated value of F = 1.94 < F0.05(11, 9), which is 3.16, H0 is accepted, i.e., the two
workers are equally stable.

EXERCISE 7.10

1. Given the following information about two samples from two normal populations,
n1 = 9, s1 = 1.97, n2 = 7, s2 = 3.21.
Can it be concluded that both the samples have come from populations having the same
variability? [Ans. H0 accepted at 5% level]
2. The students of the same age group from two different management schools were compared
for variability in their statistical skill. A random sample of 25 students from one management
school has a variance of 16 marks while a random sample of 22 students from the other
management school has variance of 8 marks. Examine if the difference in variability is
significant. [Ans. H0 accepted at 5% level]
3. Two random samples were drawn from normal population and their values are:

A 66 67 75 76 82 84 88 90 92

B 64 66 74 78 82 85 87 92 93 95 97

Test whether the two populations have the same variance at the 5% level?
[Ans. H0 accepted at 5% level]
4. One sample of 10 bulbs gives a standard deviation of 9 hours of life and another sample of
11 bulbs gives a standard deviation of 10 hours of life. Can you say the variances are
different at 1% level of significance? [Ans. H0 accepted at 5% level]
5. Two bottle filling plants are supposed to fill 5 litres of water in each bottle. A researcher
has taken a random sample of 10 bottles from Plant I and 15 bottles from Plant II. The data
collected are provided in the table below:

Plant I 5.1 5.2 5.2 5.2 5.3 5.4 5.3 4.9 4.8 4.9

Plant II 4.9 4.8 4.7 5.1 5.2 5.3 5.4 4.9 4.8 5.1 5.2 4.8 4.9 5.1 5.2

How can the researcher determine whether the samples come from populations with the same
variance or from populations with different variances? Take 5% as the level of significance.
[Ans. H0 accepted at 5% level]
6. It is known that the mean diameter of a steel pipe produced by two processes, A and B, is
practically the same but the standard deviation may differ. For a sample of 22 pipes produced
by A, the standard deviation is 2.9 m, while for a sample of 16 pipes produced by B; the
standard deviation is 3.8 m. Test whether the pipe produced by process A have the same
variability as those of process B. [Ans. H0 accepted at 5% level]

7.12. ANALYSIS OF VARIANCE


The analysis of variance was introduced into the science of statistics by R. A. Fisher. It is
basically an arithmetic procedure for dividing a total sum of squares of deviations from the mean
into components associated with defined sources of variation. The method of the analysis of
variance has proven to be of the greatest importance in connection with experimental data. The first
objective of the analysis of variance is to obtain a measure of the total variation within the series,
and the second objective is to find a measure of variation between or among the components. Then
the significance of the difference between the variations in two or more series may be measured. In
other words, with the help of the technique of analysis of variance we can test the hypothesis
that the means of all the components constituting a population are equal to the mean of the
population or that the samples have come from the same population. Suppose we want to assess
the performance of students of various schools in a common examination. The mean scores of
the schools will show a variation, as will the scores of individual students within each school.
It is at times difficult to tell at a glance whether the variations between schools are significant
compared to variations within schools. It is for this reason that techniques of analysis of variance
have been developed. The null hypothesis in the analysis of variance is that there is no variation
in the means of the populations from which the various samples come, or in other words, all samples
are drawn from the same population. The technique of analysis of variance is referred to as
ANOVA. A table showing the source of variation, the sum of squares, degrees of freedom, mean
square (variance) and the formula for the F-ratio is known as an ANOVA TABLE.

7.12.1. Assumptions in Analysis of Variance


The underlying assumptions for the study of analysis of variations are:
1. Individuals being observed have been randomly selected from the populations represented
by the samples.
2. All populations from which samples have been drawn are normally distributed.
3. Each one of the samples is independent of the other samples.
4. The variances for the population from which samples have been drawn are equal.
5. The effects of the various components are additive.

7.12.2. Technique of Analysis of Variance (One-way Classification)


When the effect that one factor has on one dependent variable is studied, one-way ANOVA
is used to compare the means of several different groups. The steps in carrying out the analysis
are:
STEP 1: Calculate variance between the samples. For calculating variance between the
samples we take the total of the squared deviations of the means of the various samples from the
grand average and divide this total by the degrees of freedom. Thus the steps in calculating variance
between samples will be:
(a) Calculate the mean of each sample, i.e., $\bar{X}_1$, $\bar{X}_2$, etc.
(b) Calculate the grand average $\bar{\bar{X}}$. For samples of equal size, its value is obtained as follows:

$$\bar{\bar{X}} = \frac{\bar{X}_1+\bar{X}_2+\dots+\bar{X}_K}{K} = \frac{\text{Sum of the observations in all the samples}}{\text{Number of observations}}$$

(c) Take the difference between the means of the various samples and the grand average.
(d) Square these deviations, counting each squared deviation once for every observation in
its sample, and obtain the total, which gives the sum of squares between the samples (SSB).
(e) Mean square between the samples (MSB) is obtained by dividing the total obtained in step
(d) by the degrees of freedom. The degrees of freedom will be one less than the number of
samples, i.e., if there are 5 samples then the degrees of freedom will be 5 – 1 = 4.
Sum of the squares of variations between the samples is also called sum of the squares of
the variations between the columns, i.e., SSC. Likewise MSB is also written as MSC. Therefore,

$$MSC = \frac{SSC}{c-1}, \quad \text{where } c \text{ is the number of columns.}$$
STEP 2: Calculate variance within the samples. For calculating variance within the samples
we take the total of the squared deviations of the various items from the mean values of the
respective samples and divide this total by the degrees of freedom. Thus the steps in calculating
variance within samples will be:
(a) Calculate the mean of each sample, i.e., $\bar{X}_1$, $\bar{X}_2$, etc.
(b) Take the deviations of the various items in a sample from the mean values of the
respective samples.
(c) Square these deviations and obtain the total, which gives the sum of squares within the
samples (SSW).
(d) Mean square within the samples (MSW) is obtained by dividing the total obtained in
step (c) by the degrees of freedom. The degrees of freedom are obtained by deducting the
number of samples from the total number of items.
Sum of the squares of variations within the samples is also called sum of the squares due
to errors, i.e., SSE. Likewise MSW is also written as MSE. Therefore,

$$MSE = \frac{SSE}{c(r-1)}, \quad \text{where } c = \text{number of columns};\ r = \text{number of rows.}$$

STEP 3: Calculate the ratio F as follows:

$$F = \frac{\text{Variance between the samples}}{\text{Variance within the samples}} = \frac{MSC}{MSE}$$

F is always computed with the variance between the sample means as the numerator and
the variance within the samples as the denominator. This denominator is computed by
combining the variance within the K samples into a single measure.
ANOVA TABLE for one-way classification for equal sample size

Sources of Variation              Sum of Squares    Degrees of Freedom    Mean Square               Calculation of F
Between Samples (Column Means)    SSC               c – 1                 MSC = SSC/(c – 1)         F = MSC/MSE
Within Samples (Errors)           SSE               c(r – 1)              MSE = SSE/[c(r – 1)]
Total                             SST               cr – 1
STEP 4: Compare the calculated value of F with the table value of F for the degrees of
freedom at a certain critical level (generally we take the 5% level of significance). If the calculated
value of F is greater than the table value, it is concluded that the difference in sample means is
significant, i.e., it could not have arisen due to fluctuations of simple sampling, or in other
words, the samples do not come from the same population. On the other hand, if the calculated
value of F is less than the table value, the difference is said to be not significant and due to
fluctuations of simple sampling.
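Steps 1 to 3 can be followed literally in code. The following is a minimal sketch under the stated steps, not a definitive implementation: the function name `one_way_anova` and the dictionary keys are illustrative assumptions, and Step 4 (comparison with the tabulated F) is left to the reader. The data are the hourly production figures of Example 7.59.

```python
def one_way_anova(samples):
    """Compute the one-way ANOVA table entries for a list of samples (columns)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    sample_means = [sum(s) / len(s) for s in samples]

    # Step 1: squared deviation of each sample mean from the grand mean,
    # counted once for every observation in that sample.
    ssc = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, sample_means))

    # Step 2: squared deviations of the items from their own sample mean.
    sse = sum((x - m) ** 2 for s, m in zip(samples, sample_means) for x in s)

    df_between = k - 1          # number of samples minus one
    df_within = n_total - k     # number of items minus number of samples
    msc = ssc / df_between
    mse = sse / df_within

    # Step 3: variance ratio.
    return {"SSC": ssc, "SSE": sse, "df": (df_between, df_within),
            "MSC": msc, "MSE": mse, "F": msc / mse}


machines = [[20, 21, 23, 16, 20], [18, 20, 17, 25, 15], [25, 28, 22, 28, 32]]
result = one_way_anova(machines)
print(result["SSC"], result["SSE"], result["df"], round(result["F"], 2))
```

Using n_total – k for the within-sample degrees of freedom agrees with c(r – 1) when all samples have the same size r, and also covers samples of unequal size.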

7.12.3. Alternative Short-Cut Method for calculating SSB and SSW


The various steps of the short-cut method are given below:
(i) Calculate the total (i.e., T) of all the observations in all K samples.
Thus, $T = \sum X_1 + \sum X_2 + \dots + \sum X_K$
(ii) Find the correction factor by using the formula

$$\text{Correction Factor} = \frac{T^2}{N}, \quad \text{where } N \text{ is the number of observations.}$$

(iii) Calculate the sum of the squares of all the values (or observations) in the K samples and
subtract the correction factor $T^2/N$ from this sum. This result gives the total sum of the
squares of the deviations SST. Thus,

$$SST = \sum X_1^2 + \sum X_2^2 + \dots + \sum X_K^2 - \frac{T^2}{N}$$

(iv) Find the square of the sum of each sample and divide each such squared value by the
number of values in the corresponding sample; then calculate the total of all the
results thus obtained and subtract the correction factor from this total. This final result
gives the sum of the squares of deviations between the samples. Thus,

$$SSB = \left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_K)^2}{n_K}\right] - \frac{T^2}{N}$$

(v) Calculate SSW, i.e., the sum of the squares within the samples, by subtracting SSB from
SST. Thus, SSW = SST – SSB = SSE.
(vi) Calculate MSB and MSW, and then

$$F = \frac{MSB}{MSW} = \frac{MSC}{MSE}$$
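The short-cut steps (i) to (v) translate almost line for line into code. The sketch below is illustrative only; the function name `anova_shortcut` is an assumption, and the data are again those of Example 7.59 so that the results can be checked against the direct method.

```python
def anova_shortcut(samples):
    """Return (correction factor, SST, SSB, SSW) using totals and raw squares."""
    grand_total = sum(sum(s) for s in samples)                    # step (i): T
    n_total = sum(len(s) for s in samples)
    correction = grand_total ** 2 / n_total                       # step (ii): T^2 / N
    sst = sum(x * x for s in samples for x in s) - correction     # step (iii)
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - correction # step (iv)
    ssw = sst - ssb                                               # step (v)
    return correction, sst, ssb, ssw


machines = [[20, 21, 23, 16, 20], [18, 20, 17, 25, 15], [25, 28, 22, 28, 32]]
correction, sst, ssb, ssw = anova_shortcut(machines)
print(correction, sst, ssb, ssw)
```

The short-cut avoids computing any deviations from means, which is why it is convenient for hand calculation; numerically it gives the same sums of squares as the direct method.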

7.12.4. Technique of Analysis of Variance (Two-way Classification)


In two-way classification, we study the effects of two factors. The effect of one
factor is studied through the columns and that of the other through the rows. The two-way
classification ANOVA table is prepared to draw the inference. It is advisable to sketch the
ANOVA table first, then compute the required values and enter them in the table to draw the
final inference.
ANOVA TABLE for two-way classification

Sources of Variation    Sum of Squares    Degrees of Freedom    Mean Square                      Calculation of F
Between Columns         SSC               c – 1                 MSC = SSC/(c – 1)                F1 = MSC/MSE
Between Rows            SSR               r – 1                 MSR = SSR/(r – 1)                F2 = MSR/MSE
Errors (Residual)       SSE               (c – 1)(r – 1)        MSE = SSE/[(c – 1)(r – 1)]

where SSC = Sum of squares of columns; SSR = Sum of squares of rows; SSE = Sum of squares
of residual error; c = number of columns and r = number of rows.
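A two-way table with one observation per cell can be analysed with the correction-factor approach used in the worked examples. The sketch below assumes degrees of freedom c – 1 for columns, r – 1 for rows and (c – 1)(r – 1) for the residual, which is what the solved examples use; the function name `two_way_anova` is hypothetical. The data are the whiteness readings of Example 7.63.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell); `table` is a list of rows."""
    r = len(table)
    c = len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (r * c)

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    ssc = sum(t * t for t in col_totals) / r - correction   # each column has r items
    ssr = sum(t * t for t in row_totals) / c - correction   # each row has c items
    sst = sum(x * x for row in table for x in row) - correction
    sse = sst - ssc - ssr

    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {"SSC": ssc, "SSR": ssr, "SSE": sse, "SST": sst,
            "F_columns": msc / mse, "F_rows": msr / mse}


# Rows: cold, warm, hot water; columns: detergents A, B, C.
whiteness = [[57, 55, 67], [49, 52, 68], [54, 46, 58]]
res = two_way_anova(whiteness)
print({key: round(value, 2) for key, value in res.items()})
```

Each F ratio is then compared with the tabulated value for its own pair of degrees of freedom, (c – 1, (c – 1)(r – 1)) for columns and (r – 1, (c – 1)(r – 1)) for rows.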

SOLVED EXAMPLES

Example 7.59. A manufacturing company has purchased 3 new machines (A, B, C) of
different makes and wishes to determine whether one of them is faster than the others in
producing a certain item. Five hourly production figures are observed at random from
each machine and the results are given below:
A B C
20 18 25
21 20 28
23 17 22
16 25 28
20 15 32
Use analysis of variance to test whether machines differ significantly.
(Table value of F at 5% level for v1 = 2 and v2 = 12 is 3.89).
Solution. Let us take the hypothesis that machine do not differ significantly.
Using analysis of variance technique we have the following table.

Machine A Machine B Machine C


X1 X2 X3

20 18 25
21 20 28
23 17 22
16 25 28
20 15 32
Total: 100 Total: 95 Total: 135

$\bar{X}_1 = 20$    $\bar{X}_2 = 19$    $\bar{X}_3 = 27$

$$\text{Combined mean of all the samples } \bar{\bar{X}} = \frac{20+19+27}{3} = \frac{66}{3} = 22$$

Variance between samples
To obtain the variation between samples, calculate the squares of the deviations of the sample
means from the combined mean. The mean of machine A is 20 and the combined mean is 22, so we
take the difference between 20 and 22 and square it. Similarly, for machine B the mean is 19 and
the combined mean is 22, so we take the difference between 19 and 22 and square it, and
likewise for machine C. Thus we have the following table.

Machine A Machine B Machine C

( X1 − X ) 2 ( X2 − X )2 ( X3 − X )2

4 9 25
4 9 25
4 9 25
4 9 25
4 9 25
Total: 20 Total: 45 Total: 125

∴ Sum of squares between machines (SSC) = 20 + 45 + 125 = 190

Variance within samples


Here we find the sum of squares of the deviation of various items in a sample from the mean
values of respective samples. Thus for the first sample, the mean is 20 and so we will take
deviations of items from 20. For second sample, the mean is 19 and so we take deviations from
19 and so on. The squared deviations are given in the following table.

Machine A Machine B Machine C


X1 ( X1 − X1 ) 2 X2 ( X2 − X2 ) 2 X3 ( X3 − X3 ) 2

20 0 18 1 25 4
21 1 20 1 28 1
23 9 17 4 22 25
16 16 25 36 28 1
20 0 15 16 32 25
Total: 26 Total: 58 Total: 56

∴ Sum of squares within machines (SSE) = 26 + 58 + 56 = 140


All the above results can be tabulated as follows:
ANOVA TABLE

Sources of Variation              Sum of Squares    Degrees of Freedom    Mean Square              Calculation of F
Between Samples (Column Means)    190               2                     MSC = 190/2 = 95         F = MSC/MSE = 95/11.67 = 8.14
Within Samples (Errors)           140               12                    MSE = 140/12 = 11.67
Total                             330               14

As the calculated value of F = 8.14 > F0.05(2, 12), which is 3.89, H0 is rejected, i.e., the three
machines differ significantly.
Short-Cut Method for computation of SSC, SST, SSE.
Correction Factor, $C = \frac{T^2}{N}$, where T is the grand total of values in all of the samples and N
the total number of values.

$$C = \frac{330\times 330}{15} = 7260$$

∴ Sum of squares between machines (SSC)

$$= \frac{100^2+95^2+135^2}{5} - 7260 = \frac{37250}{5} - 7260 = 7450 - 7260 = 190$$

and Total Sum of squares of deviations (SST)

$$= \left(20^2+21^2+23^2+16^2+20^2+18^2+20^2+17^2+25^2+15^2+25^2+28^2+22^2+28^2+32^2\right) - 7260$$

= 7590 – 7260 = 330
∴ Sum of squares within machines (SSE) = SST – SSC = 330 – 190 = 140
The remaining solution is the same as in the first method.
Example 7.60. The following figures relate to the number of units sold in five different
areas by four salesmen.

Area Number of units


A B C D
I 80 100 95 70
II 82 110 90 75
III 88 105 100 82
IV 85 115 105 88
V 75 90 80 65
Is there a significant difference in the efficiency of these salesmen? Table value of F at 5%
level for ν1 = 3 and ν2 = 16 is 3.24.

Solution. Let us take the hypothesis that there is no significant difference in the performance
of the four salesmen.
Using analysis of variance technique we have the following table.

A B C D
X1 X2 X3 X4

80 100 95 70
82 110 90 75
88 105 100 82
85 115 105 88
75 90 80 65
Total: 410 Total: 520 Total: 470 Total: 380

$\bar{X}_1 = 82$    $\bar{X}_2 = 104$    $\bar{X}_3 = 94$    $\bar{X}_4 = 76$

$$\text{Combined mean of all the samples } \bar{\bar{X}} = \frac{82+104+94+76}{4} = \frac{356}{4} = 89$$
Variance between samples
We have the following table.

A B C D
( X1 − X ) 2 ( X2 − X )2 ( X3 − X )2 ( X4 − X )2

49 225 25 169
49 225 25 169
49 225 25 169
49 225 25 169
49 225 25 169

Total: 245 Total: 1125 Total: 125 Total: 845

∴ Sum of squares between samples (SSC) = 245 + 1125 + 125 + 845 = 2340

Variance within samples


The squared deviations are given in the following table.

A B C D

X1 ( X1 − X1 ) 2 X2 ( X2 − X2 ) 2 X3 ( X3 − X3 ) 2 X4 ( X4 − X 4 ) 2

80 4 100 16 95 1 70 36
82 0 110 36 90 16 75 1
88 36 105 1 100 36 82 36
85 9 115 121 105 121 88 144
75 49 90 196 80 196 65 121
Total: 98 Total: 370 Total: 370 Total: 338

∴ Sum of squares within samples (SSE) = 98 + 370 + 370 + 338 = 1176


All the above results can be tabulated as follows:
ANOVA TABLE

Sources of Variation              Sum of Squares    Degrees of Freedom    Mean Square              Calculation of F
Between Samples (Column Means)    2340              3                     MSC = 2340/3 = 780       F = MSC/MSE = 780/73.5 = 10.61
Within Samples (Errors)           1176              16                    MSE = 1176/16 = 73.5
Total                             3516              19

As the calculated value of F = 10.61 > F0.05(3, 16), which is 3.24, H0 is rejected, i.e., there
is a significant difference in the efficiency of the four salesmen.
Example 7.61. It is desired to compare three hospitals with regard to the number of
deaths per quarter. A sample of death records was selected from the records of each
hospital and the numbers of deaths were as given below. Do these data suggest a difference
in the number of deaths per quarter among the three hospitals?

Hospital I Hospital II Hospital III


8 7 12
10 5 9
7 10 13
14 9 12
11 9 14

Solution. Let us take the hypothesis that there is no significant difference in the number
of deaths per quarter among the three hospitals.
Using analysis of variance technique we have the following table.

Hospital I Hospital II Hospital III


X1 X2 X3
8 7 12
10 5 9
7 10 13
14 9 12
11 9 14
Total: 50 Total: 40 Total: 60

$\bar{X}_1 = 10$    $\bar{X}_2 = 8$    $\bar{X}_3 = 12$

$$\text{Combined mean of all the samples } \bar{\bar{X}} = \frac{10+8+12}{3} = \frac{30}{3} = 10$$
Variance between Samples
Thus we have the following table

Hospital I Hospital II Hospital III

( X1 − X ) 2 ( X2 − X )2 ( X3 − X )2

0 4 4
0 4 4
0 4 4
0 4 4
0 4 4

Total: 0 Total: 20 Total: 20


∴ Sum of squares between hospitals (SSC) = 0 + 20 + 20 = 40
Variance within samples
The squared deviations are given in the following table.

Hospital I Hospital II Hospital III

X1 ( X1 − X1 ) 2 X2 ( X2 − X2 ) 2 X3 ( X3 − X3 ) 2

8 4 7 1 12 0
10 0 5 9 9 9
7 9 10 4 13 1
14 16 9 1 12 0
11 1 9 1 14 4
Total: 30 Total: 16 Total: 14
∴ Sum of squares within hospitals (SSE) = 30 + 16 + 14 = 60
All the above results can be tabulated as follows:
ANOVA TABLE

Sources of Variation              Sum of Squares    Degrees of Freedom    Mean Square           Calculation of F
Between Samples (Column Means)    40                2                     MSC = 40/2 = 20       F = MSC/MSE = 20/5 = 4
Within Samples (Errors)           60                12                    MSE = 60/12 = 5
Total                             100               14

As the calculated value of F = 4 > F0.05(2, 12), which is 3.89, H0 is rejected, i.e., the
difference is significant and we conclude that the data suggest a difference in the number of
deaths per quarter among the three hospitals.
Example 7.62. Set up two-way ANOVA table for the following per hectare yield for 4
varieties of wheat on 3 plots:

Plot of Land Yield


A B C D
I 3 4 6 6
II 6 4 5 3
III 6 6 4 7
Solution. Using analysis of variance technique we have the following table.

Plot of Land Yield


A B C D Total
I 3 4 6 6 19
II 6 4 5 3 18
III 6 6 4 7 23

Total: 15 Total: 14 Total: 15 Total: 16 Total: 60

$$\text{Correction factor, } C = \frac{T^2}{N} = \frac{60^2}{12} = 300$$

Sum of squares between columns (SSC)

$$= \frac{15^2+14^2+15^2+16^2}{3} - 300 = 0.67$$

Sum of squares between rows (SSR)

$$= \frac{19^2+18^2+23^2}{4} - 300 = 3.5$$
300 L Probability and Statistics L
and Total Sum of squares of deviations (SST)
= (32 + 62 + 62 + 42 + 42 + 62 + 62 + 52 + 42 +
62 + 32 + 72) – 300
= 320 – 300 = 20
∴ Errors sum of squares (SSE) = SST – (SSR + SSC) = 15.83
All the above results can be tabulated as follows:
ANOVA Table for two-way classification

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square              Calculation of F
Between Columns       SSC = 0.67       3                    MSC = 0.67/3 = 0.22      F1 = 0.22/2.64 = 0.08
Between Rows          SSR = 3.50       2                    MSR = 3.50/2 = 1.75      F2 = 1.75/2.64 = 0.663
Errors (Residual)     SSE = 15.83      6                    MSE = 15.83/6 = 2.64


Example 7.63. To study the performance of three detergents and three water
temperatures, the following whiteness readings were obtained with specially designed
equipment.

Water Temperature Detergent A Detergent B Detergent C


Cold water 57 55 67
Warm water 49 52 68
Hot water 54 46 58

Perform a two way analysis using 5% level of significance (Given F at 5% = 6.94).


Solution. Let us take the hypothesis that
(i) there is no difference in whiteness due to three varieties of detergent
(ii) there is no difference in whiteness due to three different temperatures.
Using analysis of variance technique we have the following table.

Water Detergent
Temperature A B C Total
Cold water 57 55 67 179
Warm water 49 52 68 169
Hot water 54 46 58 158
Total: 160 Total: 153 Total: 193 Total: 506

Correction factor, $C = \dfrac{T^2}{N} = \dfrac{506^2}{9} = 28448.44$
Sum of squares between columns (SSC)
$= \dfrac{160^2 + 153^2 + 193^2}{3} - 28448.44 = \dfrac{86258}{3} - 28448.44 = 304.22$
Sum of squares between rows (SSR)
$= \dfrac{179^2 + 169^2 + 158^2}{3} - 28448.44 = \dfrac{85566}{3} - 28448.44 = 73.56$
and Total Sum of squares of deviations (SST)
$= (57^2 + 49^2 + 54^2 + 55^2 + 52^2 + 46^2 + 67^2 + 68^2 + 58^2) - 28448.44$
= 28888 – 28448.44 = 439.56
∴ Error sum of squares (SSE) = SST – (SSR + SSC) = 439.56 – (73.56 + 304.22) = 61.78
All the above results can be tabulated as follows:
ANOVA TABLE for two-way classification

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                 Calculation of F
Between Columns       SSC = 304.22     2                    MSC = 304.22/2 = 152.11     F1 = 152.11/15.44 = 9.85
Between Rows          SSR = 73.56      2                    MSR = 73.56/2 = 36.78       F2 = 36.78/15.44 = 2.38
Errors (Residual)     SSE = 61.78      4                    MSE = 61.78/4 = 15.44

(i) As the calculated value of F1 = 9.85 > F0.05(2, 4), which is 6.94, H0 is rejected, i.e., the
difference between the detergents is significant.
(ii) As the calculated value of F2 = 2.38 < F0.05(2, 4), which is 6.94, H0 is accepted, i.e., the
water temperature does not make a significant difference.
Example 7.64. The following data represent the number of units of a product produced by 3
different workers using 3 different types of machines.
Workers    Machine A    Machine B    Machine C
X          8            32           20
Y          28           36           38
Z          6            28           14
Test (i) whether the mean productivity is the same for the different machine types, and
(ii) whether the three workers differ with respect to mean productivity.
Solution. Let us take the hypothesis that
(i) the mean productivity is same for the three different machines
(ii) the three workers do not differ with regard to mean productivity.

Workers Machines
A B C Total

X 8 32 20 60
Y 28 36 38 102
Z 6 28 14 48

Total: 42 Total: 96 Total: 72 Total: 210


Using the analysis of variance technique on the table above, we have:
Correction factor, $C = \dfrac{T^2}{N} = \dfrac{210^2}{9} = 4900$
Sum of squares between columns (SSC)
$= \dfrac{42^2 + 96^2 + 72^2}{3} - 4900 = \dfrac{16164}{3} - 4900 = 488$
Sum of squares between rows (SSR)
$= \dfrac{60^2 + 102^2 + 48^2}{3} - 4900 = \dfrac{16308}{3} - 4900 = 536$
and Total Sum of squares of deviations (SST)
$= (8^2 + 28^2 + 6^2 + 32^2 + 36^2 + 28^2 + 20^2 + 38^2 + 14^2) - 4900$
= 6028 – 4900 = 1128
∴ Error sum of squares (SSE) = SST – (SSR + SSC) = 1128 – (536 + 488) = 104
All the above results can be tabulated as follows:
ANOVA TABLE for two-way classification

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square           Calculation of F
Between Columns       SSC = 488        2                    MSC = 488/2 = 244     F1 = 244/26 = 9.38
Between Rows          SSR = 536        2                    MSR = 536/2 = 268     F2 = 268/26 = 10.31
Errors (Residual)     SSE = 104        4                    MSE = 104/4 = 26
(i) As the calculated value of F1 = 9.38 > F0.05(2, 4), which is 6.94, H0 is rejected, i.e., the
mean productivity is not the same for the three different machines.
(ii) As the calculated value of F2 = 10.31 > F0.05(2, 4), which is 6.94, H0 is rejected, i.e., the
three workers differ with regard to mean productivity.
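
The correction-factor method used in the two-way examples can be sketched in code as follows. This is an illustrative plain-Python sketch, assuming one observation per cell; the function name `two_way_anova` and the dictionary keys are hypothetical labels chosen here, not notation from the text. It is applied to the worker and machine data of Example 7.64.

```python
def two_way_anova(table):
    """Two-way ANOVA with one observation per cell.

    table[i][j] is the observation in row i and column j.
    """
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                          # correction factor T^2 / N
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    ssr = sum(t ** 2 for t in row_totals) / c - cf     # between rows
    ssc = sum(t ** 2 for t in col_totals) / r - cf     # between columns
    sst = sum(x ** 2 for row in table for x in row) - cf
    sse = sst - ssr - ssc                              # residual
    mse = sse / ((r - 1) * (c - 1))
    return {
        "SSC": ssc, "SSR": ssr, "SST": sst, "SSE": sse,
        "F_columns": (ssc / (c - 1)) / mse,
        "F_rows": (ssr / (r - 1)) / mse,
    }


# Rows are workers X, Y, Z; columns are machines A, B, C.
units = [
    [8, 32, 20],
    [28, 36, 38],
    [6, 28, 14],
]
result = two_way_anova(units)
print(result)
```

Both F ratios are then compared with the tabulated value F0.05(2, 4) = 6.94.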

EXERCISE 7..11

1. Define analysis of variance. Give its assumptions.


2. What do you mean by analysis of variance, and what are its uses? How is an analysis of variance
table set up, and how is the test based on it performed?
3. Explain clearly the technique of analysis of variance for data with one way classification.
4. Indicate the usefulness of the study of analysis of variance in various fields of economic activities.
5. Discuss the fundamental principles of analysis of variance with special reference to the
assumptions made therein.
6. What is the relevance of analysis of variance in social sciences?
7. From the following data of four samples, calculate whether the four samples differ
significantly in variance.

I II III IV
9 13 19 14
11 12 13 10
13 10 17 13
9 15 7 17
8 5 9 16

[Ans. H0 accepted at 5% level]


8. To assess the significance of possible variation in performance between the grammar schools
of a city, a common test was given to 5 students taken at random from the
senior fifth form of each of the four schools concerned. The results are given below. Make
an analysis of variance of the data.

Schools
A B C D
8 12 18 13
10 11 12 9
12 9 16 12
8 14 6 16
7 4 8 15
[Ans. H0 accepted at 5% level]
9. Four salesmen were posted in different areas by a company. The number of units of
commodities X sold by them are as follows:

A 20 23 28 29
B 25 32 30 21
C 23 28 35 18
D 15 21 19 25
On the basis of this information, can it be concluded that there is a significant difference in
the performance of the four salesmen? (Given for ν1 = 3 and ν2 = 12, F0.05 = 3.49)
[Ans. H0 accepted at 5% level]
10. Bakewell Biscuits Pvt. Ltd. has launched a new brand in the four metros, Delhi, Mumbai,
Kolkata, and Chennai. After one month, the company realizes that there is a difference in
the retail price per pack of biscuits across cities. Before the launch, the company has
promised its employees and newly appointed retailers that the biscuits would be sold at a
uniform price in the country. The difference in the price can tarnish the image of the
company. In order to make a quick inference, the company collected data about the price
from six randomly selected stores across the four cities. Based on the sample information,
the price per pack of the biscuits (in rupees) is given below:

Delhi Mumbai Kolkata Chennai


22 19 18 21
22.5 19.5 17 20
21.5 19 18.5 21.5
22 20 17 20
22.5 19 18.5 21
21.5 21 17 20

Use one-way ANOVA to test whether the difference in the prices is significant. Take 5% as the
level of significance. [Ans. H0 rejected at 5% level]
11. A farmer applies three types of fertilizers on four separate plots. The figures on yield per
acre are tabulated below:

Fertilizers Yield
A B C D
Nitrogen 6 4 8 6
Potash 7 6 6 9
Phosphates 8 5 10 9

Find out if the plots are materially different in fertility, as also, if the three fertilizers make
any material difference in yields. [Ans. H0 accepted at 5% level]
12. A certain company has four salesmen A, B, C and D, each of whom was sent for a month to
three districts: countryside area ‘K’, outskirts of a city ‘O’ and shopping centre of a city ‘S’.
The sales in Rs. are given below:

Districts Salesmen
A B C D

K 30 70 30 30
O 80 50 40 70
S 100 60 80 80

Calculate relevant F ratio by carrying out an analysis of variance (two-way classification).


[Ans. H0 accepted at 5% level]
13. In a certain factory, production can be accomplished by four different workers on five
different types of machines. A simple study, in the context of a two-way design without repeated
values, is being made with the twofold objective of examining whether the four workers differ
with respect to mean productivity and whether the mean productivity is the same for the
five different machines. The researcher involved in this study reports the following while
analyzing the gathered data:
(i) Sum of squares for variance between machine = 35.2
(ii) Sum of squares for variance between workmen = 53.8
(iii) Sum of squares for total variance = 174.2
Set up ANOVA table for the given information and draw the inference about variance at 5%
level of significance. [Ans. H0 accepted at 5% level for productivity of machines as well as workers]
14. Four different drugs have been developed for certain disease. These drugs are used in 3
different hospitals and the results given below show the number of cases of recovery from
the disease per 100 people who have taken the drugs.

Hospitals Drugs
A B C D

I 19 8 23 8
II 10 9 12 6
III 11 13 13 10

What conclusions can you draw? [Ans. H0 accepted at 5% level]


15. A tea company appoints four salesmen A, B, C and D and observes their sales in three
seasons–summer, winter and monsoon. The figures (in lakhs) are given in the following
table:

Seasons Salesmen
A B C D
Summer 36 36 21 35
Winter 28 29 31 32
Monsoon 26 28 29 29

Carry out an analysis of variance and test whether there is any significant difference in the
salesmen and in the seasons, so far as sales are concerned.
[Ans. H 0 accepted at 5% level for sales of salesmen as well as season]

7.13. CHI-SQUARE TEST (χ²–TEST)
If ‘x’ is normally distributed with mean µ and standard deviation σ, then $z = \dfrac{x - \mu}{\sigma}$ is a
standard normal variate with mean 0 and variance 1, and $z^2 = \left(\dfrac{x - \mu}{\sigma}\right)^2$ is a Chi-square variate
with 1 degree of freedom. A Chi-square variate is denoted by the symbol $\chi^2$.

If x1, x2, x3, ..., xn are n independent normal variates with means µ1, µ2, µ3, ..., µn and
standard deviations σ1, σ2, σ3, ..., σn respectively, then
$$\chi^2 = \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 + \left(\frac{x_3 - \mu_3}{\sigma_3}\right)^2 + \dots + \left(\frac{x_n - \mu_n}{\sigma_n}\right)^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$
is a Chi-square variate with n degrees of freedom.
It is computed on the basis of frequencies in a sample and is applied only for qualitative
data such as intelligence, colour, immunity, health, response to drug, etc.
If a random variable X has a chi-square distribution with n degrees of freedom, we write
$X \sim \chi^2(n)$ and its probability density function is given by:
$$f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, e^{-x/2}\, x^{n/2 - 1}; \quad 0 \le x < \infty$$

where n is a parameter of the distribution which is a positive integer, also indicated as degrees
of freedom.
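
As a rough numerical illustration of this density, the sketch below evaluates f(x) with Python's standard `math` module and uses a simple midpoint rule to check that, for n = 4, the area under the curve is close to 1 and the mean is close to n. The function name `chi_square_pdf`, the step size and the upper integration limit are assumptions made only for this sketch.

```python
import math


def chi_square_pdf(x, n):
    """Chi-square density with n degrees of freedom, evaluated at x > 0."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if x <= 0:
        # The endpoint x = 0 is left out of this sketch to avoid 0 ** negative.
        raise ValueError("this sketch evaluates the density only for x > 0")
    return math.exp(-x / 2) * x ** (n / 2 - 1) / (2 ** (n / 2) * math.gamma(n / 2))


# Midpoint-rule check for n = 4 over 0 < x < 60 (the tail beyond 60 is negligible).
n, h = 4, 0.01
mids = [(k + 0.5) * h for k in range(6000)]
area = sum(chi_square_pdf(x, n) for x in mids) * h
mean = sum(x * chi_square_pdf(x, n) for x in mids) * h
print(round(area, 3), round(mean, 3))
```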
7.14. CHARACTERISTICS OF CHI-SQUARE (χ²) DISTRIBUTION
1. The chi-square distribution is positively skewed, and the value of $\chi^2$ is never negative.
2. The tabulated values of $\chi^2$ increase with the increase in degrees of freedom.
3. The standard deviation of the $\chi^2$ distribution is equal to $\sqrt{2\nu}$, where ν is the degree of
freedom.
4. The mean of the distribution is equal to the number of degrees of freedom.
5. The value of $\chi^2$ lies between zero and infinity, i.e., $0 \le \chi^2 < \infty$.
6. The sum of two independent $\chi^2$ variates is again a $\chi^2$ variate, i.e., if $\chi_1^2$ and $\chi_2^2$ are two
independent $\chi^2$ variates with n1 and n2 degrees of freedom
respectively, then $\chi_1^2 + \chi_2^2$ is also a $\chi^2$ variate with (n1 + n2) degrees of freedom.
7. For different degrees of freedom, the shape of the curve will be different.

Fig. 7.4. Chi-square curves for 1, 2, 3, 4, 5 and 6 d.o.f.
8. Chi-square (χ²) is a statistic and not a parameter.
7.15. USES OF CHI-SQUARE (χ²)
The $\chi^2$ test is a very powerful test for testing hypotheses in a number of statistical
problems. The important uses of the $\chi^2$ test are:

1. To test the discrepancy between observed and expected frequencies.

The quantity $\chi^2$ describes the magnitude of the discrepancy between theory and observation.
The chi-square test enables us to determine the degree of deviation between observed
(experimental) frequencies and expected (theoretical) frequencies, and to conclude whether
that deviation can be attributed to chance (error of sampling) or is too large to be
explained by chance alone.

2. To determine the association between two or more attributes.


With the help of the $\chi^2$ test we can find out whether two or more attributes are associated
or not. For example, suppose a researcher brought male and female participants into the
lab and asked them which color they prefer blue or green. The researcher believes that
color preference may be related to gender. Notice that both gender (male, female) and
color preference (blue, green) are categorical variables. If there is an association between
gender and color preference, we would expect that the proportion of men who prefer
blue would be different than the proportion of women who prefer blue. To determine
if an association exists between gender and color preference, the chi-square test computes
the distributions across the combination of your two factors that you would expect if
there were no association between them.

7.16. CONDITIONS FOR APPLYING CHI-SQUARE (χ²) TEST
1. Every observation of the sample for this test should be independent of all other
observations.
2. The expected frequency of any item should not be less than 5
3. The total number of observations used for the test should be large i.e., n ≥ 50.
4. Chi-square is wholly dependent on degree of freedom
5. This test is used only for drawing inferences by testing hypothesis. It cannot be used for
estimation of parameter or any other value.
6. The frequencies used in $\chi^2$ should be absolute, not relative (i.e., not percentages or proportions).
7. The observations collected for the $\chi^2$ test should be based on random sampling.

7.17. DEGREE OF FREEDOM


Case I: If the data is given in the form of a series of variables in a row or column, then
the degrees of freedom = number of items in the series – 1, i.e., v = n – 1, where n is the number
of variables in the series in a row or column.
Case II: For a contingency table (table which is essentially a display format used to analyze
and record the relationship between two or more categorical variables) with r rows and c
columns, the degrees of freedom (v) = (r – 1) (c – 1).

7.18. CHI SQUARE TEST OF GOODNESS OF FIT


This test was given by Karl Pearson in 1900 for testing the significance of the discrepancy
between theory and experiment. χ2 test provides a platform that can be used to ascertain whether
theoretical probability distributions coincide with the empirical sample distributions.
If $O_i$ (i = 1, 2, ..., n) is a set of observed (experimental) frequencies and $E_i$ (i = 1, 2, ..., n)
is the corresponding set of expected (theoretical or hypothetical) frequencies, then $\chi^2$ is
defined as
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where $\sum O_i = \sum E_i = N$ (total frequency) and degrees of freedom = v = n – 1.
Let the null hypothesis H0 be that there is no significant difference between the observed
(i.e. experimental) values and the corresponding expected (or theoretical) values. If the calculated
value of $\chi^2$ is less than the tabulated value of $\chi^2$ at 5% level of significance, the fit is considered
to be good, i.e., the divergence between actual and expected frequencies is attributed to fluctuations
of simple sampling. If the calculated value of $\chi^2$ is greater than the tabulated value, the fit is
considered to be poor.
Remark
1. If $\chi^2 = 0$, the observed and theoretical frequencies agree exactly.
2. If $\chi^2 > 0$, they do not agree exactly.

SOLVED EXAMPLES

Example 7.65. The following table gives the number of aircraft accidents that occurred
during the various days of the week. Find whether the accidents are uniformly distributed
over the week. (Given that value of χ2 at 5% level of significance for 6 d.f. is 12.59)
Days Sun. Mon. Tue. Wed. Thu. Fri. Sat. Total
No. of accidents 14 16 8 12 11 9 14 84
Solution. Taking the hypothesis that the accidents are uniformly distributed over the week,
the expected frequency for each day
$= \dfrac{\text{Total Frequency}}{\text{No. of Days}} = \dfrac{84}{7} = 12$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = observed frequency and Ei = expected frequency.
In this case,
$$\chi^2 = \frac{(14-12)^2}{12} + \frac{(16-12)^2}{12} + \frac{(8-12)^2}{12} + \frac{(12-12)^2}{12} + \frac{(11-12)^2}{12} + \frac{(9-12)^2}{12} + \frac{(14-12)^2}{12}$$
$$= \frac{1}{12}(4 + 16 + 16 + 0 + 1 + 9 + 4) = \frac{50}{12} = 4.17.$$
As the calculated value of $\chi^2 = 4.17 < \chi^2_{6,\,0.05}$, which is 12.59, H0 is accepted, i.e., the
accidents are uniformly distributed over the week.
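
The goodness-of-fit statistic for a uniform hypothesis can be sketched in a few lines of plain Python. The helper name `chi_square_statistic` is illustrative; the observed counts and the critical value 12.59 are those given in Example 7.65. With these counts the statistic is 50/12, about 4.17.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


accidents = [14, 16, 8, 12, 11, 9, 14]                  # Sun. to Sat.
expected = [sum(accidents) / len(accidents)] * len(accidents)
chi2 = chi_square_statistic(accidents, expected)
critical = 12.59                                         # given for 6 d.f. at 5% level
print(round(chi2, 2), "accept H0" if chi2 < critical else "reject H0")
```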


Example 7.66. 200 digits were chosen at random from a set of tables. The frequencies
of the digits were:
Digit 0 1 2 3 4 5 6 7 8 9 Total
Frequency 18 19 23 21 16 25 22 20 21 15 200
Use the Chi-square test to assess the correctness of the hypothesis that the digits were
distributed in equal numbers in the tables from which these were chosen. ($\chi^2$ for 9 d.f.
at 5% level = 16.9)
Solution. Taking the hypothesis that the digits were distributed in equal numbers in the
tables from which they were chosen, the expected frequency for each digit
$= \dfrac{\text{Total Frequency}}{\text{No. of Digits}} = \dfrac{200}{10} = 20$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = observed frequency and Ei = expected frequency.
In this case,
$$\chi^2 = \frac{(18-20)^2}{20} + \frac{(19-20)^2}{20} + \frac{(23-20)^2}{20} + \frac{(21-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(21-20)^2}{20} + \frac{(15-20)^2}{20}$$
$$= \frac{1}{20}(4 + 1 + 9 + 1 + 16 + 25 + 4 + 0 + 1 + 25) = \frac{86}{20} = 4.3.$$
As the calculated value of $\chi^2 = 4.3 < \chi^2_{9,\,0.05}$, which is 16.9, H0 is accepted, i.e., the hypothesis
that the digits are distributed in equal numbers is not contradicted by the data.
Example 7.67. A die is thrown 276 times and the results of these throws are given
below:
No. appeared on the die 1 2 3 4 5 6
Frequency 40 32 29 59 57 59
Test whether the die is biased or not. ($\chi^2$ for 5 d.f. at 5% level = 11.07).
Solution. Taking the hypothesis that the die is unbiased, the expected frequency for each
face
$= \dfrac{\text{Total Frequency}}{\text{Number of Faces of the Die}} = \dfrac{276}{6} = 46$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = observed frequency and Ei = expected frequency.
In this case,
$$\chi^2 = \frac{(40-46)^2}{46} + \frac{(32-46)^2}{46} + \frac{(29-46)^2}{46} + \frac{(59-46)^2}{46} + \frac{(57-46)^2}{46} + \frac{(59-46)^2}{46}$$
$$= \frac{1}{46}(36 + 196 + 289 + 169 + 121 + 169) = \frac{980}{46} = 21.30.$$
As the calculated value of $\chi^2 = 21.30 > \chi^2_{5,\,0.05}$, which is 11.07, H0 is rejected, i.e., the
die is biased.


Example 7.68. The theory predicts that the proportion of beans in the four groups A,
B, C and D should be 11:4:3:2. In an experiment it was observed that the numbers in the four
groups A, B, C and D are 1070, 430, 330, 170. Does the result of the experiment support
the theory? ($\chi^2$ for 3 d.f. at 5% level = 7.815).
Solution. Taking the hypothesis that the result of the experiment fits the theory well, i.e.,
there is no significant difference between the observed and theoretical frequencies.
The expected frequencies (E) can be computed as follows:
Total number of beans = 1070 + 430 + 330 + 170 = 2000
OA = Observed number of beans in group A = 1070
EA = Expected number of beans in group A $= \dfrac{11}{20} \times 2000 = 1100$
Similarly, OB = 430 ⇒ $E_B = \dfrac{4}{20} \times 2000 = 400$
OC = 330 ⇒ $E_C = \dfrac{3}{20} \times 2000 = 300$
and OD = 170 ⇒ $E_D = \dfrac{2}{20} \times 2000 = 200$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = Observed frequency and Ei = Expected frequency.
In this case,
$$\chi^2 = \frac{(1070-1100)^2}{1100} + \frac{(430-400)^2}{400} + \frac{(330-300)^2}{300} + \frac{(170-200)^2}{200}$$
= 0.8182 + 2.25 + 3 + 4.5 = 10.5682.
As the calculated value of $\chi^2 = 10.5682 > \chi^2_{3,\,0.05}$, which is 7.815, H0 is rejected, i.e., the
experiment does not support the theory.
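
When the hypothesis specifies a ratio rather than equal frequencies, the expected counts are the total split in that ratio. A minimal plain-Python sketch follows, using the 11:4:3:2 ratio and the observed counts of Example 7.68; the function name `expected_from_ratio` is an illustrative choice, not notation from the text.

```python
def expected_from_ratio(ratio, total):
    """Split `total` into expected frequencies proportional to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [1070, 430, 330, 170]                         # groups A, B, C, D
expected = expected_from_ratio([11, 4, 3, 2], sum(observed))
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected, round(chi2, 4))
```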


Example 7.69. Honda Motors Ltd. conducted a field survey covering 200 respondents
on ‘how important the satisfaction level of a particular brand is to them’. The
results of the survey are:
Very satisfactory: 50 respondents
Satisfactory: 60 respondents
Neither satisfactory nor dissatisfactory: 20 respondents
Dissatisfactory: 40 respondents
Very dissatisfactory: 30 respondents
Is there any significant difference in importance rating for satisfaction level among the
respondents at 1% level of significance? (Given that $\chi^2$ for 4 d.f. at 1% level = 13.277).
Solution. Taking the hypothesis that there is no significant difference in importance among
the respondents, the expected number of respondents
$= \dfrac{\text{Total Respondents}}{\text{No. of Attributes}} = \dfrac{200}{5} = 40$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = Observed frequency and Ei = Expected frequency.
In this case,
$$\chi^2 = \frac{(50-40)^2}{40} + \frac{(60-40)^2}{40} + \frac{(20-40)^2}{40} + \frac{(40-40)^2}{40} + \frac{(30-40)^2}{40}$$
$$= \frac{1}{40}(100 + 400 + 400 + 0 + 100) = \frac{1000}{40} = 25.$$
As the calculated value of $\chi^2 = 25 > \chi^2_{4,\,0.01}$, which is 13.277, H0 is rejected, i.e., there is
a significant difference in importance among the respondents.


Example 7.70. The manager of a theatre complex with four theatres wanted to see
whether there was difference in popularity of the four movies currently showing for
Saturday afternoon with the following results: 86, 77, 84, 81 customers viewed movies 1,
2, 3 and 4 respectively. Complete the test to see whether there is a difference at the 5%
level of significance.
Solution. Taking the hypothesis that the four movies are equally popular, the expected
frequency of customers for each movie
$= \dfrac{\text{Total Customers}}{\text{Total Number of Theatres}} = \dfrac{86 + 77 + 84 + 81}{4} = \dfrac{328}{4} = 82$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = Observed frequency and Ei = Expected frequency.
In this case,
$$\chi^2 = \frac{(86-82)^2}{82} + \frac{(77-82)^2}{82} + \frac{(84-82)^2}{82} + \frac{(81-82)^2}{82}$$
$$= \frac{1}{82}(16 + 25 + 4 + 1) = \frac{46}{82} = 0.561.$$
As the calculated value of $\chi^2 = 0.561 < \chi^2_{3,\,0.05}$, which is 7.815, H0 is accepted, i.e., there
is insufficient evidence to conclude that the four movies differ in popularity. The data are
therefore consistent with the four movies being equally popular.
Example 7.71. Records taken of the number of male and female births in 800 families
having four children are as follows:
No. of male births      0     1     2     3     4
No. of female births    4     3     2     1     0
No. of families         32    178   290   236   64
Test whether the data are consistent with the hypothesis that the binomial law holds
and the chance of male birth is equal to that of female birth, namely $p = q = \frac{1}{2}$.
Solution. Taking the hypothesis that male and female births are equally probable, i.e.,
$p = q = \frac{1}{2}$, where p is the probability of a male birth and q is the probability of a female birth. The
expected number of families is obtained from the expansion
$$f(r) = N \cdot {}^{n}C_r\, p^r q^{\,n-r}$$
where N is the total frequency and f(r) is the number of families with r male children.
∴ $f(0) = 800 \cdot {}^{4}C_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^{4-0} = 50$;
$f(1) = 800 \cdot {}^{4}C_1 \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^{4-1} = 200$
$f(2) = 800 \cdot {}^{4}C_2 \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{4-2} = 300$;
$f(3) = 800 \cdot {}^{4}C_3 \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^{4-3} = 200$
$f(4) = 800 \cdot {}^{4}C_4 \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^{4-4} = 50$
$$\therefore \chi^2 = \frac{(32-50)^2}{50} + \frac{(178-200)^2}{200} + \frac{(290-300)^2}{300} + \frac{(236-200)^2}{200} + \frac{(64-50)^2}{50}$$
= (6.48 + 2.42 + 0.333 + 6.48 + 3.92) = 19.633.
As the calculated value of $\chi^2 = 19.633 > \chi^2_{4,\,0.05}$, which is 9.488, H0 is rejected, i.e., the data
are not consistent with the hypothesis that the binomial law holds with equal chances of
male and female births.
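
The expected frequencies under the binomial hypothesis can be generated directly from the formula f(r) = N · nCr · p^r · q^(n−r). The sketch below is a plain-Python illustration (it relies on `math.comb`, available from Python 3.8); the function name `binomial_expected` is a hypothetical label chosen here.

```python
from math import comb


def binomial_expected(total, n, p):
    """Expected frequencies N * C(n, r) * p^r * q^(n - r) for r = 0..n."""
    q = 1 - p
    return [total * comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]


# 800 families with four children each, p = q = 1/2
expected = binomial_expected(800, 4, 0.5)
print(expected)
```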
Example 7.72. The following is the distribution of the hourly number of trucks arriving
at a company’s warehouse.
Trucks arriving per hour 0 1 2 3 4 5 6 7 8
Frequency 52 151 130 102 45 12 5 1 2
Fit a Poisson distribution and test for goodness of fit at the 5% level of significance.
Solution. Taking the hypothesis that the Poisson distribution is a good fit to the data, the mean of
the Poisson distribution is given by
$$m = \frac{\sum fx}{\sum f} = \frac{0 \times 52 + 1 \times 151 + 2 \times 130 + 3 \times 102 + 4 \times 45 + 5 \times 12 + 6 \times 5 + 7 \times 1 + 8 \times 2}{52 + 151 + 130 + 102 + 45 + 12 + 5 + 1 + 2} = \frac{1010}{500} = 2.02.$$
By the Poisson distribution, the expected frequency of r arrivals is
$$f(r) = N \times \frac{e^{-m}\, m^r}{r!}$$
where N is the total frequency and f(r) is the expected number of hours in which r trucks arrive.
∴ $f(0) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^0}{0!} = 500 \times 0.132 = 66$
$f(1) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^1}{1!} = 500 \times 0.132 \times 2.02 = 133.3 \approx 134$
$f(2) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^2}{2!} = 134.6 \approx 135$
$f(3) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^3}{3!} = 90.63 \approx 91$
$f(4) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^4}{4!} = 45.77 \approx 46$
$f(5) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^5}{5!} = 18.5 \approx 19$
$f(6) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^6}{6!} = 6.23 \approx 6$
$f(7) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^7}{7!} = 1.79 \approx 2$
$f(8) = 500 \times e^{-2.02} \cdot \dfrac{(2.02)^8}{8!} = 0.45 \approx 1$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$ where Oi = Observed frequency and Ei = Expected frequency.
The expected frequencies are rounded and adjusted so that they total 500. Since each
expected frequency should be greater than or equal to 10, the last four classes (5, 6, 7 and 8
trucks per hour) are pooled, giving an observed frequency of 12 + 5 + 1 + 2 = 20 and an expected
frequency of 19 + 6 + 2 + 1 = 28. There are two linear constraints, one for the total and the other
for calculating m from the observed values, so the degrees of freedom are 6 – 2 = 4.
In this case,
$$\chi^2 = \frac{(52-66)^2}{66} + \frac{(151-134)^2}{134} + \frac{(130-135)^2}{135} + \frac{(102-91)^2}{91} + \frac{(45-46)^2}{46} + \frac{(20-28)^2}{28}$$
= 2.9696 + 2.1567 + 0.1851 + 1.3296 + 0.0217 + 2.2857 = 8.9484.
As the calculated value of $\chi^2 = 8.9484 < \chi^2_{4,\,0.05}$, which is 9.488, H0 is accepted, i.e., the
Poisson distribution is a good fit to the given data.
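
The Poisson fit above can be reproduced approximately in code. The following plain-Python sketch estimates m from the data and evaluates the expected frequencies with the exact value of e^(−m), so the unrounded figures differ slightly from the hand computation, which uses e^(−2.02) ≈ 0.132. The χ² value is then recomputed from the rounded, pooled frequencies used in the worked solution. Function and variable names are illustrative.

```python
import math


def poisson_expected(total, mean, max_r):
    """Expected frequencies N * e^(-m) * m^r / r! for r = 0..max_r."""
    return [total * math.exp(-mean) * mean ** r / math.factorial(r)
            for r in range(max_r + 1)]


freq = [52, 151, 130, 102, 45, 12, 5, 1, 2]      # hours with 0..8 trucks
m = sum(r * f for r, f in enumerate(freq)) / sum(freq)
expected = poisson_expected(sum(freq), m, len(freq) - 1)
print(m, [round(e, 2) for e in expected])

# Chi-square from the rounded, pooled frequencies used in the worked solution
observed_pooled = [52, 151, 130, 102, 45, 20]
expected_pooled = [66, 134, 135, 91, 46, 28]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_pooled, expected_pooled))
print(round(chi2, 4))
```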

7.19. CHI SQUARE TEST AS A TEST OF INDEPENDENCE


In many business situations, a market researcher might be interested in understanding the
relationship between two variables or to check whether they are independent of each other.
Suppose we have ‘n’ observations classified according to some attributes. We may test whether
the attributes are related or independent. Thus we can find out whether there is any
association between marriage and failure, between the eye color of husband and wife, or whether quinine
is effective in controlling fever, etc. In order to test whether the attributes are associated
or not, we take the null hypothesis that “no association exists between the attributes” under
study or, in other words, that the attributes are independent. The sample data are set out in a two-way
table, called a contingency table. The expected frequency for any cell in a contingency table may be
calculated by using the formula:
$$\text{Expected frequency} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
i.e., $E_{ij} = \dfrac{R_i \times C_j}{n}$
where $R_i$ = total of the row in which $E_{ij}$ lies
$C_j$ = total of the column in which $E_{ij}$ lies
n = total sample size
If the calculated value of χ2 is less than the table value at a certain level of significance,
then we say that there is no association between the attributes, i.e., they are independent
otherwise, if the calculated value of χ2 is more than the table value of χ2, then we say that there
is an association between the attributes, i.e., they are not independent.
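
The expected-frequency rule (row total × column total ÷ grand total) can be sketched as a small helper. The function name `expected_frequencies` and the 2 × 2 counts below are hypothetical, chosen only to illustrate the rule; they do not come from the text.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts (illustration only)
observed = [
    [20, 30],
    [30, 20],
]
expected = expected_frequencies(observed)
print(expected)
```

Because every row and column total in this illustration is 50 out of 100, each expected cell count is 25.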

SOLVED EXAMPLES

Example 7.73. From the following table regarding the colour of eyes of father and
son, test if the colour of the son’s eyes is associated with that of the father’s.

                                     Eye colour in son
                                     Not light    Light
Eye colour in father   Not light     50           89
                       Light         79           782
(Given $\chi^2_{0.05}$ for 1 d.f. = 3.84).
Solution. Let us take the hypothesis that there is no association between the colour of the
father’s and that of the son’s eyes i.e., the two attributes are independent.
The above information can be arranged in the form of a 2 × 2 contingency table as follows:
Observed Frequencies (O)
Eye colour in son
Not light Light Total
Eye Colour in father Not light 50 89 139
Light 79 782 861
Total 129 871 1000
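Expected frequency for each cell has been calculated by using the formula
$= \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
∴ $E_{11} = \dfrac{139 \times 129}{1000} \approx 18$
$E_{12} = \dfrac{139 \times 871}{1000} \approx 121$
$E_{21} = \dfrac{129 \times 861}{1000} \approx 111$
$E_{22} = \dfrac{861 \times 871}{1000} \approx 750$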

Expected Frequencies (E)
                                     Eye colour in son
                                     Not light    Light    Total
Eye colour in father   Not light     18           121      139
                       Light         111          750      861
                       Total         129          871      1000

$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = Observed frequency and Ei = Expected frequency.
In this case,
$$\chi^2 = \frac{(50-18)^2}{18} + \frac{(79-111)^2}{111} + \frac{(89-121)^2}{121} + \frac{(782-750)^2}{750}$$
= 57.0 + 9.2 + 8.5 + 1.4 = 76.1
No. of degrees of freedom (ν) = (r – 1) (c – 1) = (2 – 1) (2 – 1) = 1.
As the calculated value of $\chi^2 = 76.1 > \chi^2_{1,\,0.05}$, which is 3.84, H0 is rejected, i.e., we conclude
that the colour of the son’s eyes and the colour of the father’s eyes are not independent, i.e., there is an
association between the colour of the father’s and that of the son’s eyes.

Example 7.74. A certain drug is claimed to be effective in curing colds. In an
experiment on 328 people with colds, half of them were given the drug and half of them
were given sugar pills. The patients’ reactions to the treatment are recorded in the following
table. Test the hypothesis that the drug is no better than sugar pills for curing colds.

               Helped    Harmed    No effect
Drug           104       20        40
Sugar pills    88        24        52
(Given $\chi^2_{0.05}$ for 2 d.f. = 5.99).
Solution. Let us take the hypothesis that drug is no better than sugar pills for curing colds
i.e., the two attributes are independent.
The above information can be arranged in the form of a 2 × 3 contingency table as follows :
Observed frequencies (O)

Helped Harmed No effect Total


Drug 104 20 40 164
Sugar Pills 88 24 52 164
Total 192 44 92 328
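Expected frequency for each cell has been calculated by using the formula
$= \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
∴ $E_{11} = \dfrac{164 \times 192}{328} = 96$, $E_{12} = \dfrac{164 \times 44}{328} = 22$, $E_{13} = \dfrac{164 \times 92}{328} = 46$
$E_{21} = \dfrac{164 \times 192}{328} = 96$, $E_{22} = \dfrac{164 \times 44}{328} = 22$, $E_{23} = \dfrac{164 \times 92}{328} = 46$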
Expected frequencies (E)
Helped Harmed No effect Total
Drug 96 22 46 164
Sugar Pills 96 22 46 164

Total 192 44 92 328

$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where Oi = Observed frequency and Ei = Expected frequency.
In this case,
$$\chi^2 = \frac{(104-96)^2}{96} + \frac{(20-22)^2}{22} + \frac{(40-46)^2}{46} + \frac{(88-96)^2}{96} + \frac{(24-22)^2}{22} + \frac{(52-46)^2}{46}$$
= 0.667 + 0.182 + 0.783 + 0.667 + 0.182 + 0.783 = 3.264.
No. of degrees of freedom (ν) = (r – 1) (c – 1) = (2 – 1)(3 – 1) = 2.
As the calculated value of $\chi^2 = 3.264 < \chi^2_{2,\,0.05}$, which is 5.99, H0 is accepted, i.e., we
conclude that the result of the experiment does not provide any evidence against the hypothesis.
The data therefore give no evidence that the drug is better than sugar pills in curing colds.
Example 7.75. In an experiment on immunization of cattle from tuberculosis the
following results were obtained:
Affected Unaffected
Inoculated 12 28
Not inoculated 13 7

Examine the effect of vaccine in controlling the incidence of the disease.


Solution. Let us take the hypothesis that inoculation is not effective in preventing
tuberculosis, i.e., the vaccine has no effect on the disease.
The above information can be arranged in the form of a 2 × 2 contingency table as follows:
Observed frequencies (O)
Affected Unaffected Total
Inoculated 12 28 40
Not inoculated 13 7 20
Total 25 35 60
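Expected frequency for each cell has been calculated by using the formula
$= \dfrac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$
∴ $E_{11} = \dfrac{40 \times 25}{60} \approx 17$
$E_{12} = \dfrac{40 \times 35}{60} \approx 23$
$E_{21} = \dfrac{20 \times 25}{60} \approx 8$
$E_{22} = \dfrac{20 \times 35}{60} \approx 12$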
Expected frequencies (E)
Affected Unaffected Total
Inoculated 17 23 40
Not inoculated 8 12 20
Total 25 35 60
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where Oi = Observed frequency and Ei = Expected frequency.
In this case,
$$\chi^2 = \frac{(12-17)^2}{17} + \frac{(28-23)^2}{23} + \frac{(13-8)^2}{8} + \frac{(7-12)^2}{12}$$
= 1.47 + 1.087 + 3.125 + 2.083 = 7.765.
No. of degrees of freedom (ν) = (r – 1)(c – 1) = (2 – 1) (2 – 1) = 1.
As the calculated value of $\chi^2 = 7.765 > \chi^2_{1,\,0.05}$, which is 3.84, H0 is rejected, i.e., we conclude
that the result of the experiment does not support the hypothesis. Therefore, inoculation is
effective in preventing the disease.
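
The full test of independence can be sketched in plain Python as below, applied to the cattle data of Example 7.75. Note one difference from the hand computation: the sketch keeps the expected frequencies unrounded (16.67, 23.33, 8.33, 11.67), so it gives a χ² of about 6.72 rather than the 7.765 obtained with expected frequencies rounded to whole numbers. Both values exceed 3.84, so the conclusion is the same. The function name `chi_square_independence` is illustrative.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # unrounded
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


cattle = [
    [12, 28],   # inoculated: affected, unaffected
    [13, 7],    # not inoculated: affected, unaffected
]
chi2, dof = chi_square_independence(cattle)
print(round(chi2, 3), dof)
```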
Example 7.76. Prove that the value of $\chi^2$ for the 2 × 2 contingency table
a    b
c    d
is given by
$$\chi^2 = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}, \quad \text{where } N = a + b + c + d.$$
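Solution. Using the hypothesis of independence of attributes, we have
Observed Frequencies (O)
(written here with the rows and columns of the given table interchanged, which leaves $\chi^2$ unchanged)
        a        c        a + c
        b        d        b + d
        a + b    c + d    N = a + b + c + d
Expected frequency for each cell has been calculated by using the formula
\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]
\[ \therefore\; E_{11} = \frac{(a+c)(a+b)}{N}, \qquad E_{12} = \frac{(a+c)(c+d)}{N}, \]
\[ E_{21} = \frac{(b+d)(a+b)}{N}, \qquad E_{22} = \frac{(b+d)(c+d)}{N}. \]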
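\[ \text{Now, } a - E_{11} = a - \frac{(a+c)(a+b)}{N} = \frac{a(a+b+c+d) - (a+c)(a+b)}{N} = \frac{ad - bc}{N} \]
\[ \therefore\; (a - E_{11})^2 = \frac{1}{N^2}(ad - bc)^2. \]
Similarly, since $E_{12}$ is the expected frequency for the cell c and $E_{21}$ for the cell b, we can verify that
\[ (c - E_{12})^2 = \frac{1}{N^2}(ad - bc)^2, \qquad (b - E_{21})^2 = \frac{1}{N^2}(ad - bc)^2, \qquad (d - E_{22})^2 = \frac{1}{N^2}(ad - bc)^2. \]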
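\[ \therefore\; \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \]
where $O_i$ = Observed frequency and $E_i$ = Expected frequency.

In this case,
\[ \chi^2 = \frac{(ad - bc)^2}{N^2}\left[\frac{1}{E_{11}} + \frac{1}{E_{12}} + \frac{1}{E_{21}} + \frac{1}{E_{22}}\right] \]
\[ = \frac{(ad - bc)^2}{N}\left[\left\{\frac{1}{(a+b)(c+a)} + \frac{1}{(a+b)(b+d)}\right\} + \left\{\frac{1}{(a+c)(c+d)} + \frac{1}{(b+d)(c+d)}\right\}\right] \]
\[ = \frac{(ad - bc)^2}{N}\left[\frac{b+d+a+c}{(a+b)(c+a)(b+d)} + \frac{b+d+a+c}{(a+c)(b+d)(c+d)}\right]. \]
Since b + d + a + c = N, the factor N cancels, and
\[ \chi^2 = (ad - bc)^2\left[\frac{(c+d) + (a+b)}{(a+b)(c+d)(a+c)(b+d)}\right] = \frac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}. \]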
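As a numerical sanity check on the identity of Example 7.76, the sketch below compares the shortcut formula with the cell-by-cell sum for one table. Both function names are hypothetical, and the figures are the observed frequencies of Example 7.75; a single agreeing case illustrates the result but is of course no substitute for the proof.

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """N (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_2x2_cellwise(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells, with unrounded E."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    cells = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            total += (cells[i][j] - e) ** 2 / e
    return total


shortcut = chi_square_2x2_shortcut(12, 28, 13, 7)
cellwise = chi_square_2x2_cellwise(12, 28, 13, 7)
print(shortcut, cellwise)  # the two values agree up to floating-point error
```

The shortcut avoids computing expected frequencies at all, which also avoids the rounding that inflates the hand-computed value in Example 7.75.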
Example 7.77. The following table gives for a sample of married women, the level of
education and the marriage adjustment score:
Marriage Adjustment
Level of Education Very Low Low High Very High Total
College 24 97 62 58 241
High school 22 28 30 41 121
Middle school 32 10 11 20 73
Total 78 135 103 119 435

Can you conclude from the above data that the higher the level of education, the
greater is the degree of adjustment in marriage?
Solution. Let us take the hypothesis that there is no association between the level of
education and adjustment in marriage i.e., the two attributes are independent.
The above information can be arranged in the form of a 3 × 4 contingency table as follows :
Observed Frequencies (O)

Marriage Adjustment

Level of Education Very Low Low High Very High Total


College 24 97 62 58 241
High school 22 28 30 41 121
Middle school 32 10 11 20 73

Total 78 135 103 119 435


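Expected frequency for each cell has been calculated by using the formula
\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]
\[ \therefore\; E_{11} = \frac{241 \times 78}{435} \approx 43, \quad E_{12} = \frac{241 \times 135}{435} \approx 75, \quad E_{13} = \frac{241 \times 103}{435} \approx 57, \quad E_{14} = \frac{241 \times 119}{435} \approx 66 \]
\[ E_{21} = \frac{121 \times 78}{435} \approx 22, \quad E_{22} = \frac{121 \times 135}{435} \approx 37, \quad E_{23} = \frac{121 \times 103}{435} \approx 29, \quad E_{24} = \frac{121 \times 119}{435} \approx 33 \]
\[ E_{31} = \frac{73 \times 78}{435} \approx 13, \quad E_{32} = \frac{73 \times 135}{435} \approx 23, \quad E_{33} = \frac{73 \times 103}{435} \approx 17, \quad E_{34} = \frac{73 \times 119}{435} \approx 20 \]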
Expected Frequencies (E)
Marriage Adjustment

Level of Education Very Low Low High Very High Total


College 43 75 57 66 241
High school 22 37 29 33 121
Middle school 13 23 17 20 73

Total 78 135 103 119 435


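\[ \therefore\; \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \]
where $O_i$ = Observed frequency and $E_i$ = Expected frequency.
In this case,
\[ \chi^2 = \frac{(24-43)^2}{43} + \frac{(97-75)^2}{75} + \frac{(62-57)^2}{57} + \frac{(58-66)^2}{66} + \frac{(22-22)^2}{22} + \frac{(28-37)^2}{37} \]
\[ + \frac{(30-29)^2}{29} + \frac{(41-33)^2}{33} + \frac{(32-13)^2}{13} + \frac{(10-23)^2}{23} + \frac{(11-17)^2}{17} + \frac{(20-20)^2}{20} \]
= 8.40 + 6.45 + 0.44 + 0.97 + 0 + 2.19 + 0.03 + 1.94 + 27.77 + 7.35 + 2.12 + 0
= 57.66.
No. of degrees of freedom (ν) = (r – 1)(c – 1) = (3 – 1)(4 – 1) = 6
As the calculated value of $\chi^2 = 57.66 > \chi^2_{6,\,0.05}$, which is 12.59, H0 is rejected, i.e.,
adjustment in marriage is associated with the level of education. Taken with the observed
pattern (24 college-educated women in the Very Low class against an expected 43, and 32
middle-school-educated women against an expected 13), we conclude that the higher the level of
education, the greater is the degree of adjustment in marriage.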
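A significant $\chi^2$ only says that the attributes are associated; to see where the association comes from, the individual terms $(O - E)^2 / E$ can be ranked. The following is a minimal sketch using the rounded expected frequencies of Example 7.77; the variable names are illustrative.

```python
row_labels = ["College", "High school", "Middle school"]
col_labels = ["Very Low", "Low", "High", "Very High"]
observed = [[24, 97, 62, 58], [22, 28, 30, 41], [32, 10, 11, 20]]
# Rounded expected frequencies, as used in the worked solution.
expected = [[43, 75, 57, 66], [22, 37, 29, 33], [13, 23, 17, 20]]

contributions = []
for i, row in enumerate(row_labels):
    for j, col in enumerate(col_labels):
        o, e = observed[i][j], expected[i][j]
        contributions.append(((o - e) ** 2 / e, row, col, o - e))

# Largest contributions first.
contributions.sort(reverse=True)
total = sum(term for term, *_ in contributions)

for term, row, col, diff in contributions[:3]:
    print(f"{row:13s} {col:9s} O - E = {diff:+d}  contribution = {term:.2f}")
print(f"chi-square = {total:.2f}")
```

The largest term comes from the Middle school / Very Low cell, followed by College / Very Low. The total differs from 57.66 in the second decimal place because the hand calculation rounds each term before adding.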
Example 7.78. In a sample survey of public opinion, answers to the questions
(i) Do you drink?
(ii) Are you in favour of local option on sale of liquor?
are tabulated below:

                          Question (i)
                          Yes     No     Total
Question (ii)   Yes        56     31       87
                No         18      6       24
                Total      74     37      111
Can you infer whether or not opinion on the local option on the sale of liquor depends on
whether the individual drinks?
Solution. Let us take the hypothesis that opinion on the local option on the sale of liquor is
independent of (not associated with) individual drinking.
The above information can be arranged in the form of a 2 × 2 contingency table as follows :
Observed frequencies (O)

                          Question (i)
                          Yes     No     Total
Question (ii)   Yes        56     31       87
                No         18      6       24
                Total      74     37      111
Expected frequency for each cell has been calculated by using the formula:
\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]
\[ \therefore\; E_{11} = \frac{87 \times 74}{111} = 58, \qquad E_{12} = \frac{87 \times 37}{111} = 29, \]
\[ E_{21} = \frac{24 \times 74}{111} = 16, \qquad E_{22} = \frac{24 \times 37}{111} = 8. \]
Expected frequencies (E)

                          Question (i)
                          Yes     No     Total
Question (ii)   Yes        58     29       87
                No         16      8       24
                Total      74     37      111

\[ \therefore\; \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \]
where $O_i$ = Observed frequency and $E_i$ = Expected frequency.

In this case,
\[ \chi^2 = \frac{(56-58)^2}{58} + \frac{(18-16)^2}{16} + \frac{(31-29)^2}{29} + \frac{(6-8)^2}{8} \]
= 0.0689 + 0.25 + 0.1379 + 0.5 = 0.9568.
No. of degrees of freedom (ν) = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1.
As the calculated value of $\chi^2 = 0.9568 < \chi^2_{1,\,0.05}$, which is 3.84, H0 is accepted, i.e., we
conclude that opinion on the sale of liquor is independent of (not associated with) individual drinking.
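For 1 degree of freedom the comparison with the table value can also be expressed as a probability. A $\chi^2$ variable with 1 d.f. is the square of a standard normal variable, so its upper-tail probability is $\operatorname{erfc}\!\left(\sqrt{\chi^2/2}\right)$. The sketch below uses only Python's standard library; the function name is an assumption made for illustration.

```python
import math


def chi2_p_value_1df(chi2):
    """Upper-tail probability P(X >= chi2) for a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# The 5% table value for 1 d.f. should give a tail probability close to 0.05.
print(round(chi2_p_value_1df(3.84), 3))
# Calculated value in Example 7.78: the tail probability is far above 0.05.
print(round(chi2_p_value_1df(0.9568), 3))
```

A tail probability above 0.05 corresponds to a calculated value below 3.84, which is the situation in Example 7.78 where H0 is accepted.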
Example 7.79. A movie producer is bringing out a new movie. In order to map out
his advertising campaign he wants to determine whether the movie will appeal most to a
particular age group or whether it will appeal equally to all age groups. The producer
takes a random sample from persons attending a preview of the movie, and obtains the
following results. Use the $\chi^2$ test to derive the conclusion.
                                    Age group
                      Under 20   20-39   40-59   60 and above
Liked the movie          320       80     110        200
Disliked the movie        50       15      70         60
Indifferent               30        5      20         40
(Given $\chi^2_{0.05}$ for 6 d.f. = 12.59)

Solution. Let us take the hypothesis that the new movie appeals equally to people of
different age groups.
The above information can be arranged in the form of a 3 × 4 contingency table as follows :
Observed Frequencies (O)
                                    Age group
                      Under 20   20-39   40-59   60 & above   Total
Liked the movie          320       80     110       200         710
Disliked the movie        50       15      70        60         195
Indifferent               30        5      20        40          95
Total                    400      100     200       300        1000
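Expected frequency for each cell has been calculated by using the formula :
\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} \]
\[ \therefore\; E_{11} = \frac{710 \times 400}{1000} = 284, \quad E_{12} = \frac{710 \times 100}{1000} = 71, \quad E_{13} = \frac{710 \times 200}{1000} = 142, \quad E_{14} = \frac{710 \times 300}{1000} = 213 \]
\[ E_{21} = \frac{195 \times 400}{1000} = 78, \quad E_{22} = \frac{195 \times 100}{1000} = 19.5, \quad E_{23} = \frac{195 \times 200}{1000} = 39, \quad E_{24} = \frac{195 \times 300}{1000} = 58.5 \]
\[ E_{31} = \frac{95 \times 400}{1000} = 38, \quad E_{32} = \frac{95 \times 100}{1000} = 9.5, \quad E_{33} = \frac{95 \times 200}{1000} = 19, \quad E_{34} = \frac{95 \times 300}{1000} = 28.5 \]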
Expected Frequencies (E)
Age group
Under 20 20-39 40-59 60 & above Total
Liked the movie 284 71 142 213 710
Disliked the movie 78 19.5 39 58.5 195
Indifferent 38 9.5 19 28.5 95
Total 400 100 200 300 1000
\[ \therefore\; \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \]
where $O_i$ = Observed frequency and $E_i$ = Expected frequency.
In this case,
\[ \chi^2 = \frac{(320-284)^2}{284} + \frac{(80-71)^2}{71} + \frac{(110-142)^2}{142} + \frac{(200-213)^2}{213} + \frac{(50-78)^2}{78} + \frac{(15-19.5)^2}{19.5} \]
\[ + \frac{(70-39)^2}{39} + \frac{(60-58.5)^2}{58.5} + \frac{(30-38)^2}{38} + \frac{(5-9.5)^2}{9.5} + \frac{(20-19)^2}{19} + \frac{(40-28.5)^2}{28.5} \]
= 4.5634 + 1.1408 + 7.2113 + 0.7934 + 10.0513 + 1.0385 + 24.6410 + 0.0385
+ 1.6842 + 2.1316 + 0.0526 + 4.6404
= 57.987.
No. of degrees of freedom (ν) = (r – 1)(c – 1) = (3 – 1)(4 – 1) = 6.
As the calculated value of $\chi^2 = 57.987 > \chi^2_{6,\,0.05}$, which is 12.59, H0 is rejected, i.e., we
conclude that the new movie does not appeal equally to people of different age groups.
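When the number of degrees of freedom is even, say ν = 2k, the upper-tail probability of $\chi^2$ has the closed form $e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$ with $x$ the calculated value. The sketch below applies this standard result to the 6 d.f. of Example 7.79; the function name is hypothetical and the formula does not cover odd ν.

```python
import math


def chi2_p_value_even_df(chi2, dof):
    """Upper-tail probability of chi-square for an even number of d.f."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form needs a positive even number of d.f.")
    half = chi2 / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, dof // 2):
        term *= half / j    # builds (x/2)^j / j! from the previous term
        total += term
    return math.exp(-half) * total


# Table value for 6 d.f. at the 5% level: tail probability close to 0.05.
print(chi2_p_value_even_df(12.59, 6))
# Calculated value in Example 7.79: tail probability is negligibly small.
print(chi2_p_value_even_df(57.987, 6))
```

The second probability is many orders of magnitude below 0.05, which matches the rejection of H0 by a wide margin.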
Example 7.80. From the following data find whether there is any significant difference in
the liking for soft drinks among the categories of employees.

Employees
Soft Drinks Clerks Teachers Officers Total
Pepsi 10 25 65 100
Thumps Up 15 30 65 110
Fanta 50 60 30 140
Total 75 115 160 350

[Use $\chi^2_{4,\,0.05} = 9.488$]

Solution.
Null hypothesis H0 : There is no significant difference among the categories of
employees in their liking for soft drinks.
Alternative hypothesis H1 : There is a significant difference among the categories of employees
in their liking for soft drinks.
Level of significance : α = 0.05, degrees of freedom = (3 – 1) × (3 – 1) = 4 d.f.
\[ \chi^2_{4}(0.05) = 9.488 \]
The test statistic is
\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]

Now the expected cell frequencies are
\[ E_{11} = \frac{100 \times 75}{350} = 21.4; \qquad E_{12} = \frac{100 \times 115}{350} = 32.9 \]
\[ E_{13} = \frac{100 \times 160}{350} = 45.7; \qquad E_{21} = \frac{110 \times 75}{350} = 23.6 \]
\[ E_{22} = \frac{110 \times 115}{350} = 36.1; \qquad E_{23} = \frac{110 \times 160}{350} = 50.3 \]
or we can take 110 – (23.6 + 36.1) = 50.3
\[ E_{31} = \frac{140 \times 75}{350} = 30, \qquad E_{32} = \frac{140 \times 115}{350} = 46, \qquad E_{33} = \frac{140 \times 160}{350} = 64 \]
or we can take 140 – (30 + 46) = 64

Now the test statistic,
\[ \begin{aligned} \chi^2 = {} & \frac{(10-21.4)^2}{21.4} + \frac{(25-32.9)^2}{32.9} + \frac{(65-45.7)^2}{45.7} + \frac{(15-23.6)^2}{23.6} + \frac{(30-36.1)^2}{36.1} \\ & + \frac{(65-50.3)^2}{50.3} + \frac{(50-30)^2}{30} + \frac{(60-46)^2}{46} + \frac{(30-64)^2}{64} \end{aligned} \]
= 60.23 > 9.488
Therefore the calculated value of $\chi^2$ is greater than the table value of $\chi^2$, so we reject the null hypothesis.
Hence we conclude that there is a significant difference among the categories of employees
in their liking for soft drinks.
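The remark "or we can take 110 – (23.6 + 36.1) = 50.3" reflects why the table has (r – 1)(c – 1) degrees of freedom: once that many expected frequencies are fixed, the row and column totals determine the rest. A minimal sketch of this idea, with a hypothetical function name and unrounded values:

```python
def expected_by_subtraction(row_totals, col_totals):
    """Fill an r x c expected table using the product formula for only
    (r - 1)(c - 1) cells and subtraction from the totals for the rest."""
    n = sum(row_totals)
    r, c = len(row_totals), len(col_totals)
    expected = [[0.0] * c for _ in range(r)]
    # Free cells: row total x column total / grand total.
    for i in range(r - 1):
        for j in range(c - 1):
            expected[i][j] = row_totals[i] * col_totals[j] / n
    # Last column of the first r - 1 rows: row total minus the rest of the row.
    for i in range(r - 1):
        expected[i][c - 1] = row_totals[i] - sum(expected[i][:c - 1])
    # Last row: column total minus the rest of the column.
    for j in range(c):
        expected[r - 1][j] = col_totals[j] - sum(expected[i][j] for i in range(r - 1))
    return expected


# Totals of Example 7.80: rows Pepsi, Thumps Up, Fanta; columns Clerks, Teachers, Officers.
table = expected_by_subtraction([100, 110, 140], [75, 115, 160])
for row in table:
    print([round(e, 1) for e in row])
```

Here only four cells are computed from the product formula, matching the 4 d.f. used in the example; the remaining five follow from the totals.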

EXERCISE 7.12
1. The demand for a particular spare part in a factory was found to vary from day to day as
given below. Test the hypothesis that number of parts demanded does not depend on the
day of the week.
Days Mon. Tue. Wed. Thu. Fri. Sat.
No. of parts demanded 124 125 110 120 126 115
[Ans. $\chi^2 = 1.68$; ν = 5; the demand does not depend on the day of the week]
2. The following table shows the distribution of digits in the numbers chosen at random from
a telephone directory:
Digits 0 1 2 3 4 5 6 7 8 9 Total
Frequency 1026 1107 997 966 1075 933 1107 972 964 853 10000
Test whether the digits may be taken to occur equally frequently in the directory.
[Ans. $\chi^2 = 58.542$; ν = 9; the digits do not occur uniformly in the directory]
3. A die is thrown 264 times with the following results:


No. appeared on the die 1 2 3 4 5 6
Frequency 40 32 28 58 54 60
Show that the die is biased. (Given that $\chi^2$ for 5 d.f. at 5% level is 11.07).
[Ans. $\chi^2 = 22$; ν = 5; the die is biased]
4. The following data give the number of aircraft accidents that occurred during the various
days of a week:

Days               Mon.   Tue.   Wed.   Thu.   Fri.   Sat.
No. of accidents    15     19     13     12     16     15

Test whether the accidents are uniformly distributed over the week.
[Ans. $\chi^2 = 2.0$; ν = 5; as 2.0 is less than the table value 11.07 at 5% level, the accidents may be regarded to occur uniformly over the week]
5. A sample analysis of examination results of 500 students was made. It was found that 220
students had failed, 170 had scored a third class, 90 were placed in second class and 20 got
a first class. Are these figures commensurate with the general examination result, which is in
the ratio of 4 : 3 : 2 : 1 for the various categories respectively?
[Ans. $\chi^2 = 23.67$; ν = 3; the figures are not commensurate with the general examination results]
6. A company is concerned about the increasing violent altercations between its employees.
The number of violent incidents recorded by the management during six randomly selected
months is given below:

Months                     Jan.   Feb.   Mar.   Apr.   May   Jun.
No. of violent incidents    55     65     68     72     80    85

Use 5% level of significance to determine whether the data fit a uniform distribution.
[Ans. $\chi^2 = 8.17$; ν = 5; the no. of violent altercations is uniformly distributed over the months]
7. A survey of 320 families with 5 children revealed the following distribution:

No. of boys 0 1 2 3 4 5

No. of girls 5 4 3 2 1 0

No. of families 12 40 88 110 56 14

Is this result consistent with the hypothesis that male and female births are equally probable?
[Ans. $\chi^2 = 7.16$; ν = 5; the male and female births are equally probable]
8. The theory predicts that the proportion of beans in the four groups A, B, C and D should be
9:3:3:1. In an experiment it was observed that the number of four groups A, B, C and D are
882, 313, 287 and 118. Does the experiment result support the theory?
(Given that $\chi^2$ for 3 d.f. at 5% level is 7.815)
[Ans. $\chi^2 = 4.7266$; ν = 3; the experimental result supports the theory]
9. A book has 700 pages. The number of pages with various numbers of misprints is recorded
below. At 5% significant level are the misprints distributed according to Poisson law?

No. of misprints (X) 0 1 2 3 4 5

No. of pages with X misprints 616 70 10 2 1 1

[Ans. $\chi^2 = 38.812$; ν = 2; the misprints are not distributed according to Poisson law]
10. Twelve dice were thrown 4096 times and a throw of 6 was considered a success. The
observed frequencies were as given below:
No. of successes 0 1 2 3 4 5 6 7 and over
Frequency 447 1145 1180 796 380 115 25 8
Test whether the dice were unbiased.
[Ans. $\chi^2 = 3.76$; ν = 5; the dice were unbiased]
11. Fit a binomial distribution for the following data and also test the goodness of fit.

x 0 1 2 3 4 5 6 Total

f 5 18 28 12 7 6 4 80

[Ans. $\chi^2 = 6.39$; ν = 2; the binomial fit for the given distribution is not satisfactory]
12. Fit a Poisson distribution for the following distribution and also test the goodness of fit.

x 0 1 2 3 4 5 Total

f 142 156 69 27 5 1 400


[Ans. $\chi^2 = 1.09$; ν = 2; the Poisson fit for the given distribution is satisfactory]
13. In experiments on pea-breeding, Mendel got the following frequencies of seeds :
315 round and yellow, 101 wrinkled and yellow, 108 round and green, 32 wrinkled and
green : total 556. Theory predicts that the frequencies should be in the proportion 9:3:3:1.
Examine the correspondence between theory and experiment. ($\chi^2$ for 3 d.f. at 5% level = 7.815).
[Ans. $\chi^2 = 0.513$; ν = 3; there is correspondence between theory and experiment]
14. The following data show defective articles produced by 4 machines:

Machine A B C D

Production time 1 1 2 3

No. of defectives 12 30 63 98

Do the figures indicate a significant difference in the performance of the machines?


[Ans. $\chi^2 = 11.82$; ν = 3; there is significant difference in the performance of machines]
15. The following table gives the classification of 100 workers according to sex, and the nature
of work. Test whether nature of work is independent of the sex of the worker.

Skilled Unskilled Total


Male 40 20 60
Female 10 30 40
Total 50 50 100

(Given that $\chi^2$ for 1 d.f. at 5% level is 3.84)
[Ans. $\chi^2 = 16.67$; ν = 1; nature of work is dependent on the sex of the worker]


16. In an experiment on immunization of cattle from tuberculosis the following results were
obtained:
Affected Unaffected
Inoculated 12 26
Not inoculated 16 8
Examine the effect of vaccine in controlling the susceptibility to the disease. (Given that $\chi^2$
for 1 d.f. at 5% level is 3.84)
[Ans. $\chi^2 = 9.478$; ν = 1; vaccine is effective in controlling the susceptibility to tuberculosis]
17. Calculate the expected frequencies for the following data presuming the two attributes viz.,
condition of home and condition of child as independent
Condition of home

Clean Dirty
Condition of child Clean 70 50
Fairly clean 80 20
Dirty 35 45
Use Chi-square test at 5% level of significance to state whether the two attributes are
independent. (Given that $\chi^2$ for 2 d.f. at 5% level is 5.99)
[Ans. $\chi^2 = 25.636$; ν = 2; there exists association between the attributes]
18. The following table shows the results of inoculation against cholera in certain tea-estate:
Not attacked Attacked Total
Inoculated 469 31 500
Not inoculated 1315 185 1500
Total 1784 216 2000

Find out whether there is any significant association between inoculation and attack. (Given
that $\chi^2$ for 1 d.f. at 5% level is 3.84)
[Ans. $\chi^2 = 14.54$; ν = 1; inoculation is effective in preventing the attack of cholera]


19. An insurance company has introduced a new insurance scheme for employees. Independent
samples of 100 males and 120 females, when examined to know their views about the new
scheme, gave the following results:

             For    Against    Indifferent
Male          25       40          35
Female        35       55          30

Test the hypothesis at 1% level.
[Ans. $\chi^2 = 2.198$; ν = 2; the samples come from a homogeneous population]
20. The employment bureau located in a city received 200 applications in the month of June,
2011 for registration. A tabular presentation of the applications according to sex and level
of education was found to be as under :

Sex

Male Female

Undergraduates 30 10
Graduates 70 20
Postgraduates 20 50

Do these data provide adequate evidence to indicate that the level of education is related
to sex? Use 5% level of significance.
[Ans. $\chi^2 = 44.4$; ν = 2; level of education is related to sex]
21. A survey of radio listeners’ preference for two types of music under various age groups
gave the following information.

Age group

Type of music 19-25 26-35 Above 36

Carnatic music 80 60 90

Film music 210 325 44

Indifferent 16 45 132

Is preference for type of music influenced by age?


[Ans. $\chi^2 = 373.40$; ν = 4; preference for type of music is influenced by age]
22. A certain drug was administered to 456 males out of a total 720 in a certain locality to test
its efficacy against typhoid. The incidence of typhoid is shown below. Find out the
effectiveness of the drug against the disease.

Infection No infection Total

Administering the drug 144 312 456

Without administering the drug 192 72 264

Total 336 384 720

[Ans. $\chi^2 = 113.745$; ν = 1; the drug is certainly effective in controlling typhoid]


23. A marketing agency gives you the following information about the age-groups of the sample
informants and their liking for a particular model of car which a company plans to introduce:

Age group of Informants

Below 20 20-39 40-59 Total


Liked 125 420 60 605
Disliked 75 220 100 395

Total 200 640 160 1000


On the basis of above data, can it be concluded that the model appeal is independent of
the age-group of the informants?
[Ans. $\chi^2 = 42.788$; ν = 2; model appeal is not independent of the age-group of the informants]

24. On the basis of information given below about the treatment of 200 patients suffering from
a disease, state whether the new treatment is comparatively superior to the conventional
treatment

Favourable Not favourable Total


New 60 30 90
Conventional 40 70 110

Total 100 100 200


[Ans. $\chi^2 = 18.18$; ν = 1; new treatment is superior to conventional treatment]
25. Given the following contingency table for hair colour and eye colour, find the value of $\chi^2$.
Is there good association between the two?

Hair Colour

Fair Brown Black Total

Blue 15 5 20 40

Eye Colour Gray 20 10 20 50

Brown 25 15 20 60

Total 60 30 60 150

[Ans. $\chi^2 = 3.646$; ν = 4; hair colour and eye colour are independent]
26. The Vice President (sales) of a garment company wants to determine whether a sale of the
company’s brand of jeans is independent of the age group. He has appointed a marketing
researcher for this purpose. This marketing researcher has taken a random sample of 703
consumers who have purchased jeans. The researcher conducted survey for three brands of
the jeans, namely, Brand I, Brand II and Brand III. The researcher has also divided the age
group into four groups as shown below:

Brand → Brand I Brand II Brand III Total

Age

15-25 65 75 72 212

26-35 60 40 64 164

36-45 45 52 50 147

46-55 55 65 60 180

Total 225 232 246 703

Determine whether brand preference is independent of age group at 5% level of significance.
[Ans. $\chi^2 = 7.23$; ν = 6; brand preference is independent of age group]
27. A large electronic firm that hires many workers with disabilities wants to determine whether
their disabilities affect such workers’ performance. Use the level of significance at 5% level
to decide on the basis of the following data, whether it is reasonable to maintain that the
disabilities have no effect on the workers’ performance.

Above Average Average Below Average

Blind 21 64 17

Deaf 16 49 14

No disability 29 93 28

[Ans. $\chi^2 = 0.18$; ν = 4; disabilities have no effect on the workers’ performance]
