Hypothesis Testing II
CHAPTER
Hypothesis Testing
INSIDE THIS CHAPTER
7.1. Sampling Distributions
7.2. Standard Error
7.3. Test of Significance
7.4. Testing of Statistical Hypothesis
7.5. One-Tailed and Two-Tailed Tests
7.6. Errors in Sampling
7.7. Test of Significance for Large Samples
7.8. Test of Significance for Small Samples
7.9. The ‘t’ Distribution or Student’s ‘t’ Distribution
7.10. Application of the t-Distribution
7.11. Variance Ratio or F-Test
7.12. Analysis of Variance
7.13. Chi-Square Test (χ²-Test)
7.14. Characteristics of Chi-Square (χ²) Distribution
7.15. Uses of Chi-Square (χ²)
7.16. Conditions for Applying Chi-Square (χ²) Test
7.17. Degree of Freedom
7.18. Chi-Square Test of Goodness of Fit
7.19. Chi-Square Test as a Test of Independence
$SE_p = \sqrt{\frac{P(1 - P)}{n}}$, if P is known,
where P = population proportion, n = sample size
[Figure: curve centred at µ = µ0, with the acceptance region (probability 1 – α, H0 is accepted) lying between the critical values –z_{α/2} and +z_{α/2}.]
Fig. 7.1. Acceptance and rejection regions of null hypothesis (two-tailed test)
L Hypothesis Testing L 237
In other words, all possible values which a test statistic may assume can be divided into two
mutually exclusive groups: one group consisting of values which appear to be consistent with the
null hypothesis and the other having values which are unlikely to occur if H0 is true. The first
group is called the acceptance region and the second set of values is known as the rejection region
for a test. The rejection region is also called the critical region. The value(s) that separate the critical
region from the acceptance region are called the critical value(s). The critical value, which can be in
the same units as the parameter or in standardized units, is decided by the experimenter
keeping in view the degree of confidence he or she is willing to have in the null hypothesis.
Step 5: Decision Rule
The last step is the decision about the null hypothesis, i.e., whether to accept it or reject it.
In this regard, we compare the calculated value of the test statistic (which was found in step 2)
with the critical value (also called the standard table value of the test statistic, as obtained in
step 4) at level of significance α and decide as under:
(a) If we test the hypothesis at, say, 5% level of significance and the observed set of results
has a probability of more than 5%, we consider the difference between the sample
statistic and the hypothesized population parameter as not significant. In other words, if the
calculated value of the statistic is less than the tabled value at a specified level of significance α,
then the difference is not significant and may be due to fluctuations of
sampling. So we accept the null hypothesis and reject the alternative hypothesis.
(b) On the other hand, if the observed set of results has a probability of less than 5%, we
consider the difference between the sample statistic and the hypothesized population
parameter as significant. In other words, if the calculated value of the statistic is more than
the tabled value at a specified level of significance (say, 5%), the computed value of the test
statistic falls in the rejection region. So we reject the null hypothesis and accept the
alternative hypothesis.
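The decision rule above can be sketched in Python for the large-sample case. This is only a minimal sketch, assuming the test statistic follows the standard normal distribution under H0; the function names critical_value and decide are illustrative and do not come from the text.

```python
from statistics import NormalDist


def critical_value(alpha, tail="two"):
    """Standard normal critical value for the given level and tail."""
    std = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std.inv_cdf(1 - alpha)
    if tail == "left":
        return std.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


def decide(z, alpha=0.05, tail="two"):
    """Compare a calculated Z with the critical value and report the decision."""
    c = critical_value(alpha, tail)
    if tail == "two":
        reject = abs(z) > c
    elif tail == "right":
        reject = z > c
    else:  # left-tailed: the rejection region lies below a negative critical value
        reject = z < c
    return "reject H0" if reject else "accept H0"


print(round(critical_value(0.05), 2))           # 1.96
print(round(critical_value(0.05, "right"), 3))  # 1.645
print(decide(2.67))                             # reject H0
```

The three tail options mirror the two-tailed, right-tailed and left-tailed tests used in the solved examples of this chapter.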
$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
H0 : There is no significant difference between the sample mean $\bar{x}$ and the population mean µ.
Test statistic :
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
If σ² is unknown, then it is estimated by the sample variance, i.e., $\hat{\sigma}^2 = s^2$ (large n).
SOLVED EXAMPLES
Example 7.1. A random sample of 900 members has a mean 3.4 cms. Can it be
reasonably regarded as a sample from a large population of mean 3.25 cms and standard
deviation 2.61 cms?
Solution. Here n = 900, $\bar{x}$ = 3.4, µ = 3.25, σ = 2.61
H0 : The sample has been drawn from the normal population with mean µ = 3.25 and standard
deviation σ = 2.61.
H1 : µ ≠ 3.25 (two tailed test)
Under H0, $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.4 - 3.25}{2.61/\sqrt{900}} = 1.724$
As the calculated value of | Z | = 1.724 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., the sample can be regarded as drawn from the normal population with mean
µ = 3.25 and standard deviation σ = 2.61.
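The arithmetic of Example 7.1 can be checked with a short Python sketch. The helper names one_sample_z and two_sided_p_value are hypothetical, and the p-value is an addition for illustration, computed on the assumption that Z is standard normal under H0.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(xbar, mu0, sd, n):
    """Z statistic for a sample mean: (xbar - mu0) / (sd / sqrt(n))."""
    return (xbar - mu0) / (sd / sqrt(n))


def two_sided_p_value(z):
    """Probability of a |Z| at least this large under N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Figures from Example 7.1: n = 900, sample mean 3.4, mu = 3.25, sigma = 2.61
z = one_sample_z(3.4, 3.25, 2.61, 900)
p = two_sided_p_value(z)
print(round(z, 3))  # 1.724
print(p > 0.05)     # True, so H0 is accepted at the 5% level
```

When σ is unknown and n is large, the same function can be called with the sample standard deviation s in place of sd.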
Example 7.2. A manufacturer claims that the average mileage of scooters of his company
is 40 kms/litre. A random sample of 38 scooters of the company showed an average mileage
of 42 kms/litre. Test the claim of the manufacturer on the assumption that the mileage of
scooter is normally distributed with a standard deviation of 2 kms/litre.
Solution. Here n = 38, $\bar{x}$ = 42, µ = 40, σ = 2
H0 : The average mileage of the scooters is 40 kms/litre, i.e., H0 : µ = 40.
H1 : µ ≠ 40 (two tailed test).
Under H0, $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{42 - 40}{2/\sqrt{38}} = 6.16$.
As the calculated value of | Z | = 6.16 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., the sample does not support the manufacturer's claim that the
average mileage is 40 kms/litre.
Example 7.3. A stenographer claims that she can type at the rate of 120 words per
minute. Can we reject her claim on the basis of 100 trials in which she demonstrates a
mean of 116 words with a standard deviation of 15 words? Use 5% level of significance.
Solution. Here n = 100, x = 116, µ = 120, s = 15
240 L Probability and Statistics L
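Under H0 : µ = 120, with σ estimated by s,
$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{116 - 120}{15/\sqrt{100}} = -2.67$
As the calculated value of | Z | = 2.67 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., the stenographer’s claim is not true.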
Example 7.4. The mean life of a sample of 400 fluorescent bulbs produced by a
company is found to be 1570 hours with a standard deviation of 150 hours. Test the
hypothesis that the mean life time of the bulbs produced by the company is 1600 hours
against the alternative hypothesis that it is greater than 1600 hours at 1% level of
significance.
Solution. Here n = 400, µ = 1600, x = 1570, s = 150
H0 : Mean life time of bulbs is 1600 hours, that is H0 : µ = 1600.
H1 : µ > 1600 (right tailed test)
Under H0,
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, since σ is not known
$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{38 - 32}{5.8/\sqrt{64}} \approx 8.28 > 1.645$
The calculated value of Z is greater than the table value of Z. So we reject the null
hypothesis. Hence, we conclude that the average lifespan of mice is greater than 32 months, i.e.,
the nutritious food affects the average lifespan of mice.
Example 7.8. According to the norms established for a mechanical aptitude test,
persons who are 18 years old have an average height of 73.2 with a standard deviation of
8.6. If 45 randomly selected persons of that age averaged 76.7, test the null hypothesis
µ = 73.2 against the alternative hypothesis µ > 73.2 at the 0.01 level of significance.
Solution. From the given data,
n = 45, x = 76.7, µ = 73.2, σ = 8.6
Null hypothesis H0 : µ = 73.2
Alternative hypothesis H1 : µ > 73.2 (Use right tailed test)
Level of significance α = 0.01; the critical value for a right tailed test is zα = 2.33.
The test statistic, $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{76.7 - 73.2}{8.6/\sqrt{45}} = 2.73 > 2.33$
Table value Zα = 2.33.
Calculated value of Z is greater than the table value of Z. So we reject the null hypothesis.
Hence we conclude that µ > 73.2.
Example 7.9. An oceanographer wants to check whether the depth of the ocean in a
certain region is 57.4 fathoms, as had previously been recorded. What can he conclude at
the level of significance α = 0.05, if readings taken at 40 random locations in the given
region yielded a mean of 59.1 fathoms with a standard deviation of 5.2 fathoms?
∴ The test statistic, $|Z| = \frac{|\bar{x} - \mu|}{s/\sqrt{n}} = \frac{59.1 - 57.4}{5.2/\sqrt{40}} = \frac{1.7 \times 6.325}{5.2} \approx 2.07 > 1.96$
The calculated value of | Z | is greater than the table value of Z.
So we reject the null hypothesis. Hence the oceanographer concludes that the depth of the
ocean in the region is not 57.4 fathoms.
Example 7.10. A trucking firm suspects the claim that the average life time of certain
tyres is at least 28,000 miles. To check the claim the firm puts 40 of these tyres on its trucks
and gets a mean life time of 27,463 miles with a standard deviation of 1348 miles. What
can it conclude if the probability of a Type I error is to be at most 0.01?
Solution. From the given data,
µ = 28,000, n = 40, x = 27,463, s = 1348.
Null hypothesis, H0 : µ = 28,000
Alternative hypothesis H1 : µ < 28,000 (Use left tailed test)
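The level of significance is α = 0.01; the critical value for a left tailed test is –2.33.
The test statistic, with σ estimated by s, is
$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{27463 - 28000}{1348/\sqrt{40}} = -2.52 < -2.33$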
So the calculated value of Z is less than the table value of Z. We reject the null hypothesis
(since it is left tail test).
Hence the claim is rejected.
Example 7.11. A sample of 100 iron bars is said to be drawn from a large number
of bars whose lengths are normally distributed with mean 4 feet and S.D. 0.6 ft. If the
sample mean is 4.2 ft, can the sample be regarded as a truly random sample?
Solution. From the given data,
n = 100,
µ = 4,
σ = 0.6,
x = 4.2
Let the null hypothesis H0 : µ = 4
Against the alternative hypothesis H1 : µ ≠ 4 (use two tail test)
The level of significance is α = 0.05; the critical value for a two tailed test is 1.96.
Then the test statistic is $|Z| = \frac{|\bar{x} - \mu|}{S.E.(\bar{x})} = \frac{|\bar{x} - \mu|}{\sigma/\sqrt{n}}$
Therefore, $|Z| = \frac{4.2 - 4}{0.6/\sqrt{100}} = 3.33 > 1.96$
The calculated value of Z is greater than the table value of Z. So we reject the null
hypothesis. Hence we conclude that the sample does not come from the same population having
mean 4 and standard deviation of 0.6.
EXERCISE 7.1
1. Explain the concept of sampling distribution and standard error. Discuss the role of standard
error in the large sample theory.
2. What is test of significance? Explain the concept of standard error related to that.
3. Distinguish between:
(a) Parameter and Statistic
(b) Standard Deviation and Standard Error
(c) Left tailed test and right tailed test
(d) Type I error and Type II error
4. Distinguish between null and alternative hypothesis. State the null and alternative hypothesis
regarding population mean that lead to (i) left tailed test, (ii) right tailed test, and (iii) two-
tailed test.
5. Explain clearly the procedure followed in testing of a hypothesis.
6. A sample of 100 students is taken from a large population. The mean height of these
students is 65 inches and the standard deviation 4 inches. Can it be reasonably regarded
that the population mean height is 66 inches? [Ans. Difference is significant]
7. A random sample of 200 tins of groundnut oil gave an average weight of 4.95 kg with a
standard deviation 0.21 kg. Should we accept the hypothesis of net weight of 5 kg per tin
at 1% level of significance? [Ans. No]
8. The heights of college students in a city are normally distributed with S.D. 6 cms. A sample
of 1000 students has mean height 158 cms. Test the hypothesis that the mean height of
college students in the city is 160 cms. [Ans. H0 accepted at 5% level]
9. An auto company decided to introduce a new six cylinder car whose mean petrol
consumption is claimed to be lower than that of the existing auto engine. It was found that
the mean petrol consumption for the 50 cars was 10 km per litre with a standard deviation
of 3.5 km per litre. Test for the company at 5% level of significance, whether the claim the
new car petrol consumption is 9.5 km per litre on the average is acceptable.
[Ans. Company’s claim is acceptable]
10. It has previously been recorded that the average depth of ocean at a particular region is
67.4 fathoms. Is there reason to believe this at 0.01 level of significance if the reading at
40 random locations in that particular region showed a mean of 69.3 with standard deviation
of 5.4 fathoms. [Ans. Null hypothesis is accepted]
11. A sample of 64 students have a mean weight of 70 kg. Can this be regarded as a sample
from a population with mean weight 65 kgs and standard deviation 25 kg.
[ Ans. Null hypothesis is accepted]
12. The mean breaking strength of the cables supplied by a manufacturer is 1800 with a standard
deviation of 100. By a new technique in the manufacturing process it is claimed that the
breaking strength of the cables has increased. In order to test this claim a sample of 50
cables is tested. It is found that the mean breaking strength is 1850. Can we support the
claim at 0.01% level of significance? [Ans. No]
Note 3: If σ is not known and σ1 = σ2 = σ, we use $\sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$ to calculate σ.
SOLVED EXAMPLES
Example 7.12. A college conducted both day and night classes intended to be identical.
A sample of 100 day students yields examination results as under:
x1 = 72.4 and s1 = 14.8
A sample of 200 night students yields examination results as under :
x 2 = 73.9 and s2 = 17.9
Are the two means statistically equal at 10% level ?
Solution. Here n1 = 100, n2 = 200
$\bar{x}_1$ = 72.4, $\bar{x}_2$ = 73.9
s1 = 14.8, s2 = 17.9
H0 : The two means are statistically equal i.e., H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0, $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{72.4 - 73.9}{\sqrt{\frac{(14.8)^2}{100} + \frac{(17.9)^2}{200}}} = \frac{72.4 - 73.9}{1.947} = -0.770$
As the calculated value of | Z | = 0.770 < 1.645, the significant value of Z at 10% level
of significance, H0 is accepted i.e., the two means are statistically equal.
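The two-sample statistic used in Example 7.12 can be sketched in Python as follows. The function name two_sample_z is hypothetical; the sketch assumes large, independent samples, so that the sample standard deviations can stand in for the population values.

```python
from math import sqrt


def two_sample_z(xbar1, s1, n1, xbar2, s2, n2):
    """Z statistic for the difference between two large-sample means."""
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (xbar1 - xbar2) / standard_error


# Figures from Example 7.12: day students, then night students
z = two_sample_z(72.4, 14.8, 100, 73.9, 17.9, 200)
print(round(z, 2))      # -0.77
print(abs(z) < 1.645)   # True, so H0 is accepted at the 10% level
```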
Example 7.13. Mean and standard deviation calculated from the weights in kg. of
students of two groups taken from two universities are given below:
$= \frac{47 - 49}{1.36} = -1.47$
As the calculated value of | Z | = 1.47 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted i.e., there is no significant difference between the two mean levels of
wages.
Example 7.15. A random sample of 200 villages was taken from a certain district and
the average population per village was found to be 485 with standard deviation of 50.
Another random sample of 200 villages from the same district gave an average population
of 510 per village with standard deviation of 40. Is the difference between the averages of
the two samples significant ? Justify your answer.
Solution. Here n1 = 200, n2 = 200
x1 = 485, x 2 = 510
s1 = 50, s2 = 40
H0 : There is no significant difference between the mean values of the two samples i.e.,
H0 : µ1 = µ2
H1 : µ1 ≠ µ2 (two tailed test)
Under H0, $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{485 - 510}{\sqrt{\frac{(50)^2}{200} + \frac{(40)^2}{200}}} = \frac{485 - 510}{4.528} = -5.52$
As the calculated value of | Z | = 5.52 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected i.e., there is a significant difference between the mean values of the two
samples.
Example 7.16. The mean weight of 50 male students who showed above average
participation in school athletics was 68.2 kgs with a standard deviation of 2.5 kgs. While
50 male students who showed no interest in such participation had a mean weight of
67.5 kgs with a standard deviation of 2.8 kgs. Test the hypothesis that male students who
participate in school athletics are healthier than other male students.
Solution. Here n1 = 50, n2 = 50
x1 = 68.2, x 2 = 67.5
s1 = 2.5, s2 = 2.8
H0 : There is no difference between the mean weight of male students who participate in school
athletics and that of other male students i.e., H0 : µ1 = µ2
H1 : µ1 > µ2 (right tailed test)
Under H0, $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{68.2 - 67.5}{\sqrt{\frac{(2.5)^2}{50} + \frac{(2.8)^2}{50}}} = \frac{0.7}{0.5308} = 1.3188$
As the calculated value of Z = 1.3188 < 1.645, the significant value of Z at 5% level of
significance for a right tailed test, H0 is accepted i.e., the average weight of the male students who
participate in school athletics is the same as the average weight of other male students in school.
Example 7.17. The research investigator was interested in studying whether there is a
significant difference in the salaries of MBA grades in two metropolitan cities. A random
sample of size 100 from Mumbai yields an average income of Rs. 20,150. Another random
sample of 60 from Chennai results in an average income of Rs. 20,250. The variances of
both the populations are given as σ1² = Rs. 40,000 and σ2² = Rs. 32,400 respectively.
Solution. From the given data, subscript 1 refers to MBA grades in Mumbai and subscript 2 to
MBA grades in Chennai:
n1 = 100, $\bar{x}_1$ = 20,150, σ1² = Rs. 40,000
and n2 = 60, $\bar{x}_2$ = 20,250, σ2² = Rs. 32,400
We test the significance of the difference between the two population means µ1 and µ2.
Let the null hypothesis H0 : µ1 = µ2
against the alternative hypothesis H1 : µ1 ≠ µ2 (use two tail test)
At the level of significance α = 0.05, the critical value of Z for a two tailed test is 1.96.
Then we can use the test statistic
$|Z| = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{|20150 - 20250|}{\sqrt{\frac{40000}{100} + \frac{32400}{60}}} = 3.26 > 1.96$
The calculated value of Z is greater than the table value of Z at 0.05 level of significance,
so we reject the null hypothesis. Hence we conclude that there is a significant difference between
the salaries of MBA grades in two metropolitan cities.
Example 7.18. IQ test on two groups of boys and girls gave the following results:
Mean of Girls = 78, S.D. = 10, n = 30
Mean of Boys = 78, S.D. = 13, n = 70
Is there any significant difference in the mean scores of girls and boys at 5% level of significance?
Solution. From the given data, 1’s related to girls and 2’s related to boys
n1 = 30, x1 = 78, s1 = 10
and n2 = 70, x 2 = 78, s2 = 13
To test the significance difference between the two population means µ1 and µ2 (or two
sample means x1 and x 2 ).
Let the null hypothesis H0 : µ1 = µ2
Against the alternative hypothesis H1 : µ1 ≠ µ2 (use two tail test)
At the level of significance α = 0.05, the critical value of Z for a two tailed test is 1.96.
Then we can use the test statistic
$|Z| = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{|78 - 78|}{\sqrt{\frac{100}{30} + \frac{169}{70}}} = 0 < 1.96$
The calculated value of Z is less than the table value of Z at 0.05 level of significance. So we
need not reject the null hypothesis. Hence we conclude that there is no significant difference
between the mean scores of the two groups, girls and boys.
EXERCISE 7.2
1. The means of two single large samples of 1000 and 2000 members are 67.5 inches and 68.0
inches respectively. Can the samples be regarded as drawn from the same population of
standard deviation 2.5 inches? [Ans. H 0 rejected at 5% level]
2. An examination was given to 50 students of Hindu College and 60 students at Hans Raj
College of Delhi University. The Hindu college has mean score 75 with S.D. 9 and Hans
Raj college has mean score 79 with S.D. 7. Is there a significant difference between the
mean score of two colleges? Test the performance at 5% level. [Ans. H 0 rejected at 5% level]
3. Intelligence Test given to two groups of boys and girls gave the following results:
Is there a significant difference in mean score of boys and girls? [Ans. H0 rejected at 5% level]
4. In two large populations there are 30% and 25% respectively of fair haired people. Is this
difference likely to be hidden in samples of 1200 and 900 respectively from the two
populations? [Ans. H 0 rejected at 5% level]
5. If 60 new entrants in a given university are found to have a mean height of 68.60 inches
and 50 seniors a mean height of 69.51 inches, is the evidence conclusive that the mean
height of the seniors is greater than that of the new entrants? Assume the standard deviation
of the height to be 2.48 inches. [Ans. H0 rejected at 5% level]
6. A man buys 50 electric bulbs of ‘Wipro’ and 50 electric bulbs of ‘Philips’. He finds that
‘Wipro’ bulbs gave an average life of 1500 hours with a standard deviation of 60 hours and
‘Philips’ bulbs gave an average life of 1512 hours with a standard deviation of 80 hours.
Is there a significant difference in the mean life of the two makes of bulbs?
[Ans. H0 accepted at 5% level]
7. In a survey of buying habits, 400 women shoppers are chosen at random in super market ‘A’
located in a certain section of the city. Their average weekly food expenditure is Rs. 250
with a standard deviation of Rs. 40. For 400 women shoppers chosen at random in super
market ‘B’ in another section of the city, the average weekly food expenditure is Rs. 220
with a standard deviation of Rs. 55. Test at 1% level of significance whether the average
weekly food expenditures of the two populations of shoppers are equal.
[Ans. H 0 rejected at 5% level]
$E(p) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{nP}{n} = P$
$V(p) = V\left(\frac{X}{n}\right) = \frac{1}{n^2}V(X) = \frac{nPQ}{n^2} = \frac{PQ}{n}$
$S.E.(p) = \sqrt{\frac{PQ}{n}}$
$Z = \frac{p - E(p)}{S.E.(p)} = \frac{p - P}{\sqrt{PQ/n}} \sim N(0, 1)$
Note 1. The probable limits for the observed proportion of successes are given by
$p \pm 3\sqrt{PQ/n}$.
Note 2. If P is not known, it is approximated by p and the limits for the proportion in the population
(on the basis of the sample proportion) are taken as $p \pm Z_{\alpha}\sqrt{PQ/n}$ with P replaced by p,
where Q = 1 – P and Zα is the significant value of Z at level of significance α.
SOLVED EXAMPLES
Example 7.19. A die is thrown 9000 times and a throw of 3 or 4 is observed 3240 times.
Check whether the die can be regarded as an unbiased one and find the limits between
which the probability of a throw of 3 or 4 lies.
Solution. Here n = 9000, X = 3240. Then
P = Probability of getting 3 or 4 in a throw of the die $= \frac{2}{6} = \frac{1}{3}$
∴ $Q = 1 - P = \frac{2}{3}$
$p = \frac{X}{n} = \frac{3240}{9000} = 0.36$
H0 : The die is unbiased i.e., H0 : $P = \frac{1}{3}$
H1 : $P \neq \frac{1}{3}$ (two tailed test)
Under H0, $Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.36 - \frac{1}{3}}{\sqrt{\frac{1}{3} \times \frac{2}{3} \times \frac{1}{9000}}} = 5.37$
As the calculated value of | Z | = 5.37 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected i.e., the die is biased.
Then the probable limits for P, with P estimated by p = 0.36 and Q by q = 0.64, are
$p \pm 3\sqrt{\frac{pq}{n}} = 0.36 \pm 3\sqrt{\frac{0.36 \times 0.64}{9000}} = 0.36 \pm 0.015$, i.e., 0.345 and 0.375.
Hence, the probability of getting 3 or 4 lies between 0.345 and 0.375.
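The test of a single proportion can be sketched in Python along the following lines. The function name one_proportion_z is hypothetical, and the sketch assumes n is large enough for the normal approximation to the binomial. The figures in the usage lines are those of Example 7.20 (36 defective apples in 600, claimed P = 0.04).

```python
from math import sqrt


def one_proportion_z(x, n, P0):
    """Z statistic for an observed proportion x/n against a hypothesized P0."""
    p = x / n                          # observed sample proportion
    standard_error = sqrt(P0 * (1 - P0) / n)
    return (p - P0) / standard_error


# Figures from Example 7.20: n = 600, X = 36, P = 0.04
z = one_proportion_z(36, 600, 0.04)
print(round(z, 2))   # 2.5
print(z > 1.645)     # True, so H0 is rejected in a right tailed test at the 5% level
```

Note that the standard error uses the hypothesized P, not the observed p, because the statistic is computed under H0.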
Example 7.20. A wholesaler in apples claims that only 4% of the apples supplied by
him are defective. A random sample of 600 apples contained 36 defective apples. Test the
claim of the wholesaler.
Solution. Here n = 600, X = 36. Then
P = Probability of getting a defective apple $= \frac{4}{100} = 0.04$
∴ Q = 1 – P = 0.96
$p = \frac{X}{n} = \frac{36}{600} = 0.06$
We have to test H0 : P = 0.04
H1 : P > 0.04 (right tailed test)
Under H0, $Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.06 - 0.04}{\sqrt{0.04 \times 0.96 \times \frac{1}{600}}} = 2.5$
As the calculated value of Z = 2.5 > 1.645, the significant value of Z at 5% level of
significance, H0 is rejected.
Example 7.21. In a sample of 1000 people, 540 are rice eaters and the rest are wheat
eaters. Can we assume that both rice eater and wheat eater are equally popular at 1% level
of significance?
Solution. Here n = 1000, X = No. of rice eaters = 540. Then
P = Probability of rice eaters $= \frac{1}{2} = 0.5$
∴ Q = 1 – P = 0.5
$p = \frac{X}{n} = \frac{540}{1000} = 0.54$
H0 : Both rice and wheat are equally popular i.e., H0 : P = 0.5
H1 : P ≠ 0.5 (two tailed test)
Under H0, $Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.54 - 0.5}{\sqrt{0.5 \times 0.5 \times \frac{1}{1000}}} = 2.53$
As the calculated value of | Z | = 2.53 < 2.58, the significant value of Z at 1% level of
significance, H0 is accepted i.e., rice and wheat are equally popular.
Example 7.22. A sample of 900 days is taken from meteorological records of a certain
district and 100 of them are found to be foggy. What are the probable limits to the
percentage of foggy days in the district?
Solution. The proportion of foggy days in the sample of 900 days is
$p = \frac{100}{900} = \frac{1}{9} = 0.1111$
∴ $q = 1 - p = \frac{8}{9} = 0.8889$
Probable limits for the population proportion are
$p \pm 3\sqrt{\frac{pq}{n}} = 0.1111 \pm 3\sqrt{\frac{0.1111 \times 0.8889}{900}} = 0.1111 \pm 3 \times 0.0105$
= 0.0796 and 0.1426
= 7.96% and 14.26%
Hence, the percentage of foggy days lies between 7.96 and 14.26.
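The probable limits p ± 3√(pq/n) can be computed with a small Python sketch. The function name probable_limits and its parameter k are illustrative; P is assumed unknown and is replaced by the observed proportion, as in Example 7.22.

```python
from math import sqrt


def probable_limits(x, n, k=3.0):
    """Return (lower, upper) limits p -/+ k * sqrt(p * q / n) with p = x / n."""
    p = x / n
    half_width = k * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Figures from Example 7.22: 100 foggy days out of 900
low, high = probable_limits(100, 900)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # roughly 8% to 14%
```

Because the sketch does not round the standard error to 0.0105 before multiplying, its limits differ from the hand calculation in the second decimal place of the percentage.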
Example 7.23. An insurance company states that 90% of its claims are settled within
30 days. A consumer group selected a simple random sample of 75 of the company’s claims
to test this statement. The consumer group found that 55 of the claims were settled within
30 days. At the 0.05 significance level, test the company’s claim that 90% of its claims are
settled within 30 days.
Solution. Here n = 75, X = No. of claims settled within 30 days = 55. Then
$P = \frac{90}{100} = 0.9$
∴ Q = 1 – P = 0.1
$p = \frac{X}{n} = \frac{55}{75} = 0.7333$
H0 : The claim that 90% of the company’s insurance claims are settled within 30 days is true i.e.,
H0 : P ≥ 0.9
H1 : P < 0.9 (left tailed test)
Under H0, $Z = \frac{p - P}{\sqrt{PQ/n}} = \frac{0.7333 - 0.9}{\sqrt{0.9 \times 0.1 \times \frac{1}{75}}} = -4.811$
As the calculated value of Z = –4.811 is less than –1.645, the significant value of Z at 5%
level of significance, H0 is rejected i.e., the sample data do not support the claim that
90% of the company’s insurance claims are settled within 30 days.
Example 7.24. Twenty people were attacked by a disease and only 18 survived.
Will you reject the hypothesis that the survival rate, if attacked by this disease, is 85% in
favour of the hypothesis that it is more, at 5% level?
Solution. From the given data, n = 20 and P = 0.85 and Q = 0.15
The observed proportion $p = \frac{18}{20} = 0.9$.
Let the null hypothesis H0 : P = 0.85
Alternative hypothesis H1 : P > 0.85 (use right tail test)
The table value of Z at 5% level is 1.645
∴ The test statistic $Z = \frac{p - P}{S.E.(p)}$
$S.E.(p) = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.85 \times 0.15}{20}} = 0.08$
∴ The test statistic $Z = \frac{0.9 - 0.85}{0.08} = 0.625 < 1.645$
∴ The calculated Z value is less than the table value of Z. So we need not reject the null
hypothesis. Hence, we conclude that the survival rate is not more than 85%.
Example 7.25. A social worker believes that fewer than 25% of the couples in a certain
area have ever used any form of birth control. A random sample of 120 couples was
contacted. Twenty of them said that they have used. Test the belief of the social worker
at 0.05 level.
Solution. From the given data, n = 120.
The observed proportion $p = \frac{20}{120} = \frac{1}{6} = 0.167$
and the population proportion $P = \frac{25}{100} = 0.25$, Q = 0.75
Now the null hypothesis is H0 : P = 0.25 and the alternative hypothesis is H1 : P < 0.25 (use left
tail test), since the social worker's belief is of the less than type. The critical value for a left
tail test at the 0.05 level is –1.645.
Now the test statistic $Z = \frac{p - P}{S.E.(p)}$
$S.E.(p) = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.25 \times 0.75}{120}} = 0.04$
∴ $Z = \frac{0.167 - 0.25}{0.04} \approx -2.08 < -1.645$
The calculated Z value is less than the table value of Z.
So the statistic Z falls in the rejection region and we reject the null hypothesis.
Hence we conclude that the sample supports the social worker's belief that fewer than 25% of
the couples have used any form of birth control.
EXERCISE 7.3
1. A coin is tossed 1000 times and the head comes out 550 times. Can the deviation from
expected value be due to fluctuations of sampling? [Ans. Coin is unbiased]
2. A machine is producing bolts of which a certain fraction is defective. A random sample of
400 is taken from a large batch and is found to contain 30 defective bolts. Does this
indicate that the proportion of defectives is larger than that claimed by the manufacturer
where the manufacturer claims that only 5% of his products are defective? Find 95%
confidence limits for the proportion of defective bolts in the batch.
[Ans. H0 rejected at 5% level and limits are 0.07136 and 0.02865]
3. A politician claims that she will receive 60% of the votes in an upcoming election. The
results of a simple random sample of 100 voters showed that 50 of those sampled will vote
for her. Test the politician’s claim at the 0.05 level of significance.
[Ans. H 0 rejected at 5% level]
4. An auditor claims that 10% of customer’s ledger accounts are carrying mistakes of posting
and balancing. A random sample of 600 was taken to test the accuracy of posting and
balancing and 45 mistakes were found. Are these sample results consistent with the claim of
the auditor? Use 5% level of significance. [Ans. H 0 rejected at 5% level]
5. The full-time student body of a college is 50% men and 50% women. Suppose an
introductory chemistry class contains 30 men and 20 women. Does this sample provide
sufficient evidence at the 0.05 significance level to reject the hypothesis that the proportions
of male and female students who take this course are the same as in the general student
body? [Ans. H 0 rejected at 5% level]
6. A sales clerk in the department store claims that 60% of the shoppers entering the store
leave without making a purchase. A random sample of 50 shoppers showed that 35 of them
left without buying anything. Are these sample results consistent with the claim of the sales
clerk? Use 5% level of significance. [Ans. H0 accepted at 5% level]
$S.E.(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
where p = the pooled estimate of the actual proportion in the population. The value of p is
obtained as follows:
$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ or $p = \frac{x_1 + x_2}{n_1 + n_2}$
where q = 1 – p
Then we can use the test statistic
$Z = \frac{p_1 - p_2}{S.E.(p_1 - p_2)}$
∴ $Z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
If | Z | < 1.96 (5% level of significance), the difference is regarded as due to random
sampling variation, i.e., as not significant.
The confidence limits for P1 – P2 are then given by $(p_1 - p_2) \pm Z_{\alpha/2}\, S.E.(p_1 - p_2)$.
Example 7.26. In a sample of 600 men from a certain city, 450 men are found to be
smokers. In a sample of 900 from another city, 450 are found to be smokers. Do the data
indicate that two cities are significantly different with respect to prevalence of smoking
habits among men?
Solution. Here we are given for one city
n1 = 600, p1 = proportion of smokers $= \frac{450}{600} = 0.75$
Also for another city,
n2 = 900, p2 = proportion of smokers $= \frac{450}{900} = 0.5$
∴ $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{450 + 450}{600 + 900} = \frac{900}{1500} = 0.6$
and q = 1 – p = 1 – 0.6 = 0.4
Let us take the hypothesis that there is no significant difference in the smoking habits of two
cities i.e., H0 : P1 = P2 and H1 : P1 ≠ P2.
Using the Z-statistic as follows:
$Z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Now $Z = \frac{p_1 - p_2}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.38 - 0.33}{\sqrt{0.36 \times 0.64\left(\frac{1}{150} + \frac{1}{100}\right)}}$
= 0.806 < 1.645
Calculated Z value (= 0.806) is less than the table value of Z (1.645) at 0.05 level of
significance. So we need not reject the null hypothesis. Hence we conclude that there is no
significant difference between the allopathy and homeopathy treatment i.e., the reason to believe
that allopathy is better than homeopathy at 0.05 level of significance is rejected.
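The pooled test for the difference of two proportions can be sketched in Python as below. The function name two_proportion_z is hypothetical, and the sketch assumes large independent samples. The usage lines take the counts stated in Example 7.26 (450 smokers out of 600 and 450 out of 900).

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for p1 - p2 using the pooled estimate p = (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)   # pooled proportion
    q = 1 - p
    standard_error = sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Counts from Example 7.26: 450 of 600 men in one city, 450 of 900 in the other
z = two_proportion_z(450, 600, 450, 900)
print(round(z, 2))     # about 9.68
print(abs(z) > 1.96)   # True, so the difference is significant at the 5% level
```

The function takes counts rather than proportions so that the pooled estimate is computed without intermediate rounding.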
Example 7.32. On the basis of their total scores, 200 candidates of a civil service
examination are divided into two groups, the upper 30% and the remaining 70%. Consider
the first question of the examination. Among the first group 40 had correct answer,
whereas the second group, 80 had the correct answer. On the basis of these results, can one
conclude that the first question is not good at discriminating ability of the type being
examined here ?
Solution. From the given data, n1 = 30% of 200 = 60 and n2 = 70% of 200 = 140
∴ $p_1 = \frac{40}{60} = 0.667$, $p_2 = \frac{80}{140} = 0.571$
Let the null hypothesis H0 : P1 = P2
and the alternative hypothesis H1 : P1 ≠ P2 (use two tail test)
∴ $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{60 \times 0.667 + 140 \times 0.571}{60 + 140} = \frac{40 + 80}{200} = \frac{120}{200} = 0.6$
and q = 1 – 0.6 = 0.4
$S.E.(p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.6 \times 0.4\left(\frac{1}{60} + \frac{1}{140}\right)} = 0.0756$
∴ The test statistic $|Z| = \frac{|p_1 - p_2|}{S.E.(p_1 - p_2)} = \frac{0.667 - 0.571}{0.0756} = 1.27 < 1.96$
Since the calculated value of Z is less than the table value of Z, we need not reject
the null hypothesis.
Hence we conclude that there is no significant difference between the two proportions, i.e.,
the first question is not good at discriminating between the two groups of candidates.
Example 7.33. A study shows that 16 of 200 tractors produced on one assembly line
required extensive adjustments before they could be shipped, while the same was true for
14 of 400 tractors produced on another assembly line. At the 0.01 level of significance, does
this support the claim that the second production line does not do superior work?
Solution. Given n1 = 200, n2 = 400
The observed proportion for the first assembly line is
\[ p_1 = \frac{x_1}{n_1} = \frac{16}{200} = 0.08 \]
The observed proportion for the second assembly line is
\[ p_2 = \frac{x_2}{n_2} = \frac{14}{400} = 0.035 \]
\[ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{200 \times 0.08 + 400 \times 0.035}{200 + 400} = \frac{16 + 14}{200 + 400} = 0.05, \quad \text{then } q = 0.95 \]
The claim that the second line does not do superior work corresponds to P1 ≤ P2, so we take
Null hypothesis H0 : P1 = P2
Alternative hypothesis H1 : P1 < P2 (use left-tailed test)
\[ \therefore\ Z = \frac{0.08 - 0.035}{\sqrt{0.95 \times 0.05\left(\frac{1}{200} + \frac{1}{400}\right)}} = 2.38 > -2.33 \]
The calculated value of Z is greater than the table value of Z (–2.33), so we need not reject
the null hypothesis. Hence we retain P1 = P2, i.e., the data do not support the claim that the second
production line is not superior to the first production line.
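The pooled-proportion calculation used in Examples 7.32 and 7.33 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `two_proportion_z` is an assumption made here, not something defined in the text, and it works with unrounded proportions, so its results can differ slightly in the last digit from hand calculations that round p1 and p2 first.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 using the pooled proportion.

    x1, x2 are the observed counts and n1, n2 the sample sizes.
    Returns (z, pooled proportion, standard error).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # p = (n1*p1 + n2*p2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, pooled, se

# Example 7.32: 40 of 60 versus 80 of 140 correct answers
z_exam, p_exam, se_exam = two_proportion_z(40, 60, 80, 140)

# Example 7.33: 16 of 200 versus 14 of 400 tractors needing adjustment
z_tractor, p_tractor, se_tractor = two_proportion_z(16, 200, 14, 400)

print(round(z_exam, 2), round(se_exam, 4))
print(round(z_tractor, 2), round(se_tractor, 4))
```

The returned Z is then compared with the table value for the chosen tail and level of significance (1.96 for a two-tailed test at 5%, –2.33 for a left-tailed test at 1%), exactly as in the worked solutions.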
EXERCISE 7.4
1. In a random sample of 1000 persons from town A, 400 are found to be rice eaters and in a
sample of 800 persons from town B, 400 are found to be rice eaters. Do these data reveal a
significant difference between two towns as far as the proportion of rice consumption is
concerned? [Ans. H0 accepted at 5% level]
2. A machine puts out 10 defective articles in a sample of 200. After overhauling it produced
4 defectives in a sample of 100. Has the machine improved? [Ans. H0 accepted at 5% level]
3. In two large populations, 30% and 25% of the people, respectively, are blue-eyed. Is this
difference likely to be hidden in samples of 1200 and 900, respectively, from the two
populations? [Ans. H0 rejected at 5% level]
4. Test the significance of the difference between proportions from the following data:
Sample I 100 24
Sample II 300 48
5. In a referendum submitted to the student body at a university 850 men and 566 women
voted. 530 of the men and 304 of the women voted yes. Does this indicate a significant
difference of opinion on the matter, at the 1% level, between men and women students?
[Ans. H0 rejected at 5% level]
6. In a random sample of 500 persons from Delhi, 200 are found to be consumers of cheese. In a
sample of 400 from Noida, 200 are found to be consumers of cheese. Discuss
whether the data reveal a significant difference between Delhi and Noida as far as the
proportion of cheese consumers is concerned. [Ans. H0 rejected at 5% level]
7.7.5. Hypothesis Testing for Difference between Two Population Standard Deviations
If s1 and s2 are the standard deviations of two independent samples, then under the null
hypothesis H0 : σ1 = σ2, i.e., the sample standard deviations do not differ significantly, the
test statistic is
\[ Z = \frac{s_1 - s_2}{\sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}} \]
where σ1 and σ2 are the population standard deviations.
When the population standard deviations are not known, then
\[ Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} \]
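As a quick numerical check of the second formula, the following Python sketch computes Z from the sample standard deviations. The function name `sd_difference_z` is an assumption made for illustration; the figures plugged in (standard deviations 34 and 28 from samples of 40 and 60) are the ones used in part (ii) of Example 7.34.

```python
import math

def sd_difference_z(s1, n1, s2, n2):
    """Large-sample Z for H0: sigma1 = sigma2, using sample SDs in the standard error."""
    standard_error = math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))
    return (s1 - s2) / standard_error

z = sd_difference_z(34, 40, 28, 60)
print(round(z, 2))

# Two-tailed decision at the 5% level: reject H0 only if |Z| > 1.96
reject_h0 = abs(z) > 1.96
print(reject_h0)
```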
SOLVED EXAMPLES
Example 7.34. The mean yield of two sets of plots and their variability are as given
below. Examine (i) whether the difference in the mean yields of two sets of plots is
significant and (ii) whether the difference in the variability in yields is significant.
Under H0,
\[ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{1258 - 1243}{\sqrt{\dfrac{(34)^2}{40} + \dfrac{(28)^2}{60}}} = \frac{15}{6.478} = 2.315 \]
As the calculated value of | Z | = 2.315 > 1.96, the significant value of Z at 5% level of
significance, H0 is rejected, i.e., there is a significant difference between the mean yields of the two sets of plots.
(ii) H0 : There is no difference between the variability of the two sets of plots, i.e., H0 : σ1 = σ2;
H1 : σ1 ≠ σ2 (two-tailed test)
Under H0,
\[ Z = \frac{\sigma_1 - \sigma_2}{\sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}} = \frac{34 - 28}{\sqrt{\dfrac{(34)^2}{80} + \dfrac{(28)^2}{120}}} = \frac{6}{4.580} = 1.31 \]
As the calculated value of | Z | = 1.31 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., there is no significant difference between the variability of the two sets of
plots.
Example 7.35. The mean produce of rice of a sample of 50 fields is 200 lb. per acre
with a standard deviation of 10 lb. Another sample of 75 fields gives the mean at 220 lb
with a standard deviation of 12 lb. Assuming the standard deviation of the mean field at
11 lb. for the universe, find at 1% level if the two results are consistent.
Solution. Here n1 = 50, n2 = 75
σ1 = 10, σ2 = 12
x̄1 = 200, x̄2 = 220
Under H0,
\[ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{84 - 81}{\sqrt{\dfrac{(10)^2}{121} + \dfrac{(12)^2}{81}}} = 1.859 \]
As the calculated value of | Z | = 1.859 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., the sample means do not differ significantly.
(ii) H0 : There is no difference between the variability of the two samples, i.e., H0 : σ1 = σ2
H1 : σ1 ≠ σ2 (two-tailed test)
Under H0,
\[ Z = \frac{\sigma_1 - \sigma_2}{\sqrt{\dfrac{\sigma_1^2}{2n_1} + \dfrac{\sigma_2^2}{2n_2}}} = \frac{10 - 12}{\sqrt{\dfrac{(10)^2}{242} + \dfrac{(12)^2}{162}}} = -1.7526 \]
As the calculated value of | Z | = 1.7526 < 1.96, the significant value of Z at 5% level of
significance, H0 is accepted, i.e., there is no significant difference between the variability of the two samples.
EXERCISE 7.5
1. In a survey of incomes of two classes of workers, two random samples gave the following
details:
Examine whether the standard deviations are significant? [Ans. H0 accepted at 5% level]
2. Random samples drawn from two countries gave the following data relating to the heights
of adult males:
India America
Is the difference between the standard deviations significant? [Ans. H0 accepted at 5% level]
3. The yield of wheat in a random sample of 1000 farms in a certain area has S.D. of 192 kg.
Another random sample of 1000 farms gives a S.D. of 224 kg. Are the standard deviations
significantly different? [Ans. H 0 rejected at 5% level]
\[ \therefore\ \sigma_s^2 = \frac{n\,(\text{Variance})}{n - 1} \]
1 Variance
or σs = .
n n −1
Also $(n - 1)\,\sigma_s^2 = n\,(\text{Variance})$
Fig. 7.2. t-distribution curves with v = 5 and v = 15 degrees of freedom.
3. It is a bell-shaped curve just like the normal curve, with its tails a little higher above the
abscissa than the normal curve. Its spread decreases as the degrees of freedom ‘k’ increase.
This means that for the same value of the t-variate and the normal variate, the area beyond
t is larger than the area beyond x, as is shown in Fig. 7.2.
4. The t-distribution has only one parameter k, the degrees of freedom.
5. The constants of the t-distribution are as follows:
Mean = 0 for k ≥ 2
\[ \text{Variance } \sigma^2 = \frac{k}{k - 2} \quad \text{for } k \ge 3 \]
6. The area under the t-distribution curve for t < t0 is determined by the equation
\[ P(t < t_0) = \int_{-\infty}^{t_0} f(t)\, dt \]
Students and other readers need not actually integrate for the area, as tables of the area
under the curve for different values of t are available, and vice versa (see the table for
Student’s ‘t’ distribution).
7. The t-distribution tends to the normal distribution as k increases. For practical purposes, t is taken as
equivalent to the normal distribution provided k ≥ 30. The t-distribution has tremendous
utility in testing hypotheses about one population mean or about the equality of two
population means when the standard deviation of the population is not known.
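The tail behaviour described in these properties can be checked numerically. The sketch below is an illustration only: the helper names (`t_pdf`, `normal_pdf`, `upper_tail`) and the cut-off value 2 are choices made here, not part of the text, and the integral is approximated with a simple midpoint rule truncated at a large upper limit rather than read from tables.

```python
import math

def t_pdf(t, k):
    """Density of Student's t-distribution with k degrees of freedom."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + t * t / k) ** (-(k + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def upper_tail(pdf, start, stop=60.0, steps=60_000):
    """Approximate the area under pdf between start and stop (midpoint rule)."""
    h = (stop - start) / steps
    return h * sum(pdf(start + (i + 0.5) * h) for i in range(steps))

def t_variance(k):
    """Variance k / (k - 2) of the t-distribution, valid for k >= 3."""
    return k / (k - 2)

tail_t5 = upper_tail(lambda t: t_pdf(t, 5), 2.0)
tail_t15 = upper_tail(lambda t: t_pdf(t, 15), 2.0)
tail_normal = upper_tail(normal_pdf, 2.0)

# The area beyond 2 shrinks towards the normal value as k grows,
# and the variance k / (k - 2) shrinks towards 1.
print(tail_t5, tail_t15, tail_normal)
print(t_variance(5), t_variance(15), t_variance(30))
```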
aim is to test
H0 : There is no significant difference between the sample mean x̄ and the population mean µ, i.e.,
H0 : µ = µ0
We use the test statistic
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1} \]
where x̄ is the mean of the sample and
\[ s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2 \]
with (n – 1) degrees of freedom.
The tables giving the value of t required for significance at various levels of probability
and for different degrees of freedom are called t-tables, which are given in the statistical tables by Fisher
and Yates. The computed value is compared with the tabulated value at the 5% or 1% level of
significance and at (n – 1) degrees of freedom, and accordingly the null hypothesis is accepted
or rejected.
\[ \bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n}} \]
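The one-sample statistic and the limits for µ can be computed directly, as in the following Python sketch. The function names are assumptions made for illustration, and the tabulated value of t still has to be looked up in a t-table; here the figures of Example 7.37 (n = 20, sample mean 42, s = 5, hypothesised mean 45, table value 2.09 for 19 degrees of freedom) are used.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

def limits_for_mean(sample_mean, s, n, t_table):
    """Limits x_bar -/+ t_table * s / sqrt(n) for the population mean."""
    half_width = t_table * s / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

t = one_sample_t(42, 45, 5, 20)
low, high = limits_for_mean(42, 5, 20, 2.09)

print(round(t, 3))
print(round(low, 2), round(high, 2))

# Two-tailed decision: reject H0 when |t| exceeds the table value
reject_h0 = abs(t) > 2.09
```

Note that rejecting H0 : µ = 45 is consistent with 45 falling outside the computed limits.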
SOLVED EXAMPLES
Example 7.37. A random sample of size 20 from a normal population has mean 42
and standard deviation of 5. Test the hypothesis that the population mean is 45. Use 5%
level of significance.
Solution. Here n = 20, x = 42, µ = 45, s = 5
H0 : There is no significant difference between the sample mean and population mean, i.e.,
µ = 45.
H1 : µ ≠ 45 (two tailed test)
Under H0,
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{42 - 45}{5/\sqrt{20}} = -2.683 \]
The tabulated value of t at 5% level for 19 degree of freedom is t0.05 = 2.09
As the calculated value of | t | = 2.683 > t0.05 for 19 degree of freedom, H0 is rejected i.e.,
there is significant difference between the sample mean and population mean.
Example 7.38. The average breaking strength of steel rods is specified to be 18.5
thousand kg. For this a sample of 14 rods was tested. The mean and standard deviation
obtained were 17.85 and 1.955, respectively. Test the significance of the deviation
Solution. Here n = 14, x = 17.85, µ = 18.5, s = 1.955
H0 : There is no significant deviation in the breaking strength, i.e., µ = 18.5
H1 : µ ≠ 18.5 (two tailed test)
Under H0,
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{17.85 - 18.5}{1.955/\sqrt{14}} = -1.24 \]
The tabulated value of t at 5% level for 13 degrees of freedom is t0.05 = 2.16.
As the calculated value of | t | = 1.24 < t0.05 for 13 degrees of freedom, H0 is accepted, i.e., there
is no significant deviation in the breaking strength.
Example 7.39. The nine items of a sample had the following values:
45, 47, 50, 52, 48, 47, 49, 53, 51.
Does the mean of nine items differ significantly from the assumed population mean of
47.5.
Solution. Since n = 9 (< 30), we use the t-test. Taking the assumed mean A = 49, we have

  x      d = x – A     d²
  45        –4         16
  47        –2          4
  50         1          1
  52         3          9
  48        –1          1
  47        –2          4
  49         0          0
  53         4         16
  51         2          4
  Total      1         55

\[ \therefore\ \bar{x} = A + \frac{1}{n}\sum d = 49 + \frac{1}{9} = 49.11 \]
\[ \text{and}\quad s^2 = \frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{8}\left(55 - \frac{1}{9}\right) = 6.86 \]
\[ \Rightarrow\ s = \sqrt{6.86} = 2.62 \]
H0 : The mean of the population from which the
sample is drawn is 47.5, i.e., µ = 47.5
H1 : µ ≠ 47.5 (two-tailed test)
Under H0,
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{49.11 - 47.5}{2.62/\sqrt{9}} = 1.84 \]
The tabulated value of t at 5% level for 8 degrees of freedom is t0.05 = 2.31
As the calculated value of | t | = 1.84 < t0.05 for 8 degrees of freedom, H0 is accepted, i.e.,
the sample mean does not differ significantly from the assumed population mean of 47.5.
Example 7.40. A drug manufacturer has installed a machine which automatically fills
5 gm of drug in each phial. A random sample of 10 fills was taken and it was found to contain
5.02 gm on an average in a phial. The standard deviation of the sample was 0.002 gm.
Test at 5% level of significance if the adjustment in the machine is in order.
Solution. Here n = 10, x̄ = 5.02, µ = 5, s = 0.002
H0 : The adjustment in the machine is in order, i.e., µ = 5.
H1 : µ ≠ 5 (two-tailed test)
Under H0,
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{5.02 - 5}{0.002/\sqrt{10}} = 31.62 \]
The tabulated value of t at 5% level for 9 degrees of freedom is t0.05 = 2.26
As the calculated value of | t | = 31.62 > t0.05 for 9 degrees of freedom, H0 is rejected, i.e.,
the adjustment in the machine is not in order.
Example 7.41. The lifetime of electric bulbs for a random sample of 10 from a large
consignment gave the following data:

  Item                  1    2    3    4    5    6    7    8    9    10
  Life in ‘000 hours   4.2  4.6  3.9  4.1  5.2  3.8  3.9  4.3  4.4  5.6

Can we accept the hypothesis that the average lifetime of bulbs is 4000 hours?
Solution. H0 : There is no significant difference between the sample mean and the population mean,
i.e., µ = 4000 hrs.
H1 : µ ≠ 4000 hrs (two-tailed test)
Applying the t-test (with x measured in ‘000 hours):

  x        4.2   4.6   3.9   4.1   5.2   3.8   3.9   4.3   4.4   5.6
  x – x̄   –0.2   0.2  –0.5  –0.3   0.8  –0.6  –0.5  –0.1   0     1.2

\[ \bar{x} = \frac{\sum x}{n} = \frac{44}{10} = 4.4 \]
\[ \text{and}\quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{3.12}{9}} = 0.589 \]
Under H0,
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{4.4 - 4}{0.589/\sqrt{10}} = 2.148 \]
The tabulated value of t at 5% level for 9 degrees of freedom is t0.05 = 2.26
As the calculated value of | t | = 2.148 < t0.05 for 9 degrees of freedom, H0 is accepted, i.e.,
the average lifetime of bulbs could be 4000 hrs.
Example 7.42. A machine is designed to produce insulating washers for electrical
devices of average thickness of 0.025 cm. A random sample of 10 washers was found to
have a mean thickness of 0.024 cm with a standard deviation of 0.002 cm. Test the
significance of the deviation at 5% level.
Solution. Given n = 10, µ = 0.025, x̄ = 0.024 and s = 0.002
Let the null hypothesis be H0 : µ = 0.025 against the
alternative hypothesis H1 : µ ≠ 0.025 (two-tailed test)
Level of significance α = 0.05; t0.05 = 2.26 for 9 d.f.
EXERCISE 7.6
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]
where x̄1 is the mean of the first sample, x̄2 is the mean of the second sample, and
\[ S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \]
The significance of t for (n1 + n2 – 2) d.f. is tested in the same way as discussed in the previous
sections.
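The pooled statistic above can be evaluated from summary figures alone. The following Python sketch is illustrative: the function name `pooled_t` is an assumption, and s1, s2 are taken to be sample standard deviations computed with divisor n, so that n·s² equals the sum of squared deviations, as in the second form of S². The figures are those of a sample of 10 bulbs (mean 1456, S.D. 423) and a sample of 17 bulbs (mean 1280, S.D. 398).

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t with pooled S^2 = (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2).

    Returns (t, S, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / df)
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, df

t, S, df = pooled_t(1456, 423, 10, 1280, 398, 17)
print(round(t, 2), round(S, 2), df)

# Compare |t| with the table value for df degrees of freedom (2.06 for 25 d.f. at 5%)
reject_h0 = abs(t) > 2.06
```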
SOLVED EXAMPLES
Example 7.43. Two salesmen A and B are working in a certain district. From a sample
survey conducted by the head office, the following results were obtained. State whether
there is any significant difference in the average sales between the two salesmen:

                                   A       B
  No. of sales                     20      18
  Average sales (in Rs.)          170     205
  Standard deviation (in Rs.)      20      25

Solution. Here n1 = 20, n2 = 18
x̄1 = 170, x̄2 = 205
s1 = 20, s2 = 25
H0 : There is no significant difference in the average sales between the two salesmen
\[ s_1^2 = \frac{1}{n_1}\sum (x_1 - \bar{x}_1)^2 \ \Rightarrow\ 20^2 = \frac{1}{20}\sum (x_1 - \bar{x}_1)^2 \ \Rightarrow\ \sum (x_1 - \bar{x}_1)^2 = 8000 \]
Similarly,
\[ 25^2 = \frac{1}{18}\sum (x_2 - \bar{x}_2)^2 \ \Rightarrow\ \sum (x_2 - \bar{x}_2)^2 = 11250 \]
Now,
\[ S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{1}{20 + 18 - 2}(8000 + 11250) = \frac{19250}{36} = 534.72 \]
\[ \therefore\ S = \sqrt{534.72} = 23.12 \]
Under H0,
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{170 - 205}{23.12\sqrt{\dfrac{1}{20} + \dfrac{1}{18}}} = -\frac{35}{23.12 \times 0.3249} = -\frac{35}{7.51} = -4.66 \]
As the calculated value of | t | = 4.66 > t0.05 for 36 degrees of freedom, H0 is rejected, i.e.,
there is a significant difference in the average sales between the two salesmen.
Example 7.44. The mean life of a random sample of 10 light bulbs was found to be
1456 hours with a S.D. of 423 hours. A second sample of 17 bulbs chosen at random from
a different batch showed a mean life of 1280 hours with a S.D. of 398 hours. Is there a
significant difference between the mean life of the two batches?
Solution. Here n1 = 10, n2 = 17
x1 = 1456, x 2 = 1280
s1 = 423, s2 = 398
H0 : There is no significant difference in the mean life of bulbs of the two batches
Now,
\[ S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \]
\[ = \frac{1}{10 + 17 - 2}\left(10 \times 423^2 + 17 \times 398^2\right) = \frac{4482158}{25} = 179286.32 \]
\[ \therefore\ S = \sqrt{179286.32} = 423.42 \]
Under H0,
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{1456 - 1280}{423.42\sqrt{\dfrac{1}{10} + \dfrac{1}{17}}} = \frac{176}{423.42 \times 0.3985} = \frac{176}{168.7} = 1.04 \]
As the calculated value of | t | = 1.04 < t0.05 for 25 degrees of freedom, which is 2.06, H0
is accepted, i.e., there is no significant difference in the mean life of bulbs of the two batches.
Example 7.45. Below are given the gain of weights (in lbs.) of lions on two diet X
and Y:
Diet X 25 32 30 32 24 14 32
Diet Y 24 34 22 30 42 31 40 30 32 35
Test at 5% level of significance whether the two diets differ significantly in increasing
weight.
Solution. H0 : The two means do not differ significantly
We have
\[ \bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{189}{7} = 27 \quad \text{and} \quad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{320}{10} = 32 \]

            Diet X                              Diet Y
  x1    x1 – x̄1    (x1 – x̄1)²       x2    x2 – x̄2    (x2 – x̄2)²
  25      –2           4            24      –8          64
  32       5          25            34       2           4
  30       3           9            22     –10         100
  32       5          25            30      –2           4
  24      –3           9            42      10         100
  14     –13         169            31      –1           1
  32       5          25            40       8          64
                                    30      –2           4
                                    32       0           0
                                    35       3           9
  189                266           320                 350

Now,
\[ S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2\right] = \frac{1}{7 + 10 - 2}(266 + 350) = \frac{616}{15} = 41.066 \]
\[ \therefore\ S = \sqrt{41.066} = 6.408 \]
Under H0,
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{27 - 32}{6.408\sqrt{\dfrac{1}{7} + \dfrac{1}{10}}} = -\frac{5}{6.408 \times 0.4928} = -\frac{5}{3.158} = -1.583 \]
As the calculated value of | t | = 1.583 < t0.05 for 15 degrees of freedom, which is 2.13, H0 is
accepted, i.e., the two means do not differ significantly.
Example 7.46. Two laboratories A and B carry out independent estimates of fat-
content in ice-cream made by a firm. A sample is taken from each batch, halved, and the
separate halves sent to the two laboratories. The fat-content (in grams) obtained by the
laboratories are recorded below:
Batch No. 1 2 3 4 5 6 7 8 9 10
Lab. A 7 8 7 3 8 6 9 4 7 8
Lab. B 9 8 8 4 7 7 9 6 6 6
Is there a significant difference between mean fat content obtained by the two laboratories
A and B?
Solution. H0 : There is no significant difference between the mean fat content obtained by
the two laboratories, A and B.
We have
\[ \bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{67}{10} = 6.7 \quad \text{and} \quad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{70}{10} = 7.0 \]
Taking the assumed means A1 = 8 and A2 = 8:

              Lab. A                                   Lab. B
  x1    X1 = x1 – A1    X1² = (x1 – A1)²     x2    X2 = x2 – A2    X2² = (x2 – A2)²
   7        –1               1                9         1               1
   8         0               0                8         0               0
   7        –1               1                8         0               0
   3        –5              25                4        –4              16
   8         0               0                7        –1               1
   6        –2               4                7        –1               1
   9         1               1                9         1               1
   4        –4              16                6        –2               4
   7        –1               1                6        –2               4
   8         0               0                6        –2               4
  67       –13              49               70       –10              32

Now,
\[ S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum X_1^2 - \frac{(\sum X_1)^2}{n_1} + \sum X_2^2 - \frac{(\sum X_2)^2}{n_2}\right] \]
\[ = \frac{49 - \dfrac{(-13)^2}{10} + 32 - \dfrac{(-10)^2}{10}}{10 + 10 - 2} = \frac{54.1}{18} = 3.006 \]
\[ \therefore\ S = \sqrt{3.006} = 1.734 \]
Under H0,
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{6.7 - 7}{1.734\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -\frac{0.3}{1.734 \times 0.4472} = -\frac{0.3}{0.7754} = -0.387 \]
As the calculated value of | t | = 0.387 < t0.05 for 18 degrees of freedom, which is 2.10, H0
is accepted, i.e., the mean fat contents obtained by the two laboratories A and B do not differ
significantly.
EXERCISE 7.7
1. Strength tests carried out on samples of two yarns spun to the same count gave the following
results:
The strengths are expressed in pounds. Is the difference in mean strengths significant of the
sources from which the samples are drawn? [Ans. H0 accepted at 5% level]
2. The mean weekly sale of the Cadbury’s chocolate bar in a chain of candy stores was 146.3
bars per store. After an advertising campaign the mean weekly sales in 22 stores for a
typical week increased to 153.7 and showed a standard deviation of 17.2. Is the evidence
conclusive that the advertising was successful? You are given that for 21 degree of freedom,
the value of t is 2.08 at 5% level of significance. [Ans. H0 accepted at 5% level]
3. Two independent samples of 8 and 7 items respectively had the following values:
Sample I 9 11 13 11 15 9 12 14
Sample II 10 12 10 14 9 8 10
6. The heights of six randomly chosen sailors are in inches: 63, 65, 68, 69, 71 and 72. Those
of 10 randomly chosen soldiers are 61, 62, 65, 66, 69, 70, 71, 72 and 73. Discuss the light
that these data throw on the suggestion that sailors are on the average taller than soldiers.
[Ans. H0 accepted at 5% level]
7. Samples of two types of electric bulbs were tested for lengths of life and the following data
were obtained:
Type I Type II
Number in the sample 8 7
Mean of the sample (in hours) 1134 1024
Standard deviation of the sample (in hours) 35 40
8. Eight pots growing three wheat plants each were exposed to a high tension discharge, while
nine similar pots were enclosed in an earthen wire case. The number of tillers in each pot
were as follows:
Caged 17 26 18 25 27 28 26 23 17
Electrified 16 16 22 16 21 18 15 20
See whether electrification exercises any real effect of the average tillers at 5% level of
significance. [Ans. H 0 rejected at 5% level]
Then we compute
\[ \bar{d} = \frac{\sum d_i}{n} \quad \text{and} \quad S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] \]
The test statistic for paired observations is defined by the following formula
\[ t = \frac{|\bar{d}|}{S/\sqrt{n}} \]
where n is the number of pairs of differences.
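The paired statistic lends itself to a short computational sketch. The function name `paired_t` is an assumption made for illustration; the data are the before and after scores of the 11 students in Example 7.48, and the differences are taken as after minus before.

```python
import math

def paired_t(before, after):
    """Paired t = |d_bar| / (S / sqrt(n)), with n - 1 degrees of freedom.

    Returns (t, mean difference, S^2).
    """
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    # S^2 = [sum(d^2) - (sum d)^2 / n] / (n - 1)
    s_squared = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)
    t = abs(d_bar) / (math.sqrt(s_squared) / math.sqrt(n))
    return t, d_bar, s_squared

before = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
after = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]

t, d_bar, s_squared = paired_t(before, after)
print(round(t, 3), d_bar, s_squared)

# Table value for 10 d.f. at the 5% level is 2.23
reject_h0 = t > 2.23
```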
SOLVED EXAMPLES
\[ t = \frac{|\bar{d}|}{S/\sqrt{n}} \]

  Score before the course    Score after the course    Difference (di)    d²
          44                         53                       9            81
          40                         38                      –2             4
          61                         69                       8            64
          52                         57                       5            25
          32                         46                      14           196
          44                         39                      –5            25
          70                         73                       3             9
          41                         48                       7            49
          67                         73                       6            36
          72                         74                       2             4
          53                         60                       7            49
          72                         78                       6            36
        Total                                                60           578

\[ \therefore\ \bar{d} = \frac{\sum d_i}{n} = \frac{60}{12} = 5 \]
\[ \text{and}\quad s^2 = \frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{11}\left(578 - \frac{(60)^2}{12}\right) = \frac{1}{11}(278) = 25.27 \]
\[ \Rightarrow\ s = 5.027 \]
\[ \therefore\ t = \frac{5}{5.027/\sqrt{12}} = \frac{17.32}{5.027} = 3.445 \]
As the calculated value of | t | = 3.445 > t0.05 for 11 degrees of freedom, which is 2.20, H0 is
rejected, i.e., the course has improved performance.
Example 7.48. You are given the marks obtained by 11 students in two tests one
before and other after special coaching. Do the data reveal that special coaching is effective?
Score (Before Coaching) 23 20 19 21 18 20 18 17 23 16 19
Score (After Coaching) 24 19 22 18 20 22 20 20 23 20 17
Solution. Let us take the null hypothesis that there is no improvement due to coaching.
Applying the paired t-test:
\[ t = \frac{|\bar{d}|}{S/\sqrt{n}} \]

  Score before coaching    Score after coaching    Difference (di)    d²
          23                       24                     1            1
          20                       19                    –1            1
          19                       22                     3            9
          21                       18                    –3            9
          18                       20                     2            4
          20                       22                     2            4
          18                       20                     2            4
          17                       20                     3            9
          23                       23                     0            0
          16                       20                     4           16
          19                       17                    –2            4
        Total                                            11           61

\[ \therefore\ \bar{d} = \frac{\sum d_i}{n} = \frac{11}{11} = 1 \]
\[ \text{and}\quad s^2 = \frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{10}\left(61 - \frac{(11)^2}{11}\right) = \frac{1}{10}(50) = 5 \]
\[ \Rightarrow\ s = 2.236 \]
\[ \therefore\ t = \frac{1}{2.236/\sqrt{11}} = \frac{3.316}{2.236} = 1.483 \]
As the calculated value of | t | = 1.483 < t0.05 for 10 degrees of freedom, which is 2.23, H0 is
accepted, i.e., the data do not reveal that the coaching is effective.
Example 7.49. A certain stimulus, when administered to each of 12 patients, resulted
in the following increases in blood pressure:
5, 2, 8, –1, 3, 0, –2, 1, 5, 0, 4 and 6
Can it be concluded that the stimulus will, in general, be accompanied by an
increase in blood pressure?
Solution. Let us take the null hypothesis that there is no significant difference in blood
pressure before and after administering the stimulus, i.e., the stimulus has no effect.
Now,

  d  =   5   2   8   –1   3   0   –2   1   5    0   4    6
  d² =  25   4  64    1   9   0    4   1  25    0  16   36

\[ \therefore\ \bar{d} = \frac{\sum d}{n} = \frac{31}{12} = 2.58 \]
\[ \text{and}\quad S^2 = \frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{11}\left(185 - \frac{(31)^2}{12}\right) = \frac{1}{11}(104.9) = 9.538 \]
\[ \Rightarrow\ S = 3.088 \]
\[ \therefore\ t = \frac{2.58}{3.088/\sqrt{12}} = \frac{8.937}{3.088} = 2.894 \]
As the calculated value of | t | = 2.89 > t0.05 for 11 degrees of freedom, which is 2.20,
H0 is rejected, i.e., the stimulus will, in general, be accompanied by an increase in blood pressure.
Example 7.50. IQ test was administered to 5 persons before and after they were
trained. The results are given below:
IQ (Before training) 110 120 123 132 125
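\[ \therefore\ \bar{d} = \frac{\sum d}{n} = \frac{10}{5} = 2 \]
\[ \text{and}\quad S^2 = \frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{4}\left(140 - \frac{(10)^2}{5}\right) = \frac{1}{4}(120) = 30 \]
\[ \Rightarrow\ S = 5.477 \]
\[ \therefore\ t = \frac{2}{5.477/\sqrt{5}} = \frac{4.472}{5.477} = 0.816 \]
As the calculated value of | t | = 0.816 < t0.05 for 4 degrees of freedom, which is 2.78, H0 is
accepted, i.e., there is no change in IQ after the training programme.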
EXERCISE 7.8
1. A certain stimulus, when administered to each of 9 patients, resulted in the following
increases in blood pressure: 7, 3, –1, 4, –3, 5, 6, –4, and 1
Can it be concluded that the stimulus will, in general, be accompanied by an increase
in blood pressure? (Given for 8 d.f., t0.05 = 2.31). [Ans. H0 accepted at 5% level]
2. Fit and Fine Health Club has been advertising a rigorous programme for body conditioning.
The club claims that after 1 month in the programme, the average participant should be
able to do at least eight more push-ups in 2 minutes than he or she could do at the start.
Does the random sample of ten programme participants given below support the club’s
claim? Use the 0.05 level of significance. [Ans. H0 accepted at 5% level]
3. The following data show weekly production for 10 employees before change and after
change in the production technique.
Employee A B C D E F G H I J
Before change 24 26 20 21 23 30 32 25 23 23
After change 26 26 22 22 24 30 32 26 24 25
Test whether there is any significant change in average production due to the changes in the
production technique. [Ans. H0 rejected at 5% level]
4. The sales data of an item in six shops before and after a special promotional campaign are:
Shops A B C D E F
Can the campaign be judged to be a success? Test at 5% level of significance. Use paired
t-test. The significant value of t for the left tail test at 5% level for 5 degrees of freedom is
2.57. [Ans. H 0 rejected at 5% level]
5. Ten persons were appointed in an electrical position in an office. Their performance was
noted by giving a test and the marks were recorded out of 50. They were given 6 months’
training and again they were given a test and marks were recorded out of 50.
Employees A B C D E F G H I J
Before training 25 20 35 15 42 28 26 44 35 48
After training 26 20 34 13 43 40 29 41 36 46
By applying the t-test can it be concluded that employees have benefited by the training?
(You are given for 9 d.f., t0.05 = 2.26) [Ans. H0 accepted at 5% level]
6. The following table gives the additional hours of sleep gained by 10 patients in an
experiment to test the effect of a drug. Do these data give evidence that the drug produces
additional hours of sleep?
Patients 1 2 3 4 5 6 7 8 9 10
Hours gained 0.7 0.1 0.2 1.2 0.1 3.4 3.7 0.8 3.8 2.0
[Ans. H0 accepted at 5% level]
7. A physical instructor claims that a particular exercise if done continuously for 7 days
reduces weight by 15 kgs. Five over weight girls did the exercise for 7 days and their
weights were observed as:
Girls 1 2 3 4 5
\[ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \]
is a t variate with (n – 2) degrees of freedom and thus we test the hypothesis accordingly.
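A small Python sketch of this statistic follows. The function names are assumptions made for illustration. The second helper searches for the smallest n at which t exceeds a required value, treating that required value as fixed (an assumption of this sketch, since the tabulated t in fact changes with the degrees of freedom); the figures r = 0.42 and a required value of 2.72 are used.

```python
import math

def correlation_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def min_sample_size(r, t_required):
    """Smallest n for which correlation_t(r, n) exceeds a fixed t_required."""
    n = 3
    while correlation_t(r, n) <= t_required:
        n += 1
    return n

n_needed = min_sample_size(0.42, 2.72)
print(n_needed, round(correlation_t(0.42, n_needed), 3))
```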
SOLVED EXAMPLES
\[ \therefore\ \frac{0.42}{\sqrt{1 - (0.42)^2}} \times \sqrt{n - 2} > 2.72 \]
\[ \text{or}\quad \frac{0.42}{0.908} \times \sqrt{n - 2} > 2.72 \]
\[ \text{or}\quad \sqrt{n - 2} > \frac{2.72 \times 0.908}{0.42} = 5.88 \]
\[ \text{or}\quad n - 2 > (5.88)^2 = 34.57 \]
or n > 36.57, i.e., n = 37
Hence we should include 37 observations.
EXERCISE 7.9
Husbands Age 23 27 28 29 30 31 33 35 36 39
Wives Age 18 22 23 24 25 26 28 29 30 32
Let
\[ S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1} (x_i - \bar{x})^2, \qquad S_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2} (y_i - \bar{y})^2 \]
The F-statistic is defined by the relation
\[ F = \frac{S_1^2}{S_2^2} \quad \text{where } S_1^2 > S_2^2 \]
The numerator should always be larger than the denominator. In case $S_2^2 > S_1^2$, we take
\[ F = \frac{S_2^2}{S_1^2} \]
In the first case, we say that F has (n1 – 1, n2 – 1) degrees of freedom and in the second
case, we say that F has (n2 – 1, n1 – 1) degrees of freedom.
From the above, it is concluded that the greater of the two values $S_1^2$ and $S_2^2$ is taken in the
numerator while calculating F.
The calculated value of F is compared with the table value for (n1 – 1, n2 – 1) or (n2 – 1,
n1 – 1), as the case may be, at the 5% or 1% level of significance. If the calculated value of F is greater
than the table value, then the F ratio is considered significant and the null hypothesis is rejected.
On the other hand, if the calculated value of F is less than the table value, the null hypothesis
is accepted and it is inferred that both the samples have come from populations having the
same variance.
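The rule of placing the larger variance estimate in the numerator can be sketched in Python as follows. The function names are assumptions made for illustration, and the table value of F must still be looked up separately. The data are daily output figures for two machines (10 and 12 days), for which the table value F0.05(11, 9) is 3.16.

```python
def sample_variance(values):
    """Unbiased variance estimate S^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def f_ratio(xs, ys):
    """F with the larger variance in the numerator.

    Returns (F, (numerator d.f., denominator d.f.)).
    """
    v1, v2 = sample_variance(xs), sample_variance(ys)
    if v1 >= v2:
        return v1 / v2, (len(xs) - 1, len(ys) - 1)
    return v2 / v1, (len(ys) - 1, len(xs) - 1)

machine_1 = [20, 16, 26, 27, 23, 22, 18, 24, 25, 19]
machine_2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]

F, dof = f_ratio(machine_1, machine_2)
print(round(F, 2), dof)

# H0 (equal variances) is rejected only if F exceeds the table value
reject_h0 = F > 3.16
```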
Assumptions
1. Independent random samples are drawn from each of two normal populations
2. The populations for each sample must be normally distributed
3. The variability of the measurements in the two populations is the same and can be measured
by a common variance $\sigma^2$, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
4. The ratio of $\sigma_1^2$ to $\sigma_2^2$ should be greater than or equal to 1 since larger value from
SOLVED EXAMPLES
Example 7.53. The time taken by workers in performing a job by method I and
method II is given below:
Method I 20 16 26 27 23 22
Method II 27 33 42 35 32 34 38
Do the data show that the variances of time distribution from population from which
these samples are drawn do not differ significantly ?
Solution. Here n1 = 6 and n2 = 7;
\[ \bar{x} = \frac{1}{n_1}\sum x_i = \frac{134}{6} = 22.3 \quad \text{and} \quad \bar{y} = \frac{1}{n_2}\sum y_i = \frac{241}{7} = 34.4 \]
Computation: taking the deviations (xi – x̄) and (yi – ȳ) and their squares gives
\[ \sum (x_i - \bar{x})^2 = 81.34; \qquad \sum (y_i - \bar{y})^2 = 133.72 \]
\[ S_1^2 = \frac{\sum (x_i - \bar{x})^2}{n_1 - 1} = \frac{81.34}{5} = 16.27 \quad \text{and} \quad S_2^2 = \frac{\sum (y_i - \bar{y})^2}{n_2 - 1} = \frac{133.72}{6} = 22.29 \]
Let the null hypothesis be $H_0 : \sigma_1^2 = \sigma_2^2$ (there is no significant difference between the two
variances) against the
alternative hypothesis $H_1 : \sigma_1^2 \ne \sigma_2^2$ (there is a significant difference between the two
variances)
Level of significance α = 0.05, F(6, 5) at 0.05 = 4.95
Now the test statistic
\[ F = \frac{S_2^2}{S_1^2} = \frac{22.29}{16.27} = 1.37 \]
The calculated value of ‘F’ is less than the table value of ‘F’ at 5% level of significance.
So we need not reject the null hypothesis H0. Hence we conclude that there is no significant
difference between the two variances at 0.05 level of significance.
Example 7.54. In a sample of 8 observations, the sum of squares of deviations from the
mean is 94.5. In another sample of 10 observations, the sum of squares of deviations from the mean is
101.7. Test whether there is a significant difference of variance.
Solution. Let us take the hypothesis that the two populations have the same variance, i.e.,
$H_0 : \sigma_1^2 = \sigma_2^2$.
We have n1 = 8 and n2 = 10
\[ \sum (x - \bar{x})^2 = 94.5 \quad \text{and} \quad \sum (y - \bar{y})^2 = 101.7 \]
\[ \therefore\ S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{94.5}{8 - 1} = 13.5 \]
\[ \text{and}\quad S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{101.7}{10 - 1} = 11.3 \]
\[ \text{Hence}\quad F = \frac{S_1^2}{S_2^2} = \frac{13.5}{11.3} = 1.195 \]
As the calculated value of F = 1.195 < F0.05(7, 9), which is 3.29, H0 is accepted, i.e., the two
samples represent the same variance.
Example 7.55. A plant has installed two machines producing polythene bags. During
the installation, the manufacturer of the machines stated that the capacity of each
machine is to produce 20 bags in a day. Owing to various factors such as different
operators working on these machines, raw material, etc., there is a variation in the number
of bags produced at the end of the day. The company researcher has taken a random
sample of bags produced in 10 days for machine 1 and 12 days for machine 2, respectively.
The following data give the number of units produced on a sampled day by
the two machines:

  Machine I    20  16  26  27  23  22  18  24  25  19
  Machine II   27  33  42  35  32  34  38  28  41  43  30  37

How can the researcher determine whether the samples come from populations with the same variance
(population variances are equal) or from different populations (population variances
are not equal)? Use 5% level of significance.
Solution. Let us take the hypothesis that there is no significant difference between the
variability in production of the two machines, i.e., $H_0 : \sigma_1^2 = \sigma_2^2$
Applying the F-test:
\[ F = \frac{S_2^2}{S_1^2} \]

       Machine I               Machine II
    x      (x – x̄)²         y      (y – ȳ)²
   20         4             27        64
   16        36             33         4
   26        16             42        49
   27        25             35         0
   23         1             32         9
   22         0             34         1
   18        16             38         9
   24         4             28        49
   25         9             41        36
   19         9             43        64
                            30        25
                            37         4
  220       120            420       314

\[ \bar{x} = \frac{220}{10} = 22 \quad \text{and} \quad \bar{y} = \frac{420}{12} = 35 \]
\[ \text{Now}\quad S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{120}{10 - 1} = 13.33 \]
\[ \text{and}\quad S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{314}{12 - 1} = 28.55 \]
\[ \therefore\ F = \frac{28.55}{13.33} = 2.14 \]
As the calculated value of F = 2.14 < F0.05(11, 9), which is 3.16, H0 is accepted, i.e., there is
no significant difference between the variances of production of the two machines. The differences
observed in the samples may be due to chance.
Example 7.56. Most individuals are aware of the fact that the average annual repair
cost for an automobile depends on the age of the automobile. A researcher is interested
in finding out whether the variance of the annual repair costs also increases with the age
of the automobile. A sample of 25 automobiles that are 4 years old showed a sample variance
for the annual repair costs of Rs. 850 and a
sample of 25 automobiles that are 2 years old showed a sample variance for the annual
repair costs of Rs. 300. Test the hypothesis that the variance in annual repair costs is more
for the older automobiles, for a 0.01 level of significance.
Solution. Let us take the hypothesis that there is no significant difference in the variance
of repair cost, i.e., $H_0 : \sigma_1^2 = \sigma_2^2$
We have n1 = 25 and n2 = 25
$S_1^2 = 850$ and $S_2^2 = 300$
\[ \therefore\ F = \frac{S_1^2}{S_2^2} = \frac{850}{300} = 2.833 \]
As the calculated value of F = 2.833 > F0.01(24, 24), which is 2.66, H0 is rejected, i.e., the
variance in annual repair costs is significantly greater for the older automobiles.
Example 7.57. The daily wages (in Rs.) of workers in two cities are as follows:
Size of the sample Standard deviation of wages
City A 22 2.9
City B 16 3.8
Test at 5% level, the equality of variances of the wage distribution in the two cities.
Solution. Let us take the hypothesis that there is equality of variances of the wage distribution
in the two cities, i.e., H 0 : σ 12 = σ 22
We have n1 = 22 and n2 = 16
s1 = 2.9 and s2 = 3.8
As we have,
\[ S_1^2 = \frac{n_1}{n_1 - 1}\, s_1^2 = \frac{22}{21}(2.9)^2 = 8.81 \]
and
\[ S_2^2 = \frac{n_2}{n_2 - 1}\, s_2^2 = \frac{16}{15}(3.8)^2 = 15.40 \]
Applying the F-test:
\[ F = \frac{S_2^2}{S_1^2} = \frac{15.40}{8.81} = 1.75 \]
As the calculated value of F = 1.75 < F0.05(15, 21) which is 2.18, H0 is accepted i.e., there
is equality of variances of the wage distribution in the two cities.
Example 7.58. The following figures relate to the number of units of an item produced
per shift by two workers A and B for a number of days
A 16 17 18 19 20 21 22 24 26 29
B 19 22 23 25 26 28 29 30 31 32 35 36
Can it be inferred that worker A is more stable compared to worker B? Give your
answer using F-test at 5% level of significance. [Use F0.05(11, 9) = 3.16]
Solution. Let us take the hypothesis that the two workers are equally stable, i.e., $H_0 : \sigma_1^2 = \sigma_2^2$.
Applying F-test: $F = \dfrac{S_2^2}{S_1^2}$

        Worker A                      Worker B
   x      $(x - \bar{x})^2$      y      $(y - \bar{y})^2$
  16          25                19          81
  17          16                22          36
  18           9                23          25
  19           4                25           9
  20           1                26           4
  21           0                28           0
  22           1                29           1
  24           9                30           4
  26          25                31           9
  27          36                32          16
                                35          49
                                36          64
 210         126               336         298

$$\bar{x} = \frac{210}{10} = 21 \quad \text{and} \quad \bar{y} = \frac{336}{12} = 28$$
Now,
$$S_1^2 = \frac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \frac{126}{10 - 1} = 14$$
and
$$S_2^2 = \frac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \frac{298}{12 - 1} = 27.09$$
$$\therefore F = \frac{S_2^2}{S_1^2} = \frac{27.09}{14} = 1.94$$
As the calculated value of F = 1.94 < F0.05(11, 9) which is 3.16, H0 is accepted i.e., the two
workers are equally stable.
EXERCISE 7.10
1. Given the following information about two samples from two normal populations,
$n_1 = 9$, $s_1 = 1.97$, $n_2 = 7$, $s_2 = 3.21$.
Can it be concluded that both the samples have come from populations having the same
variability? [Ans. H0 accepted at 5% level]
2. The students of the same age group from two different management schools were compared
for variability in their statistical skill. A random sample of 25 students from one management
school has a variance of 16 marks while a random sample of 22 students from the other
management school has variance of 8 marks. Examine if the difference in variability is
significant. [Ans. H0 accepted at 5% level]
3. Two random samples were drawn from normal population and their values are:
A 66 67 75 76 82 84 88 90 92
B 64 66 74 78 82 85 87 92 93 95 97
Test whether the two populations have the same variance at the 5% level?
[Ans. H0 accepted at 5% level]
4. One sample of 10 bulbs gives a standard deviation of 9 hours of life and another sample of
11 bulbs gives a standard deviation of 10 hours of life. Can you say the variances are
different at 1% level of significance? [Ans. H0 accepted at 5% level]
5. Two bottle filling plants are supposed to fill 5 litres of water in each bottle. A researcher
has taken a random sample of 10 bottles from Plant I and 15 bottles from Plant II. The data
collected are provided in the table below:
Plant I 5.1 5.2 5.2 5.2 5.3 5.4 5.3 4.9 4.8 4.9
Plant II 4.9 4.8 4.7 5.1 5.2 5.3 5.4 4.9 4.8 5.1 5.2 4.8 4.9 5.1 5.2
How can the researcher determine whether the two samples come from populations with the
same variance or from populations with different variances? Take 5% as the level of significance.
[Ans. H0 accepted at 5% level]
6. It is known that the mean diameter of a steel pipe produced by two processes, A and B, is
practically the same but the standard deviation may differ. For a sample of 22 pipes produced
by A, the standard deviation is 2.9 m, while for a sample of 16 pipes produced by B; the
standard deviation is 3.8 m. Test whether the pipe produced by process A have the same
variability as those of process B. [Ans. H0 accepted at 5% level]
Correction Factor = $\dfrac{T^2}{N}$, where N is the number of observations.
(iii) Calculate the sum of the squares of all the values (or observations) in K samples and
subtract the correction factor $\dfrac{T^2}{N}$ from this sum. This result gives the total sum of the
squares of the deviations SST. Thus,
$$SST = \sum X_1^2 + \sum X_2^2 + \dots + \sum X_K^2 - \frac{T^2}{N}$$
(iv) Find the square of the sum of each sample and divide each such squared value by the
number of values in the corresponding sample and then calculate the total of all the
results thus obtained and subtract the correction factor from this total. This final result
gives the sum of the squares of deviations between the samples. Thus,
$$SSB = \left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_K)^2}{n_K}\right] - \frac{T^2}{N}$$
(v) Calculate SSW, i.e., the sum of the squares within the samples by subtracting SSB from
SST. Thus, SSW = SST – SSB = SSE.
(vi) Calculate MSB and MSW, and then $F = \dfrac{MSB}{MSW} = \dfrac{MSC}{MSE}$
where SSC = Sum of square of columns; SSR = Sum of square of rows and SSE = Sum of square
of residual error.
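The short-cut steps above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the book's own procedure: the function name `one_way_anova` is hypothetical, the mean squares use the standard degrees of freedom K − 1 (between) and N − K (within), and the data are the three machine samples of the solved example that follows.

```python
def one_way_anova(samples):
    """One-way ANOVA by the correction-factor (short-cut) method."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    correction = grand_total ** 2 / n_total                     # T^2 / N
    sst = sum(x ** 2 for s in samples for x in s) - correction  # total
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - correction  # between
    ssw = sst - ssb                                             # within (error)
    msb = ssb / (k - 1)          # assumed d.f. between samples: K - 1
    msw = ssw / (n_total - k)    # assumed d.f. within samples:  N - K
    return {"SST": sst, "SSB": ssb, "SSW": ssw, "F": msb / msw}


machines = [
    [20, 21, 23, 16, 20],   # machine A
    [18, 20, 17, 25, 15],   # machine B
    [25, 28, 22, 28, 32],   # machine C
]
result = one_way_anova(machines)
print(result)
```

The sums of squares reproduce the values 330, 190 and 140 obtained by hand in the solved example; the F ratio is then compared with the tabulated F for (2, 12) degrees of freedom.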
SOLVED EXAMPLES
20 18 25
21 20 28
23 17 22
16 25 28
20 15 32
Total: 100 Total: 95 Total: 135
$\bar{X}_1 = 20$     $\bar{X}_2 = 19$     $\bar{X}_3 = 27$
Combined mean of all the samples $\bar{\bar{X}} = \dfrac{20 + 19 + 27}{3} = \dfrac{66}{3} = 22$
Variance between samples
To obtain the variation between samples, calculate the squares of the deviations of the various
sample means from the combined mean. The mean of machine A is 20 and the combined mean is 22,
so we take the difference between 20 and 22 and square it. Similarly, for machine B the mean is 19
and the combined mean is 22, so we take the difference between 19 and 22 and square it, and
likewise for machine C. Thus we have the following table.
( X1 − X ) 2 ( X2 − X )2 ( X3 − X )2
4 9 25
4 9 25
4 9 25
4 9 25
4 9 25
Total: 20 Total: 45 Total: 125
20 0 18 1 25 4
21 1 20 1 28 1
23 9 17 4 22 25
16 16 25 36 28 1
20 0 15 16 32 25
Total: 26 Total: 58 Total: 56
Correction Factor, $C = \dfrac{T^2}{N}$, where T is the grand total of values in all of the samples and N the
total number of values.
$$C = \frac{330 \times 330}{15} = 7260$$
∴ Sum of squares between machines (SSC)
$$= \frac{100^2 + 95^2 + 135^2}{5} - 7260 = \frac{37250}{5} - 7260 = 7450 - 7260 = 190$$
and Total Sum of squares of deviations (SST)
$$= \left(20^2 + 21^2 + 23^2 + 16^2 + 20^2 + 18^2 + 20^2 + 17^2 + 25^2 + 15^2 + 25^2 + 28^2 + 22^2 + 28^2 + 32^2\right) - 7260$$
= 7590 – 7260 = 330
∴ Sum of squares within machines (SSE) = SST – SSC = 330 – 190 = 140
Remaining solution is same as in the first method.
Example 7.60. The following figures relate to the number of units sold in five different
areas by four salesmen.
Solution. Let us take the hypothesis that there is no significant difference in the performance
of the four salesmen.
Using analysis of variance technique we have the following table.
A B C D
X1 X2 X3 X4
80 100 95 70
82 110 90 75
88 105 100 82
85 115 105 88
75 90 80 65
Total: 410 Total: 520 Total: 470 Total: 380
$\bar{X}_1 = 82$     $\bar{X}_2 = 104$     $\bar{X}_3 = 94$     $\bar{X}_4 = 76$
Combined mean of all the samples $\bar{\bar{X}} = \dfrac{82 + 104 + 94 + 76}{4} = \dfrac{356}{4} = 89$
Variance between samples
We have the following table.
A B C D
( X1 − X ) 2 ( X2 − X )2 ( X3 − X )2 ( X4 − X )2
49 225 25 169
49 225 25 169
49 225 25 169
49 225 25 169
49 225 25 169
∴ Sum of squares between samples (SSC) = 245 + 1125 + 125 + 845 = 2340
A B C D
X1 ( X1 − X1 ) 2 X2 ( X2 − X2 ) 2 X3 ( X3 − X3 ) 2 X4 ( X4 − X 4 ) 2
80 4 100 16 95 1 70 36
82 0 110 36 90 16 75 1
88 36 105 1 100 36 82 36
85 9 115 121 105 121 88 144
75 49 90 196 80 196 65 121
Total: 98 Total: 370 Total: 370 Total: 338
Source of variation               Sum of squares   d.f.   Mean square               F
Between Samples (Column Means)    2340             3      MSC = 2340/3 = 780        F = MSC/MSE = 780/73.5 = 10.61
Within Samples (Errors)           1176             16     MSE = 1176/16 = 73.5
Total                                              19
As the calculated value of F = 10.61 > F0.05(3, 16) which is 3.24, H0 is rejected i.e., there
is a significant difference in the efficiency of the four salesmen.
Example 7.61. It is desired to compare three hospitals with regard to the number of
deaths per quarter. A sample of death records was selected from the records of each
hospital and the number of deaths was as given below. Do these data suggest a difference
in the number of deaths per quarter among the three hospitals?
Solution. Let us take the hypothesis that there is no significant difference in the number
of deaths per quarter among the three hospitals.
Using analysis of variance technique we have the following table.
$\bar{X}_1 = 10$     $\bar{X}_2 = 8$     $\bar{X}_3 = 12$
Combined mean of all the samples $\bar{\bar{X}} = \dfrac{10 + 8 + 12}{3} = \dfrac{30}{3} = 10$
Variance between Samples
Thus we have the following table
( X1 − X ) 2 ( X2 − X )2 ( X3 − X )2
0 4 4
0 4 4
0 4 4
0 4 4
0 4 4
X1 ( X1 − X1 ) 2 X2 ( X2 − X2 ) 2 X3 ( X3 − X3 ) 2
8 4 7 1 12 0
10 0 5 9 9 9
7 9 10 4 13 1
14 16 9 1 12 0
11 1 9 1 14 4
Total: 30 Total: 16 Total: 14
∴ Sum of squares within samples (SSE) = 30 + 16 + 14 = 60
All the above results can be tabulated as follows:
ANOVA TABLE
As the calculated value of F = 4 > F0.05 (2, 12) which is 3.89, H0 is rejected i.e., the
difference is significant and we conclude that the data suggests a difference in the number of
deaths per quarter among the three hospitals.
Example 7.62. Set up two-way ANOVA table for the following per hectare yield for 4
varieties of wheat on 3 plots:
Correction factor, $C = \dfrac{T^2}{N} = \dfrac{60^2}{12} = 300$
Sum of squares between columns (SSC)
$$= \frac{15^2 + 14^2 + 15^2 + 16^2}{3} - 300 = 0.67$$
Sum of squares between rows (SSR)
$$= \frac{19^2 + 18^2 + 23^2}{4} - 300 = 3.5$$
and Total Sum of squares of deviations (SST)
$$= (3^2 + 6^2 + 6^2 + 4^2 + 4^2 + 6^2 + 6^2 + 5^2 + 4^2 + 6^2 + 3^2 + 7^2) - 300$$
= 320 – 300 = 20
∴ Errors sum of squares (SSE) = SST – (SSR + SSC) = 15.83
All the above results can be tabulated as follows:
ANOVA Table for two-way classification
Source of variation    Sum of squares   d.f.   Mean square               F
Between Columns        SSC = 0.67       3      MSC = 0.67/3 = 0.22       F1 = 0.22/2.64 = 0.08
Between Rows           SSR = 3.50       2      MSR = 3.50/2 = 1.75       F2 = 1.75/2.64 = 0.663
Errors (Residual)      SSE = 15.83      6      MSE = 15.83/6 = 2.64
Example 7.63. To study the performance of three detergents and three water
temperatures, the following whiteness readings were obtained with specially designed
equipment.
Water Detergent
Temperature A B C Total
Cold water 57 55 67 179
Warm water 49 52 68 169
Hot water 54 46 58 158
Total: 160 Total: 153 Total: 193 Total: 506
Correction factor, $C = \dfrac{T^2}{N} = \dfrac{506^2}{9} = 28448.44$
Sum of squares between columns (SSC)
$$= \frac{86258}{3} - 28448.44 = 304.22$$
Sum of squares between rows (SSR)
$$= \frac{85566}{3} - 28448.44 = 73.56$$
and Total Sum of squares of deviations (SST)
$$= (57^2 + 49^2 + 54^2 + 55^2 + 52^2 + 46^2 + 67^2 + 68^2 + 58^2) - 28448.44$$
= 28888 – 28448.44 = 439.56
∴ Error sum of squares (SSE) = SST – (SSR + SSC) = 61.78
All the above results can be tabulated as follows:
ANOVA TABLE for two-way classification
Source of variation    Sum of squares   d.f.   Mean square                  F
Between Columns        SSC = 304.22     2      MSC = 304.22/2 = 152.11      F1 = 152.11/15.44 = 9.85
Between Rows           SSR = 73.56      2      MSR = 73.56/2 = 36.78        F2 = 36.78/15.44 = 2.38
Errors (Residual)      SSE = 61.78      4      MSE = 61.78/4 = 15.44
(i) As the calculated value of F1 = 9.85 > F0.05(2, 4) which is 6.94, H0 is rejected i.e., the
difference between detergents is significant.
(ii) As the calculated value of F2 = 2.38 < F0.05(2, 4) which is 6.94, H0 is accepted i.e., the
water temperature does not make a significant difference.
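The two-way computation of Example 7.63 can be checked with a short Python sketch. The function name `two_way_anova` is hypothetical, and the sketch assumes one observation per cell, as in the example; it follows the same correction-factor route (C, SSC, SSR, SST, SSE) used in the solution.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell) by the correction-factor method.

    `table` is a list of rows; columns are the first factor, rows the second.
    """
    r, c = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (r * c)
    sst = sum(x ** 2 for row in table for x in row) - correction
    ssr = sum(sum(row) ** 2 for row in table) / c - correction        # rows
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - correction  # columns
    sse = sst - ssr - ssc
    mse = sse / ((r - 1) * (c - 1))
    return {
        "SSC": ssc, "SSR": ssr, "SSE": sse,
        "F_columns": (ssc / (c - 1)) / mse,
        "F_rows": (ssr / (r - 1)) / mse,
    }


# Whiteness readings: rows = cold, warm, hot water; columns = detergents A, B, C
readings = [
    [57, 55, 67],
    [49, 52, 68],
    [54, 46, 58],
]
anova = two_way_anova(readings)
print({name: round(value, 2) for name, value in anova.items()})
```

Both F ratios are compared with the same tabulated value F0.05(2, 4) = 6.94 quoted in the example.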
Example 7.64. The following data represent the number of units of a product produced by 3
different workers using 3 different types of machines.
Workers    Machine A    Machine B    Machine C
X          8            32           20
Y          28           36           38
Z          6            28           14
Test (i) whether the mean productivity is the same for the different machine types, and
(ii) whether the three workers differ with respect to mean productivity.
Solution. Let us take the hypothesis that
(i) the mean productivity is the same for the three different machines
(ii) the workers do not differ with regard to mean productivity.
Workers Machines
A B C Total
X 8 32 20 60
Y 28 36 38 102
Z 6 28 14 48
Source of variation    Sum of squares   d.f.   Mean square             F
Between Columns        SSC = 488        2      MSC = 488/2 = 244       F1 = 244/26 = 9.38
Between Rows           SSR = 536        2      MSR = 536/2 = 268       F2 = 268/26 = 10.31
Errors (Residual)      SSE = 104        4      MSE = 104/4 = 26
(i) As the calculated value of F1 = 9.38 > F0.05(2, 4) which is 6.94, H0 is rejected i.e., the
mean productivity is not the same for the three different machines.
(ii) As the calculated value of F2 = 10.31 > F0.05(2, 4) which is 6.94, H0 is rejected i.e., the
three workers differ with regard to mean productivity.
EXERCISE 7.11
I II III IV
9 13 19 14
11 12 13 10
13 10 17 13
9 15 7 17
8 5 9 16
Schools
A B C D
8 12 18 13
10 11 12 9
12 9 16 12
8 14 6 16
7 4 8 15
[Ans. H0 accepted at 5% level]
9. Four salesmen were posted in different areas by a company. The number of units of
commodities X sold by them are as follows:
A 20 23 28 29
B 25 32 30 21
C 23 28 35 18
D 15 21 19 25
On the basis of this information can it be concluded that there is a significant difference in
the performance of the four salesmen? (Given for ν1 = 3 and ν2 = 12, F0.05 = 3.49)
[Ans. H0 accepted at 5% level]
10. Bakewell Biscuits Pvt. Ltd. has launched a new brand in the four metros, Delhi, Mumbai,
Kolkata, and Chennai. After one month, the company realizes that there is a difference in
the retail price per pack of biscuits across cities. Before the launch, the company has
promised its employees and newly appointed retailers that the biscuits would be sold at a
uniform price in the country. The difference in the price can tarnish the image of the
company. In order to make a quick inference, the company collected data about the price
from six randomly selected stores across the four cities. Based on the sample information,
the price per pack of the biscuits (in rupees) is given below:
Use one-way ANOVA to analyse whether the difference in the prices is significant. Take 5% as the
level of significance. [Ans. H0 rejected at 5% level]
11. A farmer applies three types of fertilizers on four separate plots. The figures on yield per
acre are tabulated below:
Fertilizers Yield
A B C D
Nitrogen 6 4 8 6
Potash 7 6 6 9
Phosphates 8 5 10 9
Find out if the plots are materially different in fertility, as also, if the three fertilizers make
any material difference in yields. [Ans. H0 accepted at 5% level]
12. A certain company has four salesmen A, B, C and D, each of whom was sent for a month to
three districts: countryside area ‘K’, outskirts of a city ‘O’ and shopping centre of a city ‘S’.
The sales in Rs. are given below:
Districts Salesmen
A B C D
K 30 70 30 30
O 80 50 40 70
S 100 60 80 80
Hospitals Drugs
A B C D
I 19 8 23 8
II 10 9 12 6
III 11 13 13 10
Seasons Drugs
A B C D
Summer 36 36 21 35
Winter 28 29 31 32
Monsoon 26 28 29 29
Carry out an analysis of variance and test whether there is any significant difference in the
salesmen and in the seasons, so far as sales are concerned.
[Ans. H 0 accepted at 5% level for sales of salesmen as well as season]
7.13. CHI-SQUARE TEST (χ² TEST)
If ‘x’ is normally distributed with mean µ and standard deviation σ, then $z = \dfrac{x - \mu}{\sigma}$ is a
standard normal variate with mean 0 and variance 1, and $z^2 = \left(\dfrac{x - \mu}{\sigma}\right)^2$ is a Chi-square variate
with 1 degree of freedom. Chi-square variate is denoted by the symbol χ².
If x1, x2, x3, ..., xn are n independent normal variates with means µ1, µ2, µ3, ..., µn and
standard deviations σ1, σ2, σ3, ..., σn respectively, then
$$\chi^2 = \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 + \left(\frac{x_3 - \mu_3}{\sigma_3}\right)^2 + \dots + \left(\frac{x_n - \mu_n}{\sigma_n}\right)^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$
is a Chi-square variate with n degrees of freedom.
It is computed on the basis of frequencies in a sample and is applied only for qualitative
data such as intelligence, colour, immunity, health, response to drug, etc.
If a random variable X has a chi-square distribution with n degrees of freedom, we write
$X \sim \chi^2(n)$ and its probability density function is given by:
$$f(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, e^{-x/2}\, x^{n/2 - 1}; \quad 0 \le x < \infty$$
where n is a parameter of the distribution which is a positive integer, also indicated as degrees
of freedom.
7.14. CHARACTERISTICS OF CHI-SQUARE (χ²) DISTRIBUTION
1. Chi-square is always positively skewed i.e. χ² value is always positive
freedom.
4. The mean of the distribution is the number of degree of freedom.
5. The value of χ² lies between zero and infinity, i.e., 0 ≤ χ² < ∞.
6. The sum of two χ² distributions is again a χ² distribution, i.e., if χ₁² and χ₂² are two
[Fig. 7.4. χ² distribution curves for 1, 2, 3, 4, 5 and 6 d.o.f.]
8. Chi-square (χ²) is a statistic and not a parameter.
7.15. USES OF CHI-SQUARE (χ²)
The χ² test is a very powerful test for testing the hypothesis of a number of statistical
Chi-square test enables us to determine the degree of deviation between observed frequencies
and the theoretical frequencies and to conclude whether the deviation between observed
(experimental) frequencies and expected (theoretical) frequencies is due to error of
sampling or due to chance.
or not. For example, suppose a researcher brought male and female participants into the
lab and asked them which color they prefer, blue or green. The researcher believes that
color preference may be related to gender. Notice that both gender (male, female) and
color preference (blue, green) are categorical variables. If there is an association between
gender and color preference, we would expect that the proportion of men who prefer
blue would be different than the proportion of women who prefer blue. To determine
if an association exists between gender and color preference, the chi-square test computes
the distributions across the combination of your two factors that you would expect if
there were no association between them.
7.16. CONDITIONS FOR APPLYING CHI-SQUARE (χ²) TEST
1. Every observation of the sample for this test should be independent of all other
observations.
2. The expected frequency of any item should not be less than 5
3. The total number of observations used for the test should be large i.e., n ≥ 50.
4. Chi-square is wholly dependent on degree of freedom
5. This test is used only for drawing inferences by testing hypothesis. It cannot be used for
estimation of parameter or any other value.
6. The frequencies used in χ² should be absolute and not relative in terms.
defined as
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where ∑Oi = ∑Ei = N (total frequency) and degrees of freedom = ν = n – 1
Let the null hypothesis H0 be that there is no significant difference between the observed
(i.e. experimental) values and the corresponding expected (or theoretical) values. If calculated
value of χ2 is less than the tabulated value of χ2 at 5% level of significance, the fit is considered
to be good, i.e., the divergence between actual and expected frequencies is attributed to fluctuations
of simple sampling. If the calculated value of χ2 is greater than the tabulated value, the fit is
considered to be poor.
Remark
1. If χ² = 0, the observed and theoretical frequencies agree exactly.
SOLVED EXAMPLES
Example 7.65. The following table gives the number of aircraft accidents that occurred
during the various days of the week. Find whether the accidents are uniformly distributed
over the week. (Given that value of χ2 at 5% level of significance for 6 d.f. is 12.59)
Days Sun. Mon. Tue. Wed. Thu. Fri. Sat. Total
No. of accidents 14 16 8 12 11 9 14 84
Solution. Taking the hypothesis that the accidents are uniformly distributed over the week,
the expected frequency for each day
$$= \frac{\text{Total Frequency}}{\text{No. of Days}} = \frac{84}{7} = 12$$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
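As a quick numerical check of the goodness-of-fit statistic for the accident data of Example 7.65, the sum can be evaluated in Python. This is only a sketch; the helper name `chi_square_stat` is hypothetical, and the critical value 12.59 is the one given in the example statement.

```python
def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


accidents = [14, 16, 8, 12, 11, 9, 14]            # Sun. to Sat.
expected = [sum(accidents) / len(accidents)] * len(accidents)   # 84 / 7 = 12 each
chi2_accidents = chi_square_stat(accidents, expected)
dof_accidents = len(accidents) - 1                # n - 1 = 6
fits_uniform = chi2_accidents < 12.59             # tabulated value for 6 d.f. at 5%
print(round(chi2_accidents, 3), dof_accidents, fits_uniform)
```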
is insufficient evidence to conclude that the four movies will not be equally popular. Hence we
can conclude that the four movies are equally popular.
Example 7.71. Records taken of the number of male and female births in 800 families
having four children are as follows:
No. of male births 0 1 2 3 4
No. of female births 4 3 2 1 0
No. of families 32 178 290 236 94
Test whether the data are consistent with the hypothesis that the binomial law holds
and the chance of male birth is equal to that of female birth, namely $p = q = \frac{1}{2}$.
Solution. Taking the hypothesis that male and female births are equally probable, i.e.,
$p = q = \frac{1}{2}$, where p is the probability of male birth and q is the probability of female birth. The
expected number of families would be obtained by the expansion of
$$f(r) = N \cdot {}^{n}C_r\, p^r q^{n-r}$$
where N is the total frequency, f (r) is the number of families with r male children.
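$$\therefore f(0) = 800 \cdot {}^{4}C_0 \left(\tfrac{1}{2}\right)^{0} \left(\tfrac{1}{2}\right)^{4-0} = 50; \qquad f(1) = 800 \cdot {}^{4}C_1 \left(\tfrac{1}{2}\right)^{1} \left(\tfrac{1}{2}\right)^{4-1} = 200$$
$$f(2) = 800 \cdot {}^{4}C_2 \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{4-2} = 300; \qquad f(3) = 800 \cdot {}^{4}C_3 \left(\tfrac{1}{2}\right)^{3} \left(\tfrac{1}{2}\right)^{4-3} = 200$$
$$f(4) = 800 \cdot {}^{4}C_4 \left(\tfrac{1}{2}\right)^{4} \left(\tfrac{1}{2}\right)^{4-4} = 50$$
$$\therefore \chi^2 = \frac{(32 - 50)^2}{50} + \frac{(178 - 200)^2}{200} + \frac{(290 - 300)^2}{300} + \frac{(236 - 200)^2}{200} + \frac{(94 - 50)^2}{50}$$
= (6.48 + 2.42 + 0.333 + 6.48 + 38.72) = 54.433.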
As the calculated value of χ² = 54.433 > $\chi^2_{4,\,0.05}$ which is 9.488, H0 is rejected i.e., the chance
of a male birth is not equal to that of a female birth.
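The binomial expected frequencies and the resulting statistic of Example 7.71 can be reproduced with a few lines of Python. This is an illustrative sketch; the variable names are hypothetical, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

N, n, p = 800, 4, 0.5                       # families, children per family, P(male)
observed_families = [32, 178, 290, 236, 94]  # families with r = 0..4 male births

# f(r) = N * nCr * p^r * q^(n - r), with q = 1 - p
expected_families = [N * comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]

chi2_births = sum(
    (o - e) ** 2 / e for o, e in zip(observed_families, expected_families)
)
print(expected_families, round(chi2_births, 3))
```

The statistic is then compared with the tabulated value 9.488 for 4 degrees of freedom, as in the solution.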
Example 7.72. The following is the distribution of the hourly number of trucks arriving
at a company’s warehouse.
Trucks arriving per hour 0 1 2 3 4 5 6 7 8
Frequency 52 151 130 102 45 12 5 1 2
Fit a Poisson distribution and test for goodness of fit at the 5% level of significance.
Solution. Taking the hypothesis that the Poisson distribution is a good fit to the data, the mean of
the Poisson distribution is given by
$$m = \frac{\sum fx}{\sum f} = \frac{1010}{500} = 2.02.$$
By Poisson distribution the frequency of r successes is
$$f(r) = N \times e^{-m} \cdot \frac{m^r}{r!}$$
where N is the total frequency and f(r) is the expected number of hours in which r trucks arrive.
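$$\therefore f(0) = 500 \times e^{-2.02} \cdot \frac{(2.02)^0}{0!} = 500 \times 0.132 = 66$$
$$f(1) = 500 \times e^{-2.02} \cdot \frac{(2.02)^1}{1!} = 500 \times 0.132 \times 2.02 = 133.3 \approx 134$$
$$f(2) = 500 \times e^{-2.02} \cdot \frac{(2.02)^2}{2!} = 134.6 \approx 135$$
$$f(3) = 500 \times e^{-2.02} \cdot \frac{(2.02)^3}{3!} = 90.63 \approx 91$$
$$f(4) = 500 \times e^{-2.02} \cdot \frac{(2.02)^4}{4!} = 45.77 \approx 46$$
$$f(5) = 500 \times e^{-2.02} \cdot \frac{(2.02)^5}{5!} = 18.5 \approx 19$$
$$f(6) = 500 \times e^{-2.02} \cdot \frac{(2.02)^6}{6!} = 6.23 \approx 6$$
$$f(7) = 500 \times e^{-2.02} \cdot \frac{(2.02)^7}{7!} = 1.79 \approx 2$$
$$f(8) = 500 \times e^{-2.02} \cdot \frac{(2.02)^8}{8!} = 0.45 \approx 1$$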
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = Observed frequency and Ei = Expected frequency.
Expected frequencies are rounded and adjusted so that the total frequency is 500. Since
each expected frequency should be greater than or equal to 10, the last four classes are pooled
into one (observed 12 + 5 + 1 + 2 = 20, expected 19 + 6 + 2 + 1 = 28). There are two linear
constraints, one for the total and the other for calculating m from the observed values.
In this case,
$$\chi^2 = \frac{(52 - 66)^2}{66} + \frac{(151 - 134)^2}{134} + \frac{(130 - 135)^2}{135} + \frac{(102 - 91)^2}{91} + \frac{(45 - 46)^2}{46} + \frac{(20 - 28)^2}{28}$$
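The Poisson fit for the truck-arrival data can be sketched in Python as follows. The variable names are hypothetical. The unrounded expected frequencies are computed from the formula, while the χ² sum deliberately uses the rounded and pooled expected values quoted in the solution (66, 134, 135, 91, 46, 28), so the result depends on that rounding. The degrees of freedom follow the two linear constraints mentioned in the text.

```python
from math import exp, factorial

freq = [52, 151, 130, 102, 45, 12, 5, 1, 2]      # hours with r = 0..8 trucks
total = sum(freq)                                 # N = 500
m = sum(r * f for r, f in enumerate(freq)) / total   # mean = 1010 / 500

# Unrounded Poisson expected frequencies f(r) = N * e^(-m) * m^r / r!
raw_expected = [total * exp(-m) * m ** r / factorial(r) for r in range(len(freq))]

# Pool the last four classes (r = 5..8) so every expected frequency is >= 10
observed_pooled = freq[:5] + [sum(freq[5:])]      # [..., 20]
expected_pooled = [66, 134, 135, 91, 46, 28]      # rounded values from the solution

chi2_trucks = sum(
    (o - e) ** 2 / e for o, e in zip(observed_pooled, expected_pooled)
)
dof_trucks = len(observed_pooled) - 2             # two linear constraints
print(round(m, 2), round(chi2_trucks, 3), dof_trucks)
```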
SOLVED EXAMPLES
Example 7.73. From the following table regarding the colour of eyes of father and
son, test if the colour of son’s eye is associated with that of the father
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
that the colour of son’s eye and the colour of father’s eye are independent i.e. there is no
association between the colour of the father’s and that of the son’s eyes.
Drug 104 20 40
Sugar pills 88 24 52
(Given $\chi^2_{0.05}$ for 2 d.f. = 5.99).
Solution. Let us take the hypothesis that drug is no better than sugar pills for curing colds
i.e., the two attributes are independent.
The above information can be arranged in the form of a 2 × 3 contingency table as follows :
Observed frequencies (O)
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
conclude that the result of the experiment does not provide any evidence against the hypothesis.
Therefore, drug is no better than sugar pills in curing colds.
Example 7.75. In an experiment on immunization of cattle from tuberculosis the
following results were obtained:
Affected Unaffected
Inoculated 12 28
Not inoculated 13 7
that the result of the experiment does not support the hypothesis. Therefore, inoculation is
effective in preventing the disease.
Example 7.76. Prove that the value of χ² for the 2 × 2 contingency table
a b
c d
is given by
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}, \quad \text{where } N = a + b + c + d.$$
Solution. Using the hypothesis of independence of attributes, we have
Observed Frequencies (O)
a c a + c
b d b + d
a + b c + d N = a + b + c + d
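Expected frequency for each cell has been calculated by using the formula
$$= \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
$$\therefore E_{11} = \frac{(a + c)(a + b)}{N}, \quad E_{12} = \frac{(a + c)(c + d)}{N}, \quad E_{21} = \frac{(b + d)(a + b)}{N}, \quad E_{22} = \frac{(b + d)(c + d)}{N}$$
Now,
$$a - E_{11} = a - \frac{(a + c)(a + b)}{N} = \frac{a(a + b + c + d) - (a + c)(a + b)}{N} = \frac{ad - bc}{N}$$
$$\therefore (a - E_{11})^2 = \frac{1}{N^2}(ad - bc)^2$$
Similarly we can verify that
$$(c - E_{12})^2 = \frac{1}{N^2}(ad - bc)^2, \quad (b - E_{21})^2 = \frac{1}{N^2}(ad - bc)^2, \quad (d - E_{22})^2 = \frac{1}{N^2}(ad - bc)^2$$
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(ad - bc)^2}{N^2}\left[\frac{1}{E_{11}} + \frac{1}{E_{12}} + \frac{1}{E_{21}} + \frac{1}{E_{22}}\right]$$
$$= \frac{(ad - bc)^2}{N}\left[\left\{\frac{1}{(a + b)(c + a)} + \frac{1}{(a + b)(b + d)}\right\} + \left\{\frac{1}{(a + c)(c + d)} + \frac{1}{(b + d)(c + d)}\right\}\right]$$
$$= \frac{(ad - bc)^2}{N}\left[\frac{b + d + a + c}{(a + b)(c + a)(b + d)} + \frac{b + d + a + c}{(a + c)(b + d)(c + d)}\right]$$
Since a + b + c + d = N, this gives
$$\chi^2 = (ad - bc)^2\left[\frac{(c + d) + (a + b)}{(a + b)(c + d)(a + c)(b + d)}\right] = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}.$$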
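The identity of Example 7.76 can be checked numerically. The Python sketch below compares the general sum over cells with the 2 × 2 short-cut formula; both function names are hypothetical, and the counts are those of the inoculation table in Example 7.75, used here purely as test data.

```python
def chi2_general(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells of the table [[a, b], [c, d]]."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            total += (table[i][j] - e) ** 2 / e
    return total


def chi2_shortcut(a, b, c, d):
    """Short-cut formula N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


general = chi2_general(12, 28, 13, 7)
shortcut = chi2_shortcut(12, 28, 13, 7)
print(round(general, 4), round(shortcut, 4))
```

Because the short-cut is symmetric in rows and columns, it gives the same value whether the table is written as in the statement or transposed as in the solution.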
Example 7.77. The following table gives for a sample of married women, the level of
education and the marriage adjustment score:
Marriage Adjustment
Level of Education Very Low Low High Very High Total
College 24 97 62 58 241
High school 22 28 30 41 121
Middle school 32 10 11 20 73
Total 78 135 103 119 435
Can you conclude from the above data that the higher the level of education, the
greater is the degree of adjustment in marriage?
Solution. Let us take the hypothesis that there is no association between the level of
education and adjustment in marriage i.e., the two attributes are independent.
The above information can be arranged in the form of a 3 × 4 contingency table as follows :
Observed Frequencies (O)
Marriage Adjustment
$$+ \frac{(30 - 29)^2}{29} + \frac{(41 - 33)^2}{33} + \frac{(32 - 13)^2}{13} + \frac{(10 - 23)^2}{23} + \frac{(11 - 17)^2}{17} + \frac{(20 - 20)^2}{20}$$
= 8.40 + 6.45 + 0.44 + 0.97 + 0 + 2.19 + 0.03 + 1.94 + 27.77 + 7.35 + 2.12 + 0
= 57.66.
No. of degrees of freedom (ν) = (r – 1)(c – 1) = (3 – 1)(4 – 1) = 6
As the calculated value of χ² = 57.66 > $\chi^2_{6,\,0.05}$ which is 12.59, H0 is rejected i.e., we
conclude that higher the level of education, the greater is the degree of adjustment in marriage.
Example 7.78. In a sample survey of public opinion, answer to the questions
(i) Do you drink?
(ii) Are you in favour of local option on sale of liquor? are tabulated below:
Question (i)
Yes No Total
Yes 56 31 87
Question (ii) No 18 6 24
Total 74 37 111
Can you infer whether or not the local option on the sale of liquor is dependent on
individual drinking?
Solution. Let us take the hypothesis that the option on the sale of liquor is independent
or not associated with individual drinking.
The above information can be arranged in the form of a 2 × 2 contingency table as follows :
Observed frequencies (O)
Question (i)
Yes No Total
Yes 56 31 87
Question (ii) No 18 6 24
Total 74 37 111
Expected frequency for each cell has been calculated by using the formula:
Row total × Column total
=
Grand total
$$\therefore E_{11} = \frac{87 \times 74}{111} = 58, \qquad E_{12} = \frac{87 \times 37}{111} = 29$$
$$E_{21} = \frac{24 \times 74}{111} = 16, \qquad E_{22} = \frac{24 \times 37}{111} = 8$$
Expected frequencies (E)
                         Question (i)
                         Yes     No      Total
Question (ii)   Yes      58      29      87
                No       16      8       24
                Total    74      37      111
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
conclude that the sale of liquor is independent or not associated with the individual drinking.
Example 7.79. A movie producer is bringing out a new movie. In order to map out
his advertising campaign he wants to determine whether the movie will appeal most to a
particular age group or whether it will appeal equally to all age groups. The producer
takes a random sample from persons attending a preview of the movie, and obtains the
following results. Use the χ² test to derive the conclusion.
                                   Age group
                      Under 20   20-39   40-59   60 and above
Liked the movie       320        80      110     200
Disliked the movie    50         15      70      60
Indifferent           30         5       20      40
(Given $\chi^2_{0.05}$ for 6 d.f. = 12.59)
Solution. Let us take the hypothesis that the new movie appeals equally to people of
different age groups.
The above information can be arranged in the form of a 3 × 4 contingency table as follows :
Observed Frequencies (O)
                                   Age group
                      Under 20   20-39   40-59   60 & above   Total
Liked the movie       320        80      110     200          710
Disliked the movie    50         15      70      60           195
Indifferent           30         5       20      40           95
Total                 400        100     200     300          1000
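Expected frequency for each cell has been calculated by using the formula :
$$= \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
$$\therefore E_{11} = \frac{710 \times 400}{1000} = 284, \quad E_{12} = \frac{710 \times 100}{1000} = 71, \quad E_{13} = \frac{710 \times 200}{1000} = 142, \quad E_{14} = \frac{710 \times 300}{1000} = 213$$
$$E_{21} = \frac{195 \times 400}{1000} = 78, \quad E_{22} = \frac{195 \times 100}{1000} = 19.5, \quad E_{23} = \frac{195 \times 200}{1000} = 39, \quad E_{24} = \frac{195 \times 300}{1000} = 58.5$$
$$E_{31} = \frac{95 \times 400}{1000} = 38, \quad E_{32} = \frac{95 \times 100}{1000} = 9.5, \quad E_{33} = \frac{95 \times 200}{1000} = 19, \quad E_{34} = \frac{95 \times 300}{1000} = 28.5$$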
Expected Frequencies (E)
Age group
Under 20 20-39 40-59 60 & above Total
Liked the movie 284 71 142 213 710
Disliked the movie 78 19.5 39 58.5 195
Indifferent 38 9.5 19 28.5 95
Total 400 100 200 300 1000
$$\therefore \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},$$
where Oi = Observed frequency and Ei = Expected frequency.
In this case,
$$\chi^2 = \frac{(320 - 284)^2}{284} + \frac{(80 - 71)^2}{71} + \frac{(110 - 142)^2}{142} + \frac{(200 - 213)^2}{213} + \frac{(50 - 78)^2}{78} + \frac{(15 - 19.5)^2}{19.5}$$
$$+ \frac{(70 - 39)^2}{39} + \frac{(60 - 58.5)^2}{58.5} + \frac{(30 - 38)^2}{38} + \frac{(5 - 9.5)^2}{9.5} + \frac{(20 - 19)^2}{19} + \frac{(40 - 28.5)^2}{28.5}$$
= 4.5634 + 1.1408 + 7.2113 + 0.7934 + 10.0513 + 1.0385 + 24.6410 + 0.0385
+ 1.6842 + 2.1316 + 0.0526 + 4.6403
= 57.987.
No. of degree of freedom (ν) = (r – 1)(c – 1) = (3 – 1)(4 – 1) = 6.
As the calculated value of χ² = 57.987 > $\chi^2_{6,\,0.05}$ which is 12.59, H0 is rejected i.e., we
conclude that the new movie does not appeal equally to people of different age groups.
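The whole independence calculation of Example 7.79 (expected frequencies, χ² and degrees of freedom) can be sketched in Python. The function name `chi_square_independence` is hypothetical; the sketch applies the row total × column total / grand total rule to every cell.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for row, rt in zip(table, row_totals):
        for observed, ct in zip(row, col_totals):
            expected = rt * ct / grand_total      # row total x column total / grand total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return chi2, dof


# Rows: liked, disliked, indifferent; columns: under 20, 20-39, 40-59, 60 & above
movie = [
    [320, 80, 110, 200],
    [50, 15, 70, 60],
    [30, 5, 20, 40],
]
chi2_movie, dof_movie = chi_square_independence(movie)
print(round(chi2_movie, 3), dof_movie)
```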
Example 7.80. From the following data find whether there is any significant liking in
the habit of taking soft drinks among the categories of employees.
Employees
Soft Drinks Clerks Teachers Officers Total
Pepsi 10 25 65 100
Thumps Up 15 30 65 110
Fanta 50 60 30 140
Total 75 115 160 350
The test statistic, $\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$
Hence we conclude that there is a significant difference among the categories of employees
in their liking for soft drinks.
EXERCISE 7.12
1. The demand for a particular spare part in a factory was found to vary from day to day as
given below. Test the hypothesis that number of parts demanded does not depend on the
day of the week.
Days Mon. Tue. Wed. Thu. Fri. Sat.
No. of parts demanded 124 125 110 120 126 115
[Ans. χ 2 = 1.68; ν = 5; the demand does not depend on the day of the week]
2. The following table shows the distribution of digits in the numbers chosen at random from
a telephone directory:
Digits 0 1 2 3 4 5 6 7 8 9 Total
Frequency 1026 1107 997 966 1075 933 1107 972 964 853 10000
Test whether the digits may be taken to occur equally frequently in the directory.
[Ans. χ² = 58.542; ν = 9; the digits do not occur uniformly in the directory]
4. The following data given the number of aircraft accidents that occurred during the various
days of a week
Use 5% level of significance to determine whether the data fits a uniform distribution.
[Ans. χ 2 = 6.65; ν = 5; the no. of violent alterations is uniformly distributed over the month]
7. A survey of 320 families with 5 children revealed the following distribution:
No. of boys 0 1 2 3 4 5
No. of girls 5 4 3 2 1 0
Is this result consistent with the hypothesis that male and female births are equally probable?
[Ans. χ 2 = 7.16; ν = 5; the male and female births are equally probable]
8. The theory predicts that the proportion of beans in the four groups A, B, C and D should be
9:3:3:1. In an experiment it was observed that the number of four groups A, B, C and D are
882, 313, 287 and 118. Does the experiment result support the theory?
(Given that χ2 for 3 d.f. at 5% level is 7.815)
[Ans. χ² = 4.7266; ν = 3; the experimental result supports the theory]
9. A book has 700 pages. The number of pages with various numbers of misprints is recorded
below. At 5% significant level are the misprints distributed according to Poisson law?
[Ans. χ 2 = 38.812; ν = 2; the misprints are not distributed according to Poisson law]
10. Twelve dice were thrown 4096 times and a throw of 6 was considered a success. The
observed frequencies were as given below:
No. of successes 0 1 2 3 4 5 6 7 and over
Frequency 447 1145 1180 796 380 115 25 8
Test whether the dice were unbiased.
[Ans. χ² = 3.76; ν = 5; the dice were unbiased]
11. Fit a binomial distribution for the following data and also test the goodness of fit.
x 0 1 2 3 4 5 6 Total
f 5 18 28 12 7 6 4 80
[Ans. χ 2 = 6.39; ν = 2; the binomial fit for the given distribution is not satisfactory]
12. Fit a Poisson distribution for the following distribution and also test the goodness of fit.
x 0 1 2 3 4 5 Total
Machine A B C D
Production time 1 1 2 3
No. of defectives 12 30 63 98
Clean Dirty
Condition of child Clean 70 50
Fairly clean 80 20
Dirty 35 45
Use Chi-square test at 5% level of significance to state whether the two attributes are
independent. (Given that χ2 for 2 d.f. at 5% level is 5.99)
[Ans. χ 2 = 25.636; ν = 2; there exist association between the attributes]
18. The following table shows the results of inoculation against cholera in certain tea-estate:
Not attacked Attacked Total
Inoculated 469 31 500
Not inoculated 1315 185 1500
Total 1784 216 2000
Find out whether there is any significant association between inoculation and attack. (Given
that χ² for 1 d.f. at 5% level is 3.84)
Test the hypothesis at 1% level. (Given that χ² = 2.198; ν = 2; the sample come from
homogeneous population)
20. The employment bureau located in a city received 200 applications in the month of June,
2011 for registration. A tabular presentation of the applications according to sex and level
of education was found to be as under :
Sex
Male Female
Undergraduates 30 10
Graduates 70 20
Postgraduates 20 50
Do these data provide adequate evidence to indicate that the level of education is related
to sex? Use 5% level of significance. [Ans. χ 2 = 44.4; ν = 2; level of education is related to sex]
21. A survey of radio listeners’ preference for two types of music under various age groups
gave the following information.
Age group
Carnatic music 80 60 90
Indifferent 16 45 132
24. On the basis of information given below about the treatment of 200 patients suffering from
a disease, state whether the new treatment is comparatively superior to the conventional
treatment
25. Given the following contingency table for hair colour and eye colour. Find the value of χ².
Hair Colour
Blue 15 5 20 40
Brown 25 15 20 60
Total 60 30 60 150
Age
↓
15-25 65 75 72 212
26-35 60 40 64 164
36-45 45 52 50 147
46-55 55 65 60 180
Blind 21 64 17
Deaf 16 49 14
No disability 29 93 28