Sampling
Sampling is a popular method of collecting data. The fundamental assumption
behind the sampling method is that if the units of a sample are selected at
random, the characteristics of the sample will be almost the same as those of
the universe (population).
Sample
Importance of sampling
Samples are devices for learning about large masses by observing a few
individuals. In fact, we are living in the age of sampling.
Merits
Demerits
Sampling theory
Parameter and statistic
2. Type II error
Accepting a null hypothesis when it is false.
True position     H0 accepted        H0 rejected
H0 is true        Correct decision   Type I error
H0 is not true    Type II error      Correct decision
The maximum probability of a type I error is known as the level of significance,
and it is fixed in advance. For example, if the level of significance is fixed
at 5 %, there is a possibility of making a type I error (rejecting a true null
hypothesis) in 5 out of 100 cases. We can reduce the type I error by lowering
the level of significance. However, when the type I error is controlled in this
way, the chance of a type II error (accepting a false null hypothesis) increases.
Critical value: The value obtained from a standard table at a particular
level of significance.
Two tailed test and One tailed test
Critical value (zα)     1 % level        5 % level
Two-tailed test         |zα| = 2.58      |zα| = 1.96
Right-tailed test       zα = 2.33        zα = 1.64
Left-tailed test        zα = −2.33       zα = −1.64
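The tabulated critical values can be reproduced from the standard normal distribution. The following is a minimal Python sketch using only the standard library; the function name `critical_z` and its `tail` argument are illustrative and not part of the original notes.

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str = "two") -> float:
    """Critical value of the standard normal distribution.

    alpha is the level of significance (e.g. 0.05 for 5 %);
    tail is "two", "right" or "left" (hypothetical argument names).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # alpha is split equally between the two tails
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


for alpha in (0.01, 0.05):
    values = [round(critical_z(alpha, t), 2) for t in ("two", "right", "left")]
    print(alpha, values)
```

For the two-tailed case the function returns the positive value, so the test compares |Z| against it.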
Example
In the study of the mean, the null hypothesis is H0: µ = µ0.
The possible alternative hypotheses are
1. H1: µ ≠ µ0 (i.e. µ > µ0 or µ < µ0). It is a two-tailed test.
2. H1: µ < µ0 (one-tailed, left-sided test).
3. H1: µ > µ0 (one-tailed, right-sided test).
Test of Significance for Single Mean
Under the null hypothesis (H0): the sample has been drawn from a population
with mean µ and variance σ², i.e., there is no significant difference between the
sample mean (x̄) and the population mean (µ), the test statistic (for large
samples) is
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
If the SD of the population is not known, then we use
$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Here, s is the SD of the sample.
Prob: A sample of 900 members has a mean of 3.4 cms. Is the sample from a
large population with mean 3.25 cms. and S.D. 2.61 cms.?
Find the 95 % confidence limits of the true mean.
Solution
Null hypothesis (H0): The sample has been drawn from the population with
mean µ = 3.25 cms. and S.D. σ = 2.61 cms.
Here, we are given
x̄ = 3.4 cms., n = 900, µ = 3.25 cms. and σ = 2.61 cms.
$$Z = \frac{3.40 - 3.25}{2.61 / \sqrt{900}} = \frac{0.15 \times 30}{2.61} = 1.72$$
Since |Z| < 1.96, H0 can be accepted at the 5 % level of significance.
The 95 % confidence limits are
$$\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}} \Rightarrow 3.40 \pm 1.96 \times \frac{2.61}{\sqrt{900}} \Rightarrow 3.40 \pm 0.1705$$
Hence the limits are 3.5705 and 3.2295.
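As a check on the arithmetic, the test statistic and the 95 % confidence limits for this problem can be computed with a short Python sketch. The helper names `z_statistic` and `confidence_limits` are assumptions made for illustration only.

```python
from math import sqrt


def z_statistic(sample_mean: float, pop_mean: float, sd: float, n: int) -> float:
    """Large-sample test statistic Z = (x_bar - mu) / (sd / sqrt(n))."""
    return (sample_mean - pop_mean) / (sd / sqrt(n))


def confidence_limits(sample_mean: float, sd: float, n: int, z_crit: float = 1.96):
    """Limits x_bar -/+ z_crit * sd / sqrt(n); z_crit = 1.96 gives 95 % limits."""
    margin = z_crit * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values taken from the problem statement
z = z_statistic(3.40, 3.25, 2.61, 900)
lower, upper = confidence_limits(3.40, 2.61, 900)

print("Z =", round(z, 2))
print("95 % limits:", round(lower, 4), "to", round(upper, 4))
print("Accept H0 at 5 %:", abs(z) < 1.96)
```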
Prob: A sample of 1600 units is found to have a mean of 3.4 cms. Can it be
reasonably regarded as a simple sample from a large population with mean 3.2
cms. and SD 2.3 cms.?
Solution
Here, n = 1600, µ = 3.2, x̄ = 3.4 and σ = 2.3
H0: The sample is drawn from a population with mean 3.2 cms.
Now,
$$|Z| = \frac{|\bar{x} - \mu|}{\sigma / \sqrt{n}} = \frac{|3.4 - 3.2|}{2.3 / \sqrt{1600}} = 3.478$$
Since |Z| > 3, we reject H0.
Problem A population has a mean of 159.7 cms. and SD 4.5 cms. How large
a sample would be necessary to make the standard error of the mean less than
or equal to 0.5 cm?
Solution
Given n = ?, µ = 159.7, σ = 4.5 and SE ≤ 0.5
$$SE = \frac{\sigma}{\sqrt{n}} \Rightarrow 0.5 = \frac{4.5}{\sqrt{n}} \Rightarrow 0.5 \times \sqrt{n} = 4.5 \Rightarrow \sqrt{n} = 9$$
We get n = 81.
Therefore, the sample size must be at least 81.
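The same rearrangement, SE = σ/√n ≤ 0.5, i.e. n ≥ (σ/SE)², can be written as a small Python sketch. The function name `min_sample_size` is hypothetical and used only for illustration.

```python
from math import ceil


def min_sample_size(sd: float, max_se: float) -> int:
    """Smallest n with sd / sqrt(n) <= max_se, i.e. n >= (sd / max_se) ** 2."""
    return ceil((sd / max_se) ** 2)


# Values taken from the problem statement
n = min_sample_size(sd=4.5, max_se=0.5)
print(n)
```

Rounding up with `ceil` matters when (σ/SE)² is not a whole number, because a sample size must be an integer.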
Solution
H0: The machine is working properly.
Given n = 100, µ = 2 kg, x̄ = 1.94 kg and s = 0.10 kg
Now,
$$|Z| = \frac{|\bar{x} - \mu|}{s / \sqrt{n}} = \frac{|1.94 - 2|}{0.10 / \sqrt{100}} = 6$$
Since the calculated value of |Z| is greater than the tabular value, we
reject H0. Hence, the machine is not working properly.
Problem In the past the average length of an outgoing telephone call from a
business office has been 143 seconds. A manager wishes to check whether that
average has decreased after the introduction of policy changes. A sample of 100
telephone calls produced a mean of 133 seconds, with a standard deviation of
35 seconds. Perform the relevant test at the 1 % level of significance.
Solution
H0: µ = 143, H1: µ < 143
Given n = 100, x̄ = 133, s = 35
Now,
$$|Z| = \frac{|\bar{x} - \mu|}{s / \sqrt{n}} = \frac{|133 - 143|}{35 / \sqrt{100}} = 2.86$$
The calculated value of |Z| is greater than the tabular value (2.33 for a
one-tailed test at the 1 % level), hence we reject H0.
OR
$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{133 - 143}{35 / \sqrt{100}} = -2.86$$
The calculated value of Z is less than the tabular value (−2.33), hence we
reject H0.
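The signed form of the left-tailed test can be sketched in Python as follows. The function name `left_tailed_z_test` is an illustrative assumption; the critical value −2.33 is the 1 % left-tailed value from the table of critical values given earlier.

```python
from math import sqrt


def left_tailed_z_test(sample_mean: float, mu0: float, s: float, n: int, z_crit: float):
    """Return (Z, reject_h0) for H1: mu < mu0.

    H0 is rejected when the signed statistic falls below the
    (negative) critical value z_crit.
    """
    z = (sample_mean - mu0) / (s / sqrt(n))
    return z, z < z_crit


# Values taken from the telephone-call problem
z, reject_h0 = left_tailed_z_test(sample_mean=133, mu0=143, s=35, n=100, z_crit=-2.33)
print(round(z, 2), reject_h0)
```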
Problem
The average household size in a certain region several years ago was 3.14
persons. A sociologist wishes to test, at the 5 % level of significance, whether it
is different now. Perform the test using the information collected by the
sociologist: in a random sample of 75 households, the average size was 2.98
persons, with sample standard deviation 0.82 persons.
Solution
H0: µ = 3.14, H1: µ ≠ 3.14
Given n = 75, x̄ = 2.98, s = 0.82
Now,
$$|Z| = \frac{|\bar{x} - \mu|}{s / \sqrt{n}} = \frac{|2.98 - 3.14|}{0.82 / \sqrt{75}} = 1.69$$
The calculated value of |Z| is less than the tabular value (1.96), hence we
accept H0.
Small sample test or t-test
The t-test is used to test the significance of
1. The mean of a small sample
2. The difference between the means of two small samples, i.e. to compare two
small samples
Test of significance of the mean of small sample
Steps involved
To test the significance of a sample mean at the 5 % level of significance:
• H0: The population mean (µ) is equal to the given value of the mean (i.e.
µ = µ0).
• Calculate
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \quad \text{or} \quad t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}$$
• If the calculated value of t is less than the tabular value, we accept H0;
otherwise we reject H0.
Problem
The heights of 9 children selected at random from a given colony had a mean of
63.5 cms. and variance 6.25 cm². Test, at 5 % level of significance, the
hypothesis that the children of the given colony are on average 65 cms. tall and
not less than 65 cms. in all. (The value of t for 8 d.f. at 5 % level of
significance is 2.262)
Solution
H0: The average height of the children is 65 cms., i.e. µ = 65. H1: µ < 65
n = 9, x̄ = 63.5 cms., variance = 6.25 (so SD s = 2.5) and µ = 65
Using the formula
$$t = \frac{|\bar{x} - \mu|}{s / \sqrt{n}} = \frac{|63.5 - 65|}{2.5 / \sqrt{9}} = 1.8$$
Since the calculated value is less than the tabular value, we accept H0.
Problem Six boys are selected at random from a school and their marks
in Mathematics are found to be 63, 63, 64, 66, 60 and 68 out of 100. In the light
of these marks, discuss the general observation that the mean mark in Mathematics
in the school was 66. (The value of t for 5 d.f. at 5 % level of significance is 2.571)
Solution
H0 : µ = 66
Marks (xi)    di = (xi − 64)    di²
63            −1                1
63            −1                1
64            0                 0
66            2                 4
60            −4                16
68            4                 16
Σx = 384      Σd = 0            Σd² = 38
$$\bar{x} = \frac{\sum x_i}{n} = \frac{384}{6} = 64$$
$$s = \sqrt{\frac{\sum d^2}{n - 1}} = \sqrt{\frac{38}{6 - 1}} = 2.757$$
Using the formula
$$t = \frac{|\bar{x} - \mu|}{s / \sqrt{n}} = \frac{|64 - 66|}{2.757 / \sqrt{6}} = 1.777$$
Since the calculated value is less than the tabular value, we accept H0 .
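The calculation for the six marks can be reproduced from the raw data with Python's standard library. This is a minimal sketch: `statistics.stdev` uses the (n − 1) divisor, matching the formula for s used in the solution, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

marks = [63, 63, 64, 66, 60, 68]  # data from the problem statement
mu0 = 66                          # hypothesised school mean
t_table = 2.571                   # tabular t for 5 d.f. at 5 %, as quoted in the problem

n = len(marks)
x_bar = mean(marks)               # sample mean
s = stdev(marks)                  # sample SD with divisor (n - 1)
t = abs(x_bar - mu0) / (s / sqrt(n))

print("mean =", x_bar)
print("s =", round(s, 3))
print("t =", round(t, 3))
print("Accept H0:", t < t_table)
```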
χ² test
The chi-square test is a measure which
• tells us the magnitude of the difference between the actual or observed
frequencies (fo) and the corresponding theoretical or expected frequencies (fe).
• indicates whether the difference is significant or due to sampling fluctuations.
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Use of Chi-square test
• Test of independence
• Test of goodness of fit
Problem The following figures show the distribution of digits in numbers
chosen at random from a telephone directory:
Digits      0     1     2    3    4     5    6     7    8    9    Total
Frequency   1026  1107  997  966  1075  933  1107  972  964  853  10,000
Test whether the digits may be taken to occur equally frequently in the
directory. (The tabular value of χ² at 5 % level of significance for 9 degrees of
freedom is 16.919.)
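The notes give no worked solution for this problem. A minimal Python sketch of the goodness-of-fit computation follows, assuming that under H0 each of the ten digits is expected equally often, so that every expected frequency is 10,000 / 10 = 1000. Variable names are illustrative.

```python
# Observed frequencies of the digits 0..9, from the problem statement
observed = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]

total = sum(observed)
expected = total / len(observed)  # assumption under H0: all digits equally likely

# Chi-square statistic: sum of (fo - fe)^2 / fe over all classes
chi_square = sum((fo - expected) ** 2 / expected for fo in observed)

chi_table = 16.919  # tabular value for 9 d.f. at 5 %, as quoted in the problem

print("chi-square =", round(chi_square, 3))
print("Reject H0:", chi_square > chi_table)
```

Under this assumption the statistic comes to about 58.54, which exceeds 16.919, so the hypothesis that the digits occur equally frequently would be rejected.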
Problem
In an anti malaria campaign in a certain area, quinine was administered to
812 persons out of the total population of 3,248. The number of fever cases is
given below
Treatment Fever No fever
Quinine 20 792
No Quinine 220 2216
Discuss the usefulness of quinine in checking malaria. The tabular value of
χ² at 5 % level of significance for 1 degree of freedom is 3.841.
Quinine No quinine Total
No fever 792 2216 3008
Fever 20 220 240
Total 812 2436 3248
H0 : Quinine is not effective in treating malaria
Expected frequencies under H0:
            Quinine                  No quinine                  Total
No fever    812×3008/3248 = 752      2436×3008/3248 = 2256       3008
Fever       812×240/3248 = 60        2436×240/3248 = 180         240
Total       812                      2436                        3248
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(792 - 752)^2}{752} + \frac{(20 - 60)^2}{60} + \frac{(2216 - 2256)^2}{2256} + \frac{(220 - 180)^2}{180}$$
$$= 2.128 + 26.667 + 0.709 + 8.889 = 38.393$$
The calculated value of χ² is much greater than the tabular value, hence we
reject our null hypothesis.
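The expected frequencies and the χ² statistic for a contingency table can be computed from the row and column totals. The following Python sketch uses the quinine data; the variable names are illustrative assumptions, and no continuity correction is applied, in line with the calculation in the solution.

```python
# Observed frequencies; rows: no fever, fever; columns: quinine, no quinine
observed = [
    [792, 2216],
    [20, 220],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of a cell = (row total x column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

chi_table = 3.841  # tabular value for 1 d.f. at 5 %, as quoted in the problem

print("expected =", expected)
print("chi-square =", round(chi_square, 2))
print("Reject H0:", chi_square > chi_table)
```

Summing unrounded terms gives a value that can differ from the hand calculation in the last decimal place, since the hand calculation adds terms already rounded to three places.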
The calculated value of χ2 is less than the tabular value, hence we accept our
null hypothesis.
Problem A set of five similar coins is tossed 320 times and the result is given
in the following table
No. of heads    0    1    2    3     4    5
Frequency       6    27   72   112   71   32
Test the hypothesis that the data follow a binomial distribution. (The tabular
value of χ² at 5 % level of significance for 5 degrees of freedom is 11.07.)
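Expected frequencies for r heads, for a binomial distribution with n = 5 and
p = 1/2:
$$f_e(r = 0) = 320 \binom{5}{0} \left(\frac{1}{2}\right)^5 = 10$$
$$f_e(r = 1) = 320 \binom{5}{1} \left(\frac{1}{2}\right)^5 = 50$$
$$f_e(r = 2) = 320 \binom{5}{2} \left(\frac{1}{2}\right)^5 = 100$$
$$f_e(r = 3) = 320 \binom{5}{3} \left(\frac{1}{2}\right)^5 = 100$$
$$f_e(r = 4) = 320 \binom{5}{4} \left(\frac{1}{2}\right)^5 = 50$$
$$f_e(r = 5) = 320 \binom{5}{5} \left(\frac{1}{2}\right)^5 = 10$$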
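The notes stop at the expected frequencies. A minimal Python sketch that reproduces them and then carries the comparison through with the χ² formula given earlier is shown below; treating this as the intended remaining step is an assumption, as is the choice of fair coins (p = 1/2), and the variable names are illustrative.

```python
from math import comb

observed = [6, 27, 72, 112, 71, 32]  # frequencies of 0..5 heads, from the problem
n_coins = 5
tosses = 320
p = 0.5                              # assumption: fair coins

# Expected frequency of r heads = tosses * C(n, r) * p^r * (1 - p)^(n - r)
expected = [
    tosses * comb(n_coins, r) * p ** r * (1 - p) ** (n_coins - r)
    for r in range(n_coins + 1)
]

chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
chi_table = 11.07                    # tabular value for 5 d.f. at 5 %, as quoted

print("expected =", expected)
print("chi-square =", round(chi_square, 2))
print("Reject H0:", chi_square > chi_table)
```

With these figures the statistic is about 78.68, well above 11.07, so the hypothesis of a binomial distribution with p = 1/2 would be rejected.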