BSC Sample Surveys Unit I Part I
BASIC CONCEPT OF SAMPLE SURVEYS

A sample survey is a method of drawing an inference about the characteristics of a population or universe by observing a part of the population. For example, when one has to make an inference about a large lot and it is not practicable to examine each individual member of the lot, one takes the help of sample surveys; that is to say, one examines only a few members of the lot and, on the basis of this sample information, makes decisions about the whole lot. Thus, a person wanting to purchase a basket of oranges may examine a few oranges from the basket and on that basis make a decision about the whole basket.

Such methods are extensively used by government bodies throughout the world for assessing different characteristics of the national economy, as required for taking decisions regarding the imposition of taxes, fixation of prices and minimum wages etc., and for planning and projection of the future economic structure, for estimation of yield rates and acreages under different crops, the number of unemployed persons in the labour force, construction of cost of living indices for persons in different professions, and so on.

Sample survey techniques are extensively used in market research surveys for assessing the preferential pattern of consumers for different types of products, the potential demand for a new product which a company wishes to introduce, the scope for any diversification in the production schedule, and so on.

Thus, sampling may become unavoidable because we may have limited resources in terms of money and/or man-hours, or it may be preferred because of practical convenience.

Sampling is first broadly classified as Subjective and Objective. Any type of sampling which depends upon the personal judgment or discretion of the sampler is called Subjective. A sampling method which is fixed by a sampling rule, or is independent of the sampler's own judgment, is Objective sampling.

Objective Sampling: Non-probabilistic, Probabilistic and mixed

In non-probabilistic objective sampling, there is a fixed sampling rule but there is no probability attached to the mode of selection, e.g. selecting every 5-th individual from a list. If, however, the selection of the first individual is made in such a manner that each of the first 10 gets an equal chance of being selected, it becomes a case of mixed sampling. If for each individual there is a definite pre-assigned probability of being selected, the sampling is said to be probabilistic.

Elementary unit or simply unit: It is an element or a group of elements on which observations can be made or from which the required statistical information can be ascertained according to a well defined procedure. Examples of units are a person, family, household, farm, factory, tree, or a period of time such as an hour, day etc.

Population: The collection of all units of a specified type in a given region at a particular point or period of time is termed a population or universe. For example, a population of persons, families, farms, cattle, houses or automobiles in a region, or a population of trees or birds in a forest etc.

A population is said to be a finite population or an infinite population according as the number of units in it is finite or infinite.

Sampling units: Elementary units or groups of such units which, besides being clearly defined, identifiable and observable, are convenient for purposes of sampling are called sampling units. For example, in a family budget enquiry, usually a family is considered as a sampling unit, since it is found to be convenient for sampling and for ascertaining the required information. In a crop survey, a farm or a group of farms owned or operated by a household may be considered as the sampling unit.

Sampling frame: For using sampling methods in the collection of data, it is essential to have a frame of all the sampling units belonging to the population to be studied, with their proper identification particulars; such a frame is called the sampling frame. This may be a list of units with their identification particulars. As the sampling frame forms the basic material from which a sample is drawn, it should be ensured that the frame contains all the sampling units of the population under consideration but excludes units of any other population.

Sample: A sample is a subset of a population selected to obtain information concerning the characteristics of the population. In other words, one or more sampling units selected from a population according to some specified procedure are said to constitute a sample.

Random sample: A random or probability sample is a sample drawn in such a manner that each unit in the population has a predetermined probability of selection.

Estimator: An estimator is a statistic obtained by a specified procedure for estimating a population parameter. The estimator is a random variable, as its value differs from sample to sample and the samples are selected with specified probabilities. The particular value which the estimator takes for a given sample is known as an estimate.

The difference between the estimator $t$ and the parameter $\theta$ is called the error.

An estimator $t$ is said to be an unbiased estimator of the parameter $\theta$ if $E(t) = \theta$, otherwise it is biased. Thus the bias is given by
$$B(t) = E(t - \theta) = E(t) - \theta .$$

The mean of the squares of the error taken from $\theta$ is called the mean square error (MSE). Mathematically it is defined as
$$MSE(t) = E(t - \theta)^2 .$$
The MSE may be considered to be a measure of the accuracy with which the estimator $t$ estimates the parameter $\theta$.

The expected value of the squared deviation of the estimator from its expected value is termed the sampling variance. It is a measure of the divergence of the estimator from its expected value and is given by
$$V(t) = E[t - E(t)]^2 .$$
This measure of variability may be termed the precision of the estimator $t$.

The relation between MSE and sampling variance, or between accuracy and precision, can be obtained as
$$MSE(t) = E(t - \theta)^2 = E[t - E(t) + E(t) - \theta]^2$$
$$= E[t - E(t)]^2 + [E(t) - \theta]^2 = V(t) + [B(t)]^2 , \quad \text{since } E[t - E(t)] = 0 .$$
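The relation MSE(t) = V(t) + [B(t)]^2 can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a small illustrative population, enumerates all equally likely ordered samples of size 2 drawn with replacement, and uses the sample maximum as a deliberately biased estimator of the population mean. The function name `mse_decomposition` and the population values are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mse_decomposition(population, n, estimator):
    """Return (bias, variance, mse) of `estimator` for the population mean,
    taking every ordered with-replacement sample of size n as equally likely."""
    theta = Fraction(sum(population), len(population))  # parameter: population mean
    samples = list(product(population, repeat=n))       # all N**n ordered samples
    p = Fraction(1, len(samples))                       # probability of each sample
    values = [Fraction(estimator(s)) for s in samples]  # estimate from each sample
    expected = sum(p * t for t in values)               # E(t)
    bias = expected - theta                             # B(t) = E(t) - theta
    variance = sum(p * (t - expected) ** 2 for t in values)  # V(t)
    mse = sum(p * (t - theta) ** 2 for t in values)          # MSE(t)
    return bias, variance, mse


# Illustrative (assumed) population; the sample maximum is a biased estimator.
bias, variance, mse = mse_decomposition([8, 3, 11, 4, 7], 2, max)
print(bias, variance, mse, mse == variance + bias ** 2)
```

Exact fractions are used so that the identity holds without rounding error; replacing `max` by the sample mean gives zero bias, in which case the MSE and the sampling variance coincide.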
This shows that the MSE of $t$ is the sum of the sampling variance and the square of the bias. However, if $t$ is an unbiased estimator of $\theta$, the MSE and the sampling variance are the same.

The square root of the sampling variance is termed the standard error of the estimator $t$.

The ratio of the standard error of the estimator to the expected value of the estimator is known as the relative standard error or the coefficient of variation of the estimator.

Sample space: The collection of all possible samples (sequences or sets) is called the sample space.

Sampling design: The combination of the sample space and the associated probability measure is called a sampling design. For example, let $N = 4$, $n = 2$ and let the probabilities of selection for the different samples be

Sample:       (1, 2)   (1, 3)   (1, 4)   (2, 3)   (2, 4)   (3, 4)
Probability:   1/6      1/6      1/6      1/6      1/6      1/6

The above table gives the sampling design.

Sampling and complete enumeration

The total count of all units of the population for a certain characteristic is known as complete enumeration, also termed a census survey. The money, man-power and time required for carrying out complete enumeration will generally be large, and there are many situations with limited means where complete enumeration will not be possible and where recourse to the selection of a few units will be helpful. When only a part, called a sample, is selected from the population and examined, it is called sample enumeration or a sample survey.

A sample survey will usually be less expensive than a census survey, and the desired information will be obtained in less time. This does not imply that economy is the only consideration in conducting a sample survey. It is most important that a degree of accuracy of results is also maintained. Occasionally, the technique of sample survey is applied to verify the results obtained from census surveys. The main advantages or merits of a sample survey over a census survey may be outlined as follows:
i) Reduced cost of survey,
ii) Greater speed of getting results,
iii) Greater accuracy of results,
iv) Greater scope, and
v) Adaptability

A sample survey has its own limitations, and the advantages of sampling over complete enumeration can be derived only if
i) the units are drawn in a scientific manner,
ii) an appropriate sampling technique is used, and
iii) the number of units selected in the sample is adequate.

Basic principles of sample surveys

Two basic principles for sample surveys are
i) Validity
ii) Optimization

The principle of optimization takes into account the factors of
a) Efficiency
b) Cost

By validity, we mean that the sample should be so selected that the results can be interpreted objectively in terms of probability. The principle will be satisfied by selecting a probability sample, which ensures that there is some definite, pre-assigned probability of selection for each individual of the population.

Efficiency is measured by the inverse of the sampling variance of the estimator. Cost is measured by the expenditure incurred in terms of money or man-hours. The principle of optimization ensures that a given level of efficiency will be reached with minimum cost, or that the maximum possible efficiency will be attained with a given level of cost.

Sampling and non-sampling errors

The error which arises because only a sample (a part of the population) is used to estimate the population parameters and draw inferences about the population is termed sampling error or sampling fluctuation. Whatever the degree of care in selecting a sample, there will always be a difference between the parameter and its corresponding estimate. This error is inherent and unavoidable in any and every sampling scheme. A sample with the smallest sampling error will always be considered a good representative of the population. This error can be reduced by increasing the size of the sample (the number of units selected in the sample). In fact, the sampling error is inversely proportional to the square root of the sample size, and the relationship can be examined graphically as below:

[Figure: sampling error plotted against sample size]

When the sample survey becomes a census survey, the sampling error becomes zero.

Non-sampling error

The non-sampling errors primarily arise at the following stages:
i) Failure to measure some of the units in the selected sample
ii) Observational errors due to defective measurement technique
iii) Errors introduced in editing, coding and tabulating the results.
Non-sampling errors are present in both the complete enumeration survey and the sample
survey. In practice, the census survey results may suffer from non-sampling errors although
these may be free from sampling error. The non-sampling error is likely to increase with
increase in sample size, while sampling error decreases with increase in sample size.
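How the sampling error shrinks as the sample grows can be illustrated with a small sketch. It assumes simple random sampling without replacement and uses the standard error of the sample mean obtained later in these notes, SE = S sqrt(1/n - 1/N). The population values and the function name `se_mean_srswor` are illustrative assumptions, not part of the original text.

```python
import math


def se_mean_srswor(population, n):
    """Standard error of the sample mean under srswor: S * sqrt(1/n - 1/N)."""
    N = len(population)
    mean = sum(population) / N
    # Population mean square S^2 (divisor N - 1).
    S2 = sum((y - mean) ** 2 for y in population) / (N - 1)
    # max() guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, (1 / n - 1 / N) * S2))


population = [8, 3, 11, 4, 7]  # assumed illustrative values
errors = [se_mean_srswor(population, n) for n in range(1, len(population) + 1)]
for n, se in enumerate(errors, start=1):
    print(n, round(se, 4))
```

The printed standard errors fall steadily as n increases and reach zero at n = N, which is the census case described above.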
SIMPLE RANDOM SAMPLING

A procedure for selecting a sample of size $n$ out of a finite population of size $N$ in which each of the possible distinct samples has an equal chance of being selected is called random sampling or simple random sampling.

We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr).
ii) Simple random sampling without replacement (srswor).

Simple random sampling with replacement (srswr)

In sampling with replacement a unit is selected from the population consisting of $N$ units, its content noted and then returned to the population before the next draw is made, and the process is repeated $n$ times to give a sample of $n$ units. In this method, at each draw, each of the $N$ units of the population gets the same probability $1/N$ of being selected. Here the same unit of the population may occur more than once in the sample (the order in which the sample units are obtained is regarded). There are $N^n$ samples, and each has an equal probability $1/N^n$ of being selected.

Note: If the order in which the sample units are obtained is ignored (unordered), then the number of possible samples will be
$${}^{N}C_n + N\,(1 + {}^{N-1}C_1 + {}^{N-1}C_2 + \cdots + {}^{N-1}C_{n-2}) .$$
For $n = 2$ and $n = 3$ this agrees with the general count $\binom{N+n-1}{n}$ of unordered samples drawn with replacement; for larger $n$ the binomial form should be used.

Simple random sampling without replacement (srswor)

Suppose the population consists of $N$ units. In simple random sampling without replacement a unit is selected, its content noted and the unit is not returned to the population before the next draw is made. The process is repeated $n$ times to give a sample of $n$ units. In this method, at the $r$-th drawing, each of the $N - r + 1$ remaining units of the population gets the same probability $\frac{1}{N - r + 1}$ of being included in the sample. Here no unit of the population can occur more than once in the sample (order is ignored). There are ${}^{N}C_n$ possible samples, and each such sample has an equal probability $\frac{1}{{}^{N}C_n}$ of being selected.

Example: For a population of size $N = 5$ with values 1, 3, 6, 8 and 9, make a list of all possible samples of size $n = 3$ by both methods [srswr (unordered) and srswor].

Solution: Under sampling with replacement (unordered), the number of possible samples will be
$${}^{N}C_n + N\,(1 + {}^{N-1}C_1 + \cdots + {}^{N-1}C_{n-2}) = {}^{5}C_3 + 5\,(1 + {}^{4}C_1) = 35 ,$$
which are as follows:
(1, 1, 1), (1, 1, 3), (1, 1, 6), (1, 1, 8), (1, 1, 9), (1, 3, 3), (1, 3, 6), (1, 3, 8), (1, 3, 9), (1, 6, 6),
(1, 6, 8), (1, 6, 9), (1, 8, 8), (1, 8, 9), (1, 9, 9), (3, 3, 3), (3, 3, 6), (3, 3, 8), (3, 3, 9), (3, 6, 6),
(3, 6, 8), (3, 6, 9), (3, 8, 8), (3, 8, 9), (3, 9, 9), (6, 6, 6), (6, 6, 8), (6, 6, 9), (6, 8, 8), (6, 8, 9),
(6, 9, 9), (8, 8, 8), (8, 8, 9), (8, 9, 9), (9, 9, 9).

Under sampling without replacement, the number of possible samples will be ${}^{N}C_n = {}^{5}C_3 = 10$, which are as follows:
(1, 3, 6), (1, 3, 8), (1, 3, 9), (1, 6, 8), (1, 6, 9), (1, 8, 9), (3, 6, 8), (3, 6, 9), (3, 8, 9), (6, 8, 9).

Theory of simple random sampling with replacement

$N$, population size.
$n$, sample size.
$Y_i$, value of the $i$-th unit of the population.
$y_i$, value of the $i$-th unit of the sample.
$Y = \sum_{i=1}^{N} Y_i$, population total.
$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$, population mean.
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$, sample mean.
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, population variance.
$S^2 = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N-1} \left[ \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 \right]$, population mean square.
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right]$, sample mean square.

Theorem: In srswr, the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is
$$V(\bar{y}) = \frac{N-1}{nN} S^2 = \frac{\sigma^2}{n} .$$

Proof: It is immediately seen that
$$E(\bar{y}) = E\left[ \frac{1}{n} \sum_{i=1}^{n} y_i \right] = \frac{1}{n} \sum_{i=1}^{n} E(y_i) .$$
By definition,
$$E(y_i) = \sum_{i=1}^{N} Y_i \Pr(y_i = Y_i) = \frac{1}{N} \sum_{i=1}^{N} Y_i = \bar{Y} ,$$
since $y_i$ can take any one of the values $Y_1, \ldots, Y_N$, each with probability $1/N$.
Therefore,
$$E(\bar{y}) = \frac{1}{n} \sum_{i=1}^{n} \bar{Y} = \bar{Y} .$$
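The counts in the example above and the unbiasedness of the sample mean under srswr can be checked by direct enumeration. The sketch below uses Python's standard `itertools` module on the example population 1, 3, 6, 8, 9 with n = 3; the variable names are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, product

population = [1, 3, 6, 8, 9]
n = 3

# srswor: unordered samples without repetition.
srswor = list(combinations(population, n))
# srswr with order ignored: unordered samples, repetition allowed.
srswr_unordered = list(combinations_with_replacement(population, n))
# srswr with order regarded: N**n equally likely ordered samples.
srswr_ordered = list(product(population, repeat=n))

print(len(srswor), len(srswr_unordered), len(srswr_ordered))  # 10 35 125

# E(sample mean) over the equally likely ordered srswr samples.
pop_mean = Fraction(sum(population), len(population))
expected_mean = sum(Fraction(sum(s), n) for s in srswr_ordered) / len(srswr_ordered)
print(expected_mean == pop_mean)  # True
```

The expectation is taken over the ordered samples because those are the ones with equal probability $1/N^n$; the 35 unordered samples are not equally likely.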
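To obtain the variance, note that
$$V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = E\left[ \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{Y}) \right]^2$$
$$= \frac{1}{n^2} \sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2} \sum_{i \neq j}^{n} E[(y_i - \bar{Y})(y_j - \bar{Y})]$$
$$= \frac{1}{n^2} \sum_{i=1}^{n} V(y_i) + \frac{1}{n^2} \sum_{i \neq j}^{n} Cov(y_i, y_j) \qquad (2.1)$$

Consider
$$V(y_i) = E(y_i - \bar{Y})^2 = \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \Pr(y_i = Y_i) = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 ,$$
since $y_i$ can take any one of the values $Y_1, \ldots, Y_N$, each with probability $1/N$. Hence
$$V(y_i) = \sigma^2 = \frac{N-1}{N} S^2 , \quad \text{since } S^2 = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 , \qquad (2.2)$$
and
$$Cov(y_i, y_j) = E[(y_i - \bar{Y})(y_j - \bar{Y})] = \sum_{i,j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) \Pr(y_i = Y_i, y_j = Y_j) .$$

In this case $y_j$ can take any one of the values $Y_1, \ldots, Y_N$ with probability $1/N$ irrespective of the value taken by $y_i$, because the composition of the population remains the same throughout the sampling process owing to sampling with replacement. In other words, for $i \neq j$, $y_i$ and $y_j$ are independent, so that $Cov(y_i, y_j) = 0$. Substituting this and (2.2) in (2.1) gives
$$V(\bar{y}) = \frac{1}{n^2} \, n \sigma^2 = \frac{\sigma^2}{n} = \frac{N-1}{nN} S^2 .$$

For the estimator $\hat{Y} = N \bar{y}$ of the population total $Y$,
$$E(\hat{Y}) = E(N \bar{y}) = N E(\bar{y}) = N \bar{Y} = N \cdot \frac{1}{N} \sum_{i=1}^{N} Y_i = Y$$
and
$$V(\hat{Y}) = V(N \bar{y}) = N^2 V(\bar{y}) = \frac{N^2 \sigma^2}{n} = \frac{N(N-1)}{n} S^2 .$$

Remarks:
i) The standard error (SE) of $\bar{y}$ is $SE(\bar{y}) = \sqrt{V(\bar{y})} = \frac{\sigma}{\sqrt{n}} = S \sqrt{\frac{N-1}{nN}}$.
ii) The standard error of $\hat{Y}$ is $SE(\hat{Y}) = \sqrt{V(\hat{Y})} = \frac{N \sigma}{\sqrt{n}} = S \sqrt{\frac{N(N-1)}{n}}$.

Theorem: In srswr, the sample mean square $s^2$ is an unbiased estimate of the population variance $\sigma^2$, i.e. $E(s^2) = \sigma^2$.

Proof: By definition,
$$E(s^2) = E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \right] = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(y_i^2) - n E(\bar{y}^2) \right] .$$
To obtain $E(y_i^2)$ and $E(\bar{y}^2)$, note that
$V(y_i) = E(y_i^2) - \bar{Y}^2$, so that
$$E(y_i^2) = \sigma^2 + \bar{Y}^2 , \quad \text{since } V(y_i) = (N-1) S^2 / N = \sigma^2 ,$$
and
$V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$, so that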
$$E(\bar{y}^2) = \frac{\sigma^2}{n} + \bar{Y}^2 , \quad \text{since } V(\bar{y}) = \frac{N-1}{nN} S^2 = \frac{\sigma^2}{n} \text{ for srswr} .$$
Therefore,
$$E(s^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} (\sigma^2 + \bar{Y}^2) - n \left( \frac{\sigma^2}{n} + \bar{Y}^2 \right) \right] = \sigma^2 = \frac{N-1}{N} S^2 .$$

Example: In a population with $N = 5$, the values of $Y_i$ are 8, 3, 11, 4 and 7.
a) Calculate the population mean $\bar{Y}$, the population variance $\sigma^2$ and the population mean square $S^2$.
b) Enumerate all possible samples of size 2 by the replacement method and verify that
i) the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$;
ii) $N \bar{y}$ is an unbiased estimate of the population total $Y$, i.e. $E(N \bar{y}) = Y$;
iii) $V(\bar{y}) = \frac{(N-1) S^2}{nN} = \frac{\sigma^2}{n}$; and
iv) $E(s^2) = \frac{N-1}{N} S^2 = \sigma^2$.

Solution:
a) $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i = 6.6$, $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = 8.24$ and $S^2 = \frac{1}{N-1} \left[ \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 \right] = 10.3$.

b) Form a table for calculation as below, where $\bar{y}_i$ and $s_i^2$ denote the mean and the mean square of the $i$-th sample:

Sample      $\bar{y}_i$   $\bar{y}_i^2$   $N \bar{y}_i$   $s_i^2$
(8, 8)        8.0      64.00     40.0      0.0
(8, 3)        5.5      30.25     27.5     12.5
(8, 11)       9.5      90.25     47.5      4.5
(8, 4)        6.0      36.00     30.0      8.0
(8, 7)        7.5      56.25     37.5      0.5
(3, 8)        5.5      30.25     27.5     12.5
(3, 3)        3.0       9.00     15.0      0.0
(3, 11)       7.0      49.00     35.0     32.0
(3, 4)        3.5      12.25     17.5      0.5
(3, 7)        5.0      25.00     25.0      8.0
(11, 8)       9.5      90.25     47.5      4.5
(11, 3)       7.0      49.00     35.0     32.0
(11, 11)     11.0     121.00     55.0      0.0
(11, 4)       7.5      56.25     37.5     24.5
(11, 7)       9.0      81.00     45.0      8.0
(4, 8)        6.0      36.00     30.0      8.0
(4, 3)        3.5      12.25     17.5      0.5
(4, 11)       7.5      56.25     37.5     24.5
(4, 4)        4.0      16.00     20.0      0.0
(4, 7)        5.5      30.25     27.5      4.5
(7, 8)        7.5      56.25     37.5      0.5
(7, 3)        5.0      25.00     25.0      8.0
(7, 11)       9.0      81.00     45.0      8.0
(7, 4)        5.5      30.25     27.5      4.5
(7, 7)        7.0      49.00     35.0      0.0

i) $E(\bar{y}) = \frac{1}{n'} \sum_{i=1}^{n'} \bar{y}_i = \frac{1}{25} \times 165 = 6.6 = \bar{Y}$, where $n' = 25$ is the number of samples.

ii) $E(N \bar{y}) = \frac{1}{n'} \sum_{i=1}^{n'} N \bar{y}_i = 33$, or $E(N \bar{y}) = N E(\bar{y}) = 33 = Y$.

iii) $V(\bar{y}) = \frac{1}{n'} \sum_{i=1}^{n'} \bar{y}_i^2 - \bar{Y}^2 = 4.12$. Also $\frac{(N-1) S^2}{nN} = 4.12$ and $\frac{\sigma^2}{n} = 4.12$, therefore
$$V(\bar{y}) = \frac{(N-1) S^2}{nN} = \frac{\sigma^2}{n} = 4.12 .$$

iv) $E(s^2) = \frac{1}{n'} \sum_{i=1}^{n'} s_i^2 = \frac{1}{25} \times 206 = 8.24$   (1a)
and $\frac{(N-1) S^2}{N} = 8.24$.   (2a)
Hence
$$E(s^2) = \frac{(N-1) S^2}{N} = \sigma^2 = 8.24 .$$

Theory of simple random sampling without replacement

Theorem: In srswor, the sample mean $\bar{y}$ is an unbiased estimate of the population mean $\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is
$$V(\bar{y}) = \frac{N-n}{nN} S^2 .$$

Proof: As in srswr, $E(\bar{y}) = \bar{Y}$, and
$$V(\bar{y}) = \frac{1}{n^2} \sum_{i=1}^{n} V(y_i) + \frac{1}{n^2} \sum_{i \neq j}^{n} Cov(y_i, y_j) , \qquad (2.4)$$
where
$$V(y_i) = \frac{N-1}{N} S^2 \quad \text{for each } i . \qquad (2.5)$$
Consider
$$Cov(y_i, y_j) = E[(y_i - \bar{Y})(y_j - \bar{Y})] = \sum_{i \neq j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y}) \Pr(y_i = Y_i, y_j = Y_j) .$$
In this case $y_j$ can take any one of the values except $Y_i$, the value which is known to have already been assumed by $y_i$, with equal probability $\frac{1}{N-1}$, so that for $i \neq j$,
$$\Pr(y_i = Y_i, y_j = Y_j) = \Pr(y_i = Y_i) \Pr(y_j = Y_j \mid y_i = Y_i) = \frac{1}{N} \times \frac{1}{N-1} .$$
Hence,
$$Cov(y_i, y_j) = \frac{1}{N(N-1)} \sum_{i \neq j}^{N} (Y_i - \bar{Y})(Y_j - \bar{Y})$$
$$= \frac{1}{N(N-1)} \sum_{i=1}^{N} (Y_i - \bar{Y}) \left[ \sum_{j=1}^{N} (Y_j - \bar{Y}) - (Y_i - \bar{Y}) \right]$$
$$= \frac{1}{N(N-1)} \left[ \sum_{i=1}^{N} (Y_i - \bar{Y}) \sum_{j=1}^{N} (Y_j - \bar{Y}) - \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \right]$$
$$= - \frac{1}{N(N-1)} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = - \frac{S^2}{N} , \qquad (2.6)$$
since $\sum_{j=1}^{N} (Y_j - \bar{Y}) = 0$.

Substituting the values from equations (2.5) and (2.6) in equation (2.4), we get
$$V(\bar{y}) = \frac{1}{n^2} \left[ n \, \frac{(N-1) S^2}{N} + n(n-1) \left( - \frac{S^2}{N} \right) \right] = \frac{N-1}{nN} S^2 - \frac{n-1}{nN} S^2$$
$$= \frac{N-n}{nN} S^2 = \left( 1 - \frac{n}{N} \right) \frac{S^2}{n} = (1 - f) \frac{S^2}{n} ,$$
where $f = \frac{n}{N}$ is called the sampling fraction and the factor $(1 - f)$ is called the finite population correction (fpc). If the population size $N$ is very large, or if $n$ is small relative to $N$, then $f = \frac{n}{N} \to 0$ and consequently fpc $\to 1$.

Alternative expression
$$V(\bar{y}) = \frac{N-n}{nN} S^2 = \left( \frac{1}{n} - \frac{1}{N} \right) S^2 .$$

Corollary: $\hat{Y} = N \bar{y}$ is an unbiased estimate of the population total $Y$ with variance $V(\hat{Y}) = N^2 (1 - f) S^2 / n$.

Proof: By definition,
$$E(\hat{Y}) = E(N \bar{y}) = N E(\bar{y}) = N \bar{Y} = N \cdot \frac{1}{N} \sum_{i=1}^{N} Y_i = Y$$
and
$$V(\hat{Y}) = V(N \bar{y}) = N^2 \, \frac{N-n}{nN} S^2 = N^2 (1 - f) \frac{S^2}{n} .$$

Remarks
i) The standard error of $\bar{y}$ is $SE(\bar{y}) = S \sqrt{\frac{N-n}{nN}} = S \sqrt{\frac{1-f}{n}} = S \sqrt{\frac{1}{n} - \frac{1}{N}}$.
ii) The standard error of $\hat{Y}$ is $SE(\hat{Y}) = N S \sqrt{\frac{N-n}{nN}} = N S \sqrt{\frac{1-f}{n}} = N S \sqrt{\frac{1}{n} - \frac{1}{N}}$.

For a large population, fpc $= (1 - f) \to 1$, and then
i) $V(\bar{y}) = \frac{S^2}{n}$, and $SE(\bar{y}) = \frac{S}{\sqrt{n}}$.
ii) $V(\hat{Y}) = \frac{N^2 S^2}{n}$, and $SE(\hat{Y}) = \frac{N S}{\sqrt{n}}$.

Theorem: In srswor, the sample mean square $s^2$ is an unbiased estimate of the population mean square $S^2$, i.e. $E(s^2) = S^2$.

Proof: By definition,
$$E(s^2) = E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \right] = \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(y_i^2) - n E(\bar{y}^2) \right] .$$
To obtain $E(y_i^2)$ and $E(\bar{y}^2)$, note that
$V(y_i) = E(y_i^2) - \bar{Y}^2$, so that
$$E(y_i^2) = \frac{N-1}{N} S^2 + \bar{Y}^2 , \quad \text{since } V(y_i) = (N-1) S^2 / N ,$$
and $V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2$, so that
$$E(\bar{y}^2) = \frac{N-n}{nN} S^2 + \bar{Y}^2 , \quad \text{since } V(\bar{y}) = \frac{N-n}{nN} S^2 \text{ for srswor} .$$
Therefore,
$$E(s^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} \left( \frac{N-1}{N} S^2 + \bar{Y}^2 \right) - n \left( \frac{N-n}{nN} S^2 + \bar{Y}^2 \right) \right]$$
$$= \frac{1}{n-1} \, \frac{S^2}{N} \, [n(N-1) - (N-n)] = \frac{1}{n-1} \, \frac{S^2}{N} \, (n-1) N = S^2 .$$

Example: A random sample of $n = 2$ households was drawn from a small colony of $N = 5$ households having monthly income (in rupees) as follows:

Households: 1 2 3 4 5