ST 318 Test 2-3

Sampling

Uploaded by

42b62kbf5m

UNIVERSITY OF DAR ES SALAAM

College of Social Science (CoSS)


Department of Statistics

POPULATION WITH TREND


GROUP 1 REJECTED QUESTION
a) If the random start drawn is 19, with N = 23 and n = 5, units 19, 1, 6, 11, 16
constitute the sample. Find the weights for the first and last members, Y1 and Y19 respectively.
Solution
The weights are
\[ 1 \pm \frac{n}{2(N-k)}\left[2i + (n-1)k - (N+1) - \frac{2 n_2 N}{n}\right] \]
because i + (n − 1)k > N.
For N = 23, n = k = 5, i = 19, n2 = 4:
\[ 1 \pm \frac{5}{2(23-5)}\left[(2 \times 19) + (5-1)5 - (23+1) - \frac{(2 \times 4) \times 23}{5}\right] \]
\[ = 1 \pm \frac{5}{2(18)}\left[-\frac{14}{5}\right] \]
\[ = 1 \pm \left[-\frac{7}{18}\right] \]
So,
Weight for Y1 = 1 + [−7/18] = 11/18
Weight for Y19 = 1 − [−7/18] = 25/18
The first weight is 11/18.
The last weight is 25/18.
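The arithmetic above can be checked with exact fractions. This is a minimal sketch that only evaluates the formula as written in the solution; the function name `end_correction_weights` is a hypothetical helper, not something from the course notes.

```python
from fractions import Fraction

def end_correction_weights(N, n, k, i, n2):
    """Evaluate 1 ± n/(2(N-k)) * [2i + (n-1)k - (N+1) - 2*n2*N/n] exactly."""
    bracket = 2 * i + (n - 1) * k - (N + 1) - Fraction(2 * n2 * N, n)
    adjustment = Fraction(n, 2 * (N - k)) * bracket
    # The "+" branch is the first weight, the "-" branch is the last weight.
    return 1 + adjustment, 1 - adjustment

w_first, w_last = end_correction_weights(N=23, n=5, k=5, i=19, n2=4)
print(w_first, w_last)  # 11/18 25/18
```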
GROUP 1 NEW QUESTION
Question
POPULATION WITH TREND
a) What is population with linear trend?

Answer
A population with linear trend is a population, such as a time series, in which the variable
changes by a constant amount in each time period.

b) The human resources department at company ABC has 42 workers in it. In order to find out
some information about the department as a whole, we want to take a sample of 7 of those workers to
interview. If we have an ordered list of workers numbered 1 through 42, and we start at worker
number 3, which workers would be included in our sample?
Solution.
N=42
n=7,
Then the sampling interval is k=N/n
k= 42/7
k=6
Since the starting number is 3, every 6th worker from worker 3 onwards is taken:
3,9,15,21,27,33,39
Therefore, the workers who would be included in our sample are 3, 9, 15, 21, 27, 33 and 39.
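The selection rule used in this solution can be sketched in a few lines. The function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n and that the start lies between 1 and k.

```python
def systematic_sample(N, n, start):
    """Return the unit numbers of a linear systematic sample.

    Assumes N is divisible by n and 1 <= start <= k, where k = N / n.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n  # sampling interval
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + j * k for j in range(n)]

workers = systematic_sample(N=42, n=7, start=3)
print(workers)  # [3, 9, 15, 21, 27, 33, 39]
```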
ST 318 GROUP 2 TEST 2

Instructions.

Answer all questions

a) What is probability proportional to size sampling? What are the methods used in this sampling
technique?

Probability proportional to size (PPS) sampling is a sampling method in which units have
unequal chances of being selected. Units are selected according to their size, so that a
unit with a large size has a higher probability of being selected than a unit with a small size.

The methods used in this sampling technique are:

• Cumulative selection method


• Rejection method

b) Given the dataset of 500 BC population with income (xi) and expenditure (yi) with N = 100,
X= 150, n =5 as shown below.
i xi yi
1 50 100
2 40 90
3 30 80
4 20 70
5 10 60

Estimate Ŷpps and its variance.


GROUP 3 TEST QUESTION
a) In multi-stage sampling, what do we care about most when optimally selecting samples from different
groups?
In multi-stage sampling, we care most about minimizing the cost of selecting samples
from different groups and minimizing the variance of the samples obtained, to ensure
precision and relevance.

b)
GROUP 4: ST 318
a) Define double sampling
Refers to the design in which initially a sample of units is selected for obtaining auxiliary
information only, and then a second sample is selected in which the variable of interest is
observed in addition to the auxiliary information.

b) Given
N = 600, n = 12, n1 = 4, n2 = 8, S1² = 696.97 and S2² = 4169.70. A large sample of 100 is selected
using SRS, so n' = 100, n1' = 40, n2' = 60. In the second phase, n1 = 4 units are drawn
from n1' = 40, giving y1i = 30, 40, 60, 70, and n2 = 8 units are drawn from n2' = 60, giving
y2i = 50, 70, 80, 100, 110, 140, 150, 180. Estimate ȳst and its variance.
SOLUTION
From
\[ \bar{y}_{st} = \sum_{h=1}^{2} w_h \bar{y}_h \quad \text{where } w_h = \frac{n_h'}{n'} \]
\[ w_1 = \frac{40}{100} = 0.4, \qquad w_2 = \frac{60}{100} = 0.6 \]
\[ \bar{y}_1 = \frac{30+40+60+70}{4} = 50 \]
\[ \bar{y}_2 = \frac{50+70+80+100+110+140+150+180}{8} = 110 \]
\[ \bar{y}_{st} = w_1 \bar{y}_1 + w_2 \bar{y}_2 = 0.4 \times 50 + 0.6 \times 110 = 86 \]
∴ Therefore ȳst = 86
➢ To estimate the variance of the estimate
From
\[ Var(\bar{y}_{st}) = \sum_{h=1}^{2} \frac{w_h^2 S_h^2}{n_h} - \sum_{h=1}^{2} \frac{w_h S_h^2}{N} + \frac{g'}{n'} \sum_{h=1}^{2} w_h (\bar{y}_h - \bar{y}_{st})^2 \]

Hence
➢ \[ \sum_{h=1}^{2} \frac{w_h^2 S_h^2}{n_h} = \frac{w_1^2 S_1^2}{n_1} + \frac{w_2^2 S_2^2}{n_2} = \frac{0.4^2 \times 696.97}{4} + \frac{0.6^2 \times 4169.70}{8} = \frac{2155153}{10000} \]
➢ \[ \sum_{h=1}^{2} \frac{w_h S_h^2}{N} = \frac{w_1 S_1^2 + w_2 S_2^2}{N} = \frac{1}{600}(0.4 \times 696.97 + 0.6 \times 4169.70) = \frac{2780608}{600000} \]
➢ \[ \frac{g'}{n'} = \frac{N-n'}{n'(N-1)} = \frac{600-100}{100(600-1)} = \frac{5}{599} \]
➢ \[ \sum_{h=1}^{2} w_h (\bar{y}_h - \bar{y}_{st})^2 = w_1 (\bar{y}_1 - \bar{y}_{st})^2 + w_2 (\bar{y}_2 - \bar{y}_{st})^2 = 0.4(50 - 86)^2 + 0.6(110 - 86)^2 = 864 \]
Then
\[ Var(\bar{y}_{st}) = \frac{2155153}{10000} - \frac{2780608}{600000} + \frac{5}{599} \times 864 = 218.09 \]

∴ The estimated variance of the estimate is 218.09


But the first answer from the question provider was 217.9319
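The three-term variance formula used in this solution can be evaluated directly from the summary figures. The following is a minimal sketch, assuming the stratum variances given in the question are taken as stated; the function name `double_sampling_estimate` and its argument names are hypothetical.

```python
def double_sampling_estimate(w, ybar, s2, n_h, N, n_prime):
    """Stratified double-sampling mean and its estimated variance.

    w       : phase-1 stratum weights n_h'/n'
    ybar    : phase-2 stratum means
    s2      : stratum variances S_h^2
    n_h     : phase-2 stratum sample sizes
    N       : population size
    n_prime : phase-1 sample size n'
    """
    ybar_st = sum(wh * yh for wh, yh in zip(w, ybar))
    term1 = sum(wh ** 2 * sh / nh for wh, sh, nh in zip(w, s2, n_h))
    term2 = sum(wh * sh for wh, sh in zip(w, s2)) / N
    g_over_n = (N - n_prime) / (n_prime * (N - 1))  # g'/n'
    term3 = g_over_n * sum(wh * (yh - ybar_st) ** 2 for wh, yh in zip(w, ybar))
    return ybar_st, term1 - term2 + term3

ybar_st, var_st = double_sampling_estimate(
    w=[0.4, 0.6], ybar=[50, 110], s2=[696.97, 4169.70],
    n_h=[4, 8], N=600, n_prime=100,
)
print(round(ybar_st, 2), round(var_st, 2))
```

With these inputs the sketch agrees with the worked value of 218.09 to two decimals.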
GROUP 5 QUESTION

PART A
Explain the following terms, which provide a basis for discussing optimum allocation and the
estimated variance in double sampling for stratification.
i) Neyman allocation: An allocation strategy that minimizes the variance of the estimator
under the assumption of known stratum variances. Neyman allocation is often used as a
benchmark for evaluating the efficiency of alternative allocation strategies

ii) Proportional allocation: A type of allocation where the sample size allocated to each
stratum is proportional to the size of the stratum in the population. Proportional allocation
is often used as a starting point for determining the optimum allocation.

iii) Efficiency of an estimator: The ability of an estimator to provide accurate estimates using the
minimum amount of resources or sample units. Optimum allocation aims to maximize the
efficiency by minimizing the variance of the estimator.
PART B
In a simple random sample of 374 households from a large district, 292 were occupied by
white families and 82 by nonwhite families. A sample of about one in four households gave
the following data on ownership.
            Owned   Rented   Total
White       31      43       74
Nonwhite    4       14       18
GROUP 6 QUESTION

a) What is probability proportional to size sampling?


Probability proportional to size (PPS) sampling is a sampling method in which units have
unequal chances of being selected. Units are selected according to their size, so that a
unit with a large size has a higher probability of being selected than a unit with a small size.

OR it is the sampling process in which each element of the population has probability Pi of being
selected into the sample in a single draw.

b) Consider the following information about income (yi) and expenditure (Xi) for 10
Observations from Mitumba Village with 150 people.
S/N 1 2 3 4 5 6 7 8 9 10
yi 78 74 104 88 96 109 102 72 93 84
Xi 26 29 56 31 52 55 71 31 54 40

From the above table estimate Ŷ and its variance.

Solution: N = 150, X = 445

i    yi    Xi   Pi = Xi/X   yi/Pi      (yi/Pi − Ŷ)²
1    78    26   0.058       1344.828   145535.383
2    74    29   0.065       1138.462   30668.766
3    104   56   0.126       825.397    19027.444
4    88    31   0.07        1257.143   86321.966
5    96    52   0.117       820.513    20398.695
6    109   55   0.124       879.032    7107.333
7    102   71   0.16        637.5      106169.751
8    72    31   0.07        1028.571   4255.475
9    93    54   0.121       768.595    37924.447
10   84    40   0.09        933.333    900.24

a) Estimate of Ŷ
\[ \hat{Y} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{P_i} = \frac{1}{10}(9633.374) = 963.337 \]

b) Variance of Ŷ

Note: the variance as a whole also carries a ^ symbol.

\[ \widehat{Var}(\hat{Y}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{y_i}{P_i} - \hat{Y} \right)^2 = \frac{1}{10 \times 9}(458{,}309.498) = 5{,}092.328 \]
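The same estimator can be computed without rounding the selection probabilities. This is a minimal sketch of the calculation above, with the hypothetical function name `pps_estimate`; because it keeps Pi = Xi/X unrounded rather than at three decimals, its results differ slightly from the hand-computed table (the estimate comes out near 963.7 instead of 963.337).

```python
y = [78, 74, 104, 88, 96, 109, 102, 72, 93, 84]
x = [26, 29, 56, 31, 52, 55, 71, 31, 54, 40]

def pps_estimate(y, x):
    """Mean of y_i / p_i with p_i = x_i / X, and its estimated variance."""
    X = sum(x)
    n = len(y)
    ratios = [yi * X / xi for yi, xi in zip(y, x)]  # y_i / p_i, unrounded
    estimate = sum(ratios) / n
    variance = sum((r - estimate) ** 2 for r in ratios) / (n * (n - 1))
    return estimate, variance

estimate, variance = pps_estimate(y, x)
print(round(estimate, 3), round(variance, 3))
```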
QUESTION GROUP 7: Two stage sampling.
a) What is Multistage sampling and two stage sampling?
Answers
Multistage sampling is simply sampling at more than one stage. It involves randomly selecting
clusters at several stages; observation is conducted at the ultimate stage, and the
estimates from this stage are used for the next stage, that is, the bigger cluster from which the
elements of the ultimate stage were selected. For instance, if a researcher wants to estimate the
average number of statistics graduates in Dar es Salaam, sampling can be done by first identifying
universities in Dar es Salaam that offer statistics courses, then identifying the statistics courses
offered in those universities. For simplicity, all clusters are assumed to have the same size.

Two-stage sampling; meaning.


This is a sampling plan that involves randomly selecting clusters from a population and then
randomly selecting elements from those clusters to constitute the final sample. This is an efficient
way of sampling in case the clusters are so large that collecting data from every unit in the cluster
is very expensive. However, it is a more reliable sampling plan if there is no large variation
between the elements in the clusters, so that the final sample is representative.
The first step involves randomly selecting clusters; these are referred to as primary units.
Then a sample is chosen from each primary unit.

b) A set of 20,000 records is stored in 400 drawers, each containing 50 records. In a two-stage
sample, five records are drawn at random from each of 80 randomly selected drawers.
For one item the estimated variances were s1² = 362 and s2² = 805. Compute the standard error
of the mean per record from this sample.

Solution.
The estimated variance of the estimated population mean in two-stage sampling is:
\[ v(\bar{\bar{y}}) = \frac{(1-f_1)}{n} s_1^2 + \frac{f_1(1-f_2)}{nm} s_2^2 \]
where
\[ s_1^2 = \frac{\sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2}{n-1}, \qquad s_2^2 = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_i)^2}{n(m-1)} \]

N = 400, M = 50, n = 80, m = 5, s1² = 362 and s2² = 805

f1 = n/N = 80/400 = 0.2
f2 = m/M = 5/50 = 0.1
\[ v(\bar{\bar{y}}) = \frac{(1-0.2)}{80}(362) + \frac{0.2(1-0.1)}{5 \times 80}(805) = 3.98225 \]

Standard error = √v = √3.98225 = 1.9955… ≈ 1.996
The standard error of the mean per record is 1.996.
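The computation above can be reproduced with a short sketch of the two-stage variance formula quoted in the solution. The function name `two_stage_standard_error` is hypothetical.

```python
import math

def two_stage_standard_error(N, M, n, m, s1_sq, s2_sq):
    """Estimated variance and standard error of the mean per element.

    Uses v = (1 - f1)/n * s1^2 + f1(1 - f2)/(n m) * s2^2
    with f1 = n/N (first-stage fraction) and f2 = m/M (second-stage fraction).
    """
    f1 = n / N
    f2 = m / M
    v = (1 - f1) / n * s1_sq + f1 * (1 - f2) / (n * m) * s2_sq
    return v, math.sqrt(v)

v, se = two_stage_standard_error(N=400, M=50, n=80, m=5, s1_sq=362, s2_sq=805)
print(round(v, 5), round(se, 3))  # 3.98225 1.996
```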
GROUP 8 QUESTION
Question 01; A
➢ What is double sampling?
✓ Refers to the design in which initially a sample of units is selected for obtaining auxiliary
information only, and then a second sample is selected in which the variable of interest
is observed in addition to the auxiliary information.
➢ What is the reason for conducting double sampling?
✓ The reason for conducting double sampling is to obtain a better estimator by using the
relationship between the auxiliary variable and the variable of interest.
Question 01; B.
In a survey to estimate average household monthly medical expenses, 500 households were
selected at random from a population of 5000 households. Of the selected households, 336
had children in the household and 164 had no children. A stratified subsample of 112
households with children and 41 households without children was then selected, and monthly
medical expenditure data were collected from households in the subsample. For the
households with children, the sample mean expenditure was $280 with a sample standard
deviation of 160; for the households without children, the respective figures were $110 and
60. Estimate mean monthly medical expenditure for households in the population, and
estimate the variance of the estimate.
Solution
Let
❖ Stratum (1) represent households with children
❖ Stratum (2) represent households without children
Hence, the information above is summarized as follows;
Stratum   nh'   nh    ȳh      Sh
1         336   112   $ 280   160
2         164   41    $ 110   60
Total     500   153
➢ To estimate the mean monthly medical expenditure for households in the population.
From
\[ \bar{y}_{st} = \sum_{h=1}^{2} w_h \bar{y}_h \quad \text{where } w_h = \frac{n_h'}{n'} \]
\[ \bar{y}_{st} = w_1 \bar{y}_1 + w_2 \bar{y}_2 = 0.672 \times 280 + 0.328 \times 110 = 224.24 \]
∴ The mean monthly medical expenditure for households in the population is $ 224.24
➢ To estimate the variance of the estimate
From
\[ Var(\bar{y}_{st}) = \sum_{h=1}^{2} \frac{w_h^2 S_h^2}{n_h} - \sum_{h=1}^{2} \frac{w_h S_h^2}{N} + \frac{g'}{n'} \sum_{h=1}^{2} w_h (\bar{y}_h - \bar{y}_{st})^2 \]

Hence
➢ \[ \sum_{h=1}^{2} \frac{w_h^2 S_h^2}{n_h} = \frac{w_1^2 S_1^2}{n_1} + \frac{w_2^2 S_2^2}{n_2} = \frac{0.672^2 \times 160^2}{112} + \frac{0.328^2 \times 60^2}{41} = \frac{70416}{625} \]
➢ \[ \sum_{h=1}^{2} \frac{w_h S_h^2}{N} = \frac{w_1 S_1^2 + w_2 S_2^2}{N} = \frac{1}{5000}(0.672 \times 160^2 + 0.328 \times 60^2) = \frac{2298}{625} \]
➢ \[ \frac{g'}{n'} = \frac{N-n'}{n'(N-1)} = \frac{5000-500}{500(5000-1)} = \frac{9}{4999} \]
➢ \[ \sum_{h=1}^{2} w_h (\bar{y}_h - \bar{y}_{st})^2 = 0.672(280 - 224.24)^2 + 0.328(110 - 224.24)^2 = \frac{3981264}{625} \]
Then
\[ Var(\bar{y}_{st}) = \frac{70416}{625} - \frac{2298}{625} + \frac{9}{4999} \times \frac{3981264}{625} = 120.4571 \]
∴ The estimated variance of the estimate is 120.4571
GROUP 9 QUESTION:
a) What is probability proportional to size sampling without replacement?

Refers to a sampling method where the probability of selecting a specific item from a population is
directly proportional to its size or weight, and each selection is made without replacing the selected
item back into the population.
b) For a population with N = 3, Zi = ½, 1/3, 1/4 and Yi= 7, 5, 2; two units are drawn without
replacement, the first with probability proportional to Zi the second with probability proportional to
the remaining sizes. For this method of sample selection, compare the variances of YHT and YM. Use the
variance formulas.

Answers;

Here YHT is the Horvitz-Thompson estimator and YM is Murthy's estimator of the population total
Y = 7 + 5 + 2 = 14. With N = 3 and n = 2 there are only three possible samples, so both variances can
be obtained by listing every sample.

Let's define the variables first:

N = 3 (population size), n = 2 (sample size)

Zi = 1/2, 1/3, 1/4 (size measures). These sum to 13/12, so the probabilities for the first draw are
pi = Zi / ΣZi = 6/13, 4/13, 3/13

Yi = 7, 5, 2 (values of the population units)

The probability of obtaining the sample {i, j}, in either order of draw, is

P(i, j) = pi * pj * [1 / (1 - pi) + 1 / (1 - pj)]

P(1, 2) = 128/273, P(1, 3) = 153/455, P(2, 3) = 38/195

The inclusion probability of each unit is the sum of P over the samples that contain it:

π1 = 157/195, π2 = 302/455, π3 = 145/273

For the sample {i, j} the two estimators are

YHT = Yi / πi + Yj / πj

YM = [(1 - pj) * Yi / pi + (1 - pi) * Yj / pj] / (2 - pi - pj)

Sample    P(s)     YHT      YM
{1, 2}    0.4689   16.227   15.641
{1, 3}    0.3363   12.460   12.490
{2, 3}    0.1949   11.299   12.658

Both estimators are unbiased: Σ P(s) * YHT = Σ P(s) * YM = 14.

Var (YHT) = Σ P(s) * (YHT - 14)²

= 0.4689 * (2.227)² + 0.3363 * (-1.540)² + 0.1949 * (-2.701)²

= 4.546 (approximately)

Var (YM) = Σ P(s) * (YM - 14)²

= 0.4689 * (1.641)² + 0.3363 * (-1.510)² + 0.1949 * (-1.342)²

= 2.380 (approximately)

In conclusion, for this method of sample selection the variance of YM (about 2.38) is smaller than
the variance of YHT (about 4.55).


GROUP 10
a) What is probability proportional to size (PPS)?
Answer:
Probability proportional to size (PPS), also known as probability proportional to volume (PPV)
or probability proportional to weight (PPW), is a sampling technique used in statistics and
survey research. It is commonly employed when selecting a sample from a population in which
the units vary in size or weight.

In PPS sampling, each unit in the population is assigned a probability of selection that is
proportional to its size or weight relative to the total size or weight of the population. The
larger or heavier units have a higher probability of being selected compared to smaller or
lighter units

b) Consider population of 10 individuals numbered from 1 to 10. The population have varying
size as follows
Element Size
1 10
2 5
3 15
4 8
5 4
6 10
7 6
8 9
9 11
10 7
Using PPS sampling without replacement, if we want to select a sample of size 5 determine
the probability of selecting the following elements 1,3,7,4 and 10

Solution
∑ Xi = 85, and P(Xi) = Xi / ∑ Xi

Element   Xi   Pi
1         10   0.118
2         5    0.059
3         15   0.176
4         8    0.0941
5         4    0.047
6         10   0.118
7         6    0.071
8         9    0.1059
9         11   0.129
10        7    0.082

So, taking the product of the single-draw selection probabilities, the probability of selecting
the elements 1, 3, 7, 4 and 10 is
∏ P(Xi) = 0.118 × 0.176 × 0.071 × 0.0941 × 0.082 = 0.000011377
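The product above uses the first-draw probabilities for every element. Under one reading of "without replacement", in which the elements are drawn in the stated order and each draw is made with probability proportional to the sizes still remaining, the probability is instead a product of conditional probabilities. The sketch below computes that alternative; the draw order and the function name `ordered_draw_probability` are assumptions made for illustration.

```python
from fractions import Fraction

sizes = {1: 10, 2: 5, 3: 15, 4: 8, 5: 4, 6: 10, 7: 6, 8: 9, 9: 11, 10: 7}

def ordered_draw_probability(sizes, order):
    """Probability of drawing the units in `order`, one at a time without
    replacement, each draw proportional to the sizes still in the population."""
    remaining = sum(sizes.values())
    prob = Fraction(1)
    for unit in order:
        prob *= Fraction(sizes[unit], remaining)
        remaining -= sizes[unit]  # the drawn unit leaves the population
    return prob

p_ordered = ordered_draw_probability(sizes, [1, 3, 7, 4, 10])
print(p_ordered, float(p_ordered))
```

This evaluates (10/85)(15/75)(6/60)(8/54)(7/46), which is larger than the product of first-draw probabilities because the total remaining size shrinks after each draw.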

QUESTION FROM GROUP 11


a) Define optimum sampling and subsampling fraction. What are the factors that affect the optimum
sampling and subsampling fractions?
Answer: Optimum sampling is the process of determining the optimal sample size and sampling
fraction for a survey. OR

The optimum sampling fraction is the value that minimizes the cost of the survey while still
achieving the desired level of precision. The goal of optimum sampling is to achieve a desired
level of precision (accuracy) at the lowest possible cost.

Subsampling fraction is the proportion of nonrespondents in a survey who are selected for a
more intensive follow-up in a second phase of data collection.
The subsampling fraction is used to increase the response rate of a survey and to improve the
accuracy of the estimates.

The optimal subsampling fraction is the value that maximizes the response rate and accuracy
of the estimates while still being cost-effective.
Here are some factors that can affect the optimum sampling fraction and subsampling fraction:
• The cost of data collection: The more expensive it is to collect data, the lower the optimum
sampling fraction will be.
• The desired level of precision: The more precise the estimates need to be, the higher the
optimum sampling fraction will be.
• The expected response rate: The lower the expected response rate, the higher the optimum
subsampling fraction will be.
b) Find optimum sampling
Given that C1 = 10C2 and S2 = 1.3Sµ
SOLUTION
By using the Cauchy-Schwarz inequality formula
\[ M_{opt} = \frac{S_2}{S_\mu} \times \sqrt{\frac{C_1}{C_2}} \]
Since S2 = 1.3Sµ and C1 = 10C2,
\[ M_{opt} = \frac{1.3 S_\mu}{S_\mu} \times \sqrt{\frac{10 C_2}{C_2}} = 1.3\sqrt{10} \approx 4.11 \]
Therefore the optimum sampling is 4.11

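Because only the ratios S2/Sµ and C1/C2 enter the formula, the result can be checked numerically from those two ratios alone. This is a minimal sketch; the function name `optimum_subsample_size` is hypothetical.

```python
import math

def optimum_subsample_size(s2_over_s_mu, c1_over_c2):
    """Evaluate Mopt = (S2 / S_mu) * sqrt(C1 / C2) from the two ratios."""
    return s2_over_s_mu * math.sqrt(c1_over_c2)

m_opt = optimum_subsample_size(1.3, 10)
print(round(m_opt, 2))  # 4.11
```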


QUESTION FROM GROUP 12
(a) What are the objectives of phase 1 and phase 2 in double sampling for stratification?
Answer
Phase 1; Objective is to estimate the weight of each stratum
Phase 2; Objective is to find/estimate the value of population mean.

(b) Given N = 600, n = 12, n1 = 4, n2 = 8.

Phase 1: a large sample of 100 is selected from the population by SRS, so n' = 100, n'1 = 40 and
n'2 = 60.
Phase 2: pick a random sample of n1 = 4 from n'1 = 40, giving y1i = 3, 4, 6, 7. Again pick a
random sample of n2 = 8 from n'2 = 60, giving y2i = 5, 7, 8, 10, 11, 14, 15, 18.
i. Estimate ȳst
ii. Variance (ȳst)
Solution.

Phase 1: the objective is to estimate the weight of each stratum, from
\[ w_h = \frac{n'_h}{n'} \]
With n'1 = 40, n'2 = 60 and n' = 100:
\[ w_1 = \frac{n'_1}{n'} = \frac{40}{100} = 0.4, \qquad w_2 = \frac{n'_2}{n'} = \frac{60}{100} = 0.6 \]
Phase 2: the objective is to estimate the population mean, from
\[ \bar{y}_h = \frac{\sum_{i=1}^{n_h} y_{hi}}{n_h} \]
\[ \bar{y}_1 = \frac{3+4+6+7}{4} = 5 \]
\[ \bar{y}_2 = \frac{5+7+8+10+11+14+15+18}{8} = 11 \]

i. \[ \bar{y}_{st} = \sum_{h=1}^{k} w_h \bar{y}_h = w_1 \bar{y}_1 + w_2 \bar{y}_2 = 0.4 \times 5 + 0.6 \times 11 = 8.6 \]

ii. \[ Var(\bar{y}_{st}) = \sum_{h=1}^{k} \frac{w_h^2 s_h^2}{n_h} - \frac{\sum_{h=1}^{k} w_h s_h^2}{N} + \frac{g'}{n'} \sum_{h=1}^{k} w_h (\bar{y}_h - \bar{y}_{st})^2 \]
where
\[ g' = \frac{N-n'}{N-1} \]
From
\[ s_h^2 = \frac{1}{n_h - 1} \left[ \sum y_{hi}^2 - n_h \bar{y}_h^2 \right] \]
With n1 = 4, Σ y1i² = 110 and ȳ1 = 5:
\[ s_1^2 = \frac{1}{4-1} \left[ 110 - 4 \times 5^2 \right] = \frac{10}{3} = 3.3333 \]
With n2 = 8, Σ y2i² = 1104 and ȳ2 = 11:
\[ s_2^2 = \frac{1}{8-1} \left[ 1104 - 8 \times 11^2 \right] = \frac{136}{7} = 19.4286 \]
\[ g' = \frac{600-100}{600-1} = 0.8347 \]
\[ Var(\bar{y}_{st}) = \left[ \frac{0.4^2 (3.33)}{4} + \frac{0.6^2 (19.4286)}{8} \right] - \left[ \frac{0.4(3.33) + 0.6(19.4286)}{600} \right] + \frac{0.8347}{100} \left[ 0.4(5 - 8.6)^2 + 0.6(11 - 8.6)^2 \right] \]
= 1.007487 − 0.0216486 + 0.07211808
= 1.058
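The whole calculation, including the stratum variances, can also be run from the raw phase-2 observations. The sketch below is one way to do that; the function name `double_sampling_from_data` is hypothetical, and since it keeps s1² unrounded instead of using 3.33, the individual terms differ slightly in the later decimals while the final value still rounds to 1.058.

```python
from statistics import mean, variance

def double_sampling_from_data(samples, n_prime_h, N):
    """Stratified double-sampling mean and variance from phase-2 raw data.

    samples   : list of phase-2 observation lists, one per stratum
    n_prime_h : phase-1 stratum counts n_h'
    N         : population size
    """
    n_prime = sum(n_prime_h)
    w = [nh / n_prime for nh in n_prime_h]            # phase-1 weights
    ybar = [mean(map(float, s)) for s in samples]     # stratum means
    s2 = [variance(map(float, s)) for s in samples]   # sample variances (n_h - 1 divisor)
    n_h = [len(s) for s in samples]

    ybar_st = sum(wh * yh for wh, yh in zip(w, ybar))
    term1 = sum(wh ** 2 * sh / nh for wh, sh, nh in zip(w, s2, n_h))
    term2 = sum(wh * sh for wh, sh in zip(w, s2)) / N
    g_prime = (N - n_prime) / (N - 1)
    term3 = g_prime / n_prime * sum(wh * (yh - ybar_st) ** 2 for wh, yh in zip(w, ybar))
    return ybar_st, term1 - term2 + term3

ybar_st, var_st = double_sampling_from_data(
    samples=[[3, 4, 6, 7], [5, 7, 8, 10, 11, 14, 15, 18]],
    n_prime_h=[40, 60],
    N=600,
)
print(round(ybar_st, 2), round(var_st, 3))
```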
GROUP 13 Two-stage cluster sampling
a) Define the following terms as used in sampling;
1 Probability proportional to size (PPS)
Is a method of sampling from a finite population in which a size measure is available
for each unit before sampling and where the probability of selecting a unit is proportional
to its size.
2 Effective sample size
Is the number of distinct units in the sample.
3 Sampling error
Is the statistical error that occurs when an analyst does not select a sample that represents
the entire population.

b) There are 36 departments in a small liberal arts college. One wants to estimate the average
amount of money the students spent on textbooks last semester. Since the size of each department
varies very much, a two-stage cluster sampling using probability proportional to size for the
primary units is carried out. The results are listed in the tables below.
Table 1.
Department   Mi   mi   Textbook expenses in Tshs for last semester
1.           10   4    326, 400, 423, 443
2.           20   8    278, 312, 450, 350, 227, 438, 512, 403
3.           30   12   512, 256, 332, 402, 512, 309, 411, 610, 422, 630, 550, 470
4.           15   6    426, 312, 512, 440, 342, 533

Table 2.
Variable   SE Mean   StDev   Variance
Dept1      25.6      51.1    2612.7
Dept2      34.1      96.3    9277.4
Dept3      33.9      117.6   13828.8
Dept4      36.1      88.4    7815.9

Find;
i. Means for each department (ȳ)
ii. Estimate the total sample mean (µ)
iii. Variance of the total sample mean (var (µ))
Note: Use Hansen-Hurwitz Approach

Solution
Means for each department, from ȳ = ∑ yi / n:

• ȳ for dept1 = (326+400+423+443) ÷ 4 = 398.0
• ȳ for dept2 = (278+312+450+350+227+438+512+403) ÷ 8 = 371.3
• ȳ for dept3 = (512+256+332+402+512+309+411+610+422+630+550+470) ÷ 12 = 451.3
• ȳ for dept4 = (426+312+512+440+342+533) ÷ 6 = 427.5
Solution for µ and var (µ) is given by;

i.
ii.
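The department means, and the quantities asked for under the Hansen-Hurwitz approach named in the question, can be sketched as follows. The sketch assumes the usual Hansen-Hurwitz form for two-stage PPS sampling of primary units, in which the estimate of µ is the simple average of the sampled department means and its variance estimate is the sum of squared deviations of those means divided by n(n − 1); variable names are hypothetical.

```python
from statistics import mean

# Textbook expenses (Tshs) of the sampled students in each sampled department.
expenses = {
    1: [326, 400, 423, 443],
    2: [278, 312, 450, 350, 227, 438, 512, 403],
    3: [512, 256, 332, 402, 512, 309, 411, 610, 422, 630, 550, 470],
    4: [426, 312, 512, 440, 342, 533],
}

# Mean for each department.
dept_means = {dept: mean(map(float, values)) for dept, values in expenses.items()}

# Assumed Hansen-Hurwitz form: average of the department means,
# with variance sum((ybar_i - mu_hat)^2) / (n (n - 1)).
n = len(dept_means)
mu_hat = sum(dept_means.values()) / n
var_mu_hat = sum((m - mu_hat) ** 2 for m in dept_means.values()) / (n * (n - 1))

print({d: round(m, 1) for d, m in dept_means.items()})
print(round(mu_hat, 2), round(var_mu_hat, 2))
```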
GROUP 14. TEST 2 QUESTION.
Topic 5: Multi–Stage Sampling Ref. Cochran pg274.
5.1 Two-stage sampling, means and variances in two stage sampling and variance of the estimated mean
a) Explain what is meant by two-stage sampling and describe the steps involved.
Suppose that each unit in the population can be divided into a number of smaller units. If
subunits within a selected unit give similar results, it seems uneconomical to measure them all.
A common practice is to select and measure a sample of the subunits in any chosen unit.
This is known as two-stage sampling because the sample is taken in two steps.

• the first is to select a sample of units, often called the primary units,

• the second is to select a sample of second-stage units or subunits from each chosen
primary unit.
b) A garment manufacturer has N = 90 plants located throughout the United States and wants
to estimate the average number of hours that the sewing machines were down for repairs in
the past months. Because the plants are widely scattered, she decides to use cluster sampling,
specifying each plant as a cluster of machines. Each plant contains many machines, and
checking the repair record for each machine would be time-consuming. Therefore, she uses
two-stage cluster sampling. Enough time and money are available to sample n = 10 plants
and approximately 20% of the machines in each plant. The resulting data are given in the
table below.

The table shows the downtime of the sewing machines (the table itself is not reproduced here).

Using the data in the table above, estimate the average downtime per machine and its variance.
The manufacturer knows she has a combined total of K = 4500 machines in all plants.

FIRST SOLUTION
\[ \sum_{i=1}^{10} M_i \bar{y}_i = 50 \times 5.4 + 65 \times 4 + 45 \times 5.67 + 48 \times 4.8 + 52 \times 4.3 + 58 \times 3.83 + 42 \times 5 + 66 \times 3.85 + 40 \times 4.88 + 56 \times 5 = 2400.59 \]
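A sketch of the point estimate follows, using the plant sizes Mi and plant means read off the sum above. Two assumptions are made here: the third plant size is taken as 45, which is the value that reproduces the stated total of 2400.59, and the estimator is taken to be the usual two-stage form for a known total K, namely (N / (n K)) Σ Mi ȳi. The variance is not computed because the within-plant variances are not available in the text.

```python
# Plant sizes M_i and sample mean downtimes ybar_i, taken from the sum above.
# The third size (45) is an assumption chosen to match the stated total 2400.59.
M = [50, 65, 45, 48, 52, 58, 42, 66, 40, 56]
ybar = [5.4, 4, 5.67, 4.8, 4.3, 3.83, 5, 3.85, 4.88, 5]

N = 90     # plants in the population
n = 10     # plants sampled
K = 4500   # machines in all plants

weighted_total = sum(Mi * yi for Mi, yi in zip(M, ybar))

# Assumed estimator of the mean downtime per machine when K is known.
mu_hat = N * weighted_total / (n * K)

print(round(weighted_total, 2), round(mu_hat, 2))  # 2400.59 4.8
```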
GROUP 15
a) Define probability proportional to size (PPS)
PPS is a method of sampling from a finite population in which a size measure is
available for each population unit before sampling and where the probability of selecting
a unit is proportional to its size.

b) There are 36 departments in a small liberal arts college. One wants to estimate the average
amount of money students spent on textbooks last semester. Since the size of each
department varies much, a two-stage sampling using PPS for the primary units is carried out. The
results are below.

Find the estimate of the population mean using the PPS estimator and estimate the variance of that
estimator.
SOLN
GROUP 16 QUESTION
1) Given the data from a series of samples, what are three kinds of quantity for which we may
wish estimates?
i. The change in Y from one occasion to the next.
ii. The average value of Y over all occasions.
iii. The average value of Y for the most recent occasion.
2) In a survey to estimate average household monthly medical expenses, 500 households were
selected at random from a population of 5000 households. Of the selected households, 336
had children in the household and 164 had no children. A stratified subsample of 112
households with children and 41 households without children was then selected, and monthly
medical expenditure data were collected from households in the subsample. For the
households with children, the sample mean expenditure was $280 with a sample standard
deviation of 160; for the households without children, the respective figures were $110 and
60. Estimate mean monthly medical expenditure for households in the population, and
estimate the variance of the estimate
Solution
Let
❖ Stratum (1) represent households with children
❖ Stratum (2) represent households without children
Hence, the information above is summarized as follows;
Stratum   nh'   nh    ȳh      Sh
1         336   112   $ 280   160
2         164   41    $ 110   60
Total     500   153
➢ To estimate the mean monthly medical expenditure for households in the population.
From
\[ \bar{y}_{st} = \sum_{h=1}^{2} w_h \bar{y}_h \quad \text{where } w_h = \frac{n_h'}{n'} \]
\[ \bar{y}_{st} = w_1 \bar{y}_1 + w_2 \bar{y}_2 = 0.672 \times 280 + 0.328 \times 110 = 224.24 \]
∴ The mean monthly medical expenditure for households in the population is $ 224.24
➢ To estimate the variance of the estimate
From
\[ Var(\bar{y}_{st}) = \sum_{h=1}^{2} \frac{w_h^2 S_h^2}{n_h} - \sum_{h=1}^{2} \frac{w_h S_h^2}{N} + \frac{g'}{n'} \sum_{h=1}^{2} w_h (\bar{y}_h - \bar{y}_{st})^2 \]

Hence
➢ \[ \sum_{h=1}^{2} \frac{w_h^2 S_h^2}{n_h} = \frac{w_1^2 S_1^2}{n_1} + \frac{w_2^2 S_2^2}{n_2} = \frac{0.672^2 \times 160^2}{112} + \frac{0.328^2 \times 60^2}{41} = \frac{70416}{625} \]
➢ \[ \sum_{h=1}^{2} \frac{w_h S_h^2}{N} = \frac{w_1 S_1^2 + w_2 S_2^2}{N} = \frac{1}{5000}(0.672 \times 160^2 + 0.328 \times 60^2) = \frac{2298}{625} \]
➢ \[ \frac{g'}{n'} = \frac{N-n'}{n'(N-1)} = \frac{5000-500}{500(5000-1)} = \frac{9}{4999} \]
➢ \[ \sum_{h=1}^{2} w_h (\bar{y}_h - \bar{y}_{st})^2 = 0.672(280 - 224.24)^2 + 0.328(110 - 224.24)^2 = \frac{3981264}{625} \]

Then
\[ Var(\bar{y}_{st}) = \frac{70416}{625} - \frac{2298}{625} + \frac{9}{4999} \times \frac{3981264}{625} = 120.4571 \]

∴ The estimated variance of the estimate is 120.4571


GROUP 17 QUESTION
a. How can one determine the optimal allocation of sample units in double sampling for
stratification?
Solution:

• Neyman Allocation: It involves allocating sample units between the first and second phases
in a way that minimizes the variance of the estimated population parameter under certain
assumptions. Neyman allocation takes into account the variances and covariances of the
estimators and the costs associated with both phases.

• Efficiency Criteria: Efficiency criteria aim to maximize the precision or efficiency of the
estimators. Criteria such as mean square error, relative efficiency or design effect can be used to
evaluate allocation strategies; these criteria compare the precision of the estimators under
different allocation scenarios and guide the selection of the optimal allocation.

• Analytical approaches: These can be used to find the allocation that maximizes precision. These
methods involve formulating an objective function that represents the desired allocation
goals, incorporating relevant constraints, and solving the optimization problem to obtain the
optimal allocation.

• Simulations: By simulating the sampling process and estimating the population parameters
under different allocation scenarios, one can compare the precision and accuracy of the
estimates, which helps to identify the allocation strategy that yields the best results for a
specific study population.

b. A shoe store wants to estimate the average number of pairs of shoes owned by the students
who live in a certain college town neighborhood. They think that a stratified sample based
on gender is a good approach to take but do not know the makeup of the gender in that
neighborhood. They also do not know the gender of the respondent until after contacting
them. So, they use double sampling by first contacting 160 randomly selected students in that
neighborhood and asking them about their gender. It turns out that 64 are males and 96 are
females. They then randomly sample 8 males and 12 females, and provide them a $10.00
incentive for going home to count the number of pairs of shoes, and report them.
Compute 𝐲̅𝐬𝐭 and its estimated standard deviation.
The data are given in the table below:

Male     5 6 9 5 9 7 5 8
Female   17 19 13 16 8 11 15 19 12 13 33 20

Variable   N    Mean    St. Dev
Male       8    6.750   1.753
Female     12   16.33   6.37
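One way to carry out the requested computation is sketched below. The neighborhood population size N is not given, so the sketch assumes N is large enough that the term divided by N in the double-sampling variance formula is negligible and g'/n' is approximately 1/n'; this is an assumption, not part of the question, and the variable names are hypothetical.

```python
import math
from statistics import mean, variance

male = [5, 6, 9, 5, 9, 7, 5, 8]
female = [17, 19, 13, 16, 8, 11, 15, 19, 12, 13, 33, 20]
n_prime_h = [64, 96]              # phase-1 counts of males and females
n_prime = sum(n_prime_h)          # 160 students contacted in phase 1

samples = [male, female]
w = [nh / n_prime for nh in n_prime_h]            # stratum weights 0.4 and 0.6
ybar = [mean(map(float, s)) for s in samples]     # stratum means
s2 = [variance(map(float, s)) for s in samples]   # sample variances

ybar_st = sum(wh * yh for wh, yh in zip(w, ybar))

# Large-N assumption: drop the 1/N term and use g'/n' ~= 1/n'.
within = sum(wh ** 2 * sh / len(s) for wh, sh, s in zip(w, s2, samples))
between = sum(wh * (yh - ybar_st) ** 2 for wh, yh in zip(w, ybar)) / n_prime
sd_st = math.sqrt(within + between)

print(round(ybar_st, 2), round(sd_st, 2))
```

The stratum means and standard deviations produced along the way match the summary table above (6.750 and 1.753 for males, 16.33 and 6.37 for females).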
