Chapter 4 Solutions
Tomoki Okuno
Note
• Not all solutions are provided: Exercises that are too simple or not very important to me are skipped.
• Text in red consists of notes to myself; please ignore it.
Solution.

f(x) = (1/θ)e^{−x/θ}, x > 0 ⇒ ℓ(θ) = −n log θ − (Σᵢ xᵢ)/θ,

ℓ′(θ) = −n/θ + (Σᵢ xᵢ)/θ² = n(x̄ − θ)/θ², ℓ″(θ) = n/θ² − 2(Σᵢ xᵢ)/θ³.

Setting ℓ′(θ) = 0 gives the mle θ̂ = x̄.
theta.mle = mean(lifetimemotor$lifetime)  # mle of theta from part (a)
hist(lifetimemotor$lifetime, pr = T)
x = seq(0, max(lifetimemotor$lifetime), length.out = 200)
lines(x, dgamma(x, shape = 1, scale = theta.mle))
which shows that the Γ(1, θ̂) pdf fits the data (histogram) well.
(c) Obtain the sample median of the data, which is an estimate of the median lifetime of a motor. What
parameter is it estimating (i.e., determine the median of X)?
Solution.
The median is median(lifetimemotor$lifetime) = 55.5. Since the cdf of X is given by

F_X(x) = ∫₀ˣ (1/θ)e^{−t/θ} dt = 1 − e^{−x/θ}, x > 0,

the sample median estimates the solution m of F_X(m) = 1/2, that is, the parameter m = θ log 2.
(b) Based on these data, obtain the realization of your estimator in part (a). Explain the meaning of this
estimate in terms of the number of customers.
Solution.
mean(c(9, 7, 9, 15, 10, 13, 11, 7, 2, 12)) = 9.5, which is interpreted as the estimated expected number
of customers who enter the store between 9:00 a.m. and 10:00 a.m., based on these data.
4.1.6. Consider the estimator of the pmf in expression (4.1.10). In equation (4.1.11), we showed that this
estimator is unbiased. Find the variance of the estimator and its mgf.
Solution.
Since I_j(X_i) is Bernoulli,

I_j(X_i) = 1 with probability P(X_i = a_j) = p(a_j),
I_j(X_i) = 0 with probability P(X_i ≠ a_j) = 1 − p(a_j),

the estimator p̂(a_j) = n⁻¹ Σᵢ₌₁ⁿ I_j(X_i) has variance

Var[p̂(a_j)] = p(a_j)[1 − p(a_j)]/n

and mgf

E[exp(t p̂(a_j))] = [1 − p(a_j) + p(a_j)e^{t/n}]ⁿ.
4.1.7. The data set on Scottish schoolchildren discussed in Example 4.1.5 included the eye colors of the
children also. The frequencies of their eye colors are
Blue Light Medium Dark
2978 6697 7511 5175
Solution.
Since the sample size is n = 22,361, the estimates of the pmf are
Blue Light Medium Dark
Count 2978 6697 7511 5175
pb(aj ) 0.133 0.299 0.336 0.231
4.1.8. Recall that for the parameter η = g(θ), the mle of η is g(θ̂), where θ̂ is the mle of θ. Assuming that
the data in Example 4.1.6 were drawn from a Poisson distribution with mean λ, obtain the mle of λ and then
use it to obtain the mle of the pmf. Compare the mle of the pmf to the nonparametric estimate. Note: For
the domain value 6, obtain the mle of P(X ≥ 6).
Solution.
By Exercise 4.1.3, the mle of λ is λ̂ = X̄. Hence the mle of the pmf is

p̂(x) = e^{−x̄} x̄ˣ/x!, x = 0, 1, 2, . . . .

In this dataset, we obtain x̄ = 64/30. So P̂(X ≥ 6) = 1 - ppois(5, 64/30) = 0.0219, which is not so
different from the nonparametric estimate of the pmf at X ≥ 6 (0.033).
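As a numeric cross-check of the R call above, the same tail probability can be computed with scipy.stats (a Python sketch; scipy is assumed available):

```python
from scipy.stats import poisson

lam = 64 / 30                    # mle of lambda: xbar = 64/30
tail = 1 - poisson.cdf(5, lam)   # P(X >= 6) under the fitted Poisson model
print(round(tail, 4))            # 0.0219
```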
4.2.2.
Consider the data on the lifetimes of motors given in Exercise 4.1.1. Obtain a large sample 95% confidence
interval for the mean lifetime of a motor.
Solution.
By the following code, we obtain (54.95, 147.35).
n = length(lifetimemotor$lifetime)
xbar = mean(lifetimemotor$lifetime)
s = sd(lifetimemotor$lifetime)
c(xbar - 1.96*s/sqrt(n), xbar + 1.96*s/sqrt(n))
Note: the answer key of the textbook, (51.82, 150.48), appears to be wrong. That 95% CI is not a large-sample
(approximate) CI but the exact CI using t₀.₀₂₅,₁₉.
4.2.3. Suppose we assume that X1 , X2 , . . . , Xn is a random sample from a Γ(1, θ) distribution.
(a) Show that the random variable (2/θ) Σᵢ₌₁ⁿ Xᵢ has a χ²-distribution with 2n degrees of freedom.
Solution.

X ∼ Γ(1, θ) ⇔ M_X(t) = 1/(1 − θt), t < 1/θ.

Let Y = (2/θ) Σᵢ₌₁ⁿ Xᵢ; then

M_Y(t) = [M_X(2t/θ)]ⁿ = 1/(1 − 2t)ⁿ, t < 1/2 ⇔ Y ∼ χ²(2n).
(b) Using the random variable in part (a) as a pivot random variable, find a (1 − α)100% CI for θ.
Solution.
1 − α = P(χ²_{α/2,2n} ≤ (2/θ) Σᵢ₌₁ⁿ Xᵢ ≤ χ²_{1−α/2,2n}) = P(2 Σᵢ₌₁ⁿ Xᵢ/χ²_{1−α/2,2n} ≤ θ ≤ 2 Σᵢ₌₁ⁿ Xᵢ/χ²_{α/2,2n}).

Hence, a (1 − α)100% CI for θ is

[2 Σᵢ₌₁ⁿ Xᵢ/χ²_{1−α/2,2n}, 2 Σᵢ₌₁ⁿ Xᵢ/χ²_{α/2,2n}].
(c) Obtain the confidence interval in part (b) for the data of Exercise 4.1.1 and compare it with the interval
you obtained in Exercise 4.2.2.
Solution.
Consider a 95% confidence interval to compare with the interval obtained in Exercise 4.2.2. The
following code gives us (68.18, 165.60):
sum.x = sum(lifetimemotor$lifetime)
c(2*sum.x/qchisq(0.975, 2*n), 2*sum.x/qchisq(0.025, 2*n))
which is wider than the interval (54.95, 147.35) obtained in Exercise 4.2.2. However, this result would
be more reliable than the previous one because Y has exactly a χ²(2n) distribution, whereas X̄ is only
approximately normal.
4.2.6. Let X be the mean of a random sample of size n from a distribution that is N (µ, 9). Find n such
that P (X − 1 < µ < X + 1) = 0.90, approximately.
Solution. n = (z₀.₀₅σ)² = [3(1.645)]² = 24.35. Thus, n = 25 (rounding up).
4.2.7. Let a random sample of size 17 from the normal distribution N(µ, σ²) yield x̄ = 4.7 and s² = 5.76.
Determine a 90% confidence interval for µ.
Solution.

x̄ ± t₀.₀₅,ₙ₋₁ s/√n = 4.7 ± 1.746(√5.76/√17) = (3.68, 5.72).
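A quick numeric cross-check of this t-interval (a Python sketch using scipy.stats, assumed available):

```python
from math import sqrt
from scipy.stats import t

n, xbar, s2 = 17, 4.7, 5.76
tcrit = t.ppf(0.95, n - 1)          # t_{0.05,16} ≈ 1.746
half = tcrit * sqrt(s2) / sqrt(n)   # half-width of the 90% CI
print(round(xbar - half, 2), round(xbar + half, 2))   # 3.68 5.72
```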
4.2.8. Let X̄ denote the mean of a random sample of size n from a distribution that has mean µ and variance
σ² = 10. Find n so that the probability is approximately 0.954 that the random interval (X̄ − 1/2, X̄ + 1/2)
includes µ.
Solution.

0.954 = P(X̄ − 1/2 < µ < X̄ + 1/2) = P(−√n/(2σ) < √n(X̄ − µ)/σ < √n/(2σ)).

Since z₀.₀₂₃ = 1.995, 1.995 = √n/(2√10) ⇒ n = 159.3; take n = 160 (rounding up).
4.2.10. Let X₁, X₂, ..., Xₙ, Xₙ₊₁ be a random sample of size n + 1, n > 1, from a distribution that is
N(µ, σ²). Let X̄ = Σ₁ⁿ Xᵢ/n and S² = Σ₁ⁿ (Xᵢ − X̄)²/(n − 1). Find the constant c so that the statistic
c(X̄ − Xₙ₊₁)/S has a t-distribution. If n = 8, determine k such that P(X̄ − kS < X₉ < X̄ + kS) = 0.80.
The observed interval (x̄ − ks, x̄ + ks) is often called an 80% prediction interval for X₉.
Solution.
Since X̄ ∼ N(µ, σ²/n) and X̄ and Xₙ₊₁ are independent,

X̄ − Xₙ₊₁ ∼ N(0, σ²/n + σ²) ⇒ √(n/(n+1)) (X̄ − Xₙ₊₁)/σ ∼ N(0, 1).

Also, since X̄ and S² are independent, X̄ − Xₙ₊₁ and S² are also independent, and

(n − 1)S²/σ² ∼ χ²(n − 1).

Hence,

[√(n/(n+1)) (X̄ − Xₙ₊₁)/σ] / √{[(n − 1)S²/σ²]/(n − 1)} = √(n/(n+1)) (X̄ − Xₙ₊₁)/S ∼ t_{n−1},

which gives us c = √(n/(n+1)).
If n = 8, since α = 0.2, t_{n−1,α/2} = t₇,₀.₁ = 1.415. Hence,

0.80 = P(−1.415 < √(8/9) (X̄ − X₉)/S < 1.415) = P(X̄ − 1.415√(9/8) S < X₉ < X̄ + 1.415√(9/8) S),

so k = 1.415(3/√8) = 1.50.
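The constant k can be cross-checked numerically (a Python sketch with scipy.stats, assumed available):

```python
from math import sqrt
from scipy.stats import t

# k = t_{7,0.1} * sqrt((n+1)/n) with n = 8
k = t.ppf(0.90, 7) * sqrt(9 / 8)
print(round(k, 2))   # 1.5
```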
4.2.14. Let X denote the mean of a random sample of size 25 from a gamma-type distribution with α = 4
and β > 0. Use the Central Limit Theorem to find an approximate 0.954 confidence interval for µ, the mean
of the gamma distribution.
Solution.
By the CLT, X̄ is approximately N(4β, 4β²/25). Since z₀.₀₂₃ = 2,

0.954 = P(−2 < (X̄ − 4β)/(2β/5) < 2)
      = P(−2 < 5X̄/(2β) − 10 < 2)
      = P(8 < 5X̄/(2β) < 12)
      = P(5X̄/6 < 4β < 5X̄/4).

Hence, an approximate 0.954 confidence interval for µ = 4β is (5X̄/6, 5X̄/4).
Note that the answer key in the textbook is an approximate 0.954 CI for β, which is incorrect.
4.2.15. Let x be the observed mean of a random sample of size n from a distribution having mean µ and
known variance σ 2 . Find n so that x − σ/4 to x + σ/4 is an approximate 95% confidence interval for µ.
Solution. Since an approximate 95% CI for µ is x̄ ± 1.96σ/√n, setting 1.96σ/√n = σ/4 gives n = [4(1.96)]² = 61.5; take n = 62.
4.2.16. Assume a binomial model for a certain random variable. If we desire a 90% confidence interval for
p that is at most 0.02 in length, find n.
Solution.
Let p̂ denote the point estimate of p. Since z₀.₀₅ = 1.645 and p̂(1 − p̂) ≤ 1/4, the length satisfies

2(1.645)√(p̂(1 − p̂)/n) ≤ 3.29√(1/(4n)) = 0.02 ⇒ n = (3.29/0.02)²(1/4) ≈ 6765.1,

so n = 6766 suffices.
4.2.17. It is known that a random variable X has a Poisson distribution with parameter µ. A sample of
200 observations from this distribution has a mean equal to 3.4. Construct an approximate 90% confidence
interval for µ.
Solution.

(X̄ − 1.645√(X̄/n), X̄ + 1.645√(X̄/n)) = (3.4 − 1.645√(3.4/200), 3.4 + 1.645√(3.4/200)) = (3.19, 3.61).
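The interval endpoints can be verified numerically (a Python sketch with scipy.stats, assumed available):

```python
from math import sqrt
from scipy.stats import norm

xbar, n = 3.4, 200
z = norm.ppf(0.95)              # ≈ 1.645
half = z * sqrt(xbar / n)       # standard error uses xbar as the Poisson variance estimate
print(round(xbar - half, 2), round(xbar + half, 2))   # 3.19 3.61
```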
4.2.19. Let X₁, X₂, ..., Xₙ be a random sample from a gamma distribution with known parameter α = 3 and
unknown β > 0. In Exercise 4.2.14, we obtained an approximate confidence interval for β (note: actually 4β)
based on the Central Limit Theorem. In this exercise obtain an exact confidence interval by first obtaining
the distribution of 2 Σ₁ⁿ Xᵢ/β.
Solution.
As in Exercise 4.2.3(a), we have 2 Σ₁ⁿ Xᵢ/β ∼ χ²(6n). Hence,

0.95 = P(χ²₀.₀₂₅,₆ₙ < (2/β) Σ₁ⁿ Xᵢ < χ²₀.₉₇₅,₆ₙ) = P(2 Σ₁ⁿ Xᵢ/χ²₀.₉₇₅,₆ₙ < β < 2 Σ₁ⁿ Xᵢ/χ²₀.₀₂₅,₆ₙ),

which gives an exact 95% confidence interval for β.
4.2.20. When 100 tacks were thrown on a table, 60 of them landed point up. Obtain a 95% confidence
interval for the probability that a tack of this type lands point up. Assume independence.
Solution.
Let p̂ denote the point estimate of p. Then

p̂ ± z₀.₀₂₅√(p̂(1 − p̂)/n) = 0.6 ± 1.96√(0.6(0.4)/100) = (0.504, 0.696).
4.2.21. Let two independent random samples, each of size 10, from two normal distributions N (µ1 , σ 2 ) and
N (µ2 , σ 2 ) yield x̄ = 4.8, s21 = 8.64, ȳ = 5.6, s22 = 7.88. Find a 95% confidence interval for µ1 − µ2 .
Solution.
Since the two variances are equal but unknown, we use the pooled estimator of σ², which is given by

s²ₚ = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2) = 9(8.64 + 7.88)/18 = 8.26 ⇒ sₚ = 2.874.

Thus a 95% confidence interval for µ₁ − µ₂ is

(x̄ − ȳ) ± t₀.₀₂₅,₁₈ sₚ √(1/n₁ + 1/n₂) = −0.8 ± 2.10(2.874)√(1/5) = (−3.50, 1.90).
Note: The answer in the textbook is incorrect, where z0.025 seems to be used instead of t0.025,18 by mistake.
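The pooled-t interval can be reproduced numerically (a Python sketch using scipy.stats, assumed available):

```python
from math import sqrt
from scipy.stats import t

n1 = n2 = 10
xbar, ybar, s1sq, s2sq = 4.8, 5.6, 8.64, 7.88
sp = sqrt(((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2))   # pooled sd
half = t.ppf(0.975, n1 + n2 - 2) * sp * sqrt(1 / n1 + 1 / n2)
print(round(xbar - ybar - half, 2), round(xbar - ybar + half, 2))   # -3.5 1.9
```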
4.2.22. Let two independent random variables, Y1 and Y2 , with binomial distributions that have parameters
n1 = n2 = 100, p1 , and p2 , respectively, be observed to be equal to y1 = 50 and y2 = 40. Determine an
approximate 90% confidence interval for p1 − p2 .
Solution.
Since the two point estimates are p̂₁ = 0.5 and p̂₂ = 0.4, the desired CI is

p̂₁ − p̂₂ ± z₀.₀₅ √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂) = 0.1 ± 1.645(0.7/10) = (−0.015, 0.215).
4.2.23. Discuss the problem of finding a confidence interval for the difference µ₁ − µ₂ between the two means
of two normal distributions if the variances σ₁² and σ₂² are known but not necessarily equal.
Solution.
When σ₁² and σ₂² are known, finding a confidence interval for the difference µ₁ − µ₂ is straightforward:

1 − α = P(−z_{α/2} < [(X̄ − Ȳ) − (µ₁ − µ₂)]/√(σ₁²/n + σ₂²/m) < z_{α/2})
      = P(X̄ − Ȳ − z_{α/2}√(σ₁²/n + σ₂²/m) < µ₁ − µ₂ < X̄ − Ȳ + z_{α/2}√(σ₁²/n + σ₂²/m)).
4.2.24. Discuss Exercise 4.2.23 when it is assumed that the variances are unknown and unequal. This is
a very difficult problem, and the discussion should point out exactly where the difficulty lies. If, however,
the variances are unknown but their ratio σ₁²/σ₂² is a known constant k, then a statistic that is a T random
variable can again be used. Why?
Solution.
When the variances are unknown and unequal, they do not cancel, so we cannot eliminate the unknown
variances in a T statistic. If we can assume σ₁² = kσ₂² instead of σ₁² = σ₂², however, then

(X̄ − Ȳ) − (µ₁ − µ₂) ∼ N(0, σ₂²(k/n + 1/m)),
(n − 1)S₁²/σ₁² + (m − 1)S₂²/σ₂² = [(n − 1)S₁²/k + (m − 1)S₂²]/σ₂² ∼ χ²(n + m − 2),

which implies that we can eliminate σ₂² in a T statistic. Accordingly, the pooled estimator of σ₂² is given by

s²ₚ = [(n − 1)S₁²/k + (m − 1)S₂²]/(n + m − 2).
4.2.26. Let X and Y be the means of two independent random samples, each of size n, from the respective
distributions N (µ1 , σ 2 ) and N (µ2 , σ 2 ), where the common variance is known. Find n such that
P (X̄ − Ȳ − σ/5 < µ1 − µ2 < X̄ − Ȳ + σ/5) = 0.90.
Solution.

0.90 = P(X̄ − Ȳ − σ/5 < µ₁ − µ₂ < X̄ − Ȳ + σ/5)
     = P(−σ/5 < (X̄ − Ȳ) − (µ₁ − µ₂) < σ/5)
     = P(−√n/(5√2) < [(X̄ − Ȳ) − (µ₁ − µ₂)]/(σ√(2/n)) < √n/(5√2)),

which gives

√n/(5√2) = z₀.₀₅ = 1.645 ⇒ n = [5√2(1.645)]² = 135.30.

Thus, n = 136 suffices.
4.4.7. Let f(x) = 1/6, x = 1, 2, 3, 4, 5, 6, zero elsewhere, be the pmf of a distribution of the discrete type.
Show that the pmf of the smallest observation of a random sample of size 5 from this distribution is

g₁(y₁) = ((7 − y₁)/6)⁵ − ((6 − y₁)/6)⁵, y₁ = 1, 2, 3, 4, 5, 6,
6 6
zero elsewhere. Note that in this exercise the random sample is from a distribution of the discrete type. All
formulas in the text were derived under the assumption that the random sample is from a distribution of the
continuous type and are not applicable. Why?
Solution.
Since the cdf of X is F(x) = x/6, x = 1, 2, ..., 6, the pmf of Y₁ is

g₁(y₁) = P(Y₁ ≥ y₁) − P(Y₁ ≥ y₁ + 1)
       = P(Xᵢ ≥ y₁, i = 1, ..., 5) − P(Xᵢ ≥ y₁ + 1, i = 1, ..., 5)
       = [P(X ≥ y₁)]⁵ − [P(X ≥ y₁ + 1)]⁵
       = [1 − P(X ≤ y₁ − 1)]⁵ − [1 − P(X ≤ y₁)]⁵
       = (1 − (y₁ − 1)/6)⁵ − (1 − y₁/6)⁵
       = ((7 − y₁)/6)⁵ − ((6 − y₁)/6)⁵, y₁ = 1, 2, ..., 6.

The formulas in the text do not apply because they assume a continuous distribution, for which ties among
the observations occur with probability zero; here P(Xᵢ = Xⱼ) > 0.
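The claimed pmf of Y₁ can be checked against a brute-force enumeration of all 6⁵ equally likely samples (a Python sketch):

```python
from itertools import product

def g1(y):  # claimed pmf of the minimum of 5 fair-die rolls
    return ((7 - y) / 6) ** 5 - ((6 - y) / 6) ** 5

# count the minimum over every one of the 6^5 equally likely outcomes
counts = [0] * 7
for roll in product(range(1, 7), repeat=5):
    counts[min(roll)] += 1

for y in range(1, 7):
    assert abs(g1(y) - counts[y] / 6 ** 5) < 1e-12
assert abs(sum(g1(y) for y in range(1, 7)) - 1) < 1e-12
print("pmf verified")
```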
4.4.8. Let Y1 < Y2 < Y3 < Y4 < Y5 denote the order statistics of a random sample of size 5 from a
distribution having pdf f (x) = e−x , 0 < x < ∞, zero elsewhere. Show that Z1 = Y2 and Z2 = Y4 − Y2 are
independent.
Solution.
Since F_X(x) = 1 − e^{−x}, the joint pdf of Y₂ and Y₄ is

f_{Y₂,Y₄}(y₂, y₄) = (5!/(1!1!1!)) F(y₂)[F(y₄) − F(y₂)][1 − F(y₄)] f(y₂)f(y₄)
                 = 120(1 − e^{−y₂})(e^{−y₂} − e^{−y₄}) e^{−y₂} e^{−2y₄}, 0 < y₂ < y₄ < ∞.

The inverse functions are y₂ = z₁ and y₄ = z₁ + z₂, so J = 1. Hence, the joint pdf of Z₁ and Z₂ is

f_{Z₁,Z₂}(z₁, z₂) = 120(1 − e^{−z₁})(e^{−z₁} − e^{−z₁−z₂}) e^{−z₁} e^{−2(z₁+z₂)}
                 = [120(1 − e^{−z₁}) e^{−4z₁}][(1 − e^{−z₂}) e^{−2z₂}], z₁, z₂ > 0,

which can be expressed as a product of a marginal function of Z₁ and a marginal function of Z₂. Thus, Z₁
and Z₂ are independent.
4.4.9. Let Y1 < Y2 < · · · < Yn be the order statistics of a random sample of size n from a distribution with
pdf f (x) = 1, 0 < x < 1, zero elsewhere. Show that the kth order statistic Yk has a beta pdf with parameters
α = k and β = n − k + 1.
Solution.

f_{Y_k}(y) = [n!/((k − 1)!(n − k)!)] y^{k−1}(1 − y)^{n−k} = [Γ(n + 1)/(Γ(k)Γ(n − k + 1))] y^{k−1}(1 − y)^{n−k},

which is the beta pdf with parameters α = k and β = n − k + 1.
the pdf of Y₁ is

f_{Y₁}(y₁) = [n!/(0!(n − 1)!)] [1 − F_X(y₁)]^{n−1} f_X(y₁)
          = n exp(−(n − 1)cy₁^{b+1}/(b + 1)) · cy₁ᵇ exp(−cy₁^{b+1}/(b + 1))
          = ncy₁ᵇ exp(−ncy₁^{b+1}/(b + 1)),

which is a pdf of the same form with c replaced by nc.
4.4.13. Suppose a random sample of size 2 is obtained from a distribution that has pdf f(x) = 2(1 − x), 0 <
x < 1, zero elsewhere. Compute the probability that one sample observation is at least twice as large as the
other.
Solution.
Let Y₁ < Y₂ be the order statistics of X₁, X₂. Then

P(Y₂ ≥ 2Y₁) = ∫₀¹ ∫₀^{y₂/2} 8(1 − y₁)(1 − y₂) dy₁ dy₂ = · · · = 7/12.
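The double integral can be evaluated numerically to confirm 7/12 (a Python sketch using scipy.integrate, assumed available):

```python
from scipy.integrate import dblquad

# P(Y2 >= 2*Y1): integrate 8(1-y1)(1-y2) over 0 < y1 < y2/2, 0 < y2 < 1
# dblquad integrates func(inner, outer); here y2 is the outer variable
val, _ = dblquad(lambda y1, y2: 8 * (1 - y1) * (1 - y2),
                 0, 1, lambda y2: 0, lambda y2: y2 / 2)
print(round(val, 4))   # 0.5833  (= 7/12)
```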
4.4.14. Let Y1 < Y2 < Y3 denote the order statistics of a random sample of size 3 from a distribution with
pdf f (x) = 1, 0 < x < 1, zero elsewhere. Let Z = (Y1 + Y3 )/2 be the midrange of the sample. Find the pdf
of Z.
Solution.
Let W = Y₁, so that Y₁ = W, Y₃ = 2Z − W, and |J| = 2. Since f_{Y₁,Y₃}(y₁, y₃) = 3![F(y₃) − F(y₁)]f(y₁)f(y₃) = 6(y₃ − y₁),

f_{Z,W}(z, w) = f_{Y₁,Y₃}(w, 2z − w)|J| = 24(z − w), 0 < w < 2z − w < 1,

and zero elsewhere. Integrating out w over this support, the pdf of Z is

f_Z(z) = ∫₀ᶻ 24(z − w) dw = 12z², 0 < z < 1/2,
f_Z(z) = ∫_{2z−1}ᶻ 24(z − w) dw = 12(1 − z)², 1/2 < z < 1,

and zero otherwise.
4.4.15. Let Y1 < Y2 denote the order statistics of a random sample of size 2 from N (0, σ 2 ).
(a) Show that E(Y₁) = −σ/√π.
Solution.
(a) The joint pdf of Y₁ and Y₂ is f(y₁, y₂) = 2!f_X(y₁)f_X(y₂) = (1/(πσ²)) e^{−(y₁²+y₂²)/(2σ²)}, y₁ < y₂. Hence

E(Y₁) = ∫_{−∞}^{∞} y₁ f_{Y₁}(y₁) dy₁ = ∫_{−∞}^{∞} ∫_{y₁}^{∞} y₁ f(y₁, y₂) dy₂ dy₁
      = ∫_{−∞}^{∞} ∫_{−∞}^{y₂} y₁ f(y₁, y₂) dy₁ dy₂
      = ∫_{−∞}^{∞} ∫_{−∞}^{y₂} (y₁/(πσ²)) e^{−(y₁²+y₂²)/(2σ²)} dy₁ dy₂
      = ∫_{−∞}^{∞} [−(1/π) e^{−(y₁²+y₂²)/(2σ²)}]_{y₁=−∞}^{y₂} dy₂
      = −∫_{−∞}^{∞} (1/π) e^{−y₂²/σ²} dy₂
      = −(σ/√π) ∫_{−∞}^{∞} (1/√(2π(σ²/2))) e^{−y₂²/σ²} dy₂
      = −σ/√π,

since the last integrand is the N(0, σ²/2) pdf.
(b) By symmetry, E(Y₂) = −E(Y₁) = σ/√π. Also, Y₁Y₂ = X₁X₂, so E(Y₁Y₂) = E(X₁)E(X₂) = 0. Hence

Cov(Y₁, Y₂) = E(Y₁Y₂) − E(Y₁)E(Y₂) = σ²/π.
4.4.17. Let Y1 < Y2 < Y3 < Y4 be the order statistics of a random sample of size n = 4 from a distribution
with pdf f (x) = 2x, 0 < x < 1, zero elsewhere.
(a) Find the joint pdf of Y3 and Y4 .
(b) Find the conditional pdf of Y3 , given Y4 = y4 .
(c) Evaluate E(Y3 |y4 ).
Solution.
(a) f_{Y₃,Y₄}(y₃, y₄) = (4!/2!)F(y₃)²f(y₃)f(y₄) = 12(y₃²)²(2y₃)(2y₄) = 48y₃⁵y₄, 0 < y₃ < y₄ < 1.
(b) Since f_{Y₄}(y₄) = 4F(y₄)³f(y₄) = 4(y₄²)³(2y₄) = 8y₄⁷,

f(y₃ | y₄) = 48y₃⁵y₄/(8y₄⁷) = 6y₃⁵/y₄⁶, 0 < y₃ < y₄.

(c)

E(Y₃ | y₄) = ∫₀^{y₄} y₃ (6y₃⁵/y₄⁶) dy₃ = ∫₀^{y₄} 6y₃⁶/y₄⁶ dy₃ = (6/7)y₄.
4.4.18. Two numbers are selected at random from the interval (0, 1). If these values are uniformly and
independently distributed, by cutting the interval at these numbers, compute the probability that the three
resulting line segments can form a triangle.
Solution.
Let X₁ and X₂ denote the two numbers, which are iid U(0, 1), and let Y₁ < Y₂ denote the order statistics.
Then the joint pdf of Y₁ and Y₂ is

f_{Y₁,Y₂}(y₁, y₂) = 2!f_X(y₁)f_X(y₂) = 2, 0 < y₁ < y₂ < 1.

The three segments y₁, y₂ − y₁, and 1 − y₂ form a triangle if and only if each is shorter than the sum of the
other two:

y₁ < (y₂ − y₁) + (1 − y₂) = 1 − y₁ ⇒ y₁ < 1/2,
y₂ − y₁ < y₁ + (1 − y₂) ⇒ y₂ − y₁ < 1/2,
1 − y₂ < y₁ + (y₂ − y₁) = y₂ ⇒ y₂ > 1/2,

which gives the support over which to compute the probability:

∫₀^{1/2} ∫_{1/2}^{y₁+1/2} 2 dy₂ dy₁ = ∫₀^{1/2} 2y₁ dy₁ = 1/4.
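The final integral can be evaluated numerically to confirm the classic answer 1/4 (a Python sketch using scipy.integrate, assumed available):

```python
from scipy.integrate import dblquad

# P(triangle) = integral of the joint pdf 2 over 0 < y1 < 1/2, 1/2 < y2 < y1 + 1/2
# dblquad integrates func(inner, outer); here y1 is the outer variable
val, _ = dblquad(lambda y2, y1: 2.0, 0, 0.5,
                 lambda y1: 0.5, lambda y1: y1 + 0.5)
print(round(val, 4))   # 0.25
```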
4.4.19. Let X and Y denote independent random variables with respective probability density functions
f (x) = 2x, 0 < x < 1, zero elsewhere, and g(y) = 3y 2 , 0 < y < 1, zero elsewhere. Let U = min(X, Y ) and
V = max(X, Y ). Find the joint pdf of U and V .
Solution.
Since X and Y are independent, the joint pdf of X and Y is

f_{X,Y}(x, y) = f(x)g(y) = 6xy², 0 < x < 1, 0 < y < 1.

On the set {x < y} we have (U, V) = (X, Y), and on {x > y} we have (U, V) = (Y, X); the Jacobians of the
two inverse transformations are J₁ = 1 and J₂ = −1, respectively. Thus, the joint pdf of U and V is

f_{U,V}(u, v) = f_{X,Y}(u, v)|J₁| + f_{X,Y}(v, u)|J₂| = 6uv² + 6vu² = 6uv(u + v), 0 < u < v < 1,

zero elsewhere.
4.4.22. Let Y1 < Y2 < · · · < Yn be the order statistics of a random sample of size n from the exponential
distribution with pdf f (x) = e−x , 0 < x < ∞, zero elsewhere.
(a) Show that Z1 = nY1 , Z2 = (n − 1)(Y2 − Y1 ), Z3 = (n − 2)(Y3 − Y2 ),..., Zn = Yn − Yn−1 are independent
and that each Zi has the exponential distribution.
Solution.
The inverse transformation is

y₁ = z₁/n, y₂ = z₁/n + z₂/(n − 1), y₃ = z₁/n + z₂/(n − 1) + z₃/(n − 2), . . . , yₙ = z₁/n + z₂/(n − 1) + · · · + zₙ,

which implies that J = 1/n!. By Theorem 4.4.1, hence, the joint pdf of the Zᵢ's is

f_{Z₁,...,Zₙ}(z₁, . . . , zₙ) = f_{Y₁,...,Yₙ}(z₁/n, z₁/n + z₂/(n − 1), . . . , z₁/n + z₂/(n − 1) + · · · + zₙ)|J|
 = n! f_X(z₁/n) f_X(z₁/n + z₂/(n − 1)) · · · f_X(z₁/n + z₂/(n − 1) + · · · + zₙ)(1/n!)
 = e^{−z₁−z₂−···−zₙ}
 = f_X(z₁)f_X(z₂) · · · f_X(zₙ),

so Z₁, Z₂, ..., Zₙ are independent and each Zᵢ has the standard exponential distribution.
4.5 Introduction to Hypothesis Testing
4.5.1. Show that the approximate power function given in expression (4.5.12) of Example 4.5.3 is a strictly
increasing function of µ. Show then that the test discussed in this example has approximate size α for testing
H0 : µ ≤ µ0 versus H1 : µ > µ0 .
Solution.
Let φ(z) be the pdf of a standard normal random variable. The first derivative of γ(µ) with respect to µ is

γ′(µ) = φ(−z_α − √n(µ₀ − µ)/σ)(√n/σ) > 0

because φ(x) > 0, n > 0, and σ > 0. Hence, γ(µ) is a strictly increasing function of µ.
Then, under H₀ : µ ≤ µ₀, monotonicity gives γ(µ) ≤ γ(µ₀) = α, so the test has approximate size α.
4.5.3. Let X have a pdf of the form f(x; θ) = θx^{θ−1}, 0 < x < 1, zero elsewhere, where θ ∈ {θ : θ = 1, 2}. To
test the simple hypothesis H₀ : θ = 1 against the alternative simple hypothesis H₁ : θ = 2, use a random
sample X₁, X₂ of size n = 2 and define the critical region to be C = {(x₁, x₂) : 3/4 ≤ x₁x₂}. Find the power
function of the test.
Solution. Since X₁ and X₂ are independent, f(x₁, x₂) = θ²(x₁x₂)^{θ−1}. Hence the power function is

γ_C(θ) = P_θ(X₁X₂ ≥ 3/4) = ∫_{3/4}¹ ∫_{3/(4x₁)}¹ θ²(x₁x₂)^{θ−1} dx₂ dx₁ = · · · = 1 − (3/4)^θ + θ(3/4)^θ log(3/4), θ = 1, 2.
4.5.4. Let X have a binomial distribution with the number of trials n = 10 and with p either 1/4 or 1/2.
The simple hypothesis H₀ : p = 1/2 is rejected, and the alternative simple hypothesis H₁ : p = 1/4 is accepted, if
the observed value of X₁, a random sample of size 1, is less than or equal to 3. Find the significance level
and the power of the test.
Solution.
α = P_{p=1/2}(X₁ ≤ 3) = pbinom(3, 10, 0.5) = 0.1719, and the power is γ = P_{p=1/4}(X₁ ≤ 3) = pbinom(3, 10, 0.25) = 0.7759.
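Both binomial probabilities can be cross-checked directly (a Python sketch using scipy.stats, mirroring R's pbinom):

```python
from scipy.stats import binom

alpha = binom.cdf(3, 10, 0.5)    # significance level under H0
power = binom.cdf(3, 10, 0.25)   # power under H1
print(round(alpha, 4), round(power, 4))   # 0.1719 0.7759
```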
4.5.5. Let X₁, X₂ be a random sample of size n = 2 from the distribution having pdf f(x; θ) = (1/θ)e^{−x/θ},
0 < x < ∞, zero elsewhere. We reject H₀ : θ = 2 and accept H₁ : θ = 1 if the observed values of X₁, X₂, say
x₁, x₂, are such that

f(x₁; 2)f(x₂; 2)/[f(x₁; 1)f(x₂; 1)] ≤ 1/2.

Here Ω = {θ : θ = 1, 2}. Find the significance level of the test and the power of the test when H₀ is false.
Solution.

f(x₁; 2)f(x₂; 2)/[f(x₁; 1)f(x₂; 1)] ≤ 1/2 ⇔ (1/4)e^{(x₁+x₂)/2} ≤ 1/2 ⇔ x₁ + x₂ ≤ 2 log 2.

Also, we have X ∼ Γ(1, θ) ⇒ Y = X₁ + X₂ ∼ Γ(2, θ). Hence,

P_θ(Y ≤ 2 log 2) = ∫₀^{2 log 2} (1/θ²) x e^{−x/θ} dx
 = [−(x/θ)e^{−x/θ}]₀^{2 log 2} + ∫₀^{2 log 2} (1/θ) e^{−x/θ} dx
 = −(2 log 2/θ) e^{−2 log 2/θ} + 1 − e^{−2 log 2/θ}
 = 1 − (1 + 2 log 2/θ) e^{−2 log 2/θ}.

Hence,

α = P₂(Y ≤ 2 log 2) = 1 − (1 + log 2)/2 = (1 − log 2)/2 ≈ 0.1534,
1 − β = P₁(Y ≤ 2 log 2) = 1 − (1 + 2 log 2)/4 = (3 − 2 log 2)/4 ≈ 0.4034.
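Since Y ∼ Γ(2, θ), both values can be checked against the gamma cdf (a Python sketch using scipy.stats, assumed available):

```python
from math import log
from scipy.stats import gamma

cut = 2 * log(2)                       # rejection cutoff for x1 + x2
alpha = gamma.cdf(cut, a=2, scale=2)   # theta = 2 under H0
power = gamma.cdf(cut, a=2, scale=1)   # theta = 1 under H1
print(round(alpha, 4), round(power, 4))   # 0.1534 0.4034
```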
4.5.8. Let us say the life of a tire in miles, say X, is normally distributed with mean θ and standard deviation
5000. Past experience indicates that θ = 30, 000. The manufacturer claims that the tires made by a new
process have mean θ > 30, 000. It is possible that θ = 35, 000. Check his claim by testing H0 : θ = 30, 000
against H1 : θ > 30, 000. We observe n independent values of X, say x1 , ..., xn , and we reject H0 (thus
accept H1 ) if and only if x ≥ c. Determine n and c so that the power function γ(θ) of the test has the values
γ(30, 000) = 0.01 and γ(35, 000) = 0.98.
Solution.
We have two equations:

γ(30,000) = 0.01 ⇒ P((X̄ − 30000)/(5000/√n) ≥ (c − 30000)/(5000/√n)) = 0.01 ⇒ (c − 30000)/(5000/√n) = 2.326,
γ(35,000) = 0.98 ⇒ P((X̄ − 35000)/(5000/√n) ≥ (c − 35000)/(5000/√n)) = 0.98 ⇒ (c − 35000)/(5000/√n) = −2.054.

Subtracting gives √n = 2.326 + 2.054 = 4.38, which gives us n ≈ 20 and c ≈ 32655.
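The pair of equations can be solved numerically; using exact normal quantiles instead of table values shifts c only slightly (a Python sketch with scipy.stats, assumed available):

```python
from math import ceil
from scipy.stats import norm

z1, z2 = norm.ppf(0.99), norm.ppf(0.98)   # ≈ 2.326 and 2.054
root_n = z1 + z2                          # from subtracting the two equations
n = ceil(root_n ** 2)                     # smallest integer sample size
c = 30000 + z1 * 5000 / root_n            # cutoff from the first equation
print(n, round(c))                        # 20 32656
```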
4.5.11. Let Y1 < Y2 < Y3 < Y4 be the order statistics of a random sample of size n = 4 from a distribution
with pdf f (x; θ) = 1/θ, 0 < x < θ, zero elsewhere, where 0 < θ. The hypothesis H0 : θ = 1 is rejected and
H1 : θ > 1 is accepted if the observed Y4 ≥ c.
(a) Find the constant c so that the significance level is α = 0.05.
Solution.

f_{Y₄}(y₄) = (4!/(3!0!)) F_X(y₄)³ f_X(y₄) = 4y₄³/θ⁴, 0 < y₄ < θ.

Hence,

α = 0.05 = P_{θ=1}(Y₄ ≥ c) = ∫_c¹ 4y₄³ dy₄ = 1 − c⁴ ⇒ c = (0.95)^{1/4} = 0.9873.
4.5.12. Let X₁, X₂, ..., X₈ be a random sample of size n = 8 from a Poisson distribution with mean µ. Reject
the simple null hypothesis H₀ : µ = 0.5 and accept H₁ : µ > 0.5 if the observed value of Σᵢ₌₁⁸ xᵢ ≥ 8.
(a) Show that the significance level is 1-ppois(7,8*.5).
Solution.
Since Y = Σᵢ₌₁⁸ Xᵢ ∼ Poisson(8µ), α = P₀.₅(Y ≥ 8) = 1 − P₀.₅(Y ≤ 7) = 1-ppois(7,8*.5) = 0.051.
(b) Use R to determine γ(0.75), γ(1), and γ(1.25).
Solution.
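The power at each alternative is γ(µ) = P_µ(Y ≥ 8) with Y ∼ Poisson(8µ), i.e., 1 - ppois(7, 8*mu) in R; a Python sketch of the same computation (using scipy.stats, assumed available):

```python
from scipy.stats import poisson

# gamma(mu) = P(Y >= 8) where Y ~ Poisson(8*mu)
powers = {mu: 1 - poisson.cdf(7, 8 * mu) for mu in (0.75, 1.0, 1.25)}
for mu, g in powers.items():
    print(mu, round(g, 4))   # 0.75 0.256 / 1.0 0.547 / 1.25 0.7798
```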
(c) Modify the code in Exercise 4.5.9 to obtain a plot of the power function.
Solution. Skipped.
4.5.13. Let p denote the probability that, for a particular tennis player, the first serve is good. Since
p = 0.40, this player decided to take lessons in order to increase p. When the lessons are completed, the
hypothesis H0 : p = 0.40 is tested against H1 : p > 0.40 based on n = 25 trials. Let Y equal the number of
first serves that are good, and let the critical region be defined by C = {Y : Y ≥ 13}.
(a) Show that α is computed by α = 1 - pbinom(12 , 25, .4).
Solution. α = P_{p=0.40}(Y ≥ 13) = 1 − P(Y ≤ 12 | p = 0.4) = 1 - pbinom(12, 25, .4) = 0.154.
(b) Find β = P (Y < 13) when p = 0.60; that is, β = P (Y ≤ 12; p = 0.60) so that 1 − β is the power at
p = 0.60.
Solution. β = Pp=0.6 (Y < 13) = pbinom(12 , 25, .6) = 0.154 ⇒ 1 − β = 0.846.
because φ(z) = φ(−z), that is, φ is symmetric about the origin. Also, the further z is from the origin, the
smaller φ(z) is. So, if µ < µ₀,

|z_{α/2} + √n(µ₀ − µ)/σ| > |z_{α/2} − √n(µ₀ − µ)/σ|
⇒ φ(z_{α/2} + √n(µ₀ − µ)/σ) < φ(z_{α/2} − √n(µ₀ − µ)/σ)
⇒ γ′(µ) < 0,
Solution.
Since √n(X̄ − µ₀)/S ∼ t_{n−1}, if x̄ > µ₀,

P(|√n(X̄ − µ₀)/S| ≥ t_{α/2,n−1}) = 2P(√n(X̄ − µ₀)/S ≥ t_{α/2,n−1}) = 2(α/2) = α.

If x̄ < µ₀,

P(|√n(X̄ − µ₀)/S| ≥ t_{α/2,n−1}) = 2P(√n(X̄ − µ₀)/S ≤ −t_{α/2,n−1}) = 2(α/2) = α.
4.6.8. Let p equal the proportion of drivers who use a seat belt in a country that does not have a mandatory
seat belt law. It was claimed that p = 0.14. An advertising campaign was conducted to increase this
proportion. Two months after the campaign, y = 104 out of a random sample of n = 590 drivers were
wearing their seat belts. Was the campaign successful?
(a) Define the null and alternative hypotheses.
Solution. H0 : p = 0.14 versus HA : p > 0.14.
(b) Define a critical region with an α = 0.01 significance level.
Solution.
Let p̂ = 104/590 = 0.176. Then the critical region is

Z = (p̂ − p₀)/√(p₀(1 − p₀)/n) ≥ z₀.₀₁ = 2.326.

Since Z = 2.52 > 2.326 (equivalently, the p-value is below α = 0.01), H₀ is rejected; there is sufficient
evidence to show that the campaign was successful.
4.6.9. In Exercise 4.2.18 we found a confidence interval for the variance σ² using the variance S² of a
random sample of size n arising from N(µ, σ²), where the mean µ is unknown. In testing H₀ : σ² = σ₀²
against H₁ : σ² > σ₀², use the critical region defined by (n − 1)S²/σ₀² ≥ c. That is, reject H₀ and accept H₁
if S² ≥ cσ₀²/(n − 1). If n = 13 and the significance level α = 0.025, determine c.
Solution. Since (n − 1)S 2 /σ02 ∼ χ2n−1 = χ212 , c = qchisq(0.975, 12) = 23.337.
4.6.10. In Exercise 4.2.27, in finding a confidence interval for the ratio of the variances of two normal
distributions, we used a statistic S12 /S22 , which has an F -distribution when those two variances are equal. If
we denote that statistic by F, we can test H0 : σ12 = σ22 against H1 : σ12 > σ22 using the critical region F ≥ c.
If n = 13, m = 11, and α = 0.05, find c.
Solution. Since F ∼ F12,10 , c = qf(0.95, 12, 10) = 2.913.
Use the following R code to obtain p = 0.01837:
x = c(20, 30, 92, 105); ps = c(1, 3, 5, 7)/16; chisq.test(x, p = ps)
4.7.2. A number is to be selected from the interval {x : 0 < x < 2} by a random process. Let Aᵢ = {x :
(i − 1)/2 < x ≤ i/2}, i = 1, 2, 3, and let A₄ = {x : 3/2 < x < 2}. For i = 1, 2, 3, 4, suppose a certain hypothesis
assigns probabilities pᵢ₀ to these sets in accordance with pᵢ₀ = ∫_{Aᵢ} (1/2)(2 − x) dx, i = 1, 2, 3, 4. This hypothesis
(concerning the multinomial pdf with k = 4) is to be tested at the 5% level of significance by a chi-square
test. If the observed frequencies of the sets Aᵢ, i = 1, 2, 3, 4, are respectively 30, 30, 10, 10, would H₀ be
accepted at the (approximate) 5% level of significance? Use R code similar to that of Example 4.7.2 for the
computation.
Solution.
Since

pᵢ₀ = ∫_{(i−1)/2}^{i/2} (1/2)(2 − x) dx = [x − x²/4]_{(i−1)/2}^{i/2} = 9/16 − i/8,

p₁₀ = 7/16, p₂₀ = 5/16, p₃₀ = 3/16, p₄₀ = 1/16. Hence, use the following R code:
x = c(30, 30, 10, 10); ps = c(7, 5, 3, 1)/16; chisq.test(x, p = ps)
to obtain the χ² statistic 8.38 and p = 0.0388 < 0.05; H₀ is rejected, which means that the observations show
a lack of fit to the assigned probabilities.
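The same test can be reproduced with scipy's chisquare (a Python sketch; scipy is assumed available):

```python
from scipy.stats import chisquare

obs = [30, 30, 10, 10]
probs = [7/16, 5/16, 3/16, 1/16]
exp = [80 * q for q in probs]          # expected counts, n = 80
stat, p = chisquare(obs, f_exp=exp)
print(round(stat, 2), round(p, 4))     # 8.38 0.0388
```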
4.7.3. Define the sets A1 = {x : −∞ < x ≤ 0}, Ai = {x : i − 2 < x ≤ i − 1}, i = 2, ..., 7, and
A8 = {x : 6 < x < ∞}. A certain hypothesis assigns probabilities pi0 to these sets Ai in accordance with
pᵢ₀ = ∫_{Aᵢ} (1/(2√(2π))) exp[−(x − 3)²/(2(4))] dx, i = 1, 2, ..., 7, 8.
This hypothesis (concerning the multinomial pdf with k = 8) is to be tested, at the 5% level of significance,
by a chi-square test. If the observed frequencies of the sets Ai , i = 1, 2, ..., 8, are, respectively, 60, 96, 140,
210, 172, 160, 88, and 74, would H0 be accepted at the (approximate) 5% level of significance? Use R
code similar to that discussed in Example 4.7.2. The probabilities are easily computed in R; for example,
p30 = pnorm(2,3,2) - pnorm(1,3,2).
Solution.
Use the R code below:
x = c(60, 96, 140, 210, 172, 160, 88, 74)
p1 = pnorm(0,3,2)
p2 = pnorm(1,3,2) - pnorm(0,3,2)
p3 = pnorm(2,3,2) - pnorm(1,3,2)
p4 = pnorm(3,3,2) - pnorm(2,3,2)
p5 = pnorm(4,3,2) - pnorm(3,3,2)
p6 = pnorm(5,3,2) - pnorm(4,3,2)
p7 = pnorm(6,3,2) - pnorm(5,3,2)
p8 = 1 - pnorm(6,3,2)
ps = c(p1, p2, p3, p4, p5, p6, p7, p8)
chisq.test(x, p = ps)
which gives χ² = 6.92 and p = 0.436 > 0.05; H₀ is accepted.
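An equivalent computation in Python (using scipy.stats, assumed available; norm.cdf mirrors R's pnorm):

```python
from scipy.stats import chisquare, norm

obs = [60, 96, 140, 210, 172, 160, 88, 74]
cuts = [0, 1, 2, 3, 4, 5, 6]
cdf = [norm.cdf(c, 3, 2) for c in cuts]                       # N(3, 2^2) cdf at the cut points
probs = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, 7)] + [1 - cdf[6]]
stat, p = chisquare(obs, f_exp=[1000 * q for q in probs])     # n = 1000
print(round(stat, 2), round(p, 3))                            # 6.92 0.436
```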
If we use a chi-square test, for what values of b would the hypothesis that the die is unbiased be rejected at
the 0.025 significance level?
Solution.
Under the null hypothesis that the die is unbiased, pᵢ₀ = 1/6 for all i, so npᵢ₀ = 120(1/6) = 20. Hence the test
statistic is

(b − 20)²/20 + [(40 − b) − 20]²/20 = (b − 20)²/10.

Since χ²₀.₀₂₅,₅ = qchisq(0.975, 5) = 12.83, the null is rejected if

(b − 20)²/10 > 12.83 ⇒ b − 20 > 11.33 or b − 20 < −11.33 ⇒ b > 31.33 or b < 8.67.

If b is an integer, then b ≤ 8 or b ≥ 32.
4.7.5. Consider the problem from genetics of crossing two types of peas. The Mendelian theory states that
the probabilities of the classifications (a) round and yellow, (b) wrinkled and yellow, (c) round and green,
and (d) wrinkled and green are 9/16, 3/16, 3/16, and 1/16, respectively. If, from 160 independent observations, the
observed frequencies of these respective classifications are 86, 35, 26, and 13, are these data consistent with
the Mendelian theory? That is, test, with α = 0.01, the hypothesis that the respective probabilities are 9/16,
3/16, 3/16, and 1/16.
Solution.
The table below gives the observed and expected counts for the chi-square statistic:

          (a)  (b)  (c)  (d)
Observed  86   35   26   13
Expected  90   30   30   10

The test statistic is

X² = 4²/90 + 5²/30 + 4²/30 + 3²/10 = 2.44.
Since X 2 < χ20.01,3 = qchisq(0.99, 3) = 11.34, the null is not rejected; these data would be consistent with
the Mendelian theory, as there is insufficient evidence to show that they are different from the theory.
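The statistic can be double-checked with scipy's chisquare (a Python sketch; scipy is assumed available):

```python
from scipy.stats import chisquare

# observed counts vs. Mendelian expected counts for n = 160
stat, p = chisquare([86, 35, 26, 13], f_exp=[90, 30, 30, 10])
print(round(stat, 2))   # 2.44
```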
4.7.6. Two different teaching procedures were used on two different groups of students. Each group contained
100 students of about the same ability. At the end of the term, an evaluating team assigned a letter grade
to each student. The results were tabulated as follows.
Group A B C D F Total
I 15 25 32 17 11 100
II 9 18 29 28 16 100
If we consider these data to be independent observations from two respective multinomial distributions with
k = 5, test at the 5% significance level the hypothesis that the two distributions are the same (and hence
the two teaching procedures are equally effective). For computation in R, use r1=c(15,25,32,17,11);
r2=c(9,18,29,28,16); mat=rbind(r1,r2); chisq.test(mat)
Solution.
This is a χ² test of homogeneity of the two multinomial distributions. The R code gives p = 0.1711 > 0.05,
so the hypothesis is not rejected; there is insufficient evidence to show that the two teaching procedures
differ in effectiveness.
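The same table can be analyzed with scipy's chi2_contingency, which mirrors R's chisq.test on a matrix (a Python sketch; scipy is assumed available):

```python
from scipy.stats import chi2_contingency

mat = [[15, 25, 32, 17, 11],
       [9, 18, 29, 28, 16]]
stat, p, df, _ = chi2_contingency(mat)   # no continuity correction for tables larger than 2x2
print(df, round(stat, 2), round(p, 4))   # 4 6.4 0.1711
```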
4.7.8. Let the result of a random experiment be classified as one of the mutually exclusive and exhaustive
ways A₁, A₂, A₃ and also as one of the mutually exclusive and exhaustive ways B₁, B₂, B₃, B₄. Say that 180
independent trials of the experiment result in the following frequencies:
        B₁        B₂       B₃       B₄       Total
A₁      15 − 3k   15 − k   15 + k   15 + 3k  60
A₂      15        15       15       15       60
A₃      15 + 3k   15 + k   15 − k   15 − 3k  60
Total   45        45       45       45       180
where k is one of the integers 0, 1, 2, 3, 4, 5. What is the smallest value of k that leads to the rejection of
the independence of the A attribute and the B attribute at the α = 0.05 significance level?
Solution.
The expected values are all 45(60)/180 = 15. Hence the chi-square statistic is

Σᵢⱼ (Oᵢⱼ − 15)²/15 = 2[(3k)² + k² + k² + (3k)²]/15 = (8/3)k².

In this case, the degrees of freedom are (3 − 1)(4 − 1) = 6. Since qchisq(0.95, 6) = 12.6, the null hypothesis
of the independence of A and B is rejected if

(8/3)k² > 12.6 ⇒ k > 2.17,

which gives us the smallest integer value k = 3.
4.7.9. It is proposed to fit the Poisson distribution to the following data:
x 0 1 2 3 3<x
Frequency 20 40 16 18 6
(a) Compute the corresponding chi-square goodness-of-fit statistic.
Solution.
The mean of the Poisson distribution is estimated (treating the "3 < x" cell as x = 4) by

λ̂ = [0(20) + 1(40) + 2(16) + 3(18) + 4(6)]/(20 + 40 + 16 + 18 + 6) = 1.5.

Then the following code shows that the chi-square goodness-of-fit statistic is 7.23:
x = c(20, 40, 16, 18, 6)
ps = c(dpois(0, 1.5), dpois(1, 1.5), dpois(2, 1.5), dpois(3, 1.5), 1-ppois(3, 1.5))
chisq.test(x, p = ps)
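The statistic can be reproduced in Python (using scipy.stats, assumed available; poisson.pmf/cdf mirror R's dpois/ppois):

```python
from scipy.stats import chisquare, poisson

obs = [20, 40, 16, 18, 6]
lam = 1.5
probs = [poisson.pmf(k, lam) for k in range(4)] + [1 - poisson.cdf(3, lam)]
stat, _ = chisquare(obs, f_exp=[100 * q for q in probs])   # n = 100
print(round(stat, 2))   # 7.23
```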
(b) How many degrees of freedom are associated with this chi-square?
Solution. There are five categories, and one parameter (the mean) was estimated from the data, so the
degrees of freedom are 5 − 1 − 1 = 3.
(c) Do these data result in the rejection of the Poisson model at the α = 0.05 significance level?
Solution.
The critical value is qchisq(0.95, 3) = 7.81, and the statistic obtained in part (a) is less than 7.81.
Thus, the Poisson model is not rejected at the 5% level; we were not able to show that the observed data
do not fit the Poisson distribution.
4.9 Bootstrap Procedures
4.9.8. Consider the data of Example 4.9.2. The two-sample t-test of Example 4.6.2 can be used to test these
hypotheses. The test is not exact here (why?), but it is an approximate test. Show that the value of the test
statistic is t = 0.93, with an approximate p-value of 0.18.
Solution.
The reason that the test is not exact here is that the distribution is contaminated. By the Example 4.9.2, we
have the two sample variances s2x = 20.4072 and s2y = 18.5852 . Hence, the pooled variance estimator of σ 2 is
4.9.12. For the situation described in Example 4.9.3, show that the value of the one-sample t-test is t = 0.84
and its associated p-value is 0.20.
Solution.
Conduct a one-sided one-sample t-test of H₀ : µ = 90 versus H_A : µ > 90:
X <- c(119.7, 104.1, 92.8, 85.4, 108.6, 93.4, 67.1, 88.4, 101.0, 97.2,
       95.4, 77.2, 100.0, 114.2, 150.3, 102.3, 105.8, 107.5, 0.9, 94.1)
t.test(X, mu = 90, alternative = "greater")
which returns t = 0.84 and a p-value of 0.20.
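The same one-sided test can be run in Python (using scipy.stats.ttest_1samp, assumed available in a scipy version supporting the alternative argument):

```python
from scipy.stats import ttest_1samp

X = [119.7, 104.1, 92.8, 85.4, 108.6, 93.4, 67.1, 88.4, 101.0, 97.2,
     95.4, 77.2, 100.0, 114.2, 150.3, 102.3, 105.8, 107.5, 0.9, 94.1]
t, p = ttest_1samp(X, popmean=90, alternative="greater")
print(round(t, 2), round(p, 2))   # 0.84 0.2
```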