
Exercises in Introduction to Mathematical Statistics (Ch. 4)

Tomoki Okuno

October 29, 2022

Note
• Not all solutions are provided: Exercises that are too simple or not very important to me are skipped.
• Text in red consists of notes to myself. Please ignore it.

4 Some Elementary Statistical Inferences


4.1 Sampling and Statistics
4.1.1. Twenty motors were put on test under a high-temperature setting. The lifetimes in hours of the
motors under these conditions are given below. Also, the data are in the file lifetimemotor.rda at the site
listed in the Preface. Suppose we assume that the lifetime of a motor under these conditions, X, has a Γ(1, θ)
distribution.
1 4 5 21 22 28 40 42 51 53
58 67 95 124 124 160 202 260 303 363
(a) Obtain a histogram of the data and overlay it with a density estimate, using the code hist(x,pr=T);
lines(density(x)) where the R vector x contains the data. Based on this plot, do you think that the
Γ(1, θ) model is credible?
Solution.
load(".../lifetimemotor.rda")
hist(lifetimemotor$lifetime, pr = T)
lines(density(lifetimemotor$lifetime))}

We see that the Γ(1, θ) model is credible.


(b) Assuming a Γ(1, θ) model, obtain the maximum likelihood estimate θ̂ of θ and locate it on your
histogram. Next overlay the pdf of a Γ(1, θ̂) distribution on the histogram. Use the R function
dgamma(x,shape=1,scale=θ̂) to evaluate the pdf.

Solution.
f(x; θ) = (1/θ)e^{−x/θ}, x > 0 ⇒ ℓ(θ) = −n log θ − Σᵢxᵢ/θ,

ℓ′(θ) = −n/θ + Σᵢxᵢ/θ² = n(x̄ − θ)/θ², ℓ″(θ) = n/θ² − 2Σᵢxᵢ/θ³.

Solving ℓ′(θ) = 0, we obtain θ̂ = X̄ = 101.15, and ℓ″(θ̂) = ℓ″(X̄) < 0 confirms a maximum. We can overlay the pdf
using the following code:
theta.mle = mean(lifetimemotor$lifetime)
x = seq(0, 400)
hist(lifetimemotor$lifetime, pr = T)
lines(x, dgamma(x, shape = 1, scale = theta.mle))

which shows that the pdf of the Γ(1, θ̂) distribution fits the data (histogram) well.
(c) Obtain the sample median of the data, which is an estimate of the median lifetime of a motor. What
parameter is it estimating (i.e., determine the median of X)?
Solution.
The median is median(lifetimemotor$lifetime) = 55.5. Since the cdf of X is given by
FX(x) = ∫₀ˣ (1/θ)e^{−t/θ} dt = 1 − e^{−x/θ}, x > 0,

the median of X is the solution of FX(xm) = 1/2, that is, xm = θ log 2.


(d) Based on the mle, what is another estimate of the median of X?
Solution. theta.mle * log(2) = 70.11.
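As a quick check in R (a sketch; the data vector x is typed in from the table above):

x = c(1, 4, 5, 21, 22, 28, 40, 42, 51, 53,
      58, 67, 95, 124, 124, 160, 202, 260, 303, 363)
median(x)         # nonparametric estimate of the median: 55.5
mean(x) * log(2)  # mle-based estimate theta.mle * log(2): 70.11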
4.1.3. Suppose the number of customers X that enter a store between the hours 9:00 a.m. and 10:00 a.m.
follows a Poisson distribution with parameter θ. Suppose a random sample of the number of customers that
enter the store between 9:00 a.m. and 10:00 a.m. for 10 days results in the values
9 7 9 15 10 13 11 7 2 12
(a) Determine the maximum likelihood estimate of θ. Show that it is an unbiased estimator.
Solution.
f(x) = e^{−θ}θˣ/x!, x = 0, 1, 2, . . . ⇒ ℓ(θ) = −nθ + (Σᵢxᵢ) log θ − Σᵢ log xᵢ!,

ℓ′(θ) = −n + Σᵢxᵢ/θ, ℓ″(θ) = −Σᵢxᵢ/θ² < 0.

Solving ℓ′(θ) = 0, we obtain θ̂ = X̄. Since E(θ̂) = E(X̄) = E(X) = θ, it is unbiased for θ.
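A small Monte Carlo sketch illustrating this unbiasedness (the values θ = 9.5 and n = 10 are arbitrary choices of mine, not from the text):

set.seed(1)
theta = 9.5; n = 10
theta.hat = replicate(1e4, mean(rpois(n, theta)))
mean(theta.hat)  # close to theta, consistent with E(X-bar) = theta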

(b) Based on these data, obtain the realization of your estimator in part (a). Explain the meaning of this
estimate in terms of the number of customers.
Solution.
mean(c(9, 7, 9, 15, 10, 13, 11, 7, 2, 12)) = 9.5, which is interpreted as the estimated mean number
of customers entering the store between 9:00 a.m. and 10:00 a.m., based on these data.
4.1.6. Consider the estimator of the pmf in expression (4.1.10). In equation (4.1.11), we showed that this
estimator is unbiased. Find the variance of the estimator and its mgf.
Solution.
Since the indicators

Ij(Xi) = 1 with probability P(Xi = aj) = p(aj), and Ij(Xi) = 0 with probability P(Xi ≠ aj) = 1 − p(aj),

are independent Bernoulli random variables, the variance and the mgf are

Var[p̂(aj)] = Σᵢ₌₁ⁿ Var[Ij(Xi)]/n² = p(aj)[1 − p(aj)]/n,
MIj(t) = E(e^{tIj}) = eᵗp(aj) + [1 − p(aj)].

4.1.7. The data set on Scottish schoolchildren discussed in Example 4.1.5 included the eye colors of the
children also. The frequencies of their eye colors are
Blue Light Medium Dark
2978 6697 7511 5175
Solution.
Since the sample size is n = 22,361, the estimates of the pmf are
Blue Light Medium Dark
Count 2978 6697 7511 5175
pb(aj ) 0.133 0.299 0.336 0.231
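These estimates can be reproduced with a short R sketch (the vector name counts is mine):

counts = c(Blue = 2978, Light = 6697, Medium = 7511, Dark = 5175)
round(counts / sum(counts), 3)  # 0.133 0.299 0.336 0.231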

4.1.8. Recall that for the parameter η = g(θ), the mle of η is g(θ̂), where θ̂ is the mle of θ. Assuming that
the data in Example 4.1.6 were drawn from a Poisson distribution with mean λ, obtain the mle of λ and then
use it to obtain the mle of the pmf. Compare the mle of the pmf to the nonparametric estimate. Note: For
the domain value 6, obtain the mle of P(X ≥ 6).
Solution.
By Exercise 4.1.3, the mle of λ is λ̂ = X̄. Hence the mle of the pmf is

p̂(x) = e^{−X̄}X̄ˣ/x!, x = 0, 1, 2, . . . .

In this dataset, we obtain X̄ = 64/30. So P̂(X ≥ 6) = 1 - ppois(5, 64/30) = 0.0219, which is not so
different from the nonparametric estimate of the pmf at X ≥ 6 (0.033).

4.2 Confidence Intervals


4.2.1. Let the observed value of the mean X̄ and of the sample variance of a random sample of size 20 from a
distribution that is N(µ, σ²) be 81.2 and 26.5, respectively. Find, respectively, 90%, 95%, and 99% confidence
intervals for µ. Note how the lengths of the confidence intervals increase as the confidence increases.
Solution.
Since n = 20 < 25, the CLT may not apply, so we use the exact t-based intervals. Since the upper critical values
are t0.05,19 = 1.73, t0.025,19 = 2.09, and t0.005,19 = 2.86, respectively, the desired CIs are

X̄ ± tα/2,19 √(s²/n) = 81.2 ± tα/2,19 √(26.5/20)
  = (79.21, 83.19) for α = 0.1,
    (78.79, 83.61) for α = 0.05,
    (77.91, 84.49) for α = 0.01.
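These three intervals can be computed with a short R sketch (variable names are mine):

xbar = 81.2; s2 = 26.5; n = 20
for (alpha in c(0.10, 0.05, 0.01)) {
  me = qt(1 - alpha/2, n - 1) * sqrt(s2 / n)
  print(round(c(xbar - me, xbar + me), 2))  # (79.21, 83.19), (78.79, 83.61), (77.91, 84.49)
}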

4.2.2.
Consider the data on the lifetimes of motors given in Exercise 4.1.1. Obtain a large sample 95% confidence
interval for the mean lifetime of a motor.
Solution.
By the following code, we obtain (54.95, 147.35):
n = length(lifetimemotor$lifetime)
mean = mean(lifetimemotor$lifetime)
sd = sd(lifetimemotor$lifetime)
c(mean - 1.96*sd/sqrt(n), mean + 1.96*sd/sqrt(n))
Note: the answer key of the textbook, (51.82, 150.48), appears to be wrong: it is not a large-sample
(approximate) CI but the exact CI using t0.025,19.
4.2.3. Suppose we assume that X1 , X2 , . . . , Xn is a random sample from a Γ(1, θ) distribution.

(a) Show that the random variable (2/θ)Σᵢ₌₁ⁿ Xᵢ has a χ²-distribution with 2n degrees of freedom.
Solution.
X ~ Γ(1, θ) ⇔ MX(t) = 1/(1 − θt), t < 1/θ.

Let Y = (2/θ)Σᵢ₌₁ⁿ Xᵢ. Then

MY(t) = [MX(2t/θ)]ⁿ = 1/(1 − 2t)ⁿ, t < 1/2 ⇔ Y ~ χ²(2n).

(b) Using the random variable in part (a) as a pivot random variable, find a (1 − α)100% CI for θ.
Solution.
1 − α = P(χ²α/2,2n ≤ (2/θ)Σᵢ₌₁ⁿ Xᵢ ≤ χ²1−α/2,2n) = P(2Σᵢ₌₁ⁿ Xᵢ/χ²1−α/2,2n ≤ θ ≤ 2Σᵢ₌₁ⁿ Xᵢ/χ²α/2,2n).

Hence, a CI for θ is

[2Σᵢ₌₁ⁿ Xᵢ/χ²1−α/2,2n, 2Σᵢ₌₁ⁿ Xᵢ/χ²α/2,2n].

(c) Obtain the confidence interval in part (b) for the data of Exercise 4.1.1 and compare it with the interval
you obtained in Exercise 4.2.2.
Solution.
Consider a 95% confidence interval to compare with the interval obtained in Exercise 4.2.2. The
following code gives us (68.18, 165.60):
sum.x = sum(lifetimemotor$lifetime)
c(2*sum.x/qchisq(0.975, 2*n), 2*sum.x/qchisq(0.025, 2*n))
which is wider than the interval (51.82, 150.48) obtained in Exercise 4.2.2. However, this result should
be more reliable than the previous one, because Y has exactly a χ²(2n) distribution, while X̄ is only
approximately normal.
4.2.6. Let X be the mean of a random sample of size n from a distribution that is N (µ, 9). Find n such
that P (X − 1 < µ < X + 1) = 0.90, approximately.
Solution. n = (z0.05σ)² = [3(1.645)]² = 24.35. Thus, n = 25 (rounding up).
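In R, this sample-size calculation is one line (a sketch):

ceiling((qnorm(0.95) * 3)^2)  # 25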
4.2.7. Let a random sample of size 17 from the normal distribution N(µ, σ²) yield x̄ = 4.7 and s² = 5.76.
Determine a 90% confidence interval for µ.
Solution.

x̄ ± t0.05,16 (s/√n) = 4.7 ± 1.746(√5.76/√17) = (3.68, 5.72).

4.2.8. Let X̄ denote the mean of a random sample of size n from a distribution that has mean µ and variance
σ² = 10. Find n so that the probability is approximately 0.954 that the random interval (X̄ − 1/2, X̄ + 1/2)
includes µ.
Solution.

0.954 = P(X̄ − 1/2 < µ < X̄ + 1/2) ⇒ 1 − 0.046 = P(−√n/(2σ) < √n(X̄ − µ)/σ < √n/(2σ)).

Since z0.046/2 = z0.023 = 1.995, we need √n/(2√10) = 1.995 ⇒ n = 159.3; take n = 160 (rounding up).
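A one-line R check of this calculation (a sketch; qnorm(1 - 0.023) is the upper 0.023 critical value):

ceiling((qnorm(1 - 0.023) * 2 * sqrt(10))^2)  # 160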
4.2.10. Let X1, X2, ..., Xn, Xn+1 be a random sample of size n + 1, n > 1, from a distribution that is
N(µ, σ²). Let X̄ = Σᵢ₌₁ⁿ Xᵢ/n and S² = Σᵢ₌₁ⁿ (Xᵢ − X̄)²/(n − 1). Find the constant c so that the statistic
c(X̄ − Xn+1)/S has a t-distribution. If n = 8, determine k such that P(X̄ − kS < X9 < X̄ + kS) = 0.80.
The observed interval (x̄ − ks, x̄ + ks) is often called an 80% prediction interval for X9.
Solution.
Since X̄ ~ N(µ, σ²/n) and X̄ and Xn+1 are independent,

X̄ − Xn+1 ~ N(0, σ²/n + σ²) ⇒ √(n/(n + 1)) (X̄ − Xn+1)/σ ~ N(0, 1).

Also, since X̄ and S² are independent, X̄ − Xn+1 and S² are also independent, with

(n − 1)S²/σ² ~ χ²(n − 1).

Hence,

[√(n/(n + 1))(X̄ − Xn+1)/σ] / √([(n − 1)S²/σ²]/(n − 1)) = √(n/(n + 1)) (X̄ − Xn+1)/S ~ tn−1,

which gives c = √(n/(n + 1)).
If n = 8, then α = 0.2 and tn−1,α/2 = t7,0.1 = 1.415. Hence,

0.80 = P(−1.415 < √(8/9) (X̄ − X9)/S < 1.415)
     = P(X̄ − 1.415√(9/8) S < X9 < X̄ + 1.415√(9/8) S),

so k = 1.415√(9/8) = 1.415(3/√8) = 1.50.
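A quick R sketch of the prediction-interval multiplier:

n = 8
qt(0.90, n - 1) * sqrt((n + 1) / n)  # k = 1.415 * sqrt(9/8) = 1.50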
4.2.14. Let X denote the mean of a random sample of size 25 from a gamma-type distribution with α = 4
and β > 0. Use the Central Limit Theorem to find an approximate 0.954 confidence interval for µ, the mean
of the gamma distribution.
Solution.
By the CLT, X̄ is approximately N(4β, 4β²/25). Since z0.023 = 2,

0.954 = P(−2 < (X̄ − 4β)/(2β/5) < 2)
      = P(−2 < 5X̄/(2β) − 10 < 2)
      = P(8 < 5X̄/(2β) < 12)
      = P(5X̄/6 < 4β < 5X̄/4).

Hence, an approximate 0.954 confidence interval for µ = 4β is

(5X̄/6, 5X̄/4).
Note that the answer key in the textbook is an approximate 0.954 CI for β, which is incorrect.

5
4.2.15. Let x̄ be the observed mean of a random sample of size n from a distribution having mean µ and
known variance σ². Find n so that x̄ − σ/4 to x̄ + σ/4 is an approximate 95% confidence interval for µ.

Solution. Since an approximate 95% CI for µ is x̄ ± 1.96σ/√n, we need 1.96σ/√n = σ/4 ⇒ n = [4(1.96)]² = 61.5; take n = 62.
4.2.16. Assume a binomial model for a certain random variable. If we desire a 90% confidence interval for
p that is at most 0.02 in length, find n.
Solution.
Let p̂ denote the point estimate of p. Since z0.05 = 1.645 and p̂(1 − p̂) ≤ 1/4, the length satisfies

2(1.645)√(p̂(1 − p̂)/n) ≤ 3.29√(1/(4n)) = 0.02 ⇒ n = (3.29/0.02)²(1/4) ≈ 6765,

so n = 6766 suffices.

4.2.17. It is known that a random variable X has a Poisson distribution with parameter µ. A sample of
200 observations from this distribution has a mean equal to 3.4. Construct an approximate 90% confidence
interval for µ.
Solution.

(X̄ − 1.645√(X̄/n), X̄ + 1.645√(X̄/n)) = (3.4 − 1.645√(3.4/200), 3.4 + 1.645√(3.4/200)) = (3.19, 3.61).
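Equivalently, in R (a sketch):

xbar = 3.4; n = 200
xbar + c(-1, 1) * qnorm(0.95) * sqrt(xbar / n)  # (3.19, 3.61)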

4.2.18. Skipped, but for (c), use the fact that

Σᵢ₌₁ⁿ ((Xᵢ − µ)/σ)² = Σᵢ₌₁ⁿ (Xᵢ − µ)²/σ² ~ χ²(n).

4.2.19. Let X1, X2, ..., Xn be a random sample from a gamma distribution with known parameter α = 3 and
unknown β > 0. In Exercise 4.2.14, we obtained an approximate confidence interval for β (note: actually for 4β)
based on the Central Limit Theorem. In this exercise obtain an exact confidence interval by first obtaining
the distribution of 2Σᵢ₌₁ⁿ Xᵢ/β.
Solution.
As in Exercise 4.2.3(a), we have 2Σᵢ₌₁ⁿ Xᵢ/β ~ χ²(6n). Hence,

0.95 = P(χ²0.025,6n < (2/β)Σᵢ₌₁ⁿ Xᵢ < χ²0.975,6n) = P(2Σᵢ₌₁ⁿ Xᵢ/χ²0.975,6n < β < 2Σᵢ₌₁ⁿ Xᵢ/χ²0.025,6n),

which means that the exact 95% CI for β is

[2Σᵢ₌₁ⁿ Xᵢ/χ²0.975,6n, 2Σᵢ₌₁ⁿ Xᵢ/χ²0.025,6n].
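A small R sketch of this interval as a function (the function name and the assumption that the data sit in a vector x are mine):

exact.ci = function(x) {
  n = length(x)
  c(2 * sum(x) / qchisq(0.975, 6 * n), 2 * sum(x) / qchisq(0.025, 6 * n))
}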

4.2.20. When 100 tacks were thrown on a table, 60 of them landed point up. Obtain a 95% confidence
interval for the probability that a tack of this type lands point up. Assume independence.
Solution.
Let p̂ denote the point estimate of p. Then

p̂ ± z0.025 √(p̂(1 − p̂)/n) = 0.6 ± 1.96√(0.6(0.4)/100) = (0.504, 0.696).
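In R (a sketch):

phat = 60 / 100; n = 100
phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)  # (0.504, 0.696)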

4.2.21. Let two independent random samples, each of size 10, from two normal distributions N(µ1, σ²) and
N(µ2, σ²) yield x̄ = 4.8, s1² = 8.64, ȳ = 5.6, s2² = 7.88. Find a 95% confidence interval for µ1 − µ2.
Solution.
Since the two variances are equal but unknown, we use the pooled estimator of σ 2 , which is given by
s²p = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) = 9(8.64 + 7.88)/18 = 8.26 ⇒ sp = 2.874.

Thus a 95% confidence interval for µ1 − µ2 is

(x̄ − ȳ) ± t0.025,18 sp √(1/n1 + 1/n2) = −0.8 ± 2.10(2.874)√(1/5) = (−3.50, 1.90).
Note: The answer in the textbook is incorrect; z0.025 seems to have been used instead of t0.025,18 by mistake.
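The interval can be reproduced in R (a sketch from the summary statistics; variable names are mine):

xbar = 4.8; ybar = 5.6; s1sq = 8.64; s2sq = 7.88; n1 = n2 = 10
sp = sqrt(((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2))
(xbar - ybar) + c(-1, 1) * qt(0.975, n1 + n2 - 2) * sp * sqrt(1/n1 + 1/n2)
# (-3.50, 1.90)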
4.2.22. Let two independent random variables, Y1 and Y2 , with binomial distributions that have parameters
n1 = n2 = 100, p1 , and p2 , respectively, be observed to be equal to y1 = 50 and y2 = 40. Determine an
approximate 90% confidence interval for p1 − p2 .
Solution.
Since the two point estimates are p̂1 = 0.5 and p̂2 = 0.4, the desired CI is

p̂1 − p̂2 ± z0.05 √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) = 0.1 ± 1.645(0.7/10) = (−0.015, 0.215).
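In R (a sketch):

p1 = 0.5; p2 = 0.4; n = 100
(p1 - p2) + c(-1, 1) * qnorm(0.95) * sqrt(p1*(1 - p1)/n + p2*(1 - p2)/n)
# (-0.015, 0.215)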

4.2.23. Discuss the problem of finding a confidence interval for the difference µ1 − µ2 between the two means
of two normal distributions if the variances σ1² and σ2² are known but not necessarily equal.
Solution.
When σ1² and σ2² are known, finding a confidence interval for the difference µ1 − µ2 is straightforward:

1 − α = P(−zα/2 < [(X̄ − Ȳ) − (µ1 − µ2)]/√(σ1²/n + σ2²/m) < zα/2)
      = P(X̄ − Ȳ − zα/2 √(σ1²/n + σ2²/m) < µ1 − µ2 < X̄ − Ȳ + zα/2 √(σ1²/n + σ2²/m)).

4.2.24. Discuss Exercise 4.2.23 when it is assumed that the variances are unknown and unequal. This is
a very difficult problem, and the discussion should point out exactly where the difficulty lies. If, however,
the variances are unknown but their ratio σ1²/σ2² is a known constant k, then a statistic that is a T random
variable can again be used. Why?
Solution.
When the variances are unknown and unequal, we cannot eliminate the unknown variances from a T statistic;
this is where the difficulty lies. If we can assume σ1² = kσ2² instead of σ1² = σ2², however, then

X̄ − Ȳ ~ N(0, σ2²(k/n + 1/m)),
(n − 1)S1²/σ1² + (m − 1)S2²/σ2² = [(n − 1)S1²/k + (m − 1)S2²]/σ2² ~ χ²(n + m − 2),

which implies that we can eliminate σ2² in a T statistic. Accordingly, the pooled estimator of σ2² is

S²p = [(n − 1)S1²/k + (m − 1)S2²]/(n + m − 2).

4.2.26. Let X and Y be the means of two independent random samples, each of size n, from the respective
distributions N (µ1 , σ 2 ) and N (µ2 , σ 2 ), where the common variance is known. Find n such that
P (X̄ − Ȳ − σ/5 < µ1 − µ2 < X̄ − Ȳ + σ/5) = 0.90.

Solution.
0.90 = P(X̄ − Ȳ − σ/5 < µ1 − µ2 < X̄ − Ȳ + σ/5)
     = P(−σ/5 < (X̄ − Ȳ) − (µ1 − µ2) < σ/5)
     = P(−√n/(5√2) < [(X̄ − Ȳ) − (µ1 − µ2)]/(σ√(2/n)) < √n/(5√2)),

which gives

√n/(5√2) = z0.05 = 1.645 ⇒ n = [5√2(1.645)]² = 135.30.

Thus, n = 136 suffices.

4.4 Order Statistics


4.4.5. Let Y1 < Y2 < Y3 < Y4 be the order statistics of a random sample of size 4 from the distribution
having pdf f(x) = e^{−x}, 0 < x < ∞, zero elsewhere. Find P(Y4 ≥ 3).
Solution.
Since the cdf is F(x) = 1 − e^{−x}, x > 0,

fY4(y) = [4!/(3!(4 − 4)!)]F(y)³f(y) = 4(1 − e^{−y})³e^{−y}, 0 < y < ∞,

and zero elsewhere. Thus,

P(Y4 ≥ 3) = ∫₃^∞ fY4(y) dy = ∫₃^∞ 4(1 − e^{−y})³e^{−y} dy = [(1 − e^{−y})⁴]₃^∞ = 1 − (1 − e^{−3})⁴.
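A Monte Carlo sketch confirming this probability:

set.seed(1)
mean(replicate(1e5, max(rexp(4)) >= 3))  # close to the exact value below
1 - (1 - exp(-3))^4                      # about 0.185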

4.4.7. Let f(x) = 1/6, x = 1, 2, 3, 4, 5, 6, zero elsewhere, be the pmf of a distribution of the discrete type.
Show that the pmf of the smallest observation of a random sample of size 5 from this distribution is

g1(y1) = ((7 − y1)/6)⁵ − ((6 − y1)/6)⁵, y1 = 1, 2, 3, 4, 5, 6,
zero elsewhere. Note that in this exercise the random sample is from a distribution of the discrete type. All
formulas in the text were derived under the assumption that the random sample is from a distribution of the
continuous type and are not applicable. Why?
Solution.
Since the cdf of X is F(x) = x/6, x = 1, 2, ..., 6, the pmf of Y1 is

g1(y1) = P(Y1 ≥ y1) − P(Y1 ≥ y1 + 1)
       = P(Xi ≥ y1, i = 1, ..., 5) − P(Xi ≥ y1 + 1, i = 1, ..., 5)
       = [P(X ≥ y1)]⁵ − [P(X ≥ y1 + 1)]⁵
       = [1 − P(X ≤ y1 − 1)]⁵ − [1 − P(X ≤ y1)]⁵
       = (1 − (y1 − 1)/6)⁵ − (1 − y1/6)⁵
       = ((7 − y1)/6)⁵ − ((6 − y1)/6)⁵, y1 = 1, 2, ..., 6.

The continuous-case formulas in the text are not applicable because, for a discrete parent distribution, ties
among the observations occur with positive probability.

4.4.8. Let Y1 < Y2 < Y3 < Y4 < Y5 denote the order statistics of a random sample of size 5 from a
distribution having pdf f (x) = e−x , 0 < x < ∞, zero elsewhere. Show that Z1 = Y2 and Z2 = Y4 − Y2 are
independent.
Solution.
Since FX(x) = 1 − e^{−x}, the joint pdf of Y2 and Y4 is

fY2,Y4(y2, y4) = [5!/(1!1!1!)]F(y2)[F(y4) − F(y2)][1 − F(y4)]f(y2)f(y4)
              = 120(1 − e^{−y2})(e^{−y2} − e^{−y4})e^{−y2}e^{−2y4}, 0 < y2 < y4 < ∞.

The inverse functions are y2 = z1 and y4 = z1 + z2, so J = 1. Hence, the joint pdf of Z1 and Z2 is

gZ1,Z2(z1, z2) = fY2,Y4(z1, z1 + z2)|J|
              = 120(1 − e^{−z1})(e^{−z1} − e^{−z1−z2})e^{−z1}e^{−2(z1+z2)}
              = 120(1 − e^{−z1})e^{−z1}(1 − e^{−z2})e^{−3z1}e^{−2z2}
              = 120(1 − e^{−z1})e^{−4z1}(1 − e^{−z2})e^{−2z2}, z1 > 0, z2 > 0,

which can be expressed as a product of a marginal function of Z1 and a marginal function of Z2 . Thus, Z1
and Z2 are independent.
4.4.9. Let Y1 < Y2 < · · · < Yn be the order statistics of a random sample of size n from a distribution with
pdf f (x) = 1, 0 < x < 1, zero elsewhere. Show that the kth order statistic Yk has a beta pdf with parameters
α = k and β = n − k + 1.
Solution.
fYk(y) = [n!/((k − 1)!(n − k)!)] y^{k−1}(1 − y)^{n−k}
       = [Γ(n + 1)/(Γ(k)Γ(n − k + 1))] y^{k−1}(1 − y)^{n−k}, 0 < y < 1,

which means Yk ~ Beta(k, n − k + 1).


4.4.10. Let Y1 < Y2 < · · · < Yn be the order statistics from a Weibull distribution, Exercise 3.3.26. Find
the distribution function and pdf of Y1.
Solution.
From Exercise 3.3.26, the Weibull pdf is f(x) = cx^b exp[−cx^{b+1}/(b + 1)], x > 0. Since the cdf is

FX(x) = ∫₀ˣ ct^b exp[−ct^{b+1}/(b + 1)] dt = 1 − exp[−cx^{b+1}/(b + 1)], x > 0,

the distribution function of Y1 is

FY1(y1) = 1 − [1 − FX(y1)]ⁿ = 1 − exp[−ncy1^{b+1}/(b + 1)], y1 > 0,

and the pdf of Y1 is

fY1(y1) = [n!/(0!(n − 1)!)][1 − FX(y1)]^{n−1}fX(y1)
        = n exp[−(n − 1)cy1^{b+1}/(b + 1)] · cy1^b exp[−cy1^{b+1}/(b + 1)]
        = ncy1^b exp[−ncy1^{b+1}/(b + 1)],

which indicates that Y1 also has a Weibull distribution.


4.4.11. Find the probability that the range of a random sample of size 4 from the uniform distribution
having the pdf f (x) = 1, 0 < x < 1, zero elsewhere, is less than 1/2.

Solution.

fY1,Y4(y1, y4) = 12(y4 − y1)², 0 < y1 < y4 < 1,

and zero elsewhere. Hence

P(Y4 − Y1 < 1/2) = P(Y4 < Y1 + 1/2)
= ∫₀^{1/2} ∫_{y1}^{y1+1/2} 12(y4 − y1)² dy4 dy1 + ∫_{1/2}^1 ∫_{y1}^1 12(y4 − y1)² dy4 dy1
= ∫₀^{1/2} [4(y4 − y1)³]_{y4=y1}^{y4=y1+1/2} dy1 + ∫_{1/2}^1 [4(y4 − y1)³]_{y4=y1}^{y4=1} dy1
= ∫₀^{1/2} (1/2) dy1 + ∫_{1/2}^1 4(1 − y1)³ dy1
= 1/4 + 1/16 = 5/16.
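A Monte Carlo sketch confirming 5/16:

set.seed(1)
mean(replicate(1e5, diff(range(runif(4))) < 0.5))  # about 5/16 = 0.3125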

4.4.13. Suppose a random sample of size 2 is obtained from a distribution that has pdf f(x) = 2(1 − x), 0 <
x < 1, zero elsewhere. Compute the probability that one sample observation is at least twice as large as the
other.
Solution.
Let Y1 < Y2 be the order statistics of X1, X2. Then

fY1,Y2(y1, y2) = 2f(y1)f(y2) = 8(1 − y1)(1 − y2), 0 < y1 < y2 < 1,

and

P(Y2 ≥ 2Y1) = ∫₀¹ ∫₀^{y2/2} 8(1 − y1)(1 − y2) dy1 dy2 = · · · = 7/12.
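A Monte Carlo sketch (the draws use the inverse cdf x = 1 − √u, since F(x) = 1 − (1 − x)²):

set.seed(1)
x1 = 1 - sqrt(runif(1e5)); x2 = 1 - sqrt(runif(1e5))
mean(pmax(x1, x2) >= 2 * pmin(x1, x2))  # about 7/12 = 0.583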

4.4.14. Let Y1 < Y2 < Y3 denote the order statistics of a random sample of size 3 from a distribution with
pdf f (x) = 1, 0 < x < 1, zero elsewhere. Let Z = (Y1 + Y3 )/2 be the midrange of the sample. Find the pdf
of Z.
Solution.

fY1,Y3(y1, y3) = 6(y3 − y1), 0 < y1 < y3 < 1.

Let W = Y1 in addition to Z, which gives a one-to-one transformation. Since y1 = w and y3 = 2z − w, the
Jacobian satisfies |J| = 2. Thus

fZ,W(z, w) = fY1,Y3(w, 2z − w)|J| = 24(z − w), 0 < w < 2z − w < 1,

and zero elsewhere. Accounting for this support, the pdf of Z is

fZ(z) = ∫₀^z 24(z − w) dw = 12z² for 0 < z < 1/2,
fZ(z) = ∫_{2z−1}^z 24(z − w) dw = 12(1 − z)² for 1/2 < z < 1,

and fZ(z) = 0 otherwise.

4.4.15. Let Y1 < Y2 denote the order statistics of a random sample of size 2 from N (0, σ 2 ).

(a) Show that E(Y1) = −σ/√π.

Solution.
The joint pdf of Y1 and Y2 is f(y1, y2) = 2fX(y1)fX(y2) = (1/(πσ²))e^{−(y1²+y2²)/(2σ²)}, −∞ < y1 < y2 < ∞.
Hence, interchanging the order of integration,

E(Y1) = ∫_{−∞}^∞ y1 fY1(y1) dy1 = ∫_{−∞}^∞ (∫_{y1}^∞ y1 f(y1, y2) dy2) dy1
      = ∫_{−∞}^∞ (∫_{−∞}^{y2} y1 f(y1, y2) dy1) dy2
      = ∫_{−∞}^∞ (∫_{−∞}^{y2} (y1/(πσ²))e^{−(y1²+y2²)/(2σ²)} dy1) dy2
      = ∫_{−∞}^∞ [−(1/π)e^{−(y1²+y2²)/(2σ²)}]_{y1=−∞}^{y1=y2} dy2
      = −∫_{−∞}^∞ (1/π)e^{−y2²/σ²} dy2
      = −(σ/√π) ∫_{−∞}^∞ (1/√(2π(σ²/2)))e^{−y2²/σ²} dy2
      = −σ/√π,

where the last step uses the fact that the integrand is the pdf of a N(0, σ²/2) random variable.

(b) Find the covariance of Y1 and Y2 .


Solution.
By the symmetric argument,

E(Y2) = ∫_{−∞}^∞ y2 fY2(y2) dy2 = ∫_{−∞}^∞ (∫_{−∞}^{y2} y2 f(y1, y2) dy1) dy2
      = ∫_{−∞}^∞ (∫_{y1}^∞ (y2/(πσ²))e^{−(y1²+y2²)/(2σ²)} dy2) dy1
      = ∫_{−∞}^∞ [−(1/π)e^{−(y1²+y2²)/(2σ²)}]_{y2=y1}^{y2=∞} dy1
      = ∫_{−∞}^∞ (1/π)e^{−y1²/σ²} dy1
      = (σ/√π) ∫_{−∞}^∞ (1/√(2π(σ²/2)))e^{−y1²/σ²} dy1
      = σ/√π.

Also,

E(Y1Y2) = ∫_{−∞}^∞ ∫_{−∞}^{y2} y1y2 f(y1, y2) dy1 dy2
        = ∫_{−∞}^∞ y2 (∫_{−∞}^{y2} (y1/(πσ²))e^{−(y1²+y2²)/(2σ²)} dy1) dy2
        = −∫_{−∞}^∞ (y2/π)e^{−y2²/σ²} dy2
        = 0,

since the last integrand is an odd function. Hence, the covariance of Y1 and Y2 is

Cov(Y1, Y2) = E(Y1Y2) − E(Y1)E(Y2) = 0 − (−σ/√π)(σ/√π) = σ²/π.

4.4.17. Let Y1 < Y2 < Y3 < Y4 be the order statistics of a random sample of size n = 4 from a distribution
with pdf f (x) = 2x, 0 < x < 1, zero elsewhere.

(a) Find the joint pdf of Y3 and Y4 .
(b) Find the conditional pdf of Y3 , given Y4 = y4 .
(c) Evaluate E(Y3 |y4 ).
Solution.
(a) fY3,Y4(y3, y4) = (4!/2!)F(y3)²f(y3)f(y4) = 12(y3²)²(2y3)(2y4) = 48y3⁵y4, 0 < y3 < y4 < 1.

(b) Since fY4(y4) = 4F(y4)³f(y4) = 4(y4²)³(2y4) = 8y4⁷,

fY3|Y4(y3|y4) = fY3,Y4(y3, y4)/fY4(y4) = 6y3⁵/y4⁶, 0 < y3 < y4.

(c)

E(Y3|y4) = ∫₀^{y4} y3(6y3⁵/y4⁶) dy3 = ∫₀^{y4} (6y3⁶/y4⁶) dy3 = (6/7)y4.

4.4.18. Two numbers are selected at random from the interval (0, 1). If these values are uniformly and
independently distributed, by cutting the interval at these numbers, compute the probability that the three
resulting line segments can form a triangle.
Solution.
Let X1 and X2 denote the two numbers that are U (0, 1) and Y1 < Y2 denote the order statistics. Then, the
joint pdf of Y1 and Y2 is

fY1 ,Y2 (y1 , y2 ) = (2!/0!0!0!)fX (y1 )fX (y2 ) = 2, 0 < y1 < y2 < 1.

The conditions under which three resulting line segments can form a triangle are
y1 < 1 − y1 ⇒ y1 < 1/2,
y2 − y1 < 1 − (y2 − y1) ⇒ y2 − y1 < 1/2,
y2 > 1 − y2 ⇒ y2 > 1/2.

This region is the support over which to compute the probability:

∫₀^{1/2} ∫_{1/2}^{y1+1/2} 2 dy2 dy1 = ∫₀^{1/2} 2y1 dy1 = 1/4.
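A Monte Carlo sketch confirming 1/4:

set.seed(1)
u1 = runif(1e5); u2 = runif(1e5)
a = pmin(u1, u2); b = pmax(u1, u2)
mean(a < 0.5 & b > 0.5 & (b - a) < 0.5)  # about 0.25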

4.4.19. Let X and Y denote independent random variables with respective probability density functions
f (x) = 2x, 0 < x < 1, zero elsewhere, and g(y) = 3y 2 , 0 < y < 1, zero elsewhere. Let U = min(X, Y ) and
V = max(X, Y ). Find the joint pdf of U and V .
Solution.
Since X and Y are independent, we have the joint pdf of X and Y :

fX,Y (x, y) = f (x)g(y) = 6xy 2 , 0 < x < 1, 0 < y < 1.

Note that the transformation is not one-to-one:

(1) x = u, y = v and (2) x = v, y = u.

Then, the Jacobians are J1 = 1 and J2 = −1, respectively. Thus, the joint pdf of U and V is

fU,V(u, v) = fX,Y(u, v)|J1| + fX,Y(v, u)|J2|
           = 6uv²(1) + 6vu²(1)
           = 6uv(u + v), 0 < u < v < 1.

4.4.20. Let the joint pdf of X and Y be f(x, y) = (12/7)x(x + y), 0 < x < 1, 0 < y < 1, zero elsewhere. Let
U = min(X, Y) and V = max(X, Y). Find the joint pdf of U and V.
Solution.
As with the previous exercise,

fU,V(u, v) = fX,Y(u, v)|J1| + fX,Y(v, u)|J2|
           = (12/7)u(u + v) + (12/7)v(v + u)
           = (12/7)(u + v)², 0 < u < v < 1.

4.4.22. Let Y1 < Y2 < · · · < Yn be the order statistics of a random sample of size n from the exponential
distribution with pdf f (x) = e−x , 0 < x < ∞, zero elsewhere.
(a) Show that Z1 = nY1 , Z2 = (n − 1)(Y2 − Y1 ), Z3 = (n − 2)(Y3 − Y2 ),..., Zn = Yn − Yn−1 are independent
and that each Zi has the exponential distribution.
Solution.
The inverse transformation is

y1 = z1/n, y2 = z1/n + z2/(n − 1), y3 = z1/n + z2/(n − 1) + z3/(n − 2), . . . , yn = z1/n + z2/(n − 1) + · · · + zn,

which implies that J = 1/n!. Hence, by Theorem 4.4.1, the joint pdf of the Zi's is

fZ1,...,Zn(z1, . . . , zn) = fY1,...,Yn(z1/n, z1/n + z2/(n − 1), . . . , z1/n + z2/(n − 1) + · · · + zn)|J|
= n! fX(z1/n) fX(z1/n + z2/(n − 1)) · · · fX(z1/n + z2/(n − 1) + · · · + zn) (1/n!)
= e^{−z1−z2−···−zn}   (since Σᵢ yᵢ = Σᵢ zᵢ)
= fX(z1)fX(z2) · · · fX(zn),

which is the desired result.


(b) Demonstrate that all linear functions of Y1, Y2, ..., Yn, such as Σᵢ₌₁ⁿ aᵢYᵢ, can be expressed as linear
functions of independent random variables.
Solution.
By part (a), we can transform the Yi's, which are dependent, into the Zi's, which are mutually independent.
Then

Σᵢ₌₁ⁿ aᵢYᵢ = Σᵢ₌₁ⁿ aᵢ Σⱼ₌₁ⁱ Zⱼ/(n − j + 1) = Σᵢ₌₁ⁿ [(Σⱼ₌ᵢⁿ aⱼ)/(n − i + 1)] Zᵢ ≡ Σᵢ₌₁ⁿ bᵢZᵢ,

which is a linear function of independent random variables.

4.5 Introduction to Hypothesis Testing
4.5.1. Show that the approximate power function given in expression (4.5.12) of Example 4.5.3 is a strictly
increasing function of µ. Show then that the test discussed in this example has approximate size α for testing

H0 : µ ≤ µ0 versus H1 : µ > µ0 .

Solution.
Let φ(z) be the pdf of a standard normal random variable. The first derivative of γ(µ) with respect to µ is

γ′(µ) = φ(−zα − √n(µ0 − µ)/σ)(√n/σ) > 0

because φ(x) > 0, n > 0, and σ > 0. Hence, γ(µ) is a strictly increasing function of µ.
Then, under H0: µ ≤ µ0,

max_{µ≤µ0} γ(µ) = γ(µ0) = Φ(−zα) = α,

which is the desired result.


4.5.2. For the Darwin data tabled in Example 4.5.5, verify that the Student t-test statistic is 2.15.
Solution.
t = (x̄ − 0)/(sx/√n) = (2.62 − 0)/(4.72/√15) = 2.149 ≈ 2.15.

4.5.3. Let X have a pdf of the form f(x; θ) = θx^{θ−1}, 0 < x < 1, zero elsewhere, where θ ∈ {θ : θ = 1, 2}. To
test the simple hypothesis H0: θ = 1 against the alternative simple hypothesis H1: θ = 2, use a random
sample X1, X2 of size n = 2 and define the critical region to be C = {(x1, x2) : 3/4 ≤ x1x2}. Find the power
function of the test.
Solution. Since X1 and X2 are independent, f(x1, x2) = θ²(x1x2)^{θ−1}. Hence the power function is

γC(θ) = Pθ(X1X2 ≥ 3/4) = ∫_{3/4}^1 ∫_{3/(4x1)}^1 θ²(x1x2)^{θ−1} dx2 dx1 = · · · = 1 − (3/4)^θ + θ(3/4)^θ log(3/4), θ = 1, 2.

4.5.4. Let X have a binomial distribution with the number of trials n = 10 and with p either 1/4 or 1/2.
The simple hypothesis H0: p = 1/2 is rejected, and the alternative simple hypothesis H1: p = 1/4 is accepted, if
the observed value of X1, a random sample of size 1, is less than or equal to 3. Find the significance level
and the power of the test.
Solution.

α = Pp=1/2 (X ≤ 3) = pbinom(3, 10, 0.5) = 0.172,


1 − β = Pp=1/4 (X ≤ 3) = pbinom(3, 10, 0.25) = 0.776.

4.5.5. Let X1 , X2 be a random sample of size n = 2 from the distribution having pdf f (x; θ) = (1/θ)e−x/θ ,
0 < x < ∞, zero elsewhere. We reject H0 : θ = 2 and accept H1 : θ = 1 if the observed values of X1 , X2 , say
x1 , x2 , are such that

f(x1; 2)f(x2; 2) / [f(x1; 1)f(x2; 1)] ≤ 1/2.

Here Ω = {θ : θ = 1, 2}. Find the significance level of the test and the power of the test when H0 is false.

Solution.
f(x1; 2)f(x2; 2) / [f(x1; 1)f(x2; 1)] ≤ 1/2 ⇔ (1/4)e^{(x1+x2)/2} ≤ 1/2 ⇔ x1 + x2 ≤ 2 log 2.

Also, X ~ Γ(1, θ) ⇒ Y = X1 + X2 ~ Γ(2, θ). Hence,

Pθ(Y ≤ 2 log 2) = ∫₀^{2 log 2} (1/θ²)xe^{−x/θ} dx
= [−(x/θ)e^{−x/θ}]₀^{2 log 2} + ∫₀^{2 log 2} (1/θ)e^{−x/θ} dx
= −(2 log 2/θ)e^{−2 log 2/θ} + 1 − e^{−2 log 2/θ}
= 1 − (1 + 2 log 2/θ)e^{−2 log 2/θ}.

Hence,

α = P2(Y ≤ 2 log 2) = 1 − (1 + log 2)/2 = (1 − log 2)/2 ≈ 0.1534,
1 − β = P1(Y ≤ 2 log 2) = 1 − (1 + 2 log 2)/4 = (3 − 2 log 2)/4 ≈ 0.403.

4.5.8. Let us say the life of a tire in miles, say X, is normally distributed with mean θ and standard deviation
5000. Past experience indicates that θ = 30, 000. The manufacturer claims that the tires made by a new
process have mean θ > 30, 000. It is possible that θ = 35, 000. Check his claim by testing H0 : θ = 30, 000
against H1 : θ > 30, 000. We observe n independent values of X, say x1 , ..., xn , and we reject H0 (thus
accept H1 ) if and only if x ≥ c. Determine n and c so that the power function γ(θ) of the test has the values
γ(30, 000) = 0.01 and γ(35, 000) = 0.98.
Solution.
We have two equations:

γ(30,000) = 0.01 ⇒ P((X̄ − 30000)/(5000/√n) ≥ (c − 30000)/(5000/√n)) = 0.01 ⇒ (c − 30000)/(5000/√n) = 2.326,
γ(35,000) = 0.98 ⇒ P((X̄ − 35000)/(5000/√n) ≥ (c − 35000)/(5000/√n)) = 0.98 ⇒ (c − 35000)/(5000/√n) = −2.054,

which give n ≈ 20 and c ≈ 32661.
4.5.11. Let Y1 < Y2 < Y3 < Y4 be the order statistics of a random sample of size n = 4 from a distribution
with pdf f (x; θ) = 1/θ, 0 < x < θ, zero elsewhere, where 0 < θ. The hypothesis H0 : θ = 1 is rejected and
H1 : θ > 1 is accepted if the observed Y4 ≥ c.
(a) Find the constant c so that the significance level is α = 0.05.
Solution.
fY4(y4) = [4!/(3!0!)]FX(y4)³fX(y4) = 4y4³/θ⁴, 0 < y4 < θ.

Hence,

α = 0.05 = Pθ=1(Y4 ≥ c) = ∫_c^1 4y4³ dy4 = 1 − c⁴ ⇒ c = (0.95)^{1/4} ≈ 0.987.

(b) Determine the power function of the test.


Solution.

γ(θ) = Pθ(Y4 ≥ c) = ∫_c^θ (4y4³/θ⁴) dy4 = 1 − c⁴/θ⁴ = 1 − 0.95/θ⁴.

4.5.12. Let X1, X2, ..., X8 be a random sample of size n = 8 from a Poisson distribution with mean µ. Reject
the simple null hypothesis H0: µ = 0.5 and accept H1: µ > 0.5 if the observed value of Σᵢ₌₁⁸ xᵢ ≥ 8.
(a) Show that the significance level is 1-ppois(7,8*.5).
(a) Show that the significance level is 1-ppois(7,8*.5).
Solution.
Since Y = Σᵢ₌₁⁸ Xᵢ ~ Poisson(8µ), α = P0.5(Y ≥ 8) = 1 − P0.5(Y ≤ 7) = 1-ppois(7,8*.5) = 0.051.
(b) Use R to determine γ(0.75), γ(1), and γ(1.25).
Solution.

γ(0.75) = 1-ppois(7,8*.75) = 0.256,


γ(1) = 1-ppois(7,8) = 0.547,
γ(1.25) = 1-ppois(7,8*1.25) = 0.780.

(c) Modify the code in Exercise 4.5.9 to obtain a plot of the power function.
Solution. Skipped.
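A minimal sketch of such a plot, based only on the power function γ(µ) = 1 − ppois(7, 8µ) from part (b) (the code of Exercise 4.5.9 is not reproduced here):

mu = seq(0.25, 2, by = 0.01)
pow = 1 - ppois(7, 8 * mu)
plot(mu, pow, type = "l", xlab = expression(mu), ylab = "Power")
abline(h = 0.051, lty = 2)  # the significance level, attained at mu = 0.5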
4.5.13. Let p denote the probability that, for a particular tennis player, the first serve is good. Since
p = 0.40, this player decided to take lessons in order to increase p. When the lessons are completed, the
hypothesis H0 : p = 0.40 is tested against H1 : p > 0.40 based on n = 25 trials. Let Y equal the number of
first serves that are good, and let the critical region be defined by C = {Y : Y ≥ 13}.
(a) Show that α is computed by α = 1 - pbinom(12 , 25, .4).
Solution. α = Pp=0.40(Y ≥ 13) = 1 − P(Y ≤ 12 | p = 0.4) = 1 - pbinom(12, 25, .4) = 0.154.
(b) Find β = P (Y < 13) when p = 0.60; that is, β = P (Y ≤ 12; p = 0.60) so that 1 − β is the power at
p = 0.60.
Solution. β = Pp=0.6 (Y < 13) = pbinom(12 , 25, .6) = 0.154 ⇒ 1 − β = 0.846.

4.6 Additional Comments About Statistical Tests


4.6.2. Consider the power function γ(µ) and its derivative γ′(µ) given by (4.6.5) and (4.6.6). Show that γ′(µ)
is strictly negative for µ < µ0 and strictly positive for µ > µ0.
Solution.
Given (4.6.6),

γ′(µ) = (√n/σ)[φ(√n(µ0 − µ)/σ + zα/2) − φ(√n(µ0 − µ)/σ − zα/2)]
      = (√n/σ)[φ(zα/2 + √n(µ0 − µ)/σ) − φ(zα/2 − √n(µ0 − µ)/σ)]

by the symmetry of φ, that is, φ(z) = φ(−z). Also, the farther z is from the origin, the smaller φ(z) is. So, if
µ < µ0, then √n(µ0 − µ)/σ > 0 and

zα/2 + √n(µ0 − µ)/σ > |zα/2 − √n(µ0 − µ)/σ|
⇒ φ(zα/2 + √n(µ0 − µ)/σ) < φ(zα/2 − √n(µ0 − µ)/σ)
⇒ γ′(µ) < 0,

indicating that γ(µ) is strictly decreasing for µ < µ0; the symmetric argument gives γ′(µ) > 0 for µ > µ0.


4.6.3. Show that the test defined by 4.6.9 has exact size α for testing H0: µ = µ0 versus H1: µ ≠ µ0.

Solution.
Since √n(X̄ − µ0)/S ~ tn−1 under H0, if X̄ > µ0,

P(|√n(X̄ − µ0)/S| ≥ tα/2,n−1) = 2P(√n(X̄ − µ0)/S ≥ tα/2,n−1) = 2(α/2) = α.

If X̄ < µ0,

P(|√n(X̄ − µ0)/S| ≥ tα/2,n−1) = 2P(√n(X̄ − µ0)/S ≤ −tα/2,n−1) = 2(α/2) = α.

4.6.8. Let p equal the proportion of drivers who use a seat belt in a country that does not have a mandatory
seat belt law. It was claimed that p = 0.14. An advertising campaign was conducted to increase this
proportion. Two months after the campaign, y = 104 out of a random sample of n = 590 drivers were
wearing their seat belts. Was the campaign successful?
(a) Define the null and alternative hypotheses.
Solution. H0 : p = 0.14 versus HA : p > 0.14.
(b) Define a critical region with an α = 0.01 significance level.
Solution.
Let p̂ = 104/590 = 0.176. Then a critical region is

Z = (p̂ − p0)/√(p0(1 − p0)/n) > z0.01 = 2.326,

where p0 = 0.14 is the hypothesized value.

(c) Determine the approximate p-value and state your conclusion.


Solution.
p = P(Z > (0.176 − 0.14)/√(0.14(0.86)/590)) = 1 − Φ(2.52) = 0.00587.

Since p < α = 0.01 (or Z = 2.52 > 2.326), H0 is rejected; there is sufficient evidence to show that the
campaign was successful.
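The test statistic and p-value can be computed directly in R (a sketch; the value 2.54 here versus 2.52 above reflects rounding p̂ to 0.176):

phat = 104 / 590; p0 = 0.14; n = 590
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
c(z, 1 - pnorm(z))  # about 2.54 and 0.006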
4.6.9. In Exercise 4.2.18 we found a confidence interval for the variance σ² using the variance S² of a
random sample of size n arising from N(µ, σ²), where the mean µ is unknown. In testing H0: σ² = σ0²
against H1: σ² > σ0², use the critical region defined by (n − 1)S²/σ0² ≥ c. That is, reject H0 and accept H1
if S² ≥ cσ0²/(n − 1). If n = 13 and the significance level α = 0.025, determine c.
Solution. Since (n − 1)S²/σ0² ~ χ²(12) under H0, c = qchisq(0.975, 12) = 23.337.
4.6.10. In Exercise 4.2.27, in finding a confidence interval for the ratio of the variances of two normal
distributions, we used a statistic S1²/S2², which has an F-distribution when those two variances are equal. If
we denote that statistic by F, we can test H0: σ1² = σ2² against H1: σ1² > σ2² using the critical region F ≥ c.
If n = 13, m = 11, and α = 0.05, find c.
Solution. Since F ~ F12,10 under H0, c = qf(0.95, 12, 10) = 2.913.

4.7 Chi-Square Tests


4.7.1. Consider Example 4.7.2. Suppose the observed frequencies of A1 , ..., A4 are 20, 30, 92, and 105,
respectively. Modify the R code given in the example to calculate the test for these new frequencies. Report
the p-value.
Solution.

Use the following R code to obtain p = 0.01837:
x = c(20, 30, 92, 105); ps = c(1, 3, 5, 7)/16; chisq.test(x, p = ps)
4.7.2. A number is to be selected from the interval {x : 0 < x < 2} by a random process. Let Ai = {x :
(i − 1)/2 < x ≤ i/2}, i = 1, 2, 3, and let A4 = {x : 3/2 < x < 2}. For i = 1, 2, 3, 4, suppose a certain hypothesis
assigns probabilities pi0 to these sets in accordance with pi0 = ∫_{Ai} (1/2)(2 − x) dx, i = 1, 2, 3, 4. This hypothesis
(concerning the multinomial pdf with k = 4) is to be tested at the 5% level of significance by a chi-square
test. If the observed frequencies of the sets Ai, i = 1, 2, 3, 4, are, respectively, 30, 30, 10, 10, would H0 be
accepted at the (approximate) 5% level of significance? Use R code similar to that of Example 4.7.2 for the
computation.
Solution.
Since

pi0 = ∫_{(i−1)/2}^{i/2} (1/2)(2 − x) dx = [x − x²/4]_{(i−1)/2}^{i/2} = 9/16 − i/8,

we have p10 = 7/16, p20 = 5/16, p30 = 3/16, p40 = 1/16. Hence, use the following R code:
x = c(30, 30, 10, 10); ps = c(7, 5, 3, 1)/16; chisq.test(x, p = ps)
to obtain the χ² statistic 8.38 and p = 0.03816 < 0.05; H0 is rejected, which means that the observations show
a lack of fit to the assigned probabilities.
4.7.3. Define the sets A1 = {x : −∞ < x ≤ 0}, Ai = {x : i − 2 < x ≤ i − 1}, i = 2, ..., 7, and
A8 = {x : 6 < x < ∞}. A certain hypothesis assigns probabilities pi0 to these sets Ai in accordance with

pi0 = ∫_{Ai} (1/(2√(2π))) exp[−(x − 3)²/(2(4))] dx, i = 1, 2, . . . , 8.

This hypothesis (concerning the multinomial pdf with k = 8) is to be tested, at the 5% level of significance,
by a chi-square test. If the observed frequencies of the sets Ai , i = 1, 2, ..., 8, are, respectively, 60, 96, 140,
210, 172, 160, 88, and 74, would H0 be accepted at the (approximate) 5% level of significance? Use R
code similar to that discussed in Example 4.7.2. The probabilities are easily computed in R; for example,
p30 = pnorm(2,3,2) - pnorm(1,3,2).
Solution.
Use the R code below:
x = c(60, 96, 140, 210, 172, 160, 88, 74)
p1 = pnorm(0,3,2)
p2 = pnorm(1,3,2) - pnorm(0,3,2)
p3 = pnorm(2,3,2) - pnorm(1,3,2)
p4 = pnorm(3,3,2) - pnorm(2,3,2)
p5 = pnorm(4,3,2) - pnorm(3,3,2)
p6 = pnorm(5,3,2) - pnorm(4,3,2)
p7 = pnorm(6,3,2) - pnorm(5,3,2)
p8 = 1 - pnorm(6,3,2)
ps = c(p1, p2, p3, p4, p5, p6, p7, p8)
chisq.test(x, p = ps)

to obtain p = 0.4368 > 0.05; H0 would be accepted at 5% significance level.


4.7.4. A die was cast n = 120 independent times and the following data resulted:
Spot Up     1    2    3    4    5      6
Frequency   b   20   20   20   20   40 − b

If we use a chi-square test, for what values of b would the hypothesis that the die is unbiased be rejected at
the 0.025 significance level?
Solution.
Under the null hypothesis that the die is unbiased, pi0 = 1/6 for all i, so npi0 = 120(1/6) = 20. Hence the test
statistic is

(b − 20)²/20 + [(40 − b) − 20]²/20 = (b − 20)²/10.

Since χ²0.025,5 = qchisq(0.975, 5) = 12.83, the null is rejected if

(b − 20)²/10 > 12.83 ⇒ b − 20 > 11.33 or b − 20 < −11.33 ⇒ b > 31.33 or b < 8.67.
If b is an integer, then b ≤ 8 or b ≥ 32.
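A quick R check of the cutoffs (a sketch):

crit = qchisq(0.975, 5)          # 12.83
20 + c(-1, 1) * sqrt(10 * crit)  # reject for b outside (8.67, 31.33)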
4.7.5. Consider the problem from genetics of crossing two types of peas. The Mendelian theory states that
the probabilities of the classifications (a) round and yellow, (b) wrinkled and yellow, (c) round and green,
and (d) wrinkled and green are 9/16, 3/16, 3/16, and 1/16, respectively. If, from 160 independent observations, the
observed frequencies of these respective classifications are 86, 35, 26, and 13, are these data consistent with
the Mendelian theory? That is, test, with α = 0.01, the hypothesis that the respective probabilities are 9/16,
3/16, 3/16, and 1/16.

Solution.
This is the table used to compute the chi-square test statistic:

           (a)  (b)  (c)  (d)
Observed    86   35   26   13
Expected    90   30   30   10

The test statistic is

X² = 4²/90 + 5²/30 + 4²/30 + 3²/10 = 2.44.
Since X 2 < χ20.01,3 = qchisq(0.99, 3) = 11.34, the null is not rejected; these data would be consistent with
the Mendelian theory, as there is insufficient evidence to show that they are different from the theory.
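The same conclusion follows from chisq.test (a sketch):

x = c(86, 35, 26, 13); ps = c(9, 3, 3, 1)/16
chisq.test(x, p = ps)  # X-squared = 2.44, df = 3, p-value about 0.49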
4.7.6. Two different teaching procedures were used on two different groups of students. Each group contained
100 students of about the same ability. At the end of the term, an evaluating team assigned a letter grade
to each student. The results were tabulated as follows.
Group    A    B    C    D    F   Total
  I     15   25   32   17   11    100
  II     9   18   29   28   16    100
If we consider these data to be independent observations from two respective multinomial distributions with
k = 5, test at the 5% significance level the hypothesis that the two distributions are the same (and hence
the two teaching procedures are equally effective). For computation in R, use r1=c(15,25,32,17,11);
r2=c(9,18,29,28,16); mat=rbind(r1,r2); chisq.test(mat)
Solution.
This is a χ² test for independence. The R code gives p = 0.1711 > 0.05, so the hypothesis is not rejected;
there is insufficient evidence to show that the two teaching procedures differ in effectiveness.
4.7.8. Let the result of a random experiment be classified as one of the mutually exclusive and exhaustive
ways A1 , A2 , A3 and also as one of the mutually exhaustive ways B1 , B2 , B3 , B4 . Say that 180 independent
trials of the experiment result in the following frequencies:

          B1        B2       B3       B4      Total
A1      15 − 3k   15 − k   15 + k   15 + 3k     60
A2        15        15       15       15        60
A3      15 + 3k   15 + k   15 − k   15 − 3k     60
Total     45        45       45       45       180
where k is one of the integers 0, 1, 2, 3, 4, 5. What is the smallest value of k that leads to the rejection of
the independence of the A attribute and the B attribute at the α = 0.05 significance level?
Solution.
The expected values are all 45(60)/180 = 15. Hence the chi-square statistic is

[(−3k)²/15 + (−k)²/15 + k²/15 + (3k)²/15] × 2 = (8/3)k².

In this case, the degrees of freedom are (3 − 1)(4 − 1) = 6. Since qchisq(0.95, 6) = 12.6, the null hypothesis
of independence of the A attribute and the B attribute is rejected if

(8/3)k² > 12.6 ⇒ k > 2.17,

which gives the smallest integer value k = 3.
4.7.9. It is proposed to fit the Poisson distribution to the following data:
x            0    1    2    3   3 < x
Frequency   20   40   16   18     6
(a) Compute the corresponding chi-square goodness-of-fit statistic.
Solution.
The mean of the Poisson distribution is computed as
[0(20) + 1(40) + 2(16) + 3(18) + 4(6)] / (20 + 40 + 16 + 18 + 6) = 150/100 = 1.5.
Hence,

p00 = P (X = 0) = dpois(0, 1.5) = 0.223


p10 = P (X = 1) = dpois(1, 1.5) = 0.335
p20 = P (X = 2) = dpois(2, 1.5) = 0.251
p30 = P (X = 3) = dpois(3, 1.5) = 0.126
p40 = P (X > 3) = 1 - ppois(3, 1.5) = 0.066.

Then the following code shows the chi-square goodness-of-fit statistic is 7.23:
x = c(20, 40, 16, 18, 6)
ps = c(dpois(0, 1.5), dpois(1, 1.5), dpois(2, 1.5), dpois(3, 1.5), 1-ppois(3, 1.5))
chisq.test(x, p = ps)
(b) How many degrees of freedom are associated with this chi-square?
Solution. Since this is a chi-square goodness-of-fit test, the degrees of freedom are 5 − 1 = 4.
(c) Do these data result in the rejection of the Poisson model at the α = 0.05 significance level?
Solution.
The critical value is qchisq(0.95, 4) = 9.49. The statistic obtained in part (a), 7.23, is less than 9.49.
Thus, the Poisson model is not rejected at the 5% level; we are unable to show that the observed data
fail to fit the Poisson distribution.

4.9 Bootstrap Procedures
4.9.8. Consider the data of Example 4.9.2. The two-sample t-test of Example 4.6.2 can be used to test these
hypotheses. The test is not exact here (why?), but it is an approximate test. Show that the value of the test
statistic is t = 0.93, with an approximate p-value of 0.18.
Solution.
The test is not exact here because the underlying distribution is contaminated (not exactly normal). From
Example 4.9.2, the two sample standard deviations are sx = 20.407 and sy = 18.585. Hence, the pooled
estimator of σ² is

s²p = [(15 − 1)s²x + (15 − 1)s²y]/(30 − 2) = 14(20.407² + 18.585²)/28 = 380.924 ⇒ sp = 19.52,

which gives

t = (ȳ − x̄)/(sp√(2/n)) = 6.63/(19.52√(2/15)) = 0.93,
p = P(t28 > 0.93) = 1 - pt(0.93, 28) = 0.18.
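A quick numerical check of these values (a sketch from the summary statistics above):

sp = sqrt(14 * (20.407^2 + 18.585^2) / 28)
tstat = 6.63 / (sp * sqrt(2 / 15))
c(tstat, 1 - pt(tstat, 28))  # about 0.93 and 0.18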

4.9.12. For the situation described in Example 4.9.3, show that the value of the one-sample t-test is t = 0.84
and its associated p-value is 0.20.
Solution.
Conduct a one-sided one-sample t-test of H0: µ = 90 versus HA: µ > 90.
X <- c(119.7, 104.1, 92.8, 85.4, 108.6, 93.4, 67.1, 88.4, 101.0, 97.2,
95.4, 77.2, 100.0, 114.2, 150.3, 102.3, 105.8, 107.5, 0.9, 94.1)
t.test(X, mu = 90, alternative = ’greater’)

which shows t = 0.84475, df = 19, p-value = 0.2044.

