
HW12 Sol

Uploaded by Loy Nas
Copyright © All Rights Reserved
Homework 12 Solution

STAT 509 Statistics for Engineers


Summer 2017 Section 001
Instructor: Tahmidul Islam

1. A textile fiber manufacturer is investigating a new drapery yarn, which the company
claims has a mean thread elongation of 12 kilograms with a standard deviation of 0.5
kilograms. The company wishes to test the hypothesis

$$H_0: \mu = 12$$
$$H_a: \mu \neq 12$$

using a random sample of four specimens. Suppose the random sample is from a normal
population. (Hint: notice in this question, the population variance is assumed to be known
with σ = 0.5)

(a) Given that the sample mean ȳ is 11.3 and the confidence level is 95%, follow the 4-step
procedure to conduct a hypothesis test. What is your conclusion?

• Step 1: State H0 and Ha.

$$H_0: \mu = 12$$
$$H_a: \mu \neq 12$$

• Step 2: Calculate the test statistic.

$$z_0 = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} = \frac{11.3 - 12}{0.5/\sqrt{4}} = -2.8$$

• Step 3: Calculate p-value.

$$\text{p-value} = 2P(Z < -|-2.8|) = 2P(Z < -2.8) = 2 \times 0.0026 = 0.0052$$

• Step 4: Make Conclusion.


Because p-value = 0.0052 < 0.05 = α, we reject H0. At the α = 0.05 significance level, we
conclude that the population mean thread elongation is not 12 kilograms.
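As an optional check (not part of the original solution), this short R sketch reproduces Steps 2 and 3; the variable names are illustrative. Using pnorm() instead of a Z table gives an unrounded p-value of about 0.0051; the 0.0052 above comes from rounding P(Z < −2.8) to 0.0026 first.

```r
# Sketch: one-sample z-test with known sigma (Problem 1a).
# Variable names are illustrative, not taken from the assignment.
ybar  <- 11.3   # sample mean
mu0   <- 12     # hypothesized mean under H0
sigma <- 0.5    # known population standard deviation
n     <- 4      # sample size

z0 <- (ybar - mu0) / (sigma / sqrt(n))   # test statistic
p_value <- 2 * pnorm(-abs(z0))           # two-sided p-value
c(z0 = z0, p_value = p_value)
```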
(b) Use the confidence interval approach: calculate a 95% two-sided confidence interval
for µ. Does the confidence interval cover 12? Are the results of the confidence interval
consistent with the testing conclusion?

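$$\begin{aligned}
\text{C.I.} &= \bar{Y} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \\
&= 11.3 \pm 1.96 \times \frac{0.5}{\sqrt{4}} \\
&= 11.3 \pm 0.49 \\
&= (10.81,\ 11.79)
\end{aligned}$$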

The 95% confidence interval does not cover 12, so by the confidence interval approach
we also conclude that the population mean thread elongation is not 12 kilograms. This is
consistent with the testing result in (a).
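The same interval can be sketched in R, using qnorm(0.975) as the unrounded version of the table value 1.96 (variable names are illustrative):

```r
# Sketch: 95% z-interval for mu with known sigma (Problem 1b).
ybar  <- 11.3
sigma <- 0.5
n     <- 4
z_crit <- qnorm(0.975)                  # z_{alpha/2} for alpha = 0.05, about 1.96
margin <- z_crit * sigma / sqrt(n)      # margin of error
ci <- ybar + c(-1, 1) * margin
round(ci, 2)

# Does the interval cover the hypothesized value 12?
ci[1] <= 12 && 12 <= ci[2]
```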

(c) What are the margin of error and the length of the interval in (b)? If we want the
length of the confidence interval to be 0.6, how many observations do we need in the
sample?

$$\text{Margin of Error} = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{0.5}{\sqrt{4}} = 0.49$$
$$\text{Length of CI} = 2 \times \text{Margin of Error} = 2 \times 0.49 = 0.98$$

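If we want the length of the confidence interval to be 0.6, the margin of error must be
0.6/2 = 0.3. Therefore, we solve the following equation for the number of observations
needed:

$$0.3 = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{0.5}{\sqrt{n}}$$

which gives

$$n = \left(\frac{1.96 \times 0.5}{0.3}\right)^2 = 10.67.$$

Because n must be a whole number and rounding down would make the interval longer than
0.6, we round up and need n = 11 observations.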
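A small R sketch of the sample-size calculation, where ceiling() does the rounding up (the names target_length and E are my own):

```r
# Sketch: sample size so that a 95% z-interval has a given total length.
sigma <- 0.5
target_length <- 0.6
E <- target_length / 2                  # required margin of error
z_crit <- qnorm(0.975)
n_exact <- (z_crit * sigma / E)^2       # about 10.67
n_needed <- ceiling(n_exact)            # always round up
n_needed
```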
2. A manufacturing firm is interested in the mean life, in hours, of the batteries used in
its electronic games; call this mean µ. To investigate µ, the following data are collected:

20,25,21,28,21,30,23,27,26,26,28,31,26,32,33,35

(Hint: the population variance is not given, therefore we assume it is not known)

(a) Is it reasonable to assume that the sample data have come from a normal distribution?
The R code is given below. (Hint: use the fat pencil test in R.)
battery<-c(20,25,21,28,21,30,23,27,26,26,28,31,26,32,33,35)
qqnorm(battery)
qqline(battery)
From the QQ plot, we see that most of the points are close to the reference line drawn
by qqline(), which means we could cover them with a “fat pencil”. Thus, it is reasonable
to assume that the sample data have come from a normal distribution.

(b) Suppose it is reasonable to assume the data has come from a normal distribution,
construct a 99% two-sided confidence interval for µ. The quantile can be found via
R or t-table. The sample mean and the sample standard deviation can be computed
via the following command:
mean(battery)
sd(battery)
By R
> mean(battery)
[1] 27
> sd(battery)
[1] 4.442222
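Therefore,

$$\begin{aligned}
\text{C.I.} &= \bar{Y} \pm t_{n-1,\,\alpha/2}\frac{S}{\sqrt{n}} \\
&= 27 \pm t_{16-1,\,0.01/2}\frac{4.44}{\sqrt{16}} \\
&= 27 \pm 2.947 \times \frac{4.44}{4} \\
&= 27 \pm 3.27 \\
&= (23.73,\ 30.27)
\end{aligned}$$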
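For reference, a minimal R sketch of the same 99% interval, using qt() in place of the t-table (variable names are illustrative). The last line cross-checks against t.test(), whose conf.level argument sets the interval level:

```r
# Sketch: 99% t-interval for the mean battery life (Problem 2b).
battery <- c(20,25,21,28,21,30,23,27,26,26,28,31,26,32,33,35)
n     <- length(battery)
ybar  <- mean(battery)
s     <- sd(battery)
alpha <- 0.01

t_crit <- qt(1 - alpha / 2, df = n - 1)   # t_{15, 0.005}, about 2.947
ci <- ybar + c(-1, 1) * t_crit * s / sqrt(n)
round(ci, 2)

# Cross-check with the interval t.test() reports at the same level.
t.test(battery, conf.level = 0.99)$conf.int
```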

(c) Using R to test the following hypothesis with the level of significance α = 0.01:

$$H_0: \mu = 24$$
$$H_a: \mu \neq 24$$

You need to print out both your R code and testing results.

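Based on the R output below, the p-value is 0.01641, which is greater than α = 0.01.
Therefore, we fail to reject H0: at the 1% significance level, there is not sufficient
evidence that the mean battery life differs from 24 hours. Since 24 lies inside the 99%
confidence interval (23.73, 30.27) from (b), the interval approach gives the same conclusion.
(The interval printed below is a 95% interval, the t.test default; it excludes 24, which
agrees with p = 0.01641 < 0.05.)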
> t.test(battery, alternative="two.sided", mu=24)

One Sample t-test

data: battery
t = 2.7014, df = 15, p-value = 0.01641
alternative hypothesis: true mean is not equal to 24
95 percent confidence interval:
24.63291 29.36709
sample estimates:
mean of x
27
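If we want the printed interval to match α = 0.01, t.test() can be given conf.level = 0.99. The sketch below also pulls the p-value out of the result and applies the decision rule (the name decision is my own):

```r
# Sketch: one-sample t-test of H0: mu = 24 at alpha = 0.01 (Problem 2c).
battery <- c(20,25,21,28,21,30,23,27,26,26,28,31,26,32,33,35)
alpha <- 0.01

res <- t.test(battery, alternative = "two.sided", mu = 24,
              conf.level = 1 - alpha)   # 99% interval instead of the default 95%
res$statistic                           # t statistic
res$p.value                             # two-sided p-value
res$conf.int                            # 99% confidence interval for mu

decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```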

3. Inexperienced data analysts often erroneously place too much faith in qq plots when
assessing whether a distribution adequately represents a data set (especially when the
sample size is small). The purpose of this problem is to illustrate to you the dangers
that can arise. In this problem, you will use R to simulate the process of drawing
repeated random samples from a given population distribution and then creating normal
probability plots (Q-Q plots). Follow the code provided.
(a) Generate your own data and create a qq plot for each sample using this R code:
# create 2 by 2 figure
par(mfrow = c(2,2))
B = 4
n = 10
# create matrix to hold all data
data = matrix(round(rnorm(n*B,0,1),4), nrow = B, ncol = n)
# this creates a qq plot for each sample of data
for (i in 1:B) {
  qqnorm(data[i, ], pch = 16, main = "")
  qqline(data[i, ])
}

Mark the qq plot that appears to violate the normal assumption the most. Note:
in theory, all of these plots should display perfect linearity! Why? Because we are
generating the data from a normal distribution! Yet even when we create normal qq
plots with normally distributed data, we can get plots that don't look perfectly
linear. This is a byproduct of sampling variability. This is why you don't want to
rush to discount a distribution as being plausible based on a single plot, especially
when the sample size n is small (e.g. n = 10).

Each student will generate a different set of four qq plots. In my data, several
observations lie fairly far from the 45 degree line.

(b) Increase your sample size to n = 100 and repeat. What happens? What if n = 1000?
Just change n in the R code above and re-run.

The left four qq plots are generated with n = 100 and the right four with n = 1000.
When n = 100, we can still find a couple of points deviating from the 45 degree line,
but the proportion of deviating points is smaller. Once we increase the sample size to
1000, we get almost perfect lines. After all, we sample observations directly from the
normal distribution! This problem also shows that with more information (observations),
we can reveal the truth more precisely.
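One informal way to put a number on how straight a qq plot looks is the correlation between the plotted points. The R sketch below repeats the experiment many times for n = 10, 100 and 1000; the helper qq_cor, the seed, and the choice of B = 200 repetitions are my own assumptions, not part of the assignment. Values closer to 1 mean straighter plots.

```r
# Hypothetical helper, not part of the assignment code: correlation between a
# sample and the theoretical normal quantiles that qqnorm() plots it against.
qq_cor <- function(x) {
  q <- qqnorm(x, plot.it = FALSE)   # list with $x (theoretical) and $y (data)
  cor(q$x, q$y)
}

set.seed(509)   # arbitrary seed, only so the sketch is reproducible
B <- 200        # simulated samples per sample size (assumed value)
sizes <- c(10, 100, 1000)

# For each sample size, summarise the qq-plot correlation over B normal samples.
summ <- sapply(sizes, function(size) {
  r <- replicate(B, qq_cor(rnorm(size, 0, 1)))
  c(median = median(r), min = min(r))
})
colnames(summ) <- paste0("n=", sizes)
round(summ, 4)
```

Both the typical (median) and the worst (minimum) correlation move toward 1 as n grows, which is the numerical counterpart of the plots becoming straighter.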

(c) Take n = 100, replace

data = matrix(round(rnorm(n*B,0,1),4), nrow = B, ncol = n)

with

data = matrix(round(rexp(n*B,1),4), nrow = B, ncol = n)

and re-run. By doing this, you are changing the underlying population distribution
from N (0, 1) to exponential(1). What do these normal qq plots look like? Are you
surprised?

The exponential pdf is right-skewed, so we expect normal qq plots that deviate strongly
and systematically from the reference line, following a curve rather than a straight line.
My generated data show exactly this, so there is no surprise at all!
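Reusing the same idea (the hypothetical qq_cor() helper is redefined so the sketch runs on its own; the seed and B = 200 are again arbitrary assumptions), this compares the qq-plot correlation for normal and exponential samples of size 100:

```r
# Hypothetical helper: correlation of the points in a normal qq plot.
qq_cor <- function(x) {
  q <- qqnorm(x, plot.it = FALSE)
  cor(q$x, q$y)
}

set.seed(509)   # arbitrary seed for reproducibility
B <- 200
n_obs <- 100

norm_cor <- replicate(B, qq_cor(rnorm(n_obs, 0, 1)))   # N(0, 1) samples
exp_cor  <- replicate(B, qq_cor(rexp(n_obs, 1)))       # exponential(1) samples
round(c(normal = median(norm_cor), exponential = median(exp_cor)), 4)
```

The exponential samples give clearly lower correlations, matching the visibly curved plots.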

4. Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The fill
volume can be assumed to be normal with standard deviation σ1 = 0.020 and σ2 = 0.025
ounces. A member of the quality engineering staff suspects that both machines fill to the
same mean net volume, whether or not this volume is 16.0 ounces. A random sample of
10 bottles is taken from the output of each machine.
Machine 1: 16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02,
15.99
Machine 2: 16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01,
16.00

(a) Do you think the engineer is correct? Conduct a formal 4-step procedure with
α = 0.05. What is your conclusion? (Hint: Sample means can be computed using
R.)

By R, we have ȳ1 = 16.015 and ȳ2 = 16.005.


Step 1: The null and alternative hypotheses

$$H_0: \mu_1 - \mu_2 = 0$$
$$H_a: \mu_1 - \mu_2 \neq 0$$

Step 2: The test statistic is

$$z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{16.015 - 16.005}{\sqrt{\dfrac{0.020^2}{10} + \dfrac{0.025^2}{10}}} = 0.99$$

Step 3: Calculate p-value

$$\text{p-value} = 2P(Z < -|z_0|) = 2P(Z < -0.99) = 2 \times 0.1611 = 0.3222$$

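Step 4: Because p-value = 0.3222 > 0.05 = α, we fail to reject the null. At the 5% significance
level, there is not sufficient evidence that the two machines fill to different mean net volumes,
so the data are consistent with the engineer's belief (though they do not prove the means are equal).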
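A minimal R sketch of Steps 2 and 3 computed from the raw data (the names m1, m2 and se are illustrative). Without rounding z0 to 0.99 first, it gives z0 of about 0.988 and a p-value of about 0.323, which leads to the same decision.

```r
# Sketch: two-sample z-test with known sigmas (Problem 4a).
m1 <- c(16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99)
m2 <- c(16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00)
sigma1 <- 0.020   # known standard deviation, machine 1
sigma2 <- 0.025   # known standard deviation, machine 2

se <- sqrt(sigma1^2 / length(m1) + sigma2^2 / length(m2))   # SE of ybar1 - ybar2
z0 <- (mean(m1) - mean(m2)) / se
p_value <- 2 * pnorm(-abs(z0))                              # two-sided p-value
c(z0 = z0, p_value = p_value)
```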
(b) Calculate a 95% confidence interval on the difference in population means. Provide
a practical interpretation of this interval.

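$$\begin{aligned}
\text{C.I.} &= \bar{y}_1 - \bar{y}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \\
&= 16.015 - 16.005 \pm 1.96\sqrt{\frac{0.020^2}{10} + \frac{0.025^2}{10}} \\
&= (-0.010,\ 0.030)
\end{aligned}$$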

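The 95% confidence interval on µ1 − µ2 is (−0.010, 0.030), which includes 0. In practical
terms, machine 1's mean fill volume could plausibly be anywhere from about 0.010 ounces below
to 0.030 ounces above machine 2's, so we have no evidence that the two machines fill to
different mean net volumes.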
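The interval can be sketched in R the same way (names are illustrative); the last line checks whether 0 falls inside it:

```r
# Sketch: 95% z-interval for mu1 - mu2 with known sigmas (Problem 4b).
m1 <- c(16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99)
m2 <- c(16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00)
sigma1 <- 0.020
sigma2 <- 0.025

se <- sqrt(sigma1^2 / length(m1) + sigma2^2 / length(m2))
ci <- (mean(m1) - mean(m2)) + c(-1, 1) * qnorm(0.975) * se
round(ci, 3)
ci[1] < 0 && 0 < ci[2]   # TRUE when 0 lies inside the interval
```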

5. Data on pH for 16 random batches of low and high volt electrolyte were collected. The
data are given by
Low volt: 7.78, 5.77, 7.08, 6.75, 7.09, 8.27, 6.5, 5.16, 6.81, 7.28, 7.88,
7.87, 7.2, 5.95, 6.58, 6.99
high volt: 4.54, 5.04, 5.07, 6.18, 8.62, 6.28, 7.41, 6.17, 6.25, 4.25, 6.08,
7.23, 4.68, 6.19, 5.85, 5.83

(a) Population variances σ1² and σ2² are unknown. Do you think it is reasonable to assume
σ1² = σ2²? Let's figure it out! First, draw a side-by-side boxplot in R. Based on the
boxplot, do you believe σ1² = σ2²?
low <- c(7.78,5.77,7.08,6.75,7.09,8.27,6.5,5.16,6.81,7.28,7.88,7.87,7.2,
5.95,6.58,6.99)
high <- c(4.54,5.04,5.07,6.18,8.62,6.28,7.41,6.17,6.25,4.25,6.08,7.23,
4.68,6.19,5.85,5.83)
boxplot(low,high,names=c("low","high"),col="grey")
The heights of the two boxes (the spread of the middle half of each sample) are similar,
which indicates that the two variances might be equal.
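To make the visual comparison a little more concrete, this sketch computes the length of each box (the distance between the hinges that boxplot() draws, taken from boxplot.stats()) alongside the sample standard deviations; box_length and spread are illustrative names. It gives box lengths of 0.99 (low) and 1.21 (high), which are of similar size, while the standard deviations (about 0.83 and 1.14) show the high volt sample is somewhat more spread out.

```r
# Sketch: numeric companion to the side-by-side boxplot (Problem 5a).
low  <- c(7.78,5.77,7.08,6.75,7.09,8.27,6.5,5.16,6.81,7.28,7.88,7.87,7.2,
          5.95,6.58,6.99)
high <- c(4.54,5.04,5.07,6.18,8.62,6.28,7.41,6.17,6.25,4.25,6.08,7.23,
          4.68,6.19,5.85,5.83)

# Illustrative helper: boxplot.stats()$stats holds (lower whisker, lower hinge,
# median, upper hinge, upper whisker); the box spans the two hinges.
box_length <- function(x) {
  s <- boxplot.stats(x)$stats
  s[4] - s[2]
}

spread <- rbind(
  box_length = c(low = box_length(low), high = box_length(high)),
  sd         = c(low = sd(low),         high = sd(high))
)
round(spread, 3)
```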

(b) Now, let’s conduct a formal test of

$$H_0: \sigma_1^2/\sigma_2^2 = 1$$
$$H_a: \sigma_1^2/\sigma_2^2 \neq 1$$

using the following R code.


var.test(low, high)

What is the p-value of the testing result? With significance level 0.05, do you reject
H0 or fail to reject H0? Is the result consistent with the one you got from (a)?

Here is the R output:


F test to compare two variances

data: low and high


F = 0.5338, num df = 15, denom df = 15, p-value = 0.2356
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1865103 1.5278126
sample estimates:
ratio of variances
0.5338097
The p-value is 0.2356, which is greater than 0.05, so we fail to reject H0. We do not have
sufficient evidence that the two variances differ, so assuming σ1² = σ2² is reasonable.
The result is consistent with (a).
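To see where var.test()'s p-value comes from, here is a minimal by-hand sketch (F0 and p_value are illustrative names): the statistic is the ratio of the sample variances, and the two-sided p-value doubles the smaller tail of the F distribution with 15 and 15 degrees of freedom.

```r
# Sketch: two-sided F test for equal variances, by hand (Problem 5b).
low  <- c(7.78,5.77,7.08,6.75,7.09,8.27,6.5,5.16,6.81,7.28,7.88,7.87,7.2,
          5.95,6.58,6.99)
high <- c(4.54,5.04,5.07,6.18,8.62,6.28,7.41,6.17,6.25,4.25,6.08,7.23,
          4.68,6.19,5.85,5.83)

F0  <- var(low) / var(high)        # observed variance ratio
df1 <- length(low) - 1             # numerator degrees of freedom
df2 <- length(high) - 1            # denominator degrees of freedom
lower_tail <- pf(F0, df1, df2)     # P(F <= F0)
p_value <- 2 * min(lower_tail, 1 - lower_tail)
c(F0 = F0, p_value = p_value)
```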
(c) Assume the two samples are independent. The engineer wants to test whether the low
volt average pH is greater than the high volt average pH. Let µL be the average pH
of low volt electrolyte and µH be the average pH of high volt electrolyte. State the
null and alternative hypotheses.

$$H_0: \mu_L - \mu_H = 0$$
$$H_a: \mu_L - \mu_H > 0$$

(d) Calculate the appropriate test statistic for the test. The sample means and sample
variances can be computed using R.

> mean(low)
[1] 6.935
> var(low)
[1] 0.6896
> mean(high)
[1] 5.979375
> var(high)
[1] 1.291846

R output tells us that ȳ1 = 6.9350, ȳ2 = 5.9793, S1² = 0.6896, S2² = 1.2918. We need
to calculate the pooled variance Sp² first, because we assume the two variances are the
same.

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{15 \times 0.6896 + 15 \times 1.2918}{16 + 16 - 2} = 0.9907$$
The test statistic is

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{6.9350 - 5.9793}{\sqrt{0.9907}\sqrt{\dfrac{1}{16} + \dfrac{1}{16}}} = 2.716$$
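A minimal R version of the pooled-variance calculation (sp2 and t0 are illustrative names), which avoids the intermediate rounding of the hand computation; it gives t0 of about 2.7155, the value t.test() reports in (g).

```r
# Sketch: pooled-variance two-sample t statistic (Problem 5d).
low  <- c(7.78,5.77,7.08,6.75,7.09,8.27,6.5,5.16,6.81,7.28,7.88,7.87,7.2,
          5.95,6.58,6.99)
high <- c(4.54,5.04,5.07,6.18,8.62,6.28,7.41,6.17,6.25,4.25,6.08,7.23,
          4.68,6.19,5.85,5.83)

n1 <- length(low)
n2 <- length(high)
sp2 <- ((n1 - 1) * var(low) + (n2 - 1) * var(high)) / (n1 + n2 - 2)  # pooled variance
t0  <- (mean(low) - mean(high)) / sqrt(sp2 * (1 / n1 + 1 / n2))       # pooled t statistic
c(sp2 = sp2, t0 = t0)
```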

(e) Use R to calculate the p-value of the test. (Hint: $P(T_{n_1+n_2-2} > t_0)$ can be calculated
using R with the command 1 - pt(t0, n1 + n2 - 2).)

> 1 - pt(2.716, 16+16-2)


[1] 0.005428769

From the R output, the p-value is approximately 0.005.
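The same upper-tail probability can also be requested directly through pt()'s lower.tail argument, which avoids subtracting a number close to 1 from 1. This sketch recomputes t0 without rounding so it runs on its own; the names are illustrative.

```r
# Sketch: one-sided p-value P(T_30 > t0) for Problem 5e.
low  <- c(7.78,5.77,7.08,6.75,7.09,8.27,6.5,5.16,6.81,7.28,7.88,7.87,7.2,
          5.95,6.58,6.99)
high <- c(4.54,5.04,5.07,6.18,8.62,6.28,7.41,6.17,6.25,4.25,6.08,7.23,
          4.68,6.19,5.85,5.83)

n1 <- length(low)
n2 <- length(high)
sp2 <- ((n1 - 1) * var(low) + (n2 - 1) * var(high)) / (n1 + n2 - 2)
t0  <- (mean(low) - mean(high)) / sqrt(sp2 * (1 / n1 + 1 / n2))

p_upper <- pt(t0, df = n1 + n2 - 2, lower.tail = FALSE)   # P(T > t0)
p_upper
```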


(f) Make decision and state your conclusion at a 0.05 level of significance.

Since 0.005 < 0.05, we reject H0 and conclude, at the 5% significance level, that the low
volt average pH is greater than the high volt average pH.
(g) Use t.test in R to check your work.
t.test(low,high,alternative="greater",paired = FALSE, var.equal = TRUE)
The R output below gives a p-value of 0.005435, which agrees with our hand calculation in (e) up to rounding of t0.
Two Sample t-test

data: low and high


t = 2.7155, df = 30, p-value = 0.005435
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.3583425 Inf
sample estimates:
mean of x mean of y
6.935000 5.979375
(h) Construct a two-sided 95% confidence interval of µL − µH by hand. Provide a
practical interpretation of this interval.

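$$\begin{aligned}
\text{C.I.} &= (\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \\
&= (6.9350 - 5.9793) \pm 2.042\sqrt{0.9907\left(\frac{1}{16} + \frac{1}{16}\right)} \\
&= (0.237,\ 1.674)
\end{aligned}$$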

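The 95% confidence interval of µL − µH is (0.237, 1.674). Since 0 is not included and the
whole interval is positive, we have sufficient evidence that the low volt average pH is
greater than the high volt average pH. Practically, we are 95% confident that the mean pH of
low volt electrolyte exceeds that of high volt electrolyte by between about 0.24 and 1.67
pH units.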
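A minimal R sketch of the same interval (names are illustrative), cross-checked against the default two-sided t.test() with var.equal = TRUE:

```r
# Sketch: pooled two-sided 95% t-interval for muL - muH (Problem 5h).
low  <- c(7.78,5.77,7.08,6.75,7.09,8.27,6.5,5.16,6.81,7.28,7.88,7.87,7.2,
          5.95,6.58,6.99)
high <- c(4.54,5.04,5.07,6.18,8.62,6.28,7.41,6.17,6.25,4.25,6.08,7.23,
          4.68,6.19,5.85,5.83)

n1 <- length(low)
n2 <- length(high)
sp2 <- ((n1 - 1) * var(low) + (n2 - 1) * var(high)) / (n1 + n2 - 2)  # pooled variance
t_crit <- qt(0.975, df = n1 + n2 - 2)     # t_{0.025, 30}, about 2.042
diff_means <- mean(low) - mean(high)
ci <- diff_means + c(-1, 1) * t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
round(ci, 3)

# Cross-check: same interval from t.test()
t.test(low, high, var.equal = TRUE)$conf.int
```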
