Chap 8
One important problem of statistical inference is the estimation of unknown population parameters from the
corresponding sample statistics. Here the parameter of interest is the population mean, µ, which is to be estimated.
We take a simple random sample of size n and get observations x1, x2, …, xn. Then the statistic
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
is an estimator of the population mean µ. The sample mean $\bar{x}$ is an unbiased estimator since the mean of the
sampling distribution of $\bar{x}$ is equal to the population mean: $E(\bar{x}) = \mu$.
For the most part, the sample mean will be somewhat different from the population mean due to sampling error.
Therefore we can ask the question "how good is a point estimate?" The answer is that there is no way of knowing how
close a particular estimate is to the population mean. This places some doubt on the accuracy of point estimates, and
to overcome this problem we use interval estimates; i.e. we accompany a point estimate by an interval estimate.
An interval estimate of the population mean is a range of values used to estimate the population mean. When an
interval estimate is made, a probability statement is attached to it. The confidence interval for the mean is a specific
interval estimate of the population mean which is determined by using data obtained from a sample and a specific
confidence level.
The confidence level of an interval estimate of µ is the probability that the interval estimate will contain the
parameter. That is, P(aL < µ < aU) = 1-α, where 1-α is the confidence level, and aL and aU are the lower and upper
confidence limits respectively. The interval aL < µ < aU computed from the selected sample is called a (1-α)100%
confidence interval. When we give an interval estimate for µ we need to consider conditions on the sample size n
and the population variance: whether the sample size is large or small (n ≥ 30 or n < 30), and whether the
population variance is known or unknown.
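The sampling distribution of the mean $\bar{x}$ can be transformed into the standard normal distribution by
$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
With probability 1-α, the magnitude of the error ($\bar{x}$ - µ) is less than the maximum error E, where
$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. That is, µ = $\bar{x}$ ± E, so
$-E \le \bar{x} - \mu \le E$
$-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ are known as the lower and upper confidence limits respectively. The
interval $\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ is called the (1-α)100% confidence interval (interval estimate) of µ.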
The most commonly used values of α are 0.1, 0.05 and 0.01, so that (1-α)100% gives the 90%, 95% and 99% interval
estimates, respectively.
Example 1: The president of a certain university wishes to estimate the average age of the students
currently enrolled. From past studies the standard deviation is known to be 2 years. A random sample of 50
students is selected and the mean is found to be 23.2 years. Find the 95% confidence interval for the population
mean.
Conclusion: we are 95% confident that the true average age of the students in this university lies
between 22.65 years and 23.75 years.
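The arithmetic behind this conclusion can be written out as follows. This is a sketch that assumes the usual table value $z_{0.025} = 1.96$ for a 95% confidence level, which the example itself does not state.

```latex
\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
  = 23.2 \pm 1.96 \cdot \frac{2}{\sqrt{50}}
  \approx 23.2 \pm 0.55
  = (22.65,\ 23.75)
```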
Example 2: In a certain study, the sample mean $\bar{x}$ = 18.85, the sample size n = 80 and the standard deviation
is … . Construct the 90% confidence interval for µ.
For the 90% confidence interval for µ, we have (1-α) = 0.9, α = 0.1, α/2 = 0.05 and $z_{\alpha/2} = z_{0.05}$ = 1.645.
We are 90% sure that the true but unknown population mean is contained within the interval (17.83, 19.87).
Exercise: Construct the 95% and 99% confidence intervals for the above example and compare the results.
In general, one should note that as the level of confidence increases the interval gets wider.
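The widening effect can be checked numerically. The sketch below reuses the data of Example 1 (mean 23.2, σ = 2, n = 50) rather than the exercise data; the helper name `z_interval` is hypothetical, and the critical values come from Python's standard-library `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Return (lower, upper) limits of the z-based confidence interval for the mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    error = z * sigma / sqrt(n)              # maximum error E
    return mean - error, mean + error


# Data taken from Example 1: x-bar = 23.2, sigma = 2, n = 50
intervals = {c: z_interval(23.2, 2, 50, c) for c in (0.90, 0.95, 0.99)}
for c, (lo, hi) in intervals.items():
    print(f"{c:.0%}: ({lo:.2f}, {hi:.2f}), width = {hi - lo:.2f}")
```

Each step up in confidence level uses a larger critical value, so the printed widths grow from the 90% to the 99% interval.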
The (1-α)100% confidence interval for µ, when σ is unknown and n ≥ 30, is given by $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$, where s is the sample standard deviation.
a) Find the 90% confidence interval for the true average expenditure.
b) If the sample showed an average of 18.6 minutes per visit with a standard deviation of 5 minutes, find the 90%
confidence interval for the true mean time a female spends in grocery shopping.
a) Let µ1 be the true average expenditure of a female on grocery shopping per visit.
The (1-α)100% confidence interval for µ1 is given by $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$. For the 90% confidence interval, we
have α = 0.1, α/2 = 0.05, $z_{\alpha/2}$ = 1.645, so $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$ = 23.45 ± 0.658 = (22.79, 24.11).
We are 90% confident that the true average expenditure of a female per visit is contained in
the interval (22.79, 24.11).
b) Let the true mean time a female spends in grocery shopping per visit be µ2. The respective point estimate is
$\bar{x}$ = 18.6 minutes. The 90% confidence interval will be
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 18.6 \pm 1.645 \cdot \frac{5}{\sqrt{n}}$ = 18.6 ± 1.175 = (17.42, 19.78).
We are 90% confident that the true mean time lies within the interval (17.42, 19.78) minutes.
When the sample size is less than 30 the central limit theorem does not apply. But we can make one basic
assumption: the parent population is normal, which means we are sampling from a normal population.
If this assumption is met, then the sampling distribution of the mean is normal, and the quantity
$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ will still have the standard normal distribution.
If $X \sim N(\mu, \sigma^2)$, $\sigma^2$ is known and n < 30, then $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
Example: The pulse rate of 12 patients increased on average by 22.33 beats per minute. From a previous
study it is known that σ for this population is 4.28. Construct the 99% confidence interval for the mean.
Solution: Given n = 12 < 30, σ = 4.28, $\bar{x}$ = 22.33, α = 0.01, $z_{0.005}$ = 2.575,
$\mu = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 22.33 \pm 2.575 \times \frac{4.28}{\sqrt{12}} = 22.33 \pm 3.18$, so 19.15 < µ < 25.51.
IV) For the small sample case when σ is unknown
If n < 30 and we are sampling from a normal population whose variance is unknown, then the quantity
$\frac{\bar{x} - \mu}{s/\sqrt{n}}$ will have a t-distribution with (n-1) degrees of freedom, and the (1-α)100% confidence interval for
the population mean will be given by $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$.
Example: The IQs of 16 students from a certain class showed a mean of 107 with a standard deviation of
10. Construct the 90% confidence interval for the mean.
Given: n = 16, $\bar{x}$ = 107, s = 10, α = 0.1, $t_{\alpha/2,\,n-1} = t_{0.05,\,15}$ = 1.753
$\mu = \bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 107 \pm 1.753 \times \frac{10}{\sqrt{16}} = 107 \pm 4.3825 = (102.6,\ 111.4)$
8.2 Point and Interval Estimation of the Proportion: Large Sample Size
Sometimes the need is to estimate the population proportion or percentage. The sample proportion $\hat{p}$ is a sample
statistic, and it possesses a sampling distribution. We know that for large samples:
The sampling distribution of $\hat{p}$ is approximately normal.
The mean $\mu_{\hat{p}}$ of the sampling distribution of $\hat{p}$ is equal to the population proportion p.
The standard deviation $\sigma_{\hat{p}}$ of the sampling distribution of the sample proportion $\hat{p}$ is given as $\sqrt{pq/n}$,
where q = 1-p.
The sample is considered to be large if np and nq are both greater than 5. When estimating the value of a population
proportion, we do not know the values of p and q, so we cannot compute $\sigma_{\hat{p}}$. We use the value of $s_{\hat{p}}$ as an estimate
of $\sigma_{\hat{p}}$, where $s_{\hat{p}}$ is calculated as
$s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$. The value of the sample proportion $\hat{p}$ computed from a sample is a point estimate of the population
proportion p.
Calculate $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$ = 0.032. The confidence interval for p is then $\hat{p} \pm z_{\alpha/2}\, s_{\hat{p}}$.
Example 2: A survey of voters was conducted and 52% said that they would prefer a candidate from party A.
Assuming that the sample size for this study was 1500, construct a 99% confidence interval for the proportion of
all voters who hold this view.
Solution: Let p be the proportion of all voters who prefer a candidate from party A and $\hat{p}$ be the sample
proportion. From the given information, n = 1500, $\hat{p}$ = 0.52 and $\hat{q}$ = 1 - 0.52 = 0.48.
Calculate $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n} = \sqrt{(0.52)(0.48)/1500}$ = 0.013. The confidence interval for p is
$p = \hat{p} \pm z_{\alpha/2}\, s_{\hat{p}}$ = 0.52 ± 2.58 × 0.013 = 0.52 ± 0.034 = (0.486, 0.554), or 48.6% to 55.4%.
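The calculation in Example 2 can be sketched in Python as below. The function name `proportion_interval` is hypothetical, and the critical value 2.58 is simply the table value quoted in the example.

```python
from math import sqrt


def proportion_interval(p_hat, n, z):
    """Large-sample confidence interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    # Large-sample rule from the text (applied here to the sample values):
    # both n*p and n*q should exceed 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    s_p = sqrt(p_hat * q_hat / n)  # estimated standard deviation of p_hat
    error = z * s_p                # maximum error of the estimate
    return p_hat - error, p_hat + error


# Example 2: 52% of 1500 voters, 99% confidence (z = 2.58 as quoted in the text)
lower, upper = proportion_interval(0.52, 1500, 2.58)
print(f"({lower:.3f}, {upper:.3f})")
```

Because the code does not round $s_{\hat{p}}$ to 0.013 before multiplying, its limits can differ from the hand calculation in the third decimal place.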
We have observed that $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is the maximum error of estimate for µ. Suppose we predetermine the size of the
maximum error and want to determine the size of the sample that will yield this maximum error. Given the
confidence level and the standard deviation of the population, the sample size that will produce a predetermined
maximum error of the confidence interval estimate of µ is $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$. If we do not know σ, we find the sample standard
deviation s and substitute s for σ in the formula.
Example: A university dean wishes to estimate the average number of hours his part-time instructors teach per
week. The standard deviation from a previous study is 2.6 hours. How large a sample must be selected if he wants
to be 99% confident that the true mean differs from the sample mean by at most 1 hour?
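A minimal sketch of this sample-size calculation follows. It assumes the table value z = 2.58 for 99% confidence (the value these notes use in the voter example) and rounds up, since a fractional observation cannot be sampled; the function name is hypothetical.

```python
from math import ceil


def sample_size_for_mean(sigma, max_error, z):
    """Smallest n with z * sigma / sqrt(n) <= max_error, i.e. n = (z*sigma/E)^2 rounded up."""
    return ceil((z * sigma / max_error) ** 2)


# Dean's example: sigma = 2.6 hours, E = 1 hour, 99% confidence (z = 2.58 assumed)
n_required = sample_size_for_mean(sigma=2.6, max_error=1.0, z=2.58)
print(n_required)
```

Using 2.575 instead of 2.58 as the critical value leads to the same rounded sample size.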
The maximum error E of the interval estimate of the population proportion is $E = z_{\alpha/2}\,\sigma_{\hat{p}} = z_{\alpha/2}\sqrt{pq/n}$. Given the
confidence level and the values of p and q, the sample size that will produce a predetermined maximum error of the
confidence interval estimate of p is $n = \frac{z_{\alpha/2}^2\, p q}{E^2}$. In most cases, the values of p and q are not known to us. In such a
situation, we can choose one of the following alternatives.
Use the most conservative estimate of the sample size n by using p = 0.5 and q = 0.5, since the product
of these two is greater than the product of any other pair of values for p and q.
Use a preliminary sample and calculate $\hat{p}$ and $\hat{q}$ for this sample. Then use these values of $\hat{p}$ and $\hat{q}$
to find n.
Example: The EZ Company wants to estimate the proportion of defective items produced by a machine to within 0.02 of
the population proportion at a 95% confidence level. Suppose a preliminary sample of 200 items showed that 7
percent of the items produced on this machine are defective. How large a sample should EZ Company select?
Solution: The value of z for the 95% confidence level is 1.96, E = 0.02, $\hat{p}$ = 0.07, $\hat{q}$ = 1 - $\hat{p}$ = 1 - 0.07 = 0.93
$n = \frac{z^2\,\hat{p}\hat{q}}{E^2} = \frac{(1.96)^2 (0.07)(0.93)}{(0.02)^2} = 625.22$
Rounding up, the company should select a sample of 626 items.
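The two alternatives for an unknown p can be compared side by side. The sketch below uses the EZ Company figures; the function name `sample_size_for_proportion` is hypothetical, and it returns the unrounded value so that the rounding step stays visible.

```python
from math import ceil


def sample_size_for_proportion(p, max_error, z):
    """Unrounded sample size n = z^2 * p * q / E^2 for estimating a proportion."""
    q = 1 - p
    return z ** 2 * p * q / max_error ** 2


# Alternative 2: preliminary sample with p_hat = 0.07
n_pilot = sample_size_for_proportion(0.07, 0.02, 1.96)
# Alternative 1: most conservative choice p = q = 0.5
n_conservative = sample_size_for_proportion(0.5, 0.02, 1.96)

print(ceil(n_pilot))          # round up to the next whole item
print(round(n_conservative))  # already (numerically) a whole number
```

The conservative choice needs roughly four times as many items here, which is the price of not using the preliminary estimate.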
8.4 Hypothesis Testing about the Mean
Introduction:
To establish and to investigate the relationship between the sample measure and the population characteristic
(parameter), we make use of hypothesis testing.
Definition: Hypothesis testing is a rule or procedure for determining whether or not an assertion or statement
about a population parameter (in this case the mean) is true.
Suppose we have a sample taken from the population and the sample mean is $\bar{x}$. We set up an assertion
that it came from a population with mean µ. This implies that the discrepancy between $\bar{x}$ and µ is due only to
chance; i.e. in the long run, repeated sampling will produce data which result in a mean discrepancy between $\bar{x}$
and µ of zero.
We can determine the probability of getting a discrepancy between $\bar{x}$ and µ as
large as or larger than the actual one. This can be done from knowledge of the sampling distribution of $\bar{x}$. This
probability is referred to as the observed level of significance (p-value). Then we can conclude either that the
assertion (hypothesis) is true or that the hypothesis is false.
Steps to perform hypothesis testing:
1. Write the original claim and identify whether it is the null hypothesis or the alternative hypothesis.
2. Write the null and alternative hypotheses. Use the alternative hypothesis to identify the type of test.
3. Write down all information from the problem.
4. Compute the test statistic
5. Find the critical value using the tables
6. Make a decision to reject or fail to reject the null hypothesis. A picture showing the critical value and test
statistic may be useful.
7. Write the conclusion.
For a two-sided test, if $-z_{\alpha/2} < Z_{cal} < z_{\alpha/2}$, we accept H0 (that is, we fail to reject it). This means that we accept H0 if
Zcal falls in the acceptance region; we reject H0 if Zcal falls in the rejection region.
If Zcal = Zcritical, we reserve judgment on accepting or rejecting H0: we have to increase the sample size in order
to reach a conclusion.
When n ≥ 30 and σ is unknown, $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim N(0, 1)$.
Therefore the calculated value of the test statistic will be compared with $\pm z_{\alpha/2}$, $-z_{\alpha}$ or $z_{\alpha}$ depending on whether
we have a two-sided or a one-sided test, respectively. We follow the same procedure as given above in (a).
Examples: 1) According to norms established for a reading comprehension test, 8th graders should have an
average of 83.2 with a standard deviation of 8.6. If 36 randomly selected students from Tepi School averaged
88.7, test the null hypothesis that µ = 83.2 against µ > 83.2 at α = 0.01 and thus check the directress's claim that her
8th grade students are above average.
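The test in this example can be sketched as follows. The helper name `z_statistic` is hypothetical, and the one-sided critical value is computed with the standard-library `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sd, n):
    """Calculated z value: (x-bar - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / sqrt(n))


# Reading test: x-bar = 88.7, mu0 = 83.2, standard deviation 8.6, n = 36
z_cal = z_statistic(88.7, 83.2, 8.6, 36)

# H1: mu > 83.2 is an upper-tailed test, so the critical value is z_alpha
alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z_cal > z_crit
print(f"z_cal = {z_cal:.2f}, z_crit = {z_crit:.2f}, reject H0: {reject_h0}")
```

Since the calculated value falls in the upper rejection region, the data support the claim that these students score above the norm at the 1% level.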
Given: n = 35 (large sample case) and σ is unknown, in which case we substitute it by s = 1; $\bar{x}$ = … ;
µ0 = 25 years (claim)
Conclusion: the average age of the customers is different from 25 years; i.e., the shopkeeper's claim
is not true at the 5% level of significance.
We use the standard normal distribution (Z) as long as the variable is normally distributed and σ is known, which is
similar to (a) above.
c) Small sample test (n < 30) when σ is unknown:
When $X \sim N(\mu, \sigma^2)$, n is small and σ is unknown, in testing H0 against any alternative the calculated value of the test
statistic is $T_{cal} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$. Here we have estimated σ by s, and therefore the degrees of freedom will be
(n-1). The critical values are $\pm t_{\alpha/2,\,n-1}$; $-t_{\alpha,\,n-1}$; $t_{\alpha,\,n-1}$ depending on whether H1: µ ≠ µ0; H1: µ < µ0; H1: µ
> µ0, respectively.
Example: A job placement director claims that the average starting salary for statistics graduates is birr
24000 (yearly). A random sample of 10 statistics graduates had a mean of birr 23450 and a standard deviation of birr
400. Test H0: µ = birr 24000 versus the alternative H1: µ ≠ 24000 at α = 0.05.
The test statistic is t with n-1 d.f., since n < 30 and σ is unknown.
$t_{cal} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{23450 - 24000}{400/\sqrt{10}} = -4.348$
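The remaining steps of this test (critical value and decision) can be sketched as below. The critical value 2.262 is the standard table entry for $t_{0.025,\,9}$ and is entered by hand as an assumption, because Python's standard library has no t-distribution; the function name is hypothetical.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """Calculated t value: (x-bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Salary example: x-bar = 23450, mu0 = 24000, s = 400, n = 10
t_cal = t_statistic(23450, 24000, 400, 10)

# Two-sided test at alpha = 0.05 with 9 d.f.; table value t_{0.025, 9} (assumed)
T_CRIT = 2.262

reject_h0 = abs(t_cal) > T_CRIT
print(f"t_cal = {t_cal:.3f}, reject H0: {reject_h0}")
```

The calculated value lies well outside ±2.262, so under these assumptions H0 is rejected: the sample does not support an average starting salary of birr 24000.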
If the observations on various items or objects are categorized into two classes (binomial population), we often
want to test the hypothesis of whether the proportion of items in a particular class is P0 or not. Thus for a binomial
population the hypotheses are
H0: P = P0 versus H1: P ≠ P0 or H1: P > P0 or H1: P < P0
The value of the test statistic Z for the sample proportion $\hat{p}$ is computed as
$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} \sim N(0, 1)$, where $\sigma_{\hat{p}} = \sqrt{pq/n}$. The value of p used in this formula is the one used in the null hypothesis, and
q = 1-p.
Example: To test the conjecture (guess) of the management that 60% of employees favor a new bonus scheme, a
sample of 150 employees was drawn and their opinion was taken on whether they favored it or not. Only 55
employees out of 150 favored the new bonus scheme. Does this finding indicate that the proportion of employees
favoring the new bonus scheme is different from the management's claim? (Use α = … .)
Solution: Let p denote the population proportion of employees favoring the new bonus scheme.
$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}$, where $\sigma_{\hat{p}} = \sqrt{pq/n} = \sqrt{(0.6)(0.4)/150} = 0.04$ and $\hat{p} = 55/150 = 0.367$
$Z = \frac{0.367 - 0.6}{0.04} = -5.825$
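A short sketch of this test follows. Because the significance level for the example is not given above, the code checks the decision at each of the commonly used levels 0.10, 0.05 and 0.01; that choice, and the function name `proportion_z`, are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Calculated z value for H0: p = p0, using sigma = sqrt(p0 * q0 / n)."""
    p_hat = successes / n
    sigma_p = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sigma_p


# Bonus scheme example: 55 of 150 employees in favor, claimed p0 = 0.60
z_cal = proportion_z(55, 150, 0.60)

# Two-sided decision (H1: p != 0.60) at several common significance levels
decisions = {
    alpha: abs(z_cal) > NormalDist().inv_cdf(1 - alpha / 2)
    for alpha in (0.10, 0.05, 0.01)
}
print(f"z_cal = {z_cal:.3f}", decisions)
```

The code keeps $\hat{p}$ unrounded, so its z value differs slightly from the hand calculation, which rounds $\hat{p}$ to 0.367. The decision is the same at all three levels: H0 is rejected.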
A \ B    B1     B2     . . .   Bc     Total
A1       n11    n12    . . .   n1c    n1.
A2       n21    n22    . . .   n2c    n2.
.        .      .              .      .
.        .      .              .      .
.        .      .              .      .
Ar       nr1    nr2    . . .   nrc    nr.
Total    n.1    n.2    . . .   n.c    n
The following notation will be used:
$e_{ij} = \frac{n_{i\cdot} \times n_{\cdot j}}{n}$ is called the expected frequency of cell (i, j), i = 1, 2, …, r and j = 1, 2, …, c. The appropriate test
statistic for the null hypothesis of no association is
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(n_{ij} - e_{ij})^2}{e_{ij}}$
The value of $\chi^2$ becomes small if the discrepancy between nij and eij is small,
and it becomes large if the discrepancy is large. To test the hypothesis at the α level of significance, we compute
the $\chi^2$ value from the sample observations and compare it with $\chi^2_v(\alpha)$, where v = (r-1)(c-1) is the number of degrees of
freedom. The hypotheses are
H0: the two variables are not associated with each other.
H1: the two variables are associated.
Then we reject H0 at the α level of significance if $\chi^2 > \chi^2_v(\alpha)$; otherwise we do not reject H0.
Example: A researcher wishes to determine whether there is a relation between the gender of an
individual and the amount of alcohol consumed. A sample of 68 people is selected and the following data are
obtained.
          Alcohol consumption
Gender    Low    Moderate    High    Total
Male      10     9           8       27
Female    13     16          12      41
Total     23     25          20      68
At α = 0.1, can the researcher conclude that alcohol consumption is related to gender?
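Solution: State the hypotheses and identify the claim.
H0: the amount of alcohol that a person consumes is not associated with the individual's gender.
H1: the amount of alcohol that a person consumes is associated with the individual's gender.
Critical value: $\chi^2_{(r-1)(c-1)}(\alpha) = \chi^2_{2}(0.1) = 4.605$
Compute the test statistic. First compute the expected values.
$e_{11} = \frac{27 \times 23}{68} = 9.13$, $e_{12} = \frac{27 \times 25}{68} = 9.93$, $e_{13} = \frac{27 \times 20}{68} = 7.94$
$e_{21} = \frac{41 \times 23}{68} = 13.87$, $e_{22} = \frac{41 \times 25}{68} = 15.07$, $e_{23} = \frac{41 \times 20}{68} = 12.06$
$\chi^2 = \sum\sum\frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \frac{(10 - 9.13)^2}{9.13} + \frac{(9 - 9.93)^2}{9.93} + \dots + \frac{(12 - 12.06)^2}{12.06} = 0.283$
Make the decision: the decision is not to reject the null hypothesis, since $\chi^2 = 0.283 < \chi^2_{2}(0.1) = 4.605$.
Summarize the result: there is not enough evidence to support the claim that the amount of alcohol a
person consumes is associated with the individual's gender.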
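The expected frequencies and the test statistic for this table can be recomputed with a few lines of Python. This is a sketch only: the function name is hypothetical, and the critical value 4.605 is taken from the text, not computed.

```python
def chi_square_statistic(table):
    """Return (chi2, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected frequency of cell (i, j)
            expected_row.append(e_ij)
            chi2 += (n_ij - e_ij) ** 2 / e_ij
        expected.append(expected_row)
    return chi2, expected


# Observed counts: rows = male, female; columns = low, moderate, high
observed = [[10, 9, 8], [13, 16, 12]]
chi2, expected = chi_square_statistic(observed)

CRITICAL = 4.605  # chi-square table value for 2 d.f. at alpha = 0.1, as quoted in the text
reject_h0 = chi2 > CRITICAL
print(f"chi2 = {chi2:.3f}, reject H0: {reject_h0}")
```

With unrounded expected frequencies the statistic comes out near 0.28, slightly below the hand-computed value, which uses expected frequencies rounded to two decimals. Either way it is far below 4.605, so H0 is not rejected.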