Unit 2 Statistical Estimation

- The document discusses statistical estimation methods used to estimate unknown population parameters based on sample data.
- There are two main types of estimates: point estimates, which are single values, and interval estimates, which provide a range of values within which the parameter is expected to lie.
- The functions of the sample used to estimate population parameters are called estimators, while the actual values obtained from applying estimators to sample data are called estimates.

Uploaded by

Ebsa Ademe

UNIT TWO

STATISTICAL ESTIMATIONS
2.1 INTRODUCTION
Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from a population of N units, and let $x_1, x_2, \ldots, x_n$ be the corresponding observed values. The values of the unknown parameter(s) must be estimated from the sample observations. These estimates are often single values, known as point estimates. Often an interval must also be estimated, within which the parameter is expected to lie with a certain level of confidence.
In most statistical analyses, population parameters are unknown and have to be estimated from a sample. As such, the methods for estimating population parameters play an important role in statistical analysis. Random variables (such as $\bar{X}$ and $S^2$) are used to estimate population parameters such as $\mu$ and $\sigma^2$.

2.2 POINT AND INTERVAL ESTIMATION


The estimate of a population parameter may be a single value or a range of values. In the former case it is referred to as a point estimate, whereas in the latter case it is termed an interval estimate. The investigator usually makes both types of estimate through sampling analysis. When estimating population parameters, the researcher can give only the best point estimate, or else must speak in terms of intervals and probabilities, because the exact values of population parameters can never be estimated with certainty. In short:
Point estimate: - One value (called a point) that is used to estimate a population parameter. For example, the sample mean $\bar{X}$ is the point estimator of the population mean $\mu$.
Interval estimate: - States the range within which a population parameter probably lies. The interval within which a population parameter is expected to occur is called a confidence interval.
2.3 DISTINCTION BETWEEN ESTIMATOR AND ESTIMATE

In the theory of estimation, the population in which we are ultimately interested contains certain quantities whose values are unknown. For example, suppose a manufacturing process is producing a certain machinery part, and the manufacturer would like to know what percentage of the product is defective. In this example, the unknown is the proportion of defectives P, an unknown characteristic of the population known as a parameter. Since the parameter in question is unknown, we choose a sample of suitable size and form a suitable function of the observations. This function is known as an estimator or a statistic. The unknown parameter is estimated by the estimator obtained from the sample. The numerical value of the estimator is known as an estimate. For example, the random variables (such as $\bar{X}$ and $S^2$) used to estimate population parameters such as $\mu$ and $\sigma^2$ are conventionally called 'estimators', while specific values of these (such as $\bar{X} = 110$ or $S^2 = 15$) are referred to as 'estimates' of the population parameters. The estimate of a population parameter may be one single value or it could be a range of values.
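
The distinction can be sketched in Python. This is only an illustration: the function names and the five observations are hypothetical, chosen so that the sample mean comes out at 110. The functions play the role of estimators (rules applied to any sample), and the numbers they return for one particular sample are the estimates.

```python
from statistics import mean, variance

def sample_mean(observations):
    """Estimator of the population mean: a rule applied to any sample."""
    return mean(observations)

def sample_variance(observations):
    """Estimator of the population variance (divisor n - 1)."""
    return variance(observations)

# Hypothetical observed sample (illustrative values only).
sample = [108, 112, 105, 115, 110]

# Estimates: the numerical values the estimators take for this sample.
x_bar = sample_mean(sample)
s_squared = sample_variance(sample)
print(x_bar, s_squared)
```

A different sample from the same population would leave the estimators unchanged but would generally produce different estimates.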
2.4 PROPERTIES OF ESTIMATORS
In estimation theory, one must know the properties of a good estimator in order to select appropriate estimators for a study. A good estimator possesses the following properties:
i. Unbiasedness: An estimator is said to be unbiased if its expected value is equal to the parameter being estimated; that is, on average the estimator equals the value of the parameter. This is known as the property of unbiasedness. The sample mean $\bar{X}$ is the most widely used estimator because it provides an unbiased estimate of the population mean $\mu$, i.e. $E(\bar{X}) = \mu$.
ii. Consistency: The notion of consistency is mainly concerned with infinite populations. An estimator $t_n = t(x_1, x_2, \ldots, x_n)$, computed from a random sample of n values, is said to be a consistent estimator of a population parameter $\theta$ if it converges in probability to $\theta$ as n tends to infinity, i.e. $t_n$ is a consistent estimator of $\theta$ if for every $\varepsilon > 0$ the following condition holds:
$\lim_{n \to \infty} P\{\theta - \varepsilon < t_n < \theta + \varepsilon\} = 1$
or $\lim_{n \to \infty} P\{|t_n - \theta| > \varepsilon\} = 0$
The property of consistency ensures that the difference between $t_n$ and $\theta$ becomes smaller, in the probability sense, as n increases indefinitely. In other words, the estimator gives increasing accuracy as the size of the sample increases.
iii. Efficiency: - For large samples, let two consistent estimators $t_n$ and $t_n'$ both be asymptotically normally distributed about the true value of the parameter $\theta$, with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. This will usually be the case by virtue of the central limit theorem. Then $t_n$ is said to be a more efficient estimator than $t_n'$ if $\sigma_1^2 < \sigma_2^2$.

That is, if $\mathrm{var}(t_n) < \mathrm{var}(t_n')$ for all n, then $t_n$ is said to be more efficient than $t_n'$ for all n. The estimator with the smaller variance will be grouped more closely around the true value and, on average, will deviate less from the true value than the estimator with the larger variance; it may therefore reasonably be regarded as more efficient than the other.

If we can find a consistent estimator $t_n$ whose variance is less than that of all other consistent estimators for all n, then $t_n$ is said to be the most efficient, and the efficiency E of any other estimator is defined as the ratio of the variance of the most efficient estimator to the variance of the given estimator. The efficiency of a statistic represents, in large samples, the fraction of the relevant information available in the sample that is utilized by the statistic in question.

iv. Sufficiency: - An estimator $t_n$ is said to be sufficient for estimating a population parameter $\theta$ if it contains all the information in the sample about the parameter, i.e. the estimator $t_n$ should use as much as possible of the information available from the sample.
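
Unbiasedness and efficiency can be illustrated with a small simulation. The following is a sketch under stated assumptions, not part of the original unit: the population is assumed normal with hypothetical parameters (mean 50, standard deviation 10), and the sample mean is compared with the sample median as two estimators of the population mean.

```python
import random
from statistics import mean, median, pvariance

def simulate(mu=50.0, sigma=10.0, n=25, trials=4000, seed=1):
    """Draw many samples and record the sample mean and median of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(median(sample))
    return means, medians

means, medians = simulate()

# Unbiasedness: the average of the sample means should be close to mu = 50.
avg_of_means = mean(means)

# Efficiency: the estimator with the smaller variance is the more efficient.
var_mean = pvariance(means)
var_median = pvariance(medians)
relative_efficiency = var_mean / var_median  # efficiency of the median

print(avg_of_means, var_mean, var_median, relative_efficiency)
```

For normal data the sample mean typically shows the smaller variance, so the ratio printed last is below 1; this ratio corresponds to the efficiency E of the median relative to the mean as defined above.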

2.5 CONFIDENCE INTERVAL ESTIMATION


As defined earlier, an interval estimate states the range within which a population parameter probably lies. The interval within which a population parameter is expected to occur is called a confidence interval. Suppose we want to estimate a population parameter $\theta$. On the basis of the sample observations we compute two numbers L and R, corresponding to a preassigned probability $1 - \alpha$, so that
$P(L \le \theta \le R) = 1 - \alpha$.
This procedure of estimating $\theta$ is known as interval estimation. The preassigned probability $1 - \alpha$ indicates the degree of confidence we wish to place in the estimation procedure and is known as the confidence coefficient. The quantities L and R are known as the lower and upper confidence limits respectively. The quantity W = R - L is known as the length of the confidence interval. For example, the confidence interval for the population mean is an interval that has a high probability of containing the population mean $\mu$. Two confidence intervals are used extensively: the 95 percent confidence interval and the 99 percent confidence interval. Other confidence levels may also be used.
The 95 percent confidence interval means that about 95 percent of similarly constructed intervals will contain the parameter being estimated. If we use the 99 percent level of confidence, then we expect about 99 percent of the intervals to contain the parameter being estimated.
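
The "similarly constructed intervals" reading can be checked by simulation. The sketch below is illustrative only: it assumes a normal population with hypothetical parameters (mean 100, known standard deviation 15) and builds each interval as the sample mean plus or minus 1.96 standard errors, then counts how often the interval contains the true mean.

```python
import random
from math import sqrt

def coverage(mu=100.0, sigma=15.0, n=30, z=1.96, trials=2000, seed=7):
    """Share of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z * sigma / sqrt(n)  # same half-width for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

share = coverage()
print(share)  # expected to be close to 0.95
```

Any single interval either contains the mean or does not; the 95 percent figure describes the long-run share of intervals produced by the procedure.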
Another interpretation of the 95 percent confidence interval is that 95 percent of the sample means for a specified sample size will be within 1.96 standard errors (standard deviations of the sampling distribution of the mean) of the hypothesized population mean. Similarly, for a 99 percent confidence interval, 99 percent of the sample means will be within 2.58 standard errors of the hypothesized population mean.

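[Figure: standard normal curve with the Z axis marked at -2.58, -1.96, 0, 1.96 and 2.58; the area between -1.96 and 1.96 is 95% and the area between -2.58 and 2.58 is 99%.]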
Here 1.96 and 2.58 are the Z values corresponding to the 95 percent and 99 percent levels of confidence, i.e. $P\{-1.96 \le Z \le 1.96\} = 0.95$ and $P\{-2.58 \le Z \le 2.58\} = 0.99$. Equivalently, the area under the standard normal curve between -1.96 and 1.96 equals 0.95 and the area between -2.58 and 2.58 equals 0.99. Since the normal distribution is symmetric, the areas to the right and left of the mean are equal. For a 95 percent confidence level, the area on each side of the mean is $0.95/2 = 0.4750$.

As mentioned earlier, in interval estimation we want to find an interval with lower end point L and upper end point R such that the probability, or confidence level, that it contains the unknown population parameter $\theta$ is $1 - \alpha$, i.e. $P(L \le \theta \le R) = 1 - \alpha$, where L and R are the lower and upper confidence limits and $1 - \alpha$ is the associated probability or confidence level of estimation. For a 95 percent confidence level, the probability is 0.95 that the interval from L to R contains the population parameter, i.e. $P(L \le \theta \le R) = 1 - \alpha = 0.95$.
Solving for $\alpha$: $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$, which is the lack of confidence. That is, if we are 95 percent confident, then $\alpha = 0.05 = 5$ percent is the lack of confidence, which is also known as the level of significance. For a 99 percent confidence interval, $1 - \alpha = 0.99 \Rightarrow \alpha = 0.01$. Thus for a 99 percent confidence interval, the level of significance is $\alpha = 0.01$.
Let Z be a standard normal variate, and let $Z_\alpha$ be the number such that, for a given probability $\alpha$, $P(Z > Z_\alpha) = \alpha$. Then $Z_\alpha$ is referred to as the upper $\alpha$ percent point.
For example, if $\alpha = 0.05$ we have
$P(Z > Z_{0.05}) = 0.05$

[Figure: standard normal curve with the right-tail area $\alpha = 0.05$ lying beyond $Z_{0.05}$.]

Since the total area under the normal curve equals 1 and the area to the right of Z = 0 equals 0.5, the area between Z = 0 and $Z_{0.05}$ is 0.5 - 0.05 = 0.45. Writing $A(Z_{0.05})$ for the area under the curve from 0 to $Z_{0.05}$:
$P(Z > Z_{0.05}) = 0.05$
$\Rightarrow 0.5 - A(Z_{0.05}) = 0.05$
$\Rightarrow A(Z_{0.05}) = 0.45$
Thus, from the table of the standard normal variate, $Z_{0.05} = 1.64$ (1.645 to three decimal places).
$\therefore$ If $\alpha = 0.05$ then $Z_\alpha = 1.64$.
Similarly, if $\alpha = 0.05$, then $\alpha/2 = 0.025$, and $Z_{\alpha/2} = Z_{0.025}$ satisfies
$P(Z > Z_{0.025}) = 0.025$
$\Rightarrow 0.5 - A(Z_{0.025}) = 0.025$
$\Rightarrow A(Z_{0.025}) = 0.4750$,
i.e. the area between 0 and $Z_{0.025}$ is 0.4750, and reading from the table, the Z value corresponding to an area of 0.4750 is 1.96. Thus $Z_{\alpha/2} = Z_{0.025} = 1.96$ for $\alpha = 0.05$. If $\alpha = 0.01$, then $Z_{\alpha/2} = Z_{0.005}$ satisfies $P(Z > Z_{\alpha/2}) = \alpha/2$:
$\Rightarrow P(Z > Z_{0.005}) = 0.005$
$\Rightarrow 0.5 - A(Z_{0.005}) = 0.005$
$\Rightarrow A(Z_{0.005}) = 0.4950$,
i.e. the area between Z = 0 and $Z_{0.005}$ is 0.4950. Thus, reading from the table of the standard normal, $Z_{0.005} = 2.58$.

In summary:
If $\alpha = 5\% = 0.05$, then the confidence level is 95%, $\alpha/2 = 0.025$ and $Z_{\alpha/2} = Z_{0.025} = 1.96$.
If $\alpha = 1\% = 0.01$, then the confidence level is 99%, $\alpha/2 = 0.005$ and $Z_{\alpha/2} = Z_{0.005} = 2.58$.
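
The table look-ups above can be reproduced with Python's standard library. This is a sketch using `statistics.NormalDist`; the helper name `upper_point` is a hypothetical choice. It returns the value with the given area to its right under the standard normal curve.

```python
from statistics import NormalDist

def upper_point(tail_area):
    """Z value with `tail_area` to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - tail_area)

z_05 = upper_point(0.05)    # one-tailed point for alpha = 0.05
z_025 = upper_point(0.025)  # Z_{alpha/2} for a 95% confidence level
z_005 = upper_point(0.005)  # Z_{alpha/2} for a 99% confidence level
print(round(z_05, 3), round(z_025, 3), round(z_005, 3))
```

The computed values agree with the tabled 1.64, 1.96 and 2.58 up to the rounding used in printed tables.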

2.6 CONFIDENCE INTERVAL FOR A POPULATION MEAN

Case I. Sampling from a normally distributed population with known population variance

As already discussed in the previous unit, the sampling distribution of the mean of samples taken from a normal distribution is also normal, with mean $\mu$ and variance $\sigma^2/n$.

Hence the variate $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is distributed as a standard normal with mean 0 and variance 1, i.e. $Z \sim N(0, 1)$.

From the area under the standard normal curve, the probability that the value of this statistic falls between -2.58 and 2.58 is 0.99, i.e.

$P\left\{-2.58 \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 2.58\right\} = 0.99$

or $P\left\{\bar{X} - 2.58\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 2.58\,\dfrac{\sigma}{\sqrt{n}}\right\} = 0.99$

This means that it is very likely, the probability being 0.99, that the interval $(\bar{X} - 2.58\,\sigma/\sqrt{n},\ \bar{X} + 2.58\,\sigma/\sqrt{n})$ will include $\mu$. In other words, if a very large number of samples, each of size n, are taken from the population and the above interval is determined for each sample, then in about 99% of cases the interval will include $\mu$, while in the remaining 1% it will fail to do so. One is therefore justified in saying, on the basis of a given sample, that $\mu$ lies between $\bar{X} - 2.58\,\sigma/\sqrt{n}$ and $\bar{X} + 2.58\,\sigma/\sqrt{n}$, the limits being computed from the observations in hand. These are called the 99% confidence limits for $\mu$, 0.99 being the confidence coefficient.
Recall that $Z_\alpha$ denotes the value of Z for which the area under the standard normal curve to its right is equal to $\alpha$. Analogously, $Z_{\alpha/2}$ denotes the value of Z for which the area to its right is $\alpha/2$, and $-Z_{\alpha/2}$ denotes the value for which the area to its left is $\alpha/2$.

Thus $P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$

$\Rightarrow P\left\{-Z_{\alpha/2} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right\} = 1 - \alpha$

$\Rightarrow P\left\{\bar{X} - Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$

Thus the $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ is given by:
$\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.

For example, the 95% confidence interval for the population mean is obtained from:
$P\{-Z_{\alpha/2} < Z < Z_{\alpha/2}\} = 1 - \alpha = 0.95$
i.e. $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$ and $\alpha/2 = 0.025$.
Thus the confidence interval is $\bar{X} \pm Z_{0.025}\,\sigma/\sqrt{n}$, and from the standard normal table $Z_{0.025} = 1.96$, so the confidence interval is $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$. That is, we are 95% confident that $\mu$ lies in the interval $(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n})$.
Case II. Large-sample confidence interval for the population mean when the distribution of the population from which the sample is drawn is unknown.
1) If the sample size is large, say n > 30, then by the central limit theorem the statistic $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ approximately follows the standard normal distribution. Thus, when $\sigma$ is known, the $(1 - \alpha)100\%$ confidence interval is given by

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.

2) If $\sigma$ is not known, the population standard deviation is estimated by the sample standard deviation $S = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$, and the $(1 - \alpha)100\%$ confidence interval for the population mean is given by

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$.

Case III. Small-sample confidence interval for the population mean: sampling from a normal distribution when $\sigma^2$ is unknown and n < 30.

In this case, estimating the population variance $\sigma^2$ by the sample variance $S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$, the statistic $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a Student's t distribution with n - 1 degrees of freedom, i.e. $t \sim t_{n-1}$. As the t distribution is symmetric, like the normal distribution, the $(1 - \alpha)100\%$ confidence interval for the population mean in this case is given by:

$\bar{X} \pm t_{\alpha/2}(n-1)\,\dfrac{S}{\sqrt{n}}$,

i.e. $\mu$ is expected to lie in the interval $\left(\bar{X} - t_{\alpha/2}(n-1)\,\dfrac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}(n-1)\,\dfrac{S}{\sqrt{n}}\right)$ with a probability of $1 - \alpha$.

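Case IV. Finite population, $\sigma$ is not known and n is large (n > 30).

In this case, the population variance $\sigma^2$ is estimated by the sample variance $S^2$, and the standard error of the mean is given by $\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}$, where the term $\sqrt{\dfrac{N - n}{N - 1}}$ is called the finite population correction. The $(1 - \alpha)100\%$ confidence interval for the population mean is given by:

$\bar{X} \pm Z_{\alpha/2}\left\{\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}\right\}$, where $S^2$ denotes the sample variance.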

Example 1. From a random sample of 36 Addis Ababa civil service personnel, the mean age and the sample standard deviation were found to be 40 years and 4.5 years respectively. Construct a 95 percent confidence interval for the mean age of civil servants in Addis Ababa.

Solution: The given information can be written as:

n = 36, $\bar{X}$ = 40 years, S = 4.5 years.
A 95% confidence interval means $\alpha = 0.05$, $\alpha/2 = 0.025$ and $Z_{\alpha/2} = Z_{0.025} = 1.96$ (as per the normal curve area table).
Thus, the 95% confidence interval for the population mean age is:

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$

or $40 \pm 1.96\,\dfrac{4.5}{\sqrt{36}}$

or $40 \pm (1.96)(0.75)$
or $40 \pm 1.47$ years

i.e. we are 95% confident that the population mean lies in the range (38.53, 41.47).
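
The arithmetic of this example can be checked with a short Python sketch. The function name `z_interval` is a hypothetical helper, and the critical value 1.96 is passed in as read from the table.

```python
from math import sqrt

def z_interval(x_bar, s, n, z):
    """Large-sample interval x_bar +/- z * s / sqrt(n)."""
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 1: n = 36, sample mean 40 years, S = 4.5 years, Z_0.025 = 1.96
lower, upper = z_interval(40.0, 4.5, 36, 1.96)
print(round(lower, 2), round(upper, 2))
```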

Example 2: The foreman of ABC Mining Company has estimated the average quantity of iron ore extracted to be 36.8 tons per shift and the sample standard deviation to be 2.8 tons per shift, based upon a random selection of 4 shifts. Construct a 90% confidence interval around this estimate.

Solution: As the standard deviation of the population is not known and the sample size is small, we shall use the t-distribution to find the required confidence interval for the population mean. The given information can be written as:
$\bar{X}$ = 36.8 tons per shift, S = 2.8 tons per shift, n = 4, degrees of freedom = n - 1 = 4 - 1 = 3.
For a 90 percent confidence interval (10% level of significance), $\alpha = 10\% = 0.1$, $\alpha/2 = 0.05$ and the critical value is $t_{\alpha/2}(n-1) = t_{0.05}(3) = 2.353$ (as per the table of the t-distribution).
Thus, the 90% confidence interval for the population mean is

$\bar{X} \pm t_{\alpha/2}(n-1)\,\dfrac{S}{\sqrt{n}} = 36.8 \pm 2.353 \times \dfrac{2.8}{\sqrt{4}}$

= $36.8 \pm 3.294$ tons per shift

= (33.506, 40.094)
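
A minimal Python sketch of the same calculation follows. The standard library has no t-distribution quantile function, so the critical value 2.353 is taken from the t table as in the solution rather than computed; the helper name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_critical):
    """Small-sample interval x_bar +/- t * s / sqrt(n), t from a t table."""
    margin = t_critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 2: n = 4, mean 36.8, S = 2.8, t_0.05 with 3 degrees of freedom = 2.353
lower, upper = t_interval(36.8, 2.8, 4, 2.353)
print(round(lower, 3), round(upper, 3))
```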

Example 3: In a random selection of 64 of the 2400 intersections in a city, the mean number of car accidents per year was 3.2 and the sample standard deviation was 0.8. Obtain the 90% confidence interval for the mean number of accidents per intersection per year.
Solution: Given
N = 2400 (this means the population is finite)
n = 64 (large)
$\bar{X}$ = 3.2
S = 0.8, $\alpha = 0.1 \Rightarrow \alpha/2 = 0.05$ and $Z_{0.05} = 1.64$
Using the confidence interval formula for a large sample from a finite population:

$\bar{X} \pm Z_{\alpha/2}\left\{\dfrac{S}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}\right\}$

Substituting the given values, $3.2 \pm 1.64 \times \dfrac{0.8}{\sqrt{64}} \times \sqrt{\dfrac{2400 - 64}{2400 - 1}}$, one obtains the interval $3.2 \pm 0.16$ accidents per intersection per year, i.e. (3.04, 3.36).
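
The substitution can be verified with a short Python sketch. The function name `fpc_interval` is a hypothetical helper; it applies the finite population correction $\sqrt{(N-n)/(N-1)}$ to the standard error before forming the interval.

```python
from math import sqrt

def fpc_interval(x_bar, s, n, N, z):
    """Interval for the mean with the finite population correction applied."""
    standard_error = (s / sqrt(n)) * sqrt((N - n) / (N - 1))
    margin = z * standard_error
    return x_bar - margin, x_bar + margin, margin

# Example 3: N = 2400, n = 64, mean 3.2, S = 0.8, Z_0.05 = 1.64
lower, upper, margin = fpc_interval(3.2, 0.8, 64, 2400, 1.64)
print(round(margin, 2), round(lower, 2), round(upper, 2))
```

Because only 64 of 2400 intersections are sampled, the correction factor is close to 1 (about 0.987) and changes the margin only slightly.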
2.7 DETERMINATION OF SAMPLE SIZE FOR ESTIMATING MEANS

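The sample sizes in the previous problems were always given. We now determine an appropriate sample size.
In the interval estimation of the population mean $\mu$, we have seen that the confidence interval is either $\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ if $\sigma$ is known or $\bar{X} \pm Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$ if $\sigma$ is not known. The deviation from $\bar{X}$, given by $Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ or $Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$, is known as the maximum allowable error for the $(1 - \alpha)100\%$ confidence interval. If we denote this error by E, then
$E = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ if $\sigma$ is known and
$E = Z_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$ if $\sigma$ is not known.
Solving these for n,

$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2$ in the first case or $n = \left(\dfrac{Z_{\alpha/2}\,S}{E}\right)^2$ in the second, and n should always be rounded up to the next integer. Equivalently, $n = \dfrac{Z_{\alpha/2}^2\,\sigma^2}{E^2}$.

For a finite population: - The sample size for estimating the mean is obtained from the confidence interval for the population mean given by

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}$

where the total allowable error E equals

$E = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}$

Solving this for the sample size n, we get the formula for the determination of sample size for a finite population:

$n = \dfrac{Z_{\alpha/2}^2\,\sigma^2\,N}{(N - 1)E^2 + Z_{\alpha/2}^2\,\sigma^2}$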

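Example 1: We want to estimate the population mean within 5, with a 99% level of confidence. The population standard deviation is estimated to be 15. How large a sample is required?
Solution: Given
$\sigma$ = 15
E = 5
$1 - \alpha = 0.99 \Rightarrow \alpha = 0.01 \Rightarrow \alpha/2 = 0.005$, so $Z_{\alpha/2} = Z_{0.005} = 2.58$
n = ?

$n = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\dfrac{2.58 \times 15}{5}\right)^2 = (7.74)^2 = 59.91$

n = 60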
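
As a quick check, the sample-size rule can be sketched in Python. The helper name `sample_size_for_mean` is hypothetical; it squares the ratio of the critical value times the standard deviation to the allowable error and rounds up to the next integer.

```python
from math import ceil

def sample_size_for_mean(z, sigma, E):
    """Smallest integer n with z * sigma / sqrt(n) <= E."""
    return ceil((z * sigma / E) ** 2)

# Example 1: Z_0.005 = 2.58, sigma = 15, E = 5
n = sample_size_for_mean(2.58, 15.0, 5.0)
print(n)
```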

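Example 2: Determine the size of the sample for estimating the true weight of cereal containers for a universe with N = 5000 on the basis of the following information.

1) The variance of weight is 4 (ounces squared) on the basis of past records.
2) The estimate should be within 0.8 ounces of the true average weight with 99% probability.
Given
N = 5000 (finite population)
$\sigma^2 = 4$, so $\sigma = 2$
E = 0.8
$\alpha = 0.01$
$Z_{\alpha/2} = Z_{0.005} = 2.58$

$n = \dfrac{Z_{\alpha/2}^2\,\sigma^2\,N}{(N - 1)E^2 + Z_{\alpha/2}^2\,\sigma^2} = \dfrac{(2.58)^2 (2)^2 (5000)}{(4999)(0.8)^2 + (2.58)^2 (2)^2} = \dfrac{133128}{3225.99} \approx 41.27$

n = 41 to the nearest integer (42 if rounded up to the next integer, as recommended above).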
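
The finite-population calculation can be sketched in Python. This assumes the standard finite population correction $\sqrt{(N-n)/(N-1)}$ in the allowable error and solves it for n; the function name is a hypothetical helper.

```python
from math import ceil

def finite_population_sample_size(z, sigma, E, N):
    """n solving E = z * (sigma / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    z2s2 = (z * sigma) ** 2
    return z2s2 * N / ((N - 1) * E ** 2 + z2s2)

# Example 2: Z_0.005 = 2.58, sigma = 2, E = 0.8, N = 5000
n_exact = finite_population_sample_size(2.58, 2.0, 0.8, 5000)
n_nearest = round(n_exact)    # the figure reported in the example
n_rounded_up = ceil(n_exact)  # the conservative choice
print(round(n_exact, 2), n_nearest, n_rounded_up)
```

Rounding up guarantees the allowable error is not exceeded, which is why the two integer answers differ by one here.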

2.8 CONFIDENCE INTERVAL FOR A POPULATION PROPORTION

The theory and procedure for determining a point estimator and an interval estimator for a population proportion are quite similar to those described in the previous sections. A point estimate of the population proportion is found by dividing the number of successes in the sample by the total number sampled. Suppose 100 of 400 people sampled said they liked a new cola better than their regular cola. The best estimate of the population proportion favoring the new cola is 0.25, or 25 percent, found by 100/400. Note that a proportion is based on a count of the number of successes relative to the total number sampled.

To construct a confidence interval for a population proportion, we use the binomial distribution with mean $\mu = np$, where n = number of trials and p = probability of success in any one trial, and standard deviation $\sqrt{npq}$, where q = 1 - p is the probability of failure. As the sample size increases, the binomial distribution approaches the normal distribution, which we can use for estimating a population proportion. The mean of the sampling distribution of the proportion of successes ($\mu_p$) is taken as equal to p, and the standard deviation of the proportion of successes, also known as the standard error of the proportion, is taken as equal to $\sqrt{pq/n}$. When the population proportion is unknown, we estimate the population parameters by substituting the corresponding sample statistics p and q in the formula for the standard error of the proportion, to obtain the estimated standard error of the proportion:

$\sigma_p = \sqrt{\dfrac{pq}{n}}$

Using this estimated standard error of the proportion, the confidence interval for the population proportion is:

$p \pm Z_{\alpha/2}\sqrt{\dfrac{pq}{n}}$

where p = sample proportion of successes;
q = 1 - p = sample proportion of failures;
n = number of trials (size of the sample);
$Z_{\alpha/2}$ = standard variate for the given confidence level (as per the normal curve area table).
If the population is finite, then the standard error of the proportion is given by

$\sqrt{\dfrac{pq}{n}}\sqrt{\dfrac{N - n}{N - 1}}$, where $\sqrt{\dfrac{N - n}{N - 1}}$ is the finite population correction factor.

Example: A market research survey in which 64 consumers were contacted states that 64% of all consumers of a certain product were motivated by the product's advertising. Find the confidence limits for the proportion of consumers motivated by advertising in the population, given a confidence level equal to 0.95.

Solution: - Given
n = 64, p = 64% = 0.64, q = 1 - p = 1 - 0.64 = 0.36
and $\alpha = 0.05 \Rightarrow \alpha/2 = 0.025$ and $Z_{\alpha/2} = 1.96$

Thus, the 95% confidence interval for the proportion of consumers motivated by advertising in the population is

$p \pm Z_{\alpha/2}\sqrt{\dfrac{pq}{n}} = 0.64 \pm 1.96\sqrt{\dfrac{(0.64)(0.36)}{64}} = 0.64 \pm 1.96(0.06) = 0.64 \pm 0.1176 = (0.5224, 0.7576)$
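
The interval for the proportion can be reproduced with a short Python sketch. The helper name `proportion_interval` is hypothetical, and the critical value 1.96 is passed in as read from the normal table.

```python
from math import sqrt

def proportion_interval(p, n, z):
    """Interval p +/- z * sqrt(p * q / n) with q = 1 - p."""
    q = 1.0 - p
    margin = z * sqrt(p * q / n)
    return p - margin, p + margin

# Example: n = 64, p = 0.64, Z_0.025 = 1.96
lower, upper = proportion_interval(0.64, 64, 1.96)
print(round(lower, 4), round(upper, 4))
```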

2.9 SAMPLE SIZE DETERMINATION IN ESTIMATING PROPORTION

The procedure used for means is also applicable to determining the sample size when proportions are involved. Three things must be specified:
1) You, the researcher, must decide on the level of confidence, usually 0.95 or 0.99.
2) You must indicate how precise the estimate of the population proportion must be.
3) The population proportion p must be approximated, either from past experience or from a small pilot survey of, say, 50 or 100.

From the confidence interval for the population proportion, $p \pm Z_{\alpha/2}\sqrt{\dfrac{pq}{n}}$, the maximum allowable error is $E = Z_{\alpha/2}\sqrt{\dfrac{pq}{n}}$. Solving for n, we get the sample size determination formula for the proportion:

$n = pq\left(\dfrac{Z_{\alpha/2}}{E}\right)^2 = p(1 - p)\left(\dfrac{Z_{\alpha/2}}{E}\right)^2$

where p – is the estimated proportion based on past experience or a pilot survey.
$Z_{\alpha/2}$ – is the Z value associated with the degree of confidence selected.
E – is the maximum allowable error the researcher will tolerate.
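
The formula can be sketched in Python as follows. The helper name and the input values are hypothetical illustrations, not figures from the unit: a pilot estimate of p = 0.30, an allowable error of 0.05 and a 95 percent confidence level are assumed.

```python
from math import ceil

def sample_size_for_proportion(p, z, E):
    """Smallest integer n with z * sqrt(p * (1 - p) / n) <= E."""
    return ceil(p * (1.0 - p) * (z / E) ** 2)

# Hypothetical inputs: pilot estimate p = 0.30, E = 0.05, Z_0.025 = 1.96
n_pilot = sample_size_for_proportion(0.30, 1.96, 0.05)

# With no prior information, p = 0.5 maximizes p(1 - p) and so gives
# the largest (most conservative) sample size.
n_conservative = sample_size_for_proportion(0.50, 1.96, 0.05)
print(n_pilot, n_conservative)
```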

2.11 SUMMARY

The process of going from the known sample to the unknown population is called statistical inference. The basic problem of sampling theory usually presents itself in one of two forms: (a) some feature of the population in which an enquirer is interested may be completely unknown, and the enquirer may want to make a guess about this feature entirely on the basis of a random sample from the population; (b) some information about the feature of the population may be available to the enquirer, who may want to see whether the information is tenable in the light of a random sample taken from the population. The first type of problem is the problem of estimation, which has been discussed in this unit; the second is the problem of testing hypotheses, which will be dealt with in the next unit.

2.12 GLOSSARY

Bias – A possible consequence if certain members of the population are denied the chance to
be selected for the sample. As a result, the sample may not be representative of the
population.
Confidence co-efficient – The value $1 - \alpha$.
Confidence interval – The interval within which the parameter is expected to lie with a stated level of confidence.
Estimator – A function of the sample observations used to estimate the parameter.
Proportion – A fraction or percentage of a sample or a population having a particular trait.
Estimate – The numerical value of the estimator for a given sample.
