Unit 10
(EE3006/3006A)
1. Basics of Statistics
=> Statistics is the study of the collection, organization, analysis,
interpretation, and presentation of data.
• Descriptive statistics (graphs, tables, charts, ...)
• Inferential statistics (e.g., hypothesis testing)
(Statistics is inductive: particular -> general)
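For independent random variables $X_1, \dots, X_n$ with common mean $\mu$ and sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$,
$$n(\bar{X} - \mu) = \sum_{i=1}^{n} (X_i - \mu)$$
Then,
$$\sigma_{\bar{X}}^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sigma_{X_i}^2$$
Since all $X_i$ have the same probability distribution, $\sigma_{X_i}^2 = \sigma^2$ and $\sigma_{\bar{X}}^2 = \sigma^2/n$.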
Example 1:
Find the standard deviation of the sample 3, 4, 5, 6, 6, and 7,
representing the number of trout caught by a random sample of 6
fishermen on June 19, 1996, at Lake Muskoka.
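As a quick check of Example 1, the sample standard deviation can be computed directly. This is a minimal sketch using only Python's standard library; the variable names are illustrative, and the n - 1 divisor (sample variance) is assumed because the data are a sample rather than a whole population.

```python
import math
import statistics

# Number of trout caught by each of the 6 fishermen (data from Example 1).
catches = [3, 4, 5, 6, 6, 7]

n = len(catches)
mean = sum(catches) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
sample_variance = sum((x - mean) ** 2 for x in catches) / (n - 1)
sample_std = math.sqrt(sample_variance)

# Cross-check against the library routine, which also uses n - 1.
assert math.isclose(sample_std, statistics.stdev(catches))

print(f"mean = {mean:.3f}, s^2 = {sample_variance:.3f}, s = {sample_std:.3f}")
```

The sample variance works out to 13/6, so s is about 1.47.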
Central limit theorem: as the sample size n increases, the distribution of the standardized sample mean $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ tends towards the standard normal distribution, even if the original variables themselves are not normally distributed.
It is a key concept, implying that the probabilistic and statistical methods for normal distributions are applicable to many problems involving other types of distributions.
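The tendency towards normality can be illustrated with a small simulation. The sketch below is not part of the original slides: the exponential population, the sample size of 36, the number of trials, and the seed are all arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 36          # sample size (illustrative choice)
trials = 5000   # number of simulated samples (illustrative choice)

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# If the sample mean is roughly normal, about 95% of the standardized
# sample means should fall within +/-1.96.
sigma_xbar = 1.0 / n ** 0.5
inside = sum(abs((m - 1.0) / sigma_xbar) < 1.96 for m in sample_means) / trials

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {sigma_xbar:.3f})")
print(f"fraction within +/-1.96 standard errors: {inside:.3f}")
```

Even though the population is far from normal, the spread of the sample means should come out close to σ/√n, and the coverage close to 0.95.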
Properties:
Example 2:
An electrical firm manufactures light bulbs that have a length of life that
is approximately normally distributed, with the mean of 800 hours and a
standard deviation of 40 hours. Find the probability that a random
sample of 36 bulbs will have an average life of less than 785 hours.
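Solution:
The sampling distribution of $\bar{X}$ is approximately normal with $\mu_{\bar{X}} = 800$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 40/\sqrt{36} = 6.67$. Then
$$z = \frac{785 - 800}{40/\sqrt{36}} = -2.25$$
and $P(\bar{X} < 785) = P(Z < -2.25) = 0.0122$.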
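The probability asked for in Example 2 can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mu = 800.0      # population mean life, hours
sigma = 40.0    # population standard deviation, hours
n = 36          # sample size
x_bar = 785.0   # threshold for the sample mean, hours

# Standard deviation of the sample mean: sigma / sqrt(n).
sigma_xbar = sigma / n ** 0.5

# Standardize the threshold and evaluate the standard normal CDF.
z = (x_bar - mu) / sigma_xbar
probability = NormalDist().cdf(z)

print(f"sigma_xbar = {sigma_xbar:.2f}, z = {z:.2f}, P(X_bar < 785) = {probability:.4f}")
```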
2. Estimation Problems
Sampling: each member of the population has the same chance of being selected in the sample.
Population (parameters) -> random sample (statistics) -> estimation of the parameters
(Figure: interval estimate θL < θ < θU, with lower bound θL and upper bound θU.)
The probabilistic methods for normal distributions studied in the previous unit can be
applied to these estimation problems.
Therefore, for the mean $\bar{x}$ of a random sample of size n from a population with known variance $\sigma^2$, the 100(1-α)% confidence interval for µ is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
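This interval follows from the probability statement
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$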
Example 3:
The average zinc concentration recovered from a sample of measurements taken
in 36 different locations in a river is found to be 2.6 grams per milliliter.
Assume that the population standard deviation is 0.3 gram per milliliter. Find
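the 95% confidence interval for the mean zinc concentration in the river.
Solution:
With $\bar{x} = 2.6$, $\sigma = 0.3$, $n = 36$, and $z_{0.025} = 1.96$, the 95% confidence interval is
$$2.6 - (1.96)\frac{0.3}{\sqrt{36}} < \mu < 2.6 + (1.96)\frac{0.3}{\sqrt{36}}$$
that is, $2.50 < \mu < 2.70$.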
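The confidence interval for Example 3 can be computed as follows. This is a sketch under the example's assumption of a known population standard deviation; the helper name `z_confidence_interval` is hypothetical, and the critical value comes from `statistics.NormalDist.inv_cdf` rather than from a printed table.

```python
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: the point with upper-tail area alpha/2 under the standard normal.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z_crit * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin


# Values from Example 3: mean 2.6, sigma 0.3, 36 locations.
lower, upper = z_confidence_interval(x_bar=2.6, sigma=0.3, n=36)
print(f"95% confidence interval: {lower:.2f} < mu < {upper:.2f}")
```

Changing `confidence` to 0.99 widens the interval, since a larger critical value is needed to capture µ with higher confidence.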
t-distribution
=> A family of continuous probability distributions that arises when estimating
the mean of a normally distributed population in situations where the sample
size is small and population standard deviation is unknown.
Table A.4 Critical Values of the t-Distribution tα
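Critical values such as those tabulated for the t-distribution can be reproduced numerically. The sketch below is one possible approach, not the method used to build the table: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names, step count, and search range are choices made for this illustration.

```python
import math


def t_pdf(t, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    coeff = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coeff * (1 + t * t / dof) ** (-(dof + 1) / 2)


def t_upper_tail(t, dof, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] with Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    # The density is symmetric about 0, so P(T > 0) = 0.5.
    return 0.5 - total * h / 3


def t_critical(alpha, dof):
    """Value t_alpha with P(T > t_alpha) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, dof) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


print(f"t_0.025 with 6 degrees of freedom:  {t_critical(0.025, 6):.3f}")
print(f"t_0.05 with 11 degrees of freedom: {t_critical(0.05, 11):.3f}")
```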
* Example 4:
The contents of seven similar containers of sulfuric acid are 9.8, 10.2,
10.4, 9.8, 10.0, 10.2, and 9.6 liters. Find a 95% confidence interval for
the mean contents of all such containers, assuming an approximately
normal distribution.
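Solution:
The sample mean is $\bar{x} = 10.0$ and the sample variance is
$$s^2 = \frac{(9.8-10)^2 + (10.2-10)^2 + (10.4-10)^2 + (9.8-10)^2 + \dots + (9.6-10)^2}{6} = 0.08$$
so $s = 0.283$. With $t_{0.025,6} = 2.447$, the 95% confidence interval is
$$10.0 - (2.447)\frac{0.283}{\sqrt{7}} < \mu < 10.0 + (2.447)\frac{0.283}{\sqrt{7}}$$
that is, $9.74 < \mu < 10.26$.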
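The interval for Example 4 can be worked through in a few lines. This is a minimal sketch: the critical value 2.447 ($t_{0.025}$ with 6 degrees of freedom) is entered by hand as a standard t-table value, because the Python standard library has no t-distribution; the variable names are illustrative.

```python
import math
import statistics

# Contents of the seven containers in liters (data from Example 4).
contents = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]

n = len(contents)
x_bar = statistics.fmean(contents)
s = statistics.stdev(contents)  # sample standard deviation (n - 1 divisor)

# Assumed table value: t_{0.025} with n - 1 = 6 degrees of freedom.
T_CRIT = 2.447

margin = T_CRIT * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}")
print(f"95% confidence interval: {lower:.2f} < mu < {upper:.2f}")
```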
3. Testing Hypothesis
Acceptance of a hypothesis merely implies that the data do not give sufficient
evidence to refute it.
Rejection means that the sample evidence refutes it.
(Figure: distribution of the test statistic, showing the rejection region, the nonrejection region, and the critical value.)
• Rejection region: The set of all values of the test statistic that would cause a
rejection of the null hypothesis.
• Nonrejection region: The set of all values of the test statistic that would cause an
acceptance of the null hypothesis.
• Critical value: The value or values that separate the rejection region from the
values of the test statistic that do not lead to a rejection of the null hypothesis.
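The three definitions above can be sketched in code for a test statistic that follows the standard normal distribution. The function name `rejects_null` and the labels "greater", "less", and "two-sided" are hypothetical names chosen for this illustration; the critical values are taken from `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist


def rejects_null(z, alpha, alternative):
    """Return True if the test statistic z lies in the rejection region.

    alternative: "greater" (H1: mu > mu0), "less" (H1: mu < mu0),
    or "two-sided" (H1: mu != mu0).
    """
    std_normal = NormalDist()
    if alternative == "greater":
        # Critical value z_alpha; rejection region is z > z_alpha.
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        # Critical value -z_alpha; rejection region is z < -z_alpha.
        return z < std_normal.inv_cdf(alpha)
    if alternative == "two-sided":
        # Critical values +/- z_{alpha/2}; rejection region is |z| > z_{alpha/2}.
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Illustrative values only: a statistic of 1.8 at the 0.05 level.
print(rejects_null(1.8, 0.05, "greater"))    # 1.8 exceeds 1.645
print(rejects_null(1.8, 0.05, "two-sided"))  # 1.8 does not exceed 1.96
```

The same statistic can fall in the rejection region of a one-sided test but in the nonrejection region of a two-sided test, because the two-sided test splits α between both tails.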
(Figure: critical values -zα and zα bounding the one-sided rejection regions z < -zα and z > zα.)
Example 5:
A random sample of 100 recorded deaths in the United States during the past
year showed an average life span of 71.8 years. Assuming a population
standard deviation of 8.9 years, does this seem to indicate that the mean life
span today is greater than 70 years? Use a 0.05 level of significance.
Solution:
The null and alternative hypotheses are
H0: μ = 70 years
H1: μ > 70 years
With the level of significance α = 0.05, one has zα = 1.645.
From the observed sample values, i.e., $\bar{x}$ = 71.8 years, σ = 8.9 years, and n = 100, one has
$$z = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02 > z_{\alpha}$$
Decision: Reject H0 and conclude that the mean life span today is greater than 70 years.
(Figure: the P-value as the tail area P(z > a) or P(z < -a) beyond the observed test statistic zx.)
Example 5 (P-value approach):
For the same sample and hypotheses as in Example 5 above (H0: μ = 70 years, H1: μ > 70 years), with $\bar{x}$ = 71.8 years, σ = 8.9 years, and n = 100, one has
$$z_x = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$$
The P-value is P(z > 2.02) = 1 - 0.9783 = 0.0217.
The P-value is smaller than the level of significance 0.05.
Decision: Reject H0 and conclude that the mean life span today is greater than 70 years.
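The P-value calculation for Example 5 can be reproduced as a short sketch with `statistics.NormalDist`; the variable names are illustrative. Because the code keeps the unrounded test statistic, the P-value it prints is about 0.0216, slightly different from the table lookup at z = 2.02.

```python
from statistics import NormalDist

x_bar = 71.8   # sample mean life span, years
mu_0 = 70.0    # mean under the null hypothesis, years
sigma = 8.9    # population standard deviation, years
n = 100        # sample size
alpha = 0.05   # level of significance

# Test statistic for a mean with known sigma.
z = (x_bar - mu_0) / (sigma / n ** 0.5)

# One-sided (upper-tail) P-value for H1: mu > mu_0.
p_value = 1.0 - NormalDist().cdf(z)

reject_h0 = p_value < alpha
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```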
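(Figure: rejection regions for the t-test: t < -tα,n-1 for a one-sided test, and t < -tα/2,n-1 or t > tα/2,n-1 for a two-sided test.)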
* Example 6:
The Edison Electric Institute has published figures on the number of
kilowatt-hours used annually by various home appliances. It is claimed
that a vacuum cleaner uses an average of 46 kilowatt hours per year. If a
random sample of 12 homes included in a planned study indicates that
vacuum cleaners use an average of 42 kilowatt hours per year with a
standard deviation of 11.9 kilowatt hours, does this suggest at the 0.05
level of significance that vacuum cleaners use, on average, less than 46
kilowatt hours annually? Assume the population of kilowatt hours to be
normal.
Solution:
The null and alternative hypotheses are
H0: μ = 46 kilowatt hours
H1: μ < 46 kilowatt hours
With the level of significance α = 0.05, the critical value is
$-t_{0.05,11} = -1.796$ (for 11 degrees of freedom)
Using the observed sample values, i.e., $\bar{x}$ = 42 kilowatt hours, s = 11.9 kilowatt
hours, and n = 12, the calculated t is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16 > -t_{0.05,11}$$
Decision: Do not reject H0 and conclude that the average number of kilowatt hours
used annually by home vacuum cleaners is not significantly less than 46.
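The t-test in Example 6 can be checked with a short sketch. The critical value 1.796 is the one quoted in the solution for 11 degrees of freedom and is entered by hand, since the Python standard library has no t-distribution; the variable names are illustrative.

```python
import math

x_bar = 42.0   # sample mean, kilowatt hours per year
mu_0 = 46.0    # claimed mean under H0, kilowatt hours per year
s = 11.9       # sample standard deviation, kilowatt hours
n = 12         # sample size

# Table value quoted in the solution: t_{0.05} with n - 1 = 11 degrees of freedom.
T_CRIT = 1.796

# Test statistic for a mean with unknown population standard deviation.
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Lower-tail test (H1: mu < mu_0): reject only if t falls below -T_CRIT.
reject_h0 = t_stat < -T_CRIT

print(f"t = {t_stat:.2f}, critical value = {-T_CRIT}, reject H0: {reject_h0}")
```

Since t is about -1.16 and lies above -1.796, the statistic falls in the nonrejection region.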
Classwork 10:
Q1. The average input impedance measured from a sample of 38 transistors
produced by a company is 9.95 MΩ. Assume that the standard deviation
is known to be 1.1 MΩ. Find the 95% confidence interval for the average
input impedance of the transistors produced by this company.
Q2. Test the hypothesis that the average content of containers of a particular
lubricant is 10 liters if the contents of a random sample of 10 containers
are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 liters. Use
0.01 as the level of significance and assume that the distribution of
contents is normal.
Reference:
R. E. Walpole, R. H. Myers, S. L. Myers, and K. Ye, Probability & Statistics for
Engineers & Scientists, Prentice Hall, Inc., 2002.