Unit 10

The document covers statistical analysis, including the basics of statistics, estimation problems, and hypothesis testing. It explains concepts such as random sampling, sampling distributions, point estimation, confidence intervals, and the use of t-distribution for small samples. Additionally, it discusses the formulation of null and alternative hypotheses, types of errors in hypothesis testing, and provides examples to illustrate these concepts.

Uploaded by

Alan Leung

(EE3006/3006A)

Unit 10. Statistical Analysis


1. Basics of Statistics
• Random sampling
• Sampling distributions
• Statistical inference
2. Estimation Problems
• Basic concepts
• Estimation of a single mean
3. Testing Hypothesis
• Basic concepts
• Test on a single mean


1. Basics of Statistics
=> Statistics is the study of the collection, organization, analysis,
interpretation, and presentation of data.

Statistical methods:
• Descriptive statistics (graphs, tables, charts, ...)
• Inferential statistics: estimation and hypothesis testing

(Probability is deductive: general -> particular)
(Statistics is inductive: particular -> general)

1.1 Random Sampling


• A population consists of the totality of the observations with which we are concerned.
• A sample is a subset of a population.

• Let X1, X2, ..., Xn be n independent random variables having the same probability distribution f(x). We can then define X1, X2, ..., Xn to be a random sample of size n from the population, and write its joint probability distribution as
f(x1, x2, …, xn) = f(x1) f(x2) … f(xn)


• Statistics of random sample

=> Any function of the random variables constituting a random sample is called a statistic.
1. Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

2. (Unbiased) Sample variance *: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

3. Sample standard deviation: $S = \sqrt{S^2}$

* The use of the term n-1 is called Bessel’s correction.

• Why is Bessel’s correction needed?

i. Its counterpart is the biased sample variance: $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$

ii.

the population mean µ is :

$n(\bar{X} - \mu) = \sum_{i=1}^{n} (X_i - \mu)$

Then,

Since all $X_i$ have the same probability distribution, $\sigma_{X_i}^2 = \sigma^2$ and $\sigma_{\bar{X}}^2 = \sigma^2/n$.
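The effect of Bessel's correction can be checked numerically. The sketch below is only an illustration under assumed values (a normal population with variance 4, samples of size 5, 10000 repetitions; none of these come from the slides). It averages the 1/n and 1/(n-1) estimators over many samples: the 1/n version should settle near (n-1)/n times the true variance, the 1/(n-1) version near the true variance itself.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
SIGMA2 = 4.0        # assumed population variance (sigma = 2)
N = 5               # assumed sample size
TRIALS = 10000      # assumed number of simulated samples

biased, unbiased = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, SIGMA2 ** 0.5) for _ in range(N)]
    biased.append(statistics.pvariance(sample))    # divides by n
    unbiased.append(statistics.variance(sample))   # divides by n - 1

mean_biased = statistics.fmean(biased)
mean_unbiased = statistics.fmean(unbiased)
print(f"average of 1/n estimator:     {mean_biased:.3f}")
print(f"average of 1/(n-1) estimator: {mean_unbiased:.3f}")
```

With these assumed settings the first average should land close to 3.2 and the second close to 4.0, up to simulation noise.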


Example 1:
Find the standard deviation of the sample 3, 4, 5, 6, 6, and 7,
representing the number of trout caught by a random sample of 6
fishermen on June 19, 1996, at Lake Muskoka.
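The calculation for this example can be sketched in a few lines of Python. The variable names are illustrative; the formulas are the sample mean and the (n-1)-denominator sample variance defined earlier.

```python
import math

catches = [3, 4, 5, 6, 6, 7]      # trout caught by the 6 fishermen
n = len(catches)

mean = sum(catches) / n                                  # sample mean
s2 = sum((x - mean) ** 2 for x in catches) / (n - 1)     # variance with Bessel's correction
s = math.sqrt(s2)                                        # sample standard deviation

print(f"mean = {mean:.3f}, s^2 = {s2:.3f}, s = {s:.3f}")
```

The sample variance works out to 13/6, so the sample standard deviation is about 1.47.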


1.2 Sampling Distributions


=> The probability distribution of a statistic is called a
sampling distribution.

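• By the central limit theorem, the sampling distribution of the sample mean tends towards a normal distribution as the sample size grows, even if the original variables themselves are not normally distributed. Equivalently, the standardized sample mean tends towards the standard normal distribution.
• It is a key concept implying that the probabilistic and statistical methods for normal distributions can be applied to many problems involving other types of distributions.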
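A small simulation can make this tendency towards normality concrete. The sketch below is an assumption-laden illustration, not part of the course material: it draws samples of size 36 from an exponential population (clearly non-normal, with mean 1 and standard deviation 1), standardizes each sample mean, and counts how often the result falls within ±1.96, which a standard normal variable does about 95% of the time.

```python
import math
import random

random.seed(1)        # fixed seed so the run is repeatable
N = 36                # assumed sample size
TRIALS = 20000        # assumed number of simulated samples
MU = SIGMA = 1.0      # mean and standard deviation of an exponential(1) population

inside = 0
for _ in range(TRIALS):
    xbar = sum(random.expovariate(1.0) for _ in range(N)) / N
    z = (xbar - MU) / (SIGMA / math.sqrt(N))   # standardized sample mean
    if abs(z) < 1.96:
        inside += 1

coverage = inside / TRIALS
print(f"fraction of |z| < 1.96: {coverage:.3f}")
```

The printed fraction should be close to 0.95 even though the individual observations are strongly skewed.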

• Properties:

Example 2:
An electrical firm manufactures light bulbs that have a length of life that
is approximately normally distributed, with the mean of 800 hours and a
standard deviation of 40 hours. Find the probability that a random
sample of 36 bulbs will have an average life of less than 785 hours.

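Solution:
The sampling distribution of the sample mean is approximately normal with mean 800 and standard deviation $\sigma_{\bar{X}} = 40/\sqrt{36} = 6.67$. Hence
$z = (785 - 800)/6.67 = -2.25$ and $P(\bar{X} < 785) = P(Z < -2.25) = 0.0122$.
The probability that the average life is less than 785 hours is 1.22%.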
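The same probability can be reproduced with Python's standard library. This is a minimal sketch using `statistics.NormalDist` (available from Python 3.8); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800.0, 40.0, 36       # population mean, population sd, sample size
sigma_xbar = sigma / sqrt(n)         # standard deviation of the sample mean
z = (785.0 - mu) / sigma_xbar        # standardized value of 785 hours
p = NormalDist().cdf(z)              # P(Z < z) under the standard normal

print(f"sigma_xbar = {sigma_xbar:.2f}, z = {z:.2f}, P = {p:.4f}")
```

The computed probability agrees with the table value of about 0.0122.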



1.3 Statistical Inference


=> Deals with methods for making statements about a population based on
a sample drawn from the population.

i. Point estimation: Estimate an unknown population parameter


Example: estimate the mean package weight of a cereal box filled during a
production process
ii. Confidence interval estimation: Find an interval that contains the
parameter with pre-assigned probability
Example: find an interval [L,U] based on the data that includes the mean weight of
the cereal box with a specified probability
iii. Hypothesis testing: Testing hypothesis about an unknown population
parameter
Example: Do the cereal boxes meet the minimum mean weight specification of
16 oz?


2. Estimation Problems
Sampling: each member of the population has the same chance of being selected in the sample.

Population (parameters) -> random sample (statistics) -> estimation of the parameters

2.1 Basic Concepts


I. Point estimation
=> Estimation of an unknown population parameter by using a single
statistic, i.e., estimator, calculated from the sample data.


• Unbiased sample variance <=> Biased sample variance

II. Confidence interval estimation


[Figure: interval estimates with lower and upper limits θL and θU]

• Point Estimation vs Interval Estimation

Point estimation gives us a particular value as an estimate of the population parameter, while interval estimation gives us a range of values which is likely to contain the population parameter.

2.2 Estimation of a Single Mean


• First Case: sample size is large, and variance is known.
i. Point estimation (i.e., estimation of mean)
From the central limit theorem (CLT), one knows that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ can be approximated by a standard normal distribution.

• The probabilistic methods for normal distributions studied in the last Unit can be applied to these estimation problems.

ii. Confidence interval


From the CLT, for a confidence coefficient (1-α), one has
$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$, i.e., $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$,
where $z_{\alpha/2}$ (or $-z_{\alpha/2}$) is the z-value, read from the standard normal distribution table, above which (or below which) one finds an area of α/2.

Therefore, for the mean $\bar{x}$ of a random sample of size n from a population with the known variance $\sigma^2$, the 100(1-α)% confidence interval for µ can be deduced as
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$

Example 3:
The average zinc concentration recovered from a sample of measurements taken in 36 different locations in a river is found to be 2.6 grams per milliliter. Assume that the population standard deviation is 0.3 gram per milliliter. Find the 95% confidence interval for the mean zinc concentration in the river.
Solution:
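As a numerical check, the interval x̄ ± z(α/2)·σ/√n for a known population standard deviation can be sketched as follows. The helper name `z_interval` is hypothetical, introduced only for this illustration, and the critical value is taken from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known (illustrative helper)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)                  # half-width of the interval
    return xbar - margin, xbar + margin

# Zinc example: xbar = 2.6, sigma = 0.3, n = 36
low, high = z_interval(2.6, 0.3, 36)
print(f"95% CI: {low:.2f} < mu < {high:.2f}")
```

For the zinc data this gives an interval of roughly 2.50 to 2.70 grams per milliliter.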


• What if n is small and/or σ is unknown? (Second Case)

• When the sample is small (n < 30)
– Cannot assume normality (according to CLT);
– Cannot use s as a good approximation of σ.

• Then, use t-distribution:
$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
with v = n-1 degrees of freedom.

• t-distribution
=> A family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown.

• The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails.
• The percentage points of the t-distribution are given in Table A.4.
[Table A.4: Critical Values of the t-Distribution]

• Second Case: variance is unknown and/or sample size is small

For σ known we exploited the Central Limit Theorem, whereas for σ unknown we could make use of the sampling distribution of the random variable T.

* Example 4:
The contents of seven similar containers of sulfuric acid are 9.8, 10.2,
10.4, 9.8, 10.0, 10.2, and 9.6 liters. Find a 95% confidence interval for
the mean contents of all such containers, assuming an approximately
normal distribution.

$s^2 = [(9.8-10)^2 + (10.2-10)^2 + (10.4-10)^2 + (9.8-10)^2 + \dots + (9.6-10)^2]/6$
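The rest of the calculation can be sketched in Python. The interval has the form x̄ ± t(α/2)·s/√n with n-1 = 6 degrees of freedom. Since the standard library has no t-distribution, the critical value 2.447 is hard-coded here as a value read from a t-table, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev

contents = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]   # liters
n = len(contents)

xbar = mean(contents)      # sample mean
s = stdev(contents)        # sample standard deviation (n - 1 denominator)
T_CRIT = 2.447             # t_{0.025} with 6 degrees of freedom, taken from a t-table

margin = T_CRIT * s / sqrt(n)
low, high = xbar - margin, xbar + margin
print(f"xbar = {xbar:.2f}, s = {s:.3f}, 95% CI: {low:.2f} < mu < {high:.2f}")
```

With these data the sample mean is 10.0, s is about 0.283, and the interval is roughly 9.74 to 10.26 liters.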


3. Testing Hypothesis
[Figure: scientific knowledge draws on reason and intuition (formulate hypotheses) and on empirical observation (collect data to test hypotheses), leading to accepting or rejecting the hypothesis]

3.1 Basic Concepts


=> A statistical hypothesis is an assertion or conjecture concerning one or
more populations.

• How to define hypothesis

• Null Hypothesis: the statement that a population parameter takes an exact value (e.g., H0: µ = 98.6).
• Alternative Hypothesis: the statement that allows the parameter to take several values (e.g., H1: µ ≠ 98.6, or H1: µ < 98.6).

• Acceptance of a hypothesis merely implies that the data do not give sufficient evidence to refute it.
• Rejection means that sample evidence refutes it.

Example: A manufacturer of a certain brand of rice cereal claims that the average saturated fat content does not exceed 1.5 grams per serving.
=> How to state the null and alternative hypotheses to be used in testing this claim and determine where the critical region is located?
• The manufacturer’s claim should be rejected only if μ is greater than 1.5 grams and should not be rejected if μ is less than or equal to 1.5 grams.
• Since the null hypothesis always specifies a single value of the parameter, we can have
H0: μ = 1.5 grams,
H1: μ > 1.5 grams,
so the critical region lies entirely in the right tail of the distribution.

• Classification of hypothesis testing

[Figure: rejection region, nonrejection region and critical value for Type I (one-tailed test) and Type II (two-tailed test)]

• Rejection region : The set of all values of the test statistic that would cause a
rejection of the null hypothesis.
• Nonrejection region : The set of all values of the test statistic that would cause an
acceptance of the null hypothesis.
• Critical value : The value or values that separate the rejection region from the
values of the test statistics that do not lead to a rejection of the null hypothesis.


• Testing a statistical hypothesis

Possible situations for testing a statistical hypothesis:

• Rejection of the null hypothesis when it is true is called a type I error:
α = P(type I error)
• Acceptance of the null hypothesis when it is false is called a type II error:
β = P(type II error)
• The probability of committing a type I error is called the level of significance, or size of the test.
• The power of a test (which can be computed as 1-β) is the probability of rejecting H0 given that a specific alternative is true.
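To make α, β and power concrete, the sketch below computes them for an upper-tailed z-test with known σ. All numbers are illustrative assumptions (µ0 = 70, σ = 8.9, n = 100, α = 0.05, and a hypothetical true mean of 72), and the function name `one_sided_z_power` is invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_power(mu0, mu1, sigma, n, alpha):
    """Power of the upper-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)     # critical z-value
    se = sigma / sqrt(n)                            # standard deviation of the sample mean
    critical_xbar = mu0 + z_alpha * se              # reject H0 when xbar exceeds this
    beta = NormalDist(mu1, se).cdf(critical_xbar)   # P(do not reject H0 | mu = mu1)
    return 1.0 - beta

# Hypothetical alternative mu1 = 72 (an assumption for illustration only)
power = one_sided_z_power(mu0=70.0, mu1=72.0, sigma=8.9, n=100, alpha=0.05)
# When the "alternative" equals mu0, the rejection probability is the test size alpha
size = one_sided_z_power(mu0=70.0, mu1=70.0, sigma=8.9, n=100, alpha=0.05)
print(f"power = {power:.3f}, beta = {1 - power:.3f}, size = {size:.3f}")
```

Under these assumptions the power is a little above 0.72, so β is a little below 0.28. Moving the alternative further from µ0 or increasing n raises the power.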


3.2 Test on a Single Mean


• First Case: variance is known, and sample size is large
[Figure: rejection and nonrejection regions separated by the critical value zα]

Example 5:
A random sample of 100 recorded deaths in the United States during the past
year showed an average life span of 71.8 years. Assuming a population
standard deviation of 8.9 years, does this seem to indicate that the mean life
span today is greater than 70 years? Use a 0.05 level of significance.
Solution:
The null and alternative hypotheses are
H0: μ = 70 years
H1: μ > 70 years
With the level of significance 0.05, one has zα = 1.645.
From the observed values from the sample, i.e., $\bar{x}$ = 71.8 years and σ = 8.9 years, one has
$z = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02 > z_{\alpha}$
Decision: Reject H0 and conclude that the mean life span today is greater than 70 years.
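The critical-value approach for this example can be sketched as follows. The variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 71.8, 70.0, 8.9, 100, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))            # observed test statistic
z_alpha = NormalDist().inv_cdf(1.0 - alpha)     # upper-tail critical value
reject_h0 = z > z_alpha                         # one-tailed decision rule

print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject_h0}")
```

The observed statistic (about 2.02) exceeds the critical value (about 1.645), so H0 is rejected.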

• The use of P-value for decision making

=> A P-value is the lowest level of significance at which the observed value of the test statistic is significant.

[Figure: P-values shown as the tail areas P(z > a) and P(z < -a) beyond the observed value zx]

Example 5:
A random sample of 100 recorded deaths in the United States during the past
year showed an average life span of 71.8 years. Assuming a population
standard deviation of 8.9 years, does this seem to indicate that the mean life
span today is greater than 70 years? Use a 0.05 level of significance.
Solution:
The null and alternative hypotheses are
H0: μ = 70 years
H1: μ > 70 years
With the observed values from the sample, i.e., $\bar{x}$ = 71.8 years, σ = 8.9 years and n = 100, one has
$z_x = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$
The P-value is P(z > 2.02) = 1 - 0.9783 = 0.0217.
The P-value is smaller than the level of significance 0.05.
Decision: Reject H0 and conclude that the mean life span today is greater than 70 years.
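The P-value approach can be sketched the same way. The helper `upper_tail_p_value` is a hypothetical name used only here. Because the code keeps the unrounded test statistic, its result differs slightly in the fourth decimal place from a table lookup at z = 2.02.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(xbar, mu0, sigma, n):
    """P-value of an upper-tailed z-test with known sigma (illustrative helper)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)    # area to the right of the observed z

alpha = 0.05
p_value = upper_tail_p_value(71.8, 70.0, 8.9, 100)
print(f"P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Either way the P-value is about 0.022, below the 0.05 level of significance, so the decision matches the critical-value approach.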

• Second Case: variance is unknown and/or sample size is small

[Figure: rejection regions for the t-test, bounded by -tα for a one-tailed test and by -tα/2,n-1 and tα/2,n-1 for a two-tailed test]

* Example 6:
The Edison Electric Institute has published figures on the number of
kilowatt-hours used annually by various home appliances. It is claimed
that a vacuum cleaner uses an average of 46 kilowatt hours per year. If a
random sample of 12 homes included in a planned study indicates that
vacuum cleaners use an average of 42 kilowatt hours per year with a
standard deviation of 11.9 kilowatt hours, does this suggest at the 0.05
level of significance that vacuum cleaners use, on average, less than 46
kilowatt hours annually? Assume the population of kilowatt hours to be
normal.


Solution:
The null and alternative hypotheses are
H0: μ = 46 kilowatt hours
H1: μ < 46 kilowatt hours
With the level of significance α = 0.05, the critical value is
$-t_{0.05,11} = -1.796$ (for 11 degrees of freedom)
Using the observed values from the sample, i.e., $\bar{x}$ = 42 kilowatt hours, s = 11.9 kilowatt hours, and n = 12, the calculated t is
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16 > -t_{0.05,11}$
Decision: Do not reject H0 and conclude that the average number of kilowatt hours used annually by home vacuum cleaners is not significantly less than 46.
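A short sketch of this lower-tailed t-test is given below. As the Python standard library does not provide the t-distribution, the critical value 1.796 is entered by hand as a table value for 11 degrees of freedom, which is an assumption of the sketch.

```python
from math import sqrt

xbar, mu0, s, n = 42.0, 46.0, 11.9, 12
T_CRIT = 1.796     # t_{0.05} with 11 degrees of freedom, taken from a t-table

t = (xbar - mu0) / (s / sqrt(n))    # observed test statistic
reject_h0 = t < -T_CRIT             # lower-tailed decision rule

print(f"t = {t:.2f}, critical value = {-T_CRIT}, reject H0: {reject_h0}")
```

The statistic is about -1.16, which is not below -1.796, so H0 is not rejected.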


Classwork 10:
Q1. The average input impedance measured from a sample of 38 transistors produced by a company is 9.95 MΩ. Assume that the standard deviation is known to be 1.1 MΩ. Find the 95% confidence interval for the average input impedance of the transistors produced by this company.

Q2. Test the hypothesis that the average content of containers of a particular
lubricant is 10 liters if the contents of a random sample of 10 containers
are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 liters. Use
0.01 as the level of significance and assume that the distribution of
contents is normal.


Reference:
R. E. Walpole, R. H. Myers, S. L. Myers, K. Ye, Probability & Statistics for Engineers & Scientists, Prentice Hall, Inc., 2002.

8. Fundamental Sampling Distributions and Data Descriptions (pp. 194-219)
9. One- and Two-Sample Estimation Problems (pp. 230-239)
10. One- and Two-Sample Tests of Hypotheses (pp. 284-306)
